Analytics filtering techniques

How Omny Studio filters and measures analytic metrics

Written by Del
Updated over a week ago

Omny Studio offers a range of analytics reports for publishers to understand what, when, where, and how their audio content is being downloaded and played. The analytics system measures content downloads, podcast RSS subscribers, and content consumption over a wide range of publishing endpoints.

Because of how popular podcast players such as Apple Podcasts behave, together with background caching, progressive downloads, and limited listener identifiers, raw podcast download requests to the server may not accurately reflect the number of unique people who have played the content. The data therefore requires processing.

To ensure metrics can be consistently defined and measured across the industry, the IAB has released guidelines on how podcast analytics should be filtered and measured. The actual implementation of these guidelines will vary from provider to provider due to practical technical or design differences. This document outlines the various filtering techniques Omny Studio employs.

Note: Because we continuously refine our filtering parameters as well as experiment with methodologies to improve the accuracy of our metrics, this document may be amended from time to time to reflect any changes in our approach.

Download analytics filtering

Download analytics tracks downloads of any published audio files from sources including, but not limited to, podcast players, embed players, and third-party apps and websites. The following filtering is applied to the log files we collect and parse from the various CDNs we support.

Ignore non-GET HTTP requests 

We do not count HTTP requests with a method other than “GET” when calculating download requests.

We do not consider these downloads because we have observed that some podcast players use the HTTP request method “HEAD” to fetch file metadata without downloading any audio content.

Ignore non-successful responses

We do not count HTTP requests that did not have a successful response. Specifically, we only count responses with an HTTP status code of 200 OK or 206 Partial Content.

We do not consider HTTP requests where the response was redirected, not modified, or errored as downloads.
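As an illustration only, the method and status checks described above can be sketched as a single predicate over a parsed log entry. The function name and the shape of its arguments are assumptions made for this example and do not describe Omny Studio's actual log pipeline.

```python
# Hypothetical sketch: request-level filter for method and response status.
SUCCESS_STATUSES = {200, 206}  # 200 OK and 206 Partial Content


def is_countable_request(method: str, status: int) -> bool:
    """Return True only for GET requests answered with 200 or 206."""
    return method.upper() == "GET" and status in SUCCESS_STATUSES


# A HEAD probe and a 304 Not Modified response are both excluded.
print(is_countable_request("GET", 206))   # True
print(is_countable_request("HEAD", 200))  # False
print(is_countable_request("GET", 304))   # False
```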

Ignore non-playback HTTP range requests

We do not count HTTP range requests with a range of 0-1, a range of 0 bytes, or an invalid range. (We do count requests that do not specify a range.)

We do not consider these downloads because we observed some podcast players will use a 0-1 request to check if the server supports range requests without downloading any audio content.
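A minimal sketch of such a range check is shown below. It makes several assumptions that are not stated in this article: only single `bytes=` ranges are handled (anything else is treated as invalid), a suffix range of `-0` is taken to be the "0 bytes" case, and the function name is hypothetical.

```python
import re
from typing import Optional

# Matches a single byte range such as "bytes=0-1", "bytes=500-", or "bytes=-200".
_RANGE_RE = re.compile(r"^bytes=(\d*)-(\d*)$")


def is_playback_range(range_header: Optional[str]) -> bool:
    """Return True if the Range header (or its absence) could be playback."""
    if range_header is None or not range_header.strip():
        return True  # no range specified: the request is counted
    match = _RANGE_RE.match(range_header.strip())
    if match is None:
        return False  # malformed or multi-part range: treated as invalid here
    start, end = match.groups()
    if start == "" and end == "":
        return False  # "bytes=-" carries no positions at all
    if start == "":
        return int(end) > 0  # suffix range; "-0" asks for zero bytes
    if end == "":
        return True  # open-ended range such as "bytes=0-"
    first, last = int(start), int(end)
    if last < first:
        return False  # end before start: invalid
    return (first, last) != (0, 1)  # the 0-1 range-support probe is ignored
```

For example, `is_playback_range("bytes=0-1")` is `False`, while `is_playback_range(None)` and `is_playback_range("bytes=0-")` are both `True`.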

Ignore known bots and spiders

We do not count download requests with an application user-agent that is identified to be a known bot or spider application.

We do not consider these downloads because bots and spiders regularly download files for indexing purposes and do not correlate with people listening.

We utilize the open-source “UA-Parser” user-agent database enhanced with additional proprietary data to parse user-agents. This database is regularly updated to detect new podcast applications and bots as they are documented.

Ignore invalid or banned user-agents

We do not count download requests that have no user-agent, or whose user-agent matches a list of user-agents we’ve identified as belonging to intentionally or unintentionally problematic applications.

We do not consider these downloads because we have observed that some mobile players and applications generate an excessive number (e.g. hundreds) of download requests that do not correlate with people listening.

Ignore cloud service provider or banned IPs

We do not count download requests from IPs we’ve identified as belonging to cloud service providers or to servers of third-party services that cache content.

We do not consider these downloads because third-party services are downloading files for caching and mirroring purposes and do not correlate with people listening.

Our database of cloud service provider IP ranges includes:

  • Amazon AWS

  • Cloudflare

  • DigitalOcean

  • Fastly

  • Google Cloud

  • Microsoft Azure

  • OVH

  • Triton Digital

  • TuneIn

This database of IP ranges is regularly updated with the official lists published by the providers and IP ranges registered by the provider on the American Registry for Internet Numbers (ARIN).
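Checking a client address against such a database could look roughly like the sketch below, which uses Python's standard `ipaddress` module. The two networks listed are reserved documentation ranges used as placeholders, not real provider ranges, and the names `BLOCKED_NETWORKS` and `is_blocked_ip` are hypothetical.

```python
import ipaddress

# Placeholder ranges (reserved for documentation), standing in for the
# provider-published and ARIN-registered ranges described above.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_blocked_ip(ip: str) -> bool:
    """Return True if the address falls inside any blocked network."""
    address = ipaddress.ip_address(ip)  # raises ValueError on malformed input
    # Membership tests across IP versions simply evaluate to False.
    return any(address in network for network in BLOCKED_NETWORKS)


print(is_blocked_ip("192.0.2.10"))    # True
print(is_blocked_ip("198.51.100.7"))  # False
```

A linear scan is fine for a short list; a production system with many thousands of ranges would more likely use a prefix tree or sorted interval lookup.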

Ignore duplicate downloads by unique sessions

We do not count duplicate download requests from an identifiable unique session (defined below) in a 24-hour UTC window (from UTC midnight to midnight). Multiple downloads inside the deduplication window will only be counted once.

Due to the limited listener identifiers available from podcast apps, we combine and hash the following data to identify unique sessions at best effort:

  • UTC date

  • IP address (IPv4 or IPv6* if available)

  • User-agent

  • Episode ID

* IPv6 addresses are truncated to the first 64 bits
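The following sketch shows one way the four fields could be combined into a session key. The hash function (SHA-256), the field separator, and the function names are assumptions chosen for the example; the article does not specify them. IPv6 addresses are reduced to their /64 prefix, that is, the first 64 bits of the 128-bit address.

```python
import hashlib
import ipaddress
from datetime import datetime, timezone


def normalize_ip(ip: str) -> str:
    """Keep IPv4 addresses as-is; reduce IPv6 addresses to their /64 prefix."""
    address = ipaddress.ip_address(ip)
    if address.version == 6:
        network = ipaddress.ip_network(f"{address}/64", strict=False)
        return str(network.network_address)
    return str(address)


def session_key(timestamp: datetime, ip: str, user_agent: str, episode_id: str) -> str:
    """Hash UTC date, IP, user-agent, and episode ID into one identifier."""
    # The timestamp is expected to be timezone-aware.
    utc_date = timestamp.astimezone(timezone.utc).date().isoformat()
    parts = [utc_date, normalize_ip(ip), user_agent, episode_id]
    # A unit-separator character keeps adjacent fields from running together.
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()
```

Because the UTC date is part of the key, two requests from the same listener on either side of UTC midnight produce different keys, which matches the midnight-to-midnight deduplication window described above.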

Ignore partial downloads with less than a minute of audio content

We do not count download requests where the amount of data transferred in an identifiable unique session (defined above) is less than a minute's worth of audio content.

We calculate the minimum data transferred threshold by multiplying the MP3 bitrate by 60 seconds and then adding the size of any metadata and ID3 headers (at the start of the file). For audio files shorter than 60 seconds, the threshold is the size of the entire file.

For each unique session, we combine the number of bytes successfully delivered from server to client across one or multiple requests. When the session's total bytes delivered is equal to or greater than the minimum threshold, it is counted as a download with the timestamp of the session's earliest request.
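As a rough worked example of the threshold and the per-session aggregation, consider the sketch below. It assumes a constant bitrate expressed in kilobits per second, numeric timestamps, and hypothetical function names; none of these details come from the article itself.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def min_download_bytes(bitrate_kbps: int, header_bytes: int,
                       file_bytes: int, duration_seconds: float) -> int:
    """Bytes needed for one minute of audio plus leading metadata."""
    if duration_seconds < 60:
        return file_bytes  # short files: the whole file is the threshold
    audio_bytes = bitrate_kbps * 1000 * 60 // 8  # kilobits/s -> bytes per minute
    return audio_bytes + header_bytes


def count_downloads(requests: Iterable[Tuple[str, float, int]],
                    threshold: int) -> Dict[str, float]:
    """Map each qualifying session key to the timestamp of its earliest request.

    Each request is (session_key, timestamp, bytes_delivered).
    """
    totals: Dict[str, int] = defaultdict(int)
    earliest: Dict[str, float] = {}
    for key, timestamp, delivered in requests:
        totals[key] += delivered
        if key not in earliest or timestamp < earliest[key]:
            earliest[key] = timestamp
    return {key: earliest[key] for key, total in totals.items() if total >= threshold}
```

With a 128 kbps file and 4,096 bytes of leading metadata, the threshold works out to 128,000 × 60 ÷ 8 + 4,096 = 964,096 bytes. A session that receives that much across several range requests counts once, stamped with its earliest request.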

Cached files on other platforms

Some syndicated platforms and third-party services (e.g. Google Play Podcast and Spotify) may cache files on their own platforms. Plays on these platforms will not register a download on our server. As per the IAB guidelines, we recommend that platforms do not cache episodes and always fetch from the enclosure URL.

Where possible we will ingest and display these metrics separately to downloads on our analytics dashboards.

Third-party players, applications and platforms

The IAB podcast measurement guidelines recommend that third-party podcast players:

  1. Do not implement auto-play. This will result in a bad user experience for the user with audio they were not expecting to hear.

  2. Do not pre-load - unless the intent was clearly to play the podcast.

  3. For a full download, ask for the entire file in one go. For a progressive download, ask for the file in slices (byte range). This way a full download can be distinguished from a progressive download.

  4. Do not modify the enclosure URL when requesting media, don’t add extra parameters.

  5. Do not cache podcast episodes on your servers. Always download the latest episode from the enclosure URL for every app user wanting to listen.

  6. Use the GUID -- as opposed to episode URL, title, publication date, etc. -- to identify new episodes in the RSS feed that should be automatically downloaded to a user’s device. The GUID is designed to be persistent through changes to hosting environment, titles, etc.

  7. Employ an “automatic download unsubscribe” behaviour (e.g. - stop auto downloads after 5 episodes of non-listens).

  8. Do not automatically download all episodes (e.g. back catalog episodes) by default. This creates unnecessary drain on the publishers’ servers and consumes users’ bandwidth.

The guidelines also make the following recommendations about the structure of a podcast player's user agent:

  1. Provide enough details in the user-agent header to allow it to be consistently differentiated from the user agent of other devices. Whenever possible, this should be applied to both RSS feeds and audio files.

  2. Avoid adding unnecessary information (such as injecting user or session IDs) to the user agent string, and in encoding practices.

  3. Platforms are recommended to submit their user agent header value to the IAB Spiders and Bots inclusion list so that it is not considered a bot and can be a signal used to determine the device information.

  4. If the app or platform does employ the use of bots to index content, it is recommended to specify a user-agent that is distinct from the application user-agent and includes the word “bot” to clearly identify its use case.

Consumption analytics filtering

Consumption analytics tracks playback behaviour of audio content in Omny Studio embed players and in third-party players that have implemented our consumption analytics player API.

We utilize client-side tracking of player events such as play, pause and seek to generate behavioural reports such as how many people played, how long they played and which parts of the content they played.

Identifying playback sessions

We use a globally unique identifier (GUID) to identify unique playback sessions.

Repeated pausing and seeking inside the same content will not count as a new playback session. However, reloading the player or changing content (in a list of content) will be considered a new session even if the user has already played the content before.

Ignore sessions shorter than 10 seconds

We ignore any playback sessions with a total duration of less than 10 seconds.

The total includes non-consecutive segments within a session. For example, a session with two segments of 0:00-0:05 and 0:30-0:36 will be counted, since the total duration of all segments is 11 seconds.

We do not count sessions shorter than 10 seconds because we believe these short plays may be unintentional or accidental plays.
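The duration rule can be illustrated with a short sketch. It assumes each played segment is recorded as a (start, end) pair in seconds and that segments do not overlap; the function names are hypothetical and this is not the player API itself.

```python
from typing import Iterable, Tuple

MIN_SESSION_SECONDS = 10.0


def total_played_seconds(segments: Iterable[Tuple[float, float]]) -> float:
    """Sum the length of every played segment, consecutive or not."""
    return sum(end - start for start, end in segments)


def is_counted_session(segments: Iterable[Tuple[float, float]]) -> bool:
    """A session counts once its segments total at least 10 seconds."""
    return total_played_seconds(segments) >= MIN_SESSION_SECONDS


# Segments 0:00-0:05 and 0:30-0:36 total 11 seconds, so the session counts.
print(is_counted_session([(0, 5), (30, 36)]))  # True
print(is_counted_session([(0, 5)]))            # False
```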
