This post covers a dozen best practices we’ve developed at Twitch on the design and engineering of product instrumentation via events. Better instrumentation leads to better analytics and better decisions for the whole company. While there are resources covering this topic, they tend to be scarce and introductory. Our data staff has accrued a lot of experience over the years, so we thought it’d be worth sharing our own design patterns and best practices.
Send events from the backend. In most modern apps, front-end clients facing the end user, like web or mobile apps, send API requests to backend servers. Sending events from the backend is more reliable because the backend runs trusted code in a trusted environment. The frontend, on the other hand, can be tampered with, simulated by robots, and lose connection. Sending events from the backend also saves time: events need to be implemented only once for all clients hitting that API. Sometimes, however, sending events from the frontend is unavoidable. The table below lists some examples when it’s preferable to use the frontend or the backend.
|Send from the backend||Send from the frontend|
|Recommendations served||Modals and screens displayed|
|Microservice response time||Experienced latency|
|Text of comments||Clicks and hovering|
On the front-end, forward backend values verbatim. If firing from the front-end, avoid translating or converting values passed by the backend. This drastically reduces the amount of coordination required between client teams. For example, user IDs and Twitch channel names are great to use verbatim in all events, and don’t need any translation table or conversion scheme like lower casing or removing special characters. When a creator changes their display name, all clients will seamlessly pass the new name.
Do not reinvent the wheel. Look at the existing data documentation, and ask fellow data analysts and engineers if an existing event fits your tracking needs. For example, if an event already exists for page loads, see if you can use it as-is, or at most add properties to it, but avoid adding a new one. This also highlights the importance of data governance and having a holistic data dictionary.
Send standard fields in all frontend client events. On our web platform for example, every front-end event passes the current page location and the user ID. This makes it easy to split by location on the site, or to join with a user dimension table and filter by country.
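As a minimal sketch of this practice, the helper below attaches a fixed set of standard fields to every front-end event payload. The helper name with_standard_fields and the field names location and user_id are illustrative assumptions, not Twitch's actual schema.

```python
from typing import Any


def with_standard_fields(event: dict[str, Any], location: str, user_id: str) -> dict[str, Any]:
    """Return a copy of the event with the standard fields attached."""
    standard = {"location": location, "user_id": user_id}
    # Standard fields win on conflict so they stay consistent across events.
    return {**event, **standard}


event = with_standard_fields(
    {"name": "video_play", "video_id": "v123"},  # hypothetical event
    location="/directory",
    user_id="45678",
)
```

Routing every event through one helper like this keeps the standard fields from drifting between teams.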
Future-proof and look outside your silo. If you foresee potential use cases for your events in the near future, or other products being able to leverage your events, design with those in mind. Renaming and retrofitting events and fields is painful and time-consuming. For example, if we launch a feature allowing viewers to search for any channel, an event named search_for_channel would be hard to reuse later for game search. Instead, the event could simply be called search, with a field search_content_type taking values “channel”, “game”, or even “any”.
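A rough sketch of such a generic event, assuming a hypothetical builder build_search_event and a hypothetical query field (only the event name search and the field search_content_type come from the text above):

```python
ALLOWED_SEARCH_CONTENT_TYPES = {"channel", "game", "any"}


def build_search_event(query: str, search_content_type: str) -> dict:
    """Build one generic search event rather than one event per content type."""
    if search_content_type not in ALLOWED_SEARCH_CONTENT_TYPES:
        raise ValueError(f"unknown search_content_type: {search_content_type!r}")
    return {
        "name": "search",
        "query": query,  # hypothetical field
        "search_content_type": search_content_type,
    }


channel_search = build_search_event("speedrun", "channel")
game_search = build_search_event("chess", "game")
```

Supporting a new kind of search then only requires a new allowed value, not a new event.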
Descriptive and unambiguous names. Descriptive and concise event names are really worth spending time thinking about. For example, at Twitch, content is a vague and ambiguous field name. It could relate to the game being watched, the video bitrate, or the email subject of a marketing campaign. This again highlights the importance of data governance.
Use snake_case, not CamelCase, and avoid dashes. SQL ignores caps and requires escaping dashes in table names via double quotes.
Prefix event names with the product domain. Twitch frontends and backends fire hundreds of unique events for dozens of teams. Using the same prefix for events concerning the same product makes it easy to find related events when a data catalog is sorted alphabetically.
|Bad event name||Better event name|
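The two naming rules above (snake_case, and a product-domain prefix) can be checked automatically. The validator below is one possible sketch; the function name is_valid_event_name and the example prefixes are assumptions made for illustration.

```python
import re

# Lowercase words separated by single underscores, e.g. "search_result_click".
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")


def is_valid_event_name(name: str, domain_prefixes: set[str]) -> bool:
    """Check that a name is snake_case and starts with a known product-domain prefix."""
    if not SNAKE_CASE.fullmatch(name):
        return False
    return any(name.startswith(prefix + "_") for prefix in domain_prefixes)


prefixes = {"search", "chat"}  # hypothetical product domains
```

A check like this could run in code review or in the event registration tooling, so that badly named events are caught before they reach the data catalog.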
Click-through rates (CTRs) are probably the most common type of metric. At Twitch for example, we compute CTRs for carousel recommendations, signups, and notifications. Although these CTRs rely on different events and cover different product areas, their formulas all consist of a numerator and a denominator.
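There are several ways to instrument a CTR:
Extract the denominator from a display event, and the numerator from a click event.
Fire a single event with a field action_type taking values “display” or “click”.
Extract the denominator from a display event, and extract the numerator from URL parameters (see Netflix example below).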
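To make the numerator and denominator concrete, here is a small sketch for the single-event variant, where each event carries an action_type field. The function name click_through_rate and the item_id field are hypothetical.

```python
from collections import Counter


def click_through_rate(events: list[dict]) -> float:
    """Compute CTR = clicks / displays from events carrying an action_type field."""
    counts = Counter(e["action_type"] for e in events)
    displays = counts["display"]
    if displays == 0:
        return 0.0  # the rate is undefined without displays; 0.0 is a convention here
    return counts["click"] / displays


events = [
    {"item_id": "a", "action_type": "display"},
    {"item_id": "b", "action_type": "display"},
    {"item_id": "c", "action_type": "display"},
    {"item_id": "d", "action_type": "display"},
    {"item_id": "b", "action_type": "click"},
]
ctr = click_through_rate(events)  # 1 click / 4 displays
```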
At Twitch, such events often include fields like recommender_id. LinkedIn also passes the page location.
Example of URL parameters: Netflix on web.
After clicking on the 3rd item of the 5th carousel, the URL to the movie page is this:
tctx contains the carousel number=5, position in carousel=3, and a UUID of the previous page load.
trackID might be the user_id.
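Reading such context back out of a URL is straightforward with a standard URL parser. The sketch below assumes a hypothetical comma-separated layout for tctx (carousel number, position, page-load UUID) and a made-up example.com URL; Netflix's real encoding may well differ.

```python
from urllib.parse import parse_qs, urlparse


def parse_tracking_params(url: str) -> dict:
    """Extract carousel context from a tctx-style query parameter (hypothetical layout)."""
    params = parse_qs(urlparse(url).query)
    # Assumed layout: "<carousel number>,<position in carousel>,<page load UUID>"
    carousel, position, page_load_id = params["tctx"][0].split(",")
    return {
        "carousel": int(carousel),
        "position": int(position),
        "page_load_id": page_load_id,
    }


url = "https://example.com/title/123?tctx=5,3,0f8fad5b-d9cb-469f-a165-70867728950e"
context = parse_tracking_params(url)
```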
In a way, funnel tracking is a generalization of CTR tracking. Conceptually, these are a series of steps that need to be tied together. For example, an advertising funnel could rely on events for opportunity, request, impression, and click, all tied with the same UUID. The flowchart below details how these events could fit together.
Each step should have a unique entity ID or a contextual UUID to join all steps together.
Either a single event carries a field funnel_step storing the step name, or each step fires its own event.
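As an illustration of tying steps together, the sketch below counts how many distinct funnels reached each step of the advertising example, using the funnel_step field. The field name funnel_uuid and the function funnel_counts are assumptions.

```python
from collections import defaultdict

FUNNEL_STEPS = ["opportunity", "request", "impression", "click"]


def funnel_counts(events: list[dict]) -> dict[str, int]:
    """Count how many distinct funnel UUIDs reached each step."""
    reached = defaultdict(set)
    for e in events:
        reached[e["funnel_step"]].add(e["funnel_uuid"])
    return {step: len(reached[step]) for step in FUNNEL_STEPS}


events = [
    {"funnel_uuid": "u1", "funnel_step": "opportunity"},
    {"funnel_uuid": "u1", "funnel_step": "request"},
    {"funnel_uuid": "u1", "funnel_step": "impression"},
    {"funnel_uuid": "u1", "funnel_step": "click"},
    {"funnel_uuid": "u2", "funnel_step": "opportunity"},
    {"funnel_uuid": "u2", "funnel_step": "request"},
]
counts = funnel_counts(events)
```

Counting distinct UUIDs rather than rows keeps duplicate or retried events from inflating a step.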
This is a special case of CTR tracking, where the front-end does not know which ID the backend will assign after the action has completed. For example, when uploading a video, the video ID is generated by the backend after the video has started uploading.
Twitch has clients on multiple platforms, like web, mobile, and console. Navigation events and fields tend to vary slightly.
On web, the page load event should track the URL of the previous page, via the HTTP Referer header. This enables Sankey diagrams of navigation paths in the app.
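The links of such a Sankey diagram are simply counts of (previous page, current page) pairs. A minimal sketch, assuming page load events with hypothetical fields url and referrer_url:

```python
from collections import Counter


def navigation_transitions(page_loads: list[dict]) -> Counter:
    """Count (previous page, current page) pairs, i.e. the links of a Sankey diagram."""
    return Counter(
        (e["referrer_url"], e["url"])
        for e in page_loads
        if e.get("referrer_url")  # skip entry pages with no previous page
    )


page_loads = [
    {"url": "/", "referrer_url": None},
    {"url": "/directory", "referrer_url": "/"},
    {"url": "/channel/abc", "referrer_url": "/directory"},
    {"url": "/directory", "referrer_url": "/"},
]
links = navigation_transitions(page_loads)
```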
This is about the regular lifecycle of complex objects like video collections or user accounts. Using events to track object lifecycle may seem redundant with production databases, but this redundancy can be useful. Moreover, database snapshots only happen at discrete points in time, whereas events enable reconstructing the database at any point in time.
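Reconstructing state at an arbitrary point in time amounts to replaying lifecycle events up to that time. The sketch below assumes hypothetical event fields (time, action, object_id, fields) and three actions: create, update, and delete.

```python
def state_at(events: list[dict], as_of: int) -> dict[str, dict]:
    """Replay lifecycle events up to a timestamp to rebuild the objects alive at that time."""
    objects: dict[str, dict] = {}
    for e in sorted(events, key=lambda e: e["time"]):
        if e["time"] > as_of:
            break
        if e["action"] == "create":
            objects[e["object_id"]] = dict(e["fields"])
        elif e["action"] == "update":
            objects.setdefault(e["object_id"], {}).update(e["fields"])
        elif e["action"] == "delete":
            objects.pop(e["object_id"], None)
    return objects


events = [
    {"time": 1, "action": "create", "object_id": "c1", "fields": {"title": "Highlights"}},
    {"time": 5, "action": "update", "object_id": "c1", "fields": {"title": "Best of"}},
    {"time": 9, "action": "delete", "object_id": "c1", "fields": {}},
]
```

Calling state_at(events, 6) returns the collection with its updated title, a state that a snapshot taken before time 5 or after time 9 would never show.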
By “long activity” we mean activity that takes place over minutes, hours, or even days.
One option is to fire periodic events carrying a seconds_elapsed field, and to sessionize via max(seconds_elapsed). However, this can confuse people unfamiliar with the data who naively run count(*).
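The difference between max(seconds_elapsed) and a naive row count can be seen in a small sketch. The session_id field and the function name seconds_per_session are assumptions.

```python
def seconds_per_session(events: list[dict]) -> dict[str, int]:
    """Duration of each session is max(seconds_elapsed), not the number of rows."""
    durations: dict[str, int] = {}
    for e in events:
        sid = e["session_id"]
        durations[sid] = max(durations.get(sid, 0), e["seconds_elapsed"])
    return durations


events = [
    {"session_id": "s1", "seconds_elapsed": 60},
    {"session_id": "s1", "seconds_elapsed": 120},
    {"session_id": "s1", "seconds_elapsed": 180},
    {"session_id": "s2", "seconds_elapsed": 60},
]
durations = seconds_per_session(events)
```

Here the table holds four rows but only two sessions, which is exactly the trap a naive count(*) falls into.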
This is useful when tracking N-to-1 relationships such as a comment tree.
Include a parent_id field in the events tracking child creation or update.
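With a parent_id on every child event, the tree can be rebuilt downstream. The sketch below assumes hypothetical comment creation events with fields comment_id and parent_id, where a parent_id of None marks a top-level comment.

```python
from collections import defaultdict


def build_children_index(events: list[dict]) -> dict:
    """Map each parent_id to its child comment IDs; None holds top-level comments."""
    children = defaultdict(list)
    for e in events:
        children[e["parent_id"]].append(e["comment_id"])
    return dict(children)


def descendants(children: dict, root_id: str) -> list[str]:
    """Walk the tree depth-first and return every comment below root_id."""
    found = []
    stack = list(reversed(children.get(root_id, [])))
    while stack:
        node = stack.pop()
        found.append(node)
        stack.extend(reversed(children.get(node, [])))
    return found


events = [
    {"comment_id": "c1", "parent_id": None},
    {"comment_id": "c2", "parent_id": "c1"},
    {"comment_id": "c3", "parent_id": "c1"},
    {"comment_id": "c4", "parent_id": "c2"},
]
tree = build_children_index(events)
```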
Collections can consist of sets, ordered lists, hash maps, and so on. Production databases often track creation, deletion, and other metadata about a collection, via fields like last_updated_at for example. If it’s possible to use snapshots of production databases to capture the information of interest, then it’s always better to use those. However, databases don’t always record all we need, for example when an item is added to or removed from a collection, and by whom. In these cases, we must use events.
To create and delete collections: see object lifecycle.
To add an item to a collection: fire event mycollection_add_myitem, with fields including mycollection_result_list, a JSON array or comma-separated list.
For example: collection a7852cb2 has items 1a7fbcde and 2bc9d6ab. Adding item 3bc7db8c to it, in first position, triggers this event: 45678,1,'a7852cb2','3bc7db8c,1a7fbcde,2bc9d6ab'
To remove an item from a collection: fire mycollection_remove_myitem with the corresponding fields.
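The add and remove events can be sketched as two small builders that compute the resulting list. The event names and mycollection_result_list come from the text above; the other field names (user_id, position, mycollection_id) are guesses at what the example values stand for, and the builder names are hypothetical.

```python
def add_item_event(user_id: int, collection_id: str, items: list[str],
                   item_id: str, position: int) -> dict:
    """Build an add event; position is 1-based, as in the example above."""
    result = items[: position - 1] + [item_id] + items[position - 1:]
    return {
        "name": "mycollection_add_myitem",
        "user_id": user_id,                # assumed meaning of 45678
        "position": position,              # assumed meaning of 1
        "mycollection_id": collection_id,  # assumed field name
        "mycollection_result_list": ",".join(result),
    }


def remove_item_event(user_id: int, collection_id: str, items: list[str],
                      item_id: str) -> dict:
    """Build a remove event carrying the list as it stands after removal."""
    result = [i for i in items if i != item_id]
    return {
        "name": "mycollection_remove_myitem",
        "user_id": user_id,
        "mycollection_id": collection_id,
        "mycollection_result_list": ",".join(result),
    }


added = add_item_event(45678, "a7852cb2", ["1a7fbcde", "2bc9d6ab"], "3bc7db8c", 1)
removed = remove_item_event(45678, "a7852cb2", ["3bc7db8c", "1a7fbcde", "2bc9d6ab"], "1a7fbcde")
```

Storing the full resulting list in each event means the state of the collection after any change can be read from a single row, without replaying earlier events.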
Instrumenting events in a consistent and reliable way can be challenging. We hope the best practices we shared in this article will be as useful to you as they were to us! And if this kind of work sounds interesting to you, have a look at our data engineer and data analyst open positions.
Thanks to Brian Eng and Nicholas Ngorok for reviewing this article.