Product instrumentation best practices

May 7 2021 - By Thomas Debeauvais

This post covers a dozen best practices we’ve developed at Twitch on the design and engineering of product instrumentation via events. Better instrumentation leads to better analytics and better decisions for the whole company. While there are resources covering this topic, they tend to be scarce and introductory. Our data staff has accrued a lot of experience over the years, so we thought it’d be worth sharing our own design patterns and best practices.

General best practices

Send events from the backend. In most modern apps, front-end clients facing the end user, like web or mobile apps, send API requests to backend servers. Sending events from the backend is more reliable because the backend runs trusted code in a trusted environment. The front-end, on the other hand, can be tampered with, simulated by robots, or lose its connection. Sending events from the backend also saves time: events need to be implemented only once for all clients hitting that API. Sometimes, however, sending events from the front-end is unavoidable. The table below lists examples of events that are better sent from the backend and events that are better sent from the front-end.

Backend | Front-end
Recommendations served | Modals and screens displayed
Microservice response time | Experienced latency
Text of comments | Clicks and hovering

On the front-end, forward backend values verbatim. If firing from the front-end, avoid translating or converting values passed by the backend. This drastically reduces the amount of coordination required between client teams. For example, user IDs and Twitch channel names are great to use verbatim in all events, and don’t need any translation table or conversion scheme like lower casing or removing special characters. When a creator changes their display name, all clients will seamlessly pass the new name.

Do not reinvent the wheel. Look at the existing data documentation, and ask fellow data analysts and engineers if an existing event fits your tracking needs. For example, if an event already exists for page loads, see if you can use it as-is, or at most add properties to it, but avoid adding a new one. This also highlights the importance of data governance and having a holistic data dictionary.

Send standard fields in all frontend client events. On our web platform for example, every front-end event passes the current page location and the user ID. This makes it easy to split by location on the site, or to join with a user dimension table and filter by country.
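As a minimal sketch of this practice, a single helper can attach the standard fields to every front-end event so that individual call sites cannot forget them. The helper name build_event and the key event_name are assumptions made for illustration; only the page location and user ID come from the text above.

```python
from typing import Any, Dict


def build_event(event_name: str, page_location: str, user_id: str, /,
                **properties: Any) -> Dict[str, Any]:
    """Return an event dict that always carries the standard fields.

    The first three arguments are positional-only, so a caller-supplied
    property named like a standard field ends up in `properties` and is
    rejected below instead of silently overwriting the standard value.
    """
    event: Dict[str, Any] = {
        "event_name": event_name,        # hypothetical key name
        "page_location": page_location,  # standard field from the text
        "user_id": user_id,              # standard field from the text
    }
    clashes = set(properties) & set(event)
    if clashes:
        raise ValueError(f"properties clash with standard fields: {sorted(clashes)}")
    event.update(properties)
    return event


event = build_event("chat_send", "/directory", "12345", message_length=42)
```

Centralizing the standard fields in one place keeps them consistent across events, which is what makes the later split by location or join on user ID straightforward.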

Future-proof and look outside your silo. If you foresee potential use cases for your events in the near future, or other products being able to leverage your events, design with those in mind. Renaming and retrofitting events and fields is painful and time-consuming. For example, if we launch a feature allowing viewers to search for any channel, an event like search_for_channel could be re-used in the future to search for games. The event could be simply called search, with a field search_content_type taking values “channel” or “game” or even “any”.
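The search example could be sketched roughly as follows. The event name search and the field search_content_type with its three values come from the text above; the function name, the query field, and the event_name key are hypothetical.

```python
from typing import Dict

# Values named in the text; new content types can be added later
# without renaming the event or its fields.
SEARCH_CONTENT_TYPES = {"channel", "game", "any"}


def search_event(query: str, search_content_type: str) -> Dict[str, str]:
    """Build one generic `search` event instead of one event per content type."""
    if search_content_type not in SEARCH_CONTENT_TYPES:
        raise ValueError(f"unknown search_content_type: {search_content_type!r}")
    return {
        "event_name": "search",
        "query": query,  # hypothetical field, not named in the text
        "search_content_type": search_content_type,
    }


channel_search = search_event("speedrun", "channel")
game_search = search_event("speedrun", "game")
```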

Descriptive and unambiguous names. Descriptive and concise event names are really worth spending time thinking about. For example, at Twitch, content is a vague and ambiguous field name. It could relate to the game being watched, the video bitrate, or the email subject of a marketing campaign. This again highlights the importance of data governance.

Use snake_case, not CamelCase, and avoid dashes. SQL treats unquoted identifiers as case-insensitive, and dashes in table names must be escaped with double quotes.

Prefix event names with the product domain. Twitch frontends and backends fire hundreds of unique events for dozens of teams. Using the same prefix for events concerning the same product makes it easy to find related events when a data catalog is sorted alphabetically. 

Bad event name | Better event name
video-play | playback_start
message | chat_send
tryCreatingClip | clip_create_attempt
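Naming rules like these are easy to check automatically, for example in a code review hook. The sketch below is one possible checker, not a description of Twitch tooling; the prefix list KNOWN_DOMAINS is a hypothetical stand-in for whatever a real data catalog defines.

```python
import re
from typing import List

# Lowercase snake_case: alphanumeric words joined by single underscores.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")

# Hypothetical product-domain prefixes, taken from the table's better names.
KNOWN_DOMAINS = ("playback", "chat", "clip")


def check_event_name(name: str) -> List[str]:
    """Return a list of naming problems; an empty list means the name passes."""
    problems = []
    if not SNAKE_CASE.match(name):
        problems.append("not snake_case")
    if not any(name.startswith(domain + "_") for domain in KNOWN_DOMAINS):
        problems.append("missing product-domain prefix")
    return problems


for candidate in ["video-play", "message", "tryCreatingClip", "playback_start"]:
    print(candidate, check_event_name(candidate))
```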

Click Through Rate

CTRs are probably the most common type of metrics. At Twitch for example, we compute CTRs for carousel recommendations, signups, and notifications. Although these CTRs rely on different events and cover different product areas, their formulas all consist of a numerator and a denominator.
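To make the numerator and denominator concrete, here is a minimal sketch that computes a CTR from a list of events. The event names carousel_impression and carousel_click and the event_name key are hypothetical; the text does not name the events involved.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def click_through_rate(events: Iterable[Dict[str, str]],
                       impression_name: str,
                       click_name: str) -> Optional[float]:
    """Return clicks / impressions, or None when there are no impressions."""
    counts = Counter(event["event_name"] for event in events)
    impressions = counts[impression_name]  # denominator
    if impressions == 0:
        return None  # CTR is undefined without impressions
    return counts[click_name] / impressions  # numerator / denominator


events = (
    [{"event_name": "carousel_impression"}] * 4
    + [{"event_name": "carousel_click"}]
)
ctr = click_through_rate(events, "carousel_impression", "carousel_click")
```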

Best practices

At Twitch, such events often include fields like carousel_rank, position_in_carousel, and recommender_id. LinkedIn also passes the page location.

Example of URL parameters: Netflix on web. 

After clicking on the 3rd item of the 5th carousel, the URL to the movie page is this:

https://www.netflix.com/watch/80223779?trackId=12345678&tctx=5,3,d681a17d-c5bc-4830-84d6-f0c1e78a6d1e-166054377

Parameter tctx contains the carousel number (5), the position in the carousel (3), and a UUID of the previous page load. trackId might be the user_id.
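For illustration, the URL above can be unpacked with Python's standard urllib.parse module. The meaning assigned to each part follows the reading given in this post, which is a guess about Netflix's parameters rather than documented behavior; the output key names are hypothetical.

```python
from typing import Dict, Union
from urllib.parse import parse_qs, urlparse


def parse_tctx(url: str) -> Dict[str, Union[int, str]]:
    """Split the tctx query parameter into its three comma-separated parts."""
    query = parse_qs(urlparse(url).query)
    # maxsplit=2 keeps the trailing identifier whole.
    carousel, position, page_load_id = query["tctx"][0].split(",", 2)
    return {
        "carousel_number": int(carousel),
        "position_in_carousel": int(position),
        "previous_page_load_id": page_load_id,
        "track_id": query["trackId"][0],
    }


url = ("https://www.netflix.com/watch/80223779?trackId=12345678"
       "&tctx=5,3,d681a17d-c5bc-4830-84d6-f0c1e78a6d1e-166054377")
fields = parse_tctx(url)
```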

Funnels, flows, and lifecycles

In a way, funnel tracking is a generalization of CTR tracking. Conceptually, these are a series of steps that need to be tied together. For example, an advertising funnel could rely on events for opportunity, request, impression, and click, all tied with the same UUID. The flowchart below details how these events could fit together.
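As a rough sketch of how such events can be tied together at analysis time, the function below groups events by their shared UUID and counts how many funnel instances reached each step. The step names come from the advertising example above; the field names funnel_id and event_name are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set

FUNNEL_STEPS = ["opportunity", "request", "impression", "click"]


def funnel_counts(events: Iterable[Dict[str, str]]) -> Dict[str, int]:
    """Count how many funnel instances reached each step, in order."""
    steps_by_id: Dict[str, Set[str]] = defaultdict(set)
    for event in events:
        steps_by_id[event["funnel_id"]].add(event["event_name"])

    counts = {step: 0 for step in FUNNEL_STEPS}
    for steps in steps_by_id.values():
        # A step only counts if every earlier step was seen for the same ID.
        for step in FUNNEL_STEPS:
            if step not in steps:
                break
            counts[step] += 1
    return counts


events = [
    {"funnel_id": "a", "event_name": "opportunity"},
    {"funnel_id": "a", "event_name": "request"},
    {"funnel_id": "a", "event_name": "impression"},
    {"funnel_id": "a", "event_name": "click"},
    {"funnel_id": "b", "event_name": "opportunity"},
    {"funnel_id": "b", "event_name": "request"},
]
counts = funnel_counts(events)
```

Without the shared ID, only aggregate counts per event would be available, and it would be impossible to tell which requests led to which clicks.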

Best practices

Intent vs completion

This is a special case of CTR tracking, where the front-end does not know which ID the backend will assign after the action has completed. For example, when uploading a video, the video ID is generated by the backend after the video has started uploading.
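One common way to handle this situation, shown here only as a sketch and not necessarily the approach used at Twitch, is for the front-end to generate its own identifier at intent time and for the backend to echo it in the completion event next to the newly assigned ID. All event and field names below are hypothetical.

```python
import uuid
from typing import Dict, List, Optional


def upload_intent_event() -> Dict[str, str]:
    """Front-end side: fired before the backend has assigned a video ID."""
    return {
        "event_name": "video_upload_intent",
        "client_request_id": str(uuid.uuid4()),
    }


def upload_complete_event(client_request_id: str, video_id: str) -> Dict[str, str]:
    """Backend side: echoes the client ID alongside the new video ID."""
    return {
        "event_name": "video_upload_complete",
        "client_request_id": client_request_id,
        "video_id": video_id,
    }


def join_intents(intents: List[Dict[str, str]],
                 completions: List[Dict[str, str]]) -> List[Dict[str, Optional[str]]]:
    """Attach the backend video ID to each intent; None if it never completed."""
    video_by_request = {c["client_request_id"]: c["video_id"] for c in completions}
    return [
        {**intent, "video_id": video_by_request.get(intent["client_request_id"])}
        for intent in intents
    ]
```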

Best practices

In-app navigation and third-party referrals

Twitch has clients on multiple platforms, like web, mobile, and console. Navigation events and fields tend to vary slightly.

Best practices

Object lifecycle

This is about the regular lifecycle of complex objects like video collections or user accounts. Using events to track object lifecycle may seem redundant with production databases, but this redundancy can be useful. Moreover, database snapshots only happen at discrete points in time, whereas events enable reconstructing the database at any point in time.
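The point about reconstructing state at any moment can be illustrated by replaying lifecycle events up to a chosen timestamp. This is a minimal sketch under assumed field names (time, object_id, action, fields) and assumed actions (create, update, delete); none of these names come from the text.

```python
from typing import Any, Dict, List


def state_at(events: List[Dict[str, Any]], as_of: float) -> Dict[str, Dict[str, Any]]:
    """Replay lifecycle events up to `as_of` and return the objects alive then."""
    objects: Dict[str, Dict[str, Any]] = {}
    for event in sorted(events, key=lambda e: e["time"]):
        if event["time"] > as_of:
            break  # later events have not happened yet at `as_of`
        object_id = event["object_id"]
        if event["action"] == "create":
            objects[object_id] = dict(event.get("fields", {}))
        elif event["action"] == "update":
            objects.setdefault(object_id, {}).update(event.get("fields", {}))
        elif event["action"] == "delete":
            objects.pop(object_id, None)
    return objects


events = [
    {"time": 1, "object_id": "a", "action": "create", "fields": {"title": "x"}},
    {"time": 2, "object_id": "a", "action": "update", "fields": {"title": "y"}},
    {"time": 3, "object_id": "a", "action": "delete"},
]
snapshot = state_at(events, as_of=2)
```

A database snapshot taken after time 3 would show no trace of the object, whereas the event log can still answer what it looked like at time 1 or 2.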

Best practices

Example:

Long activity

By “long activity” we mean activity that takes place over minutes, hours, or even days.

Best practices

Short activity (~seconds)

Best practices

Parent relationship

This is useful when tracking N-to-1 relationships such as a comment tree.
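As an illustration of the comment tree case, each event can carry the ID of its parent, which is enough to rebuild the tree afterwards. The field names comment_id and parent_id are assumptions, with parent_id set to None for top-level comments; the sketch also assumes the parent links contain no cycles.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def children_by_parent(events: List[Dict[str, Optional[str]]]) -> Dict[Optional[str], List[str]]:
    """Group comment IDs under their parent ID (None holds top-level comments)."""
    children: Dict[Optional[str], List[str]] = defaultdict(list)
    for event in events:
        children[event["parent_id"]].append(event["comment_id"])
    return dict(children)


def root_of(comment_id: str, events: List[Dict[str, Optional[str]]]) -> str:
    """Walk parent links upward until reaching a top-level comment."""
    parent_of = {e["comment_id"]: e["parent_id"] for e in events}
    while parent_of.get(comment_id) is not None:
        comment_id = parent_of[comment_id]
    return comment_id


events = [
    {"comment_id": "c1", "parent_id": None},
    {"comment_id": "c2", "parent_id": "c1"},
    {"comment_id": "c3", "parent_id": "c2"},
]
tree = children_by_parent(events)
```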

Best practices

Collections

Collections can consist of sets, ordered lists, hash maps, and so on. Production databases often track creation, deletion, and other metadata about a collection, via fields like created_by and last_updated_at for example. If it’s possible to use snapshots of production databases to capture the information of interest, then it’s always better to use those. However, databases don’t always record all we need, for example when an item is added to or removed from a collection, and by whom. In these cases, we must use events.

To create and delete collections: see object lifecycle.

To add an item to a collection: fire event mycollection_add_myitem, with fields myitem_id, new_position, mycollection_id, and mycollection_result_list, a JSON array or comma-separated list.

For example: collection a7852cb2 has items 1a7fbcde and 2bc9d6ab. Adding item 3bc7db8c to it, in first position, triggers this event: '3bc7db8c',1,'a7852cb2','3bc7db8c,1a7fbcde,2bc9d6ab'

To remove an item from a collection: fire mycollection_remove_myitem with fields myitem_id, old_pos, mycollection_id, and mycollection_result_list.
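The add and remove events could be built along these lines. This is a sketch under a few assumptions: positions are 1-based (the example above uses 1 for first position), the collection identifier is called mycollection_id in both events, the result list is serialized as a comma-separated string, and the event_name key and function names are hypothetical.

```python
from typing import Dict, List, Union

Event = Dict[str, Union[str, int]]


def add_item_event(collection_id: str, items: List[str],
                   item_id: str, new_position: int) -> Event:
    """Build a mycollection_add_myitem event; positions are 1-based."""
    result = list(items)  # copy so the caller's list is untouched
    result.insert(new_position - 1, item_id)
    return {
        "event_name": "mycollection_add_myitem",
        "myitem_id": item_id,
        "new_position": new_position,
        "mycollection_id": collection_id,
        "mycollection_result_list": ",".join(result),
    }


def remove_item_event(collection_id: str, items: List[str], item_id: str) -> Event:
    """Build a mycollection_remove_myitem event; raises ValueError if absent."""
    result = list(items)
    old_pos = result.index(item_id) + 1  # 1-based position before removal
    result.remove(item_id)
    return {
        "event_name": "mycollection_remove_myitem",
        "myitem_id": item_id,
        "old_pos": old_pos,
        "mycollection_id": collection_id,
        "mycollection_result_list": ",".join(result),
    }


added = add_item_event("a7852cb2", ["1a7fbcde", "2bc9d6ab"], "3bc7db8c", 1)
```

Recording the resulting list in every event means the state of the collection after any change can be read from a single event, without replaying the full history.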

Final thoughts

Instrumenting events in a consistent and reliable way can be challenging. We hope the best practices we shared in this article will be as useful to you as they were to us! And if this kind of work sounds interesting to you, have a look at our data engineer and data analyst open positions.

Thanks to Brian Eng and Nicholas Ngorok for reviewing this article.
