Segment-based rate control of video encoder for live ABR streaming
Contact: Tarek Amara, Video Processing Engineer, amatarek@twitch.tv; Yueshi Shen, Principal Research Engineer, yshen@twitch.tv
Adaptive Bitrate (ABR) streaming is becoming the most popular and successful technology for reliable delivery of live and on-demand video over the public Internet. In ABR streaming, the video content is split into small segments of 2 to 10 seconds in length. Every video segment is encoded at multiple resolutions and bitrates (renditions) and saved as small media files on web servers or a CDN.
On the receiving side, the video player checks the user’s internet bandwidth and decides which rendition is most appropriate to download and play back in order to:
Maximize the video quality by downloading the highest bitrate and resolution renditions
Keep playback continuous and interruption-free by making sure the bitrate of the media file is below the viewer’s internet speed (see Figure 1)
In order to achieve these two goals, the video player needs to know the bitrate of segments of all renditions to decide which one to download every time. Therefore, all segments within a rendition are generally the same size (although the HLS spec permits exceeding the average size by 10%) and the player can be notified of this size when a user tunes in to any video and starts streaming (the master manifest file downloaded by the video player at the beginning of video streaming contains the bitrate information per rendition).
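The selection logic described above can be sketched in a few lines of Python. This is only a minimal illustration, not the algorithm of any particular player: the rendition ladder, the `pick_rendition` function, and the `safety` margin are all hypothetical, and real players also weigh buffer level and throughput history.

```python
# Hypothetical rendition ladder, as a player might read it from the master
# manifest: (name, advertised bitrate in bits per second).
RENDITIONS = [
    ("1080p60", 6_000_000),
    ("720p60", 3_500_000),
    ("480p30", 1_500_000),
    ("160p30", 400_000),
]


def pick_rendition(bandwidth_bps, renditions=RENDITIONS, safety=0.8):
    """Return the highest-bitrate rendition that fits the bandwidth budget.

    `safety` is an assumed headroom factor: only that fraction of the
    measured bandwidth is treated as usable. If nothing fits, fall back
    to the lowest-bitrate rendition so playback can still continue.
    """
    budget = bandwidth_bps * safety
    fitting = [r for r in renditions if r[1] <= budget]
    if fitting:
        return max(fitting, key=lambda r: r[1])
    return min(renditions, key=lambda r: r[1])


if __name__ == "__main__":
    for measured in (10_000_000, 5_000_000, 100_000):
        name, bitrate = pick_rendition(measured)
        print(f"{measured} bps measured: play {name} ({bitrate} bps)")
```

The sketch only works if the advertised bitrate is a trustworthy predictor of each segment's actual size, which is why the encoder's rate control matters to the player.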
Encoding the video (and audio) in Constant Bitrate (CBR) mode and sending segments of equal duration is the easiest and most common way to make sure the segment size is the same throughout a video rendition. However, CBR encoding limits the capabilities of the video encoder and doesn’t generate the best video quality. Variable Bitrate (VBR) encoding mode, on the other hand, generates better and more stable quality: it saves bits when they are not needed and achieves a near-constant quality over time. However, VBR mode can generate segments of unpredictable sizes, which can cause buffering issues because the player cannot predict the segment sizes of each rendition when deciding which one to download.
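The arithmetic behind this is simple: a segment's expected size is its bitrate times its duration. The sketch below, with hypothetical function names and made-up segment sizes, computes the size the player expects and flags the segments that overshoot it by more than a given tolerance.

```python
def segment_size_bytes(bitrate_bps, duration_s):
    """Expected size of one segment, in bytes, at a constant bitrate."""
    return bitrate_bps * duration_s / 8


def oversized_segments(sizes_bytes, advertised_bps, duration_s, tolerance=0.10):
    """Return the indices of segments larger than the expected size plus tolerance."""
    limit = segment_size_bytes(advertised_bps, duration_s) * (1 + tolerance)
    return [i for i, size in enumerate(sizes_bytes) if size > limit]


if __name__ == "__main__":
    # A 3 Mbps rendition with 2-second segments: 750,000 bytes per segment.
    print(segment_size_bytes(3_000_000, 2))

    # Illustrative (made-up) segment sizes from an unconstrained VBR encode.
    vbr_sizes = [700_000, 900_000, 600_000]
    print(oversized_segments(vbr_sizes, 3_000_000, 2))
```

In this example the second segment is 20% larger than the player expects, which is exactly the kind of segment that can stall playback on a connection chosen to just fit the advertised bitrate.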
Capped VBR is a mode used for ABR streaming in which the bitrate is variable (VBR) but capped at an upper limit (for example, 10% above the target bitrate). This limits the size of every segment and enables the player to estimate the sizes of upcoming segments. However, capped VBR restricts the benefits of VBR: it caps the bits spent on harder content and so compromises video quality. For ABR streaming to take full advantage of VBR encoding, the video encoder needs to be ABR-aware. This means it must run in full VBR mode within a segment while also guaranteeing that every segment of a rendition has a constant size corresponding to the average target bitrate (see Figure 2).
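One way to picture a segment-aware rate control is as a fixed bit budget per segment that is distributed freely among the frames inside it. The following sketch is only an illustration under stated assumptions, not Twitch's method or x264's rate control: the `allocate_segment_bits` function is hypothetical, and the per-frame complexity scores are assumed to come from some prior analysis of the segment.

```python
def allocate_segment_bits(complexities, target_bps, segment_duration_s):
    """Split a fixed segment budget among frames in proportion to complexity.

    `complexities` holds one non-negative score per frame (assumed input).
    The per-frame allocation varies, as in VBR, but the total always equals
    target_bps * segment_duration_s, so every segment has the same size.
    """
    if not complexities:
        return []
    budget = target_bps * segment_duration_s
    total = sum(complexities)
    if total <= 0:
        # No complexity information: fall back to an even split.
        return [budget / len(complexities)] * len(complexities)
    return [budget * c / total for c in complexities]


if __name__ == "__main__":
    # Three frames in a 2-second segment at 1 Mbps; the last frame is
    # assumed to be twice as complex as the others.
    bits = allocate_segment_bits([1, 1, 2], 1_000_000, 2)
    print(bits, sum(bits))
```

A practical encoder would also have to correct for the difference between allocated and actually produced bits as the segment progresses; that feedback loop is omitted here.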
H.264/AVC is by far the most commonly used video encoding standard for OTT delivery. Most available encoders were designed for the broadcast market, where CBR or Statmux modes are in demand. The same encoders are used for ABR and don’t take advantage of VBR encoding for OTT. The video team at Twitch is constantly working on optimizing its encoding capabilities to keep delivering better video quality and a better viewing experience for our end users.
To get researchers and academics involved in our research and development, Twitch is proposing an open-to-all research project as part of the ICME 2017 Grand Challenge. The project encourages researchers and developers to come up with rate-control algorithms that are well suited to live ABR streaming and that run in VBR mode, i.e., that remove the constraints related to the encoding buffer and generate fixed or capped segment sizes while achieving the best video quality. We recommend basing the development on the open-source video encoder x264, but any H.264/HEVC encoder code base is acceptable.
The detailed description and the requirements of the project can be found at ICME 2017 Grand Challenges. You can also read Twitch’s project proposal here.