How Video Works on the Web
You probably interact with video online every day, but do you really understand how it works? I thought I did, until I was recently implementing video messaging in a web app and realized how little I knew. A simple question, “What format should a video be in so any device can watch it?”, brought me down the rabbit hole that is video on the web. In this article I provide a high-level overview of video topics, specifically highlighting some vocabulary that confused me and priming your brain to go deeper on any of these topics if you want.
We will cover:
- Anatomy of a video: Codecs and Container Formats
- Streaming vs Progressive Downloads
- Transcoding
- Browsers
Anatomy of a video: Codecs and Container Formats
Figure: The layout of a multimedia container.
If a video is an MP4, that means its container format is MP4. The container format decides how the data inside the file is organized; it does not indicate how the actual audio or video data is encoded or compressed. Examples of container formats are WebM, MP4, and Matroska.
History: One of the very first multimedia file formats was the Interchange File Format (IFF), developed by Electronic Arts in 1985. The format’s design was partly inspired by the format Apple’s Macintosh computers used for their clipboard. You can check out the original IFF spec, which is actually a pretty interesting read as far as technical documents go.
There are three things inside the container: metadata, video data, and audio data. Metadata tells us a lot about what is going on in the container. Here is the output of `mediainfo test.mkv` for a video on my computer:
```
General
Complete name            : test.mkv
Format                   : Matroska
Format version           : Version 4
File size                : 792 KiB
Writing application      : Chrome
Writing library          : Chrome
IsTruncated              : Yes
FileExtension_Invalid    : mkv mk3d mka mks

Video
ID                       : 2
Format                   : AVC
Format/Info              : Advanced Video Codec
Codec ID                 : V_MPEG4/ISO/AVC
Width                    : 640 pixels
Height                   : 480 pixels
Display aspect ratio     : 4:3
Frame rate mode          : Variable
Language                 : English
Default                  : Yes
Forced                   : No

Audio
ID                       : 1
Format                   : Opus
Codec ID                 : A_OPUS
Channel(s)               : 1 channel
Channel layout           : C
Sampling rate            : 48.0 kHz
Bit depth                : 32 bits
Compression mode         : Lossy
Delay relative to video  : 59 ms
Language                 : English
Default                  : Yes
Forced                   : No
```
We can see that the container format is Matroska, the video data is in Advanced Video Coding (AVC) format, and the audio data is in Opus format. These video and audio formats are known as codecs. A codec (an amalgam of the words coder and decoder) is the algorithm used to encode and decode the media data. Examples of audio codecs are AAC and Opus. Examples of video codecs are AVC/H.264, HEVC/H.265, and VP9. There are many other codecs out there, but unless you are doing very specific codec work (like trying to improve Netflix’s encoding), you can stick with the widely used and supported ones. I will not attempt to describe the details of any codec here, as that is very far out of my wheelhouse, but the main things to understand are:
- Different container formats can hold different codecs
- Browsers can only play a subset of all codecs and formats (the sketch below shows how to check)
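To make that second point concrete, you can ask the browser directly whether it supports a given container/codec pairing. Here is a minimal sketch using the standard `MediaSource.isTypeSupported` check; the MIME strings are illustrative examples I picked, not an exhaustive list:

```ts
// Each MIME string names a container (video/mp4, video/webm) plus the
// codecs inside it -- the same container can hold different codecs,
// and browser support varies per combination.
const candidates = [
  'video/mp4; codecs="avc1.42E01E, mp4a.40.2"', // MP4 holding H.264 + AAC
  'video/webm; codecs="vp9, opus"',             // WebM holding VP9 + Opus
  'video/mp4; codecs="hvc1"',                   // MP4 holding HEVC/H.265
];

for (const type of candidates) {
  // isTypeSupported returns a plain boolean.
  console.log(type, "->", MediaSource.isTypeSupported(type));
}
```

Run this in a few different browsers and you will likely see different answers for the same strings, which is exactly the problem the rest of this article dances around.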
Streaming vs Progressive Downloads
When we use a simple container format like MP4 and point a video element at it, the browser begins a progressive download: it downloads the video into memory from start to finish. If a user seeks to a different spot in the video, the browser requests that part of the file from the server and continues downloading from there. This method is memory intensive on the client’s machine because the browser attempts to hold the entire video in memory, which makes progressive download unsuitable for longer videos. To avoid buffering the whole video in memory, what you want is a stream.
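In code, a progressive download is just a video element pointed straight at the file; the browser handles all the HTTP work, including Range requests when the user seeks. A minimal sketch, assuming a hypothetical /videos/demo.mp4 URL:

```ts
// Point a <video> element directly at an MP4 file. The browser
// progressively downloads it and issues HTTP Range requests on seek.
const video = document.createElement("video");
video.src = "/videos/demo.mp4"; // hypothetical URL on your server
video.controls = true;
document.body.appendChild(video);
```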
A streaming protocol, like HTTP Live Streaming (HLS), uses the same containers and codecs that you would find in a regular video file, but it chops the data into bite-sized chunks. So instead of a single file, your video is represented as a directory containing a manifest file and the chunks of data. To play a stream, the browser reads the manifest file to find the locations of the chunks, then begins requesting the data. The browser plays the data as soon as it is received and does not keep already-played chunks in memory. Therefore, the memory impact on the client is the same for a long video as for a short one.
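Only Safari plays HLS natively; other browsers need a library such as hls.js, which reads the manifest and feeds the chunks to the video element through Media Source Extensions. A rough sketch, assuming a hypothetical manifest URL:

```ts
import Hls from "hls.js";

const video = document.createElement("video");
const manifestUrl = "/streams/demo/index.m3u8"; // hypothetical manifest URL

if (video.canPlayType("application/vnd.apple.mpegurl")) {
  // Safari supports HLS natively -- just point the element at the manifest.
  video.src = manifestUrl;
} else if (Hls.isSupported()) {
  // Elsewhere, hls.js fetches the manifest and chunks for us.
  const hls = new Hls();
  hls.loadSource(manifestUrl);
  hls.attachMedia(video);
}
document.body.appendChild(video);
```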
An optimization that streaming protocols support is Adaptive Bitrate Streaming (ABR). With ABR, we create multiple versions of those data chunks, each encoded at a different bitrate (lower bitrate means lower quality). The browser requests chunks at the highest bitrate its internet connection can handle without choppy video; if playback does get choppy, it switches to a lower bitrate to smooth things out.
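In HLS, ABR shows up directly in the master manifest, which lists each rendition along with its bitrate so the player knows what it can switch between. A minimal illustrative master playlist (the paths, bitrates, and resolutions are made up for this example):

```
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
360p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2800000,RESOLUTION=1280x720
720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
1080p/playlist.m3u8
```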
Transcoding
The process of taking a video from one format to another is known as transcoding. If you want to convert a user-uploaded WebM video into an adaptive bitrate HLS stream, you will need to transcode. Transcoding works by decoding the video to a raw (uncompressed) format, then encoding it in the desired format. The principle of garbage in, garbage out applies here: you can never transcode to a higher quality than what you started with. The standard tool for transcoding is ffmpeg, which has a huge number of options and can run pretty much anywhere. However, if you have a large number of videos, you may not want to deal with running ffmpeg as a service yourself; a hosted third-party solution can do it for you. AWS offers its MediaConvert service, which hooks in nicely with S3 and CloudFront. There are also companies who do transcoding exclusively, like Zencoder, which could also be a good option.
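As a sketch of what that looks like in practice, here is one way to drive ffmpeg from Node to turn a WebM upload into an HLS stream. The file paths are hypothetical, and the flags shown are standard ffmpeg options but only one of many reasonable configurations:

```ts
import { spawn } from "node:child_process";

// Decode input.webm, re-encode as H.264 + AAC, and emit HLS chunks
// plus a manifest. Assumes ffmpeg is installed and on the PATH, and
// that the output/ directory already exists.
const ffmpeg = spawn("ffmpeg", [
  "-i", "input.webm",           // hypothetical uploaded file
  "-c:v", "libx264",            // video codec: AVC/H.264
  "-c:a", "aac",                // audio codec: AAC
  "-hls_time", "6",             // target chunk length in seconds
  "-hls_playlist_type", "vod",  // finished file, not a live stream
  "-f", "hls",
  "output/index.m3u8",          // manifest; chunk files land beside it
], { stdio: "inherit" });

ffmpeg.on("close", (code) => {
  console.log(code === 0 ? "transcode complete" : `ffmpeg exited with code ${code}`);
});
```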
Browsers
Browsers are picky about which streaming protocols, container formats, and codecs they are willing to work with. This is really where you need to pay attention as a web developer. A useful resource is the Mozilla Developer Network (MDN) media type and format guide, which has information on browser support.
Figure: This looks pretty bad to your users.
Note: Browsers implement the function canPlayType, which takes one parameter, a MIME type string, and returns a string telling you whether the browser can play the video. Due to the diverse nature of container formats and codecs, the browser will only give one of three responses: "" (empty string, meaning the browser can't play the video), "maybe", and "probably".
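A quick sketch of what those three responses look like in practice:

```ts
const video = document.createElement("video");

// Container and codecs fully specified: the browser can commit.
video.canPlayType('video/mp4; codecs="avc1.42E01E, mp4a.40.2"'); // usually "probably"

// Container only: the browser can't be sure until it sees the codecs.
video.canPlayType("video/mp4"); // typically "maybe"

// A format the browser doesn't handle at all.
video.canPlayType("video/x-flv"); // "" in most browsers
```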
To answer the original question, “What format should a video be in so any device can watch it?”, the best answer we can give is an MP4 container with Advanced Audio Coding (AAC) for audio and AVC/H.264 for video. For streaming, HLS is playable in every major browser (natively in Safari, via libraries like hls.js elsewhere). If you want to go deeper into the rabbit hole of formats and codecs, many of their specifications are available online. To learn more about best practices for using video on the web, the article from Google’s Web Fundamentals series is great. Hopefully this article gave you a better understanding of the basics of video on the web.