Design YouTube
This chapter is about designing a video sharing platform such as youtube. Its solution can be applied to also eg designing Netflix, Hulu.
Step 1 - Understand the problem and establish design scope
- C: What features are important?
- I: Upload video + watch video
- C: What clients do we need to support?
- I: Mobile apps, web apps, smart TV
- C: How many DAUs do we have?
- I: 5mil
- C: Average time per day spend on YouTube?
- I: 30m
- C: Do we need to support international users?
- I: Yes
- C: What video resolutions do we need to support?
- I: Most of them
- C: Is encryption required?
- I: Yes
- C: File size requirement for videos?
- I: Max file size is 1GB
- C: Can we leverage existing cloud infra from Google, Amazon, Microsoft?
- I: Yes, building everything from scratch is not a good idea.
Features, we’ll focus on:
- Upload videos fast
- Smooth video streaming
- Ability to change video quality
- Low infrastructure cost
- High availability, scalability, reliability
- Supported clients - web, mobile, smart TV
Back of the envelope estimation
- Assume product has 5mil DAU
- Users watch 5 videos per day
- 10% of users upload 1 video per day
- Average video size is 300mb
- Daily storage cost needed - 5mil * 10% * 300mb = 150TB
- CDN Cost, assuming 0.02$ per GB - 5mil * 5 videos * 0.3GB * 0.02$ = USD 150k per day
Step 2 - Propose high-level design and get buy-in
As previously discussed, we won’t be building everything from scratch.
Why?
- In a system design interview, choosing the right technology is more important than explaining how the technology works.
- Building scalable blob storage over CDN is complex and costly. Even big tech don’t build everything from scratch. Netflix uses AWS and Facebook uses Akamai’s CDN.
Here’s our system design at a high-level:
- Client - you can watch youtube on web, mobile and TV.
- CDN - videos are stored in CDN.
- API Servers - Everything else, except video streaming goes through the API servers. Feed recommendation, generating video URL, updating metadata db and cache, user signup.
Let’s explore high-level design of video streaming and uploading.
Video uploading flow
- Users watch videos on a supported client
- Load balancer evenly distributes requests across API servers
- All user requests go through API servers, except video streaming
- Metadata DB - sharded and replicated to meet performance and availability requirements
- Metadata cache - for better performance, video metadata and user objects are cached
- A blob storage system is used to store the actual videos
- Transcoding/encoding servers - transform videos to various formats (eg MPEG, HLS, etc) which are suitable for different devices and bandwidth
- Transcoded storage stores result files from transcoding
- Videos are cached in CDN - clicking play streams the video from CDN
- Completion queue - stores events about video transcoding results
- Completion handler - a set of workers which pull event data from completion queue and update metadata cache and database
Let’s now explore the flow of uploading videos and video metadata. Metadata includes info about video URL, size, resolution, format, etc.
Here’s how the video uploading flow works:
- Videos are uploaded to original storage
- Transcoding servers fetch videos from storage and start transcoding
- Once transcoding is complete, two steps are executed in parallel:
- Transcoded videos are sent to transcoded storage and distributed to CDN
- Transcoding completion events are queued in completion queue, workers pick up the events and update metadata database & cache
- API servers inform user that uploading is complete
Here’s how the metadata update flow works:
- While file is being uploaded, user sends a request to update the video metadata - file name, size, format, etc.
- API servers update metadata database & cache
Video streaming flow
Whenever users watch videos on YouTube, they don’t download the whole video at once. Instead, they download a little and start watching it while downloading the rest. This is referred to as streaming. Stream is served from closest CDN server for lowest latency.
Some popular streaming protocols:
- MPEG-DASH - “Moving Picture Experts Group”-“Dynamic Adaptive Streaming over HTTP”
- Apple HLS - “HTTP Live Streaming”
- Microsoft Smooth Streaming
- Adobe HTTP Dynamic Streaming (HDS)
You don’t need to understand those protocols in detail. It is important to understand, though, that different streaming protocols support different video encodings and playback players.
We need to choose the right streaming protocol to support our use-case.
Step 3 - Design deep dive
In this part, we’ll deep dive into the video uploading and video streaming flows.
Video transcoding
Video transcoding is important for a few reasons:
- Raw video consumes a lot of storage space.
- Many browsers have constraints on the type of videos they can support. It is important to encode a video for compatibility reasons.
- To ensure good UX, you ought to serve HD videos to users with good network connection and lower-quality formats for the ones with slower connection.
- Network conditions can change, especially on mobile. It is important to be able to automatically switch video formats at runtime for smooth UX.
Most transcoding formats consist of two parts:
- Container - the basket which contains the video file. Recognized by the file extension, eg .avi, .mov, .mp4
- Codecs - Compression and decompression algorithms, which reduce video size while preserving quality. Most popular ones - H.264, VP9, HEVC.
Directed Acyclic Graph (DAG) model
Transcoding video is computationally expensive and time-consuming. In addition to that, different creators have different inputs - some provide thumbnails, others do not, some upload HD, others don’t.
In order to support video processing pipelines, dev customisations, high parallelism, we adopt a DAG model:
Some of the tasks applied on a video file:
- Ensure video has good quality and is not malformed
- Video is encoded to support different resolutions, codecs, bitrates, etc.
- Thumbnail is automatically added if a user doesn’t specify it.
- Watermark - image overlay on video if specified by creator
Video transcoding architecture
Preprocessor
The preprocessor’s responsibilities:
- Video splitting - video is split in group of pictures (GOP) alignment, ie arranged groups of chunks which can be played independently
- Cache - intermediary steps are stored in persistent storage in order to retry on failure.
- DAG generation - DAG is generated based on config files specified by programmers.
Example DAG configuration with two steps:
DAG Scheduler
DAG scheduler splits a DAG into stages of tasks and puts them in a task queue, managed by a resource manager:
In this example, a video is split into video, audio and metadata stages which are processed in parallel.
Resource manager
Resource manager is responsible for optimizing resource allocation.
- Task queue is a priority queue of tasks to be executed
- Worker queue is a queue of available workers and worker utilization info
- Running queue contains info about currently running tasks and which workers they’re assigned to
How it works:
- task scheduler gets highest-priority task from queue
- task scheduler gets optimal task worker to run the task
- task scheduler instructs worker to start working on the task
- task scheduler binds worker to task & puts task/worker info in running queue
- task scheduler removes the job from the running queue once the job is done
Task workers
The workers execute the tasks in the DAG. Different workers are responsible for different tasks and can be scaled independently.
Temporary storage
Multiple storage systems are used for different types of data. Eg temporary images/video/audio is put in blob storage. Metadata is put in an in-memory cache as data size is small.
Data is freed up once processing is complete.
Encoded video
Final output of the DAG. Example output - funny_720p.mp4
.
System Optimizations
Now it’s time to introduce some optimizations for speed, safety, cost-saving.
Speed optimization - parallelize video uploading
We can split video uploading into separate units via GOP alignment:
This enables fast resumable uploads if something goes wrong. Splitting the video file is done by the client.
Speed optimization - place upload centers close to users
This can be achieved by leveraging CDNs.
Speed optimization - parallelism everywhere
We can build a loosely coupled system and enable high parallelism.
Currently, components rely on inputs from previous components in order to produce outputs:
We can introduce message queues so that components can start doing their task independently of previous one once events are available:
Safety optimization - pre-signed upload URL
To avoid unauthorized users from uploading videos, we introduce pre-signed upload URLs:
How it works:
- client makes request to API server to fetch upload URL
- API servers generate the URL and return it to the client
- Client uploads the video using the URL
Safety optimization - protect your videos
To protect creators from having their original content stolen, we can introduce some safety options:
- Digital right management (DRM) systems - Apple FairPlay, Google Widevine, Microsoft PlayReady
- AES encryption - you can encrypt a video and configure an authorization policy. It is decrypted on playback.
- Visual watermarking - image overlay on top of video which contains your identifying information, eg company name.
Cost-saving optimization
CDN is expensive, as we’ve seen in our back of the envelope estimation.
We can piggyback on the fact that video streams follow a long-tail distribution - ie a few popular videos are accessed frequently, but everything else is not.
Hence, we can store popular videos in CDN and serve everything else from high capacity storage servers:
Other cost-saving optimizations:
- We might not need to store many encoded versions for less popular videos. Short videos can be encoded on-demand.
- Some videos are only popular in certain regions. We can avoid distributing them in all regions.
- Build your own CDN. Can make sense for large streaming companies like Netflix.
Error Handling
For a large-scale system, errors are unavoidable. To make a fault-tolerant system, we need to handle errors gracefully and recover from them.
There are two types of errors:
- Recoverable error - can be mitigated by retrying a few times. If retrying fails, a proper error code is returned to the client.
- Non-recoverable error - system stops running related tasks and returns proper error code to the client.
Other typical errors and their resolution:
- Upload error - retry a few times
- Split video error - entire video is passed to server if older clients don’t support GOP alignment.
- Transcoding error - retry
- Preprocessor error - regenerate DAG
- DAG scheduler error - retry scheduling
- Resource manager queue down - use a replica
- Task worker down - retry task on different worker
- API server down - they’re stateless so requests can be redirected to other servers
- Metadata db/cache server down - replicate data across multiple nodes
- Master is down - Promote one of the slaves to become master
- Slave is down - If slave goes down, you can use another slave for reads and bring up another slave instance
Step 4 - Wrap up
Additional talking points:
- Scaling the API layer - easy to scale horizontally as API layer is stateless
- Scale the database - replication and sharding
- Live streaming - our system is not designed for live streams, but it shares some similarities, eg uploading, encoding, streaming. Notable differences:
- Live streaming has higher latency requirements so it might demand a different streaming protocol
- Lower requirement for parallelism as small chunks of data are already processed in real time
- different error handling, as there is a timeout after which we need to stop retrying
- Video takedowns - videos that violate copyrights, pornography, any other illegal acts need to be removed either during upload flow or based on user flagging.