Procella: YouTube's super-system for analytics data storage
Linear Digressions
Mon, July 6, 2020
Podchat Summary
In this podcast episode, the hosts delve into the technical details of Procella, a system built by YouTube to handle four different types of analytics use cases. They discuss the challenges of building a unified system to handle reporting and dashboarding, embedded statistics, monitoring, and ad hoc analysis. The hosts provide insights into the split between storage and compute, the split between real-time and batch data storage, and the architecture of the system. They also highlight the optimizations and performance metrics of Procella and provide a link to the white paper for further reading. Tune in to learn more about this fascinating system and how it is helping YouTube handle its analytics needs.
Original Show Notes
This is a re-release of an episode that originally ran in October 2019. If you’re trying to manage a project that serves up analytics data for a few very distinct uses, you’d be wise to consider having custom solutions for each use case that are optimized for the needs and constraints of that use case. You also wouldn’t be YouTube, which found themselves with this problem (gigantic data needs and several very different use cases for that data) and went a different way: they built one analytics data system to serve them all. Procella, the system they built, is the topic of our episode today: by deconstructing the system, we dig into the four motivating uses of this system, the complexity they had to introduce to service all four uses simultaneously, and the impressive engineering that has to go into building something that “just works.”