Reading Notes for [All Aboard the Databus! – LinkedIn’s Scalable Consistent Change Data Capture Platform]

Why do we need Databus? No single type of data management system meets every need. In most cases we have a primary source-of-truth system plus several other data systems, and we need to keep those other systems consistent with the primary one. There are two possible types of solutions: Application-driven …

Reading Notes for [On Brewing Fresh Espresso: LinkedIn’s Distributed Data Serving Platform]

Why Espresso? RDBMSs have some shortcomings, and they are expensive in both licensing and hardware. A relational database installation requires costly, specialized hardware and extensive caching to meet scale and latency requirements. Adding capacity requires a long planning cycle and cannot be done with 100% uptime. Data models (or schemas) don’t readily map …

Reading Notes for [Kafka, a Distributed Messaging System for Log Processing]

Why Kafka? Lots of “log” data is generated every day, including user activities (logins, page views, clicks, likes, and other queries) and machine metrics (CPU, memory usage). This data is useful not only for offline analytics but also for online services. Uses include search relevance, recommendation performance, ad targeting and reporting, and security. The traditional …