Mastering Kafka Streams and ksqlDB: Building Real-Time Data Systems by Example, by Mitch Seymour
Build stateless and stateful stream processing applications using Kafka Streams and ksqlDB
Perform advanced stateful operations, including windowed joins and aggregations
Understand how stateful processing works under the hood
Learn about ksqlDB’s data integration features, powered by Kafka Connect
Work with different types of collections in ksqlDB and perform push and pull queries
Deploy your Kafka Streams and ksqlDB applications to production
From the Preface
Who Should Read This Book
This book is for data engineers who want to learn how to build highly scalable stream processing applications for moving, enriching, and transforming large amounts of data in real time. These skills are often needed to support business intelligence initiatives, analytic pipelines, threat detection, event processing, and more. Data scientists and analysts who want to upgrade their skills by analyzing real-time data streams, an exciting departure from the batch processing space that has typically dominated these fields, will also find value in this book. Prior experience with Apache Kafka is not required, though some familiarity with the Java programming language will make the Kafka Streams tutorials easier to follow.
For data engineers and data scientists, there’s never a shortage of technologies that are competing for our attention. Whether we’re perusing our favorite subreddits, scanning Hacker News, reading tech blogs, or weaving through hundreds of tables at a tech conference, there are so many things to look at that it can start to feel overwhelming.
But if we can find a quiet corner to just think for a minute, and let all of the buzz fade into the background, we can start to distinguish patterns from the noise. You see, we live in the age of explosive data growth, and many of these technologies were created to help us store and process data at scale. We’re told that these are modern solutions for modern problems, and we sit around discussing “big data” as if the idea is avant-garde, when really the focus on data volume is only half the story.
Technologies that solve only the data volume problem tend to rely on batch-oriented techniques for processing data. This involves running a job on a pile of data that has accumulated over a period of time. In some ways, this is like trying to drink the ocean all at once. With modern computing power and paradigms, some technologies actually manage to achieve this, though usually at the cost of high latency.
Instead, there’s another property of modern data that we focus on in this book: data moves over networks in steady and never-ending streams. The technologies we cover in this book, Kafka Streams and ksqlDB, are specifically designed to process these continuous data streams in real time, and provide huge competitive advantages over the ocean-drinking variety. After all, many business problems are time-sensitive, and if you need to enrich, transform, or react to data as soon as it comes in, then Kafka Streams and ksqlDB will help get you there with ease and efficiency.
Learning Kafka Streams and ksqlDB is also a great way to familiarize yourself with the larger concepts involved in stream processing. These concepts include modeling data in different ways (streams and tables), applying stateless transformations of data, using local state for more advanced operations (joins, aggregations), understanding the different time semantics and methods for grouping data into time buckets/windows, and more. In other words, your knowledge of Kafka Streams and ksqlDB will help you distinguish and evaluate the stream processing solutions that exist today and those that may emerge in the future.
I’m excited to share these technologies with you because they have both made an impact on my own career and helped me accomplish technological feats that I thought were beyond my own capabilities. In fact, by the time you finish reading this sentence, one of my Kafka Streams applications will have processed nine million events. The feeling you’ll get by providing real business value without having to invest exorbitant amounts of time in the solution will keep you working with these technologies for years to come, and the succinct and expressive language constructs make the process feel more like an art form than a labor. And just like any other art form, whether it be a life-changing song or a beautiful painting, it’s human nature to want to share it. So consider this book a mixtape from me to you, with my favorite compilations from the stream processing space available for your enjoyment: Kafka Streams and ksqlDB, Volume 1.
Book information | |
Author | Mitch Seymour |
Cover | Paperback |
Language | English |
Year of publication | 2021 |
Paper | Offset |
Pages | 432 |
Subject | Programming languages and systems |