Apache Kafka is an open source platform that meets the demands of today's technological landscape, enabling event stream processing, real-time data pipelines, and data integration at scale.
Where did it come from?
Kafka was originally created at LinkedIn to process real-time data feeds, all the way back in 2011. Since then, Kafka has developed from a message queue into a complete event streaming platform that is able to handle over 1 million messages per second, or trillions of messages per day in the largest deployments! 🤯
Why use Kafka?
As of this writing, Kafka has been adopted by over 80% of Fortune 100 companies, spanning a multitude of industries. It is no mere coincidence that Kafka is the technology of choice for distributed streaming systems among many architects and developers. Let’s go through what Kafka provides:
- Supercharged throughput ⚡️
  - Can handle high-velocity data and millions of messages per second!
- Scalability
  - A Kafka cluster can reach up to a thousand brokers
  - Ability to scale capacity up and down for both storage and processing
- Low latency
  - Message delivery with latency as low as 2ms whilst handling high volumes of messages
- Permanent storage
  - Stream storage is included, and the data is kept in a secure, durable, reliable and fault-tolerant cluster
- High availability
  - Clusters can be extended across availability zones and geographic regions, with little risk of data loss
How does Kafka work?
Apache Kafka is a combination of a storage layer and a compute layer. Together they support real-time data ingestion, streaming data pipelines and storage across distributed systems. This enables the processing of real-time data and gives Kafka the interoperability it needs to be implemented in different infrastructures.
“The core module and Kafka Streams Scala wrapper are currently in Scala, Clients (as of ~0.9), Kafka Connect and main Streams API are in Java.”
Stack Overflow – Jin Lee
Real-time processing at scale
The Kafka Streams API is a lightweight library for on-the-fly processing. It lets you aggregate data, define windows, join data within a stream, and more.
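To make windowing a little more concrete, here is a minimal sketch in plain Python. It is not the Kafka Streams API (which is a Java library); the function name `tumbling_window_counts` and the sample page-view events are assumptions made purely for illustration. It groups events into fixed, non-overlapping time windows and counts them per key.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_ms):
    """Count events per key in fixed, non-overlapping time windows.

    `events` is an iterable of (timestamp_ms, key) pairs.
    Returns a dict of {(window_start_ms, key): count}.
    """
    counts = defaultdict(int)
    for timestamp_ms, key in events:
        # Align each event to the start of the window it falls into.
        window_start = timestamp_ms - (timestamp_ms % window_ms)
        counts[(window_start, key)] += 1
    return dict(counts)


# Hypothetical page-view events: (timestamp in ms, page name).
page_views = [
    (1_000, "home"),
    (2_500, "home"),
    (4_000, "pricing"),
    (61_000, "home"),
]

result = tumbling_window_counts(page_views, window_ms=60_000)
```

With one-minute windows, the first three events land in the window starting at 0 and the last one in the window starting at 60,000 ms. A real stream processor would also need to deal with late and out-of-order events, which this sketch ignores.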
Storage that is durable and persistent
Kafka provides a commit log of all the messages, which can be useful as a “source of truth” when building data-intensive applications. This data can also be distributed across multiple nodes to create high availability within a single data centre or across availability zones.
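The idea of a replayable, replicated commit log can be sketched in a few lines of plain Python. This is a deliberately simplified model and not Kafka's actual replication protocol: the class names `CommitLog` and `ReplicatedLog`, and the in-memory "nodes", are hypothetical and exist only to show the shape of the idea.

```python
class CommitLog:
    """An append-only log: records get increasing offsets and never change."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the new record

    def read_from(self, offset):
        # Reading never removes records, so the log can be replayed at will.
        return list(self._records[offset:])


class ReplicatedLog:
    """Writes every record to several in-memory copies (hypothetical nodes)."""

    def __init__(self, replica_count):
        self.replicas = [CommitLog() for _ in range(replica_count)]

    def append(self, record):
        offsets = [replica.append(record) for replica in self.replicas]
        return offsets[0]

    def read_from(self, offset, failed=()):
        # Serve the read from the first replica that is not marked as failed.
        for index, replica in enumerate(self.replicas):
            if index not in failed:
                return replica.read_from(offset)
        raise RuntimeError("no replica available")


log = ReplicatedLog(replica_count=3)
log.append("order-created")
log.append("order-paid")
last_offset = log.append("order-shipped")

# Replay the full history even if replicas 0 and 1 are unavailable.
replayed = log.read_from(0, failed={0, 1})
```

Because records are only ever appended, any reader can start from offset 0 and rebuild its state, which is what makes the log usable as a source of truth.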
Publish and Subscribe
Kafka implements the publish and subscribe messaging pattern to decouple the applications that use it from one another. At the centre of this sits Kafka's immutable commit log, which an incredible number of applications can publish to and subscribe from.
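As a rough illustration of this decoupling, the following plain-Python sketch models a topic that publishers append to and consumers read from at their own pace. The `Topic` and `Consumer` classes here are hypothetical stand-ins, not the Kafka client API. The point is that each consumer tracks its own position, so publishers and consumers never need to know about each other.

```python
class Topic:
    """A named, append-only list of messages that many consumers can read."""

    def __init__(self, name):
        self.name = name
        self.messages = []

    def publish(self, message):
        self.messages.append(message)


class Consumer:
    """Reads from a topic and remembers its own position (offset)."""

    def __init__(self, topic):
        self.topic = topic
        self.offset = 0

    def poll(self):
        # Return everything published since this consumer last polled.
        new_messages = self.topic.messages[self.offset:]
        self.offset = len(self.topic.messages)
        return new_messages


orders = Topic("orders")
billing = Consumer(orders)
shipping = Consumer(orders)

orders.publish("order-1")
billing_first = billing.poll()    # billing reads right away

orders.publish("order-2")
billing_second = billing.poll()   # billing sees only the new message
shipping_all = shipping.poll()    # shipping catches up on both later
```

Here the slow `shipping` consumer still receives every message, because polling advances only its own offset and never removes anything from the topic.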
What is Kafka used for?
Kafka is typically used to build real-time streaming data pipelines and real-time streaming applications.
Data pipelines
A data pipeline ingests data from one or more sources into Kafka, and then moves it from Kafka to one or more targets.
Stream processing
Stream processing can include:
- Filters
- Joins
- Maps
- Aggregations
- And many more
Enterprises use these stream processes to relieve other areas of their infrastructure that previously had to handle the same work in bulk through some type of batch processing, which is generally slower.
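The operations listed above can be sketched with a plain Python generator that handles one event at a time instead of waiting for a batch. The payment events, the `customers` lookup table and the `process` function are all hypothetical and only illustrate the idea; Kafka Streams itself exposes these operations through its own Java API.

```python
from collections import Counter

# Hypothetical stream of payment events and a lookup table of customers.
payments = [
    {"customer_id": 1, "amount_pence": 1250},
    {"customer_id": 2, "amount_pence": 90},
    {"customer_id": 1, "amount_pence": 4000},
    {"customer_id": 3, "amount_pence": 700},
]
customers = {1: "UK", 2: "DE", 3: "FR"}


def process(stream, table, min_amount_pence):
    totals = Counter()
    for event in stream:
        # Filter: drop small payments.
        if event["amount_pence"] < min_amount_pence:
            continue
        # Join: enrich the event with the customer's country from the table.
        country = table[event["customer_id"]]
        # Map: reshape the event into a (country, amount) pair.
        pair = (country, event["amount_pence"])
        # Aggregate: keep a running total per country.
        totals[pair[0]] += pair[1]
        # Emit the running totals after every event, as a stream would.
        yield dict(totals)


updates = list(process(payments, customers, min_amount_pence=100))
```

Each surviving event produces an updated result immediately, which is the main contrast with a batch job that would only report totals after reading the whole input.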
Streaming analytics
Kafka can be combined with technologies like Druid to make a significant streaming analytics manager (SAM).
Druid consumes data from Kafka after it has been buffered in the Kafka brokers, and then serves analytical queries on that data.
Streaming ETL (Extract, Transform, Load)
Kafka supports streaming ETL by combining several features: Kafka Connect source and sink connectors, which consume data from and produce data to other databases, applications and APIs; Single Message Transforms (SMTs), an optional Kafka Connect feature; and Kafka Streams, for continuous data processing in real time at scale.
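The three ETL stages can be pictured with a small plain-Python sketch. None of this is the Kafka Connect API: `source_connector`, `mask_email` and `sink_connector` are hypothetical functions standing in for a source connector, a per-message transform in the spirit of an SMT, and a sink connector writing to a target.

```python
import json


def source_connector(raw_lines):
    """Extract: turn raw JSON lines from a hypothetical source into records."""
    for line in raw_lines:
        yield json.loads(line)


def mask_email(record):
    """Transform: a per-message change, similar in spirit to an SMT."""
    masked = dict(record)  # copy so the incoming record is left untouched
    name, _, domain = masked["email"].partition("@")
    masked["email"] = name[0] + "***@" + domain
    return masked


def sink_connector(records, target):
    """Load: write each record into a hypothetical target (a list here)."""
    for record in records:
        target.append(record)


raw = [
    '{"id": 1, "email": "alice@example.com"}',
    '{"id": 2, "email": "bob@example.com"}',
]
warehouse = []
sink_connector((mask_email(r) for r in source_connector(raw)), warehouse)
```

Because every stage is a generator or consumes one, each record flows through extract, transform and load individually rather than as a batch. In Kafka Connect itself, this wiring is typically declared in connector configuration rather than written as code.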
Event-Driven Microservices
Kafka addresses scalability, efficiency, and speed, which are some of the key challenges microservices have to overcome because they are naturally highly networked and chatty. This makes it a well-fitted technology for a microservices architecture. It also supports inter-service communication while preserving ultra-low latency and fault tolerance, which again is attractive to a microservice architecture.
The developing future of Apache Kafka
Because Apache Kafka is so widely adopted, a market has grown up around it. New platforms are being created that have Kafka baked in and are sold to other technology companies that require more governance, or that need to get Kafka connected to their existing infrastructure quickly by using connectors that have already been developed out of the box. Some example platforms include:
Summary
Apache Kafka is undoubtedly an important technology for architects and developers to take seriously if their organisations are moving towards greater scale, higher event streaming throughput, or lower latency. It is also worth considering some of the newly available off-the-shelf products that help organisations jump the integration gap more quickly, or that provide a more standardised approach that can be reused and governed in line with regulatory requirements if necessary. Businesses experiencing scaling demands need to adopt this quickly, without reinventing the wheel!
📚 Further Reading & Related Topics
If you’re exploring Kafka and its value in distributed streaming systems, these related articles will provide deeper insights:
• Spring Boot and Messaging Systems: Integrating with RabbitMQ and Kafka – Learn how to integrate Kafka with Spring Boot to build efficient and scalable messaging systems for your applications.
• Exploring the JVM: Bridging Low-Level Interactions with the Constants API – Discover how Kafka leverages Java Virtual Machine (JVM) optimizations for high-throughput and low-latency distributed streaming, and how you can benefit from understanding JVM-level interactions in system architecture.








