What Is Apache Kafka?
Event streaming platform
Apache Kafka is a distributed event streaming platform designed to handle large volumes of data in real time.
Core Concepts
- Topic — category for organizing messages
- Partition — ordered, append-only subdivision of a topic; the unit of parallelism
- Producer — message sender
- Consumer — message receiver
- Broker — Kafka server
- Consumer Group — set of consumers that split a topic's partitions among themselves for load balancing
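To make these terms concrete, here is a minimal in-memory sketch of how keys map to partitions and how a consumer group shares them. It is a toy model, not the Kafka client API: `ToyTopic`, `choose_partition`, and `assign_partitions` are hypothetical names, CRC32 stands in for the murmur2 hash that Kafka's default partitioner applies to keyed messages, and the round-robin assignment is only one of several possible strategies.

```python
import zlib


def choose_partition(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash (CRC32 as a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_partitions(num_partitions: int, consumers: list) -> dict:
    """Spread partitions over the members of one consumer group, round-robin."""
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


class ToyTopic:
    """A topic modeled as a list of append-only partition logs."""

    def __init__(self, name: str, num_partitions: int):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def send(self, key: str, value: str):
        """Append a record and return (partition, offset), as a producer would see."""
        partition = choose_partition(key, len(self.partitions))
        self.partitions[partition].append((key, value))
        return partition, len(self.partitions[partition]) - 1


topic = ToyTopic("orders", num_partitions=3)
p1, o1 = topic.send("customer-42", "created")
p2, o2 = topic.send("customer-42", "paid")

# Same key, same partition: per-key ordering is preserved within that partition.
print(p1 == p2, o1, o2)

# Two consumers share three partitions; a third consumer on a two-partition
# topic would receive nothing, because each partition has one owner per group.
print(assign_partitions(3, ["c1", "c2"]))
print(assign_partitions(2, ["c1", "c2", "c3"]))
```

The sketch shows why the partition count caps the useful size of a consumer group: adding consumers beyond the number of partitions leaves the extra members idle.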
Benefits of Kafka
- High throughput (up to millions of messages per second on a suitably sized cluster)
- Horizontal scaling
- Long-term message storage (retention is configurable per topic)
- Delivery guarantees (at-most-once, at-least-once, exactly-once)
- Fault tolerance through replication
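The delivery guarantees are easiest to see in terms of when a consumer commits its offset. The sketch below is a simplified simulation, assuming a consumer that commits only after handling a record; `consume`, `crash_at`, and `apply_once` are hypothetical names, and the log is a plain Python list rather than a real partition. It illustrates at-least-once delivery plus consumer-side deduplication. It does not model Kafka's own exactly-once mechanism, which relies on idempotent producers and transactions.

```python
def consume(log, committed, handler, crash_at=None):
    """Handle records from the committed offset onward; return the new offset.

    The offset is committed only after the handler returns. If crash_at is
    set, the consumer "dies" after handling that offset but before committing.
    """
    for offset in range(committed, len(log)):
        handler(log[offset])
        if offset == crash_at:
            return committed  # the work was done, but the commit never happened
        committed = offset + 1
    return committed


log = [(1, "created"), (2, "paid"), (3, "shipped")]

# Naive handler: every delivery has an effect, so a replay shows up twice.
seen = []
committed = consume(log, 0, seen.append, crash_at=1)  # crash after handling record 2
committed = consume(log, committed, seen.append)      # restart from the last commit
print([record_id for record_id, _ in seen])

# Idempotent handler: remember processed record ids and skip replays.
applied_ids = set()
applied = []


def apply_once(record):
    record_id, payload = record
    if record_id not in applied_ids:
        applied_ids.add(record_id)
        applied.append(payload)


resume_from = consume(log, 0, apply_once, crash_at=1)
resume_from = consume(log, resume_from, apply_once)
print(applied)
```

Committing before handling instead would flip the failure mode: a crash between the commit and the handler would lose the record, which is at-most-once delivery.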
Applications
- Event-Driven Architecture — microservices communication
- Logging — centralized log collection
- Data streaming — real-time analytics
- ETL pipelines — data integration
- IoT — device data processing
Ecosystem
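- Kafka Connect — framework of connectors for moving data between Kafka and external systems such as databases
- Kafka Streams — Java library for stream processing
- ksqlDB — SQL for data streams
- Schema Registry — schema management for Avro, JSON Schema, and Protobuf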
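As a rough illustration of what "stream processing" means for tools like Kafka Streams and ksqlDB, the sketch below keeps a running count per key over an incoming sequence of events and emits an updated result for every event. It is plain Python, not either tool's API; `running_counts` is a hypothetical name, and a list stands in for an unbounded topic.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def running_counts(events: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (key, count so far) for each event as it arrives."""
    counts = Counter()  # the processor's state, updated one event at a time
    for key in events:
        counts[key] += 1
        yield key, counts[key]


updates = list(running_counts(["click", "view", "click"]))
print(updates)
```

The point of the generator is that results are produced incrementally while input keeps arriving, rather than after a batch has been collected.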