
Kafka Topics, Partitions and Consumer Groups

Author: Venkata Sudhakar

A Kafka topic is a named category or feed to which records are published. Topics are multi-subscriber: a topic can have zero, one, or many consumers subscribed to the data written to it. Each topic is divided into partitions, the fundamental unit of parallelism in Kafka. A partition is an ordered, immutable sequence of records that is continually appended to, and each record in it is identified by a sequential offset. Ordering is guaranteed only within a partition, not across the whole topic. The partition count caps consumer parallelism within a single consumer group, and it also determines how widely producer writes can be spread across brokers.
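
The relationship between keys, partitions, and offsets can be sketched in a few lines of Python. This is a toy in-memory model, not a Kafka client: the partition count of 3, the `choose_partition` and `publish` helpers, and the CRC32 hash are all assumptions made for illustration (Kafka's default partitioner uses a murmur2 hash). The property it shows is the real one: records with the same key land in the same partition and receive increasing offsets there.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # assumed partition count for this illustration


def choose_partition(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a record key to a partition number.

    Stand-in hash: Kafka's default partitioner uses murmur2, not CRC32.
    The property illustrated (same key -> same partition) is the same.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Each partition is modelled as an append-only list; the list index is the offset.
partitions = defaultdict(list)


def publish(key: str, value: str) -> tuple[int, int]:
    """Append a record to its partition and return (partition, offset)."""
    p = choose_partition(key)
    partitions[p].append((key, value))
    return p, len(partitions[p]) - 1


if __name__ == "__main__":
    events = [("order-1", "created"), ("order-2", "created"),
              ("order-1", "paid"), ("order-1", "shipped")]
    for key, value in events:
        p, offset = publish(key, value)
        print(f"key={key} value={value} -> partition {p}, offset {offset}")
```

All three `order-1` events end up in one partition in the order they were published, which is why per-key ordering holds even though the topic as a whole has no global order.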

Replication makes Kafka fault-tolerant. Each partition can be replicated across multiple Kafka brokers. For each partition, one broker is the leader and, by default, handles all reads and writes for it. The other replicas are followers that passively replicate the leader's log. If the leader's broker fails, one of the in-sync followers is automatically promoted to leader. A replication factor of 3 means each partition exists on 3 brokers, so committed data can survive up to 2 broker failures. Whether producers can keep writing during those failures depends on the acks and min.insync.replicas settings. The replication factor cannot exceed the number of brokers in the cluster; topic creation fails if it does.
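
The arithmetic behind these guarantees can be written down as a short sketch. The function names below are hypothetical, and the failover logic is deliberately simplified: it assumes every replica was in sync before the failure and that the first surviving in-sync replica in assignment order is promoted.

```python
def check_replication_factor(replication_factor: int, broker_count: int) -> None:
    """Reject a replication factor larger than the number of brokers."""
    if replication_factor > broker_count:
        raise ValueError(
            f"replication factor {replication_factor} exceeds {broker_count} brokers"
        )


def tolerated_failures(replication_factor: int, min_insync_replicas: int = 1) -> dict:
    """Return how many broker failures one partition can absorb.

    data_survives: failures after which at least one replica still holds the data.
    acks_all_writes_continue: failures after which producers using acks=all can
    still write, which requires min.insync.replicas replicas to remain in sync.
    """
    if not 1 <= min_insync_replicas <= replication_factor:
        raise ValueError("need 1 <= min_insync_replicas <= replication_factor")
    return {
        "data_survives": replication_factor - 1,
        "acks_all_writes_continue": replication_factor - min_insync_replicas,
    }


def elect_leader(replicas: list[int], isr: list[int], failed_broker: int):
    """Simplified failover: drop the failed broker, promote the first survivor."""
    surviving = [b for b in replicas if b in isr and b != failed_broker]
    if not surviving:
        raise RuntimeError("no in-sync replica left to promote")
    return surviving[0], surviving


if __name__ == "__main__":
    check_replication_factor(3, broker_count=3)
    print(tolerated_failures(3))
    # {'data_survives': 2, 'acks_all_writes_continue': 2}
    print(tolerated_failures(3, min_insync_replicas=2))
    # {'data_survives': 2, 'acks_all_writes_continue': 1}
    print(elect_leader([1, 2, 3], [1, 2, 3], failed_broker=1))
    # (2, [2, 3])
```

With min.insync.replicas set to 2, a replication factor of 3 still protects data against two failures, but acks=all producers are blocked after the second one.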

The example below shows how to create, describe, and manage Kafka topics using the kafka-topics CLI tool, including setting the partition count and replication factor.
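
A minimal sketch of the corresponding commands, assembled in Python so they can be printed and reviewed before anything is run. The broker address `localhost:9092` and the script name `kafka-topics.sh` being on the PATH are assumptions (some distributions install the tool as `kafka-topics`); the flags themselves are standard kafka-topics options.

```python
import shlex

BOOTSTRAP = "localhost:9092"  # assumed broker address
TOPIC = "order-events"


def kafka_topics(*args: str) -> list[str]:
    """Build the argument list for one kafka-topics.sh invocation."""
    return ["kafka-topics.sh", "--bootstrap-server", BOOTSTRAP, *args]


COMMANDS = [
    # Create a topic with 3 partitions, each replicated to 3 brokers.
    kafka_topics("--create", "--topic", TOPIC,
                 "--partitions", "3", "--replication-factor", "3"),
    # Show the leader, replicas, and ISR of every partition of the topic.
    kafka_topics("--describe", "--topic", TOPIC),
    # List all topics in the cluster.
    kafka_topics("--list"),
]

if __name__ == "__main__":
    for argv in COMMANDS:
        # Dry run: only print the command line. To execute it against a real
        # cluster, pass argv to subprocess.run(argv, check=True) instead.
        print(shlex.join(argv))
```

Building each command as a list rather than a single string avoids shell quoting problems if a topic name ever contains unusual characters.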


The kafka-topics commands for creating, describing, and listing the topic give output like the following:

Created topic order-events.

Topic: order-events  PartitionCount: 3  ReplicationFactor: 3
Topic: order-events  Partition: 0  Leader: 1  Replicas: 1,2,3  Isr: 1,2,3
Topic: order-events  Partition: 1  Leader: 2  Replicas: 2,3,1  Isr: 2,3,1
Topic: order-events  Partition: 2  Leader: 3  Replicas: 3,1,2  Isr: 3,1,2

# Isr = In-Sync Replicas (the replicas, leader included, that are fully caught up with the leader's log)

order-events
payment-events
user-events
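
The Replicas and Isr columns of the describe output are worth comparing: a partition whose ISR is smaller than its replica list is under-replicated. The sketch below parses the per-partition lines shown above and reports such partitions. The regular expression and helper names are assumptions written against this sample text, not an official output format contract.

```python
import re

DESCRIBE_OUTPUT = """\
Topic: order-events  Partition: 0  Leader: 1  Replicas: 1,2,3  Isr: 1,2,3
Topic: order-events  Partition: 1  Leader: 2  Replicas: 2,3,1  Isr: 2,3,1
Topic: order-events  Partition: 2  Leader: 3  Replicas: 3,1,2  Isr: 3,1,2
"""

# Matches one per-partition line; the topic summary line does not match.
LINE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>-?\d+)\s+Replicas:\s*(?P<replicas>[\d,]+)\s+"
    r"Isr:\s*(?P<isr>[\d,]*)"
)


def _broker_ids(text: str) -> list[int]:
    """Turn '1,2,3' into [1, 2, 3]; an empty string gives an empty list."""
    return [int(part) for part in text.split(",") if part]


def parse_describe(text: str) -> list[dict]:
    """Extract topic, partition, leader, replicas, and ISR from each line."""
    rows = []
    for line in text.splitlines():
        match = LINE.search(line)
        if match is None:
            continue  # skip blank lines and the topic summary line
        rows.append({
            "topic": match["topic"],
            "partition": int(match["partition"]),
            "leader": int(match["leader"]),
            "replicas": _broker_ids(match["replicas"]),
            "isr": _broker_ids(match["isr"]),
        })
    return rows


def under_replicated(rows: list[dict]) -> list[int]:
    """Return partition numbers whose ISR is missing at least one replica."""
    return [r["partition"] for r in rows if set(r["isr"]) != set(r["replicas"])]


if __name__ == "__main__":
    rows = parse_describe(DESCRIBE_OUTPUT)
    print(under_replicated(rows))  # [] because every replica is in sync
```

For a quick check on a live cluster, kafka-topics also accepts `--under-replicated-partitions` together with `--describe`, which filters the output on the broker side.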

Consumer groups are how Kafka enables parallel, scalable consumption. All consumers that share the same group.id form one logical subscriber. Kafka assigns each partition to exactly one consumer in the group, so no partition is consumed by two consumers in the same group at the same time, while a single consumer may own several partitions. The example below demonstrates these mechanics by running two consumer instances in the same group.
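
The assignment rule can be illustrated with a small simulation. It is modelled loosely on Kafka's range assignment strategy, where members are sorted and each receives a contiguous block of partitions; the `assign_partitions` function and the consumer names are hypothetical, and a real group coordinator also handles rebalances, heartbeats, and multiple topics.

```python
def assign_partitions(partitions: list[int], consumers: list[str]) -> dict[str, list[int]]:
    """Give each consumer a contiguous block of partitions.

    Every partition goes to exactly one consumer. When the split is uneven,
    the consumers that sort first receive one extra partition each.
    """
    members = sorted(consumers)
    assignment = {member: [] for member in members}
    if not members:
        return assignment
    ordered = sorted(partitions)
    base, extra = divmod(len(ordered), len(members))
    start = 0
    for index, member in enumerate(members):
        count = base + (1 if index < extra else 0)
        assignment[member] = ordered[start:start + count]
        start += count
    return assignment


if __name__ == "__main__":
    # One consumer in the group owns every partition.
    print(assign_partitions([0, 1, 2], ["consumer-1"]))
    # {'consumer-1': [0, 1, 2]}

    # A second consumer joins: the partitions are split between the two.
    print(assign_partitions([0, 1, 2], ["consumer-1", "consumer-2"]))
    # {'consumer-1': [0, 1], 'consumer-2': [2]}
```

A second group with a different group.id would get its own independent assignment over the same partitions, which is how several applications can each read the full topic.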


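Describing the group with the kafka-consumer-groups tool, and then resetting its offsets to the earliest position, gives output like the following: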

GROUP                    TOPIC         PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID
order-processing-group   order-events  0          1250            1252            2    consumer-1
order-processing-group   order-events  1          980             980             0    consumer-1
order-processing-group   order-events  2          1100            1103            3    consumer-2

# Consumer-1 is assigned partitions 0 and 1
# Consumer-2 is assigned partition 2
# LAG = messages not yet consumed (log-end - current-offset)

# After reset to earliest:
new offset: 0
new offset: 0
new offset: 0
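
The LAG column and the effect of the reset can be reproduced from the numbers in the table above. This sketch hard-codes those rows; the helper names are hypothetical, and it assumes the earliest available offset is 0, as the reset output shows (on a topic where retention has already deleted old records, the earliest offset would be higher).

```python
from collections import defaultdict

# (partition, current_offset, log_end_offset, consumer_id), copied from the table.
ROWS = [
    (0, 1250, 1252, "consumer-1"),
    (1, 980, 980, "consumer-1"),
    (2, 1100, 1103, "consumer-2"),
]


def lag(current_offset: int, log_end_offset: int) -> int:
    """Records written to the partition but not yet consumed by the group."""
    return max(log_end_offset - current_offset, 0)


def lag_by_consumer(rows) -> dict[str, int]:
    """Sum the lag of all partitions owned by each consumer."""
    totals = defaultdict(int)
    for _partition, current, log_end, consumer in rows:
        totals[consumer] += lag(current, log_end)
    return dict(totals)


def reset_to(rows, new_offset: int):
    """Model an offset reset: every partition's committed offset is replaced."""
    return [(p, new_offset, log_end, c) for p, _current, log_end, c in rows]


if __name__ == "__main__":
    print(lag_by_consumer(ROWS))
    # {'consumer-1': 2, 'consumer-2': 3}
    print(lag_by_consumer(reset_to(ROWS, 0)))
    # {'consumer-1': 2232, 'consumer-2': 1103}
```

After the reset, the group's lag equals the full length of each partition, so the consumers will reprocess every record still held in the topic. Note that kafka-consumer-groups only resets offsets for a group with no active members.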

How to choose the right number of partitions:

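Within a consumer group, the number of partitions is the maximum degree of consumer parallelism. If a topic has 3 partitions and you add a 4th consumer to the group, that consumer sits idle because every partition is already assigned. A good rule of thumb is to set the partition count to the maximum number of consumers you expect to need for that topic. Leave some headroom: the partition count can be increased later but never decreased, and increasing it changes which partition a given key maps to. More partitions increase potential throughput, but they also add metadata and replication overhead on the brokers and can increase end-to-end latency. For many workloads, 3 to 12 partitions per topic is a reasonable starting range; measure your own throughput before settling on a number.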
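
The rule of thumb above can be turned into a rough sizing calculation. The throughput figures in this sketch are invented for illustration and the function names are hypothetical; the per-consumer rate in particular should come from measuring your own application.

```python
import math


def consumers_needed(peak_msgs_per_sec: float, per_consumer_msgs_per_sec: float) -> int:
    """Consumers required to keep up with the peak rate (at least one)."""
    if per_consumer_msgs_per_sec <= 0:
        raise ValueError("per-consumer throughput must be positive")
    return max(1, math.ceil(peak_msgs_per_sec / per_consumer_msgs_per_sec))


def idle_consumers(num_partitions: int, num_consumers: int) -> int:
    """Consumers in one group that receive no partition at all."""
    return max(0, num_consumers - num_partitions)


if __name__ == "__main__":
    # Assumed workload: 5000 msg/s at peak, one consumer handles about 600 msg/s.
    needed = consumers_needed(5000, 600)
    print(needed)                 # 9, so plan for at least 9 partitions
    print(idle_consumers(3, 4))   # 1, the 4th consumer has nothing to read
    print(idle_consumers(12, 9))  # 0, spare partitions leave room to scale out
```

Rounding the result up to a slightly larger partition count leaves room to add consumers later without having to repartition the topic.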


 
  


  