KafkaConsumer runs 8x slower since v1.4.5 #1888

Closed
@jutley

My org uses the prometheus-kafka-consumer-group-exporter project, which depends on this library. All this project does is read through the __consumer_offsets topic and generate metrics about the consumer groups. The latest version fails to read through the topic quickly enough to keep up.

With some digging, I found that this performance change started at this commit: 8c07925. I verified this with an experiment where I ran the following script against this commit, the commit preceding it (7a99013), and master:

from kafka import KafkaConsumer
import schedule
import datetime

# No consumer group: read the topic from the earliest offset without committing.
consumer_config = {
    'bootstrap_servers': <redacted>,
    'auto_offset_reset': 'earliest',
    'group_id': None
}

consumer = KafkaConsumer(
    '__consumer_offsets',
    **consumer_config
)

# Running count of messages consumed so far.
iterations = 0

def print_status():
    print(datetime.datetime.now(), iterations)

# Report the running count every 5 seconds.
schedule.every(5).seconds.do(print_status)

while True:
    for message in consumer:
        iterations += 1
        schedule.run_pending()

I ran three 2-minute trials against each commit to test the throughput of the consumer. Here are the results:
[image: trial results]

As you can see, this commit caused the consumer to run about 8 times slower (2,426,802 messages vs. 300,187 messages). This has not improved since then.
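For reference, the slowdown factor follows directly from the two message counts quoted above. This is only a quick arithmetic check; the variable names are made up for illustration, and it assumes the two counts are comparable measurements (for example, both from a trial of the same length).

```python
# Message counts quoted in this issue (names are illustrative only).
messages_before = 2_426_802  # commit 7a99013, preceding the change
messages_after = 300_187     # commit 8c07925

slowdown = messages_before / messages_after
print(f"{slowdown:.2f}x slower")  # prints "8.08x slower"
```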

I do not understand the details of this commit, but it has rendered this project unusable on the latest version of kafka-python.
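For anyone who wants to repeat the fixed-length trials, here is a minimal sketch of a timing helper. It is not the exact method used for the numbers above: `count_messages` and its `clock` parameter are hypothetical names, and the helper works on any iterable, so it can be tried without a broker.

```python
import time
from typing import Callable, Iterable


def count_messages(messages: Iterable, duration_s: float,
                   clock: Callable[[], float] = time.monotonic) -> int:
    """Count items pulled from `messages` until `duration_s` seconds elapse.

    The deadline is only checked after each item, so a source that blocks
    while waiting for data can overrun the requested duration.
    """
    deadline = clock() + duration_s
    count = 0
    for _ in messages:
        count += 1
        if clock() >= deadline:
            break
    return count
```

With the consumer from the script above, a 2-minute trial would be `count_messages(consumer, 120)`. Running this against each commit and comparing the counts gives the same kind of throughput comparison.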
