Description
My org uses the prometheus-kafka-consumer-group-exporter project, which depends on this library. All that project does is read through the __consumer_offsets topic and generate metrics about the consumer groups. With the latest version of this library, it cannot read through the topic quickly enough to keep up.
With some digging, I found that this performance change started at this commit: 8c07925. I verified this with an experiment where I ran the following script against this commit, the commit preceding it (7a99013), and master:
from kafka import KafkaConsumer
import schedule
import datetime

consumer_config = {
    'bootstrap_servers': <redacted>,
    'auto_offset_reset': 'earliest',  # start from the oldest available offset
    'group_id': None                  # no consumer group, so no offset commits
}
consumer = KafkaConsumer(
    '__consumer_offsets',
    **consumer_config
)

iterations = 0

def print_status():
    # Report the wall-clock time and the number of messages consumed so far.
    print(datetime.datetime.now(), iterations)

schedule.every(5).seconds.do(print_status)

while True:
    for message in consumer:
        iterations += 1
        # Fire print_status if its 5-second interval has elapsed.
        schedule.run_pending()
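For a fixed-length trial, the same counting loop can be written without the schedule dependency. This is only a sketch, not the script I actually ran: `count_messages` is a hypothetical helper name, and `itertools.count()` stands in for the consumer so the snippet runs without a broker.

```python
import itertools
import time


def count_messages(messages, duration_s):
    """Count items pulled from `messages` until `duration_s` seconds elapse.

    `messages` can be any iterable. The deadline is only checked after an
    item arrives, so a source that blocks while idle delays the return.
    """
    deadline = time.monotonic() + duration_s
    count = 0
    for _ in messages:
        count += 1
        if time.monotonic() >= deadline:
            break
    return count


if __name__ == "__main__":
    # Stand-in for a KafkaConsumer: an endless stream of integers.
    n = count_messages(itertools.count(), 0.5)
    print(n, "items in 0.5 s")
```

Passing the KafkaConsumer from the script above as `messages` with `duration_s=120` would approximate one 2-minute trial. Because iterating the consumer blocks while no messages arrive, setting `consumer_timeout_ms` on the consumer keeps the loop from hanging on an idle topic.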
I ran three 2-minute trials against each commit to test the throughput of the consumer. Here are the results:
As you can see, this commit caused the consumer to run about 8 times slower (2,426,802 messages vs. 300,187 messages), and performance has not improved since then.
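The "8 times" figure follows directly from the two message counts quoted above. A quick check of the arithmetic (the variable names are just for illustration):

```python
# Message counts as quoted above: before (7a99013) and after (8c07925).
before = 2_426_802
after = 300_187

ratio = before / after     # how many times more messages the older commit read
drop = 1 - after / before  # fractional loss in throughput

print(f"{ratio:.2f}x slower")         # 8.08x slower
print(f"{drop:.1%} fewer messages")   # 87.6% fewer messages
```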
I do not understand the details of this commit, but it has rendered the exporter project unusable on the latest version of kafka-python.