
Interval for background heartbeats unreliable #1672

Closed
@berupp


Hi,

This applies to 1.4.1, 1.4.3 and 1.4.4:

One of our Python applications has issues with the consumer disconnecting from the broker:

[W 181210 02:02:13 base:964] Heartbeat session expired, marking coordinator dead
[W 181210 02:02:13 base:698] Marking the coordinator dead (node 1002) for group cg_0: Heartbeat session expired.

After some investigation, I ran a packet capture to verify that the heartbeat is actually sent at the configured interval (10s in our case).

The results were quite surprising. Here is the timeline of heartbeats (capture time in seconds):
16.63
26.72
36.79
51.91
72.02
112.20
132.33
142.43
177.62
187.69
197.77
222.94
233.01
243.10
283.40

Not only are the heartbeat intervals inconsistent, there are also gaps of more than 40 seconds.
Our session_timeout was 30 seconds, which explained the consumer disconnects.
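
To make the gaps explicit, here is a small sketch that computes the interval between consecutive heartbeats from the timeline above and flags the ones longer than the session timeout. The constant names are just labels chosen for this illustration; the values are the ones quoted in this report.

```python
# Heartbeat send times in seconds, copied from the capture timeline above.
heartbeats = [16.63, 26.72, 36.79, 51.91, 72.02, 112.20, 132.33, 142.43,
              177.62, 187.69, 197.77, 222.94, 233.01, 243.10, 283.40]

HEARTBEAT_INTERVAL_S = 10.0  # configured heartbeat interval from this report
SESSION_TIMEOUT_S = 30.0     # original session_timeout from this report

# Gap between each pair of consecutive heartbeats, rounded to two decimals.
gaps = [round(later - earlier, 2)
        for earlier, later in zip(heartbeats, heartbeats[1:])]

# Gaps longer than the session timeout are the ones that can expire the session.
expired = [gap for gap in gaps if gap > SESSION_TIMEOUT_S]

print("gaps:", gaps)
print("max gap:", max(gaps))
print("gaps over session timeout:", expired)
```

For this capture, three of the fourteen gaps exceed the 30 second session timeout, and two of those are longer than 40 seconds.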

I raised the session_timeout to 3 minutes, but heartbeats were still eventually missed during a soak test, leading to a consumer disconnect.
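
For reference, the two configurations can be written out as consumer settings. This is only a sketch: it assumes the client is kafka-python, whose `KafkaConsumer` accepts `heartbeat_interval_ms` and `session_timeout_ms` keyword arguments, and `intervals_per_session` is a hypothetical helper made up for this illustration, not part of any library.

```python
# Consumer settings from this report, expressed in milliseconds.
# Assumption: these would be passed as keyword arguments to
# kafka-python's KafkaConsumer; no consumer is created here.
original = {"heartbeat_interval_ms": 10_000, "session_timeout_ms": 30_000}
raised = {"heartbeat_interval_ms": 10_000, "session_timeout_ms": 180_000}


def intervals_per_session(cfg):
    """Hypothetical helper: heartbeat intervals that fit in one session timeout."""
    return cfg["session_timeout_ms"] // cfg["heartbeat_interval_ms"]


print(intervals_per_session(original))  # 3
print(intervals_per_session(raised))    # 18
```

With the raised timeout, roughly 18 heartbeat intervals have to pass without a heartbeat reaching the broker before the session expires, so a disconnect under that setting points to a very long stall rather than a slightly late heartbeat.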

I ran a simultaneous capture of two other apps written in Golang and Java. Those have 3-second heartbeats configured and were consistently spot on.

Is this something that could be fixed, or is it simply a limitation due to the GIL?
