kafka consumer hangs after errors  #1989

Closed
@arjunsingri

Description

We see the following errors in our gevent-based Kafka consumer, after which the consumer hangs: it stops receiving Kafka messages entirely. Restarting the consumer always helps.

I can provide stack traces from gdb and strace output if that would help; please let me know. This is a large deployment of Kafka consumers and the problem is hurting us badly right now. Please help.
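As a lighter-weight complement to gdb, the Python-level stacks of a hung process can be captured from inside the process itself. The following is only a minimal sketch using the standard library; the helper names `format_thread_stacks` and `install_stack_dump_handler` and the choice of `SIGUSR1` are hypothetical and not part of kafka-python or gevent. Under gevent this most likely shows only the greenlet currently running in each OS thread, so it may not reveal every blocked greenlet.

```python
import signal
import sys
import threading
import traceback


def format_thread_stacks():
    """Return the current Python stack of every OS thread as one string."""
    # Map thread idents to names so the dump is easier to read.
    names = {t.ident: t.name for t in threading.enumerate()}
    chunks = []
    for ident, frame in sys._current_frames().items():
        chunks.append("Thread %s (%s):\n" % (ident, names.get(ident, "unknown")))
        chunks.extend(traceback.format_stack(frame))
    return "".join(chunks)


def install_stack_dump_handler(signum=None):
    """Write all thread stacks to stderr when the process receives signum."""
    if signum is None:
        # Assumption: SIGUSR1 is unused by the application (POSIX only).
        signum = signal.SIGUSR1

    def handler(received_signum, frame):
        sys.stderr.write(format_thread_stacks())

    signal.signal(signum, handler)
```

With the handler installed at startup, sending the chosen signal to the consumer process (for example `kill -USR1 <pid>`) would write the stacks to stderr without stopping the process.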

2019-12-21 13:34:12,434 ERROR           conn                    _recv:1052      4657    139673879961424 DummyThread-1   <BrokerConnection node_id=coordinator-6 host=host1:9092 <connected> [IPv4 ('host1', 9092)]>: Error receiving network data closing socket
Traceback (most recent call last):
  File "/opt/ns/nsenv/lib/python2.7/site-packages/kafka/conn.py", line 1034, in _recv
    data = self._sock.recv(self.config['sock_chunk_bytes'])
  File "/opt/ns/nsenv/lib/python2.7/site-packages/gevent/_socket2.py", line 277, in recv
    return sock.recv(*args)
error: [Errno 104] Connection reset by peer
2019-12-21 13:34:12,442 INFO            conn                    close:863       4657    139673879961424 DummyThread-1   <BrokerConnection node_id=coordinator-6 host=host1:9092 <connected> [IPv4 ('host1', 9092)]>: Closing connection. KafkaConnectionError: [Errno 104] Connection reset by peer
2019-12-21 13:34:12,442 WARNING         client_async            _conn_state_change:327  4657    139673879961424 DummyThread-1   Node coordinator-6 connection failed -- refreshing metadata
2019-12-21 13:34:12,442 ERROR           base                    _failed_request:493     4657    139673879961424 DummyThread-1   Error sending OffsetCommitRequest_v2 to node coordinator-6 [KafkaConnectionError: [Errno 104] Connection reset by peer]
2019-12-21 13:34:12,442 WARNING         base                    coordinator_dead:714    4657    139673879961424 DummyThread-1   Marking the coordinator dead (node coordinator-6) for group configservice_nyc1nsgwpool2-2_US-NYC1: KafkaConnectionError: [Errno 104] Connection reset by peer.
2019-12-21 13:34:12,443 ERROR           base                    _failed_request:493     4657    139673879961424 DummyThread-1   Error sending HeartbeatRequest_v1 to node coordinator-6 [KafkaConnectionError: [Errno 104] Connection reset by peer]
2019-12-22 22:10:31,776 INFO            conn                    close:863       4657    139673876999056 netskope-producer-nyc1nsgwpool22-4657-network-thread    <BrokerConnection node_id=3 host=host2:9092 <connected> [IPv4 ('host1', 9092)]>: Closing connection. KafkaConnectionError: Socket EVENT_READ without in-flight-requests
2019-12-22 22:10:31,777 WARNING         client_async            _conn_state_change:327  4657    139673876999056 netskope-producer-nyc1nsgwpool22-4657-network-thread    Node 3 connection failed -- refreshing metadata
2019-12-22 22:10:31,831 INFO            conn                    connect:374     4657    139673876999056 netskope-producer-nyc1nsgwpool22-4657-network-thread    <BrokerConnection node_id=13 host=host2:9092 <connecting> [IPv4 ('host1', 9092)]>: connecting to host1:9092 [('host2', 9092) IPv4]
2019-12-22 22:10:31,901 INFO            conn                    connect:403     4657    139673876999056 netskope-producer-nyc1nsgwpool22-4657-network-thread    <BrokerConnection node_id=13 host=host2t:9092 <connecting> [IPv4 ('host1', 9092)]>: Connection complete.
2019-12-22 22:18:04,300 INFO            conn                    connect:374     4657    139673876999056 netskope-producer-nyc1nsgwpool22-4657-network-thread    <BrokerConnection node_id=3 host=host2:9092 <connecting> [IPv4 ('host1', 9092)]>: connecting to host1:9092 [('host2', 9092) IPv4]
2019-12-22 22:18:04,371 INFO            conn                    connect:403     4657    139673876999056 netskope-producer-nyc1nsgwpool22-4657-network-thread    <BrokerConnection node_id=3 host=host2:9092 <connecting> [IPv4 ('host1', 9092)]>: Connection complete.
2019-12-22 22:24:32,048 INFO            client_async            _maybe_close_oldest_connection:951      4657    139673876999056 netskope-producer-nyc1nsgwpool22-4657-network-thread    Closing idle connection 13, last active 540001 ms ago
2019-12-22 22:24:32,049 INFO            conn                    close:863       4657    139673876999056 netskope-producer-nyc1nsgwpool22-4657-network-thread    <BrokerConnection node_id=13 host=host2:9092 <connected> [IPv4 ('host1', 9092)]>: Closing connection.
2019-12-29 21:18:05,167 INFO            conn                    connect:374     4657    139673876999056 netskope-producer-nyc1nsgwpool22-4657-network-thread    <BrokerConnection node_id=4 host=host1:9092 <connecting> [IPv4 ('host1', 9092)]>: connecting to host1:9092 [('host2', 9092) IPv4]
2019-12-29 21:18:05,238 INFO            conn                    connect:403     4657    139673876999056 netskope-producer-nyc1nsgwpool22-4657-network-thread    <BrokerConnection node_id=4 host=host2:9092 <connecting> [IPv4 ('host1', 9092)]>: Connection complete.
2019-12-29 21:27:05,240 INFO            client_async            _maybe_close_oldest_connection:951      4657    139673876999056 netskope-producer-nyc1nsgwpool22-4657-network-thread    Closing idle connection 4, last active 540000 ms ago
2019-12-29 21:27:05,241 INFO            conn                    close:863       4657    139673876999056 netskope-producer-nyc1nsgwpool22-4657-network-thread    <BrokerConnection node_id=4 host=host2:9092 <connected> [IPv4 ('host1', 9092)]>: Closing connection.
2020-01-12 05:08:06,923 INFO            conn                    connect:374     4657    139673876999056 netskope-producer-nyc1nsgwpool22-4657-network-thread    <BrokerConnection node_id=2 host=host1:9092 <connecting> [IPv4 ('host1', 9092)]>: connecting to host1:9092 [('host2', 9092) IPv4]
2020-01-12 05:08:06,994 INFO            conn                    connect:403     4657    139673876999056 netskope-producer-nyc1nsgwpool22-4657-network-thread    <BrokerConnection node_id=2 host=host2:9092 <connecting> [IPv4 ('host1', 9092)]>: Connection complete.
2020-01-12 05:17:06,994 INFO            client_async            _maybe_close_oldest_connection:951      4657    139673876999056 netskope-producer-nyc1nsgwpool22-4657-network-thread    Closing idle connection 2, last active 540000 ms ago
2020-01-12 05:17:06,995 INFO            conn                    close:863       4657    139673876999056 netskope-producer-nyc1nsgwpool22-4657-network-thread    <BrokerConnection node_id=2 host=host2:9092 <connected> [IPv4 ('host2', 9092)]>: Closing connection.
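
To make the hang easier to see in a long log, the last timestamp logged by each thread can be extracted. The following is a rough sketch, assuming the whitespace-separated layout seen above (timestamp, level, module, function:line, two numeric columns read here as process id and thread id, thread name, message); the function name `last_seen_per_thread` is hypothetical. The two sample lines are copied from the log above. Applied to the full log, `DummyThread-1` would last appear at 2019-12-21 13:34:12, while the producer network thread keeps logging until 2020-01-12.

```python
import re
from datetime import datetime

# Assumed layout: timestamp, level, module, func:line, pid, thread id,
# thread name, message. The column meanings are inferred from the log.
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\s+"  # timestamp (ms dropped)
    r"\w+\s+\S+\s+\S+\s+"                               # level, module, func:line
    r"\d+\s+\d+\s+"                                     # pid, thread id
    r"(\S+)\s+(.*)$"                                    # thread name, message
)


def last_seen_per_thread(lines):
    """Map each thread name to the latest timestamp at which it logged."""
    last_seen = {}
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # traceback lines and blanks carry no timestamp
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        name = match.group(2)
        if name not in last_seen or stamp > last_seen[name]:
            last_seen[name] = stamp
    return last_seen


SAMPLE_LINES = [
    "2019-12-21 13:34:12,442 WARNING         client_async            _conn_state_change:327  4657    139673879961424 DummyThread-1   Node coordinator-6 connection failed -- refreshing metadata",
    "Traceback (most recent call last):",
    "2019-12-22 22:10:31,777 WARNING         client_async            _conn_state_change:327  4657    139673876999056 netskope-producer-nyc1nsgwpool22-4657-network-thread    Node 3 connection failed -- refreshing metadata",
]

for thread_name, stamp in sorted(last_seen_per_thread(SAMPLE_LINES).items()):
    print("%s last logged at %s" % (thread_name, stamp))
```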
