Verbose logging on messages signalling data loss in producer #1553

sibiryakov · 2018-07-19T08:26:32Z

Recently, I spent several person-days trying to find where in our multi-step pipeline we were losing data. It turned out that one of our producers was generating batches too often, which resulted in their expiration. But there were no signs of any problem in the logs. This PR attempts to fix this issue.

dpkp · 2018-08-31T13:09:58Z

You can also achieve this in application code by adding error logging to your produced messages via a callback / errback. Would that approach work for you instead?
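For illustration, the application-side approach could look roughly like the sketch below. It assumes `producer` is a `kafka.KafkaProducer` whose `send()` returns a future exposing `add_errback()`; the helper names `log_send_failure` and `send_with_error_logging` and the logger name `producer.delivery` are hypothetical and not part of the library.

```python
import logging

# Hypothetical logger name chosen for this example.
log = logging.getLogger("producer.delivery")


def log_send_failure(exc, topic=None):
    """Errback: record a failed or expired produce request at ERROR level."""
    log.error("Failed to deliver message to %r: %r", topic, exc)


def send_with_error_logging(producer, topic, value):
    """Send a message and attach an errback that logs delivery failures.

    `producer` is assumed to be a kafka.KafkaProducer, or any object whose
    send() returns a future that supports add_errback(f, *args, **kwargs).
    """
    future = producer.send(topic, value)
    # Extra keyword arguments are bound to the errback; the exception is
    # passed as the first positional argument when the send fails.
    future.add_errback(log_send_failure, topic=topic)
    return future
```

Every call site has to remember to go through such a wrapper, otherwise a failed send stays silent.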

sibiryakov · 2018-09-20T16:45:51Z

Yes, I can, @dpkp, but this is awkward and non-obvious. Look at it from the user's perspective: someone starting to use the library assumes that the Producer is reliable and that the default settings are sufficient. Then they push too many small messages, and it can take days to debug where the problem is, depending on the complexity of the infrastructure. In my experience, when messages are silently dropped in a Kafka-based sequence of workers, it is extremely hard to find where the problem is, and it means checking the whole sequence.

So my opinion is that even though there is a way to be notified of the problem, it will rarely be used, which results in a bad user experience.

jeffwidman · 2018-10-31T06:22:53Z

I used to be against this PR, but after thinking about it more (and trying to support many dev teams that aren't familiar with the intricacies of this library), I do agree that the default user experience is better with this.

New users will look for an errback/callback mechanism to let them handle retries or dumping the message into another database for further analysis. But for checking if an application is healthy, the first thing I see most devs do is check the logs and if no errors then they assume everything is fine...
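For the "check the logs" workflow to surface anything, the application also has to emit the client's log records somewhere. A minimal sketch, assuming the library names its loggers after its modules under the `kafka` package so that configuring the parent logger covers them; the helper name `enable_kafka_client_logging` is hypothetical.

```python
import logging


def enable_kafka_client_logging(level=logging.WARNING):
    """Route records from the "kafka" logger hierarchy to stderr.

    Assumes the client's loggers are children of "kafka" (for example
    "kafka.producer..."), so they inherit this level and handler.
    """
    kafka_logger = logging.getLogger("kafka")
    kafka_logger.setLevel(level)
    # Attach a handler only once so repeated calls do not duplicate output.
    if not kafka_logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        kafka_logger.addHandler(handler)
    return kafka_logger
```

With a setup like this, any message the library logs at WARNING or above shows up without changes to the produce path; messages logged below that level stay hidden.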

sibiryakov · 2018-11-02T11:30:56Z

> But for checking if an application is healthy, the first thing I see most devs do is check the logs and if no errors then they assume everything is fine...

And this is natural. I consider a design poor if it requires code to be rewritten just to diagnose that data is being dropped.

dpkp · 2018-11-10T20:47:04Z

+1 -- thanks!!

raising logging level on messages signalling data loss

3747931

sibiryakov changed the title ~~Verbose logging on messages signalling data loss~~ Verbose logging on messages signalling data loss in producer Jul 19, 2018

jeffwidman force-pushed the master branch from cdfbb95 to 4d13713 Compare October 29, 2018 07:56

dpkp merged commit cd47701 into dpkp:master Nov 10, 2018

jeffwidman mentioned this pull request Jan 3, 2019

KafkaProducer - flush 'success' after 30 seconds but no message sent #1667

Closed
