
RUST-886 Use lossy UTF-8 decoding for responses to insert and update commands #601


Merged


@kmahar kmahar commented Mar 23, 2022

These changes work around a somewhat commonly encountered issue in the server where long error messages containing multi-byte characters are truncated incorrectly such that they are no longer valid UTF-8.

From digging through all of the related tickets for the server and drivers, this only ever seems to be observed in duplicate key errors, which I believe can only result from insert and update commands, so I've limited this PR to those two operations. It seems possible it could come up elsewhere, but I figured we could start with this and expand our usage of the lossy decoding later if the need arises.

Before my changes, all of the errors below, which are now duplicate key errors, would instead be something like

Error { kind: InvalidResponse { message: "invalid utf-8 sequence of 1 bytes from index 264" }, labels: {}, wire_version: Some(15) }
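To illustrate the difference between strict and lossy decoding, here is a minimal sketch using only the Rust standard library. It is not the driver's actual implementation: the helper names `truncate_bytes`, `decode_strict`, and `decode_lossy` are hypothetical, and the byte-level truncation only imitates the server behavior described above.

```rust
use std::str;

/// Hypothetical helper: cuts `s` to at most `max_bytes` bytes without
/// checking UTF-8 character boundaries, imitating the faulty truncation.
fn truncate_bytes(s: &str, max_bytes: usize) -> &[u8] {
    let end = max_bytes.min(s.len());
    &s.as_bytes()[..end]
}

/// Strict decoding: returns an error if the bytes are not valid UTF-8.
fn decode_strict(bytes: &[u8]) -> Result<String, str::Utf8Error> {
    str::from_utf8(bytes).map(|s| s.to_owned())
}

/// Lossy decoding: invalid byte sequences are replaced with U+FFFD
/// (the Unicode replacement character) instead of causing an error.
fn decode_lossy(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}
```

With an input such as `truncate_bytes("ab╯", 4)`, the three-byte character `╯` is cut after its second byte, so `decode_strict` returns an error while `decode_lossy` still yields a readable string that ends in the replacement character. That is the trade-off this PR makes for insert and update responses: a slightly mangled message in exchange for surfacing the real duplicate key error.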

@kmahar kmahar marked this pull request as ready for review March 23, 2022 18:41

@abr-egn abr-egn left a comment


LGTM!


// a document containing a long string with multi-byte unicode characters. taken from a user
// repro in RUBY-2560.
let long_unicode_str_doc = doc! {"name": "(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻(╯°□°)╯︵ ┻━┻"};

┬─┬ノ( º _ ºノ)


@patrickfreed patrickfreed left a comment


LGTM!

@kmahar kmahar merged commit 0a50512 into mongodb:master Mar 23, 2022
@kmahar kmahar deleted the RUST-886/handle-invalid-utf8-in-errors branch March 23, 2022 23:08