Skip to content

Strings/numbers distinction fix (following JSON::PP) #5

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

vdudouyt
Copy link

Dear Aristotle,

First of all, thanks a lot for bringing this software to public.

Recently, we've run into an issue in one of our customer's production environment causing qBittorrent to fail with invalid 'name' of torrent (possible exploit attempt) whenever a torrent generated by our CMS is getting added. After investigating for a while, it was found out that this only happens for torrents containing files with as least one path component looking like a number (e.g. year) due to an implicit conversion to i causing qBittorrent to confuse.

In it's turn, this led me to the following notice in Bencode docs:

BUGS AND LIMITATIONS

Strings and numbers are practically indistinguishable in Perl, so C<bencode()>
has to resort to a heuristic to decide how to serialise a scalar. This cannot
be fixed.

Actually, it is not true that strings and numbers are practically indistinguishable in Perl - although it's correct that it is not somewhat normally needed unless for someone trying to build up a serializer for a format sensitive to that.

So, please let me share a simple yet effective solution found in JSON::PP source code with you. We're already using it in production in order to avoid this problem.

-       :  ( not ref         ) ? ( m/\A (?: 0 | -? [1-9] \d* ) \z/x ? 'i' . $_ . 'e' : length . ':' . $_ )
+       :  ( not ref         ) ? ( ("" & $_) ne "" ? 'i' . $_ . 'e' : length . ':' . $_ )

The drawback of this approach which may make you potentially reluctant to approve this PR is that in this way, we have to remove use warnings; to avoid unnecessary warnings in runtime. Although personally I find that a small sacrifice. In any way, I'm always open to discussion.

Best Regards,

Valentin

@ap
Copy link
Owner

ap commented May 14, 2024

Hi, thank you for the kind words and thank you very much for the patch. The approach makes sense to me and consolidating on how other modules do this is useful, so this will be going in in some form. I appreciate you taking the time to submit this upstream.

I see the details differently though:

  1. My least important objection is that this is still a heuristic. It figures out whether a scalar has either been created as a number or undergone any numeric operations, but that doesn’t necessarily say whether the value was meant as a string or number. If you read a numeric value from a text file, this test will identify it as a string. And if you take a string that happens to consist of only digits and pass it to something that uses an arithmetic operator on it (maybe in a test), that string will subsequently be identified by this test as a number (the same problem you are already having, albeit by a different route). These cases are less likely to cause misidentifications in practice compared to the existing test, so the new heuristic is better overall, but it doesn’t eliminate misidentifications – it’s still a heuristic.

    There’s also https://metacpan.org/pod/feature#The-'bitwise'-feature whose implications on this code I’ve not yet fully understood. (TODO)

  2. It’s not a sufficiently equivalent test. The regular expression expressly matched only valid Bencode integers. Your proposed patch will (unless I’m misreading it) cause floating point values with a fractional part to be output as (invalid) Btorrent integers. (TODO need tests for that case if I don’t already have some)

  3. It’s unnecessary to remove use warnings. All it takes is adding a defined check to the condition. Or scoping the no warnings with a do block. (Also, lifting only the uninitialized warning rather than just disabling all of them.)

    Hang on though. Why is it necessary to turn off undef warnings? In your patch the new expression replaces a pattern match. A pattern match against undef produces a warning too. If the code wasn’t emitting warnings before and did emit warnings afterwards… why? If that is actually the case then something deeper seems wrong.

    Or are we talking about other warnings? Which ones?

  4. Either way, the big issue to me is that this is a breaking change. It’s a good starting point to know that you have this patch in production and it’s working for you – but the patch does change what things get serialized as integers and I’m not sure that change is going to be correct for everyone else’s uses too. Probably it will quietly fix bugs for a lot of people, but I want to minimize the case where it does the opposite also. So this needs a way of transitioning people over.

No matter where I differ though, I’m happy you brought this to my attention, so thank you for that and for taking the time to do so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants