Skip to content

uucore: format: num_parser: Use ExtendedBigDecimal #7556

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Apr 1, 2025

Conversation

drinkcat
Copy link
Contributor

@drinkcat drinkcat commented Mar 24, 2025

Stacks on top of #7514.

The main direct advantage of this change is that printf parser is a bit more accurate (more corner cases are covered), and can now print floats with arbitrary precision.

A follow-up CL will move seq to use this parsing functions as well, so that we do not have to maintain and fix 2 different set of parsing functions like we do now.

  • Changes the integer parsing code to use ExtendedBigDecimal internally
  • Add support for scientific number parsing.
  • Fix a few corner cases (leading +, no binary floats, etc.)

printf "%.24f %.24a\n" 0.1 0.1 after this PR:

0.100000000000000000000000 0xc.cccccccccccccccccccccccdp-7

main used to print (f64 precision)

0.100000000000000005551115 0xc.cccccccccccd000000000000p-7

More precise than what coreutils can do (f80 precision on x86):

0.100000000000000000001355 0xc.ccccccccccccccd000000000p-7

uucore: format: num_parser: Make it clear that scale can only be positive

After scratching my head a bit about why the hexadecimal code works,
seems better to do make scale an u64 to clarify.

Note that this may u64 may exceed i64 capacity, but that can only
happen if the number of digits provided > 2**63 (impossible).

uucore: format: num_parser: Parse exponent part of floating point numbers

Parse numbers like 123.15e15 and 0xfp-2, and add some tests
for that.

parse is becoming more and more of a monster: we should consider
splitting it into multiple parts.

Fixes #7474.

uucore: format: num_parser: Fix large hexadecimal float parsing

Large numbers can overflow u64 when doing 16u64.pow(scale):
do the operation on BigInt/BigDecimal instead.

Also, add a test. Wolfram Alpha can help confirm the decimal number
is correct (16-16**-21).

uucore: format: Use ExtendedBigDecimal in argument code

Provides arbitrary precision for float parsing in printf.

Also add a printf test for that.

uucore: format: num_parser: Add parser for ExtendedBigDecimal

Very simple as the f64 parser actually uses that as intermediary
value.

Add a few tests too.

uucore: format: num_parser: Disallow binary number parsing for floats

Fixes #7487.

Also, add more tests for leading zeros not getting parsed as octal
when dealing with floats.

uucore: format: num_parser: allow leading + sign when parsing

Leading plus signs are allowed for all formats.

Add tests (including some tests for negative i64 values, and mixed
case special values that springed to mind).

Fixes #7473.

uucore: format: num_parser: Turn parser into a trait

We call the function extended_parse, so that we do not clash
with other parsing functions in other traits.

  • Also implement parser for ExtendedBigDecimal (straightforward).
  • Base doesn't need to be public anymore.
  • Rename the error to ExtendedParserError.

uucore: format: num_parser: Fold special value parsing in main parsing function

uucore: format: num_parser: Use ExtendedBigDecimal for internal representation

ExtendedBigDecimal already provides everything we need, use that
instead of a custom representation.

@drinkcat drinkcat marked this pull request as ready for review March 24, 2025 08:52
Copy link

GNU testsuite comparison:

Congrats! The gnu test tests/misc/stdbuf is no longer failing!

Copy link

GNU testsuite comparison:

Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

@drinkcat drinkcat force-pushed the parse-bigdecimal branch 2 times, most recently from 3d8ef6b to 7cf6b2b Compare March 25, 2025 12:26
Copy link

GNU testsuite comparison:

Skip an intermittent issue tests/misc/stdbuf (fails in this run but passes in the 'main' branch)

Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR updates the number parsing functionality to use ExtendedBigDecimal, allowing arbitrary precision formatting and improved handling of corner cases in float parsing. Key changes include:

  • Updating printf parsing to use ExtendedBigDecimal for increased precision.
  • Adding support for scientific notation and fixing corner cases (leading plus sign, no binary floats, large hexadecimal floats).
  • Refactoring functions and traits (such as replacing get_f64 with get_extended_big_decimal) to consistently work with ExtendedBigDecimal.

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

File Description
tests/by-util/test_printf.rs Added a test to verify high precision float formatting using ExtendedBigDecimal
src/uucore/src/lib/features/format/spec.rs Updated parsing logic to obtain ExtendedBigDecimal from arguments
src/uucore/src/lib/features/format/extendedbigdecimal.rs Added Default implementation for ExtendedBigDecimal
src/uucore/src/lib/features/format/argument.rs Refactored argument parsing to use ExtendedBigDecimal and ExtendedParserError

Copy link

GNU testsuite comparison:

Skip an intermittent issue tests/misc/stdbuf (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

@drinkcat
Copy link
Contributor Author

(network flake in CI, unrelated)

drinkcat added 10 commits March 31, 2025 10:04
…sentation

ExtendedBigDecimal already provides everything we need, use that
instead of a custom representation.
We call the function extended_parse, so that we do not clash
with other parsing functions in other traits.

- Also implement parser for ExtendedBigDecimal (straightforward).
- Base doesn't need to be public anymore.
- Rename the error to ExtendedParserError.
Leading plus signs are allowed for all formats.

Add tests (including some tests for negative i64 values, and mixed
case special values that springed to mind).

Fixes uutils#7473.
Fixes uutils#7487.

Also, add more tests for leading zeros not getting parsed as octal
when dealing with floats.
Very simple as the f64 parser actually uses that as intermediary
value.

Add a few tests too.
Provides arbitrary precision for float parsing in printf.

Also add a printf test for that.
Large numbers can overflow u64 when doing 16u64.pow(scale):
do the operation on BigInt/BigDecimal instead.

Also, add a test. Wolfram Alpha can help confirm the decimal number
is correct (16-16**-21).
…bers

Parse numbers like 123.15e15 and 0xfp-2, and add some tests
for that.

`parse` is becoming more and more of a monster: we should consider
splitting it into multiple parts.

Fixes uutils#7474.
…tive

After scratching my head a bit about why the hexadecimal code works,
seems better to do make scale an u64 to clarify.

Note that this may u64 may exceed i64 capacity, but that can only
happen if the number of digits provided > 2**63 (impossible).
Copy link

GNU testsuite comparison:

Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

@sylvestre sylvestre requested a review from Copilot March 31, 2025 10:40
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the numerical parsing and formatting in uucore to use ExtendedBigDecimal, resulting in increased precision and support for scientific notation while addressing various corner cases.

  • Converts integer and float parsing to use ExtendedBigDecimal for arbitrary precision.
  • Adds support for scientific number parsing and corrects edge cases (e.g., leading '+' signs, prevention of binary float parsing).
  • Updates tests and adjusts error handling to align with the new parser design.

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

File Description
tests/by-util/test_printf.rs Adds a new test for high-precision float output using ExtendedBigDecimal.
src/uucore/src/lib/features/format/spec.rs Refactors float parsing to use get_extended_big_decimal, removing earlier conversion from f64.
src/uucore/src/lib/features/format/extendedbigdecimal.rs Introduces the Default trait implementation for ExtendedBigDecimal.
src/uucore/src/lib/features/format/argument.rs Updates argument extraction to use ExtendedBigDecimal and its associated parsing functions.

@RenjiSann RenjiSann merged commit ace92dc into uutils:main Apr 1, 2025
105 of 106 checks passed
@RenjiSann
Copy link
Collaborator

Thanks !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants