Skip to content

printf: trim leading whitespace when parsing numeric values #7512

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 24, 2025

Conversation

dlrobertson
Copy link
Contributor

Trim leading whitespace from numeric input to printf.

This is what seq does here. Note that the rust docs for trim_start state the following:

‘Whitespace’ is defined according to the terms of the Unicode Derived Core Property White_Space, which includes newlines.

So technically the following assert would pass

  assert_eq!(Ok(-2), ParsedNumber::parse_i64("\n\t -0x2"));

It looks like this does not currently impact printf or seq, but is probably something to be aware of.

$ cargo run printf "%f\n" " \n 0x1"
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.11s
     Running `target/debug/coreutils printf '%f\n' ' \n 0x1'`
printf: '\ \\n\ 0x1': expected a numeric value
0.000000

Fixes: #7505

@drinkcat
Copy link
Contributor

  1. That change should actually change printf behaviour (seq currently uses totally different parsing code, I'm trying to unify them...). Not sure why you don't see any change, but you really want to test that cargo run printf "%f\n" " 0x0" works. And maybe add an end-to-end test in test_printf.rs too.

  2. You really only want to trim spaces only, not other characters. At least that's what GNU coreutils seems to do: (I didn't test other unicode spaces ,-P):

$ env printf "%f\n" "\t0x1"
printf: ‘\\t0x1’: expected a numeric value
0.000000
$ env printf "%f\n" "\n0x1"
printf: ‘\\n0x1’: expected a numeric value
0.000000
$ env printf "%f\n" " 0x1"
1.000000

@dlrobertson
Copy link
Contributor Author

dlrobertson commented Mar 20, 2025

  1. That change should actually change printf behaviour (seq currently uses totally different parsing code, I'm trying to unify them...). Not sure why you don't see any change, but you really want to test that cargo run printf "%f\n" " 0x0" works.

Sorry, my PR text was unclear. This PR does change printf behavior and fixes the issue, but I just wanted to make sure reviewers were aware that technically ParsedNumber::parse_<some integer> would trim all whitespace. So there is a semantic question there of is rejecting newlines the responsibility of parse_<some integer> or the responsibility of the caller etc? printf already throws errors for other whitespace like \n and \t.

And maybe add an end-to-end test in test_printf.rs too.

Yup! Good suggestion.

  1. You really only want to trim spaces only, not other characters. At least that's what GNU coreutils seems to do: (I didn't test other unicode spaces ,-P):
$ env printf "%f\n" "\t0x1"
printf: ‘\\t0x1’: expected a numeric value
0.000000
$ env printf "%f\n" "\n0x1"
printf: ‘\\n0x1’: expected a numeric value
0.000000
$ env printf "%f\n" " 0x1"
1.000000

👍 awesome. The current patch ends up with the same output, but there is still the question of should ParsedNumber::parse also reject non ascii 0x20 whitespace.

@drinkcat
Copy link
Contributor

  1. That change should actually change printf behaviour (seq currently uses totally different parsing code, I'm trying to unify them...). Not sure why you don't see any change, but you really want to test that cargo run printf "%f\n" " 0x0" works.

Sorry, my PR text was unclear. This PR does change printf behavior and fixes the issue, but I just wanted to make sure reviewers were aware that technically ParsedNumber::parse_<some integer> would trim all whitespace. So there is a semantic question there of is rejecting newlines the responsibility of parse_<some integer> or the responsibility of the caller etc? printf already throws errors for other whitespace like \n and \t.

Oh! I misunderstood yes ,-) Do you know what code is rejecting non-whitespace leading characters in printf? It's not super obvious from a very quick glance I had...

I feel I'd rather have the logic in the common parse function, seq will soon use that too, and I think we'll want to convert timeout to use the common parsing code too (#7475)... But if the rejection happens in some other shared code, maybe we can argue.

And maybe add an end-to-end test in test_printf.rs too.

Yup! Good suggestion.

  1. You really only want to trim spaces only, not other characters. At least that's what GNU coreutils seems to do: (I didn't test other unicode spaces ,-P):
$ env printf "%f\n" "\t0x1"
printf: ‘\\t0x1’: expected a numeric value
0.000000
$ env printf "%f\n" "\n0x1"
printf: ‘\\n0x1’: expected a numeric value
0.000000
$ env printf "%f\n" " 0x1"
1.000000

👍 awesome. The current patch ends up with the same output, but there is still the question of should ParsedNumber::parse also reject non ascii 0x20 whitespace.

Great ,-)

@dlrobertson
Copy link
Contributor Author

Highly surprising, but seq and printf should actually both ignore all real whitespace 🤯

Did some digging and ran some examples with GNU coreutils:

$ seq 1 " $(printf '\n\t 0x2')"
1
2
$ printf "%f\n" "$(printf '\n\t 0x2')"
2.000000

I've learned something new 😆 I did not expect this to work.

Escape sequences are another story:

$ printf "%f\n" "\t\n 0x5"
-bash: printf: \t\n 0x5: invalid number
0.000000
$ seq 1 "\t\n 0x2"
seq: invalid floating point argument: ‘\\t\\n 0x2’
Try 'seq --help' for more information.

So I'll update the tests with some additions that cover more real whitespace types and add some tests to test_printf.rs.

Copy link

GNU testsuite comparison:

GNU test failed: timeout/timeout.log. timeout/timeout.log is passing on 'main'. Maybe you have to rebase?

Copy link
Contributor

@drinkcat drinkcat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh nice find!

For some reason I thought the expansion of special characters like \t happened on the shell side ,-) Now I know ,-)

// Initial minus sign
let (negative, unsigned) = if let Some(input) = input.strip_prefix('-') {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big deal, but the lack of symmetry is not so nice, maybe.

I wonder if you want to do this intead:

let (negative, unsigned) = if let Some(trimmed_input) = trimmed_input.strip_prefix('-') {
            (true, trimmed_input)
} else {
            (false, trimmed_input)
}

Or just shadow input above:

let input = input.trim_ascii_start();

new_ucmd!()
.args(&["%f", "\u{2029}0.1"])
.fails()
.stderr_contains("expected a numeric value");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coming from that interesting discovery, do you want to test that printf doesn't actually expand \t macro?

i.e. .args(&["%f", "\\t0.1"]) fails. You'll need to add a comment to explain too, I'm sure others will get confused about this one too.

Copy link

GNU testsuite comparison:

GNU test failed: misc/stdbuf.log. misc/stdbuf.log is passing on 'main'. Maybe you have to rebase?
GNU test failed: timeout/timeout.log. timeout/timeout.log is passing on 'main'. Maybe you have to rebase?

@drinkcat
Copy link
Contributor

Don't forget to run cargo fmt and cargo clippy.

Otherwise, LGTM!

(turns out I'll need this one to pass seq tests with the common parsing methods!)

Trim leading whitespace from numeric input to printf.
Copy link

GNU testsuite comparison:

Congrats! The gnu test tests/tail/inotify-dir-recreate is no longer failing!

@sylvestre sylvestre merged commit 39706be into uutils:main Mar 24, 2025
68 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

printf: Should allow leading spaces when parsing numbers
3 participants