-
-
Notifications
You must be signed in to change notification settings - Fork 1.5k
printf: Consistently handle negative widths/precision #7620
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
528d1a2
to
f56ad61
Compare
GNU testsuite comparison:
|
f56ad61
to
43310fe
Compare
GNU testsuite comparison:
|
}; | ||
match next { | ||
FormatArgument::SignedInt(n) if !n.is_negative() => *n, | ||
FormatArgument::Unparsed(s) if !s.starts_with('-') => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
!s.starts_with('-')
Can the string start with spaces? I think you should probably parse first...
Actually, you might be better off moving that logic to resolve_asterisk_precision
, I feel it'd be easier to handle there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see, you even want to handle negative values < i64::MIN
... Still need to figure out how to handle leading white spaces though (and please add a test for that ,-P)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, GNU coreutils doesn't handle such large values, so maybe we don't have to either:
env printf "|%.*d|" " -7777777777777777777777777" "10"
|printf: ‘ -7777777777777777777777777’: Numerical result out of range
10|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, I didn't realize GNU printf gave that error for significantly negative precision's - actually, it seems like I can remove get_i64_non_negative and still handle the case in the GNU tests with a precision of -2147483649.
Ignoring whitespace is unexpected for me, I don't think that's universal or documented in man printf
? Either way, this is now tested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ignoring whitespace is unexpected for me, I don't think that's universal or documented in
man printf
? Either way, this is now tested.
Yeah, there are a lot of undocumented corner cases, especially when it comes to parsing/formatting. I'm still new here, but I think the idea is to be as compatible as possible with GNU coreutils (unless GNU behavior cannot really be justified as "correct"), that means reading the manual, but also feeding creative inputs to the GNU implementation and seeing how it behaves.
Thanks for fixing and testing that.
.succeeds() | ||
.stdout_only("|0|"); | ||
|
||
new_ucmd!() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mind adding a comment here that explains that negative widths get aligned to the left? It's a bit tricky to see.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, just pushed (also bumped the width to 3 to make it a little more visible
GNU testsuite comparison:
|
@@ -349,7 +349,7 @@ impl Spec { | |||
Self::EscapedString => { | |||
let s = args.get_str(); | |||
let mut parsed = Vec::new(); | |||
for c in parse_escape_only(s.as_bytes(), OctalParsing::default()) { | |||
for c in parse_escape_only(s.as_bytes(), OctalParsing::ThreeDigits) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to make sure, this doesn't break stuff like this right?
env printf "%b" "\\05llo" | hexdump -C
00000000 05 6c 6c 6f |.llo|
env printf "%b" "\\5llo" | hexdump -C
00000000 05 6c 6c 6f |.llo|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't break it (this is with GNU coreutils 9.6 in /bin):
$ ./target/debug/printf '%b' "\\05llo"|od -c
0000000 005 l l o
0000004
$ ./target/debug/printf '%b' "\\5llo"|od -c
0000000 005 l l o
0000004
$ /bin/printf '%b' "\\05llo"|od -c
0000000 005 l l o
0000004
$ /bin/printf '%b' "\\5llo"|od -c
0000000 005 l l o
0000004
I added some tests to show that octal characters are still accepted at different lengths (1-3 numbers normally and 1-4 with leading zero for %b)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please avoid screenshot.
They are terrible for accessibility and search
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the reminder, I replaced the screenshot
I think this is mostly ok (just one small question on |
GNU testsuite comparison:
|
LGTM thanks! I don't think the cargo clippy issues are your fault... (#7640) |
Thanks for the reviews! Yeah, although I guess the clippy warnings are a good way to find out there's new rust version :) |
GNU testsuite comparison:
|
impressive, bravo! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
@@ -459,11 +459,11 @@ fn parse( | |||
|
|||
// If nothing has been parsed, check if this is a special value, or declare the parsing unsuccessful | |||
if let Some((0, _)) = chars.peek() { | |||
if integral_only { | |||
return Err(ExtendedParserError::NotNumeric); | |||
return if integral_only { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sweet
this might be a new clippy warning
cc @samueltardieu ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't redundant_else_block
trigger on the original code (this covers if { …; return …; } else { … }
)?
Concerning the lint you're proposing, this could be a new lint indeed, see rust-lang/rust-clippy#14545 (I haven't found an existing proposal, maybe this is a duplicate).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My IDE actually warned about it, I didn't know it wasn't already in clippy: https://www.jetbrains.com/help/inspectopedia/RsLift.html#locating-this-inspection
@@ -484,6 +490,21 @@ fn resolve_asterisk_maybe_negative<'a>( | |||
} | |||
} | |||
|
|||
fn resolve_asterisk_precision<'a>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be nice to add a comment explaining what this function is doing
and write unit tests (if not too complex)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure thing, added doc and unit tests. Thanks for the review!
Also allows character constants with " instead of ', and for interpolated values with %b to use \0XXX notation for octal bytes
GNU testsuite comparison:
|
This pull request fixes the tests/printf/printf.sh tests. The main discrepancies between the uutil's implementation and GNU's appear to be the handling of negative widths or precisions when provided for
%*.*
- this was implemented for strings, but this PR brings other format specifications to also handle negative widths/precisions.This PR also allows character constants with " instead of '. I had trouble finding any documentation about this; as far as I can tell GNU printf treats " and ' the same in the situations tested.
There is also a bug fix for interpolated values with %b to use \0XXX notation for octal bytes - this comes in for something like
printf '\0001'
, which should be the null byte and an ASCII '1', vsprintf '%b' '\0001'
, which is the single 0x01 byte because interpolated values use \0XXX formatted octal bytes.