Skip to content

Style: checked integer arithmetic? #126

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
sffc opened this issue Jun 13, 2020 · 14 comments · Fixed by #154
Closed

Style: checked integer arithmetic? #126

sffc opened this issue Jun 13, 2020 · 14 comments · Fixed by #154
Assignees
Labels
C-meta Component: Relating to ICU4X as a whole T-docs-tests Type: Code change outside core library
Milestone

Comments

@sffc
Copy link
Member

sffc commented Jun 13, 2020

ICU has had numerous bugs involving integer overflow wrapping around and causing crashes when indexing into arrays, etc. Rust integer types have functions like checked_add that allow you to perform failure logic if an integer overflow would occur. I was wondering whether we should adopt a best practice to avoid "raw" integer arithmetic and always use the "checked" versions.

Should we make an exception for + on two usizes, especially if one of the usizes is fixed (like i += 1)?

@Manishearth
Copy link
Member

Rust panics on overflow in debug mode, so overflow is not as common a bug in well-tested rust code in my experience. It's typical practice to use regular integer types and test in debug mode.

We should definitely use checked methods in code where overflow may occur based on user input.

@sffc
Copy link
Member Author

sffc commented Jun 17, 2020

Should we use checked arithmetic when doing something like pulling items from a user-provided iterator? The iterator could in theory be longer than usize, for example.

@sffc sffc added this to the 2020 Q2 milestone Jun 17, 2020
@Manishearth
Copy link
Member

When does the iterator length come up when pulling from an iterator? You just pull. It's only when you collect into a vector that it matters, and iirc .collect() handles this

@sffc sffc added the discuss Discuss at a future ICU4X-SC meeting label Jun 17, 2020
@sffc
Copy link
Member Author

sffc commented Jun 17, 2020

It came up for me a couple days ago when I was pulling items from an int iterator and wanted to keep a usize tally of the number of zeros versus non-zeros.

@zbraniecki
Copy link
Member

enumerate?

@Manishearth
Copy link
Member

Ah yeah in that case you should.

@sffc
Copy link
Member Author

sffc commented Jun 17, 2020

Ah yeah in that case you should.

OK. So in that case, going back to my original question: what do you think about requiring that all arithmetic is explicit about which type of arithmetic you want? Rust standard library gives you several choices:

  • checked_add: returns None if overflow occurred
  • overflowing_add: returns the wrapped number and a boolean if overflow occurred
  • saturating_add: stops at the max integer instead of overflowing
  • wrapping_add: returns the wrapped number without a boolean

I assume wrapping_add is the default if you use the + operator. Is that correct? If so, is calling the wrapping_add function a zero-cost abstraction over the + operator?

@Manishearth
Copy link
Member

what do you think about requiring that all arithmetic is explicit about which type of arithmetic you want?

No, this is highly unusual in Rust. I think we should do this for select cases where user-provided numbers come into play.

@sffc
Copy link
Member Author

sffc commented Jun 18, 2020

Do we rely on human code review to catch arithmetic that could cause an unchecked overflow?

It seems that the Rust style is generally to enforce correctness via static analysis of code. I would expect that there would be, e.g., a Clippy warning that we could enable that forbids the use of an unchecked + operator. Why is this different?

@Manishearth
Copy link
Member

Do we rely on human code review to catch arithmetic that could cause an unchecked overflow?

Typically debug assertions in tests catch these. Fuzzing helps.

Why is this different?

It tends to be that most cases of doing addition are going to be okay, so the tradeoff is between a cognitive overload with low marginal payoff. Nor does this have the complexity-compounding issues that memory safety has.

@sffc sffc removed the discuss Discuss at a future ICU4X-SC meeting label Jun 18, 2020
@sffc sffc self-assigned this Jun 18, 2020
@sffc sffc added C-meta Component: Relating to ICU4X as a whole T-docs-tests Type: Code change outside core library labels Jun 18, 2020
@EvanJP
Copy link
Contributor

EvanJP commented Jun 18, 2020

Using checked_add everywhere might add code to the point of worry, so I am in favor of having human code reviews/occasional usage.

rust-lang/rust#9469 (From 2013) brings up the difference in generated code when using + vs. checked_add

@sffc
Copy link
Member Author

sffc commented Jun 19, 2020

Additional data on the code size of checked_add: I wrote the following functions:

#[no_mangle]
pub fn checked_add(a: i32) -> i32 {
	match a.checked_add(100) {
		Some(v) => v,
		None => 0,
	}
}

#[no_mangle]
pub fn saturating_add(a: i32) -> i32 {
	a.saturating_add(100)
}

#[no_mangle]
pub fn wrapping_add(a: i32) -> i32 {
	a.wrapping_add(100)
}

#[no_mangle]
pub fn default_add(a: i32) -> i32 {
	a + 100
}

Here is the WASM output:

  (func $checked_add (type 1) (param i32) (result i32)
    (local i32)
    i32.const 0
    local.get 0
    i32.const 100
    i32.add
    local.tee 1
    local.get 0
    i32.const -1
    i32.gt_s
    local.tee 0
    local.get 0
    local.get 1
    i32.const -1
    i32.gt_s
    i32.ne
    i32.and
    select)
  (func $saturating_add (type 1) (param i32) (result i32)
    (local i32)
    i32.const 2147483647
    i32.const -2147483648
    local.get 0
    i32.const 100
    i32.add
    local.tee 1
    i32.const 0
    i32.lt_s
    select
    local.get 1
    local.get 0
    i32.const -1
    i32.gt_s
    local.tee 0
    local.get 0
    local.get 1
    i32.const -1
    i32.gt_s
    i32.ne
    i32.and
    select)
  (func $wrapping_add (type 1) (param i32) (result i32)
    local.get 0
    i32.const 100
    i32.add)
  (func $default_add (type 1) (param i32) (result i32)
    local.get 0
    i32.const 100
    i32.add)

Bytes of WASM:

  • saturating_add: 49
  • checked_add: 33
  • wrapping_add: 10
  • default_add: 10

So, the safe methods are bigger, but not as big as they were in 2013.

To reproduce, see sffc/rust-wasm-i18n@f31c30f

@sffc
Copy link
Member Author

sffc commented Jun 19, 2020

My attempt at decompiling checked_add by hand:

pub fn checked_add(a: i32) -> i32 {
  let b = a + 100;
  let c1 = (a > -1);
  let c2 = (b > -1);
  if c1 && c1 != c2 {
    // overflow
    return 0;
  } else {
    // no overflow
    return b;
  }
}

I've convinced myself that this is correct as long as I am adding a positive constant.

If you don't tell the compiler that the thing you are adding is guaranteed to be positive, the code gets a bit longer, from 33 bytes to 44 bytes:

  (func $checked_add (type 1) (param i32 i32) (result i32)
    (local i32)
    i32.const 0
    local.get 0
    local.get 1
    i32.add
    local.tee 2
    local.get 0
    i32.const -1
    i32.gt_s
    local.tee 0
    local.get 1
    i32.const -1
    i32.gt_s
    i32.eq
    local.get 0
    local.get 2
    i32.const -1
    i32.gt_s
    i32.ne
    i32.and
    select)

In any case, I think this is a good enough reason to not recommend using checked_add everywhere.

@hsivonen
Copy link
Member

We should require checked math for the functions that take input size and compute the worst-case output size. While safe-only Rust code would catch the issue of a too short buffer (due to overflow) at the time of trying to index into the buffer, when the application code is C or C++, an unexpectedly short buffer could be a problem for operations performed on the application side.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-meta Component: Relating to ICU4X as a whole T-docs-tests Type: Code change outside core library
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants