We currently run each benchmark at least 5 times, which is too many when iterating on slow benchmarks. Add a `--num N` flag to override the default run-count logic.
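
For illustration, here is a minimal sketch of how such a flag could be wired up. It is not the actual change: the use of Python's `argparse`, the names `build_parser`, `resolve_run_count`, and `suggested_runs`, and the rule that the default takes the larger of 5 and a suggested count are all assumptions. Only the `--num N` flag and the minimum of 5 come from the description above.

```python
import argparse

# The minimum of 5 comes from the description; everything else is hypothetical.
DEFAULT_MIN_RUNS = 5


def build_parser():
    """Build a parser for a hypothetical benchmark runner."""
    parser = argparse.ArgumentParser(description="Hypothetical benchmark runner")
    parser.add_argument(
        "--num",
        type=int,
        default=None,
        metavar="N",
        help="run each benchmark exactly N times instead of using the default logic",
    )
    return parser


def resolve_run_count(num, suggested_runs=0):
    """Return how many times to run a benchmark.

    `num` is the value of --num (None when the flag is absent).
    `suggested_runs` stands in for whatever the default logic would pick;
    it is an assumption, not the real heuristic.
    """
    if num is not None:
        if num < 1:
            raise ValueError("--num must be at least 1")
        return num  # an explicit override bypasses the minimum
    return max(DEFAULT_MIN_RUNS, suggested_runs)


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(resolve_run_count(args.num))
```

In this sketch, leaving `--num` unset keeps the existing behavior, so the flag is purely opt-in and passing `--num 1` gives a single quick run while iterating.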