-
Notifications
You must be signed in to change notification settings - Fork 60
Respect min
and max
of inputs to create more precise repro scripts
#4535
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Respect min
and max
of inputs to create more precise repro scripts
#4535
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR improves the reproducibility of fusion definitions by preserving the original input range information instead of using generic fake inputs. The key changes include:
- Addition of the new dataclass InputReproducer to capture tensor metadata (min/max, shape, strides, etc.).
- Replacement of fake_inputs with last_input_reproducers in the execute and last_repro_script methods.
- Updated type annotations and diffs in the repro_script_for logic to utilize the new dataclass.
is_contiguous: bool | ||
|
||
def __init__(self, tensor: torch.Tensor) -> None: | ||
if type(tensor) is not torch.Tensor: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider using isinstance(tensor, torch.Tensor) instead of comparing types directly to better support tensor subclasses.
if type(tensor) is not torch.Tensor: | |
if not isinstance(tensor, torch.Tensor): |
Copilot uses AI. Check for mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I want to make sure tensor is compatible with torch.aminmax
. What I really want to filter out is tensor subclasses that we cannot reconstruct with torch.testing.make_tensor
, torch.rand
, or torch.randint
. Thus in short isinstance
wouldn't be sufficing
0a8a727
to
4873dd2
Compare
python/nvfuser/__init__.py
Outdated
|
||
fake_mode = FakeTensorMode() | ||
self.fake_inputs = [fake_mode.from_tensor(inp) for inp in inputs] | ||
self.last_input_reproducers = [InputReproducer(tensor) for tensor in inputs] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we also need additional information, such as requires_grad
or storage_offset
, for example? In Thunder, when this API is used for reporting, it also generates a corresponding Torch operator function for the NVFuser region. To ensure correctness, it constructs input tensors for this function that include properties like requires_grad and storage_offset (which were previously retrieved from FakeTensor).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I chose not to remove fake_inputs
and to add last_input_reproducers
with storage_offset and requires_grad
Signed-off-by: Masaki Kozuki <[email protected]>
Signed-off-by: Masaki Kozuki <[email protected]>
Signed-off-by: Masaki Kozuki <[email protected]>
Signed-off-by: Masaki Kozuki <[email protected]>
Signed-off-by: Masaki Kozuki <[email protected]>
Signed-off-by: Masaki Kozuki <[email protected]>
Signed-off-by: Masaki Kozuki <[email protected]>
Signed-off-by: Masaki Kozuki <[email protected]>
bb17536
to
6472dca
Compare
Related: #4529
When an nvfuser fusion definition takes signed int tensors as inputs, it's possible that those int tensors have a solid value range to sample from e.g.
[0, vocab_size)
, for embedding, theFusionDefinition.last_repro_script
seems to have lost such range information as it usesFusionDefinition.fake_inputs
that is generated by fakifying the input tensors, leading to a loss of actual values.Inspired by lightning-thunder's
ExampleInputMetaData
, I rather store the range info in a dataclass that can give us a string ofmake_tensor
or equivalents to get tensors of the same value range.For the same fusion definition as #4529, running the fusion definition as follows
both
fd.last_repro_script()
and thefd.execute
return