Respect min and max of inputs to create more precise repro scripts #4535


Open · wants to merge 8 commits into main
Conversation

crcrpar (Collaborator) commented May 28, 2025

Related: #4529

When an nvFuser fusion definition takes signed int tensors as inputs, those tensors may have a meaningful value range to sample from, e.g. [0, vocab_size) for embedding indices. FusionDefinition.last_repro_script loses this range information because it uses FusionDefinition.fake_inputs, which is generated by fakifying the input tensors and therefore discards the actual values.

Inspired by lightning-thunder's ExampleInputMetaData, this PR instead stores the range information in a dataclass that can emit a make_tensor call (or equivalent) to produce tensors with the same value range.
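To make the idea concrete, here is a minimal sketch of such a dataclass. The names (TensorRangeInfo, from_tensor, repro_str) are illustrative, not the PR's actual API; it only shows the mechanism of capturing aminmax plus metadata and emitting a make_tensor string.

```python
# Hypothetical sketch, not the PR's actual implementation.
from dataclasses import dataclass
import torch


@dataclass
class TensorRangeInfo:
    shape: tuple
    dtype: torch.dtype
    device: str
    low: "int | float | None"
    high: "int | float | None"

    @classmethod
    def from_tensor(cls, t: torch.Tensor) -> "TensorRangeInfo":
        lo, hi = None, None
        if t.numel() > 0 and t.dtype != torch.bool:
            # Capture the observed value range of the real input tensor.
            amin, amax = torch.aminmax(t)
            lo, hi = amin.item(), amax.item()
        return cls(tuple(t.shape), t.dtype, str(t.device), lo, hi)

    def repro_str(self) -> str:
        # Emit a make_tensor call reproducing a tensor with the same range.
        # Note: for integer dtypes, make_tensor's high bound is exclusive,
        # so a real implementation may need high + 1 to include the max.
        return (
            f"torch.testing.make_tensor({self.shape}, dtype={self.dtype}, "
            f'device="{self.device}", low={self.low}, high={self.high},)'
        )


t = torch.tensor([3, 7, 5], dtype=torch.int32)
info = TensorRangeInfo.from_tensor(t)
print(info.repro_str())
```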

For the same fusion definition as in #4529, running it as follows

inputs = [
    torch.randint(0, 10, (16,), dtype=torch.int32, device='cuda:0'),
    torch.testing.make_tensor((151936, 2048), dtype=torch.bfloat16, device='cuda:0'),
    torch.testing.make_tensor((2048,), dtype=torch.bfloat16, device='cuda:0'),
    torch.testing.make_tensor((2560, 2048), dtype=torch.bfloat16, device='cuda:0'),
    torch.testing.make_tensor((2560,), dtype=torch.bfloat16, device='cuda:0'),
]
fd.execute(inputs, save_repro_inputs=True, print_repro=True)

both fd.last_repro_script() and the fd.execute call (with print_repro=True) produce

inputs = [
    torch.testing.make_tensor((16,), dtype=torch.int32, device="cuda:0", low=0, high=7,),
    torch.testing.make_tensor((151936, 2048), dtype=torch.bfloat16, device="cuda:0", low=-9.0, high=8.9375,),
    torch.testing.make_tensor((2048,), dtype=torch.bfloat16, device="cuda:0", low=-9.0, high=8.9375,),
    torch.testing.make_tensor((2560, 2048), dtype=torch.bfloat16, device="cuda:0", low=-9.0, high=8.9375,),
    torch.testing.make_tensor((2560,), dtype=torch.bfloat16, device="cuda:0", low=-9.0, high=8.9375,),
]

@crcrpar crcrpar requested a review from Copilot May 28, 2025 19:45
Copilot AI left a comment

Pull Request Overview

This PR improves the reproducibility of fusion definitions by preserving the original input range information instead of using generic fake inputs. The key changes include:

  • Addition of the new dataclass InputReproducer to capture tensor metadata (min/max, shape, strides, etc.).
  • Replacement of fake_inputs with last_input_reproducers in the execute and last_repro_script methods.
  • Updated type annotations and the repro_script_for logic to use the new dataclass.

is_contiguous: bool

def __init__(self, tensor: torch.Tensor) -> None:
    if type(tensor) is not torch.Tensor:
Copilot AI May 28, 2025


Consider using isinstance(tensor, torch.Tensor) instead of comparing types directly to better support tensor subclasses.

Suggested change
if type(tensor) is not torch.Tensor:
if not isinstance(tensor, torch.Tensor):


crcrpar (Collaborator, Author) replied May 28, 2025


Here I want to make sure tensor is compatible with torch.aminmax. What I really want to filter out are tensor subclasses that we cannot reconstruct with torch.testing.make_tensor, torch.rand, or torch.randint, so isinstance would not suffice.
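The distinction being discussed can be illustrated with torch.nn.Parameter, a built-in Tensor subclass: isinstance accepts it, while the exact type() check filters it out.

```python
# isinstance vs. exact type() check on a Tensor subclass.
import torch

t = torch.ones(3)
p = torch.nn.Parameter(torch.ones(3))

# Both pass an isinstance check, since Parameter subclasses Tensor.
assert isinstance(t, torch.Tensor)
assert isinstance(p, torch.Tensor)

# Only the plain tensor passes the exact-type check;
# the subclass is filtered out.
assert type(t) is torch.Tensor
assert type(p) is not torch.Tensor
```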

@crcrpar crcrpar force-pushed the store-value-ranges-for-last-repro-script branch from 0a8a727 to 4873dd2 Compare May 29, 2025 09:00

fake_mode = FakeTensorMode()
self.fake_inputs = [fake_mode.from_tensor(inp) for inp in inputs]
self.last_input_reproducers = [InputReproducer(tensor) for tensor in inputs]
A reviewer commented:

Do we also need additional information, such as requires_grad or storage_offset? In Thunder, when this API is used for reporting, it also generates a corresponding Torch operator function for the nvFuser region. To ensure correctness, it constructs input tensors for this function that include properties like requires_grad and storage_offset (which were previously retrieved from FakeTensor).

crcrpar (Collaborator, Author) replied:

I chose to keep fake_inputs and to add last_input_reproducers with storage_offset and requires_grad.
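A sketch of how storage_offset and requires_grad could be reapplied when rebuilding an input from its recorded metadata; the rebuild helper and its signature are assumptions for illustration, not the PR's actual code.

```python
# Hypothetical helper: reconstruct a strided tensor view from metadata.
import torch


def rebuild(shape, strides, storage_offset, requires_grad, dtype, device, low, high):
    # Allocate a flat buffer just large enough to cover the strided view,
    # then carve out the view with as_strided so offset/strides round-trip.
    needed = storage_offset + sum((s - 1) * st for s, st in zip(shape, strides)) + 1
    base = torch.testing.make_tensor(
        (needed,), dtype=dtype, device=device, low=low, high=high
    )
    t = base.as_strided(shape, strides, storage_offset)
    # Reapply the autograd flag last, since make_tensor produced a plain leaf.
    return t.requires_grad_(requires_grad)


t = rebuild((2, 3), (3, 1), 1, True, torch.float32, "cpu", -1.0, 1.0)
print(t.shape, t.storage_offset(), t.requires_grad)
```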

@crcrpar crcrpar force-pushed the store-value-ranges-for-last-repro-script branch from bb17536 to 6472dca Compare June 2, 2025 13:24