Replacing ShardedTensor with DTensor for RW sharding #2147


Closed
wants to merge 1 commit

Conversation

iamzainhuda
Contributor

Summary:
This is the first part of migrating TorchRec state dict checkpointing from ShardedTensor to DTensor. It sets up the necessary infra to support additional sharding schemes. The general approach is to keep the ShardedTensor paths and remove them once all sharding types are supported on DTensor. This includes ShardingPlan and the ShardedTensor dataclasses such as ShardedTensorMetadata, which will be migrated in a separate diff along with ParameterSharding.

NOTE: This version of LocalShardsWrapper does not support empty shards; that support is added in the next diff, which enables CW (D57063512).

This diff includes:

  • LocalShardsWrapper, a torch.Tensor subclass to be used with DTensor
  • Changes to TorchRec state_dict creation and loading to use DTensor for the row-wise path in both EmbeddingCollection and EmbeddingBagCollection (see the sketch after this list)
  • Changes to DCP to support LocalShardsWrapper for saving and reading (WriteItems and ReadItems); a usage sketch follows the design-doc link below
  • Added DTensor paths to callsites where ShardedTensors are expected
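
As a minimal sketch of the row-wise DTensor representation (illustrative only, not this PR's code: the backend, mesh, and table sizes are assumptions, and in this diff the local tensor would be a LocalShardsWrapper rather than a plain tensor):

```python
import torch
import torch.distributed as dist
from torch.distributed._tensor import DTensor, Shard
from torch.distributed.device_mesh import init_device_mesh

# Run under torchrun, e.g.: torchrun --nproc-per-node=2 rw_dtensor_sketch.py
dist.init_process_group("gloo")
world = dist.get_world_size()
mesh = init_device_mesh("cpu", (world,))

# Row-wise sharding: each rank owns a contiguous block of rows of a
# (world * 4) x 8 embedding table (sizes are made up for illustration).
local_rows = torch.randn(4, 8)

# Shard(0) declares sharding along dim 0 (rows) across the mesh;
# from_local stitches the per-rank pieces into one global logical tensor.
dt = DTensor.from_local(local_rows, mesh, [Shard(0)], run_check=False)
assert dt.shape == (world * 4, 8)  # global shape spans all ranks

dist.destroy_process_group()
```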

LocalShardsWrapper supports the following torch ops:

  • torch.ops._c10d_functional.all_gather_into_tensor.default
  • aten._to_copy.default
  • aten.view.default
  • aten.equal.default
  • aten.detach.default

More ops can be added as required by new use cases; a minimal sketch of the dispatch pattern is shown below.
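
The sketch below is hypothetical, not TorchRec's actual implementation: the class name, fields, and shape math are assumptions, and only two of the ops above are handled. It shows the usual wrapper-subclass pattern, where ops are intercepted in `__torch_dispatch__` and new ops are supported by handling additional overloads:

```python
import torch
from typing import List

class LocalShardsWrapperSketch(torch.Tensor):
    """Hypothetical stand-in for LocalShardsWrapper: holds a rank's local
    shards (plus their global offsets) behind one tensor-like object that
    DTensor can use as its local tensor."""

    @staticmethod
    def __new__(cls, shards: List[torch.Tensor], offsets: List[torch.Size]):
        # Report a combined shape; assumes non-empty, row-stacked 2D shards.
        rows = sum(s.shape[0] for s in shards)
        cols = shards[0].shape[1]
        wrapper = torch.Tensor._make_wrapper_subclass(
            cls, (rows, cols), dtype=shards[0].dtype, device=shards[0].device
        )
        wrapper._shards = shards
        wrapper._offsets = offsets
        return wrapper

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        # Handle each supported op explicitly, mirroring the list above;
        # anything unhandled raises so gaps surface immediately.
        if func == torch.ops.aten.detach.default:
            src = args[0]
            return cls([s.detach() for s in src._shards], src._offsets)
        if func == torch.ops.aten.equal.default:
            a, b = args
            return len(a._shards) == len(b._shards) and all(
                torch.equal(x, y) for x, y in zip(a._shards, b._shards)
            )
        # aten.view, aten._to_copy, all_gather_into_tensor, etc. would
        # follow the same pattern.
        raise NotImplementedError(f"{cls.__name__} does not support {func}")

# Example: wrapping one local shard, then detaching and comparing.
w = LocalShardsWrapperSketch([torch.randn(4, 8)], [torch.Size([0, 0])])
assert torch.equal(w, w.detach())
```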

See https://docs.google.com/document/d/16Ptl50mGFJW2cljdF2HQ6FwsiA0scwbAbjx_4dhabJw/edit?usp=drivesdk for more info regarding design and approach.
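
To illustrate the DCP flow the third bullet enables, here is a hedged end-to-end sketch using the public torch.distributed.checkpoint save/load entry points; the state_dict key and checkpoint path are made up, and the real TorchRec state dict would carry LocalShardsWrapper local tensors inside the DTensors:

```python
import torch
import torch.distributed as dist
import torch.distributed.checkpoint as dcp
from torch.distributed._tensor import DTensor, Shard
from torch.distributed.device_mesh import init_device_mesh

# Run under torchrun; continues the row-wise example from above.
dist.init_process_group("gloo")
world = dist.get_world_size()
mesh = init_device_mesh("cpu", (world,))

table = DTensor.from_local(torch.randn(4, 8), mesh, [Shard(0)], run_check=False)
state_dict = {"embedding_bags.t1.weight": table}  # illustrative key

# Save: DCP decomposes each DTensor into per-rank WriteItems.
dcp.save(state_dict, checkpoint_id="/tmp/torchrec_ckpt")

# Load: ReadItems are resolved back into the same sharded layout in place.
dcp.load(state_dict, checkpoint_id="/tmp/torchrec_ckpt")

dist.destroy_process_group()
```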

Reviewed By: XilunWu

Differential Revision: D54375878

@facebook-github-bot added the CLA Signed label on Jun 20, 2024
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D54375878


iamzainhuda added a commit to iamzainhuda/torchrec that referenced this pull request Jun 20, 2024

iamzainhuda added a commit to iamzainhuda/torchrec that referenced this pull request Jun 23, 2024
