Skip to content

Move PJRT Python APIs out of torch_xla.experimental.* #5011

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 25 commits into from
Jun 6, 2023

Conversation

will-cromar
Copy link
Collaborator

@will-cromar will-cromar commented May 15, 2023

Reorganize the experimental PJRT Python APIs.

  • Add new _internal module for APIs that are well-tested, but likely to change. I moved device-specific logic here, since I expect to rework it in the near future. All of these functions are mainly used for framework development. In general, users shouldn't have to call them directly.
  • Most functionality that interacts directly with the runtime is moved into the new torch.runtime module.
  • Added deprecation module to register deprecated aliases for all public functions that are moving out into other parts of of torch_xla.
  • Print a warning once per deprecated function. For example:
>>> from torch_xla.experimental import tpu
>>> tpu.num_available_chips()
WARNING:root:torch_xla.experimental.tpu.num_available_chips is deprecated. Use torch_xla._internal.tpu.num_available_chips instead.
4
>>> tpu.num_available_chips()
4
>>> tpu.version()
WARNING:root:torch_xla.experimental.tpu.version is deprecated. Use torch_xla._internal.tpu.version instead.
4
  • Update references across the repository.

Summary of new modules:

  • torch_xla.runtime
  • torch_xla._internal.tpu
  • torch_xla._internal.gpu
  • torch_xla._internal.pjrt

@will-cromar will-cromar changed the title [WIP] Move PJRT Python APIs out of torch_xla.experimental.* Move PJRT Python APIs out of torch_xla.experimental.* May 18, 2023
@will-cromar will-cromar requested a review from JackCaoG May 18, 2023 17:43
@will-cromar will-cromar marked this pull request as ready for review May 18, 2023 17:45
@@ -11,14 +11,15 @@
import torch_xla.core.xla_env_vars as xenv
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we renamed these experimental files?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good catch.

# TODO(wcromar): Detect GPU device too


def device_type() -> Optional[str]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have similar functions in torch_xla.core.xla_model, do we want to do some clean up?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to take this chance to do some clean up. I am always confuse what function to call for local ordinal, gloabal ordinal, worla_size etc and what do they really mean in a pod context. If we can restructure those api a bit and maybe have those apis in this runtime instead that would be nice...(random idea, might need more thinking)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

xla_model becomes a kitch sink and we put random things in it, if we can move all runtime related bits in this file it is actually nicer..

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed this offline. We'll start to move APIs that interact directly with the runtime to a new module, and leave any modeling-related APIs in xla_model. I moved the PJRT version of rendezvous back to xla_model, and the old rendezvous is will be an alias of that implementation when PJRT is enabled.

@will-cromar will-cromar force-pushed the wcromar/stable-pjrt-api branch from 8fa857f to 660a4cf Compare May 24, 2023 21:30
@will-cromar will-cromar force-pushed the wcromar/stable-pjrt-api branch 2 times, most recently from 93def62 to de2f117 Compare June 1, 2023 21:22
@will-cromar will-cromar requested a review from JackCaoG June 1, 2023 21:23
return

logging.warning(
'XRT configuration not detected. Defaulting to preview PJRT '
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want to change preview to stable here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Done.

Copy link
Collaborator

@JackCaoG JackCaoG left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but prefer to merge on Monday

@will-cromar will-cromar force-pushed the wcromar/stable-pjrt-api branch from dc1130e to f11be83 Compare June 6, 2023 17:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants