Skip to content

[torchx/specs + runner] few improvements to runner API and component fn handling #368

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
2 of 3 tasks
kiukchung opened this issue Jan 14, 2022 · 1 comment
Closed
2 of 3 tasks
Assignees
Milestone

Comments

@kiukchung
Copy link
Contributor

kiukchung commented Jan 14, 2022

[WIP] Description and Motivation and Detailed Proposal

NOTE: I'm updating this as I'm looking at the code, will remove the "[WIP]" prefix when I'm done.

Here are a few suggested improvements to the runner API and component function handling

  • Add workspace to the default Runner API as well.
    - Motivation: dryrun_component and run_component APIs in Runner vs WorkspaceRunner have different method signatures but same method name. WorkspaceRunner is a subclass of Runner.
  • Create a new module for handling component function evaluations
    - Motivation: currently this code is in torchx/specs/api.py which is more of a dataclass module than a functional module.
  • Create a DAO-like object to represent .torchxconfig -> Load the configs once in code and pass the TorchXConfig object around.
    - Motivation: historically we only loaded scheduler cfg values from .torchxconfig, but there has been asks to allow users to specify more defaults (e.g. component function arguments) from .torchxconfig. Config loading is currently done ad-hoc via methods specific to loading scheduler configs in the torchx.runner.config module, but as we add other types of configs beyond scheduler cfg, its worth loading the config as a proper config object that can be passed around the APIs.

Alternatives

N/A

Additional context/links

Links will be provided inline in the first section.

facebook-github-bot pushed a commit that referenced this issue Jan 15, 2022
Summary:
**Summary:** Makes it possible to specify component parameter defaults in `.torchxconfig`.  See changes to `.torchxconfig` files included in this diff and the *Test Plan* section for example usage and config specification.

**Motivation:** Useful UX for those using builtin components that have required params (b/c no "global" defaults exist universally and hence cannot be specified as defaults in the component function declaration) that are always static for a particular user/team's use case of the builtin

**Example:**  `image` in `dist.ddp` will in most cases be some constant for the team but no universal default exists (and hence cannot be specified in the function declaration of `dist.ddp` itself) and is cumbersome to specify it all the time in the commandline.

**Alternative:** is to copy the builtin as a separate component and hardcode (or default in the function declaration) the desired fields, but this requires the user to fork the builtin, which is sub-optimal for those in the "exploration/dev" phase and currently uninterested in productionalizing the component.

**Other Notes:** While working on this feature, I've noticed a few improvements/cleanups that we need to work on which I'm tracking as [issue-368](#368). We need to push this code in the interest of time, and I've done as much as I could to NOT change any major APIs until we address the issues properly through issue-368.

Reviewed By: aivanou

Differential Revision: D33576756

fbshipit-source-id: b65af48a570cc83c366df4eb71a8583a0be6018f
@kiukchung kiukchung added this to the 0.1.2 release milestone Jan 25, 2022
@d4l3k d4l3k modified the milestones: 0.1.2 release, 0.2.0 Release Mar 28, 2022
@kurman kurman self-assigned this May 19, 2022
kurman added a commit to kurman/torchx that referenced this issue May 24, 2022
…"specs.builders".

Summary:
Extract functional module out out `specs.api` to new `specs.builders` module.

- Diff/PR should be 100% refactoring without any functional impact.
- `parse_app_handle` could not be moved due to external usage via `specs.api` reference and will be handled separately

pytorch#368

Differential Revision: D36617402

fbshipit-source-id: 6599f4a3e76c465379f1502000ee7c61ad05fbf2
facebook-github-bot pushed a commit that referenced this issue May 26, 2022
…"specs.builders". (#499)

Summary:
Pull Request resolved: #499

Extract functional module out out `specs.api` to new `specs.builders` module.

- Diff/PR should be 100% refactoring without any functional impact.
- `parse_app_handle` could not be moved due to external usage via `specs.api` reference and will be handled separately

#368

Reviewed By: d4l3k

Differential Revision: D36617402

fbshipit-source-id: 563aa15aa354244d09a8eb32da7f5275e7c9ade0
kurman added a commit to kurman/torchx that referenced this issue Jun 2, 2022
Summary:
Extract generic config module and decouple storage concerns from configuration data.

- This should allow adding new configuration options besides INI
- All the client code should work before, but we are moving away from runner specific logic.

Github issue (3rd task) pytorch#368

Differential Revision:
D36855026

LaMa Project: L1123150

fbshipit-source-id: 65ee2f7c6d118e19430ba8f1b13052e7668694c6
kurman added a commit to kurman/torchx that referenced this issue Jun 7, 2022
Summary:
Pull Request resolved: pytorch#508

Extract generic config module and decouple storage concerns from configuration data.

- This should allow adding new configuration options besides INI
- All the client code should work before, but we are moving away from runner specific logic.

Github issue (3rd task) pytorch#368

Differential Revision:
D36855026

LaMa Project: L1123150

fbshipit-source-id: 3562bf258270c661354ea58264cb099037b1d984
kurman added a commit to kurman/torchx that referenced this issue Jun 7, 2022
Summary:
Pull Request resolved: pytorch#508

Extract generic config module and decouple storage concerns from configuration data.

- This should allow adding new configuration options besides INI
- All the client code should work before, but we are moving away from runner specific logic.

Github issue (3rd task) pytorch#368

Differential Revision:
D36855026

LaMa Project: L1123150

fbshipit-source-id: 9be2d410eb28b7443dcb85aaf9875869499a7782
@kurman
Copy link
Contributor

kurman commented Jun 9, 2022

Will create separate RFC for the last task

  • Changes are breaking and will require proper review process
  • We will benefit from design review as well since we will change the implementation.

@kurman kurman closed this as completed Jun 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants