[feature] Support different PS/worker types #1369

Closed
@gaocegege

In some customer use cases, users want to schedule one group of PSes on GPU machines and place the remaining PSes on CPU machines. This requires more than one PS replica type in the same job, for example:

  tfReplicaSpecs:
    PS-1:
      replicas: 3
      template:
        spec:
          affinity:
            podAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                - labelSelector:
                    matchExpressions:
                      - key: gpu-type
                        operator: In
                        values:
                          - "true"
                  topologyKey: topology.kubernetes.io/zone
          containers:
            - name: tensorflow
              image: xxx
    PS-2:
      replicas: 5
      template:
        spec:
          affinity:
            podAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                - labelSelector:
                    matchExpressions:
                      - key: gpu-type
                        operator: In
                        values:
                          - "false"
                  topologyKey: topology.kubernetes.io/zone
          containers:
            - name: tensorflow
              image: xxx
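
Note that `podAffinity` selects on labels of other pods. If the intent is to pin each PS group to machines by a node label, node affinity may be the closer fit. The following is only a sketch of that variant. It assumes the nodes carry a hypothetical `gpu-type` label with the values `"true"` and `"false"`, and that the proposed `PS-1`/`PS-2` replica type names were accepted by the operator, which is exactly what this issue requests.

```yaml
tfReplicaSpecs:
  PS-1:                        # proposed replica type name, not supported today
    replicas: 3
    template:
      spec:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
                - matchExpressions:
                    - key: gpu-type      # hypothetical node label
                      operator: In
                      values:
                        - "true"         # label values are strings, so quote them
        containers:
          - name: tensorflow
            image: xxx
  PS-2:                        # proposed replica type name, not supported today
    replicas: 5
    template:
      spec:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
                - matchExpressions:
                    - key: gpu-type      # hypothetical node label
                      operator: In
                      values:
                        - "false"
        containers:
          - name: tensorflow
            image: xxx
```

Either form depends on the operator treating `PS-1` and `PS-2` as parameter servers with distinct pod templates.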

/cc @zw0610
