
Pod cannot be scheduled, no events, kube-scheduler panics #853


Open
13567436138 opened this issue Jan 12, 2025 · 4 comments · May be fixed by #910
Labels: kind/bug, lifecycle/rotten


@13567436138

Area

  • [x] Scheduler
  • [ ] Controller
  • [ ] Helm Chart
  • [ ] Documents

Other components

No response

What happened?

The scheduler profile in sched-cc.yaml is configured as follows:

- schedulerName: trimaran-scheduler
  plugins:
    score:
      disabled:
      - name: NodeResourcesBalancedAllocation
      - name: NodeResourceFit
      enabled:
       - name: LoadVariationRiskBalancing
  pluginConfig:
  - name: LoadVariationRiskBalancing
    args:
      safeVarianceMargin: 1
      safeVarianceSensitivity: 2
      metricProvider:
        type: KubernetesMetricsServer
        address: https://metrics-server.kube-system.svc.cluster.local:443
        token: eyJhbGciOiJSUzI1NiIsImtpZCI6IkQzTndMYkFBSnhLcDJLYTBER1ZwSlVxT1RBMlFYbVd2QXZLZFJzTGV6RFkifQ.eyJhdWQiOlsiaHR0cHM6Ly9rdWJlcm5ldGVzLmRlZmF1bHQuc3ZjLmNsdXN0ZXIubG9jYWwiXSwiZXhwIjoxNzM2NjYxMTc2LCJpYXQiOjE3MzY2NTc1NzYsImlzcyI6Imh0dHBzOi8va3ViZXJuZXRlcy5kZWZhdWx0LnN2Yy5jbHVzdGVyLmxvY2FsIiwianRpIjoiYTQ2ZGY0YzYtYjY0ZS00ZGU2LTllNDMtMzVhN2FiOGRkMmM3Iiwia3ViZXJuZXRlcy5pbyI6eyJuYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsInNlcnZpY2VhY2NvdW50Ijp7Im5hbWUiOiJtZXRyaWNzLXNlcnZlciIsInVpZCI6ImU4YjQxZmY4LTZiZDAtNDdiYS1iZTc0LTkxN2RlNzdjM2RhZSJ9fSwibmJmIjoxNzM2NjU3NTc2LCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6a3ViZS1zeXN0ZW06bWV0cmljcy1zZXJ2ZXIifQ.Pm8lzu1IZHH2n4uD9NOiqG9POw1WNTcFJiGcSsydEATS7MSZDsc7UPs70RZQr3rFI2tva4u1mfy_buiOq8bpUDxx7JCgtiCUI3hTV-gybFlPACrFVNzkv8ulvXisBU24wOgHifFITeiy3KZxZZi9E3ILDMA7j6g4DdxBB3iMTk2k3QB2b9xvYCGoYS4OdMRmO8FABOcpH5vccrt_ucKGl5D9-RdIZiowerQRyYxcO0lOdPbc3mMh6j3QOzlZeg9E1q_Ez1-0UMTYR3qTcIjNaUtW8E6Ay8NG_CbYBLT6Jk4nVmSJug_XbYeDw5Dsy_0kpekHmrBkCjt0xFXaFhJR0Q

On startup (the scheduler is run under dlv, see the static pod manifest below), it logs a client-config warning and then panics with a nil pointer dereference:

W0112 06:23:27.624803 12 client_config.go:664] error creating inClusterConfig, falling back to default config: open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x34ce94c]

goroutine 1 [running]:
github.com/paypal/load-watcher/pkg/watcher.(*Watcher).GetLatestWatcherMetrics(0x0, {0x3ccae5e, 0x3})
        /root/go/pkg/mod/github.com/paypal/[email protected]/pkg/watcher/watcher.go:191 +0x4c
github.com/paypal/load-watcher/pkg/watcher/api.libraryClient.GetLatestWatcherMetrics({{0x0, 0x0}, 0x0})
        /root/go/pkg/mod/github.com/paypal/[email protected]/pkg/watcher/api/client.go:78 +0x5b
sigs.k8s.io/scheduler-plugins/pkg/trimaran.(*Collector).updateMetrics(0xc0001368c0, {{0x3f9bbe8, 0xc000906dc0}, 0x0})
        /root/scheduler-plugins-master/pkg/trimaran/collector.go:134 +0x6e
sigs.k8s.io/scheduler-plugins/pkg/trimaran.NewCollector({{0x3f9bbe8, 0xc000906dc0}, 0x0}, 0xc0009a71a0)
        /root/scheduler-plugins-master/pkg/trimaran/collector.go:77 +0x65f
sigs.k8s.io/scheduler-plugins/pkg/trimaran/loadvariationriskbalancing.New({0x3f95e38, 0xc000919d10}, {0x3f6ffe8, 0xc0009a7180}, {0x3fbbd80, 0xc00088a488})
        /root/scheduler-plugins-master/pkg/trimaran/loadvariationriskbalancing/loadvariationriskbalancing.go:63 +0x18d
k8s.io/kubernetes/pkg/scheduler/framework/runtime.NewFramework({0x3f95e38, 0xc000919d10}, 0xc00096fc80, 0xc000c8e7c8, {0xc00060f1e0, 0xc, 0x16})
        /root/go/pkg/mod/k8s.io/[email protected]/pkg/scheduler/framework/runtime/framework.go:338 +0x10c9
k8s.io/kubernetes/pkg/scheduler/profile.newProfile({0x3f95e38, 0xc000919d10}, {{0xc0001a98c0, 0x12}, 0x0, 0xc0001b1188, {0xc00040c700, 0x8, 0x8}}, 0xc00096fc80, ...)
        /root/go/pkg/mod/k8s.io/[email protected]/pkg/scheduler/profile/profile.go:42 +0x174
k8s.io/kubernetes/pkg/scheduler/profile.NewMap({0x3f95e38, 0xc000919d10}, {0xc0009064c0, 0x1, 0x1}, 0xc00096fc80, 0xc00096a810, {0xc000c8ed70, 0xb, 0xb})
        /root/go/pkg/mod/k8s.io/[email protected]/pkg/scheduler/profile/profile.go:55 +0x22e
k8s.io/kubernetes/pkg/scheduler.New({0x3f95e38, 0xc000919d10}, {0x3fc7348, 0xc000a00540}, {0x3fbc3b8, 0xc000136460}, {0x3f96940, 0xc0009a4d80}, 0xc00096a810, {0xc000c8f348, ...})
        /root/go/pkg/mod/k8s.io/[email protected]/pkg/scheduler/scheduler.go:304 +0xd37
k8s.io/kubernetes/cmd/kube-scheduler/app.Setup({0x3f95e38, 0xc000919d10}, 0xc0005d3170, {0xc000351570, 0xe, 0xe})
        /root/go/pkg/mod/k8s.io/[email protected]/cmd/kube-scheduler/app/server.go:413 +0xb49
k8s.io/kubernetes/cmd/kube-scheduler/app.runCommand(0xc0001f2308, 0xc0005d3170, {0xc000351570, 0xe, 0xe})
        /root/go/pkg/mod/k8s.io/[email protected]/cmd/kube-scheduler/app/server.go:153 +0x385
k8s.io/kubernetes/cmd/kube-scheduler/app.NewSchedulerCommand.func2(0xc0001f2308, {0xc0005ff800, 0x0, 0x6})
        /root/go/pkg/mod/k8s.io/[email protected]/cmd/kube-scheduler/app/server.go:103 +0x77
github.com/spf13/cobra.(*Command).execute(0xc0001f2308, {0xc000050080, 0x6, 0x6})
        /root/go/pkg/mod/github.com/spf13/[email protected]/command.go:985 +0xfca
github.com/spf13/cobra.(*Command).ExecuteC(0xc0001f2308)
        /root/go/pkg/mod/github.com/spf13/[email protected]/command.go:1117 +0x9d0
github.com/spf13/cobra.(*Command).Execute(0xc0001f2308)
        /root/go/pkg/mod/github.com/spf13/[email protected]/command.go:1041 +0x32
k8s.io/component-base/cli.run(0xc0001f2308)
        /root/go/pkg/mod/k8s.io/[email protected]/cli/run.go:143 +0x36f
k8s.io/component-base/cli.Run(0xc0001f2308)
        /root/go/pkg/mod/k8s.io/[email protected]/cli/run.go:44 +0x45
main.main()
        /root/scheduler-plugins-master/cmd/scheduler/main.go:69 +0x4c5
2025-01-12T06:24:25Z debug layer=debugger detaching
2025-01-12T06:24:25Z debug layer=debugger detaching
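
The receiver in the top frame, (*Watcher).GetLatestWatcherMetrics(0x0, ...), is a nil pointer: the plugin ends up calling into a load-watcher client that was never initialized. Below is a minimal, self-contained Go sketch of that failure pattern and of a defensive nil check; the Watcher and newClient names are illustrative stand-ins, not the actual load-watcher API or the fix applied in #910.

package main

import (
	"errors"
	"fmt"
)

// Watcher mirrors, for illustration only, the shape of a client whose
// method dereferences a field of its receiver.
type Watcher struct {
	window string
}

// GetLatestMetrics dereferences w, so calling it on a nil *Watcher
// produces exactly the SIGSEGV seen in the stack trace above.
func (w *Watcher) GetLatestMetrics() (string, error) {
	return w.window, nil
}

// newClient stands in for a constructor path that can hand back a nil
// client when the metric provider fails to initialize.
func newClient(ok bool) *Watcher {
	if !ok {
		return nil
	}
	return &Watcher{window: "15m"}
}

func main() {
	w := newClient(false)
	// A nil check here surfaces a readable error instead of crashing
	// the whole scheduler process.
	if w == nil {
		fmt.Println(errors.New("load-watcher client not initialized; check metricProvider config"))
		return
	}
	m, _ := w.GetLatestMetrics()
	fmt.Println(m)
}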
root@k8s-master01:/etc/kubernetes/manifests# cat kube-scheduler.yaml 
apiVersion: v1
kind: Pod
metadata:
  labels:
    component: kube-scheduler
    tier: control-plane
  name: kube-scheduler
  namespace: kube-system
spec:
  containers:
  - command:
    - dlv 
    - exec
    - /bin/scheduler
    - --log
    - --listen=0.0.0.0:40000 
    - --headless
    - --api-version=2
    #  - --continue
    - --accept-multiclient
    - --
    - --config=/etc/kubernetes/sched-cc.yaml
    - --kubeconfig=/etc/kubernetes/scheduler.conf
    - --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
    - --authorization-kubeconfig=/etc/kubernetes/scheduler.conf
    - --bind-address=127.0.0.1
    - --v=2
    image: registry.cn-hangzhou.aliyuncs.com/hxpdocker/scheduler-plugins:v1.31 
    imagePullPolicy: Always
#    livenessProbe:
#      failureThreshold: 8
#      httpGet:
#        host: 127.0.0.1
#        path: /healthz
#        port: 10259
#        scheme: HTTPS
#      initialDelaySeconds: 10
#      periodSeconds: 10
#      timeoutSeconds: 15
    name: kube-scheduler
    resources:
      requests:
        cpu: 100m
#    startupProbe:
#      failureThreshold: 24
#      httpGet:
#        host: 127.0.0.1
#        path: /healthz
#        port: 10259
#        scheme: HTTPS
#      initialDelaySeconds: 10
#      periodSeconds: 10
#      timeoutSeconds: 15
    volumeMounts:
    - mountPath: /etc/kubernetes/sched-cc.yaml
      name: sched-cc
      readOnly: true
    - mountPath: /etc/kubernetes/scheduler.conf
      name: kubeconfig
      readOnly: true
  hostNetwork: true
  priority: 2000001000
  priorityClassName: system-node-critical
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  volumes:
  - hostPath:
      path: /etc/kubernetes/sched-cc.yaml
      type: FileOrCreate
    name: sched-cc
  - hostPath:
      path: /etc/kubernetes/scheduler.conf
      type: FileOrCreate
    name: kubeconfig
root@k8s-master01:/etc/kubernetes# kubectl describe pod nginx-74df8bd4ff-xdzr8 
Name:             nginx-74df8bd4ff-xdzr8
Namespace:        default
Priority:         0
Service Account:  default
Node:             <none>
Labels:           app=nginx
                  pod-template-hash=74df8bd4ff
Annotations:      <none>
Status:           Pending
IP:               
IPs:              <none>
Controlled By:    ReplicaSet/nginx-74df8bd4ff
Containers:
  nginx:
    Image:      registry.cn-hangzhou.aliyuncs.com/hxpdocker/nginx:latest
    Port:       <none>
    Host Port:  <none>
    Limits:
      cpu:  300m
    Requests:
      cpu:        300m
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-fhmcf (ro)
Volumes:
  kube-api-access-fhmcf:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:                      <none>
root@k8s-master01:/etc/kubernetes# kubectl get pod -owide
NAME                     READY   STATUS    RESTARTS   AGE     IP       NODE     NOMINATED NODE   READINESS GATES
nginx-74df8bd4ff-xdzr8   0/1     Pending   0          5m52s   <none>   <none>   <none>           <none>
cat << EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      name: nginx
      labels:
        app: nginx
    spec:
      schedulerName: trimaran-scheduler
      containers:
        - name: nginx
          image: registry.cn-hangzhou.aliyuncs.com/hxpdocker/nginx:latest
          resources:
            limits:
              cpu: 0.3
            requests:
              cpu: 0.3
EOF

What did you expect to happen?

The pod can be scheduled.

How can we reproduce it (as minimally and precisely as possible)?

No response

Anything else we need to know?

No response

Kubernetes version

1.31

Scheduler Plugins version

master
13567436138 added the kind/bug label on Jan 12, 2025
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on Apr 12, 2025
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on May 12, 2025
@togettoyou

/assign

@togettoyou

Hi @13567436138

The panic issue has been fixed in #910

As for the scheduling issue you mentioned, I’ve verified that everything works as expected. You can use the following configuration to validate it yourself:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: trimaran-scheduler
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: trimaran-scheduler
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
subjects:
  - kind: ServiceAccount
    name: trimaran-scheduler
    namespace: kube-system
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: trimaran-scheduler
  namespace: kube-system
data:
  config.yaml: |
    apiVersion: kubescheduler.config.k8s.io/v1
    kind: KubeSchedulerConfiguration
    leaderElection:
      leaderElect: false
    profiles:
      - schedulerName: trimaran-scheduler
        plugins:
          score:
            disabled:
              - name: NodeResourcesBalancedAllocation
              - name: NodeResourcesFit
            enabled:
              - name: LoadVariationRiskBalancing
        pluginConfig:
          - name: LoadVariationRiskBalancing
            args:
              metricProvider:
                type: KubernetesMetricsServer
              safeVarianceMargin: 1
              safeVarianceSensitivity: 2
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: trimaran-scheduler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: trimaran-scheduler
  template:
    metadata:
      labels:
        app: trimaran-scheduler
    spec:
      containers:
        - name: kube-scheduler
          image: registry.cn-hangzhou.aliyuncs.com/hxpdocker/scheduler-plugins:v1.31
          imagePullPolicy: IfNotPresent
          command:
            - /bin/scheduler
            - --config=/etc/kubernetes/config.yaml
          volumeMounts:
            - name: config-volume
              mountPath: /etc/kubernetes
      serviceAccountName: trimaran-scheduler
      volumes:
        - name: config-volume
          configMap:
            name: trimaran-scheduler

Everything is working as expected:

$ kubectl -n kube-system logs -f trimaran-scheduler-7cd88d4d5f-mz4cd
I0521 06:50:42.691498       1 serving.go:386] Generated self-signed cert in-memory
W0521 06:50:42.692420       1 client_config.go:659] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
W0521 06:50:43.295056       1 client_config.go:659] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
time="2025-05-21T06:50:43Z" level=info msg="Started watching metrics"
I0521 06:50:43.317921       1 server.go:167] "Starting Kubernetes Scheduler" version="v0.0.0-master+$Format:%H$"
I0521 06:50:43.317970       1 server.go:169] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
I0521 06:50:43.323685       1 requestheader_controller.go:172] Starting RequestHeaderAuthRequestController
I0521 06:50:43.323739       1 shared_informer.go:313] Waiting for caches to sync for RequestHeaderAuthRequestController
I0521 06:50:43.323735       1 configmap_cafile_content.go:205] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
I0521 06:50:43.323918       1 shared_informer.go:313] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0521 06:50:43.323747       1 configmap_cafile_content.go:205] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
I0521 06:50:43.323957       1 shared_informer.go:313] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0521 06:50:43.324172       1 secure_serving.go:213] Serving securely on [::]:10259
I0521 06:50:43.324268       1 tlsconfig.go:243] "Starting DynamicServingCertificateController"
I0521 06:50:43.424109       1 shared_informer.go:320] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0521 06:50:43.424121       1 shared_informer.go:320] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0521 06:50:43.424141       1 shared_informer.go:320] Caches are synced for RequestHeaderAuthRequestController

The scheduling events of the pod are as follows:

$ kubectl describe pod nginx-6c8c6c4544-trhjz
Name:             nginx-6c8c6c4544-trhjz
Namespace:        default
......
Events:
  Type    Reason     Age    From                Message
  ----    ------     ----   ----                -------
  Normal  Scheduled  3m50s  trimaran-scheduler  Successfully assigned default/nginx-6c8c6c4544-trhjz to docker-desktop
  Normal  Pulling    3m49s  kubelet             Pulling image "nginx"
  Normal  Pulled     3m47s  kubelet             Successfully pulled image "nginx" in 2.565766563s (2.565801711s including waiting)
  Normal  Created    3m47s  kubelet             Created container nginx
  Normal  Started    3m47s  kubelet             Started container nginx
