
Memory Leak on EKS Clusters #1820

Open
@TomWKraken

Description

We run Cosign's policy-controller on our AWS EKS clusters to verify signatures of our in-house images. Over the course of a few days, the memory used by these pods increases significantly.

[Image: graph of pod memory usage climbing over several days]

When memory usage reaches the pod's allocated memory, we start seeing errors for failed verifications and the pod restarts continuously. The pods continuously generate the following log entries:

[INFO] Webhook ServeHTTP request=&http.Request{Method:"POST", URL:(*url.URL)(0xc0182243f0), Proto:"HTTP/1.1", ProtoMajor:1, ProtoMinor:1, Header:http.Header{"Accept":[]string{"application/json, */*"}, "Accept-Encoding":[]string{"gzip"}, "Content-Length":[]string{"21818"}, "Content-Type":[]string{"application/json"}, "User-Agent":[]string{"kube-apiserver-admission"}}, Body:(*http.body)(0xc03e010980), GetBody:(func() (io.ReadCloser, error))(nil), ContentLength:21818, TransferEncoding:[]string(nil), Close:false, Host:"webhook.cosign-system.svc:443", Form:url.Values(nil), PostForm:url.Values(nil), MultipartForm:(*multipart.Form)(nil), Trailer:http.Header(nil), RemoteAddr:"10.2.0.6:39576", RequestURI:"/mutations?timeout=25s", TLS:(*tls.ConnectionState)(0xc0070c6d80), Cancel:(<-chan struct {})(nil), Response:(*http.Response)(nil), Pattern:"/mutations", ctx:(*context.cancelCtx)(0xc001e362d0), pat:(*http.pattern)(0xc000765200), matches:[]string(nil), otherValues:map[string]string(nil)}

[INFO] remote admission controller audit annotations=map[string]string(nil)

[ERROR] Failed the resource specific validation

[WARN] Failed to validate at least one policy for <IMAGE-URI> wanted 1 policies, only validated 0

[ERROR] error validating signatures: Get "https://<AccountID>.dkr.ecr.eu-west-1.amazonaws.com/v2/": context canceled

Manually killing the pod is the only way to fix this: the newly created pod starts with much lower memory consumption, but over a couple of days the process repeats itself.
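For anyone hitting the same thing, the manual restart can be scripted as a one-liner. The namespace and label selector below are assumptions based on a default chart install, so adjust them to match your deployment:

```shell
# Delete the webhook pods so the Deployment recreates them with fresh memory.
# Namespace and label selector are placeholders for a default policy-controller install.
kubectl -n cosign-system delete pod -l app.kubernetes.io/name=policy-controller
```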

We thought this might be related to the webhook timeout being too short, but we recently increased it to 15 seconds and it hasn't helped.
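For reference, the timeout lives on the webhook registration itself. A minimal sketch of the relevant fields follows; the webhook name is a placeholder (check `kubectl get mutatingwebhookconfigurations` for the real one), while the service namespace and path match the Host and RequestURI in the log above:

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: policy.sigstore.dev        # placeholder name
webhooks:
  - name: policy.sigstore.dev      # placeholder name
    timeoutSeconds: 15             # default is 10; the API allows at most 30
    clientConfig:
      service:
        name: webhook
        namespace: cosign-system   # matches Host "webhook.cosign-system.svc:443" in the log
        path: /mutations           # matches RequestURI "/mutations" in the log
```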

We think this behaviour indicates a memory leak in the Cosign application.
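To help confirm (or rule out) a leak, a heap profile diff from the running webhook would be useful. Assuming the binary exposes Go's pprof endpoints (Knative-based controllers like policy-controller can enable profiling via their observability config; the port and pod name below are assumptions), something like:

```shell
# Forward the (assumed) profiling port from the webhook pod.
kubectl -n cosign-system port-forward pod/<webhook-pod> 8008:8008 &

# Capture a heap snapshot now, and another a few hours later.
curl -s http://localhost:8008/debug/pprof/heap > heap-t0.out
curl -s http://localhost:8008/debug/pprof/heap > heap-t1.out

# Diff the two snapshots to see which allocation sites grew.
go tool pprof -base heap-t0.out heap-t1.out
```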

Version

Policy Controller: 0.12.0
Helm chart: 0.9.1
EKS Kubernetes: v1.30

Metadata

Labels

bug (Something isn't working)
