Spawning hundreds of thousands of files in emptyDir makes kubelet unable to restart
Issue:
The main issue is that after a very large number of files are created in an emptyDir, the kubelet on that node is unable to restart. The service fails because of an "error" in restorecon, which runs as a PreStart dependency of the kubelet.service unit.
Initially, I used git clone inside a container, which writes files to an emptyDir. However, I discovered that the problem wasn't related to git clone itself, but rather to the large number of files appearing in the emptyDir. After all the files are created in the container, I SSH into the node where the emptyDir is mounted and attempt to restart the kubelet. The restart fails every time, and the service log contains only restorecon SELinux messages for the files created in the container.
I've determined that the kubelet's ability to restart depends on how fast the node's hardware is. Slower nodes fail at around 400,000 files. Faster nodes handle that, but even they fail once the file count reaches 900,000.
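If the slow step is the relabel pass itself, it can be timed by hand. A minimal sketch, assuming the PreStart step is effectively a recursive restorecon over /var/lib/kubelet/pods (the -n flag makes it a dry run that changes no labels):

# time a dry-run relabel of the pod volumes; duration grows with the file count
time restorecon -Rn /var/lib/kubelet/pods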
Version:
UPI 4.18.0-okd-scos.8
registry.ci.openshift.org/origin/release-scos@sha256:de900611bc63fa24420d5e94f3647e4d650de89fe54302444e1ab26a4b1c81c6
Issue Behavior:
The issue occurs every time and is fully reproducible.
How to reproduce:
Example of a container that spawns many files:
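A minimal sketch of such a pod (the name, image, and loop bounds are illustrative assumptions; the MCS level mirrors the one in the log below):

apiVersion: v1
kind: Pod
metadata:
  name: emptydir-many-files        # hypothetical name
spec:
  securityContext:
    seLinuxOptions:
      level: "s0:c5,c35"           # example MCS level
  containers:
  - name: writer
    image: registry.fedoraproject.org/fedora-minimal   # any image with a POSIX shell
    command: ["/bin/sh", "-c"]
    args:
    - |
      # create enough files to trip the failure (~400,000-900,000 depending on hardware)
      mkdir -p /data/files
      i=0
      while [ "$i" -lt 900000 ]; do
        : > "/data/files/file_${i}.txt"
        i=$((i + 1))
      done
      sleep infinity
    volumeMounts:
    - name: repo-storage
      mountPath: /data
  volumes:
  - name: repo-storage
    emptyDir: {}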
Log into the node and execute the following command:
systemctl restart kubelet.service
The result should be that the kubelet fails to start due to issues with the directory containing the data.
Example kubelet service log (in practice, this single message repeats for each file):
May 07 08:23:58 worker-4.dev.example.com restorecon[53570]: /var/lib/kubelet/pods/f61afe9e-7fc3-413c-8f61-xd41affe9f73/volumes/kubernetes.io~empty-dir/repo-storage/files/file_264137.txt not reset as customized by admin to system_u:object_r:container_file_t:s0:c5,c35
Troubleshooting performed:
I verified multiple times that the SELinux context on the files is correct and consistent. I compared the emptyDir that causes the issue with one from a pod that does not perform the git clone and causes no issue. Both directories, and all files within them, had exactly the same security contexts, verified using:
ls -lZa
lsattr
getfattr -d -m -
All files in this directory have the same security context (the one set in the pod under spec.securityContext.seLinuxOptions). I verified this as follows:
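One way to confirm this (a sketch; <pod-uid> is a placeholder, and the awk field assumes coreutils ls -lZ output, where the context is the fifth column; a single line of output means a single context):

ls -lZ /var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~empty-dir/repo-storage/files | awk 'NF >= 5 {print $5}' | sort -u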
I tried using spec.volumes.emptyDir.medium: Memory in the deployment definition, but the issue still occurred.
I set the most restrictive possible security context in the pod definition, with the SCC set to restricted-v2, at the pod level and on both the initContainer and the container; see the sketch below.
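A minimal sketch of such contexts under restricted-v2 (field values are illustrative assumptions, not the exact manifest):

# pod-level securityContext
securityContext:
  runAsNonRoot: true
  seccompProfile:
    type: RuntimeDefault
  seLinuxOptions:
    level: "s0:c5,c35"        # example MCS level

# initContainer and container securityContext
securityContext:
  allowPrivilegeEscalation: false
  capabilities:
    drop:
    - ALL
  runAsNonRoot: true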
Expected behavior:
The kubelet should be able to restart regardless of how many files appear in an emptyDir. Files with valid SELinux labels should not interfere with the restart process, even when their count is extremely high.