Skip to content

Fix podman system reset deletes the podman.sock #25504

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 11, 2025

Conversation

Honny1
Copy link
Member

@Honny1 Honny1 commented Mar 7, 2025

This PR prevents the removal of the podman.sock file using the podman system reset command.

The podman system reset removes the RunDirectory directory as part of the podman machine reset, where podman.sock is usually stored.

Fixes: https://issues.redhat.com/browse/RHEL-71320

Does this PR introduce a user-facing change?

The `podman system reset` command does not remove the `podman.sock` file.

@openshift-ci openshift-ci bot added release-note do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Mar 7, 2025
@Honny1 Honny1 force-pushed the sys-reset-podman-socket branch from cc60309 to 901e4d4 Compare March 7, 2025 13:14
Copy link
Member

@Luap99 Luap99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mhh, I am not sure this is a sane expectation.

There is a fundamental issue here. There is one podman.service but there can be many separate podman instances under the same user (i.e. --root/-runroot). When you rest one it should not bring down the global socket that is totally unrelated IMO.

At least in e2e tests this is a total no go as it "breaks" user systems by doing this. The e2e tests should not touch the global podman instance.

What is exactly deleting the socket here? I don't see it in the reset code.
cc @mheon

@mheon
Copy link
Member

mheon commented Mar 7, 2025

@Luap99 When we discussed this, my understanding was that we already took down the podman system service associated with the c/storage that was about to be removed, which also removed the podman.sock socket as the service was destroyed, but that explanation does not make the most sense to me now that I think about it. It is worth investigating the exact mechanism that is deleting the socket given that.

@Honny1
Copy link
Member Author

Honny1 commented Mar 7, 2025

@Luap99 This line removes the contents of /run/user/1000/podman(for root /run/podman), where podman.sock is usually stored.

So would a better solution be to exclude the podman.sock from removal?

@Luap99
Copy link
Member

Luap99 commented Mar 7, 2025

@Luap99 This line removes the contents of /run/user/1000/podman(for root /run/podman), where podman.sock is usually stored.

Ah machine reset... I was looking at the libpod reset! It should certainly only reset files we care about.

But the raises some rather bad side effects as none of the machine code is isolated paths wise which means running e2e locally wipe my machines which they really should not.

I mean not deleting the socket is an option but I am not sure if that improves things enough, if the podman service is running while you do system reset I doubt the service can ever be in a sane state and should be restarted. I don't know if restarting the socket automatically stop and restarts the service as well.

Logically if we really want to deal with these issues we likely should start by stopping the podman.service when running system reset. And then check if we still have to restart the socket or not then.

But even with all that I am not convinced that this is right due the possibility of --root/--runroot, etc... Sure that is not a common use case but we use that heavily for testing. A system reset should then not all of the sudden touch other things.
I guess if we really want that behavior then it should only do this if --root/--runroot were not given?

@mheon
Copy link
Member

mheon commented Mar 7, 2025

If we can identify the owner of podman.sock, and if we know that it is not managed by systemd, I think it would be reasonable to skip the restart-the-service step. If it was started by systemd, we don't care if --root and the like were used; they'll be in the systemd unit, so the same config will be used. If it was not started by systemd, restarting is not reasonable, regardless of --root/--runroot - we want the service to be managed by init, not us. @Luap99 Does this sound reasonable?

@mheon
Copy link
Member

mheon commented Mar 7, 2025

Though, this raises the question: how can we tell if podman.sock is associated with the c/storage root we're purging with system reset... I guess we could add the info to podman info?

@Luap99
Copy link
Member

Luap99 commented Mar 7, 2025

Though, this raises the question: how can we tell if podman.sock is associated with the c/storage root we're purging with system reset... I guess we could add the info to podman info?

Connect to the socket via the remote bindings, then check if reset process graphroot equals our service graphroot.
Then you would still need to know how if the remote is running under systemd.


Also not here we cannot have any hard requirements on systemd. Podman must work without systemd and so far looking at this it always tries to use systemd which of course will fail on non systemd distros or in a container, etc...

I really don't want to deal with all these complexities here, I personally rather just say the user is responsible for restart the systemd unit if they use it.

@Honny1
Copy link
Member Author

Honny1 commented Mar 7, 2025

It's getting a little complicated. I suggest avoiding removing podman.sock and notifying the user to restart the service. Is this okay? @mheon @Luap99

@mheon
Copy link
Member

mheon commented Mar 7, 2025

Sounds OK to me. I'll poke @baude as it is machine reset that removes it.

@Honny1 Honny1 force-pushed the sys-reset-podman-socket branch 3 times, most recently from b4776e3 to f887f90 Compare March 10, 2025 12:18
@Honny1 Honny1 marked this pull request as ready for review March 10, 2025 13:43
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 10, 2025
@Honny1 Honny1 requested a review from Luap99 March 10, 2025 14:07
Comment on lines 114 to 132
It("system reset - restart socket", func() {
SkipIfRemote("system reset not supported on podman --remote")
SkipIfNotActive("podman.socket", "podman.socket is not active")

session := podmanTest.PodmanExitCleanly("info", "--format", "{{.Host.RemoteSocket.Path}}")
socketPath := session.OutputToString()

curlBeforeReset := StartSystemExec("curl", []string{"-s", "--max-time", "3", "--unix-socket", socketPath, "http://localhost/libpod/_ping"})
curlBeforeReset.WaitWithDefaultTimeout()
Expect(curlBeforeReset).Should(ExitCleanly())
Expect(curlBeforeReset.OutputToString()).Should(Equal("OK"))

podmanTest.PodmanExitCleanly("system", "reset", "--force")

curlAfterReset := StartSystemExec("curl", []string{"-s", "--max-time", "3", "--unix-socket", socketPath, "http://localhost/libpod/_ping"})
curlAfterReset.WaitWithDefaultTimeout()
Expect(curlAfterReset).Should(ExitCleanly())
Expect(curlAfterReset.OutputToString()).Should(Equal("OK"))
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test does not seem useful, it assumes the podman.service works while e2e tests generally should be fully isolated.
And the socket is disabled in all our CI envs AFAIK so the test is effectively skipped in CI thus there is no coverage either.

I don't think curl is needed either. If we really want to test then it seems enough to test if socket path exist before rest if it does it must exist after reset, so a simple
Expect(path).To(BeAnExistingFile()) should be enough.

But maybe no test works best. Or if we want a test maybe in test/system as better as we do require full ownership over the storage there.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I didn't know that. I will remove test.

@Honny1 Honny1 force-pushed the sys-reset-podman-socket branch from f887f90 to d0437a3 Compare March 10, 2025 15:48
@Honny1 Honny1 added the No New Tests Allow PR to proceed without adding regression tests label Mar 10, 2025
@Honny1 Honny1 requested a review from Luap99 March 10, 2025 15:54
Copy link
Member

@Luap99 Luap99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

openshift-ci bot commented Mar 10, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Honny1, Luap99

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 10, 2025
}
for _, file := range files {
if file.Name() != excludeFile {
if err := os.RemoveAll(filepath.Join(path, file.Name())); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't you use GuardedRemoveAll here for safety?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not needed because iteration over the contents of the directory is performed. However, the same check could be used to make sure it doesn't remove the contents of the / directory.

…mmand

The `podman system reset` removes the `RunDirectory` directory as part of the machine reset, where `podman.sock` is usually stored.

Fixes: https://issues.redhat.com/browse/RHEL-71320

Signed-off-by: Jan Rodák <[email protected]>
@Honny1 Honny1 force-pushed the sys-reset-podman-socket branch from d0437a3 to 41924f8 Compare March 10, 2025 17:44
@Honny1
Copy link
Member Author

Honny1 commented Mar 10, 2025

/packit rebuild-failed

@Honny1 Honny1 requested a review from mheon March 11, 2025 09:55
@mheon
Copy link
Member

mheon commented Mar 11, 2025

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 11, 2025
@openshift-merge-bot openshift-merge-bot bot merged commit c6ab079 into containers:main Mar 11, 2025
89 checks passed
@stale-locking-app stale-locking-app bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Jun 10, 2025
@stale-locking-app stale-locking-app bot locked as resolved and limited conversation to collaborators Jun 10, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. No New Tests Allow PR to proceed without adding regression tests release-note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants