Skip to content

Stop leaving sockets in CLOSE_WAIT on failed TLS connections #3634

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 25, 2025

Conversation

jes
Copy link
Contributor

@jes jes commented Apr 24, 2025

Summary

Prevent outbound TLS sockets from being left in CLOSE_WAIT after failed handshakes.

Details

When an outbound TLS connection fails during the handshake (e.g. due to missing certificates), a file descriptor sticks around without being closed, and ends up stuck in CLOSE_WAIT.

Although tcp_conn_release() is called, it does not actually close the file descriptor. In other paths this is handled explicitly with close(fd) if the connection's proc_id differs from the current process. This fix follows the same pattern.

Testing confirmed that TLS sockets now close promptly on failure and that working TLS connections continue to function correctly.

Solution

Add a conditional close(fd) in proto_tls_send() after calling tcp_conn_release(), using the pattern already present elsewhere: only close if c->proc_id != process_no.

Compatibility

I don't think this change breaks any other scenarios.

We've been running this in prod for over a year, I unfortunately missed creating a PR for it until now.

@bogdan-iancu bogdan-iancu removed the request for review from liviuchircu April 24, 2025 11:17
@@ -622,6 +622,9 @@ static int proto_tls_send(const struct socket_info* send_sock,
return rlen;
con_release:
sh_log(c->hist, TCP_SEND2MAIN, "send 1, (%d)", c->refcnt);
/* close the fd if this process is not meant to own it */
if (c->proc_id != process_no)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jes , are you sure about this fix? the con_release label is jumped (from above) only if a connection was not already found, so the connection was locally (to the process) started. If the conn (and fd) are local to the process, I assume c->proc_id == process_no, right? so the code you added will be all the time false, not executed. Or maybe I'm missing something here :D

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I worked on this a long time ago but in my notes it says that c->proc_id was 0 in the offending cases. So maybe the problem is just that c->proc_id wasn't getting set correctly?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jes , I had to time to dig a bit more into this. And the only valid case (according to your description and also to the code), is this one here https://github.com/OpenSIPS/opensips/blob/master/modules/proto_tls/proto_tls.c#L556, when the TCP conn is created (and we have a fd), but the TSL init fails. Here we should do the close on the fd, before jumping to con_release. IMHO this is the proper fix, meaning the fix in the right spot.

Copy link

Any updates here? No progress has been made in the last 30 days, marking as stale.

@github-actions github-actions bot added the stale label May 25, 2025
@jes
Copy link
Contributor Author

jes commented May 25, 2025

No updates here.

@stale stale bot removed the stale label May 25, 2025
@bogdan-iancu bogdan-iancu self-assigned this Jun 12, 2025
@bogdan-iancu bogdan-iancu merged commit fde3ff3 into OpenSIPS:master Jun 25, 2025
72 checks passed
bogdan-iancu added a commit that referenced this pull request Jun 25, 2025
Stop leaving sockets in CLOSE_WAIT on failed TLS connections

(cherry picked from commit fde3ff3)
bogdan-iancu added a commit that referenced this pull request Jun 25, 2025
Stop leaving sockets in CLOSE_WAIT on failed TLS connections

(cherry picked from commit fde3ff3)
bogdan-iancu added a commit that referenced this pull request Jun 25, 2025
Stop leaving sockets in CLOSE_WAIT on failed TLS connections

(cherry picked from commit fde3ff3)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants