-
Notifications
You must be signed in to change notification settings - Fork 606
Stop leaving sockets in CLOSE_WAIT on failed TLS connections #3634
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -622,6 +622,9 @@ static int proto_tls_send(const struct socket_info* send_sock, | |||
return rlen; | |||
con_release: | |||
sh_log(c->hist, TCP_SEND2MAIN, "send 1, (%d)", c->refcnt); | |||
/* close the fd if this process is not meant to own it */ | |||
if (c->proc_id != process_no) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jes , are you sure about this fix? the con_release
label is jumped (from above) only if a connection was not already found, so the connection was locally (to the process) started. If the conn (and fd) are local to the process, I assume c->proc_id == process_no
, right? so the code you added will be all the time false, not executed. Or maybe I'm missing something here :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I worked on this a long time ago but in my notes it says that c->proc_id
was 0 in the offending cases. So maybe the problem is just that c->proc_id
wasn't getting set correctly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jes , I had to time to dig a bit more into this. And the only valid case (according to your description and also to the code), is this one here https://github.com/OpenSIPS/opensips/blob/master/modules/proto_tls/proto_tls.c#L556, when the TCP conn is created (and we have a fd), but the TSL init fails. Here we should do the close on the fd, before jumping to con_release
. IMHO this is the proper fix, meaning the fix in the right spot.
Any updates here? No progress has been made in the last 30 days, marking as stale. |
No updates here. |
Stop leaving sockets in CLOSE_WAIT on failed TLS connections (cherry picked from commit fde3ff3)
Stop leaving sockets in CLOSE_WAIT on failed TLS connections (cherry picked from commit fde3ff3)
Stop leaving sockets in CLOSE_WAIT on failed TLS connections (cherry picked from commit fde3ff3)
Summary
Prevent outbound TLS sockets from being left in
CLOSE_WAIT
after failed handshakes.Details
When an outbound TLS connection fails during the handshake (e.g. due to missing certificates), a file descriptor sticks around without being closed, and ends up stuck in
CLOSE_WAIT
.Although
tcp_conn_release()
is called, it does not actually close the file descriptor. In other paths this is handled explicitly withclose(fd)
if the connection'sproc_id
differs from the current process. This fix follows the same pattern.Testing confirmed that TLS sockets now close promptly on failure and that working TLS connections continue to function correctly.
Solution
Add a conditional
close(fd)
inproto_tls_send()
after callingtcp_conn_release()
, using the pattern already present elsewhere: only close ifc->proc_id != process_no
.Compatibility
I don't think this change breaks any other scenarios.
We've been running this in prod for over a year, I unfortunately missed creating a PR for it until now.