Description
Hey everyone!
In my org, we're using DigitalOcean Managed PostgreSQL with 1 primary and 2 standby nodes. DigitalOcean doesn't expose a hostname for each of the 3 nodes. It provides only a single hostname that always points to the current primary (something like XXX.a.db.ondigitalocean.com). Occasionally, by design, a failover happens: the primary is demoted to a standby and one of the standbys is promoted to primary.
The problem is that after a failover, connections appear to stay open to the node that was primary and has just become a standby. From then on, those connections keep failing with `write tcp YYY->ZZZ: write: connection timed out` and never recover.
The mitigation I came up with is `db.SetConnMaxLifetime(time.Minute)`, but that's not ideal. Is there a better way around this problem at the moment?
PS: I saw a similar issue, #683, but I don't think it applies to our problem: we don't have multiple hosts, just a single host string, and that host always points to the current primary.