-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Include error messages for stale primary #1714
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR propagates specific error messages when marking primaries as stale and extends server invalidation methods to accept a Throwable
cause.
- Added
Throwable cause
parameters toresetToConnecting
/invalidate
methods in Server interfaces and implementations - Updated
AbstractMultiServerCluster
to pass detailed exception messages when detecting stale primaries - Enhanced unit tests to assert exception message propagation in
ServerDiscoveryAndMonitoringTest
and adaptedTestServer
behavior
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
Show a summary per file
File | Description |
---|---|
driver-core/src/test/unit/com/mongodb/internal/connection/TestServer.java | Updated invalidate /resetToConnecting to include an exception in the ServerDescription |
driver-core/src/test/unit/com/mongodb/internal/connection/ServerDiscoveryAndMonitoringTest.java | Added assertions for expected exception message when "error" key is present |
driver-core/src/test/unit/com/mongodb/internal/connection/DefaultServerSpecification.groovy | Invoke invalidate(Throwable) in tests instead of parameterless call |
driver-core/src/main/com/mongodb/internal/connection/LoadBalancedServer.java | Updated no-op resetToConnecting /invalidate signatures to accept a Throwable |
driver-core/src/main/com/mongodb/internal/connection/DefaultServer.java | Propagate cause into SDAM update and use SdamIssue.specific for invalidate |
driver-core/src/main/com/mongodb/internal/connection/ClusterableServer.java | Extended interface methods to accept a Throwable cause |
driver-core/src/main/com/mongodb/internal/connection/AbstractMultiServerCluster.java | Invalidate stale primaries with specific exception messages via new cause parameter |
Comments suppressed due to low confidence (2)
driver-core/src/test/unit/com/mongodb/internal/connection/ServerDiscoveryAndMonitoringTest.java:120
- [nitpick] Consider adding an
else
branch to explicitly assert that no exception is present when the "error" key is absent, preventing unexpected exceptions from slipping through tests.
if(expectedServerDescriptionDocument.containsKey("error")){
driver-core/src/test/unit/com/mongodb/internal/connection/DefaultServerSpecification.groovy:139
- Using a concrete TestServerListener instead of a Mock prevents interaction verification. Use a Mock or Spy to allow verifying serverDescriptionChanged invocation.
def serverListener = new TestServerListener()
driver-core/src/main/com/mongodb/internal/connection/AbstractMultiServerCluster.java
Outdated
Show resolved
Hide resolved
driver-core/src/main/com/mongodb/internal/connection/AbstractMultiServerCluster.java
Outdated
Show resolved
Hide resolved
driver-core/src/main/com/mongodb/internal/connection/AbstractMultiServerCluster.java
Outdated
Show resolved
Hide resolved
* @see #specific(Throwable, Context) | ||
*/ | ||
static SdamIssue unspecified(final Context context) { | ||
return new SdamIssue(null, assertNotNull(context)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously, an unspecified()
was used in a case where the primary was marked as stale due to the discovery of a newer primary. This behavior was handled in SdamDescriptionManager
by updating the server’s description to UNKNOWN
and initiating a reconnect:
Lines 131 to 134 in 4f8217a
} else if (sdamIssue.relatedToWriteConcern() || !sdamIssue.specific()) { | |
updateDescription(sdamIssue.serverDescription()); | |
serverMonitor.connect(); | |
} |
This behavior aligns with the following section of the SDAM specification:
“A note on invalidating the old primary: when a new primary is discovered, the client finds the previous primary (there should be none or one) and replaces its description with a default ServerDescription of type Unknown. Additionally, the error field of the new ServerDescription object MUST include a descriptive error explaining that it was invalidated because the primary was determined to be stale. A multi-threaded client MUST request an immediate check for that server as soon as possible.”
The unspecified()
method previously implied a transition to UNKNOWN
without providing an error. As this is no longer consistent with the current spec - which requires an explanatory error in such transitions - the method has been removed.
Relevant spec changes:
SDAM Error Field Documentation
Discussion in Spec PR #1729
@@ -56,7 +56,7 @@ final class DefaultSdamServerDescriptionManager implements SdamServerDescription | |||
} | |||
|
|||
@Override | |||
public void update(final ServerDescription candidateDescription) { | |||
public void monitorUpdate(final ServerDescription candidateDescription) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original update()
method performed two checks:
- Checks if the server is either a data-bearing connection or a direct connection. This aligns with the Connection Pool Management section of the SDAM spec.
- Checks if the ServerDescription contains an exception, and invalidates the connection pool accordingly. This corresponds to the Error Handling section under Server Monitoring.
Although the spec specifies handling only network and command errors, the implementation currently checks for anyThrowable
.
These two behaviors conform to the spec requirements for server monitor behavior:
- The first check is applied
after a successful server check
. - The second check applies
when a server check fails
.
The update()
method was invoked from two places:
ServerMonitor
, which is expected and directly supports the server monitor responsibilities outlined in the spec.DefaultServer.resetToConnecting()
, which is used in one specific case:when an RSPrimary is discovered with a stale electionId/setVersion
. In that case,update()
was called to transition the server to anUNKNOWN
state. However, neither check (1 nor 2) was triggered, so the method merely updated the description.
This approach worked until the introduction of a new change where the RSPrimary with a stale electionId/setVersion
case began passing an explicit exception, which unintentionally triggered a side effect of connection pool invalidation.
To address this and maintain separation of concerns, the logic has now been split:
monitorUpdate()
: now encapsulates the server monitor-specific logic (checks 1 and 2).
update()
: is simplified to only update the description to UNKNOWN
(currently used for one specific case).
JAVA-5697