Refactor response handlers to improve error handling and streamline mid-stream error processing #128923

Jan-Kazlouski-elastic · 2025-06-04T18:04:14Z

This refactoring was prompted by @jonathan-buttner's comment. It moves provider-independent, repeated logic into BaseResponseHandler so that all handlers dealing with streaming errors can reuse it. The goal is to reduce duplication and standardize error handling across providers.

Design Trade-offs and Considerations
1. Class Hierarchy Limitations
Currently, all streaming handlers inherit from non-streaming handlers. As a result, shared methods related to streaming error handling become accessible in contexts where they are not applicable (i.e., non-streaming handlers).
This is a compromise to avoid duplicating logic, but it exposes potentially misleading APIs in non-streaming classes. A more robust solution would involve a cleaner separation in the class hierarchy, e.g., extracting a common intermediate base class or using composition.
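To make the composition option concrete, here is a minimal sketch. Every name in it (MidStreamErrorParser, MidStreamErrorSupport, NonStreamingHandlerSketch, StreamingHandlerSketch) is hypothetical and does not exist in this PR or in the codebase. It only illustrates how streaming-only logic could be kept off the non-streaming API surface.

```java
// Hypothetical provider-specific hook: turns a raw stream chunk into an error message.
interface MidStreamErrorParser {
    String parse(String rawChunk);
}

// Hypothetical helper that owns the shared mid-stream logic instead of a common base class.
final class MidStreamErrorSupport {
    private final MidStreamErrorParser parser;

    MidStreamErrorSupport(MidStreamErrorParser parser) {
        this.parser = parser;
    }

    RuntimeException buildMidStreamError(String inferenceEntityId, String rawChunk) {
        String message = parser.parse(rawChunk);
        return new RuntimeException("[" + inferenceEntityId + "] mid-stream error: " + message);
    }
}

// Non-streaming handler: no mid-stream method is visible on this type.
class NonStreamingHandlerSketch {
    String describe() {
        return "non-streaming";
    }
}

// Streaming handler: still extends the non-streaming one, but receives the
// streaming-only behaviour through a field rather than through inheritance.
class StreamingHandlerSketch extends NonStreamingHandlerSketch {
    private final MidStreamErrorSupport midStream;

    StreamingHandlerSketch(MidStreamErrorSupport midStream) {
        this.midStream = midStream;
    }

    RuntimeException onMidStreamError(String inferenceEntityId, String rawChunk) {
        return midStream.buildMidStreamError(inferenceEntityId, rawChunk);
    }
}
```

With this shape, the shared logic lives in one place, and a non-streaming handler simply has no mid-stream method to call by mistake.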

2. Enforcing Overrides with UnsupportedOperationException
For provider-specific methods that must be overridden, I've added UnsupportedOperationException as a safeguard.
This ensures that if a required override is missing and the method is called, it fails fast. However, this is a runtime check. Ideally, we'd enforce such overrides at compile time, possibly using abstract methods or interfaces. I’m open to suggestions here if we want stricter enforcement. In the current implementation, some places handle this with assert request.isStreaming().
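The difference between the two enforcement styles can be sketched as follows. The class and method names are hypothetical and simplified, not the actual handler signatures. The first class mirrors the runtime safeguard described above, and the second shows what compile-time enforcement through an abstract method might look like.

```java
// Runtime enforcement: a missing override is only detected when the method is called.
class RuntimeCheckedHandler {
    protected String extractMidStreamError(String chunk) {
        throw new UnsupportedOperationException("extractMidStreamError must be overridden by streaming handlers");
    }
}

// Compile-time enforcement: a concrete subclass that omits the override does not compile.
abstract class CompileCheckedHandler {
    protected abstract String extractMidStreamError(String chunk);

    // Shared logic that relies on the provider-specific method.
    final String describeError(String chunk) {
        return "error: " + extractMidStreamError(chunk);
    }
}

// Hypothetical provider handler supplying the required override.
class ProviderHandlerSketch extends CompileCheckedHandler {
    @Override
    protected String extractMidStreamError(String chunk) {
        return chunk.trim();
    }
}
```

The abstract variant is stricter, but it only fits if every concrete subclass is a streaming handler, which is not the case while streaming handlers inherit from non-streaming ones.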

3. Error Handling Scenarios
There are two main error-handling scenarios:

buildChatCompletionError: Handles errors returned by the provider in the regular HTTP response. This occurs when the provider answers an Elastic streaming operation with a non-streaming result.
buildMidStreamChatCompletionError: Handles errors returned mid-stream, which must be extracted from the stream content.
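A rough sketch of how the two scenarios differ is shown below. The parameter lists are simplified assumptions and do not match the real method signatures, and SketchException is a hypothetical stand-in for UnifiedChatCompletionException.

```java
// Where the error came from, used here only to make the two paths distinguishable.
enum ErrorSource {
    HTTP_RESPONSE,
    MID_STREAM
}

// Hypothetical stand-in for UnifiedChatCompletionException.
final class SketchException extends RuntimeException {
    final ErrorSource source;

    SketchException(String message, ErrorSource source) {
        super(message);
        this.source = source;
    }
}

final class ChatCompletionErrorSketch {
    // Scenario 1: the provider rejected the request up front, so the error is
    // available as a status code and body of a regular HTTP response.
    static SketchException buildChatCompletionError(String inferenceEntityId, int statusCode, String body) {
        String message = "Request from [" + inferenceEntityId + "] failed with status [" + statusCode + "]: " + body;
        return new SketchException(message, ErrorSource.HTTP_RESPONSE);
    }

    // Scenario 2: streaming had already started, so the error has to be
    // extracted from the content of a stream event.
    static SketchException buildMidStreamChatCompletionError(String inferenceEntityId, String eventData) {
        String message = "Request from [" + inferenceEntityId + "] failed mid-stream: " + eventData;
        return new SketchException(message, ErrorSource.MID_STREAM);
    }
}
```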

4. Provider-Specific Handler Overview

ElasticInferenceServiceUnifiedChatCompletionResponseHandler: Contains custom logic; overrides most methods.
GoogleVertexAiUnifiedChatCompletionResponseHandler: Standard logic; only provider-specific methods overridden.
OpenAiUnifiedChatCompletionResponseHandler: Standard logic; only provider-specific methods overridden.
HuggingFaceChatCompletionResponseHandler: Standard logic; extends OpenAI handler and overrides only specific methods.
MistralUnifiedChatCompletionResponseHandler: Overrides only non-mid-stream error method; inherits others from OpenAI handler.

5. Use of instanceof and Casting
Currently, ErrorResponse types are checked via instanceof, and casting is done manually. Reflection or a more abstracted pattern would be possible alternatives, but I’m deliberately avoiding reflection because of the risk it introduces.
A more structured approach (e.g., visitor pattern or dedicated handler interfaces per provider) would require a broader refactor, which may be worth considering in the future.
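For comparison, here is a small sketch of the instanceof style next to a polymorphic one. ErrorResponseSketch and StreamingErrorResponseSketch are hypothetical stand-ins, not the real ErrorResponse classes, and the type() hook is an assumption made for illustration only.

```java
// Hypothetical base error type with a polymorphic hook and a default value.
class ErrorResponseSketch {
    private final String message;

    ErrorResponseSketch(String message) {
        this.message = message;
    }

    String message() {
        return message;
    }

    // Subclasses override this instead of being inspected from the outside.
    String type() {
        return "unknown";
    }
}

// Hypothetical provider-specific error type carrying extra detail.
class StreamingErrorResponseSketch extends ErrorResponseSketch {
    private final String type;

    StreamingErrorResponseSketch(String message, String type) {
        super(message);
        this.type = type;
    }

    @Override
    String type() {
        return type;
    }
}

final class ErrorDescriber {
    // Current style: check the concrete type, then cast manually.
    static String describeWithInstanceof(ErrorResponseSketch error) {
        if (error instanceof StreamingErrorResponseSketch) {
            StreamingErrorResponseSketch streaming = (StreamingErrorResponseSketch) error;
            return streaming.message() + " (" + streaming.type() + ")";
        }
        return error.message() + " (unknown)";
    }

    // Polymorphic style: the error object supplies its own details.
    static String describePolymorphic(ErrorResponseSketch error) {
        return error.message() + " (" + error.type() + ")";
    }
}
```

Both methods produce the same result for both types. The polymorphic version removes the cast but requires changing the error classes themselves, which is the broader refactor mentioned above.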

6. Regarding the Mistral PR Comment
As mentioned in the Mistral PR discussion, lifting instanceof checks higher in the hierarchy would impact the flow for other providers. This refactoring keeps the current behavior intact while still avoiding duplication by moving shared logic upward in the hierarchy.

Final Thoughts
This refactoring improves maintainability by reducing code duplication and providing a flexible way to build UnifiedChatCompletionException based on provider-specific logic. However, it introduces some technical debt due to the existing class hierarchy and runtime enforcement of required overrides.

I’d appreciate any feedback on whether we should consider a more structural redesign (e.g., class hierarchy split or polymorphic error handling). I didn't want to go too far in changing the architecture because that wasn't requested in the initial comment.

…id-stream error processing

Jan-Kazlouski-elastic · 2025-06-04T18:07:23Z