Skip to content

Service capabilities / error behaviors #1163

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 14 commits into
base: main
Choose a base branch
from
Open

Service capabilities / error behaviors #1163

wants to merge 14 commits into from

Conversation

benjie
Copy link
Member

@benjie benjie commented Apr 30, 2025

This is a rewrite of

It is re-implemented on top of the recent editorial work (e.g. renaming _field error_ to _execution error_) and also makes a significant change in that it does not require onError to be included in the response, instead an introspection field is used to indicate:

  1. that onError is supported
  2. what the behavior will be if onError is not present

Replaces:

GraphQL.js implementation:


Please see this 60 second video on the motivation for this PR (the last few seconds of the video also covers "transitional non-null" which is a separate concern).

As agreed at the nullability working group, disabling error propagation is the future of error handling in GraphQL. Error propagation causes a number of issues, but chief among them are:

  1. It destroys useful data in the response.
  2. It makes it unsafe to store resulting data in normalized stores.

Clients such as Relay do not want error propagation to be a thing.

This has traditionally resulted in schema design best practices advising using nullable in positions where errors were expected, even if null was never a semantically valid value for that position. And since errors can happen everywhere, this has lead to an explosion of nullability and significant pain on the client side with developers having to do seemingly unnecessary null checks in loads of positions, or worse - unsafely bypassing the type safety.

The reason that GraphQL does error propagation is to keep it's "not null" promise in the event that an error occurs (and is replaced with null due to the way GraphQL responses are structured and limitations in JSON) in a non-nullable position.

It doesn't take much code on the client to prevent the client reading a null that relates to an error, graphql-toe can be used with almost any JavaScript or TypeScript based GraphQL client (not Relay, but it has @throwOnFieldError that you can use instead) and achieves this in 512 bytes gzipped - and that's with a focus on performance rather than bundle size.

This PR allows the client to take responsibility for error handling by specifying onError: "NO_PROPAGATE" as part of the GraphQL request, and thereby turns off error propagation behavior. This is also set as the recommended default for future schemas.

With clients responsible for error handling, we no longer need to factor the possibility of whether something can error or not into its nullability, meaning we can use the not-null ! to indicate all the positions in the schema for which null is not a semantically valid value - i.e. the underlying resource will never be a legitimate null.

The end result:

  • true nullability indicated in schema - no more thinking about where errors are likely
  • fewer null checks on clients
  • clients can leverage their native error handling capabilities such as try/catch or <ErrorBoundary />
  • safe to store errored responses into normalized stores

I've also included onError: "HALT" in this proposal. We've discovered that there's a small but significant class of clients out there, mostly ad-hoc scripts, that throw away the entire response when any error occurs. By codifying this into the spec we make it easier to implement these clients, and we allow the server to halt processing the rest of the request unnecessarily.

As noted by @revangen in this comment:

I've also included onError: "ABORT" "HALT" in this proposal.

Appreciate this being included. For Shopify's public Admin GraphQL API, we have a mix of scenarios that result in a partial success response and only error response. Having been around for 8+ years, we are reluctant at times to change its behaviour to favour partial responses as we don't control majority of clients. Providing clients a way to specify the server's behaviour provides a migration path should clients care about partial responses.

Copy link

netlify bot commented Apr 30, 2025

Deploy Preview for graphql-spec-draft ready!

Name Link
🔨 Latest commit b5f64ae
🔍 Latest deploy log https://app.netlify.com/projects/graphql-spec-draft/deploys/682f8163109ded00087743eb
😎 Deploy Preview https://deploy-preview-1163--graphql-spec-draft.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@benjie benjie marked this pull request as ready for review April 30, 2025 11:41
@benjie benjie added the 💡 Proposal (RFC 1) RFC Stage 1 (See CONTRIBUTING.md) label Apr 30, 2025
@benjie
Copy link
Member Author

benjie commented Apr 30, 2025

I have applied the RFC1 label to this as inherited from the PR it replaces:

@benjie
Copy link
Member Author

benjie commented Apr 30, 2025

A full, CI-passing, implementation of this is available in GraphQL.js now; check it out: graphql@canary-pr-4364

Comment on lines 168 to 171
enum __ErrorBehavior {
NO_PROPAGATE
PROPAGATE
ABORT
Copy link
Contributor

@martinbonnin martinbonnin Apr 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving my previous comment here, love this proposal overall but I would love it even more with NULL

enum __ErrorBehavior {
  NULL
  PROPAGATE
  ABORT
}

tldr: I believe this is going to be read and discussed a lot more than it's going to be written by users and I would optimize for the conciseness, pronouncability and memorability of NULL.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To summarize my thoughts on it from the previous thread: NULL is what I originally wrote, I agree it looks and initially feels nicer; but ultimately I discounted it for a number of reasons:

Communication issues

If you tell someone on the phone to set onError: NULL, they're very likely to instead set onError: null.

Ambiguity issues

onError: null and onError: NULL are very similar, but the first means PROPAGATE and the latter means NO_PROPAGATE. This is likely to be the source of many user errors and much frustration (both for newbies, and for the people supporting them including library authors).

Errors already become null in the spec:

[an error] raised [...] is handled as though the response position [...] resolved to null
-- https://spec.graphql.org/draft/#sel-FANVNJCAACGW-qR

onError: NULL doesn't implicitly seem to mean something different from this.

Please suggest other alternatives!

I'm very open to changing the word NO_PROPAGATE to something else, I really don't like NO_PROPAGATE, not least because I always want to spell it propOgate rather than propAgate. But I think changing it to NULL would be a mistake.

I've gone through about 30 alternatives when brainstorming it, and none are sufficiently clear whilst also being shorter than NO_PROPAGATE. My favourite alternatives are things like LOCAL/LOCALIZE/ISOLATE/CONTAIN, but I'm not sure they convey the meaning quite as well as NO_PROPAGATE does.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooo... Maybe LOCAL_NULL actually is quite compelling. Not thought of that one before.

### @behavior

```graphql
directive @behavior(onError: __ErrorBehavior! = PROPAGATE) on SCHEMA
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts on directive @onError(behavior: __ErrorBehavior!) on SCHEMA to mirror the naming of the client onError field?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here’s why I think it should be a request parameter and not a directive. (Further, I don’t think we should allow both because that opens up questions around what happens if the client sets one and the user sets the other: who wins?)

Copy link
Contributor

@fotoetienne fotoetienne May 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about defining the server default onError behavior only in introspection and not in the SDL? We can instead define abstract "execution configuration" parameters that informs execution behavior. Implementations can then choose to have this be configured via a schema directive or as a separate configuration file.

@benjie benjie changed the title Allow clients to disable error propagation via request parameter (take 2) Service capabilities / error behaviors May 22, 2025
@benjie
Copy link
Member Author

benjie commented May 22, 2025

This PR description is currently out of date; I've reworked the capabilities infrastructure based on Lee's feedback, see the changes for details.

@martinbonnin
Copy link
Contributor

@benjie @leebyron given the previous discussions on feature discovery have been going on for years, I'm doubtful we can land __Capability soon... Yet onError is crucially needed to move the nullability proposals, is there any chance we could use a new introspection field (__Schema.defaultErrorBehaviour) instead? Or should we expedite __Capability in a separate proposal first? (I'm down with that FWIW)

@benjie
Copy link
Member Author

benjie commented May 25, 2025

@martinbonnin You don’t like this proposal?

@martinbonnin
Copy link
Contributor

@benjie Don't get me wrong, I like __Capability! It's just something that has been discussed before in the community. For an example, it's relatively similar to https://github.com/graphql/graphql-wg/blob/main/rfcs/FeatureDiscovery.md. Feels like getting __Capability through the finish line is probably going to require a much wider discussion and wait times (the time for everyone potentially affected to review __Capability).

While I think onResponse is in a much better spot of being fast tracked.

Of course, I'm all for having both of them. If we want to do that, I think it'd make sense to merge __Capability first and then have onError use that as __Capability solves a more general problem.

In a perfect world, we can land both __Capability and onError in the next couple of months.

Comment on lines +515 to +519

The `__Service` type is returned from the `__service` meta-field and provides
information about the GraphQL service, most notably about its capabilities. This
type was added after the original release of GraphQL, so older schemas may not
support it.
Copy link
Contributor

@martinbonnin martinbonnin May 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need the __service indirection vs having __capabilities?
My mental model is that the endpoint is the service. It has a __schema and __capabilities are at the same level (except capabilities can't be expressed in terms of type system).

Comment on lines +542 to +544
must contain only ASCII letters, digits and hyphens. These constraints are
inspired by reverse domain name notation to encourage global uniqueness and
collision-resistance. Unlike the domain name system, capability identifiers are
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this!

Comment on lines +580 to +585
- {"org.graphql.defaultErrorBehavior"} - indicates the _default error behavior_
of the service via the {value}. If not present, must be assumed to be
{"PROPAGATE"}.
- {"org.graphql.onError"} - indicates that the service allows the client to
specify {onError} in a request to indicate the _error behavior_ the service
should use for the request. {value} is {null}.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about this?

Suggested change
- {"org.graphql.defaultErrorBehavior"} - indicates the _default error behavior_
of the service via the {value}. If not present, must be assumed to be
{"PROPAGATE"}.
- {"org.graphql.onError"} - indicates that the service allows the client to
specify {onError} in a request to indicate the _error behavior_ the service
should use for the request. {value} is {null}.
- {"org.graphql.defaultErrorBehavior"} - indicates the _default error behavior_
of the service via the {value}.
If present, the service must be assumed to support the {onError} parameter
to indicate the _error behavior_ on requests.
If not present, must be assumed to be {"PROPAGATE"}.

Less invalid cases possible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
💡 Proposal (RFC 1) RFC Stage 1 (See CONTRIBUTING.md)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants