
Performance issues due to breadth-first execution of GraphQL queries with async resolvers during call bursts #200

Open
@alessandrolulli

Description


When we receive a burst of calls we find that the calls "wait for each other" (i.e. the first call of the burst effectively waits for the last one).
This results in degraded performance, both in execution time and in server memory consumption, because we have to keep many calls in flight.

This is particularly evident in big GraphQL queries where users request many fields and the query has several levels of depth, each level with many async fields.

TL;DR: looking at the graphql and asyncio implementations, we understood that this is due to the following (a minimal pure-asyncio sketch follows right after this list):

  • graphql's breadth-first way of scheduling and resolving fields
  • asyncio's internal FIFO queue of tasks to execute

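To see the effect without GraphQL at all, here is a minimal pure-asyncio sketch; it only assumes that, like graphql, every field of one level is awaited together (modelled here with asyncio.gather), so it illustrates the scheduling, not the library internals:

import asyncio

async def resolve_field(query, level, field):
    print(f"query {query} | level {level} | field {field}")
    await asyncio.sleep(0.001)  # stands in for the real async work of a resolver

async def run_query(query, width=3, depth=2):
    # breadth-first: every field of one level is scheduled before descending
    for level in range(depth):
        await asyncio.gather(
            *[resolve_field(query, level, f) for f in range(width)]
        )
    print(f"query {query} DONE")

async def main():
    # a burst of queries: asyncio's FIFO ready queue interleaves their tasks,
    # so query 0 only descends to the next level after the whole burst has
    # scheduled (and mostly finished) the current one
    await asyncio.gather(*[run_query(q) for q in range(5)])

asyncio.run(main())
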
As an example, consider queries like the following, where data might be something like beer vendors and, for each vendor, we request many fields describing it (a1...a100, b1...b100, ...):

query {
  data {
    a1 {
      b1 {
        c1
        ...
        c100
      }
      ...
      b100 {
        c1
        ...
        c100
      }
    }
    ...
    a100 { ... }
  }
}
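
With 100 fields per level as in this example, a single query already schedules on the order of 100 + 100·100 + 100·100·100 ≈ 10^6 resolver invocations.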

If we have n of these calls arriving in a burst, by the time we reach the depth of the c fields there is a huge number of tasks scheduled in the asyncio queue.

If we check the order of execution, we see that at each level the first query "waits" for the other queries, because every query schedules a lot of tasks.

In the proof of concept at the end of this post you can verify the order of execution of the resolvers.

It would be very nice to have some sort of priority in the scheduling so that the first query does not have to wait for the scheduling and resolution of all the other queries before finishing.
I understand that this sits between graphql and asyncio, but I think it can affect the use of graphql in environments that receive many calls.
Fixes, help and hints on how to improve this would be very much appreciated.
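
One mitigation we can sketch (it does not change the scheduling order, it only bounds how many queries interleave) is to cap the number of concurrently executing queries with a semaphore, so that earlier queries of a burst can reach their deepest level before later ones flood the event loop with tasks; MAX_PARALLEL_QUERIES below is just an arbitrary knob:

import asyncio

MAX_PARALLEL_QUERIES = 2  # arbitrary: how many queries may interleave at once

async def execute_with_limit(schema, query_string, query_number, slots):
    # queries beyond the limit wait here instead of piling tasks into the loop
    async with slots:
        return await schema.execute_async(
            query_string, context_value=dict(query_number=query_number)
        )

async def run_burst(schema, query_strings):
    slots = asyncio.Semaphore(MAX_PARALLEL_QUERIES)
    return await asyncio.gather(
        *[
            execute_with_limit(schema, q, i, slots)
            for i, q in enumerate(query_strings)
        ]
    )

And here is the full proof of concept mentioned above: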

import asyncio

from graphene import ObjectType, Schema, String, Field

FIELD_NUMBER = 2
CONCURRENT_QUERIES = 10


def make_resolver(i, j=None):
    async def resolver(self, info):
        # every resolver invocation becomes a separate step on the event loop
        print(f"START query {info.context['query_number']} | a{i} | b{j}")
        await asyncio.sleep(0.001)  # simulate a small amount of async work
        print(f"END query {info.context['query_number']} | a{i} | b{j}")
        return i

    return resolver


def create_fields():
    fields = {}
    for i in range(FIELD_NUMBER):
        inner_fields = {}
        for j in range(FIELD_NUMBER):
            inner_fields[f"b{j}"] = String()
            inner_fields[f"resolve_b{j}"] = make_resolver(i, j)

        # each inner type needs a distinct name, otherwise the schema
        # contains duplicate type definitions and fails to build
        MyType = type(
            f"MyType{i}",
            (ObjectType,),
            inner_fields,
        )

        fields[f"a{i}"] = Field(MyType)
        fields[f"resolve_a{i}"] = make_resolver(i)

    return fields


async def make_query(schema, query_number):
    inner_query_values = [f"b{i}" for i in range(FIELD_NUMBER)]
    query_values = [
        "a%s {%s}" % (i, " ".join(inner_query_values)) for i in range(FIELD_NUMBER)
    ]
    query_string = "{ %s }" % (" ".join(query_values),)

    await schema.execute_async(
        query_string, context_value=dict(query_number=query_number)
    )


async def main():
    Query = type("Query", (ObjectType,), create_fields())
    schema = Schema(query=Query)

    await asyncio.gather(*[make_query(schema, i) for i in range(CONCURRENT_QUERIES)])


asyncio.run(main())
