-
Notifications
You must be signed in to change notification settings - Fork 103
[Bug] asyncio.wait is non-deterministic when used with coroutines instead of tasks #429
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I'm facing this same issue, it seems, by the logs. I, too, use asyncio for concurrent activity execution (by multiple workers separated by threads). Environment/Versions |
Hrmm, I was only able to get the mismatch after replay (replay was being forced due to overgrowing history size I think). I am still investigating here... |
Turns out This is not a Temporal-specific issue. What is the output of the following? async def do_something(name: str) -> None:
print(f"Run {name}")
await asyncio.wait([
do_something("something1"),
do_something("something2"),
do_something("something3"),
])
await asyncio.wait([
do_something("something4"),
do_something("something5"),
do_something("something6"),
]) In 3.10, ignoring deprecation warnings, it may be something like (but it can change with only slight code alterations):
Why was that output not predictable? Unfortunately, Python in The reason this works with So, basically we need to educate people that Does this make sense? I will add a README note/warning. We could offer our own deterministic form of this function, but we'd probably require tasks anyways like Python does now. Any alternative suggestions or ways we can approach this? |
Thanks for your investigation @cretz. I'll leave this open for comment in case anyone has a better idea, but for my purposes this issue is resolved through the use of tasks. Personally, I think the only thing that can be done is provide a warning. It's a gotcha that is on Python's shoulders, not the SDK, and as you said it's unlikely that a fix will be applied retroactively. |
Uh oh!
There was an error while loading. Please reload this page.
What are you really trying to do?
Split a list of incoming events into concurrently executed activity streams. The repro below mimics a stripped down version of the original code structure.
Describe the bug
When using
asyncio.wait()
with a list of async method calls, which contain a series of activity executions, an error can occur which results in later activities receiving the return values of different concurrent executions.Here is a table to illustrate what I mean. Given an activity that executes
f(x) -> x
, if we pass the numbers 1-3 concurrently, we might see this:This issue is random, and rare. The reproduction below is designed to maximize the chance of running into it.
Minimal Reproduction
Below is a self-contained reproduction of the issue. Using this input should encounter the error fairly consistently, a little less than once per run:
The temporal setup I used to reproduce this issue is a vanilla Temporal server install running via
temporal server start-dev
, with two worker instances running the below python file usingpython -m main
. I primarily tested using Python 3.8.When there is a mismatch, it is printed to the console.
Environment/Versions
Additional context
It appears that this bug occurs after this warning is printed to the worker console:
I can't confirm whether this warning was also appearing when this issue was happening in the real code. It also appears to result in the workflow task restarting, which seems correlated with the determinism breakdown.
Additionally, it appears that this issue has indirectly been mitigated in 3.11, as
asyncio.wait()
no longer allows passing coroutines directly. When the method is wrapped inasyncio.create_task()
, this issue disappears. The above warning is still printed, but doesn't result in disordered activity results.This code change in the workflow removes the issue:
The text was updated successfully, but these errors were encountered: