-
Notifications
You must be signed in to change notification settings - Fork 124
fix(workflow): process all activation jobs as a single batch #1488
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(workflow): process all activation jobs as a single batch #1488
Conversation
c4e0b53
to
2dec464
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this makes good sense. Nothing blocking, but a few suggestions.
packages/test/src/test-workflows.ts
Outdated
@@ -1853,7 +1894,7 @@ test('conditionRacer', async (t) => { | |||
makeFireTimerJob(1) | |||
) | |||
); | |||
compareCompletion(t, completion, makeSuccess([{ cancelTimer: { seq: 1 } }])); | |||
compareCompletion(t, completion, makeSuccess([], [SdkFlags.ProcessWorkflowActivationJobsAsSingleBatch])); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did this cancel timer go away?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That CancelTimer command was bogus. We just sent a FireTimer(1)
. Why would we cancel an already fired timer?
That timer comes from the condition(fn, timeout)
line. In this case, fn
will return true as soon as the signal handler is invoked. Since signals were procesed in a distinct phase, and timer events in a second phase, the microtasks created internally by condition()
would have the opportunity to settle on a state where "fn
returns true, because the signal has been received, but timer has not yet fired, because the job has not yet been processed".
ProcessWorkflowActivationJobsAsSingleBatch
fixes that. As soon as the fn
returns true, the internal condition promise (the one without a timer) gets resolved, but by the time the then
microtask gets executed, the TimerFired
event has also been processed, and so the attempt to cancel the timer is known to be a noop.
Just for extra safety, I modified that test to return the boolean from condition(fn, timeout)
, so we know that we didn't break the documented semantic of that API.
0d5d76e
to
9c2eb5b
Compare
What was changed
Previously, when processing a Workflow activation, the SDK would execute
notifyHasPatch
andsignalWorkflow
jobs in distinct phases, before other types of jobs. The primary reason behind that multi-phase algorithm was to avoid the possibility that a Workflow execution might complete before all incoming signals have been dispatched (at least to the point that the synchronous part of the handler function has been executed).This PR replaces that multi-phase algorithm with a simpler one where jobs are simply sorted as
signals and updates -> others
, but without processing them as distinct batches (i.e. without leaving/reentering the VM context between each group, which automatically triggers the execution of all outstanding microtasks). That single-phase approach resolves some rare, edge-case quirks of the former algorithm (including the one described in [Bug] Workflows can be constructed in which update handlers do not, but should, execute #1474), and yet still satisfies to the original requirement of ensuring that everysignalWorkflow
jobs - and nowdoUpdate
jobs as well - have been given a proper chance to execute before the Workflow main function might complete.This change is gated by a new SDK flag, to preserve compatibility with workflow execution history produced by previous releases. However, there is no intention at this point of making that that flag rollback-safe.
Lang no longer stops processing jobs for the current activation after the workflow code emits a workflow completion command; instead, lang continues to process remaining jobs until the next stop point, after which it reports all generated commands to Core.