
fix(workflow): process all activation jobs as a single batch #1488


Merged — 19 commits, Aug 14, 2024

Conversation

@mjameswh (Contributor) commented Aug 7, 2024

What was changed

  • Previously, when processing a Workflow activation, the SDK would execute notifyHasPatch and signalWorkflow jobs in distinct phases, before other types of jobs. The primary reason behind that multi-phase algorithm was to avoid the possibility that a Workflow execution might complete before all incoming signals have been dispatched (at least to the point that the synchronous part of the handler function has been executed).

    This PR replaces that multi-phase algorithm with a simpler one: jobs are sorted as signals and updates -> others, but are no longer processed as distinct batches (i.e. without leaving/reentering the VM context between each group, which automatically triggers the execution of all outstanding microtasks). That single-phase approach resolves some rare, edge-case quirks of the former algorithm (including the one described in [Bug] Workflows can be constructed in which update handlers do not, but should, execute #1474), yet still satisfies the original requirement of ensuring that every signalWorkflow job (and now every doUpdate job as well) has been given a proper chance to execute before the Workflow main function might complete.

    This change is gated by a new SDK flag, to preserve compatibility with workflow execution histories produced by previous releases. However, there is no intention at this point of making that flag rollback-safe.

  • Lang no longer stops processing jobs for the current activation after the workflow code emits a workflow completion command; instead, lang continues to process remaining jobs until the next stop point, after which it reports all generated commands to Core.
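The single-batch ordering described above can be sketched as follows. This is a hedged illustration, not the actual SDK internals: the `Job` type and function names are hypothetical, but the key idea is real — a stable sort puts signals and updates ahead of other jobs while preserving relative order within each group, and the whole batch is then processed without draining microtasks between groups.

```typescript
// Hypothetical, simplified job shape (the real SDK uses protobuf-defined jobs):
type Job =
  | { type: 'signalWorkflow'; name: string }
  | { type: 'doUpdate'; name: string }
  | { type: 'fireTimer'; seq: number }
  | { type: 'notifyHasPatch'; patchId: string };

function prioritize(job: Job): number {
  // Signals and updates come first; everything else follows.
  return job.type === 'signalWorkflow' || job.type === 'doUpdate' ? 0 : 1;
}

function sortJobsForSingleBatch(jobs: Job[]): Job[] {
  // Array.prototype.sort is stable in modern engines, so jobs within each
  // priority class keep their original relative order.
  return [...jobs].sort((a, b) => prioritize(a) - prioritize(b));
}

// Example: a mixed activation is reordered, then processed as ONE batch
// (no VM context exit, hence no forced microtask drain, between groups).
const batch = sortJobsForSingleBatch([
  { type: 'fireTimer', seq: 1 },
  { type: 'signalWorkflow', name: 'unblock' },
  { type: 'doUpdate', name: 'bump' },
]);
console.log(batch.map((j) => j.type).join(','));
// → signalWorkflow,doUpdate,fireTimer
```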

@mjameswh requested a review from a team as a code owner August 7, 2024 21:30
@mjameswh marked this pull request as draft August 7, 2024 22:32
@mjameswh force-pushed the reorder-updates-with-signals branch from c4e0b53 to 2dec464 on August 8, 2024 07:08
@mjameswh changed the title from "fix(workflow): Group update jobs with signals" to "fix(workflow): process all activation jobs as a single batch" Aug 9, 2024
@mjameswh marked this pull request as ready for review August 9, 2024 11:16
@Sushisource (Member) left a comment:
I think this makes good sense. Nothing blocking, but a few suggestions.

@@ -1853,7 +1894,7 @@ test('conditionRacer', async (t) => {
        makeFireTimerJob(1)
      )
    );
-   compareCompletion(t, completion, makeSuccess([{ cancelTimer: { seq: 1 } }]));
+   compareCompletion(t, completion, makeSuccess([], [SdkFlags.ProcessWorkflowActivationJobsAsSingleBatch]));
Member:
Why did this cancel timer go away?

Contributor (Author):
That CancelTimer command was bogus. We just sent a FireTimer(1). Why would we cancel an already fired timer?

That timer comes from the condition(fn, timeout) line. In this case, fn will return true as soon as the signal handler is invoked. Since signals were processed in a distinct phase, and timer events in a second phase, the microtasks created internally by condition() had the opportunity to settle on a state where "fn returns true, because the signal has been received, but the timer has not yet fired, because that job has not yet been processed".

ProcessWorkflowActivationJobsAsSingleBatch fixes that. As soon as the fn returns true, the internal condition promise (the one without a timer) gets resolved, but by the time the then microtask gets executed, the TimerFired event has also been processed, and so the attempt to cancel the timer is known to be a noop.

Just for extra safety, I modified that test to return the boolean from condition(fn, timeout), so we know that we didn't break the documented semantics of that API.
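The race described above can be modeled with plain promises. This is a hedged toy sketch, not SDK code: `toyCondition` is a hypothetical stand-in for condition(fn, timeout), modeled as a race between "fn became true" and "timer fired". It shows that when both the signal and the TimerFired job are applied in the same batch, before any microtask runs, the `then` callback observes both as already settled, so no CancelTimer command is warranted.

```typescript
// Toy model: condition(fn, timeout) ≈ a race between the predicate becoming
// true and the timeout timer firing. Returns true if the predicate won.
async function toyCondition(
  fnTrue: Promise<void>,
  timerFired: Promise<void>
): Promise<boolean> {
  return Promise.race([
    fnTrue.then(() => true),
    timerFired.then(() => false),
  ]);
}

let resolveFn!: () => void;
let resolveTimer!: () => void;
const fnTrue = new Promise<void>((r) => (resolveFn = r));
const timerFired = new Promise<void>((r) => (resolveTimer = r));

const result = toyCondition(fnTrue, timerFired);

// Single-batch processing: both jobs are applied synchronously, before any
// microtask gets a chance to run.
resolveFn();    // the signal handler makes fn return true
resolveTimer(); // TimerFired, from the same activation batch

result.then((won) => console.log('condition returned', won));
// → condition returned true
// fn won the race, and by the time its `then` microtask ran, the timer was
// already known to have fired, so cancelling it would be a no-op.
```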

@mjameswh force-pushed the reorder-updates-with-signals branch from 0d5d76e to 9c2eb5b on August 14, 2024 10:42
@mjameswh merged commit d6e2738 into temporalio:main Aug 14, 2024
70 checks passed
@mjameswh deleted the reorder-updates-with-signals branch November 22, 2024 16:19