[Sorta-breaking] Update core to fix low wft slots/pollers issue #877

Sushisource · 2025-05-28T01:19:54Z

Updated core to obtain the fix for low (<2) WFT slot/poller counts causing workers to potentially get stuck.

This is a sorta-breaking change in that it makes a previously invalid configuration into a hard error at worker startup time.

Sushisource · 2025-05-28T01:50:35Z

It's unclear why test_workflow_deadlock_fill_up_slots is failing now, but it seems related given that it failed everywhere. I'll need to look into it.

Sushisource · 2025-05-28T05:05:55Z

tests/worker/test_workflow.py

+        # Start the worker with CPU count + 11 task slots
+        max_concurrent_workflow_tasks=cpu_count + 11,


I'll admit I don't actually understand how starting cpu_count + 5 workflows somehow eats up 100% of cpu_count + 10 task slots.

However, it does. Before my change, the pollers would (erroneously) try to acquire permits evenly between sticky and non-sticky. Now that that's fixed, 100% of the permits can be eaten up by the deadlockers, meaning the non-deadlocking workflow never gets a chance to start.

This is, IMO, a natural consequence of deadlocking every single task slot. From what I understand, we all accepted that we could never free the slots of deadlocked workflows until they were actually released.

So, seemingly this test change is acceptable, but please LMK if I'm wrong.
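The starvation described in this comment can be reproduced in miniature with a plain semaphore. The following is only a sketch under assumptions: `can_start_new_task` and its parameters are hypothetical names, and an `asyncio.Semaphore` stands in for the worker's slot permits. It is not how Core or the SDK tracks slots.

```python
import asyncio


async def can_start_new_task(total_slots: int, deadlockers: int) -> bool:
    """Return True if one more task can get a slot while `deadlockers` tasks hold theirs.

    Hypothetical helper: the semaphore is a stand-in for workflow task slot permits.
    """
    slots = asyncio.Semaphore(total_slots)
    release = asyncio.Event()

    async def deadlocker() -> None:
        # Take a slot and hold it until explicitly released,
        # mimicking a workflow task that never completes.
        async with slots:
            await release.wait()

    tasks = [asyncio.create_task(deadlocker()) for _ in range(deadlockers)]
    await asyncio.sleep(0)  # yield once so every deadlocker grabs its slot

    try:
        # The "non-deadlocking workflow" tries to get a slot.
        await asyncio.wait_for(slots.acquire(), timeout=0.05)
        slots.release()
        started = True
    except asyncio.TimeoutError:
        started = False

    # Clean up so the event loop can shut down.
    release.set()
    await asyncio.gather(*tasks)
    return started


if __name__ == "__main__":
    print(asyncio.run(can_start_new_task(total_slots=4, deadlockers=4)))
    print(asyncio.run(can_start_new_task(total_slots=5, deadlockers=4)))
```

When every slot is held, the extra task times out; with a single spare slot it starts, which is the effect the one-slot bump in the test is after.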

I'll admit I don't actually understand how starting cpu_count + 5 workflows somehow eats up 100% of cpu_count + 10 task slots.

I think this is worth understanding. I would assume "max slots used" = "max concurrent tasks" + "max pollers", is that not correct here?

Ah, yes, that is it. I forgot the default number of pollers here is 5.
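For reference, the slot accounting this thread settles on can be written out as a small worked example. It is a sketch of the arithmetic as stated above ("max slots used" = "max concurrent tasks" + "max pollers", with 5 default pollers), and it assumes the test previously configured cpu_count + 10 slots, as the comments imply. `slots_in_use` is a hypothetical helper, not SDK code.

```python
import os


def slots_in_use(deadlocked_workflows: int, pollers: int) -> int:
    # Accounting taken from the thread: each deadlocked task holds a slot,
    # and each poller is assumed to hold one permit as well.
    return deadlocked_workflows + pollers


cpu_count = os.cpu_count() or 1
DEFAULT_POLLERS = 5  # default poller count mentioned in the thread

deadlocked = cpu_count + 5
used = slots_in_use(deadlocked, DEFAULT_POLLERS)

old_free = (cpu_count + 10) - used  # slot count assumed for the old test setup
new_free = (cpu_count + 11) - used  # slot count after this PR's test change

print(f"free slots before: {old_free}, after: {new_free}")
```

Under that accounting the old setup leaves no free slot for the non-deadlocking workflow, while cpu_count + 11 leaves exactly one.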

cretz · 2025-05-28T12:59:00Z

temporalio/worker/_worker.py

-                poll workflow task requests we will perform at a time on this
-                worker's task queue.
+                poll workflow task requests we will perform at a time on this worker's task queue.
+                Must be set to at least two if ``max_cached_workflows`` is nonzero.


What happens if it is not? Is there some validation error coming from Core?

👍 If there is an actual exception thrown, that works for me. I assume it's a clear exception? I do think we should call this out in the release notes as technically a breaking change for those who have (improperly) set this to 1.

Yes it should be quite clear
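To make the documented rule concrete, here is a minimal sketch of the check, assuming it behaves as the new docstring line says. In this PR the real validation lives in Core and surfaces at worker startup; the function name, the `workflow_task_pollers` parameter name, the `ValueError` type, and the message below are all placeholders for illustration, not the actual error.

```python
def validate_poller_config(workflow_task_pollers: int, max_cached_workflows: int) -> None:
    """Hypothetical stand-in for the startup check described in the docstring.

    Rule from the docstring: the workflow task poller count must be at least
    two if ``max_cached_workflows`` is nonzero.
    """
    if max_cached_workflows != 0 and workflow_task_pollers < 2:
        # Placeholder exception type and wording; the real error comes from Core.
        raise ValueError(
            "workflow task pollers must be at least 2 when "
            "max_cached_workflows is nonzero"
        )


if __name__ == "__main__":
    validate_poller_config(workflow_task_pollers=2, max_cached_workflows=1000)  # accepted
    validate_poller_config(workflow_task_pollers=1, max_cached_workflows=0)  # accepted
    try:
        validate_poller_config(workflow_task_pollers=1, max_cached_workflows=1000)
    except ValueError as err:
        print(f"rejected: {err}")
```

A configuration with one poller and caching enabled is the case that previously ran (and could get stuck) and now fails fast.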

cretz · 2025-05-28T12:59:58Z

tests/worker/test_workflow.py

+        # Start the worker with CPU count + 11 task slots
+        max_concurrent_workflow_tasks=cpu_count + 11,


I'll admit I don't actually understand how starting cpu_count + 5 workflows somehow eats up 100% of cpu_count + 10 task slots.

I think this is worth understanding. I would assume "max slots used" = "max concurrent tasks" + "max pollers", is that not correct here?

Sushisource requested a review from a team as a code owner May 28, 2025 01:19

Sushisource added 2 commits May 27, 2025 18:24

Update core

50cacdd

Update docstrings with minimum slot/poller requirements

809d8b4

Sushisource force-pushed the update-core branch from 505aa6d to 809d8b4 Compare May 28, 2025 01:25

Sushisource force-pushed the update-core branch from 5ce2822 to aba6e69 Compare May 28, 2025 04:57

Ensure test_workflow_deadlock_fill_up_slots can pass

121a380

Sushisource force-pushed the update-core branch from aba6e69 to 121a380 Compare May 28, 2025 05:02

Sushisource commented May 28, 2025

cretz reviewed May 28, 2025

cretz approved these changes May 28, 2025

Sushisource merged commit b24326c into main May 28, 2025
18 checks passed

Sushisource deleted the update-core branch May 28, 2025 16:57
