fix(app): guard against possible race conditions during enqueue #8098

psychedelicious · 2025-06-11T05:37:54Z

Summary

In #7724 we made a number of perf optimisations related to enqueuing. One of these optimisations included moving the enqueue logic - including expensive prep work and db writes - to a separate thread.

At the same time, manual DB locking was abandoned in favor of WAL mode.

Finally, we set `check_same_thread=False` to allow multiple threads to access the connection at a given time.
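For reference, a minimal sketch of what that connection setup looks like with Python's standard `sqlite3` module. The helper name `open_shared_connection` and the temp-file path are hypothetical and not taken from the codebase. Note that `check_same_thread=False` only disables the module's same-thread check; it does not serialize access, so concurrent use of one connection still has to be coordinated by the caller.

```python
import os
import sqlite3
import tempfile


def open_shared_connection(db_path: str) -> sqlite3.Connection:
    """Hypothetical helper: open a connection usable from several threads."""
    # Disables the sqlite3 module's check that the connection is only used
    # by the thread that created it. It adds no locking of its own.
    conn = sqlite3.connect(db_path, check_same_thread=False)
    # WAL mode requires a file-backed database (not ":memory:").
    conn.execute("PRAGMA journal_mode=WAL;")
    return conn


# Assumption for the sketch: a throwaway database file in a temp directory.
db_path = os.path.join(tempfile.mkdtemp(), "queue_sketch.db")
conn = open_shared_connection(db_path)
journal_mode = conn.execute("PRAGMA journal_mode;").fetchone()[0]
```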

I think this may be the cause of #7950:

- We start an enqueue in a thread (running in bg)
- We dequeue
- Dequeue pulls a partially-written queue item from DB and we get the errors in the linked issue

To be honest, I don't understand enough about SQLite to confidently say that this kind of race condition is actually possible. But:

- The error started popping up around the time we made this change.
- I have reviewed the logic from enqueue to dequeue very carefully _many_ times over the past month or so, and I am confident that the error is only possible if we are getting unexpectedly `NULL` values from the DB.
- The DB schema includes `NOT NULL` constraints for the column that is apparently returning `NULL`.
- Therefore, without some kind of race condition or schema issue, the error should not be possible.
- The `enqueue_batch` call is the only place I can find where we have the possibility of a race condition due to async logic. Everywhere else, all DB interaction for the queue is synchronous, as far as I can tell.
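To illustrate the `NOT NULL` point, here is a small sketch showing that SQLite rejects a `NULL` write into a constrained column under normal operation. The table and column names (`queue_sketch`, `session`) are hypothetical stand-ins, not the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the real queue schema is not reproduced here.
conn.execute(
    "CREATE TABLE queue_sketch ("
    "item_id INTEGER PRIMARY KEY, "
    "session TEXT NOT NULL)"
)

try:
    # Attempt to store NULL in the NOT NULL column.
    conn.execute("INSERT INTO queue_sketch (session) VALUES (NULL)")
    rejected = False
except sqlite3.IntegrityError:
    # SQLite refuses the write, so no row with a NULL session is stored.
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM queue_sketch").fetchone()[0]
```

Since a write like this is refused, a `NULL` read from such a column points at something other than ordinary inserts.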

This change retains the perf benefits by running the heavy enqueue prep logic in a separate thread, but moves back to the main thread for the DB write. It also uses an explicit transaction for the write.
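A rough sketch of that shape, assuming an asyncio event loop as the "main thread". The names `prepare_rows`, `enqueue_batch_sketch`, and `queue_sketch` are hypothetical and this is not the actual diff: the expensive prep runs in a worker thread, and the DB write happens afterwards on the calling thread inside one explicit transaction.

```python
import asyncio
import sqlite3


def prepare_rows(batch_size: int) -> list[tuple[str, str]]:
    """Hypothetical stand-in for the expensive enqueue prep work."""
    return [(f"batch-{i}", f'{{"index": {i}}}') for i in range(batch_size)]


async def enqueue_batch_sketch(conn: sqlite3.Connection, batch_size: int) -> int:
    # Heavy prep runs in a worker thread, so the event loop stays responsive.
    rows = await asyncio.to_thread(prepare_rows, batch_size)

    # Back on the event loop thread: write every row in one explicit
    # transaction, so the batch is committed as a single unit.
    try:
        conn.execute("BEGIN")
        conn.executemany(
            "INSERT INTO queue_sketch (batch_id, session) VALUES (?, ?)", rows
        )
        conn.execute("COMMIT")
    except Exception:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise
    return len(rows)


# isolation_level=None puts the sqlite3 module in autocommit mode, so the
# BEGIN/COMMIT statements above fully control the transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE queue_sketch ("
    "item_id INTEGER PRIMARY KEY, "
    "batch_id TEXT NOT NULL, "
    "session TEXT NOT NULL)"
)
inserted = asyncio.run(enqueue_batch_sketch(conn, 3))
row_count = conn.execute("SELECT COUNT(*) FROM queue_sketch").fetchone()[0]
```

Only the pure-Python prep touches the worker thread in this sketch; the connection is created and used on a single thread.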

Will just have to wait and see if this fixes the issue.

Related Issues / Discussions

QA Instructions

Queuing should still work and it should respect max queue item size limits. I've tested and it's working fine. Still just as responsive as before.

I've not been able to reproduce the issue, though, so I can't test it directly.

Merge Plan

n/a

Checklist

- The PR has a short but descriptive title, suitable for a changelog
- Tests added / updated (if applicable)
- Documentation added / updated (if applicable)
- Updated What's New copy (if doing a release after this PR)


psychedelicious requested review from blessedcoolant, hipsterusername and jazzhaiku as code owners June 11, 2025 05:37

github-actions bot added the python (PRs that change python files) and services (PRs that change app services) labels Jun 11, 2025

hipsterusername approved these changes Jun 13, 2025


psychedelicious enabled auto-merge (rebase) June 13, 2025 13:36

psychedelicious force-pushed the psyche/fix/enqueue-race-cond branch from 7cced2e to 117a1aa Compare June 13, 2025 13:36

psychedelicious merged commit 1ff3d44 into main Jun 13, 2025
17 of 24 checks passed

psychedelicious deleted the psyche/fix/enqueue-race-cond branch June 13, 2025 13:51
