
feat: TagValueIterator holds RLock for too long (#26369) #26372

Open
wants to merge 1 commit into master-1.x

Conversation

devanbenz

Closes #

Describe your proposed changes here.

  • I've read the contributing section of the project README.
  • Signed CLA (if not already signed).

It looks like we are holding a reader lock and deferring its release, while at the same time acquiring that same reader lock within the call to `return tk.TagValueIterator()`.

According to Go's RWMutex documentation this is not a good idea: https://pkg.go.dev/sync#RWMutex
> If any goroutine calls [RWMutex.Lock](https://pkg.go.dev/sync#RWMutex.Lock) while the lock is already held by one or more readers, concurrent calls to [RWMutex.RLock](https://pkg.go.dev/sync#RWMutex.RLock) will block until the writer has acquired (and released) the lock, to ensure that the lock eventually becomes available to the writer. Note that this prohibits recursive read-locking.
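
To make the quoted behavior concrete, here is a small, self-contained Go program (illustrative only, not influxdb code) in which a goroutine that already holds the read lock tries to take it again after a writer has queued on `Lock`; the second `RLock` blocks behind the pending writer, and the writer blocks behind the outer `RLock`:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	var mu sync.RWMutex
	done := make(chan struct{})

	// Reader: takes the lock, waits long enough for a writer to queue,
	// then tries to take the same read lock again (recursive read-locking).
	go func() {
		mu.RLock()
		fmt.Println("reader: outer RLock held")
		time.Sleep(100 * time.Millisecond) // let the writer call Lock()

		fmt.Println("reader: attempting recursive RLock")
		mu.RLock()                                       // blocks: a writer is already waiting
		fmt.Println("reader: recursive RLock acquired") // never reached
		mu.RUnlock()
		mu.RUnlock()
		close(done)
	}()

	// Writer: queues on Lock() while the outer read lock is still held.
	go func() {
		time.Sleep(50 * time.Millisecond)
		fmt.Println("writer: calling Lock")
		mu.Lock() // blocks until the outer RLock is released (it never is)
		mu.Unlock()
	}()

	select {
	case <-done:
		fmt.Println("completed (not expected)")
	case <-time.After(time.Second):
		fmt.Println("deadlocked: recursive RLock is stuck behind the pending writer")
	}
}
```

Swap `mu` for the log file's mutex, the reader for `TagValueIterator`, and the writer for `AddSeriesList`, and this is the same interleaving that the log output below appears to show.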

I believe this is what is causing the deadlock we are seeing in a few different tickets. I ran a reproducer that was effectively reading from and writing to a bucket whose retention policy I had set to 1 minute and whose shard rollover I had set to 30 seconds; I was able to reproduce the deadlock consistently this way. After adding some logging around the mutexes, I see the following during every reproduction:

```
TagValueIterator is RUnlock()
TagValueIterator is about to RLock()
TagValueIterator is RLock()
TagValueIterator tk about to RLock()
TagValueIterator tk is RLock()
TagValueIterator tk about to RUnlock()
TagValueIterator tk is RUnlock()
TagValueIterator is RUnlock()
TagValueIterator is about to RLock()
TagValueIterator is RLock()
TagValueIterator tk about to RLock()
TagValueIterator tk is RLock()
TagValueIterator tk about to RUnlock()
TagValueIterator tk is RUnlock()
TagValueIterator is RUnlock()
TagValueIterator is about to RLock()
TagValueIterator is RLock()
Lock() is about to be called in AddSeriesList
TagValueIterator tk about to RLock()
```

The `TagValueIterator tk` lock is acquired within `return tk.TagValueIterator()`, at the `tk.f.mu.RLock()` call here:

https://github.com/influxdata/influxdb/blob/eebe655e4c596b8f78bae30e5729092b6732d1a7/tsdb/index/tsi1/log_file.go#L1388

The `AddSeriesList Lock()` is found here:

https://github.com/influxdata/influxdb/blob/eebe655e4c596b8f78bae30e5729092b6732d1a7/tsdb/index/tsi1/log_file.go#L545

So it appears that we attempt to lock the same `RWMutex` from both `AddSeriesList` and `TagValueIterator`, with different lock types.

After modifying the code as proposed in this pull request, I'm no longer seeing the problem. I suspect that the recursive call to `RLock()` is the underlying issue.
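
For illustration only, here is a minimal sketch of the shape of the problem and of one way to avoid the recursive read lock. The types, fields, and helpers below are hypothetical stand-ins rather than the actual tsi1 code or the exact diff in this PR; the point is just that the outer method must not still hold the file's `RWMutex` when it calls into a method that read-locks it again:

```go
package sketch

import "sync"

// Hypothetical stand-ins for the tsi1 structures; field and method names
// are illustrative and do not mirror the real log_file.go exactly.

type TagValueIterator interface{ Next() []byte }

type logFile struct {
	mu   sync.RWMutex
	keys map[string]*logTagKey
}

type logTagKey struct {
	f      *logFile
	values [][]byte
}

// The inner constructor read-locks the file itself, mirroring the
// tk.f.mu.RLock() call referenced above.
func (tk *logTagKey) TagValueIterator() TagValueIterator {
	tk.f.mu.RLock()
	defer tk.f.mu.RUnlock()
	return &sliceIterator{values: tk.values}
}

// Problematic shape: the outer read lock is still held (via defer) when the
// inner method tries to RLock the same mutex again. If a writer such as
// AddSeriesList has queued on Lock() in between, both sides block forever.
func (f *logFile) tagValueIteratorBuggy(key string) TagValueIterator {
	f.mu.RLock()
	defer f.mu.RUnlock()
	tk := f.keys[key]
	if tk == nil {
		return nil
	}
	return tk.TagValueIterator() // recursive RLock on f.mu
}

// One way to avoid the recursion: finish the lookup under the read lock,
// release it, and only then call into the method that locks on its own.
func (f *logFile) tagValueIteratorFixed(key string) TagValueIterator {
	f.mu.RLock()
	tk := f.keys[key]
	f.mu.RUnlock()
	if tk == nil {
		return nil
	}
	return tk.TagValueIterator() // takes tk.f.mu.RLock() without nesting
}

// sliceIterator is a trivial iterator used only to keep the sketch compilable.
type sliceIterator struct {
	values [][]byte
	i      int
}

func (it *sliceIterator) Next() []byte {
	if it.i >= len(it.values) {
		return nil
	}
	v := it.values[it.i]
	it.i++
	return v
}
```

Whether the real change releases the lock before delegating, pushes the locking down, or drops the inner `RLock` entirely is determined by the actual diff; the sketch only shows the constraint that breaks the recursion.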

(cherry picked from commit 2e842ff)
devanbenz linked an issue on May 8, 2025 that may be closed by this pull request.
gwossum (Member) left a comment

Looks like the original.

Note that 1.x does not have the same lock issue that 2.x has, so this does not immediately fix any issues for 1.x. However, #20008 probably needs to be backported to master-1.x, which will make this PR necessary.

Development

Successfully merging this pull request may close these issues.

Port #26369 to master-1.x
2 participants