feat: TagValueIterator holds RLock for too long (#26369) #26372
It looks like we are holding a reader lock and deferring its release while, at the same time, acquiring that same reader lock within the call to `return tk.TagValueIterator()`. According to Go's `RWMutex` documentation this is prohibited: once a writer is blocked in `Lock()`, no goroutine should expect to acquire a new read lock until the existing read lock is released, which rules out recursive read locking (https://pkg.go.dev/sync#RWMutex).
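As a standalone illustration (not InfluxDB code), the following sketch shows the failure mode those docs warn about: one goroutine already holds the read lock, a writer queues up on `Lock()`, and the reader's second `RLock()` then blocks behind the pending writer.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	var mu sync.RWMutex
	readerHoldsLock := make(chan struct{})

	// Reader: takes the read lock, then tries to take it again while a
	// writer is waiting -- the same shape as the deferred RLock plus the
	// nested RLock described above.
	go func() {
		mu.RLock()
		defer mu.RUnlock()
		close(readerHoldsLock)

		// Give the writer time to call Lock() and start waiting.
		time.Sleep(100 * time.Millisecond)

		mu.RLock() // blocks forever: the pending writer stops new readers
		fmt.Println("never reached")
		mu.RUnlock()
	}()

	<-readerHoldsLock
	mu.Lock() // blocks forever: waits for the reader's first RLock to be released
	mu.Unlock()
}
```

Run on its own, the Go runtime notices that every goroutine is blocked and aborts with a deadlock error; inside a running server the two goroutines simply hang instead, which is consistent with the reported symptoms.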
I believe this is what is causing the deadlock we are seeing in a few different tickets. I ran a reproducer that continuously read from and wrote to a bucket whose retention policy I had set to 1 minute and whose shard rollover I had set to 30 seconds, and I was able to reproduce the deadlock consistently this way. With some logging added around the mutexes, I see the same two lock sites involved in every reproduction:
The `TagValueIterator` on `tk` is found within `return tk.TagValueIterator()`:
influxdb/tsdb/index/tsi1/log_file.go, line 1388 at eebe655

The `AddSeriesList` `Lock()` is found here:
influxdb/tsdb/index/tsi1/log_file.go, line 545 at eebe655
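To make the interaction between those two sites concrete, here is a heavily simplified, hypothetical reconstruction; the method names mirror the real ones, but the bodies and the `tagKeyElem` type are illustrative rather than the actual log_file.go contents.

```go
package tsi1sketch

import "sync"

type LogFile struct {
	mu sync.RWMutex
	// in-memory measurement and tag-set maps elided
}

// tagKeyElem stands in for the element returned by the tag-set lookup.
type tagKeyElem struct{ f *LogFile }

// Writer side (around log_file.go:545): takes the write lock to apply entries.
func (f *LogFile) AddSeriesList( /* parameters elided */ ) {
	f.mu.Lock()
	defer f.mu.Unlock()
	// ... append log entries, update in-memory indexes ...
}

// Reader side (around log_file.go:1388): the deferred RUnlock keeps the read
// lock held across the nested tk.TagValueIterator() call, which acquires the
// same read lock again.
func (f *LogFile) TagValueIterator( /* name, key elided */ ) {
	f.mu.RLock()
	defer f.mu.RUnlock()

	tk := &tagKeyElem{f: f} // stands in for the measurement/tag-set lookups
	tk.TagValueIterator()   // recursive RLock on f.mu while it is already held
}

func (tk *tagKeyElem) TagValueIterator() {
	tk.f.mu.RLock()
	defer tk.f.mu.RUnlock()
	// ... build the value iterator ...
}

// Deadlock interleaving:
//  1. goroutine A enters LogFile.TagValueIterator and holds f.mu for reading,
//  2. goroutine B enters AddSeriesList and blocks in f.mu.Lock(),
//  3. goroutine A's nested tk.TagValueIterator() RLock blocks behind B.
// A waits on B, B waits on A, and neither can make progress.
```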
So it appears that we attempt to lock inside both `AddSeriesList` and `TagValueIterator` with different lock types on the same `RWMutex`. After modifying the code so that the read lock is not held across that nested call, I'm no longer seeing the problem. I have a feeling that the recursive call to `RLock()` is the underlying issue.

(cherry picked from commit 2e842ff)
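For illustration only, here is a minimal sketch of the general shape of such a change (the actual diff in this PR may differ in detail): hold the read lock just long enough for the lookups, release it, and only then call `tk.TagValueIterator()`, so its own `RLock()` never nests inside one that is already held. The `LogFile`, `tagKeyElem`, and `lookupTagKey` names here are simplified stand-ins, not the real declarations.

```go
package tsi1fixsketch

import "sync"

type LogFile struct{ mu sync.RWMutex }
type tagKeyElem struct{ f *LogFile }

func (tk *tagKeyElem) TagValueIterator() {
	tk.f.mu.RLock() // acquired on its own, no longer while f.mu is already read-held
	defer tk.f.mu.RUnlock()
	// ... build the value iterator ...
}

// lookupTagKey is a hypothetical stand-in for the measurement/tag-set map
// lookups; it must be called with f.mu held and may return nil when the
// measurement or key does not exist.
func (f *LogFile) lookupTagKey(name, key []byte) *tagKeyElem {
	return &tagKeyElem{f: f}
}

func (f *LogFile) TagValueIterator(name, key []byte) {
	f.mu.RLock()
	tk := f.lookupTagKey(name, key)
	f.mu.RUnlock() // released here instead of deferred to the end of the call

	if tk == nil {
		return
	}
	tk.TagValueIterator()
}
```

The trade-off is that whatever is read under the lock has to remain safe to use after the lock is dropped, which is why the unlock happens only once the lookups have produced a stable reference.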
Closes #
Describe your proposed changes here.