retry fetching block data with decreasing batch size #204
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking.
nice one!
Walkthrough

A new method, processChunkWithRetry, was added to internal/worker/worker.go. It fetches a chunk of blocks, forwards successful results to a results channel, and retries failed blocks by splitting them into smaller chunks, down to single-block requests. The Run method now routes every chunk through this retry path, collects all results from the channel, and sorts them by block number before returning.
Sequence Diagram(s)

```mermaid
sequenceDiagram
participant Worker
participant RPC
participant ResultsChannel
Worker->>Worker: processChunkWithRetry(chunk, resultsCh)
alt chunk size == 1
Worker->>RPC: Fetch block via RPC
RPC-->>Worker: Block result
Worker->>ResultsChannel: Send result
else chunk size > 1
Worker->>RPC: Fetch all blocks in chunk
RPC-->>Worker: Block results (some may fail)
alt All succeeded
Worker->>ResultsChannel: Send all results
else Some failed
Worker->>Worker: Retry failed blocks (possibly split & concurrent)
Worker->>ResultsChannel: Send successful results as available
end
end
```

```mermaid
sequenceDiagram
participant Worker
participant ResultsChannel
Worker->>Worker: Run()
loop For each chunk
Worker->>Worker: processChunkWithRetry(chunk, resultsCh)
end
Worker->>Worker: Collect all results from resultsCh
Worker->>Worker: Sort results by block number
Worker-->>Caller: Return sorted results
```
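To make the flow in the diagrams concrete, here is a minimal, self-contained Go sketch of the fetch → forward-successes → split-failures-and-retry loop. The types and the simulated fetchBlocks call are stand-ins invented for this sketch (the real code uses rpc.GetFullBlockResult and the worker's RPC client), so treat it as an illustration of the technique, not the PR's exact implementation.

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
	"sync"
)

// blockResult is a stand-in for the repo's rpc.GetFullBlockResult.
type blockResult struct {
	number uint64
	err    error
}

// fetchBlocks simulates a batched RPC call in which some blocks may fail.
func fetchBlocks(blocks []uint64) []blockResult {
	out := make([]blockResult, len(blocks))
	for i, b := range blocks {
		out[i] = blockResult{number: b}
		// Single-block fetches always succeed here so the recursion terminates.
		if len(blocks) > 1 && rand.Intn(4) == 0 {
			out[i].err = fmt.Errorf("transient failure")
		}
	}
	return out
}

// processChunkWithRetry mirrors the diagrams: forward successes immediately,
// then split the failed blocks in half and retry both halves concurrently.
func processChunkWithRetry(chunk []uint64, resultsCh chan<- []blockResult) {
	results := fetchBlocks(chunk)

	var ok []blockResult
	var failed []uint64
	for i, r := range results {
		if r.err != nil {
			failed = append(failed, chunk[i])
		} else {
			ok = append(ok, r)
		}
	}
	if len(ok) > 0 {
		resultsCh <- ok
	}
	if len(failed) == 0 {
		return
	}

	mid := len(failed) / 2
	var wg sync.WaitGroup
	for _, half := range [][]uint64{failed[:mid], failed[mid:]} {
		if len(half) == 0 {
			continue
		}
		wg.Add(1)
		go func(h []uint64) {
			defer wg.Done()
			processChunkWithRetry(h, resultsCh)
		}(half)
	}
	wg.Wait()
}

func main() {
	blocks := []uint64{10, 11, 12, 13, 14, 15, 16, 17}
	// Buffer by total block count: each block succeeds exactly once, so at
	// most len(blocks) sends ever happen and no concurrent reader is needed.
	resultsCh := make(chan []blockResult, len(blocks))
	processChunkWithRetry(blocks, resultsCh)
	close(resultsCh)

	var all []blockResult
	for batch := range resultsCh {
		all = append(all, batch...)
	}
	sort.Slice(all, func(i, j int) bool { return all[i].number < all[j].number })
	for _, r := range all {
		fmt.Println("block", r.number)
	}
}
```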
Actionable comments posted: 1
🧹 Nitpick comments (2)
internal/worker/worker.go (2)
26-33: Consider moving the delay outside the recursive call stack

Placing the time.Sleep in a defer means every recursive invocation – including those created solely for retries on a single block – will incur the full BatchDelay. With deep splitting this can quickly multiply the total wall-clock time by O(log n) (or even O(n) in the worst case of all failures), although the actual RPC fan-out already throttles the provider.

If the intent is simply to rate-limit requests to the provider, consider moving the delay:
- Just before the top-level GetFullBlocks call, instead of in a defer, so it executes once per network round-trip.
- Or guard it with `if len(chunk) == originalChunkSize { … }` to avoid stacking delays on leaf retries.

This keeps the throttle while preventing unnecessary cumulative sleep on recursive splits. A standalone sketch of the guarded variant follows.
118-121: Sorting logic looks correct, but big.Int → float64 lossiness remains downstream

The added sort by BlockNumber is spot-on. Note that a few lines below (unchanged) the largest block is converted to float64, which silently loses precision after ~2⁵³. Consider using an integer gauge or exposing the value as a label/string instead.
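For reference, a small standard-library-only demonstration of the ~2⁵³ cutoff mentioned above: the exact big.Int value survives as a string, while the float64 conversion silently rounds it.

```go
package main

import (
	"fmt"
	"math/big"
)

func main() {
	// 2^53 + 1 is the first integer a float64 cannot represent exactly.
	n := new(big.Int).Add(new(big.Int).Lsh(big.NewInt(1), 53), big.NewInt(1))

	f, _ := new(big.Float).SetInt(n).Float64()

	fmt.Println("big.Int value:  ", n.String()) // 9007199254740993
	fmt.Printf("as float64:      %.0f\n", f)    // 9007199254740992 (precision lost)
	fmt.Println("as string label:", n.String()) // exact, safe for a metric label
}
```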
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (1)
- internal/worker/worker.go (3 hunks)

🧰 Additional context used
🧬 Code Graph Analysis (1)
internal/worker/worker.go (3)
- internal/rpc/rpc.go (1): GetFullBlockResult (18-22)
- configs/config.go (1): Cfg (170-170)
- internal/common/utils.go (1): SliceToChunks (10-23)
🔇 Additional comments (1)
internal/worker/worker.go (1)
96-97: Good catch on channel buffering

Switching the buffer from chunk count to the total block count prevents deadlocks when retries emit more result batches than the original chunking.
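A tiny sketch of the buffering argument, assuming each block's result is sent at most once: sizing the buffer to the total block count covers the worst case where every block arrives as its own batch, so producers never block waiting for a reader.

```go
package main

import "fmt"

func main() {
	blockNumbers := []uint64{100, 101, 102, 103}

	// With a buffer equal to the original chunk count (say 2), the later
	// sends below would block forever, since nothing reads until all
	// producers finish. A buffer of len(blockNumbers) always suffices.
	resultsCh := make(chan []uint64, len(blockNumbers))

	// Worst case after retries: every block arrives as its own result batch.
	for _, b := range blockNumbers {
		resultsCh <- []uint64{b}
	}
	close(resultsCh)

	for batch := range resultsCh {
		fmt.Println("result batch:", batch)
	}
}
```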
```go
	// Split failed blocks in half and retry
	mid := len(failedBlocks) / 2
	leftChunk := failedBlocks[:mid]
	rightChunk := failedBlocks[mid:]

	log.Debug().Msgf("Splitting %d failed blocks into chunks of %d and %d", len(failedBlocks), len(leftChunk), len(rightChunk))

	var wg sync.WaitGroup
	wg.Add(2)

	go func() {
		defer wg.Done()
		w.processChunkWithRetry(leftChunk, resultsCh)
	}()

	go func() {
		defer wg.Done()
		w.processChunkWithRetry(rightChunk, resultsCh)
	}()

	wg.Wait()
}
```
🛠️ Refactor suggestion

Potential goroutine explosion when many blocks fail

Every failure split spawns two new goroutines and waits (wg.Wait()). For k consecutively failing blocks this creates 2k-1 goroutines (binary tree), which can reach thousands under a bad RPC outage and may:
- Exceed the Go scheduler's default thread limits.
- Create unnecessary network pressure once the RPC endpoint comes back.

Recommend introducing a simple concurrency limiter (token/semaphore) or using a worker pool instead of spawning unbounded goroutines, e.g.:
```diff
 type Worker struct {
 	rpc rpc.IRPCClient
+	sem chan struct{} // capacity == max concurrent RPCs
 }
+
+func (w *Worker) acquire() { w.sem <- struct{}{} }
+func (w *Worker) release() { <-w.sem }

 // before launching a goroutine
-wg.Add(2)
-go func() { … }()
+wg.Add(2)
+go func() {
+	w.acquire()
+	defer w.release()
+	…
+}()
```
This keeps memory & CPU predictable even under sustained failure scenarios.
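A self-contained sketch of the limiter idea. Unlike the diff above, it scopes the semaphore slot to the (simulated) RPC call itself rather than the whole goroutine, so a parent never holds a slot while waiting on its children, which could otherwise deadlock once the split tree is deeper than the capacity. The capacity of 4 is arbitrary.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// sem is a counting semaphore: at most cap(sem) simulated RPC calls run at once.
var sem = make(chan struct{}, 4)

// processChunk splits "failed" chunks recursively, like the worker does, but
// holds a semaphore slot only around the fetch itself.
func processChunk(chunk []uint64) {
	sem <- struct{}{} // acquire
	fmt.Println("fetching", chunk)
	time.Sleep(10 * time.Millisecond) // simulated RPC latency
	<-sem // release before waiting on children

	if len(chunk) <= 1 {
		return
	}

	// Pretend the batch failed and must be split; RPC concurrency stays bounded.
	mid := len(chunk) / 2
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); processChunk(chunk[:mid]) }()
	go func() { defer wg.Done(); processChunk(chunk[mid:]) }()
	wg.Wait()
}

func main() {
	processChunk([]uint64{1, 2, 3, 4, 5, 6, 7, 8})
}
```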
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In internal/worker/worker.go around lines 68 to 89, the current approach spawns
two new goroutines for each failure split without limit, causing exponential
growth in goroutines under repeated failures. To fix this, implement a
concurrency limiter such as a semaphore or token bucket to restrict the number
of concurrent goroutines processing chunks. Alternatively, refactor to use a
fixed-size worker pool that processes failed blocks from a queue, ensuring
controlled concurrency and preventing resource exhaustion during sustained
failure scenarios.
TL;DR
Implemented a retry mechanism for failed block fetches with automatic chunk splitting and added result sorting.
What changed?
- Added a new processChunkWithRetry function that fetches a chunk of blocks, forwards successful results, and retries failed blocks by splitting them into smaller batches, down to single-block requests.
- Updated the Run method to use the new retry mechanism and to sort the collected results by block number.
How to test?
Why make this change?
This change improves the reliability of block fetching by automatically retrying failed requests with smaller batch sizes. The previous implementation would fail an entire batch if any block in it failed, leading to missing data. The new approach maximizes data retrieval by isolating problematic blocks and ensuring the results are returned in a consistent order regardless of how they were processed.