retry fetching block data with decreasing batch size #204

Merged

Conversation

@iuwqyir iuwqyir commented May 19, 2025

TL;DR

Implemented a retry mechanism for failed block fetches with automatic chunk splitting and added result sorting.

What changed?

  • Added a new processChunkWithRetry function (a sketch follows this list) that:
    • Processes blocks in chunks and detects failed requests
    • Splits failed blocks into smaller chunks and retries them recursively
    • Continues splitting until successful or reaching single block chunks
    • Sends successful results to the results channel as they're processed
  • Modified the Run method to use the new retry mechanism
  • Added sorting of results by block number before returning
  • Removed the conditional sleep and made it consistent after each chunk processing
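
A minimal sketch of that flow, for orientation; GetFullBlocks, BatchDelay, and the Error and BlockNumber fields mirror names that come up in the review below, but their exact signatures here are assumptions, not the repository's code:

// Assumed imports: "math/big", "sync", "time", and the repo's internal rpc package.
func (w *Worker) processChunkWithRetry(chunk []*big.Int, resultsCh chan<- rpc.GetFullBlockResult) {
	// The delay is applied consistently after every chunk (see "What changed?").
	defer time.Sleep(batchDelay) // batchDelay stands in for the configured BatchDelay

	results := w.rpc.GetFullBlocks(chunk) // assumed: one result per requested block

	// Base case: a single block is never split further; forward whatever came back.
	if len(chunk) == 1 {
		resultsCh <- results[0]
		return
	}

	// Forward successes immediately; collect failures for a retry.
	var failedBlocks []*big.Int
	for _, result := range results {
		if result.Error == nil {
			resultsCh <- result
		} else {
			failedBlocks = append(failedBlocks, result.BlockNumber)
		}
	}
	if len(failedBlocks) == 0 {
		return
	}
	if len(failedBlocks) == 1 {
		w.processChunkWithRetry(failedBlocks, resultsCh) // retry the lone block as a size-1 chunk
		return
	}

	// Split the failed blocks in half and retry both halves concurrently.
	mid := len(failedBlocks) / 2
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); w.processChunkWithRetry(failedBlocks[:mid], resultsCh) }()
	go func() { defer wg.Done(); w.processChunkWithRetry(failedBlocks[mid:], resultsCh) }()
	wg.Wait()
}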

How to test?

  1. Run the application with a mix of valid and invalid block numbers
  2. Verify that all valid blocks are fetched successfully, even when part of a batch with invalid blocks
  3. Confirm that the results are properly sorted by block number
  4. Check logs for "Splitting failed blocks" messages to verify the retry mechanism is working

Why make this change?

This change improves the reliability of block fetching by automatically retrying failed requests with smaller batch sizes. The previous implementation would fail an entire batch if any block in it failed, leading to missing data. The new approach maximizes data retrieval by isolating problematic blocks and ensuring the results are returned in a consistent order regardless of how they were processed.

Summary by CodeRabbit

  • New Features
    • Improved reliability of block fetching with automatic retries for failed attempts.
  • Enhancements
    • Results are now delivered more promptly for successfully fetched blocks, even if others fail.
    • Improved sorting of fetched blocks for more consistent output.

@iuwqyir iuwqyir requested a review from catalyst17 May 19, 2025 14:15
@iuwqyir iuwqyir marked this pull request as ready for review May 19, 2025 14:15
Contributor

@catalyst17 catalyst17 left a comment

nice one!

@iuwqyir iuwqyir force-pushed the 05-19-retry_fetching_block_data_with_decreasing_batch_size branch from 1ac61b8 to 4f2ca9a Compare May 20, 2025 09:08
@iuwqyir iuwqyir force-pushed the 05-19-change_sql_script_position_so_that_it_is_run_first branch 2 times, most recently from e226f96 to 3d6fb55 Compare May 21, 2025 06:54
@iuwqyir iuwqyir force-pushed the 05-19-retry_fetching_block_data_with_decreasing_batch_size branch from 4f2ca9a to c692c5e Compare May 21, 2025 06:54

coderabbitai bot commented May 21, 2025

Walkthrough

A new method, processChunkWithRetry, was introduced on the Worker struct to implement a retry mechanism for block fetching. The Run method now uses it for chunk processing, and the results channel buffer size was adjusted. Results are sorted by block number before returning; metrics updates are unchanged.

Changes

File(s): internal/worker/worker.go
Change summary: Added processChunkWithRetry method for block fetch retries; updated Run to use it, changed results channel buffering, and sorted results by block number.

Sequence Diagram(s)

sequenceDiagram
    participant Worker
    participant RPC
    participant ResultsChannel

    Worker->>Worker: processChunkWithRetry(chunk, resultsCh)
    alt chunk size == 1
        Worker->>RPC: Fetch block via RPC
        RPC-->>Worker: Block result
        Worker->>ResultsChannel: Send result
    else chunk size > 1
        Worker->>RPC: Fetch all blocks in chunk
        RPC-->>Worker: Block results (some may fail)
        alt All succeeded
            Worker->>ResultsChannel: Send all results
        else Some failed
            Worker->>Worker: Retry failed blocks (possibly split & concurrent)
            Worker->>ResultsChannel: Send successful results as available
        end
    end
sequenceDiagram
    participant Worker
    participant ResultsChannel

    Worker->>Worker: Run()
    loop For each chunk
        Worker->>Worker: processChunkWithRetry(chunk, resultsCh)
    end
    Worker->>Worker: Collect all results from resultsCh
    Worker->>Worker: Sort results by block number
    Worker-->>Caller: Return sorted results
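
Since block numbers are big integers, the final sort presumably compares them with Cmp rather than casting to a native type; a minimal sketch, assuming rpc.GetFullBlockResult exposes a *big.Int BlockNumber field (assumed name) and using the standard sort package:

// Sort collected results ascending by block number (big.Int-safe comparison).
sort.Slice(results, func(i, j int) bool {
	return results[i].BlockNumber.Cmp(results[j].BlockNumber) < 0
})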


@iuwqyir iuwqyir changed the base branch from 05-19-change_sql_script_position_so_that_it_is_run_first to graphite-base/204 May 21, 2025 06:55
@iuwqyir iuwqyir force-pushed the 05-19-retry_fetching_block_data_with_decreasing_batch_size branch from c692c5e to afbb7e0 Compare May 21, 2025 06:56
@graphite-app graphite-app bot changed the base branch from graphite-base/204 to main May 21, 2025 06:56
@iuwqyir iuwqyir force-pushed the 05-19-retry_fetching_block_data_with_decreasing_batch_size branch from afbb7e0 to 78e2b76 Compare May 21, 2025 06:56
@iuwqyir iuwqyir merged commit c8cf165 into main May 21, 2025
5 of 6 checks passed
@iuwqyir iuwqyir deleted the 05-19-retry_fetching_block_data_with_decreasing_batch_size branch May 21, 2025 06:58

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
internal/worker/worker.go (2)

26-33: Consider moving the delay outside the recursive call stack

Placing the time.Sleep in a defer means every recursive invocation – including those created solely for retries on a single block – will incur the full BatchDelay.
With deep splitting, this can quickly multiply the total wall-clock time by O(log n) (or even O(n) in the worst case where every block fails), although the actual RPC fan-out already throttles the provider.

If the intent is simply to rate-limit requests to the provider, consider moving the delay:

  1. Just before the top-level GetFullBlocks call, instead of in defer, so it executes once per network round-trip.
  2. Or guard it with if len(chunk) == originalChunkSize { … } to avoid stacking delays on leaf retries.

This keeps the throttle while preventing unnecessary cumulative sleep on recursive splits.
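
Option 2 could be as small as the guard below, where originalChunkSize is a hypothetical parameter recording the size of the chunk Run originally dispatched:

// Sleep only after top-level chunks; recursive retry splits skip the delay.
if len(chunk) == originalChunkSize {
	defer time.Sleep(batchDelay) // batchDelay stands in for the configured BatchDelay
}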


118-121: Sorting logic looks correct, but big.Int → float64 lossiness remains downstream

The added sort by BlockNumber is spot-on.
Note that a few lines below (unchanged) the largest block is converted to float64, which silently loses precision after ~2⁵³. Consider using an integer gauge or exposing the value as a label/string instead.
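
The cutoff is easy to demonstrate: float64 has a 53-bit significand, so adjacent integers above 2⁵³ collapse to the same value. For example:

// Both conversions yield exactly 9007199254740992; the +1 is silently lost.
a := float64(uint64(1) << 53)
b := float64(uint64(1)<<53 + 1)
fmt.Println(a == b) // prints true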

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between ee125fc and 78e2b76.

📒 Files selected for processing (1)
  • internal/worker/worker.go (3 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
internal/worker/worker.go (3)
internal/rpc/rpc.go (1)
  • GetFullBlockResult (18-22)
configs/config.go (1)
  • Cfg (170-170)
internal/common/utils.go (1)
  • SliceToChunks (10-23)
🔇 Additional comments (1)
internal/worker/worker.go (1)

96-97: Good catch on channel buffering

Switching the buffer from chunk count to the total block count prevents deadlocks when retries emit more result batches than the original chunking.
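
Concretely, the sizing amounts to something like the following (the blockNumbers name is assumed):

// One buffered slot per requested block, so no send can block a retry
// goroutine even if every block is eventually emitted individually.
resultsCh := make(chan rpc.GetFullBlockResult, len(blockNumbers))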

Comment on lines +68 to +89
// Split failed blocks in half and retry
mid := len(failedBlocks) / 2
leftChunk := failedBlocks[:mid]
rightChunk := failedBlocks[mid:]

log.Debug().Msgf("Splitting %d failed blocks into chunks of %d and %d", len(failedBlocks), len(leftChunk), len(rightChunk))

var wg sync.WaitGroup
wg.Add(2)

go func() {
	defer wg.Done()
	w.processChunkWithRetry(leftChunk, resultsCh)
}()

go func() {
	defer wg.Done()
	w.processChunkWithRetry(rightChunk, resultsCh)
}()

wg.Wait()
}

🛠️ Refactor suggestion

Potential goroutine explosion when many blocks fail

Every failure split spawns two new goroutines and waits (wg.Wait()).
For k consecutively failing blocks this creates 2k − 1 goroutines (a full binary tree over the failures; 1,024 failing blocks alone spawn 2,047 goroutines), which can reach thousands under a bad RPC outage and may:

  • Exceed the Go scheduler’s default thread limits.
  • Create unnecessary network pressure once the RPC endpoint comes back.

Recommend introducing a simple concurrency limiter (token/semaphore) or using a worker-pool instead of spawning unbounded goroutines, e.g.:

 type Worker struct {
     rpc rpc.IRPCClient
+    sem  chan struct{} // capacity == max concurrent RPCs
 }
 
 func (w *Worker) acquire() { w.sem <- struct{}{} }
 func (w *Worker) release() { <-w.sem }

 // before launching a goroutine
-wg.Add(2)
-go func() { … }()
+wg.Add(2)
+go func() {
+    w.acquire()
+    defer w.release()
+
+}()

This keeps memory & CPU predictable even under sustained failure scenarios.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In internal/worker/worker.go around lines 68 to 89, the current approach spawns
two new goroutines for each failure split without limit, causing exponential
growth in goroutines under repeated failures. To fix this, implement a
concurrency limiter such as a semaphore or token bucket to restrict the number
of concurrent goroutines processing chunks. Alternatively, refactor to use a
fixed-size worker pool that processes failed blocks from a queue, ensuring
controlled concurrency and preventing resource exhaustion during sustained
failure scenarios.
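
One caveat with holding a token across the recursive call: a parent blocked in wg.Wait() keeps its slot while its children queue for slots of their own, which can deadlock at small capacities. Scoping the token to the RPC round-trip alone avoids that. A sketch with illustrative names (not the PR's code):

// Assumed imports: "math/big" and the repo's internal rpc package.
type Worker struct {
	rpc rpc.IRPCClient
	sem chan struct{} // counting semaphore, e.g. make(chan struct{}, 16)
}

// fetchWithLimit caps concurrent RPC calls without holding a token across
// wg.Wait(), so recursive splits cannot starve each other of slots.
func (w *Worker) fetchWithLimit(chunk []*big.Int) []rpc.GetFullBlockResult {
	w.sem <- struct{}{}        // acquire a slot; blocks when at capacity
	defer func() { <-w.sem }() // release the slot once the fetch returns
	return w.rpc.GetFullBlocks(chunk)
}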
