Skip to content

[ENH]: Add spann metrics #4492

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
May 12, 2025
Merged

[ENH]: Add spann metrics #4492

merged 5 commits into from
May 12, 2025

Conversation

sanketkedia
Copy link
Contributor

@sanketkedia sanketkedia commented May 8, 2025

Description of changes

Summarize the changes made by this PR.

  • Improvements & Bug fixes
    • Adds metrics for various operations during compaction
  • New functionality
    • ...

Test plan

How are these changes tested?

  • Tests pass locally with pytest for python, yarn test for js, cargo test for rust

Documentation Changes

None

@sanketkedia sanketkedia changed the title add spann metrics [ENH]: Add spann metrics May 8, 2025
Copy link
Contributor Author

This stack of pull requests is managed by Graphite. Learn more about stacking.

Copy link

github-actions bot commented May 8, 2025

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of a unexpectedly high quality (Readability, Modularity, Intuitiveness)

Copy link

github-actions bot commented May 8, 2025

Please tag your PR title with one of: [ENH | BUG | DOC | TST | BLD | PERF | TYP | CLN | CHORE]. See https://docs.trychroma.com/contributing#contributing-code-and-ideas

@sanketkedia sanketkedia marked this pull request as ready for review May 8, 2025 05:55
Copy link
Contributor

propel-code-bot bot commented May 8, 2025

Instrumentation of SPANN Operations with Detailed Metrics

This PR adds comprehensive metrics instrumentation to SPANN index operations, particularly during compaction processes. It implements counters and histograms for various operations like reassignments, splits, head creation/deletion, and measures latency for key operations. The PR moves stopwatch functionality to a shared utility and ensures metrics are properly tracked throughout the compaction lifecycle.

Key Changes:
• Added SpannMetrics struct with counters and histograms for all SPANN operations
• Added WriteStats struct to track atomic counters during index operations
• Instrumented code paths to record metrics at key points in compaction workflow
• Added detailed latency tracking for garbage collection, commit, and flush operations
• Refactored Stopwatch utility to be reusable across crates

Affected Areas:
• rust/index/src/spann/types.rs
• rust/tracing/src/util.rs
• rust/cache/src/foyer.rs
• rust/segment/src/distributed_spann.rs
• rust/blockstore/src/arrow/flusher.rs
• rust/blockstore/src/types/flusher.rs

Potential Impact:

Functionality: No changes to core functionality, but enhances observability of SPANN operations

Performance: Minimal overhead from metrics collection, with performance data now available for analysis

Security: No security implications

Scalability: Helps diagnose scalability issues by providing visibility into compaction bottlenecks

Review Focus:
• Implementation of Stopwatch utility and its usage patterns
• Placement of metrics collection points in code paths
• Appropriate use of atomic operations for counter increments
• Naming and organization of metrics for discoverability

Testing Needed

• Verify metrics are properly emitted during compaction operations
• Check that latency measurements accurately reflect operation durations
• Ensure metrics don't add significant overhead to SPANN operations

Code Quality Assessment

rust/tracing/src/util.rs: Good encapsulation of the Stopwatch functionality

rust/blockstore/src/types/flusher.rs: Potential issue with unimplemented!() in num_entries() method for MemoryBlockfileFlusher

rust/index/src/spann/types.rs: File has become significantly larger and could benefit from refactoring into multiple files

Best Practices

Metrics:
• Each operation has a dedicated counter
• Latency measurements use histograms
• Consistent attribute tagging with collection_id

Rust:
• Using Arc<AtomicU32/64> for thread-safe counters
• Appropriate use of shared utility code
• Proper implementation of Drop trait for automatic metric recording

Potential Issues

• The num_entries() method uses unimplemented!() for MemoryBlockfileFlusher which could cause runtime panics
• Frequent string conversions for collection_id may add unnecessary overhead
• Tracking multiple sub-categories of reassigns may create too many metrics

This summary was automatically generated by @propel-code-bot


pub fn num_blocks(&self) -> usize {
match self {
BlockfileFlusher::MemoryBlockfileFlusher(_) => unimplemented!(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[CriticalError]

The num_blocks method for BlockfileFlusher::MemoryBlockfileFlusher uses unimplemented!(). If a SpannIndexFlusher (from chroma-index) is created with MemoryBlockfileFlusher instances, which is possible if SPANN uses in-memory blockfiles, calling flush() on SpannIndexFlusher will invoke num_blocks() on these memory flushers (e.g., in SpannIndexFlusher::flush when calculating num_pl_blocks_flushed). This will lead to a panic.

Consider returning a value that is semantically correct for in-memory flushers. If "flushed blocks" primarily refers to units written to persistent storage, 0 might be appropriate. If it represents a conceptual block, 1 could be an alternative. Returning 0 seems safer to prevent panics and align with the idea of blocks flushed to disk.

Suggested change
BlockfileFlusher::MemoryBlockfileFlusher(_) => unimplemented!(),
BlockfileFlusher::MemoryBlockfileFlusher(_) => 0,

Committable suggestion

Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

@sanketkedia sanketkedia requested a review from HammadB May 8, 2025 06:11
@sanketkedia sanketkedia force-pushed the 05-06-add_spann_metrics branch from c45b732 to 81006ee Compare May 8, 2025 16:41
@sanketkedia sanketkedia force-pushed the 05-06-add_spann_metrics branch from 81006ee to 4dd9814 Compare May 10, 2025 01:06
@sanketkedia sanketkedia requested a review from rescrv May 10, 2025 01:07
@sanketkedia sanketkedia force-pushed the 05-06-add_spann_metrics branch from 4dd9814 to d3ac39a Compare May 12, 2025 18:54
@sanketkedia sanketkedia enabled auto-merge (squash) May 12, 2025 20:20
@sanketkedia sanketkedia merged commit e03a07b into main May 12, 2025
71 checks passed
narner pushed a commit to narner/chroma that referenced this pull request May 14, 2025
## Description of changes

_Summarize the changes made by this PR._

- Improvements & Bug fixes
  - Adds metrics for various operations during compaction
- New functionality
  - ...

## Test plan

_How are these changes tested?_
- [x] Tests pass locally with `pytest` for python, `yarn test` for js,
`cargo test` for rust

## Documentation Changes
None
itaismith pushed a commit that referenced this pull request May 23, 2025
## Description of changes

_Summarize the changes made by this PR._

- Improvements & Bug fixes
  - Adds metrics for various operations during compaction
- New functionality
  - ...

## Test plan

_How are these changes tested?_
- [x] Tests pass locally with `pytest` for python, `yarn test` for js,
`cargo test` for rust

## Documentation Changes
None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants