Skip to content

Add API to dump memory usage #916

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jun 20, 2025

Conversation

ibraheemdev
Copy link
Contributor

@ibraheemdev ibraheemdev commented Jun 16, 2025

This PR adds the methods <dyn Database>::{structs_info, queries_info}, which allow accessing memory usage information about salsa structs and memos. In ty, this lets us generate a report that looks like this:

=======SALSA DUMP=======
`Definition`                                       metadata=7.21MB   fields=14.41MB  count=180134
`Expression`                                       metadata=4.42MB   fields=4.42MB   count=92086
`member_lookup_with_policy_::interned_arguments`   metadata=1.94MB   fields=2.21MB   count=34566
`class_member_with_policy_::interned_arguments`    metadata=1.56MB   fields=1.78MB   count=27820
`IntersectionType`                                 metadata=0.75MB   fields=1.50MB   count=13350
`OverloadLiteral`                                  metadata=1.19MB   fields=0.85MB   count=21181
...
`Definition -> ty_python_semantic::types::infer::TypeInference`
    metadata=24.41MB  fields=32.41MB  count=144690
`Expression -> ty_python_semantic::types::infer::TypeInference`
    metadata=28.34MB  fields=17.95MB  count=80115
`ScopeId -> ty_python_semantic::types::infer::TypeInference`
    metadata=14.34MB  fields=6.31MB   count=28170
`member_lookup_with_policy_::interned_arguments -> ty_python_semantic::place::PlaceAndQualifiers`
    metadata=4.92MB   fields=1.66MB   count=34566
...

One current limitation is that we cannot provide the name of the query function, only it's input and output types. The other is that this only reports the stack size of any struct fields, which makes this a lot less useful for anything even wrapped trivially in a Box or Arc. Ideally we would have some way of asking a type for it's heap allocation size (maybe using something like the get-size crate), but it would likely have to be opt-in to avoid breakage, and I'm not sure exactly how that would look (maybe a macro attribute, or feature flag?).

I'm not sure how others have been profiling salsa-related memory usage, but the above report has been extremely useful for us. Also unsure if this should be under the salsa_unstable feature.

Copy link

netlify bot commented Jun 16, 2025

Deploy Preview for salsa-rs canceled.

Name Link
🔨 Latest commit 10ce744
🔍 Latest deploy log https://app.netlify.com/projects/salsa-rs/deploys/685594bcba54f3000861b286

Copy link

codspeed-hq bot commented Jun 16, 2025

CodSpeed Performance Report

Merging #916 will not alter performance

Comparing ibraheemdev:ibraheem/memory-usage-dump (10ce744) with master (87a730f)

Summary

✅ 12 untouched benchmarks

@ibraheemdev ibraheemdev force-pushed the ibraheem/memory-usage-dump branch from 4a3497a to e1293de Compare June 16, 2025 17:48
@ibraheemdev ibraheemdev force-pushed the ibraheem/memory-usage-dump branch from e1293de to 090523a Compare June 16, 2025 17:59
@MichaReiser
Copy link
Contributor

Sorry, I didn't get to reviewing this today. I plan to review tomorrow. Note, there's a test failure on 1.85

Copy link
Contributor

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great. I think it would be good to gate this behind the unstable feature for now.

It would be nice if we could extract the query name somehow. I left a few ideas but not surehow (and I'm not very familiar with that part of the code). Maybe @Veykril or @davidbarsky have an idea of how we could do that?

src/database.rs Outdated
///
/// The returned map holds memory usage information for memoized values of a given query, keyed
/// by its `(input, output)` type names.
pub fn queries_info(&self) -> HashMap<(&'static str, &'static str), IngredientInfo> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a concrete proposal but I was just wondering if we could somehow use Zalsa::memo_ingredient_indices or maybe the jar_map (by indexing with the erased type id) to get the query ingredient index.

src/database.rs Outdated
@@ -132,4 +134,99 @@ impl dyn Database {
let views = self.zalsa().views();
views.downcaster_for().downcast(self)
}

/// Returns information about any Salsa structs.
pub fn structs_info(&self) -> Vec<IngredientInfo> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe call this ingredients?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not all ingredients, it's only tracked/interned/input structs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That’s the big three, no? We’d be missing supertypes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Queries are also ingredients but they're split out from this method.

Internally there's also a bunch of other things that we call ingredients, but I'm not sure that's relevant for this API (tracked struct fields, etc.).

@davidbarsky
Copy link
Contributor

It would be nice if we could extract the query name somehow. I left a few ideas but not sure
how.

I'm trying to remember, but I also remember struggling with this when I last touched this.

@MichaReiser
Copy link
Contributor

@carljm any chance you know how to get the query name?

@carljm
Copy link
Contributor

carljm commented Jun 18, 2025

@carljm any chance you know how to get the query name?

Not off the top of my head, no, I'd need to spend some time looking at it.

@ibraheemdev
Copy link
Contributor Author

ibraheemdev commented Jun 18, 2025

The problem is that memos are stored on the input ingredients, which are completely disconnected from the query ingredient. I think the only way to do this would be to store the type name on MemoEntryType, which is probably not something we want to do (except maybe under a feature flag?).

@MichaReiser MichaReiser enabled auto-merge June 20, 2025 17:08
@MichaReiser MichaReiser added this pull request to the merge queue Jun 20, 2025
Merged via the queue into salsa-rs:master with commit c145596 Jun 20, 2025
12 checks passed
@github-actions github-actions bot mentioned this pull request Jun 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants