Replace ingredient cache with faster ingredient map #921
Conversation
CodSpeed Performance Report: Merging #921 will degrade performance by 11.19%.
Hmm, looks like this may not be good enough... I guess we could keep the
This is great and I prefer it over the other PR not just because it's much faster but also because it doesn't require using raw-api.
I do think it makes sense to keep the secondary. A 10% regression for the most common case seems like a lot.
Main

single
ty_walltime         fastest  │ slowest  │ median   │ mean     │ samples │ iters
╰─ small            │        │          │          │          │         │
   ╰─ pydantic      296.6 ms │ 305.5 ms │ 299.5 ms │ 300.5 ms │ 3       │ 3

multi
ty_walltime         fastest  │ slowest  │ median   │ mean     │ samples │ iters
╰─ small            │        │          │          │          │         │
   ╰─ pydantic      69.42 ms │ 612.2 ms │ 605.8 ms │ 429.1 ms │ 3       │ 3

This PR

single
ty_walltime         fastest  │ slowest  │ median   │ mean     │ samples │ iters
╰─ small            │        │          │          │          │         │
   ╰─ pydantic      303.8 ms │ 322.4 ms │ 305.9 ms │ 310.7 ms │ 3       │ 3

multi
ty_walltime         fastest  │ slowest  │ median   │ mean     │ samples │ iters
╰─ small            │        │          │          │          │         │
   ╰─ pydantic      55.83 ms │ 79.95 ms │ 57.73 ms │ 64.5 ms  │ 3       │ 3
Overall: the multi-threading regression is now about the same (~10%) as the single-threaded regression when using multiple databases.
Hmm, it does seem that shuttle got stuck somewhere...
The other PR doesn't actually need the raw-api; that just made it easier to write the shuttle shim. It looks like we'll need a shuttle shim for this one as well. I've now realized the shim can just return a copy of the value instead of having to mimic the guard API.
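A minimal sketch of that idea, assuming a `shuttle` feature flag like the existing sync shims; the module and type names here are illustrative, not the actual salsa code:

// Hypothetical shuttle shim: rather than mimicking papaya's guard-based API,
// the shuttle variant wraps a plain HashMap in a shuttle mutex and has `get`
// return the value by copy, which is all the ingredient map needs.
#[cfg(feature = "shuttle")]
mod shim {
    use std::collections::HashMap;
    use std::hash::Hash;

    pub struct IngredientMap<K, V> {
        inner: shuttle::sync::Mutex<HashMap<K, V>>,
    }

    impl<K: Eq + Hash, V: Copy> IngredientMap<K, V> {
        pub fn new() -> Self {
            Self {
                inner: shuttle::sync::Mutex::new(HashMap::new()),
            }
        }

        /// Returns a copy of the value instead of a guard tied to the map.
        pub fn get(&self, key: &K) -> Option<V> {
            self.inner.lock().unwrap().get(key).copied()
        }

        pub fn insert(&self, key: K, value: V) {
            self.inner.lock().unwrap().insert(key, value);
        }
    }
}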
It might also be worth adding specific benchmarks to compare running with a second database, because right now (for ty_walltime) we are assuming that the fastest run is the one with the first database. At least on my machine it wasn't clear that was the case; the gap was a lot closer and within the noise.
Did you change the sample count to 1?
Yes. Did you run those benchmarks on this PR directly? Because with the
I did see #921 (review). The results are fairly consistent, with the first run being slightly slower. It was mainly apparent in the multi-threaded case. But you can try running some end-to-end benchmarks (that call the CLI) to get a better sense for the regression.
I'd expect that removing ... I'd really prefer to keep the fast path for the happy case, given that's what usually matters for applications here (at least for r-a, we only ever have one database in production) 😕
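As a rough illustration of the fast path being discussed (the cache type name is borrowed from the PR description, but the body is a simplification, not salsa's actual code): the per-struct cache answers repeat lookups with a single atomic load, and only a cold cache falls through to the shared concurrent jar map.

use std::sync::OnceLock;

// Illustrative stand-in for an ingredient index.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
struct IngredientIndex(u32);

// The "secondary" cache: one static per generated struct/query. Once primed,
// the happy path is a single atomic load instead of a lookup in the shared
// concurrent jar map (papaya in this PR, a locked FxHashMap before it).
struct IngredientCache {
    cached: OnceLock<IngredientIndex>,
}

impl IngredientCache {
    const fn new() -> Self {
        Self { cached: OnceLock::new() }
    }

    // `create` performs the shared-map lookup (registering the jar if needed)
    // and only runs on a cold cache. Salsa's real cache additionally has to
    // notice when it was primed by a *different* database, which is the slow
    // path the multi-database benchmarks above exercise.
    fn get_or_create(&self, create: impl FnOnce() -> IngredientIndex) -> IngredientIndex {
        *self.cached.get_or_init(create)
    }
}

fn main() {
    static CACHE: IngredientCache = IngredientCache::new();
    assert_eq!(CACHE.get_or_create(|| IngredientIndex(0)), IngredientIndex(0));
}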
There's no lock for the read path here, but yeah, I agree. I was just curious to see how close we could get without special-casing the single-database case (and it seems like we got pretty close!). I updated the code and added back the
Hmm, the benchmarks create a new database on every run, so they are measuring the slow path here; I guess the regression is expected.
src/zalsa.rs
Outdated
#[doc(hidden)]
#[inline]
pub fn lookup_jar_by_type<J: Jar>(&self) -> Option<IngredientIndex> {
    let jar_type_id = TypeId::of::<J>();
    self.jar_map.get(&jar_type_id, &self.jar_map.guard())
}
This is a bit unfortunate, that we need a new method for this. I guess it shouldn't matter in the fast path, where we always exit early, but it does mean that `self.jar_map.get` is called three times in the slow path of a query (and that might affect inlining).

Could we rewrite this to an entry-style API, where `jar_by_type` returns an `Entry` struct that has an `is_add` function (or similar) and a `get` to get the value (which might insert it)?
Entry API
Subject: [PATCH] Only yield if all heads result in a cycle. Retry if even just one inner cycle made progress (in which case there's probably a new memo)
---
Index: src/accumulator.rs
===================================================================
diff --git a/src/accumulator.rs b/src/accumulator.rs
--- a/src/accumulator.rs (revision 32be70bbbf53822e97077c22549c917318056051)
+++ b/src/accumulator.rs (date 1750920458682)
@@ -64,7 +64,7 @@
impl<A: Accumulator> IngredientImpl<A> {
/// Find the accumulator ingredient for `A` in the database, if any.
pub fn from_zalsa(zalsa: &Zalsa) -> Option<&Self> {
- let index = zalsa.add_or_lookup_jar_by_type::<JarImpl<A>>();
+ let index = zalsa.lookup_jar_by_type::<JarImpl<A>>().get();
let ingredient = zalsa.lookup_ingredient(index).assert_type::<Self>();
Some(ingredient)
}
Index: src/zalsa.rs
===================================================================
diff --git a/src/zalsa.rs b/src/zalsa.rs
--- a/src/zalsa.rs (revision 32be70bbbf53822e97077c22549c917318056051)
+++ b/src/zalsa.rs (date 1750920358014)
@@ -299,23 +299,19 @@
/// **NOT SEMVER STABLE**
#[doc(hidden)]
#[inline]
- pub fn lookup_jar_by_type<J: Jar>(&self) -> Option<IngredientIndex> {
- let jar_type_id = TypeId::of::<J>();
- self.jar_map.get(&jar_type_id, &self.jar_map.guard())
- }
-
- /// **NOT SEMVER STABLE**
- #[doc(hidden)]
- #[inline]
- pub fn add_or_lookup_jar_by_type<J: Jar>(&self) -> IngredientIndex {
+ pub fn lookup_jar_by_type<J: Jar>(&self) -> LookupJarByType<'_, J> {
let jar_type_id = TypeId::of::<J>();
-
let guard = self.jar_map.guard();
if let Some(index) = self.jar_map.get(&jar_type_id, &guard) {
- return index;
+ return LookupJarByType::Registered(index);
};
- self.add_or_lookup_jar_by_type_slow::<J>(jar_type_id, guard)
+ LookupJarByType::Missing {
+ jar_type_id,
+ zalsa: self,
+ guard,
+ _jar: PhantomData::default(),
+ }
}
#[cold]
@@ -567,3 +563,33 @@
// SAFETY: the caller must guarantee that `T` is a wide pointer for `U`
unsafe { &mut *u }
}
+
+#[doc(hidden)]
+pub enum LookupJarByType<'a, Jar> {
+ Registered(IngredientIndex),
+ Missing {
+ jar_type_id: TypeId,
+ zalsa: &'a Zalsa,
+ guard: papaya::LocalGuard<'a>,
+ _jar: PhantomData<Jar>,
+ },
+}
+
+impl<J: Jar> LookupJarByType<'_, J> {
+ pub const fn is_missing(&self) -> bool {
+ matches!(self, LookupJarByType::Missing { .. })
+ }
+
+ #[inline]
+ pub fn get(self) -> IngredientIndex {
+ match self {
+ LookupJarByType::Registered(index) => index,
+ LookupJarByType::Missing {
+ jar_type_id,
+ zalsa,
+ guard,
+ _jar,
+ } => zalsa.add_or_lookup_jar_by_type_slow::<J>(jar_type_id, guard),
+ }
+ }
+}
Index: components/salsa-macro-rules/src/setup_tracked_struct.rs
===================================================================
diff --git a/components/salsa-macro-rules/src/setup_tracked_struct.rs b/components/salsa-macro-rules/src/setup_tracked_struct.rs
--- a/components/salsa-macro-rules/src/setup_tracked_struct.rs (revision 32be70bbbf53822e97077c22549c917318056051)
+++ b/components/salsa-macro-rules/src/setup_tracked_struct.rs (date 1750920452744)
@@ -188,7 +188,7 @@
$zalsa::IngredientCache::new();
CACHE.get_or_create(zalsa, || {
- zalsa.add_or_lookup_jar_by_type::<$zalsa_struct::JarImpl<$Configuration>>()
+ zalsa.lookup_jar_by_type::<$zalsa_struct::JarImpl<$Configuration>>().get()
})
}
}
@@ -211,7 +211,7 @@
type MemoIngredientMap = $zalsa::MemoIngredientSingletonIndex;
fn lookup_or_create_ingredient_index(aux: &$zalsa::Zalsa) -> $zalsa::IngredientIndices {
- aux.add_or_lookup_jar_by_type::<$zalsa_struct::JarImpl<$Configuration>>().into()
+ aux.lookup_jar_by_type::<$zalsa_struct::JarImpl<$Configuration>>().get().into()
}
#[inline]
Index: components/salsa-macro-rules/src/setup_input_struct.rs
===================================================================
diff --git a/components/salsa-macro-rules/src/setup_input_struct.rs b/components/salsa-macro-rules/src/setup_input_struct.rs
--- a/components/salsa-macro-rules/src/setup_input_struct.rs (revision 32be70bbbf53822e97077c22549c917318056051)
+++ b/components/salsa-macro-rules/src/setup_input_struct.rs (date 1750920570382)
@@ -101,14 +101,14 @@
$zalsa::IngredientCache::new();
CACHE.get_or_create(zalsa, || {
- zalsa.add_or_lookup_jar_by_type::<$zalsa_struct::JarImpl<$Configuration>>()
+ zalsa.lookup_jar_by_type::<$zalsa_struct::JarImpl<$Configuration>>().get()
})
}
pub fn ingredient_mut(db: &mut dyn $zalsa::Database) -> (&mut $zalsa_struct::IngredientImpl<Self>, &mut $zalsa::Runtime) {
let zalsa_mut = db.zalsa_mut();
zalsa_mut.new_revision();
- let index = zalsa_mut.add_or_lookup_jar_by_type::<$zalsa_struct::JarImpl<$Configuration>>();
+ let index = zalsa_mut.lookup_jar_by_type::<$zalsa_struct::JarImpl<$Configuration>>().get();
let (ingredient, runtime) = zalsa_mut.lookup_ingredient_mut(index);
let ingredient = ingredient.assert_type_mut::<$zalsa_struct::IngredientImpl<Self>>();
(ingredient, runtime)
@@ -150,7 +150,7 @@
type MemoIngredientMap = $zalsa::MemoIngredientSingletonIndex;
fn lookup_or_create_ingredient_index(aux: &$zalsa::Zalsa) -> $zalsa::IngredientIndices {
- aux.add_or_lookup_jar_by_type::<$zalsa_struct::JarImpl<$Configuration>>().into()
+ aux.lookup_jar_by_type::<$zalsa_struct::JarImpl<$Configuration>>().get().into()
}
#[inline]
Index: tests/interned-structs_self_ref.rs
===================================================================
diff --git a/tests/interned-structs_self_ref.rs b/tests/interned-structs_self_ref.rs
--- a/tests/interned-structs_self_ref.rs (revision 32be70bbbf53822e97077c22549c917318056051)
+++ b/tests/interned-structs_self_ref.rs (date 1750920478928)
@@ -87,7 +87,9 @@
let zalsa = db.zalsa();
CACHE.get_or_create(zalsa, || {
- zalsa.add_or_lookup_jar_by_type::<zalsa_struct_::JarImpl<Configuration_>>()
+ zalsa
+ .lookup_jar_by_type::<zalsa_struct_::JarImpl<Configuration_>>()
+ .get()
})
}
}
@@ -114,7 +116,8 @@
type MemoIngredientMap = zalsa_::MemoIngredientSingletonIndex;
fn lookup_or_create_ingredient_index(aux: &Zalsa) -> salsa::plumbing::IngredientIndices {
- aux.add_or_lookup_jar_by_type::<zalsa_struct_::JarImpl<Configuration_>>()
+ aux.lookup_jar_by_type::<zalsa_struct_::JarImpl<Configuration_>>()
+ .get()
.into()
}
Index: components/salsa-macro-rules/src/setup_tracked_fn.rs
===================================================================
diff --git a/components/salsa-macro-rules/src/setup_tracked_fn.rs b/components/salsa-macro-rules/src/setup_tracked_fn.rs
--- a/components/salsa-macro-rules/src/setup_tracked_fn.rs (revision 32be70bbbf53822e97077c22549c917318056051)
+++ b/components/salsa-macro-rules/src/setup_tracked_fn.rs (date 1750920531455)
@@ -151,21 +151,23 @@
fn fn_ingredient(db: &dyn $Db) -> &$zalsa::function::IngredientImpl<$Configuration> {
let zalsa = db.zalsa();
$FN_CACHE.get_or_create(zalsa, || {
+ let lookup = zalsa.lookup_jar_by_type::<$Configuration>();
+
// If the ingredient has already been inserted, we know that the downcaster
// has also been registered. This is a fast-path for multi-database use cases
// that bypass the ingredient cache and will always execute this closure.
- zalsa.lookup_jar_by_type::<$Configuration>()
- .unwrap_or_else(|| {
- <dyn $Db as $Db>::zalsa_register_downcaster(db);
- zalsa.add_or_lookup_jar_by_type::<$Configuration>()
- })
+ if lookup.is_missing() {
+ <dyn $Db as $Db>::zalsa_register_downcaster(db);
+ }
+
+ lookup.get()
})
}
pub fn fn_ingredient_mut(db: &mut dyn $Db) -> &mut $zalsa::function::IngredientImpl<Self> {
<dyn $Db as $Db>::zalsa_register_downcaster(db);
let zalsa_mut = db.zalsa_mut();
- let index = zalsa_mut.add_or_lookup_jar_by_type::<$Configuration>();
+ let index = zalsa_mut.lookup_jar_by_type::<$Configuration>().get();
let (ingredient, _) = zalsa_mut.lookup_ingredient_mut(index);
ingredient.assert_type_mut::<$zalsa::function::IngredientImpl<Self>>()
}
@@ -177,7 +179,7 @@
let zalsa = db.zalsa();
$INTERN_CACHE.get_or_create(zalsa, || {
<dyn $Db as $Db>::zalsa_register_downcaster(db);
- zalsa.add_or_lookup_jar_by_type::<$Configuration>().successor(0)
+ zalsa.lookup_jar_by_type::<$Configuration>().get().successor(0)
})
}
}
Index: components/salsa-macro-rules/src/setup_interned_struct.rs
===================================================================
diff --git a/components/salsa-macro-rules/src/setup_interned_struct.rs b/components/salsa-macro-rules/src/setup_interned_struct.rs
--- a/components/salsa-macro-rules/src/setup_interned_struct.rs (revision 32be70bbbf53822e97077c22549c917318056051)
+++ b/components/salsa-macro-rules/src/setup_interned_struct.rs (date 1750920422317)
@@ -149,7 +149,7 @@
let zalsa = db.zalsa();
CACHE.get_or_create(zalsa, || {
- zalsa.add_or_lookup_jar_by_type::<$zalsa_struct::JarImpl<$Configuration>>()
+ zalsa.lookup_jar_by_type::<$zalsa_struct::JarImpl<$Configuration>>().get()
})
}
}
@@ -182,7 +182,7 @@
type MemoIngredientMap = $zalsa::MemoIngredientSingletonIndex;
fn lookup_or_create_ingredient_index(aux: &$zalsa::Zalsa) -> $zalsa::IngredientIndices {
- aux.add_or_lookup_jar_by_type::<$zalsa_struct::JarImpl<$Configuration>>().into()
+ aux.lookup_jar_by_type::<$zalsa_struct::JarImpl<$Configuration>>().get().into()
}
#[inline]
Should the benchmarks be updated to reuse the same database and not measure the ingredient registration?
I think this is already the case, no?

b.iter_batched_ref(
    || {
        let db = salsa::DatabaseImpl::default();
        // we can't pass this along otherwise, and the lifetime is generally informational
        let input: InternedInput<'static> =
            unsafe { transmute(InternedInput::new(&db, "hello, world!".to_owned())) };
        let interned_len = interned_length(black_box(&db), black_box(input));
        assert_eq!(black_box(interned_len), 13);
        (db, input)
    },
    |&mut (ref db, input)| {
        let interned_len = interned_length(black_box(db), black_box(input));
        assert_eq!(black_box(interned_len), 13);
    },
    BatchSize::SmallInput,
)
@@ -171,7 +180,7 @@ macro_rules! setup_tracked_fn {
         let zalsa = db.zalsa();
         $INTERN_CACHE.get_or_create(zalsa, || {
             <dyn $Db as $Db>::zalsa_register_downcaster(db);
-            zalsa.add_or_lookup_jar_by_type::<$Configuration>().successor(0)
+            zalsa.lookup_jar_by_type::<$Configuration>().get_or_create().successor(0)
Nit: Should we use the same two-phase registration here to only register the downcasters in the create branch?
I think what we see in the benchmarks is that the many-databases case is now slightly slower when running in single-threaded mode. I don't know papaya well enough to say whether that's likely, but it doesn't seem unreasonable to me that a papaya lookup is more expensive than a mutex lock followed by an FxHashMap lookup.
I think this is good to merge as it fixes a significant perf regression in multi-threaded mode, most production cases use a single database, and the single-threaded regression isn't that significant.
Yes, this seems about right. I modified your papaya single_thread benchmark to use a mutex-wrapped hash map:

group.bench_function("std-mutex", |b| {
    let mut m = HashMap::<usize, usize>::default();
    for i in RandomKeys::new().take(SIZE) {
        m.insert(i, i);
    }
    let m = std::sync::Mutex::new(m);
    b.iter(|| {
        for i in RandomKeys::new().take(SIZE) {
            black_box(assert_eq!(m.lock().unwrap().get(&i), Some(&i)));
        }
    });
});

and it shows that papaya is ~10% slower:
@ibraheemdev I'll go ahead and merge this, considering that we understand the regression. Would you mind updating Salsa in ty?
## Summary

This PR updates Salsa to pull in Ibraheem's multithreading improvements (salsa-rs/salsa#921).

## Performance

A small regression for single-threaded benchmarks is expected because papaya is slightly slower than a `Mutex<FxHashMap>` in the uncontended case (~10%). However, this shouldn't matter as much in practice because:

1. Salsa has a fast path when using only one DB instance, which is the common case in production. This fast path is not impacted by the changes, but we measure the slow paths in our benchmarks (because we use multiple DB instances).
2. Fixing the 10x slowdown for the contended case (multi-threading) outweighs the downsides of a 10% perf regression for single-threaded use cases, especially considering that ty is heavily multi-threaded.

## Test Plan

`cargo test`
An alternative to #919, this uses a lock-free map for ingredient access instead of `DashMap`. It looks like this isn't fast enough to get rid of the `IngredientCache` entirely, but it's pretty close. I also added a fast path that avoids checking for the view downcaster if the ingredient has already been registered. This will make the slow path slower, but that only matters the first time the query is called on a given database.