Replace loom with shuttle #876

Merged: 3 commits from ibraheemdev:ibraheem/shuttle into salsa-rs:master on May 23, 2025

Conversation

ibraheemdev (Contributor)

Loom is too slow to run even basic tests (#845), so shuttle should be a lot more useful for us. I added a CI check to run our parallel tests under shuttle. This also undoes some of the changes from #842, mainly because shuttle does not require an `UnsafeCell` wrapper.
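For readers unfamiliar with how such a swap is usually wired up, here is a minimal sketch of a cfg-gated `sync` shim module. The module layout and exact re-exports are illustrative assumptions, not Salsa's actual source:

```rust
// Hypothetical crate-local `sync` shim: the rest of the crate imports its
// concurrency primitives from here, and enabling the `shuttle` feature swaps
// in shuttle's instrumented versions, which expose a std-like API.
#[cfg(not(feature = "shuttle"))]
pub use std::sync::{Arc, Condvar, Mutex, MutexGuard};
#[cfg(not(feature = "shuttle"))]
pub use std::thread;

#[cfg(feature = "shuttle")]
pub use shuttle::sync::{Arc, Condvar, Mutex, MutexGuard};
#[cfg(feature = "shuttle")]
pub use shuttle::thread;
```

Because shuttle's `Mutex::lock` mirrors `std::sync::Mutex::lock` and returns a `LockResult`, call sites look the same under either configuration, which is presumably why the test diff further down adds `.unwrap()` to `lock()` calls.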

netlify bot commented May 23, 2025

Deploy Preview for salsa-rs canceled.

🔨 Latest commit: b1f118a
🔍 Latest deploy log: https://app.netlify.com/projects/salsa-rs/deploys/683091d7370e5e0008da2246

codspeed-hq bot commented May 23, 2025

CodSpeed Performance Report

Merging #876 will not alter performance

Comparing ibraheemdev:ibraheem/shuttle (b1f118a) with master (f7b0856)

Summary

✅ 12 untouched benchmarks

@ibraheemdev force-pushed the ibraheem/shuttle branch 3 times, most recently from 16c320c to e430505 on May 23, 2025 13:39
@MichaReiser (Contributor) left a comment

This is awesome. I also love that this simplifies the code again. I have a small comment, but I think this is ready to go.

src/table.rs Outdated
Comment on lines 335 to 340
#[cfg(feature = "shuttle")]
let data = unsafe {
let data = (0..PAGE_LEN)
.map(|_| UnsafeCell::new(MaybeUninit::uninit()))
.collect::<Box<[PageDataEntry<T>]>>();
Contributor

We could consider using this branch in debug builds because Rust doesn't inline the call without optimizations enabled. We've seen this in ty where the Project table takes 0.5 of stack frame!
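For context, here is a self-contained sketch of the construction under discussion. `PAGE_LEN` and `PageDataEntry` below are simplified stand-ins for the real definitions in src/table.rs, and the debug-build gating is only the suggestion above, not the merged code:

```rust
use std::cell::UnsafeCell;
use std::mem::MaybeUninit;

const PAGE_LEN: usize = 1 << 10; // placeholder size for illustration
type PageDataEntry<T> = UnsafeCell<MaybeUninit<T>>; // simplified stand-in

// Per-element construction: each slot is created uninitialized directly on the
// heap as the iterator is collected, so no page-sized value is built on the
// stack first. Gating this on `any(feature = "shuttle", debug_assertions)`
// instead of only `feature = "shuttle"` would give unoptimized builds the same
// behavior.
fn new_page_data<T>() -> Box<[PageDataEntry<T>]> {
    (0..PAGE_LEN)
        .map(|_| UnsafeCell::new(MaybeUninit::uninit()))
        .collect()
}
```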

}
if current_deps.durability < data.durability {
data.revisions = C::new_revisions(current_deps.changed_at);
// UNSAFE: Marking as mut requires exclusive access for the duration of
Contributor

I didn't review those changes too carefully because I understand that these are just reverting earlier changes?

ibraheemdev (Contributor, Author)

Yeah, the changes are mostly just indentation because the `with` closure was removed.

@@ -14,7 +14,7 @@ impl Signal {
         // otherwise calls to `sum` will tend to be unnecessarily
         // synchronous.
         if stage > 0 {
-            let mut v = self.value.lock();
+            let mut v = self.value.lock().unwrap();
Contributor

I wonder if we should disable the signals in shuttle tests? The signals exist to construct a very specific scenario, which defeats the benefit of shuttle: testing many possible scenarios.
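A minimal sketch of that idea, assuming a test-only `Signal` helper with `signal`/`wait_for` methods; the struct below is a simplified stand-in for the helper in the parallel tests, not the actual code:

```rust
use std::sync::{Condvar, Mutex};

pub struct Signal {
    value: Mutex<usize>,
    cond_var: Condvar,
}

impl Signal {
    pub fn new() -> Self {
        Signal { value: Mutex::new(0), cond_var: Condvar::new() }
    }

    /// Advance to `stage`, waking any waiters. Under shuttle this is a no-op,
    /// so the scheduler, not the signal protocol, decides the interleaving.
    pub fn signal(&self, stage: usize) {
        if cfg!(feature = "shuttle") {
            return;
        }
        let mut v = self.value.lock().unwrap();
        if stage > *v {
            *v = stage;
            self.cond_var.notify_all();
        }
    }

    /// Block until the signal reaches `stage` (again a no-op under shuttle).
    pub fn wait_for(&self, stage: usize) {
        if cfg!(feature = "shuttle") {
            return;
        }
        let mut v = self.value.lock().unwrap();
        while *v < stage {
            v = self.cond_var.wait(v).unwrap();
        }
    }
}
```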

@ibraheemdev force-pushed the ibraheem/shuttle branch 2 times, most recently from 249b2a1 to 273e915 on May 23, 2025 13:58
@ibraheemdev (Contributor, Author)

It looks like shuttle caught a failure in the cycle tests!

@davidbarsky (Contributor) commented May 23, 2025

Here are the failing tests:

test panicked in task 'main-thread'
failing schedule:
"
91028003c1bbbd9085d996cdd101000000004948a22449a22492922849924449122949122589
244952a22449124949942449922491242589922449a224499424491245922449929224499224
4992244992244992244992244992244992244992244992249122494a12498924499224495212
4949224992244592244952924489924892244949222581244992244992244992244992244992
244992244912
"
pass that string to `shuttle::replay` to replay the failure

thread 'cycle_a_t1_b_t2_fallback::the_test' panicked at tests/parallel/cycle_a_t1_b_t2_fallback.rs:64:9:
assertion `left == right` failed
  left: (1, 9)
 right: (1, 2)
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
2025-05-23T14:00:12.752542Z ERROR execution{i=241}:step{task=0}: generator::detail::gen: set panic inside generator    
2025-05-23T14:00:12.752688Z  INFO shuttle::scheduler::metrics: run finished iterations=242 steps=[min=295, max=384, avg=314.1] context_switches=[min=4, max=62, avg=48.7] preemptions=[min=0, max=54, avg=41.6] random_choices=[min=0, max=0, avg=0.0]

I dunno what you're planning on doing, but maybe we should just land this branch and fix failing tests as we build out the Shuttle test suite?
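For anyone reproducing this locally, here is a hedged sketch of what "pass that string to `shuttle::replay`" looks like, assuming shuttle's `replay(f, encoded_schedule)` entry point; the test body and the truncated schedule string are placeholders:

```rust
#[test]
#[ignore = "manual reproduction of the failure above"]
fn replay_cycle_a_t1_b_t2_fallback() {
    shuttle::replay(
        || {
            // Body of cycle_a_t1_b_t2_fallback::the_test goes here.
        },
        // Truncated: paste the full schedule string from the log above.
        "91028003c1bbbd9085d996cdd10100",
    );
}
```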

@MichaReiser (Contributor)

> I dunno what you're planning on doing, but maybe we should just land this branch and fix failing tests as we build out the Shuttle test suite?

I plan to look into this next week

@@ -1,3 +1,6 @@
+// Shuttle doesn't like panics inside of its runtime.
Contributor

How fixable is this limitation in shuttle?

@ibraheemdev (Contributor, Author) commented May 23, 2025

I'm not sure; it looks like shuttle has a panic hook that cancels the execution immediately if anything panics. It seems like this is pretty fundamental to replay support.

@davidbarsky (Contributor)

> I dunno what you're planning on doing, but maybe we should just land this branch and fix failing tests as we build out the Shuttle test suite?

> I plan to look into this next week

sure! just so that I understand: do you want to fix the failing test prior to landing this PR or do you want this PR to fix the already-surfaced test failure?

(I ask because I'm sure Shuttle can find many more failing tests)

@MichaReiser (Contributor)

> sure! just so that I understand: do you want to fix the failing test prior to landing this PR or do you want this PR to fix the already-surfaced test failure?

It depends on how much progress I make ;) I'm also fine with disabling this specific test for now and landing this PR.

> (I ask because I'm sure Shuttle can find many more failing tests)

Sure, but it depends on if we land new changes :)

@davidbarsky (Contributor) commented May 23, 2025

> Sure, but it depends on if we land new changes :)

Sorry! I'm assuming that we'd be adding new tests/features to Salsa while CI remains red/broken due to failures surfaced by the Shuttle test job. I don't think this is meaningfully different from today's state.

@ibraheemdev force-pushed the ibraheem/shuttle branch 2 times, most recently from bc45ce4 to 0a03f77 on May 23, 2025 15:08
@ibraheemdev (Contributor, Author)

We can ignore the one failing test for now.
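A minimal sketch of one way to do that, assuming a `cfg_attr`-based ignore; the attribute placement is illustrative and the PR may gate the test differently:

```rust
// Skip the known-failing cycle test when the shuttle feature is enabled; it can
// be re-enabled once the fallback-immediate issue surfaced above is fixed.
#[test]
#[cfg_attr(feature = "shuttle", ignore = "fails under shuttle; see the schedule above")]
fn the_test() {
    // ...existing test body...
}
```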

@MichaReiser (Contributor)

This is excellent. Thank you so much for working on this.

@MichaReiser enabled auto-merge May 23, 2025 15:22
@MichaReiser added this pull request to the merge queue May 23, 2025
Merged via the queue into salsa-rs:master with commit 0414d89 May 23, 2025
12 checks passed
@github-actions bot mentioned this pull request May 23, 2025
@MichaReiser (Contributor)

I just noticed that the bug is with fallback immediate. We don't use that. I'll leave this to someone else to fix.

@Veykril (Member) commented May 23, 2025

Is the bug merely shuttle not liking how fallback immediate works or is there an actual bug that shuttle found? cc @ChayimFriedman2

@MichaReiser (Contributor)

> Is the bug merely shuttle not liking how fallback immediate works or is there an actual bug that shuttle found? cc @ChayimFriedman2

I'm fairly certain this is an actual bug.

@ChayimFriedman2 (Contributor)

@MichaReiser do you have more details?

@MichaReiser (Contributor)

> @MichaReiser do you have more details?

See #876 (comment)

You need to comment out the not-shuttle gating in that test, then run it with the shuttle feature enabled.

I didn't look into why it's crashing, but the backtrace is very short.
