Skip to content
This repository was archived by the owner on Nov 15, 2023. It is now read-only.

node: fix shutdown #1308

Merged
merged 8 commits into from
Dec 21, 2018
Merged

node: fix shutdown #1308

merged 8 commits into from
Dec 21, 2018

Conversation

andresilva
Copy link
Contributor

Tokio doesn't run destructors for any futures still in the pool on shutdown. We blocked the tokio runtime on the outer exit future (triggered by ctrl-c), afterwards we would fire the exit signal but would not wait for all futures in the runtime to finish. Most significantly this can lead to database corruption since it isn't closed properly.

Now, after firing the exit signal we wait for the runtime to be idle, which means that all spawned futures must be guarded by the exit signal (aura and grandpa were missing).

Merge after #1306.
Fixes #1161, fixes #1034, fixes #1035.

@andresilva andresilva added the A0-please_review Pull request needs code review. label Dec 20, 2018
@andresilva andresilva requested a review from rphmeier December 20, 2018 18:16
@andresilva andresilva changed the title Fix node shutdown node: fix shutdown Dec 20, 2018
Copy link
Member

@bkchr bkchr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments, but overall it looks good.

let _telemetry = service.telemetry();
drop(service);

// TODO [andre]: timeout this future
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make a github issue for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -120,7 +109,7 @@ construct_service_factory! {
grandpa::NetworkBridge::new(service.network()),
)?;

executor.spawn(voter);
executor.spawn(voter.select(service.on_exit()).then(|_| Ok(())));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we not take the on_exit as parameter for run_grandpa? I think that makes it more easy to use and people who write their own node does not forget to select on exit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed. longer-term I would like a special executor that just applies the exit future to everything passed to it.

I spent a little time optimizing exit-future yesterday so it should have minimal overhead now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the code according to your suggestions, but FWIW I think I prefer the original version because it's easier to check, we just make sure that all call-sites of spawn are exit-guarded. I think longer-term we should go with @rphmeier's suggestion though.

service.network(),
);

executor.spawn(aura.select(service.on_exit()).then(|_| Ok(())));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here as below.

@rphmeier rphmeier added A8-looksgood and removed A0-please_review Pull request needs code review. labels Dec 21, 2018
@rphmeier rphmeier merged commit f632893 into master Dec 21, 2018
@rphmeier rphmeier deleted the andre/fix-shutdown branch December 21, 2018 16:30
@0x7CFE
Copy link
Contributor

0x7CFE commented Dec 21, 2018

Most significantly this can lead to database corruption since it isn't closed properly.

@andresilva, interesting! In theory it may be the real cause of #1036. Need to recheck that issue.

gguoss pushed a commit to chainpool/substrate that referenced this pull request Jan 11, 2019
* node: remove grandpa authority flags

* node: exit-guard grandpa and aura spawned futures

* node: wait for futures to stop running on shutdown

* core: run connectivity tests on same ports

* core: pass on_exit future when starting aura and grandpa

* node: add issue number to todo

* core: fix aura and grandpa tests
gguoss pushed a commit to chainpool/substrate that referenced this pull request Jan 21, 2019
* node: remove grandpa authority flags

* node: exit-guard grandpa and aura spawned futures

* node: wait for futures to stop running on shutdown

* core: run connectivity tests on same ports

* core: pass on_exit future when starting aura and grandpa

* node: add issue number to todo

* core: fix aura and grandpa tests
gguoss pushed a commit to chainpool/substrate that referenced this pull request Jan 21, 2019
* node: remove grandpa authority flags

* node: exit-guard grandpa and aura spawned futures

* node: wait for futures to stop running on shutdown

* core: run connectivity tests on same ports

* core: pass on_exit future when starting aura and grandpa

* node: add issue number to todo

* core: fix aura and grandpa tests
gguoss pushed a commit to chainpool/substrate that referenced this pull request Jan 23, 2019
* node: remove grandpa authority flags

* node: exit-guard grandpa and aura spawned futures

* node: wait for futures to stop running on shutdown

* core: run connectivity tests on same ports

* core: pass on_exit future when starting aura and grandpa

* node: add issue number to todo

* core: fix aura and grandpa tests
gguoss pushed a commit to chainpool/substrate that referenced this pull request Feb 11, 2019
* node: remove grandpa authority flags

* node: exit-guard grandpa and aura spawned futures

* node: wait for futures to stop running on shutdown

* core: run connectivity tests on same ports

* core: pass on_exit future when starting aura and grandpa

* node: add issue number to todo

* core: fix aura and grandpa tests
MTDK1 pushed a commit to bdevux/substrate that referenced this pull request Apr 12, 2019
* node: remove grandpa authority flags

* node: exit-guard grandpa and aura spawned futures

* node: wait for futures to stop running on shutdown

* core: run connectivity tests on same ports

* core: pass on_exit future when starting aura and grandpa

* node: add issue number to todo

* core: fix aura and grandpa tests
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ctrl-C didn't work Panic "slog-scope: no logger set" Double panic on Ctrl+C exit during block sync
4 participants