-
Notifications
You must be signed in to change notification settings - Fork 128
760_Implement cleanup handling for unexpected Krkn failures #819
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
760_Implement cleanup handling for unexpected Krkn failures #819
Conversation
- Add signal handlers for SIGINT and SIGTERM - Implement artifact collection during cleanup - Add cleanup method to AbstractScenarioPlugin - Document cleanup process and configuration - Save cleanup status for tracking Signed-off-by: 10sharmashivam <[email protected]>
ef46c49
to
035047f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely sure about the full context either, so it would be helpful to have someone more familiar take another look.
I left a few nits and questions on the PR. Thanks!
""" | ||
Handle cleanup when Krkn is interrupted or killed | ||
""" | ||
with cleanup_lock: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, another global variable for guarding is needed, since the callback might be trigger multiple time.
with cleanup_lock: | |
with cleanup_lock: | |
if cleanup_has_run: | |
return | |
cleanup_has_run = True |
try: | ||
# Collect artifacts | ||
collect_artifacts() | ||
# Run scenario cleanup | ||
run_scenario_cleanup() | ||
# Save cleanup status | ||
save_cleanup_status() | ||
except Exception as e: | ||
logging.error(f"Cleanup failed: {e}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we separate the try /catch
for each cases?
logging.error(f"Cleanup failed: {e}") | ||
finally: | ||
logging.info("Cleanup completed") | ||
sys.exit(0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
If any of the above cases raise error, we should return 1
status code instead.
"scenario": current_scenario, | ||
"cleanup_status": "completed" | ||
} | ||
with open("cleanup_status.json", "w") as f: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
How about get the cleanup status file path from the environment variable instead of hardcoded here?
@@ -34,6 +36,91 @@ | |||
ScenarioPluginNotFound, | |||
) | |||
|
|||
# Global variables for cleanup | |||
current_scenario = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like the current_scenario
is never set in the execution path (main
entrypoint) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @10sharmashivam, first of all thank you very much for your contribution. The overall proposed solution (ABC method for scenario specific cleanup) it's ok, the problem that I see is that the signal handling should be responsibility of the AbstractScenarioPlugin
and not of the main method (run_krkn.py). If you see the API no exception must be propagated outside this block by design, so I envision that the signal handling and the call to the cleanup method should be done in this block and with a self.cleanup()
call instead of reinstantiating the scenario through the factory orchestrating the handler from outside.Please check also #818 where I gave more details about the requirements.
duplicates #818 |
Ref Issue(#760)
Description
This PR implements cleanup handling for unexpected Krkn failures (Ctrl + C or process kill). It ensures proper cleanup of resources and collection of artifacts when Krkn is interrupted.
Changes
run_kraken.py
for SIGINT and SIGTERMcleanup()
method toAbstractScenarioPlugin
for scenario-specific cleanupsignal.md
Testing
Documentation
Added documentation in
signal.md
explaining:Related Issues
Closes #760(#760)