Skip to content

760_Implement cleanup handling for unexpected Krkn failures #819

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

10sharmashivam
Copy link
Contributor

Ref Issue(#760)

Description

This PR implements cleanup handling for unexpected Krkn failures (Ctrl + C or process kill). It ensures proper cleanup of resources and collection of artifacts when Krkn is interrupted.

Changes

  • Added signal handlers in run_kraken.py for SIGINT and SIGTERM
  • Implemented artifact collection during cleanup (logs, events, scenario state)
  • Added cleanup() method to AbstractScenarioPlugin for scenario-specific cleanup
  • Added cleanup documentation in signal.md
  • Added cleanup status tracking

Testing

  • Tested signal handling with SIGINT and SIGTERM
  • Verified artifact collection during cleanup
  • Confirmed cleanup status is saved correctly

Documentation

Added documentation in signal.md explaining:

  • Cleanup process during unexpected failures
  • Artifact collection
  • Scenario-specific cleanup
  • Cleanup status tracking

Related Issues

Closes #760(#760)

- Add signal handlers for SIGINT and SIGTERM
- Implement artifact collection during cleanup
- Add cleanup method to AbstractScenarioPlugin
- Document cleanup process and configuration
- Save cleanup status for tracking

Signed-off-by: 10sharmashivam <[email protected]>
@10sharmashivam 10sharmashivam force-pushed the 760_Cleanup_receipes_krkn_failures branch from ef46c49 to 035047f Compare May 17, 2025 16:51
@10sharmashivam 10sharmashivam changed the title Implement cleanup handling for unexpected Krkn failures 760_Implement cleanup handling for unexpected Krkn failures May 17, 2025
Copy link

@jason810496 jason810496 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely sure about the full context either, so it would be helpful to have someone more familiar take another look.

I left a few nits and questions on the PR. Thanks!

"""
Handle cleanup when Krkn is interrupted or killed
"""
with cleanup_lock:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, another global variable for guarding is needed, since the callback might be trigger multiple time.

Suggested change
with cleanup_lock:
with cleanup_lock:
if cleanup_has_run:
return
cleanup_has_run = True

Comment on lines +52 to +60
try:
# Collect artifacts
collect_artifacts()
# Run scenario cleanup
run_scenario_cleanup()
# Save cleanup status
save_cleanup_status()
except Exception as e:
logging.error(f"Cleanup failed: {e}")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we separate the try /catch for each cases?

logging.error(f"Cleanup failed: {e}")
finally:
logging.info("Cleanup completed")
sys.exit(0)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:
If any of the above cases raise error, we should return 1 status code instead.

"scenario": current_scenario,
"cleanup_status": "completed"
}
with open("cleanup_status.json", "w") as f:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:
How about get the cleanup status file path from the environment variable instead of hardcoded here?

@@ -34,6 +36,91 @@
ScenarioPluginNotFound,
)

# Global variables for cleanup
current_scenario = None

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the current_scenario is never set in the execution path (main entrypoint) ?

Copy link
Collaborator

@tsebastiani tsebastiani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @10sharmashivam, first of all thank you very much for your contribution. The overall proposed solution (ABC method for scenario specific cleanup) it's ok, the problem that I see is that the signal handling should be responsibility of the AbstractScenarioPlugin and not of the main method (run_krkn.py). If you see the API no exception must be propagated outside this block by design, so I envision that the signal handling and the call to the cleanup method should be done in this block and with a self.cleanup() call instead of reinstantiating the scenario through the factory orchestrating the handler from outside.Please check also #818 where I gave more details about the requirements.

@tsebastiani tsebastiani added hold hold before merging duplicate This issue or pull request already exists labels May 28, 2025
@tsebastiani
Copy link
Collaborator

duplicates #818

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
duplicate This issue or pull request already exists hold hold before merging
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cleanup receipes to handle unexpected krkn failures
3 participants