app-sre/automated-actions

Automated Actions 🚀

Ruff uv FastAPI Celery License PyPI - Version PyPI - Python Version

Welcome to the Automated Actions project! 👋 This system provides a toolset for predefined actions that can be self-serviced by tenants or automatically triggered by events.

🎯 Problem Statement

AppSRE tenants regularly require manual intervention from AppSRE for various operational tasks. This project aims to establish a toolset allowing a set of predefined actions to be either self-serviced by tenants or automatically triggered by events (e.g., alerts). This reduces manual workload and improves response times. ⏱️

See

✨ Goals

  • 🧩 Extensibility: Easily add new automated actions.
  • 🔍 Discoverability: Actions should be stored and discoverable by users.
  • 🛡️ Security: Implement robust authentication and authorization.
  • ⚙️ Flexible Triggers: Support manual (user-initiated) and automated (event-driven, e.g., AlertManager, cron) triggers.
  • 📊 Output Accessibility: Action outputs (logs, dumps) must be accessible to the requester.
  • 🚦 Throttling: Provide mechanisms to prevent system abuse or overload.

🌟 Key Features

  • 🚀 Flexible Action Initiation: Supports both manual self-service by users and automated triggering by system events (e.g., alerts).
  • 🖥️ Centralized API Control: A dedicated API server for managing, tracking, and orchestrating all automated actions.
  • 💨 Scalable Asynchronous Processing: Employs a robust task queue system for efficient and reliable execution of actions in the background.
  • 💾 Robust State Management & Throttling: Persistently stores action status and enforces operational limits to ensure system stability.
  • 🔒 Enterprise-Grade Security: Implements strong authentication (e.g., OIDC via Red Hat SSO) and fine-grained, configuration-driven authorization.
  • 📜 Declarative Action & Policy Definition: System behavior, including available actions, permissions, and operational parameters, is defined via externalized configuration (app-interface).
  • ⌨️ Dedicated User Interface: Offers a command-line tool (CLI) for tenants to easily trigger and monitor actions.

πŸ—οΈ Architecture Overview

The system is centered around a FastAPI server that orchestrates actions triggered by users or automated systems.

flowchart TD
    CLI["CLI, AlertManager, Qontract-Reconcile"] --> FastAPIServer["automated-actions API Server"]
    FastAPIServer -- "authentication" <--> RHSSO["Red Hat SSO (OIDC)"]
    CLI -- "authentication" <--> RHSSO["Red Hat SSO (OIDC)"]
    FastAPIServer -- "authorization" <--> OPA["Open Policy Agent (OPA)"] <--> AppInterfaceAuthZ["app-interface <br/> (Action Permissions)"]

    FastAPIServer -- "Stores Action Details" --> DynamoDB["DynamoDB (Action Store)"]
    FastAPIServer -- "Enqueues Task" --> TaskQueue["Task Queue (Celery/SQS)"]

    TaskQueue -- "Worker Consumes Task" --> CeleryWorkers["Celery Workers <br/> (Action Logic)"]
    CeleryWorkers -- "Reads Config" --> AppInterface["app-interface <br/> Application Data"]
    CeleryWorkers -- "Fetches Secrets" --> Vault["Vault"]
    CeleryWorkers -- "execute actions" --> TargetSystems["Target Systems (e.g., AWS, OpenShift)"]
    CeleryWorkers -- "Updates Status" --> DynamoDB
  1. FastAPI Server (automated-actions):
    • The central component.
    • Receives requests from users via the CLI or from AlertManager (webhooks).
    • Handles authentication via Red Hat SSO (OIDC).
    • Performs authorization based on configurations in app-interface.
    • Validates requests and checks throttling limits (using DynamoDB).
    • Assigns a unique ID, stores the action details in DynamoDB, and enqueues the task in Celery via AWS SQS.
    • Provides endpoints to query action status.
  2. CLI (Command Line Interface):
    • The primary tool for users/tenants to interact with the FastAPI server to trigger actions.
  3. Task Queue (Celery & SQS):
    • Manages asynchronous execution of actions. FastAPI sends tasks to SQS, and Celery workers pick them up.
  4. DynamoDB:
    • Used by FastAPI and Celery to store action requests, their status, and timestamps for throttling.
  5. Celery Workers:
    • Execute the logic for each predefined action.
    • Fetch necessary configuration from app-interface and secrets/credentials from Vault.
    • Interact with target systems to perform actions (e.g., AWS RDS reboot, OpenShift Pod restart).
    • Update action status in DynamoDB.
  6. Configuration (app-interface):
    • Defines action permissions (authorization rules) and throttling parameters.
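
The request path described above (assign an ID, store the action, enqueue it, then let a worker update the status) can be sketched in plain Python. This is a minimal in-memory illustration, not the project's code: a dict stands in for DynamoDB, `queue.Queue` stands in for Celery/SQS, and every name (`ActionRecord`, `submit_action`, `run_worker_once`, the status strings) is hypothetical.

```python
import queue
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable


@dataclass
class ActionRecord:
    action_id: str
    name: str
    owner: str
    status: str = "PENDING"
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


# Stand-ins for DynamoDB (action store) and SQS (task queue); both are assumptions.
action_store: dict[str, ActionRecord] = {}
task_queue: "queue.Queue[str]" = queue.Queue()


def submit_action(name: str, owner: str) -> str:
    """API side: assign a unique ID, persist the record, enqueue the task."""
    action_id = str(uuid.uuid4())
    action_store[action_id] = ActionRecord(action_id=action_id, name=name, owner=owner)
    task_queue.put(action_id)
    return action_id


def run_worker_once(handler: Callable[[ActionRecord], None]) -> None:
    """Worker side: take one task, run its handler, write the final status back."""
    action_id = task_queue.get_nowait()
    record = action_store[action_id]
    record.status = "RUNNING"
    try:
        handler(record)
    except Exception:
        record.status = "FAILURE"
    else:
        record.status = "SUCCESS"


action_id = submit_action("openshift-workload-restart", owner="tenant-a")
print(action_store[action_id].status)  # PENDING
run_worker_once(lambda record: None)   # a no-op handler standing in for real action logic
print(action_store[action_id].status)  # SUCCESS
```

Because the status lives in the store rather than in the queue message, a status endpoint only needs to read the record by its ID.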

🔑 Key Concepts

🏷️ Action Types

  1. Automated Actions: Triggered by end-users (typically AppSRE tenants) via the CLI.
    • Example: Restarting a specific deployment.
  2. Automatic Remediations: (not implemented yet) Triggered by alerting systems (e.g., AlertManager webhooks).
    • Example: Automatically restarting a stuck integration based on an alert.

🧩 Packages Overview

This project is structured into several key packages, each with a distinct role:

📦 automated_actions

The heart of the system! 💪 This package contains the FastAPI server application and the Celery task queue. It exposes the API endpoints, handles incoming requests, orchestrates task queuing, and manages the state of automated actions.

🤖 automated_actions_client

Your friendly Python helper for talking to the API! 🐍 This is an auto-generated Python HTTP client, created directly from the project's OpenAPI (Swagger) documentation. It simplifies programmatic interaction with the automated_actions server.

💻 automated_actions_cli

The command center for users! 🚀 This package provides the Command Line Interface (CLI). Tenants and SREs use this tool to trigger actions, check their status, and interact with the automated actions system from their terminals.

🛡️ opa

Guardian of the gates! 🗝️ This directory houses the Open Policy Agent (OPA) Rego files. These files define the authorization and throttling policies and rules that determine who can perform which actions under what conditions, ensuring secure and controlled operations.
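
To illustrate the kind of decision such policies make, here is a small Python model of a permission check. The real rules are written in Rego and evaluated by OPA; the data shape below (a mapping from user to allowed actions with parameter constraints), the deny-by-default behavior, and all names are assumptions made for this sketch only.

```python
from dataclasses import dataclass, field


@dataclass
class Permission:
    action: str
    # Required parameter values; an empty dict allows any parameters (hypothetical shape).
    params: dict[str, str] = field(default_factory=dict)


def is_allowed(
    permissions: dict[str, list[Permission]],
    user: str,
    action: str,
    params: dict[str, str],
) -> bool:
    """Deny by default; allow only if one permission matches the action and its constraints."""
    for perm in permissions.get(user, []):
        if perm.action != action:
            continue
        if all(params.get(key) == value for key, value in perm.params.items()):
            return True
    return False


# Made-up example data: alice may restart workloads only in one namespace of one cluster.
PERMISSIONS = {
    "alice": [
        Permission(
            "openshift-workload-restart",
            {"cluster": "cluster-a", "namespace": "team-a"},
        )
    ],
}

request = {"cluster": "cluster-a", "namespace": "team-a", "kind": "Pod", "name": "web-1"}
print(is_allowed(PERMISSIONS, "alice", "openshift-workload-restart", request))  # True
print(is_allowed(PERMISSIONS, "alice", "openshift-workload-restart",
                 {**request, "namespace": "team-b"}))  # False
print(is_allowed(PERMISSIONS, "bob", "openshift-workload-restart", request))  # False
```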

🧪 integration_test

Putting it all together! 🔬 This package contains the integration tests. These tests verify that the different components of the system (API, workers, database, etc.) work correctly in concert, ensuring end-to-end functionality.

🛠️ automated_actions_utils

The shared toolbox! 🔧 This package provides common utility functions and API implementations used across other packages. This includes convenient wrappers for interacting with services like HashiCorp Vault (for secrets) and AWS APIs (e.g., for SQS, DynamoDB), promoting code reuse and consistency.

🛠️ Tech Stack

  • Backend API: Python 🐍, FastAPI 🚀
  • Task Queue: Celery 🐘
  • Message Broker: AWS SQS ✉️
  • Database: AWS DynamoDB 🗂️
  • Authentication/Authorization: Red Hat SSO (OIDC) 🔑, Open Policy Agent (OPA) 🛡️
  • Linting/Formatting: Ruff ✨
  • Package Management: uv 📦

⚙️ Configuration

Action permissions and throttling limits are defined in the app-interface. This declarative approach allows for centralized management and easy auditing of system behavior. The qontract-reconcile automated-actions-config integration transforms these configurations into the OPA policy files.
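
As a rough illustration of what a throttling limit can mean in practice, the sketch below checks stored timestamps against a sliding window. The actual limits are enforced through the generated OPA policies; the window semantics, the parameter names (`max_actions`, `window`), and the function name are assumptions for this example.

```python
from datetime import datetime, timedelta, timezone


def is_throttled(
    previous_runs: list[datetime],
    now: datetime,
    max_actions: int,
    window: timedelta,
) -> bool:
    """Return True if `max_actions` or more previous runs fall inside the trailing window."""
    window_start = now - window
    recent = [ts for ts in previous_runs if ts > window_start]
    return len(recent) >= max_actions


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
runs = [
    now - timedelta(minutes=50),
    now - timedelta(minutes=10),
    now - timedelta(minutes=2),
]

# Two runs fall inside the last 15 minutes.
print(is_throttled(runs, now, max_actions=2, window=timedelta(minutes=15)))  # True
print(is_throttled(runs, now, max_actions=3, window=timedelta(minutes=15)))  # False
```

Keeping only timestamps per user and action is enough for this kind of check, which matches the idea of storing timestamps alongside action status.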

🎬 Action Overview

This project provides a set of predefined actions that can be triggered by users or automatically by the system. Each action is designed to perform specific tasks on target systems, such as restarting workloads in OpenShift or rebooting AWS RDS instances.

✅ Available Actions

Refer to actions.md for a detailed list of available actions, their parameters, and usage examples.

📝 Planned Actions

  • 📄 Getting application configuration and logs.
  • 💾 Obtaining heap dumps.
  • ⚙️ Automatically restarting stuck integrations.
  • 🤖 Recycling stuck Jenkins workers.

🚀 Development Setup

📋 Prerequisites

  • Python (see .python-version or pyproject.toml for specific version)
  • uv (for Python environment and package management)
  • make
  • AWS CLI configured (for local DynamoDB/SQS interaction if not using mocks, and for deployment)
  • Access to relevant qontract-server and Vault instances (for action execution logic)
  • Docker (for running local dependencies like LocalStack)

🛠️ Setting up the Environment

  1. Clone the repository:

    git clone https://github.com/chassing/automated-actions.git
    cd automated-actions
  2. Set up the development environment: This command creates/updates a virtual environment using uv and installs dependencies.

    make dev-env
  3. Activate the virtual environment:

    source .venv/bin/activate

⚙️ Local Configuration

Configure all required settings in a local settings.conf file in the project root directory (ignored by git).

Please find and use predefined setting files in Vault:

  • AppSRE: Use app-sre-only-settings
  • Other Red Hat employees: TODO: Create a similar settings file in Vault for other Red Hat employees.

Refer to the settings documentation for details on all available automated-actions settings.

▶️ Running the Application (Locally)

Use the provided docker-compose.yml file to run the application and its dependencies (like LocalStack for AWS services) locally.

docker-compose up automated-actions # Or 'docker-compose up -d' for detached mode

This will typically start the FastAPI server, Celery workers, OPA, and mock AWS services. Check the docker-compose.yml for specific service names and configurations.

🔬 Testing

Run the main test suite from the root directory:

make test

Each package may also have its own test suite. To run tests for a specific package:

cd packages/automated_actions
make test

Linting and Formatting

This project uses Ruff for linting and formatting. To check for issues:

make format

Release

automated-actions-client and automated-actions-cli are released to PyPI. To release a new version:

  1. Recreate the automated-actions-client package after changes to the OpenAPI spec.

    make generate-client
  2. Update the version numbers in automated_actions_client/pyproject.toml and automated_actions_cli/pyproject.toml.

  3. Create and merge a PR.

📚 Development Guides

Feel free to explore our development guides to enhance your understanding and contribution to the Automated Actions project!

💡 Usage Examples

⌨️ CLI Usage

Action: Restart an OpenShift workload (in this example, a Pod). This assumes automated-actions-cli is installed or your local development virtual environment is activated.

automated-actions openshift-workload-restart --cluster <CLUSTER_NAME> --namespace <NAMESPACE_NAME> --kind Pod --name <POD_NAME>

Refer to the automated_actions_cli package README.md for detailed usage instructions.
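
When the CLI has to be called from a script, the command shown above can be assembled as an argument list. The sketch below only builds and prints the arguments, using the flags that appear in this README; the helper name `build_restart_command` and the example values are hypothetical.

```python
import shlex


def build_restart_command(cluster: str, namespace: str, kind: str, name: str) -> list[str]:
    """Build the argument list for the restart command shown above."""
    return [
        "automated-actions",
        "openshift-workload-restart",
        "--cluster", cluster,
        "--namespace", namespace,
        "--kind", kind,
        "--name", name,
    ]


cmd = build_restart_command("my-cluster", "my-namespace", "Pod", "my-pod")
print(shlex.join(cmd))
# To actually run it, pass the list to subprocess.run(cmd, check=True).
```

Passing a list instead of a single shell string avoids quoting problems when names contain unusual characters.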

☁️ Deployment

Deployment is managed using OpenShift templates. Refer to the openshift directory in the root of this repository for specific templates and deployment instructions.

🤝 Contributing

Contributions are welcome! 🎉 Please follow standard practices:

  1. Fork the repository.
  2. Create a feature branch (git checkout -b feature/your-amazing-feature).
  3. Make your changes.
  4. Ensure tests pass (make test).
  5. Ensure code is linted and formatted (make format).
  6. Commit your changes (git commit -m 'feat: Add some amazing feature').
  7. Push to the branch (git push origin feature/your-amazing-feature).
  8. Open a Pull Request.

📜 License

This project is licensed under the terms of the Apache 2.0 license.
