This repository contains a collection of specialized, autonomous AI agents designed for various complex tasks. Each agent leverages Large Language Models (LLMs) combined with a specific set of tools to achieve its goals in a structured and observable manner. The agents are built using the Rigging and Dreadnode libraries for robust interaction and observability.
The following table provides a high-level overview and comparison of the agents available in this collection.
| Agent | Description | Primary Use Case | Environment | Input Method | Key Tools |
|---|---|---|---|---|---|
| Dangerous Capabilities | Automatically builds and runs Capture The Flag (CTF) challenges. | Reproduce Google's "Dangerous Capabilities" evaluation. | Python | A selected challenge container. | Kali, Rigging, Dreadnode |
| Dotnet Reversing | Reverses and analyzes .NET binaries for vulnerabilities using an LLM. | Security analysis of .NET applications. | Python | Local .NET DLL/EXE files or NuGet package IDs. | `dnlib`, Rigging, Dreadnode |
| Python Agent | Executes Python code in a sandboxed Docker environment to perform general tasks. | General-purpose code execution, data analysis, automation. | Python, Docker | Natural language task, Docker image, volume mounts. | Docker, Jupyter Kernel, Rigging |
| Sast Scanning | Benchmarks LLM performance on SAST by running them against code with known vulnerabilities. | Evaluating and comparing LLMs for security code review. | Python, Docker (optional) | Pre-defined code challenges from a local directory. | Rigging, LiteLLM, Dreadnode |
| Sensitive Data | Scans local or remote file systems (e.g., local, S3, GitHub) for sensitive data leaks. | Data governance and security auditing for exposed credentials/PII. | Python, fsspec | `fsspec`-compatible URI (e.g., `s3://...`, `github://...`). | `fsspec`, Rigging, Dreadnode |
Below are brief descriptions of each agent with a link to their detailed README files.
This agent is designed to perform reverse engineering of .NET binaries. It can decompile .NET assemblies and use an LLM to analyze the resulting source code based on a user-defined task, such as "Find all critical security vulnerabilities."
> View Detailed README for Dotnet Reversing
A general-purpose agent that provides a sandboxed Jupyter environment inside a Docker container. It can execute Python code to accomplish a wide range of programmatic tasks, from data analysis to file manipulation, based on a natural language prompt.
> View Detailed README for Python Agent
This agent is a specialized framework for evaluating the security analysis capabilities of LLMs. It runs "challenges" where the model must find known, predefined vulnerabilities in a codebase. The agent scores the model's performance, providing a quantitative way to benchmark different models for SAST.
> View Detailed README for Sast Scanning
An autonomous agent that explores and analyzes file systems to find and report sensitive data like credentials, API keys, and personal information. Leveraging `fsspec`, it can operate on local files, cloud storage (AWS S3, GCS), and remote repositories (GitHub).
> View Detailed README for Sensitive Data Extraction
While each agent has its own specific command-line arguments, they share a common setup:

- Installation: Each agent is a Python application. Dependencies can be installed via `pip`.
- LLM Configuration: The agents use `litellm` to connect to various LLMs. You must configure the appropriate environment variables for the model you intend to use (e.g., `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`).
- Observability: To enable detailed logging, tracing, and metrics, you can configure the agents to connect to a Dreadnode server by providing a server URL and token.
All examples share the same project and dependencies; set up the virtual environment with uv:

```
uv sync
```
For all agents, LLMs are usually specified with a `--model` argument, which is passed directly to our Rigging library. You can read details about different ways to connect to providers, self-hosted servers, or even in-process local models in the docs. Usually, the obvious identifier works out of the box:

```
gpt-4.1
claude-4-sonnet-latest
ollama/llama3-70b
```
- You can pass API keys by setting the associated env var (`OPENAI_API_KEY`) or by adding `,api_key=...` to your model string.
- If you need to control which endpoint the model uses, you can add `,api_base=http://<host>:<port>` to the model string.
- As noted in the Rigging docs, these model strings also support properties like `temperature` and `top_k` as needed.
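To make the format concrete, here is a small, hypothetical parser for such comma-separated model strings. Rigging does its own parsing internally; this sketch only illustrates the shape of the format.

```python
def parse_model_string(spec: str) -> tuple[str, dict[str, str]]:
    """Split 'model,key=value,...' into the model id and its parameters."""
    model, *pairs = spec.split(",")
    # maxsplit=1 keeps values containing '=' (e.g. URLs) intact.
    params = dict(pair.split("=", 1) for pair in pairs)
    return model, params
```

For example, `parse_model_string("ollama/llama3-70b,api_base=http://localhost:11434")` separates the model id from its endpoint override.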
Rigging uses LiteLLM under the hood for most LLMs, and you can use their docs to find edge cases for specific providers.
A basic agent with access to a dockerized Jupyter kernel to execute code safely.
```
uv run -m python_agent --help
```
- Provided a task (`--task`), the agent begins a generation loop with access to the Jupyter kernel
- The work directory (`--work-dir`) is mounted into the container, along with any other docker-style volumes (`--volumes`)
- When finished, the agent marks the task as complete with a status and summary
- The work directory is logged as an artifact for the run
Based on research from Google DeepMind, this agent works to solve a variety of CTF challenges given access to execute bash commands on a network-local Kali Linux container.
```
uv run -m dangerous_capabilities --help
```
The harness will automatically build all the containers with the supplied flag, and load them as needed to ensure they are network-isolated from each other. The process is generally:
- For each challenge, produce `P` agent tasks, where `P` is the parallelism setting
- Run all agent tasks in parallel, capped at your concurrency setting
- Inside each task, bring up the associated environment
- Continue requesting the next command from the inference model and execute it in the `env` container
- If the flag is ever observed in the output, exit
- Otherwise run until an error occurs, the agent gives up, or the max-steps limit is reached
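The fan-out described above can be sketched with `asyncio` — a simplified stand-in for the harness, with `run_task` reduced to a placeholder:

```python
import asyncio

async def run_task(challenge: str, attempt: int, sem: asyncio.Semaphore) -> str:
    # Placeholder for the real work: bring up the environment, step the
    # model, execute commands, and watch the output for the flag.
    async with sem:
        return f"{challenge}:{attempt}"

async def run_all(challenges: list[str], parallelism: int, concurrency: int) -> list[str]:
    # P tasks per challenge, launched together but capped by a semaphore.
    sem = asyncio.Semaphore(concurrency)
    tasks = [run_task(c, i, sem) for c in challenges for i in range(parallelism)]
    return await asyncio.gather(*tasks)
```

The semaphore is what keeps total concurrency bounded regardless of how many challenge/parallelism combinations are queued.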
Check out `./dangerous_capabilities/challenges/challenges.json` to see all the environments and prompts.
This agent is provided access to Cecil and ILSpy for use in reversing and analyzing Dotnet managed binaries for vulnerabilities.
```
uv run -m dotnet_reversing --help
```
You can provide a path containing binaries (recursively), and a target vulnerability term that you would like the agent to search for. The tool suite provided to the agent includes:
- Search for a term in target modules to identify functions of interest
- Decompile individual methods, types, or entire modules
- Collect all call flows which lead to a target method in all supplied binaries
- Report a vulnerability finding with associated path, method, and description
- Mark a task as complete with a summary
- Give up on a task with a reason
You can also specify the path as a NuGet package identifier and pass `--nuget` to the agent. It will download the package, extract the binaries, and run the same analysis as above.
```
# Local (with provided example binaries)
uv run -m dotnet_reversing --model <model> --path dotnet_reversing/example_binaries/flag_protocol
uv run -m dotnet_reversing --model <model> --path dotnet_reversing/example_binaries/harmony

# Nuget
uv run -m dotnet_reversing --model <model> --path <nuget-package-id> --nuget
```
This agent is provided access to a filesystem tool based on `fsspec` for use in extracting sensitive data stored in files.
```
uv run -m sensitive_data_extraction --help
```
The agent is granted some maximum step count to operate tools, query and search files, and provide reports of any sensitive data it finds. With the help of `fsspec`, the agent can operate on local files, GitHub repos, S3 buckets, and other cloud storage systems.
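The backend-agnostic access pattern looks roughly like this. The sketch uses fsspec's in-memory filesystem so it is self-contained; the agent would use protocols like `file`, `s3`, or `github` instead, and the file contents here are invented.

```python
import fsspec

# The same fsspec API drives every backend; only the protocol string changes.
fs = fsspec.filesystem("memory")
fs.pipe("/repo/config.txt", b"api_key=123")
fs.pipe("/repo/readme.md", b"nothing secret here")

# A toy "scan": flag any file whose content mentions an api_key.
hits = [path for path in fs.find("/repo") if b"api_key" in fs.cat(path)]
```

Swapping `"memory"` for `"s3"` or `"github"` (with the relevant extras installed) leaves the traversal and matching logic unchanged, which is what lets one agent cover so many storage systems.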
```
# Local
uv run -m sensitive_data_extraction --model <model> --path /path/to/local/files

# S3
uv run -m sensitive_data_extraction --model <model> --path s3://bucket

# Azure
uv run -m sensitive_data_extraction --model <model> --path azure://container

# GCS
uv run -m sensitive_data_extraction --model <model> --path gcs://bucket

# Github
uv run -m sensitive_data_extraction --model <model> --path github://owner:repo@/
```
Check out their docs for more options:
- https://filesystem-spec.readthedocs.io/en/latest/api.html#built-in-implementations
- https://filesystem-spec.readthedocs.io/en/latest/api.html#other-known-implementations
This agent is designed to perform static code analysis to identify security vulnerabilities in source code. It uses a combination of direct file access and container-based approaches to analyze code for common security issues.
```
uv run -m sast_scanning --help
```
The agent systematically examines codebases using either direct file access or an isolated container environment. It can:
- Execute targeted analysis commands to search through source files
- Report detailed findings with vulnerability location, type, and severity
- Support various programming languages through configurable extensions
- Operate in two modes: "direct" (filesystem access) or "container" (isolated analysis)
Challenges and vulnerability patterns are defined in YAML configuration files, allowing for flexible targeting of specific security issues across different codebases.
The agent tracks several key metrics to evaluate performance:
- `valid_findings`: Count of correctly identified vulnerabilities matching expected issues
- `raw_findings`: Total number of potential vulnerabilities reported by the model
- `coverage`: Percentage of known vulnerabilities successfully identified
- `duplicates`: Count of repeatedly reported vulnerabilities
Findings are scored using a weighted system that prioritizes matching the correct vulnerability name (3x), function (2x), and line location (1x) to balance semantic accuracy with positional precision.
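That weighting scheme can be expressed as a small scoring function. The field names below are hypothetical, chosen only to mirror the description; the agent's actual finding schema may differ.

```python
# Weights from the description: name match x3, function x2, line x1.
WEIGHTS = {"name": 3, "function": 2, "line": 1}

def score_finding(reported: dict, expected: dict) -> int:
    """Sum the weights of the fields where the reported finding matches."""
    return sum(
        weight
        for field, weight in WEIGHTS.items()
        if reported.get(field) == expected.get(field)
    )
```

Under this scheme a finding with the right vulnerability name and function but a slightly-off line number still scores 5 of 6, reflecting the emphasis on semantic accuracy over exact position.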
```
# Run in direct mode (default)
uv run -m sast_scanning --model <model> --mode direct

# Run in container mode (isolated environment)
uv run -m sast_scanning --model <model> --mode container

# Run a specific challenge
uv run -m sast_scanning --model <model> --mode container --challenge <challenge-name>

# Customize analysis parameters
uv run -m sast_scanning --model <model> --max-steps 50 --timeout 60
```