GitHub - aws-samples/sample-eks-troubleshooting-rag-chatbot

EKS Troubleshooting Assistant

An intelligent troubleshooting chatbot powered by Large Language Models (LLMs) to help support engineers diagnose and resolve issues with applications running on Amazon EKS.

The ingestion pipeline collects and processes application logs, Kubernetes events, and kubelet logs. These are then embedded using Amazon Bedrock and stored in OpenSearch for efficient retrieval.

Using Retrieval-Augmented Generation (RAG), the agentic chatbot grounds LLM responses in relevant cluster data. It also investigates issues actively, generating and executing read-only kubectl commands when necessary, so its troubleshooting insights draw on both real-time cluster information and historical data.
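One simple way to keep generated commands read-only is to check the kubectl verb against an allowlist before executing anything. The sketch below only illustrates that idea and is not taken from the project's code; the function name `is_read_only_kubectl` and the list of verbs are assumptions.

```shell
# Hypothetical guard: succeed only if the command is `kubectl` followed
# directly by a verb from a small read-only allowlist (the list is an assumption).
is_read_only_kubectl() {
  [ "${1:-}" = "kubectl" ] || return 1
  case "${2:-}" in
    get|describe|logs|top|events|explain|version) return 0 ;;
    *) return 1 ;;
  esac
}

# Example: check a candidate command before running it.
if is_read_only_kubectl kubectl get pods -n default; then
  echo "allowed"
else
  echo "rejected"
fi
```

The check is deliberately conservative: a command that places flags before the verb (for example `kubectl -n default get pods`) is rejected. A read-only Kubernetes RBAC role for the chatbot's service account is a stronger control than string checks alone.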

Prerequisites

Before running this project, make sure you have the following installed:

Terraform
Docker or Finch
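
A quick way to confirm the prerequisites before starting is to look each tool up on your PATH. This is a minimal sketch; the helper name `missing_tools` is hypothetical, and it assumes the CLIs are installed under their usual names (`terraform`, `docker`, `finch`).

```shell
# Print each of the given commands that cannot be found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Terraform is required.
if [ -n "$(missing_tools terraform)" ]; then
  echo "terraform not found on PATH"
fi

# Either Docker or Finch satisfies the container tooling requirement.
if [ -n "$(missing_tools docker)" ] && [ -n "$(missing_tools finch)" ]; then
  echo "neither docker nor finch found on PATH"
fi
```

No output means both requirements appear to be met.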

Setup and Execution

Step 1: Provision AWS Resources

First, you need to provision the necessary AWS resources.

Clone the repository:

git clone https://github.com/aws-samples/sample-eks-troubleshooting-rag-chatbot

[Optional: needed only for Slack integration] Create a terraform.tfvars file in the terraform directory with your Slack webhook URL and channel name:

Example contents of terraform.tfvars
```
slack_webhook_url = "https://hooks.slack.com/services/[YOUR-WEBHOOK]"
slack_channel_name = "alert-manager-alerts"
```
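
If you prefer not to type the webhook URL into the file by hand, the same two lines can be generated from values held elsewhere. The sketch below assumes two hypothetical environment variables, `SLACK_WEBHOOK_URL` and `SLACK_CHANNEL_NAME`; the helper name `render_tfvars` is also an assumption.

```shell
# Print terraform.tfvars content for the two Slack variables.
# $1 = webhook URL, $2 = channel name
render_tfvars() {
  printf 'slack_webhook_url = "%s"\n' "$1"
  printf 'slack_channel_name = "%s"\n' "$2"
}

# SLACK_WEBHOOK_URL and SLACK_CHANNEL_NAME are hypothetical variable names.
# Run from the terraform directory:
#   render_tfvars "$SLACK_WEBHOOK_URL" "$SLACK_CHANNEL_NAME" > terraform.tfvars
```

The webhook URL is a secret, so keep the generated terraform.tfvars out of version control.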

Run the install script to initialize Terraform and install the modules:

cd sample-eks-troubleshooting-rag-chatbot/terraform/

./install.sh

Configuration

Local Variables: The locals section in the Terraform script defines the region, VPC CIDR, and availability zones.
Tags: The provisioned resources are tagged with the blueprint name and the GitHub repository URL.
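
To review the region, VPC CIDR, and availability zones before provisioning, you can print the locals block from the Terraform configuration. This is a rough sketch: the file name `main.tf` is an assumption, the helper name `show_locals` is hypothetical, and the awk pattern expects the block to open with `locals {` and close with `}` at the start of a line.

```shell
# Print the first top-level `locals { ... }` block of a Terraform file.
# Assumes the block opens with `locals {` and closes with `}` in column 0.
show_locals() {
  awk '/^locals[ \t]*[{]/ {p = 1} p {print} p && /^[}]/ {exit}' "$1"
}

# main.tf is an assumed file name; pass whichever file holds the locals block.
if [ -f main.tf ]; then
  show_locals main.tf
fi
```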

Step 2: Deploy Problem Pods for Testing

To generate logs for testing, deploy problem pods into your EKS cluster with the provided bash script, which is located in the repository root:

./provision-delete-error-pods.sh -p db-migration

This script will create various pods that are likely to generate errors and logs, which the chatbot can then use for troubleshooting.
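
To confirm that the problem pods are actually failing, you can filter the pod list for anything that is not healthy. The sketch below assumes the default column layout of `kubectl get pods -A` (NAMESPACE, NAME, READY, STATUS, ...); the helper name `unhealthy_pods` is hypothetical.

```shell
# Read `kubectl get pods -A` output on stdin and print "namespace/name status"
# for every pod whose STATUS column is neither Running nor Completed.
unhealthy_pods() {
  awk 'NR > 1 && $4 != "Running" && $4 != "Completed" {print $1 "/" $2, $4}'
}

# Usage against a live cluster:
#   kubectl get pods -A | unhealthy_pods
```

Pods in states such as CrashLoopBackOff, Error, or OOMKilled should appear in the output a short while after the script has run.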

Step 3: Use the Chatbot

The chatbot runs as a deployment in the Kubernetes cluster. You can use it to troubleshoot logs from the problem pods you deployed earlier: it fetches relevant logs based on your query and provides context-aware responses.

# Forward the Chatbot service port to your local machine
kubectl port-forward -n agentic-chatbot service/agentic-chatbot 7860:7860

Now you can access it with your preferred browser using the following URL: http://localhost:7860
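
If the page does not load immediately, the port-forward may still be starting. A small retry helper, run from a second terminal, can poll the local URL until it answers. This is a sketch under assumptions: the helper name `retry` is hypothetical, and the attempt count and delay are arbitrary.

```shell
# Run a command up to $1 times, sleeping $2 seconds between attempts.
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while ! "$@"; do
    [ "$n" -ge "$attempts" ] && return 1
    n=$((n + 1))
    sleep "$delay"
  done
}

# Usage, while the port-forward is running in another terminal:
#   retry 15 2 curl -fsS -o /dev/null http://localhost:7860 && echo "chatbot is reachable"
```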

Cleanup

# Destroy the Terraform resources
terraform destroy --auto-approve

Acknowledgments

This project uses:

Gradio for the user interface.
Terraform AWS EKS Blueprints as the basis for provisioning the infrastructure.

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.

