This repository contains a beginner-friendly AutoGen 0.4 tutorial demonstrating how to build an AI-powered multi-agent system with external API integrations for text-to-speech (ElevenLabs) and image generation (Stability AI), plus optional local-model support via Ollama.
🎥 Watch the YouTube video here
📖 Read the blog post here
- 🤖 Multi-Agent System: Script Writer, Voice Actor, Graphic Designer, and Director agents working together.
- 🎙️ Text-to-Speech: Converts AI-generated text into voiceovers using the ElevenLabs API.
- 🖼️ Image Generation: Creates AI-generated visuals using the Stability AI API.
- 🏡 Local LLM Support: Optional integration with Ollama for running AI models offline.
- 🎮 Interactive Console: Type prompts and watch agents generate content dynamically.
```
autogen-multi-agent-workflow/
│── tools.py           # Utility functions (text-to-speech, image generation)
│── main.py            # Entry point for running the workflow
│── .env               # API keys (not included, create your own)
│── .gitignore         # Files excluded from version control
│── requirements.txt   # Dependencies for the project
│── README.md          # Documentation
```
```shell
git clone https://github.com/gswithjeff/autogen-multi-agent-workflow.git
cd autogen-multi-agent-workflow
```

macOS/Linux:

```shell
python -m venv venv
source venv/bin/activate
```

Windows:

```shell
python -m venv venv
venv\Scripts\activate
```

```shell
pip install -r requirements.txt
```
Create a `.env` file in the project root and add your API keys:

```
OPENAI_API_KEY=your-openai-api-key
ELEVENLABS_API_KEY=your-elevenlabs-api-key
STABILITY_API_KEY=your-stability-ai-api-key
```
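Since all three keys are required for the agents to function, it can help to fail fast if any are missing. A minimal stdlib-only sketch (the function name `load_api_keys` is hypothetical, not part of the repo; the project itself may load the `.env` file differently, e.g. via python-dotenv):

```python
import os

# The three keys the workflow expects in the environment.
REQUIRED_KEYS = ["OPENAI_API_KEY", "ELEVENLABS_API_KEY", "STABILITY_API_KEY"]

def load_api_keys() -> dict:
    """Read the required API keys from the environment and raise early
    if any are missing, instead of failing mid-workflow."""
    keys = {name: os.getenv(name) for name in REQUIRED_KEYS}
    missing = [name for name, value in keys.items() if not value]
    if missing:
        raise RuntimeError(f"Missing API keys: {', '.join(missing)}")
    return keys
```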
Before running the workflow, you'll need to create accounts for the following services:
- ElevenLabs (Text to Speech): Sign up here
- Stability AI (Image Generation): Sign up here
These accounts provide API access for text-to-speech and image generation, which are required for the agents to function.
```shell
python main.py
```
You’ll be prompted to enter a task, and the agents will collaborate to generate a script, voiceover, and images dynamically.
1️⃣ Script Writer Agent generates structured captions.
2️⃣ Voice Actor Agent converts text to speech.
3️⃣ Graphic Designer Agent creates images based on captions.
4️⃣ Director Agent orchestrates the final output.
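The four steps above can be sketched in plain Python. All helper names here are hypothetical stand-ins for illustration; in the actual project, these roles are AutoGen 0.4 agents coordinated by the Director rather than ordinary functions:

```python
def write_script(task: str) -> dict:
    # Stand-in for the Script Writer agent: returns structured captions.
    return {"topic": task, "captions": [f"Caption about {task}"]}

def synthesize_voice(caption: str) -> str:
    # Stand-in for the Voice Actor agent (the real one calls ElevenLabs).
    return f"voiceover for: {caption}"

def render_image(caption: str) -> str:
    # Stand-in for the Graphic Designer agent (the real one calls Stability AI).
    return f"image for: {caption}"

def direct(task: str) -> dict:
    # Stand-in for the Director agent: runs the steps in order and
    # collects the assets for final assembly.
    script = write_script(task)
    assets = [
        {"caption": c, "audio": synthesize_voice(c), "image": render_image(c)}
        for c in script["captions"]
    ]
    return {"script": script, "assets": assets}
```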
Create a short AI-generated video about space exploration.
```json
{
  "topic": "Space Exploration",
  "takeaway": "The future of space travel is closer than we think!",
  "captions": [
    "What lies beyond our galaxy?",
    "Humans are reaching new frontiers.",
    "AI is shaping space exploration.",
    "New planets are waiting to be discovered.",
    "The universe is limitless!"
  ]
}
```
✅ Voiceovers are generated
✅ Images are created
✅ Final video assembly is handled
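Because the downstream agents depend on the script's structure, it is worth validating the Script Writer's JSON before the voiceover and image steps run. A minimal stdlib sketch (the function name `parse_script` is hypothetical; the field names match the example output shown earlier):

```python
import json

def parse_script(raw: str) -> dict:
    """Parse the Script Writer's JSON output and check the fields the
    downstream agents rely on."""
    script = json.loads(raw)
    for field in ("topic", "takeaway", "captions"):
        if field not in script:
            raise ValueError(f"script is missing '{field}'")
    if not isinstance(script["captions"], list) or not script["captions"]:
        raise ValueError("'captions' must be a non-empty list")
    return script
```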
- Use a different LLM: Swap OpenAI for Ollama to run locally. (See blog for more)
- Modify agent behaviors: Edit the `system_message` for each agent.
- Integrate new APIs: Extend `tools.py` for additional functionality.
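As a small example of extending `tools.py`, a helper like the one below could give each caption a predictable asset filename, so a caption's voiceover and image can be matched up during final assembly. This is an illustrative addition, not a function from the repo:

```python
import re

def caption_to_filename(caption: str, extension: str = "png") -> str:
    """Turn a caption into a safe, predictable filename so the voiceover
    and image generated for the same caption can be paired later."""
    slug = re.sub(r"[^a-z0-9]+", "-", caption.lower()).strip("-")
    return f"{slug}.{extension}"
```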
Pull requests are welcome! If you find issues or want to improve the workflow, feel free to open an issue.
This project is licensed under the MIT License.
You are free to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the software,
as long as the original copyright notice and permission notice appear in all copies.
For full details, see the LICENSE file.
If you find this project helpful, ⭐️ star the repo and share your thoughts!