DeepSeek R1 AI Voice Agent

A sophisticated real-time AI voice assistant leveraging DeepSeek R1's advanced reasoning capabilities for seamless conversational interactions through cutting-edge speech processing technology.

🎯 Project Overview

This enterprise-grade voice agent delivers an exceptional conversational AI experience by integrating:

Advanced Speech Recognition: High-fidelity real-time transcription powered by AssemblyAI
Intelligent Response Generation: DeepSeek R1's sophisticated reasoning and contextual understanding
Natural Voice Synthesis: Professional-grade text-to-speech conversion via ElevenLabs
Low-Latency Streaming: Optimized audio processing for responsive real-time interactions
Contextual Memory: Persistent conversation context for coherent multi-turn dialogues
Cross-Platform Compatibility: Seamless operation across macOS, Linux, and Windows environments

🚀 Core Capabilities

Real-Time Speech Processing

High-Accuracy Transcription: Leverages AssemblyAI's state-of-the-art speech recognition
Noise Reduction: Advanced audio preprocessing for optimal transcription quality
Multi-Language Support: Extensible language processing capabilities

Intelligent AI Responses

DeepSeek R1 Integration: Harnesses the power of DeepSeek's 7B parameter model
Contextual Understanding: Maintains conversation flow and context awareness
Reasoning Capabilities: Advanced logical reasoning and problem-solving abilities
Response Optimization: Character-limited responses for optimal real-time performance

Professional Audio Output

ElevenLabs Integration: Industry-leading text-to-speech synthesis
Voice Customization: Configurable voice models and parameters
Streaming Architecture: Low-latency audio streaming for immediate playback
Audio Quality: High-fidelity 16kHz sample rate for crystal-clear output

📋 System Requirements

Essential Dependencies

API Services

AssemblyAI Account: Register for free API access
ElevenLabs Account: Create ElevenLabs account

Core Software Components

Ollama Installation

# Download and install from official source
curl -fsSL https://ollama.com/install.sh | sh

PortAudio System Library

Ubuntu/Debian Systems:

sudo apt update && sudo apt install portaudio19-dev

macOS Systems:

brew install portaudio

Windows Systems: PortAudio is automatically included with Python package installation.

MPV Media Player (macOS Only)

brew install mpv

🛠️ Installation Guide

Step 1: Repository Setup

git clone https://github.com/bigdata5911/DeepSeek-R1-Voice-Agent.git
cd DeepSeek-R1-Voice-Agent

Step 2: Python Environment Configuration

# Install required Python packages
pip install "assemblyai[extras]" ollama elevenlabs

Step 3: AI Model Deployment

# Download DeepSeek R1 model via Ollama
ollama pull deepseek-r1:7b

Step 4: API Configuration

Edit AIVoiceAgent.py and configure your API credentials:

# AssemblyAI Configuration
aai.settings.api_key = "YOUR_ASSEMBLYAI_API_KEY"

# ElevenLabs Configuration
self.client = ElevenLabs(api_key="YOUR_ELEVENLABS_API_KEY")

🎮 Usage Instructions

Launching the Voice Agent

python AIVoiceAgent.py

Interaction Workflow

Voice Input: Speak clearly into your microphone
Processing Pipeline: Speech → Transcription → AI Analysis → Response Generation
Audio Output: AI response converted to speech and streamed in real-time
Conversation Continuation: Context-aware dialogue progression

Session Management

Start: Execute the Python script to initiate the voice agent
Stop: Press Ctrl+C to gracefully terminate the session

⚙️ Advanced Configuration

Model Parameters

AI Engine: DeepSeek R1 7B (configurable via Ollama)
Voice Engine: ElevenLabs Turbo v2 (customizable)
Response Constraints: 300-character limit for optimal performance
Audio Specifications: 16kHz sample rate for professional quality

Customization Options

System Prompt Modification: Adjust AI behavior and personality
Response Length Tuning: Modify character limits for different use cases
Voice Model Selection: Choose from ElevenLabs voice library
Audio Stream Optimization: Fine-tune streaming parameters

🔧 Troubleshooting Guide

Common Issues and Solutions

Module Import Errors

# Reinstall AssemblyAI with extras
pip install "assemblyai[extras]"

Ollama Connection Issues

# Verify Ollama service status
ollama serve

# Check model availability
ollama list

Audio Device Problems

Verify microphone permissions and access
Confirm PortAudio installation integrity
Test audio input with system applications

API Service Errors

Validate API key authenticity and permissions
Monitor API usage quotas and limits
Ensure stable network connectivity

Performance Optimization

Hardware Recommendations: Use high-quality microphone for optimal transcription
Network Requirements: Stable broadband connection for API services
System Resources: Close unnecessary applications to maximize performance

🏗️ Technical Architecture

┌─────────────────┐    ┌──────────────┐    ┌─────────────────┐
│   Audio Input   │───▶│  AssemblyAI  │───▶│   DeepSeek R1   │
│   (Microphone)  │    │ (Speech-to-  │    │ (AI Response    │
└─────────────────┘    │  Text)       │    │  Generation)    │
                       └──────────────┘    └─────────────────┘
                                                      │
┌─────────────────┐    ┌──────────────┐              │
│   Audio Output  │◀───│  ElevenLabs  │◀─────────────┘
│   (Speakers)    │    │ (Text-to-    │
└─────────────────┘    │  Speech)     │
                       └──────────────┘

📄 License Information

This project is released under open source licensing. Please refer to the repository for complete license details.

🤝 Contribution Guidelines

We welcome contributions from the community! Please consider:

Issue Reporting: Submit detailed bug reports and feature requests
Code Contributions: Pull requests for improvements and new features
Documentation: Help improve project documentation and guides
Testing: Assist with testing and quality assurance

📞 Support Resources

For technical assistance and community support:

GitHub Issues: Open detailed issues for bug reports and feature requests
Troubleshooting: Refer to the comprehensive troubleshooting section above
API Documentation: Consult official documentation for AssemblyAI, Ollama, and ElevenLabs
Community: Engage with the open-source community for best practices

Technical Note: This application requires persistent internet connectivity for API service integration and adequate system resources for local DeepSeek R1 model execution via Ollama.

Developer: @bigdata5911

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
AIVoiceAgent.py		AIVoiceAgent.py
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

DeepSeek R1 AI Voice Agent

🎯 Project Overview

🚀 Core Capabilities

Real-Time Speech Processing

Intelligent AI Responses

Professional Audio Output

📋 System Requirements

Essential Dependencies

API Services

Core Software Components

🛠️ Installation Guide

Step 1: Repository Setup

Step 2: Python Environment Configuration

Step 3: AI Model Deployment

Step 4: API Configuration

🎮 Usage Instructions

Launching the Voice Agent

Interaction Workflow

Session Management

⚙️ Advanced Configuration

Model Parameters

Customization Options

🔧 Troubleshooting Guide

Common Issues and Solutions

Performance Optimization

🏗️ Technical Architecture

📄 License Information

🤝 Contribution Guidelines

📞 Support Resources

About

Uh oh!

Releases

Packages

Languages

License

bigdata5911/DeepSeek-R1-Voice-Agent

Folders and files

Latest commit

History

Repository files navigation

DeepSeek R1 AI Voice Agent

🎯 Project Overview

🚀 Core Capabilities

Real-Time Speech Processing

Intelligent AI Responses

Professional Audio Output

📋 System Requirements

Essential Dependencies

API Services

Core Software Components

🛠️ Installation Guide

Step 1: Repository Setup

Step 2: Python Environment Configuration

Step 3: AI Model Deployment

Step 4: API Configuration

🎮 Usage Instructions

Launching the Voice Agent

Interaction Workflow

Session Management

⚙️ Advanced Configuration

Model Parameters

Customization Options

🔧 Troubleshooting Guide

Common Issues and Solutions

Performance Optimization

🏗️ Technical Architecture

📄 License Information

🤝 Contribution Guidelines

📞 Support Resources

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages