A sophisticated real-time AI voice assistant leveraging DeepSeek R1's advanced reasoning capabilities for seamless conversational interactions through cutting-edge speech processing technology.
This enterprise-grade voice agent delivers an exceptional conversational AI experience by integrating:
- Advanced Speech Recognition: High-fidelity real-time transcription powered by AssemblyAI
- Intelligent Response Generation: DeepSeek R1's sophisticated reasoning and contextual understanding
- Natural Voice Synthesis: Professional-grade text-to-speech conversion via ElevenLabs
- Low-Latency Streaming: Optimized audio processing for responsive real-time interactions
- Contextual Memory: Persistent conversation context for coherent multi-turn dialogues
- Cross-Platform Compatibility: Seamless operation across macOS, Linux, and Windows environments
- High-Accuracy Transcription: Leverages AssemblyAI's state-of-the-art speech recognition
- Noise Reduction: Advanced audio preprocessing for optimal transcription quality
- Multi-Language Support: Extensible language processing capabilities
- DeepSeek R1 Integration: Harnesses the power of DeepSeek's 7B parameter model
- Contextual Understanding: Maintains conversation flow and context awareness
- Reasoning Capabilities: Advanced logical reasoning and problem-solving abilities
- Response Optimization: Character-limited responses for optimal real-time performance
- ElevenLabs Integration: Industry-leading text-to-speech synthesis
- Voice Customization: Configurable voice models and parameters
- Streaming Architecture: Low-latency audio streaming for immediate playback
- Audio Quality: High-fidelity 16kHz sample rate for crystal-clear output
- AssemblyAI Account: Register for free API access
- ElevenLabs Account: Create ElevenLabs account
Ollama Installation
# Download and install from official source
curl -fsSL https://ollama.com/install.sh | sh
PortAudio System Library
Ubuntu/Debian Systems:
sudo apt update && sudo apt install portaudio19-dev
macOS Systems:
brew install portaudio
Windows Systems: PortAudio is automatically included with Python package installation.
MPV Media Player (macOS Only)
brew install mpv
git clone https://github.com/bigdata5911/DeepSeek-R1-Voice-Agent.git
cd DeepSeek-R1-Voice-Agent
# Install required Python packages
pip install "assemblyai[extras]" ollama elevenlabs
# Download DeepSeek R1 model via Ollama
ollama pull deepseek-r1:7b
Edit AIVoiceAgent.py
and configure your API credentials:
# AssemblyAI Configuration
aai.settings.api_key = "YOUR_ASSEMBLYAI_API_KEY"
# ElevenLabs Configuration
self.client = ElevenLabs(api_key="YOUR_ELEVENLABS_API_KEY")
python AIVoiceAgent.py
- Voice Input: Speak clearly into your microphone
- Processing Pipeline: Speech โ Transcription โ AI Analysis โ Response Generation
- Audio Output: AI response converted to speech and streamed in real-time
- Conversation Continuation: Context-aware dialogue progression
- Start: Execute the Python script to initiate the voice agent
- Stop: Press
Ctrl+C
to gracefully terminate the session
- AI Engine: DeepSeek R1 7B (configurable via Ollama)
- Voice Engine: ElevenLabs Turbo v2 (customizable)
- Response Constraints: 300-character limit for optimal performance
- Audio Specifications: 16kHz sample rate for professional quality
- System Prompt Modification: Adjust AI behavior and personality
- Response Length Tuning: Modify character limits for different use cases
- Voice Model Selection: Choose from ElevenLabs voice library
- Audio Stream Optimization: Fine-tune streaming parameters
Module Import Errors
# Reinstall AssemblyAI with extras
pip install "assemblyai[extras]"
Ollama Connection Issues
# Verify Ollama service status
ollama serve
# Check model availability
ollama list
Audio Device Problems
- Verify microphone permissions and access
- Confirm PortAudio installation integrity
- Test audio input with system applications
API Service Errors
- Validate API key authenticity and permissions
- Monitor API usage quotas and limits
- Ensure stable network connectivity
- Hardware Recommendations: Use high-quality microphone for optimal transcription
- Network Requirements: Stable broadband connection for API services
- System Resources: Close unnecessary applications to maximize performance
โโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโ
โ Audio Input โโโโโถโ AssemblyAI โโโโโถโ DeepSeek R1 โ
โ (Microphone) โ โ (Speech-to- โ โ (AI Response โ
โโโโโโโโโโโโโโโโโโโ โ Text) โ โ Generation) โ
โโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโ
โ
โโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโ โ
โ Audio Output โโโโโโ ElevenLabs โโโโโโโโโโโโโโโโ
โ (Speakers) โ โ (Text-to- โ
โโโโโโโโโโโโโโโโโโโ โ Speech) โ
โโโโโโโโโโโโโโโโ
This project is released under open source licensing. Please refer to the repository for complete license details.
We welcome contributions from the community! Please consider:
- Issue Reporting: Submit detailed bug reports and feature requests
- Code Contributions: Pull requests for improvements and new features
- Documentation: Help improve project documentation and guides
- Testing: Assist with testing and quality assurance
For technical assistance and community support:
- GitHub Issues: Open detailed issues for bug reports and feature requests
- Troubleshooting: Refer to the comprehensive troubleshooting section above
- API Documentation: Consult official documentation for AssemblyAI, Ollama, and ElevenLabs
- Community: Engage with the open-source community for best practices
Technical Note: This application requires persistent internet connectivity for API service integration and adequate system resources for local DeepSeek R1 model execution via Ollama.
Developer: @bigdata5911