Skip to content

A sophisticated real-time AI voice assistant leveraging DeepSeek R1's advanced reasoning capabilities for seamless conversational interactions through cutting-edge speech processing technology.

License

Notifications You must be signed in to change notification settings

bigdata5911/DeepSeek-R1-Voice-Agent

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

9 Commits
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

DeepSeek R1 AI Voice Agent

A sophisticated real-time AI voice assistant leveraging DeepSeek R1's advanced reasoning capabilities for seamless conversational interactions through cutting-edge speech processing technology.

๐ŸŽฏ Project Overview

This enterprise-grade voice agent delivers an exceptional conversational AI experience by integrating:

  • Advanced Speech Recognition: High-fidelity real-time transcription powered by AssemblyAI
  • Intelligent Response Generation: DeepSeek R1's sophisticated reasoning and contextual understanding
  • Natural Voice Synthesis: Professional-grade text-to-speech conversion via ElevenLabs
  • Low-Latency Streaming: Optimized audio processing for responsive real-time interactions
  • Contextual Memory: Persistent conversation context for coherent multi-turn dialogues
  • Cross-Platform Compatibility: Seamless operation across macOS, Linux, and Windows environments

๐Ÿš€ Core Capabilities

Real-Time Speech Processing

  • High-Accuracy Transcription: Leverages AssemblyAI's state-of-the-art speech recognition
  • Noise Reduction: Advanced audio preprocessing for optimal transcription quality
  • Multi-Language Support: Extensible language processing capabilities

Intelligent AI Responses

  • DeepSeek R1 Integration: Harnesses the power of DeepSeek's 7B parameter model
  • Contextual Understanding: Maintains conversation flow and context awareness
  • Reasoning Capabilities: Advanced logical reasoning and problem-solving abilities
  • Response Optimization: Character-limited responses for optimal real-time performance

Professional Audio Output

  • ElevenLabs Integration: Industry-leading text-to-speech synthesis
  • Voice Customization: Configurable voice models and parameters
  • Streaming Architecture: Low-latency audio streaming for immediate playback
  • Audio Quality: High-fidelity 16kHz sample rate for crystal-clear output

๐Ÿ“‹ System Requirements

Essential Dependencies

API Services

Core Software Components

Ollama Installation

# Download and install from official source
curl -fsSL https://ollama.com/install.sh | sh

PortAudio System Library

Ubuntu/Debian Systems:

sudo apt update && sudo apt install portaudio19-dev

macOS Systems:

brew install portaudio

Windows Systems: PortAudio is automatically included with Python package installation.

MPV Media Player (macOS Only)

brew install mpv

๐Ÿ› ๏ธ Installation Guide

Step 1: Repository Setup

git clone https://github.com/bigdata5911/DeepSeek-R1-Voice-Agent.git
cd DeepSeek-R1-Voice-Agent

Step 2: Python Environment Configuration

# Install required Python packages
pip install "assemblyai[extras]" ollama elevenlabs

Step 3: AI Model Deployment

# Download DeepSeek R1 model via Ollama
ollama pull deepseek-r1:7b

Step 4: API Configuration

Edit AIVoiceAgent.py and configure your API credentials:

# AssemblyAI Configuration
aai.settings.api_key = "YOUR_ASSEMBLYAI_API_KEY"

# ElevenLabs Configuration
self.client = ElevenLabs(api_key="YOUR_ELEVENLABS_API_KEY")

๐ŸŽฎ Usage Instructions

Launching the Voice Agent

python AIVoiceAgent.py

Interaction Workflow

  1. Voice Input: Speak clearly into your microphone
  2. Processing Pipeline: Speech โ†’ Transcription โ†’ AI Analysis โ†’ Response Generation
  3. Audio Output: AI response converted to speech and streamed in real-time
  4. Conversation Continuation: Context-aware dialogue progression

Session Management

  • Start: Execute the Python script to initiate the voice agent
  • Stop: Press Ctrl+C to gracefully terminate the session

โš™๏ธ Advanced Configuration

Model Parameters

  • AI Engine: DeepSeek R1 7B (configurable via Ollama)
  • Voice Engine: ElevenLabs Turbo v2 (customizable)
  • Response Constraints: 300-character limit for optimal performance
  • Audio Specifications: 16kHz sample rate for professional quality

Customization Options

  • System Prompt Modification: Adjust AI behavior and personality
  • Response Length Tuning: Modify character limits for different use cases
  • Voice Model Selection: Choose from ElevenLabs voice library
  • Audio Stream Optimization: Fine-tune streaming parameters

๐Ÿ”ง Troubleshooting Guide

Common Issues and Solutions

Module Import Errors

# Reinstall AssemblyAI with extras
pip install "assemblyai[extras]"

Ollama Connection Issues

# Verify Ollama service status
ollama serve

# Check model availability
ollama list

Audio Device Problems

  • Verify microphone permissions and access
  • Confirm PortAudio installation integrity
  • Test audio input with system applications

API Service Errors

  • Validate API key authenticity and permissions
  • Monitor API usage quotas and limits
  • Ensure stable network connectivity

Performance Optimization

  • Hardware Recommendations: Use high-quality microphone for optimal transcription
  • Network Requirements: Stable broadband connection for API services
  • System Resources: Close unnecessary applications to maximize performance

๐Ÿ—๏ธ Technical Architecture

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚   Audio Input   โ”‚โ”€โ”€โ”€โ–ถโ”‚  AssemblyAI  โ”‚โ”€โ”€โ”€โ–ถโ”‚   DeepSeek R1   โ”‚
โ”‚   (Microphone)  โ”‚    โ”‚ (Speech-to-  โ”‚    โ”‚ (AI Response    โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜    โ”‚  Text)       โ”‚    โ”‚  Generation)    โ”‚
                       โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                                                      โ”‚
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”              โ”‚
โ”‚   Audio Output  โ”‚โ—€โ”€โ”€โ”€โ”‚  ElevenLabs  โ”‚โ—€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
โ”‚   (Speakers)    โ”‚    โ”‚ (Text-to-    โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜    โ”‚  Speech)     โ”‚
                       โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

๐Ÿ“„ License Information

This project is released under open source licensing. Please refer to the repository for complete license details.

๐Ÿค Contribution Guidelines

We welcome contributions from the community! Please consider:

  • Issue Reporting: Submit detailed bug reports and feature requests
  • Code Contributions: Pull requests for improvements and new features
  • Documentation: Help improve project documentation and guides
  • Testing: Assist with testing and quality assurance

๐Ÿ“ž Support Resources

For technical assistance and community support:

  • GitHub Issues: Open detailed issues for bug reports and feature requests
  • Troubleshooting: Refer to the comprehensive troubleshooting section above
  • API Documentation: Consult official documentation for AssemblyAI, Ollama, and ElevenLabs
  • Community: Engage with the open-source community for best practices

Technical Note: This application requires persistent internet connectivity for API service integration and adequate system resources for local DeepSeek R1 model execution via Ollama.

Developer: @bigdata5911

About

A sophisticated real-time AI voice assistant leveraging DeepSeek R1's advanced reasoning capabilities for seamless conversational interactions through cutting-edge speech processing technology.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages