A real-time, offline voice assistant for Linux and Raspberry Pi. Uses local LLMs (via Ollama), speech-to-text (Vosk), and text-to-speech (Piper) for fast, wake-free voice interaction. No cloud. No APIs. Just Python, a mic, and your voice.


Local Voice Assistant (Offline, Real-Time AI)

Lightweight, low-latency voice assistant running fully offline on a Raspberry Pi or Linux machine.
Powered by PyAudio, Vosk STT, Piper TTS, and local LLMs via Ollama.



🎯 Features

  • πŸŽ™οΈ Microphone Input using PyAudio
  • πŸ”Š Real-Time Transcription with Vosk
  • 🧠 LLM-Powered Responses using Ollama with models like gemma2:2b, qwen2.5:0.5b
  • πŸ—£οΈ Natural Voice Output via Piper TTS
  • πŸŽ›οΈ Optional Noise & Filter FX using SoX for realism
  • πŸ”§ ALSA Volume Control
  • 🧩 Modular Python code ready for customization

πŸ›  Requirements

  • Raspberry Pi 5 or Linux desktop
  • Python 3.9+
  • PyAudio, NumPy, requests, soxr, pydub, vosk
  • SoX + ALSA utilities
  • Ollama with one or more small LLMs (e.g., Gemma or Qwen)
  • Piper TTS with ONNX voice models

Install dependencies:

pip install pyaudio requests soxr numpy pydub vosk
sudo apt install sox alsa-utils
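
alsa-utils provides amixer, which is one way the ALSA volume control mentioned in the features could be driven from Python. A minimal sketch, assuming the 0–10 `volume` value from the config maps to a mixer percentage (the control name `Master` varies by sound card, and the real script may do this differently):

```python
import subprocess

def amixer_args(level, control="Master"):
    """Translate a 0-10 config volume into an amixer command line.

    Assumption: the app maps volume to a percentage; adjust `control`
    to match your card (see `amixer scontrols`).
    """
    percent = max(0, min(10, level)) * 10
    return ["amixer", "sset", control, f"{percent}%"]

def set_volume(level, control="Master"):
    """Apply the volume via ALSA's amixer (requires alsa-utils)."""
    subprocess.run(amixer_args(level, control), check=True)
```

For example, `set_volume(8)` would run `amixer sset Master 80%`.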

βš™οΈ JSON Configuration

Place a config file at va_config.json:

{
  "volume": 8,
  "mic_name": "Plantronics",
  "audio_output_device": "Plantronics",
  "model_name": "gemma2:2b",
  "voice": "en_US-kathleen-low.onnx",
  "enable_audio_processing": false,
  "history_length": 6,
  "system_prompt": "You are a helpful assistant."
}

Note: if the configuration file is not found, the defaults within the main Python app will be used:

# ------------------- CONFIG FILE LOADING -------------------
DEFAULT_CONFIG = {
    "volume": 9,
    "mic_name": "Plantronics",
    "audio_output_device": "Plantronics",
    "model_name": "qwen2.5:0.5b",
    "voice": "en_US-kathleen-low.onnx",
    "enable_audio_processing": False,
    "history_length": 4,
    "system_prompt": "You are a helpful assistant."
}
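
A minimal loader that implements this fallback might look like the following (a sketch; the real script may report missing or malformed files differently):

```python
import json

DEFAULT_CONFIG = {
    "volume": 9,
    "mic_name": "Plantronics",
    "audio_output_device": "Plantronics",
    "model_name": "qwen2.5:0.5b",
    "voice": "en_US-kathleen-low.onnx",
    "enable_audio_processing": False,
    "history_length": 4,
    "system_prompt": "You are a helpful assistant.",
}

def load_config(path="va_config.json"):
    """Start from the defaults and overlay any keys found in the JSON file."""
    config = dict(DEFAULT_CONFIG)
    try:
        with open(path) as f:
            config.update(json.load(f))
    except (FileNotFoundError, json.JSONDecodeError):
        pass  # missing or malformed file: keep the defaults
    return config
```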

πŸ” What history_length Means

The history_length setting controls how many previous exchanges (user + assistant messages) are included when generating each new reply.

  • A value of 6 means the model receives the last 6 exchanges, plus the system prompt.
  • This allows the assistant to maintain short-term memory for more coherent conversations.
  • Setting it lower (e.g., 2) increases speed and memory efficiency.
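
The trimming can be sketched like this (an illustration, not the repo's actual code; it assumes the history is kept as a flat list of message dicts and that `history_length` counts the retained entries):

```python
def build_messages(system_prompt, history, user_text, history_length):
    """Assemble the message list for the next model call.

    history: list of {"role": ..., "content": ...} dicts from past turns.
    Only the most recent entries are kept, so the prompt stays small.
    """
    recent = history[-history_length:] if history_length > 0 else []
    return ([{"role": "system", "content": system_prompt}]
            + recent
            + [{"role": "user", "content": user_text}])
```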

βœ… requirements.txt

pyaudio
vosk
soxr
numpy
requests
pydub

If you plan to run this on a Raspberry Pi, you may also need:

soundfile  # for pydub compatibility on some distros

🐍 Install with Virtual Environment

# 1. Clone the repo

git clone https://github.com/m15-ai/Local-Voice.git
cd Local-Voice

# 2. Create and activate a virtual environment

python3 -m venv env
source env/bin/activate

# 3. Install dependencies

pip install -r requirements.txt

# 4. Install SoX and ALSA utilities (if not already installed)

sudo apt install sox alsa-utils

# 5. (Optional) Upgrade pip, setuptools, and wheel before building PyAudio

python -m pip install --upgrade pip setuptools wheel

πŸ’‘ If you get errors installing PyAudio on Raspberry Pi, try:

sudo apt install portaudio19-dev
pip install pyaudio

πŸ†• πŸ”§ Piper Installation (Binary)

Piper is a standalone text-to-speech engine used by this assistant. It's not a Python package, so it must be installed manually.

βœ… Install Piper

  1. Download the appropriate Piper binary from: πŸ‘‰ https://github.com/rhasspy/piper/releases

    For Ubuntu Linux, download: piper_linux_x86_64.tar.gz

  2. Extract it:

    tar -xvzf piper_linux_x86_64.tar.gz
    
  3. Move the binary into your project directory:

    mkdir -p bin/piper
    mv piper bin/piper/
    chmod +x bin/piper/piper
    
  4. βœ… Done! The script will automatically call it from bin/piper/piper.
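
Calling the binary from Python can be sketched as follows, assuming the layout above (`--model` and `--output_file` are standard Piper options; the script's actual invocation may differ):

```python
import subprocess

def piper_command(voice, out_wav, piper_bin="bin/piper/piper"):
    """Build the Piper command line for a given voice model and output file."""
    return [piper_bin, "--model", voice, "--output_file", out_wav]

def synthesize(text, voice="voices/en_US-kathleen-low.onnx", out_wav="reply.wav"):
    """Pipe the reply text into Piper on stdin and write a WAV file."""
    subprocess.run(piper_command(voice, out_wav),
                   input=text.encode("utf-8"), check=True)
    return out_wav
```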

πŸ“‚ Directory Example

voice_assistant.py
va_config.json
requirements.txt
bin/
└── piper/
    └── piper        ← (binary)
voices/
└── en_US-kathleen-low.onnx
└── en_US-kathleen-low.onnx.json

πŸ”Œ Finding Your USB Microphone & Speaker

To configure the correct audio devices, use these commands on your Raspberry Pi or Linux terminal:

  1. List Microphones (Input Devices)

python3 -c "import pyaudio; p = pyaudio.PyAudio(); \
[print(f'{i}: {p.get_device_info_by_index(i)}') for i in range(p.get_device_count())]"

Look for your microphone name (e.g., Plantronics) and use that as mic_name.

  2. List Speakers (Output Devices)

aplay -l

Example output:

card 3: Device [USB PnP Sound Device], device 0: USB Audio [USB Audio]

Use this info to set your audio_output_device to something like:

"audio_output_device": "USB PnP"

πŸ”§ Ollama Installation (Required)

Ollama is a local model runner for LLMs. You need to install it separately (outside of Python).

πŸ’» Install Ollama

On Linux (x86 or ARM):

curl -fsSL https://ollama.com/install.sh | sh

Or follow detailed instructions: πŸ‘‰ https://ollama.com/download

Then start the daemon:

ollama serve

πŸ“₯ Download the Models

After Ollama is installed and running, open a terminal and run:

βœ… For Gemma 2B:
ollama run gemma2:2b
For Qwen 0.5B:
ollama run qwen2.5:0.5b

This will automatically download and start the models. You only need to run this once per model.
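
Once the daemon is running, the assistant can talk to it over Ollama's local REST API. A minimal non-streaming call with requests might look like this (a sketch; the project's actual request shape may differ):

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"

def generate_payload(prompt, model="qwen2.5:0.5b"):
    """Build the JSON body for a non-streaming /api/generate request."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask_ollama(prompt, model="qwen2.5:0.5b", timeout=60):
    """POST the prompt to the local Ollama daemon and return the reply text."""
    resp = requests.post(OLLAMA_URL, json=generate_payload(prompt, model),
                         timeout=timeout)
    resp.raise_for_status()
    return resp.json()["response"]
```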

⚠️ Reminder

Ollama is not a Python package β€” it is a background service. Do not add it to requirements.txt. Just make sure it’s installed and running before launching the assistant.

🎀 Installing Piper Voice Models

To enable speech synthesis, you'll need to download a voice model (.onnx) and its matching config (.json) file.

βœ… Steps:

  1. Visit the official Piper voices list: πŸ“„ https://github.com/rhasspy/piper/blob/master/VOICES.md

  2. Choose a voice you like (e.g., en_US-lessac-medium or en_US-amy-low).

  3. Download both files for your chosen voice:

    • the voice model, e.g. en_US-amy-low.onnx
    • its matching config, e.g. en_US-amy-low.onnx.json
  4. If you wish, you can rename both files to a shorter base name, keeping the .onnx / .onnx.json pairing. For example:

    amy-low.onnx
    amy-low.onnx.json
    
  5. Place both files in a directory called voices/ next to your script. Example directory structure:

    voice_assistant.py
    voices/
    β”œβ”€β”€ amy-low.onnx
    └── amy-low.onnx.json
    
  6. Update your va_config.json:

    "voice": "amy-low.onnx"
    

⚠️ Make sure both the .onnx model and its .onnx.json config are present in the voices/ folder with matching base names.

πŸ§ͺ Performance Report

The script prints debug timing for the STT, LLM, and TTS stages of the pipeline. I asked ChatGPT-4 to analyze some of the results I obtained.

System: Ubuntu laptop, Intel Core i5
Model: qwen2.5:0.5b (local via Ollama)
TTS: Piper with en_US-kathleen-low.onnx
Audio: Plantronics USB headset


πŸ“Š Timing Metrics (avg)

Stage            Metric (avg)    Notes
STT Parse        ~4.5 ms         Vosk transcribes near-instantly
LLM Inference    ~2,200 ms       Ranges from ~1 s (short queries) to ~5 s
TTS Generation   ~1,040 ms       Piper ONNX performs well on CPU
Audio Playback   ~7,250 ms       Reflects actual audio length, not delay

βœ… Observations

  • STT speed is excellent β€” under 10 ms consistently.
  • LLM inference is snappy for a 0.5b model running locally. The best response came in under 1.1 s.
  • TTS is consistent and fast β€” Kathleen-low voice is fully synthesized in ~800–1600 ms.
  • Playback timing matches response length β€” no lag, just actual audio time.
  • End-to-end round trip time from speaking to hearing a reply is about 8–10 seconds, including speech and playback time.
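
The per-stage numbers above come from simple wall-clock instrumentation. A pattern like the following reproduces that kind of report (a sketch, not the repo's exact code):

```python
import time

class StageTimer:
    """Record wall-clock durations (in ms) for pipeline stages like STT/LLM/TTS."""

    def __init__(self):
        self.timings_ms = {}

    def measure(self, stage, func, *args, **kwargs):
        """Run func, store its elapsed time under `stage`, and return its result."""
        start = time.perf_counter()
        result = func(*args, **kwargs)
        self.timings_ms[stage] = (time.perf_counter() - start) * 1000.0
        return result

    def report(self):
        """Format the collected timings, one stage per line."""
        return "\n".join(f"{stage}: {ms:.1f} ms"
                         for stage, ms in self.timings_ms.items())
```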

πŸ’‘ Use Cases

  • Offline smart assistants
  • Wearable or embedded AI demos
  • Voice-controlled kiosks
  • Character-based roleplay agents

πŸ“„ License

MIT Β© 2024 M15.ai
