Deploying an Offline Smart Voice Assistant End-to-End

Overview

This project demonstrates a fully local smart voice assistant designed for smart offices and smart spaces. It uses the ReSpeaker XVF3800 microphone array for high-quality voice capture, combined with on-device speech-to-text (STT) for accurate transcription. A local large language model (LLM) processes user queries intelligently without relying on the cloud, ensuring privacy and low latency. Text-to-speech (TTS) generates natural voice responses, enabling real-time interaction. The system is ideal for environments such as offices, malls, kiosks, and meeting rooms where secure, offline voice control is essential.

Hardware Required

  • ReSpeaker XVF3800
  • Jetson AGX Orin 32GB H01 Kit

Prepare the Devices

ReSpeaker XVF3800 – USB Firmware Installation

Ensure the ReSpeaker XVF3800 USB Mic Array is updated with the latest firmware before use.

  • Follow the official ReSpeaker XVF3800 firmware update guide.

This step ensures stable USB audio input and compatibility with downstream speech processing pipelines.


NVIDIA Jetson AGX Orin – Initial Setup

If your Jetson AGX Orin is not yet set up, flash it with the appropriate JetPack version.

After flashing and booting into Ubuntu, update the system and install JetPack components:

sudo apt update
sudo apt install nvidia-jetpack

CUDA Environment Configuration

Check Installed CUDA Version

Verify which CUDA directories are available:

ls /usr/local

You should see a folder such as cuda, cuda-12.x, or similar.


Add CUDA Paths Permanently

Edit your shell configuration file:

nano ~/.bashrc

Add the following lines at the bottom (replace with your actual CUDA version):

# CUDA paths
export PATH=/usr/local/cuda-(your_version)/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-(your_version)/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

Apply the changes:

source ~/.bashrc

Verify CUDA Installation

Confirm CUDA is correctly installed and accessible:

nvcc --version

If the CUDA version is displayed, GPU support is ready.


Install Whisper with GPU Support

Clone Whisper Repository

Whisper is built from source to enable CUDA acceleration. Clone the whisper.cpp repository:

git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp

Install the required dependencies:

sudo apt-get install libsdl2-dev

Build Whisper with CUDA Enabled

From the whisper.cpp directory, run:

cmake -B build -DGGML_CUDA=1 -DWHISPER_SDL2=ON
cmake --build build -j --config Release

This compiles Whisper with GPU acceleration and SDL support.


Download Whisper Model

Download the Whisper model from Hugging Face:

  • Model: ggml-base-q8_0.bin

Place the downloaded model inside the models/ directory:

whisper.cpp/models/
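
If you prefer to script the download, the sketch below fetches the model with Python. The URL assumes the file is hosted in the ggerganov/whisper.cpp repository on Hugging Face; adjust it if you use a different source.

import urllib.request

# Assumed location of the quantized base model on Hugging Face
MODEL_URL = "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base-q8_0.bin"

# Save into whisper.cpp's models/ directory
urllib.request.urlretrieve(MODEL_URL, "models/ggml-base-q8_0.bin")
print("saved models/ggml-base-q8_0.bin")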

Run Whisper Server

Start the Whisper server with GPU support:

cd whisper.cpp
./build/bin/whisper-server \
-m models/ggml-base-q8_0.bin \
--host 0.0.0.0 \
--port 8080 \
--gpu

This launches a real-time speech-to-text server accessible over the network.
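
To verify the server is up, you can post a short recording to its /inference endpoint (the endpoint exposed by whisper.cpp's server example). A minimal Python check, assuming a 16 kHz mono file named test.wav:

import requests

# Send a WAV file to the local whisper.cpp server for transcription.
# "test.wav" is a placeholder; use any short 16 kHz mono recording.
with open("test.wav", "rb") as f:
    resp = requests.post(
        "http://127.0.0.1:8080/inference",
        files={"file": f},
        data={"response_format": "json"},
    )
print(resp.json())  # the transcription is returned as JSON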


Install Ollama for Local LLM Inference

Ollama officially supports NVIDIA Jetson devices and provides CUDA-accelerated local LLM execution.

Install Ollama using the official installer:

curl -fsSL https://ollama.com/install.sh | sh

Run the Gemma 3 model:

ollama run gemma3:4b
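
After the model is pulled, Ollama also serves a local REST API on port 11434, which is how the assistant can call it programmatically. A quick sanity check in Python:

import requests

# Request a single, non-streamed completion from the local Ollama server.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "gemma3:4b",
        "prompt": "Reply with one short sentence.",
        "stream": False,
    },
)
print(resp.json()["response"])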

Smart Voice AI Assistant – Quick Start Guide

Architecture Summary

  1. Wake Word Detection – Listens continuously for a predefined activation phrase.
  2. Speech-to-Text (STT) – Converts user speech into text using a local speech recognition engine.
  3. RAG-powered LLM – Retrieves relevant context from a vector database and generates intelligent responses using a local LLM.
  4. Text-to-Speech (TTS) – Converts the generated response into natural-sounding speech.

All processing is performed locally to ensure low latency, data privacy, and offline capability.
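
The sketch below shows how these stages can be wired together using the services configured in this guide. It is an illustration, not the project's actual app.py: wake-word detection, microphone capture, and the retrieval step are omitted, and the file names are placeholders.

import subprocess
import requests

WHISPER_URL = "http://127.0.0.1:8080/inference"      # whisper.cpp server (STT)
OLLAMA_URL = "http://localhost:11434/api/generate"   # Ollama server (LLM)

def transcribe(wav_path):
    """Send captured audio to the local Whisper server and return the text."""
    with open(wav_path, "rb") as f:
        r = requests.post(WHISPER_URL, files={"file": f}, data={"response_format": "json"})
    return r.json().get("text", "").strip()

def generate(prompt):
    """Generate a reply with the local LLM via Ollama."""
    r = requests.post(OLLAMA_URL, json={"model": "gemma3:4b", "prompt": prompt, "stream": False})
    return r.json()["response"]

def speak(text):
    """Synthesize the reply with the Piper CLI and play it through ALSA."""
    subprocess.run(["piper", "--model", "models/en_US-amy-low.onnx", "--output_file", "reply.wav"],
                   input=text.encode(), check=True)
    subprocess.run(["aplay", "reply.wav"], check=True)

if __name__ == "__main__":
    # One turn of the pipeline: STT -> LLM -> TTS ("query.wav" is a placeholder recording).
    speak(generate(transcribe("query.wav")))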

Clone the project repository:

git clone https://github.com/KasunThushara/LocalVoiceAssistant.git

Quick Start

Install Dependencies

Ensure Python and required system dependencies are installed, then run:

pip install -r requirements.txt

Download a Text-to-Speech (TTS) Model

This project uses Piper TTS models. Below is an example using a female English voice (Amy):

# Example: female voice (amy)
wget -O models/en_US-amy-low.onnx \
https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/amy/low/en_US-amy-low.onnx

wget -O models/en_US-amy-low.onnx.json \
https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/amy/low/en_US-amy-low.onnx.json

You may replace this with any compatible Piper voice model as needed.
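
Before wiring the model into the assistant, you can confirm it synthesizes correctly. A minimal check that drives the piper command-line tool from Python (assuming piper is on your PATH and aplay is available, as it is on stock Ubuntu):

import subprocess

# Pipe a test sentence into piper and write the result to a WAV file.
subprocess.run(
    ["piper", "--model", "models/en_US-amy-low.onnx", "--output_file", "test_tts.wav"],
    input=b"The voice assistant is ready.",
    check=True,
)
subprocess.run(["aplay", "test_tts.wav"], check=True)  # play the synthesized audio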


Download Embedding Model (One-Time Setup)

The embedding model is required for building the vector database used by the RAG pipeline.

python download_sentence_tf.py

This step only needs to be run once.
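
Conceptually, the script caches a sentence-transformers embedding model locally so later runs work fully offline. A rough equivalent is sketched below; the model name is an assumption, so check download_sentence_tf.py for the one the project actually uses.

from sentence_transformers import SentenceTransformer

# Downloads the model on first use and caches it for offline reuse.
# "all-MiniLM-L6-v2" is an illustrative choice, not necessarily the project's.
model = SentenceTransformer("all-MiniLM-L6-v2")
print(model.encode("hello world").shape)  # (384,) for this particular model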


Build the Vector Database

Create or rebuild the vector database used for contextual retrieval:

python test_scripts/rebuild_vector.py

This process indexes your documents and prepares them for fast semantic search.
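
The store used by rebuild_vector.py is project-specific, but the underlying idea is the same everywhere: embed each document chunk and index the vectors for nearest-neighbour lookup. A minimal illustration with FAISS (an assumed choice; the repository may use a different vector store):

import faiss
from sentence_transformers import SentenceTransformer

docs = [
    "The meeting room is on the second floor.",
    "Office hours are 9 am to 6 pm on weekdays.",
]  # example document chunks

model = SentenceTransformer("all-MiniLM-L6-v2")   # assumed embedding model
vectors = model.encode(docs).astype("float32")

index = faiss.IndexFlatL2(vectors.shape[1])       # exact L2 nearest-neighbour index
index.add(vectors)

# Retrieval step of RAG: find the chunk closest to a spoken query.
query = model.encode(["Where is the meeting room?"]).astype("float32")
_, ids = index.search(query, 1)
print(docs[ids[0][0]])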


Run the Application

Start the Smart Voice AI Assistant:

python app.py

Once running, the system will listen for the wake word and respond to voice queries in real time.

