ROS2 Voice Pipeline on reSpeaker XVF3800

Introduction

This project lets you control TurtleSim by voice using a reSpeaker XVF3800 microphone array and ROS2. Say “Hey Jarvis” followed by a command: the assistant (Jarvis) records the audio, transcribes it with Groq Whisper, interprets the intent with Groq LLaMA, and replies aloud with Groq Orpheus TTS. Supported commands cover forward and backward motion, turns by a given angle, facing the speaker using direction of arrival (DoA), and an immediate stop. Wake-word detection, audio capture, ROS2 topics, and simulator command publishing run as a single pipeline. The setup targets Ubuntu with ROS2 Humble.

Part 1 — Install ROS2 Humble

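Skip this part if ROS2 Humble is already installed on your machine. Check by running: printenv ROS_DISTRO (it prints humble when ROS2 Humble is installed and sourced; the ros2 CLI in Humble has no --version flag).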

1.1 Set up the ROS2 apt repository

# Make sure your system is up to date
sudo apt update && sudo apt upgrade -y

# Install required tools
sudo apt install -y software-properties-common curl

# Add the ROS2 GPG key
sudo curl -sSL https://raw.githubusercontent.com/ros/rosdistro/master/ros.key \
    -o /usr/share/keyrings/ros-archive-keyring.gpg

# Add the ROS2 repository to your sources
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/ros-archive-keyring.gpg] \
    http://packages.ros.org/ros2/ubuntu $(. /etc/os-release && echo $UBUNTU_CODENAME) main" \
    | sudo tee /etc/apt/sources.list.d/ros2.list > /dev/null

1.2 Install ROS2 Humble Desktop

sudo apt update
sudo apt install -y ros-humble-desktop

The desktop variant includes TurtleSim, RViz, and all tools you need. This download is around 1 GB — it may take a few minutes.

1.3 Install build tools

sudo apt install -y python3-colcon-common-extensions python3-rosdep

1.4 Source ROS2 automatically on every terminal

echo "source /opt/ros/humble/setup.bash" >> ~/.bashrc
source ~/.bashrc

1.5 Verify the installation

printenv ROS_DISTRO
# Expected output: humble

Then test TurtleSim to confirm everything works:

# Terminal 1
ros2 run turtlesim turtlesim_node

# Terminal 2
ros2 run turtlesim turtle_teleop_key

You should see a window with a turtle you can drive with arrow keys. Press Ctrl+C in both terminals when done.

Part 2 — Create a udev Rule for the Device

Create a udev rule so that non-root processes can open the reSpeaker USB Mic Array:

sudo nano /etc/udev/rules.d/50-respeaker.rules

Add the following lines to the file:

# ReSpeaker USB Mic Array
SUBSYSTEM=="usb", ATTR{idVendor}=="2886", ATTR{idProduct}=="0018", MODE="0666", GROUP="plugdev"
SUBSYSTEM=="usb", ATTR{idVendor}=="2886", ATTR{idProduct}=="001a", MODE="0666", GROUP="plugdev"

Reload udev Rules and Restart the Service

Reload the udev rules and restart the service for changes to take effect:

sudo udevadm control --reload-rules
sudo udevadm trigger
sudo service udev restart

Unplug and replug your reSpeaker USB Mic Array to apply the new rules.
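After replugging, it can help to confirm that the kernel actually sees the device. The sketch below is one possible check, not part of the project: it scans the Linux sysfs tree for the vendor and product IDs used in the rule above. The function name `find_usb_devices` and the default sysfs path are assumptions made for illustration.

```python
from pathlib import Path

# IDs taken from the udev rule above.
VENDOR_ID = "2886"
PRODUCT_IDS = {"0018", "001a"}


def find_usb_devices(vendor_id, product_ids, sysfs_root="/sys/bus/usb/devices"):
    """Return sysfs entry names of USB devices matching the given IDs."""
    matches = []
    root = Path(sysfs_root)
    if not root.is_dir():
        return matches
    for dev in sorted(root.iterdir()):
        vendor_file = dev / "idVendor"
        product_file = dev / "idProduct"
        # Interface entries (for example "1-2:1.0") carry no ID files; skip them.
        if not (vendor_file.is_file() and product_file.is_file()):
            continue
        vendor = vendor_file.read_text().strip().lower()
        product = product_file.read_text().strip().lower()
        if vendor == vendor_id and product in product_ids:
            matches.append(dev.name)
    return matches


if __name__ == "__main__":
    found = find_usb_devices(VENDOR_ID, PRODUCT_IDS)
    print("reSpeaker found at:", found if found else "not detected")
```

An empty result usually means the device is not enumerated at all (cable, port, or firmware), which is a different problem from the permission issue the udev rule addresses.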

Part 3 — Get a Groq API Key

This project uses Groq's free cloud API for:

- Whisper: speech-to-text
- LLaMA 3: understanding your commands
- Orpheus: Jarvis's voice reply

To get a key:

1. Go to console.groq.com and create a free account.
2. Click API Keys in the left sidebar.
3. Click Create API Key, give it a name (e.g. "jarvis"), and copy it.
4. Save it somewhere safe. You will paste it into config.env in Part 6.

The Groq free tier is generous enough for development and testing. No credit card is required to get started.

Part 4 — Install Python Dependencies (System-Wide)

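Important: Do NOT use a virtual environment for this project. ROS2 nodes run under the system Python and will not see packages installed inside a venv. Install everything system-wide with pip. If pip stops with an "externally-managed-environment" error (newer Ubuntu releases), append --break-system-packages to the command below.

# Quote "numpy<2" so the shell does not treat < as input redirection.
# If the pyaudio build fails, install PortAudio first (see the end of this part) and rerun.
pip install \
    groq \
    openwakeword \
    pyaudio \
    "numpy<2" \
    python-dotenv \
    pyusb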

Download the "Hey Jarvis" wake word model

python3 -c "import openwakeword; openwakeword.utils.download_models()"

This downloads pre-trained models to ~/.openwakeword/. It takes about 30 seconds.

Install PortAudio (required by PyAudio)

sudo apt install -y portaudio19-dev python3-pyaudio
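Because ROS2 nodes run under the system interpreter, a quick way to catch a missed install is to ask that interpreter whether it can locate each module. The following is a minimal sketch using only the standard library; the helper name `missing_modules` is hypothetical, and the module list is simply the import names of the packages installed above.

```python
import importlib.util

# Import names differ from the pip names for two packages:
# python-dotenv is imported as "dotenv" and pyusb as "usb".
REQUIRED_MODULES = ["groq", "openwakeword", "pyaudio", "numpy", "dotenv", "usb"]


def missing_modules(names):
    """Return the module names that the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing from this Python:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

Run it with the same `python3` that ROS2 uses (the system one), not from inside a virtual environment, otherwise the result says nothing about what the nodes will see.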

Part 5 — Create Your ROS2 Workspace

Skip to Part 6 if you already have a ~/ros2_ws workspace.

# Create the workspace directory
mkdir -p ~/ros2_ws/src
cd ~/ros2_ws

# Build the empty workspace to set it up
colcon build

# Source it and add to .bashrc so it loads automatically
echo "source ~/ros2_ws/install/setup.bash" >> ~/.bashrc
source ~/.bashrc

Part 6 — Clone and Configure the Project

6.1 Clone the repository

cd ~/ros2_ws/src
git clone https://github.com/KasunThushara/ros_voice_controller.git my_robot_controller

Your workspace should now look like this:

~/ros2_ws/
└── src/
    └── my_robot_controller/
        ├── my_robot_controller/
        │   ├── voice_node.py
        │   ├── rotate_doa.py
        │   ├── wakeword.py
        │   └── ...
        ├── launch/
        │   └── jarvis.launch.py
        └── config.env.example

6.2 Create your config file

cd ~/ros2_ws/src/my_robot_controller
cp config.env.example config.env
nano config.env

Open the file and fill in your values:

# ── Groq API (required) ────────────────────────────────
# Paste your key after the equals sign
GROQ_API_KEY=gsk_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

# ── Microphone ─────────────────────────────────────────
# Find the correct number in Step 6.3 below
MIC_INDEX=1
WAKEWORD_THRESHOLD=0.5
WAKEWORD_COOLDOWN=2

# ── Recording ──────────────────────────────────────────
RECORDING_SECONDS=4
SAMPLE_RATE=16000

# ── Models ─────────────────────────────────────────────
WAKEWORD_MODEL=hey jarvis
LLM_MODEL=llama-3.1-8b-instant
STT_MODEL=whisper-large-v3-turbo
TTS_MODEL=canopylabs/orpheus-v1-english
TTS_VOICE=autumn

Save with Ctrl+O, Enter, Ctrl+X.
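To illustrate how these values might be consumed, here is a small standard-library sketch that parses KEY=VALUE text and converts the fields to typed settings. It is not the project's config.py (which may well rely on python-dotenv instead); the function names and the dictionary keys are assumptions, and the parser only handles plain KEY=VALUE lines with no trailing annotations or inline comments.

```python
def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blanks and full-line '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def load_settings(text):
    """Convert raw strings to typed values, using the documented defaults."""
    env = parse_env_text(text)
    if not env.get("GROQ_API_KEY"):
        raise ValueError("GROQ_API_KEY is required")
    return {
        "groq_api_key": env["GROQ_API_KEY"],
        "mic_index": int(env.get("MIC_INDEX", "1")),
        "wakeword_threshold": float(env.get("WAKEWORD_THRESHOLD", "0.5")),
        "wakeword_cooldown": float(env.get("WAKEWORD_COOLDOWN", "2")),
        "recording_seconds": float(env.get("RECORDING_SECONDS", "4")),
        "sample_rate": int(env.get("SAMPLE_RATE", "16000")),
        "wakeword_model": env.get("WAKEWORD_MODEL", "hey jarvis"),
    }


if __name__ == "__main__":
    demo = "GROQ_API_KEY=gsk_example\nMIC_INDEX=2\n"
    print(load_settings(demo))
```

Failing early on a missing key or a non-numeric MIC_INDEX gives a clearer error than a crash later inside the audio or API code.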

6.3 Find your correct MIC_INDEX

Run this helper to list all audio input devices:

python3 -c "
import pyaudio
p = pyaudio.PyAudio()
print('\nAvailable INPUT devices:\n')
for i in range(p.get_device_count()):
    d = p.get_device_info_by_index(i)
    if d['maxInputChannels'] > 0:
        print(f'  [{i}]  {d[\"name\"]}')
        print(f'        channels={int(d[\"maxInputChannels\"])}  rate={int(d[\"defaultSampleRate\"])}Hz')
p.terminate()
"

Example output:

Available INPUT devices:

  [0]  HDA Intel PCH: ALC897 Analog
        channels=2  rate=44100Hz

  [1]  reSpeaker XVF3800
        channels=6  rate=16000Hz    ← this is the one you want

  [2]  USB PnP Sound Device
        channels=2  rate=16000Hz

Find the line that says reSpeaker or XVF3800. Note the number in brackets — that is your MIC_INDEX. Update config.env with that number.
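The manual lookup can also be automated by matching on the device name. The sketch below is an illustration only: `find_mic_index` is a hypothetical helper that takes (index, info) pairs shaped like PyAudio's `get_device_info_by_index()` results, and the sample data simply mirrors the example output above.

```python
def find_mic_index(devices, keywords=("respeaker", "xvf3800")):
    """Return the index of the first input device whose name matches a keyword.

    `devices` is a list of (index, info) pairs, where info is a dict shaped
    like the result of PyAudio's get_device_info_by_index().
    """
    for index, info in devices:
        if info.get("maxInputChannels", 0) <= 0:
            continue  # output-only device
        name = str(info.get("name", "")).lower()
        if any(keyword in name for keyword in keywords):
            return index
    return None


# Sample data mirroring the example output above.
SAMPLE_DEVICES = [
    (0, {"name": "HDA Intel PCH: ALC897 Analog", "maxInputChannels": 2}),
    (1, {"name": "reSpeaker XVF3800", "maxInputChannels": 6}),
    (2, {"name": "USB PnP Sound Device", "maxInputChannels": 2}),
]

if __name__ == "__main__":
    print("MIC_INDEX =", find_mic_index(SAMPLE_DEVICES))
```

With PyAudio installed, the real list can be built as `[(i, p.get_device_info_by_index(i)) for i in range(p.get_device_count())]` on a `pyaudio.PyAudio()` instance `p`. The exact device name string depends on the system, so the keyword match is a heuristic.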

Part 7 — Build the Package

cd ~/ros2_ws
colcon build --packages-select my_robot_controller
source ~/.bashrc

Expected output:

Starting >>> my_robot_controller
Finished <<< my_robot_controller [3.2s]

Summary: 1 package finished [3.5s]

You must run colcon build and source ~/.bashrc every time you change any Python file.

Part 8 — Run the Project

ros2 launch my_robot_controller jarvis.launch.py

You should see three processes start:

[turtlesim_node-1]   [INFO] Spawning turtle [turtle1] at x=[5.54], y=[5.54]
[angle_controller-2] [INFO] AngleController ready — listening on /target_angle
[voice_command-3]    [INFO] reSpeaker XVF3800 found — DoA ready
[voice_command-3]    [WakeWord] Listening on device 1 (6ch → mono) for 'hey jarvis' ...
[voice_command-3]    [INFO] Jarvis is listening ...

A window will open showing the turtle. Now speak:

"Hey Jarvis, move forward"

Jarvis will reply "Moving forward!" and the turtle will move.
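The startup log mentions "6ch → mono": the reSpeaker delivers six interleaved channels, while wake-word and speech models expect a single one. A minimal sketch of one way to do that reduction is shown below, assuming 16-bit PCM and picking a single channel. Which channel the project actually uses (or whether it averages several) is not stated in this guide, so the default `channel=0` and the function name are assumptions.

```python
from array import array


def extract_channel(interleaved, num_channels, channel=0):
    """Pick one channel out of interleaved 16-bit PCM bytes (native byte order)."""
    if not 0 <= channel < num_channels:
        raise ValueError("channel out of range")
    samples = array("h")
    samples.frombytes(interleaved)
    if len(samples) % num_channels:
        raise ValueError("buffer does not contain whole frames")
    # Every num_channels-th sample, starting at `channel`, belongs to that channel.
    return samples[channel::num_channels].tobytes()


if __name__ == "__main__":
    # Two frames of six channels each; sample values encode frame and channel.
    frames = array("h", [0, 1, 2, 3, 4, 5, 10, 11, 12, 13, 14, 15]).tobytes()
    mono = array("h")
    mono.frombytes(extract_channel(frames, num_channels=6, channel=0))
    print(list(mono))
```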

Voice Command Reference

| Say this | What happens |
| --- | --- |
| `"Hey Jarvis, move forward"` | Turtle moves forward 1 step |
| `"Hey Jarvis, move backward"` | Turtle moves backward 1 step |
| `"Hey Jarvis, turn left"` | Turtle rotates left 90° |
| `"Hey Jarvis, turn left 45"` | Turtle rotates left 45° |
| `"Hey Jarvis, turn right"` | Turtle rotates right 90° |
| `"Hey Jarvis, turn right 30 degrees"` | Turtle rotates right 30° |
| `"Hey Jarvis, turn to my direction"` | Turtle faces toward your voice (DoA) |
| `"Hey Jarvis, face me"` | Same as above |
| `"Hey Jarvis, turn to 90"` | Turtle rotates to absolute 90° |
| `"Hey Jarvis, face 180 degrees"` | Turtle rotates to absolute 180° |
| `"Hey Jarvis, spin around"` | Turtle does a full 360° spin |
| `"Hey Jarvis, do a 360"` | Same as above |
| `"Hey Jarvis, stop"` | Turtle stops immediately |
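In the project, intent parsing is done by Groq LLaMA. To make the command grammar above concrete, the sketch below uses a plain rule-based parser as a stand-in, not the project's method. The action names and dictionary layout are hypothetical; only the phrases and default angles come from the table.

```python
import re

NUMBER = r"(\d+(?:\.\d+)?)"


def parse_command(text):
    """Map a transcribed phrase to a small intent dict (rule-based stand-in)."""
    t = text.lower()
    if "stop" in t:
        return {"action": "stop"}
    if "face me" in t or "my direction" in t:
        return {"action": "face_doa"}
    if "spin" in t or "do a 360" in t:
        return {"action": "spin", "degrees": 360.0}
    # Relative turn: the angle is optional and defaults to 90 degrees.
    m = re.search(r"turn (left|right)(?: " + NUMBER + r")?", t)
    if m:
        degrees = float(m.group(2)) if m.group(2) else 90.0
        return {"action": "turn_relative", "direction": m.group(1), "degrees": degrees}
    # Absolute heading: "turn to 90" or "face 180 degrees".
    m = re.search(r"(?:turn to|face) " + NUMBER, t)
    if m:
        return {"action": "turn_absolute", "degrees": float(m.group(1))}
    if "forward" in t:
        return {"action": "move", "direction": "forward"}
    if "backward" in t:
        return {"action": "move", "direction": "backward"}
    return {"action": "unknown"}


if __name__ == "__main__":
    for phrase in ["Hey Jarvis, turn left 45", "Hey Jarvis, face me", "Hey Jarvis, stop"]:
        print(phrase, "->", parse_command(phrase))
```

Checking "stop" first reflects the table's promise that stop takes effect immediately, regardless of what else the phrase contains. A language model handles paraphrases that fixed rules like these will miss.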

Project File Structure

my_robot_controller/
│
├── my_robot_controller/          # Python package (ROS2 nodes)
│   ├── __init__.py
│   ├── voice_node.py             # Main voice pipeline node
│   ├── rotate_doa.py             # PID angle controller node
│   ├── wakeword.py               # Wake word detection (openwakeword)
│   ├── audio_recorder.py         # Mic recording after wake word
│   ├── stt.py                    # Speech-to-text (Groq Whisper)
│   ├── llm.py                    # Intent parsing (Groq LLaMA)
│   ├── tts.py                    # Text-to-speech (Groq Orpheus)
│   └── config.py                 # Loads settings from config.env
│
├── launch/
│   └── jarvis.launch.py          # Starts all 3 nodes together
│
├── config.env                    # Your secrets (not in git)
├── config.env.example            # Template — copy to config.env
├── package.xml
└── setup.py

How the nodes connect

reSpeaker XVF3800 (USB)
        │
        ├── [voice_command node]
        │     openwakeword → Groq Whisper → Groq LLaMA → Groq Orpheus
        │     │                                          │
        │     │ publishes /target_angle (Float32)        │ speaks reply
        │     │ publishes /turtle1/cmd_vel (Twist)       │
        │     │                                          ▼
        │     ▼                                     Speaker output
        └── [angle_controller node]
              subscribes /target_angle
              subscribes /turtle1/pose
              PID control → publishes /turtle1/cmd_vel
                                        │
                                        ▼
                              [turtlesim_node]
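The angle_controller box in the diagram compares the requested heading with the turtle's pose and turns the difference into an angular velocity. The sketch below shows the core of such a loop in plain Python, without rclpy, so it can be run anywhere. It is not the project's rotate_doa.py: the gains are illustrative, and it assumes /target_angle carries degrees while the turtlesim pose heading is in radians.

```python
import math


def angle_error(target_deg, current_rad):
    """Shortest signed rotation in radians from the current heading to the target."""
    diff = math.radians(target_deg) - current_rad
    # atan2 of (sin, cos) wraps the difference into [-pi, pi].
    return math.atan2(math.sin(diff), math.cos(diff))


class PID:
    """Textbook PID controller; the gains used below are illustrative only."""

    def __init__(self, kp, ki=0.0, kd=0.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def step(self, error, dt):
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(target_deg, start_rad=0.0, steps=200, dt=0.05):
    """Integrate heading += angular_velocity * dt and return the final heading."""
    pid = PID(kp=2.0)
    theta = start_rad
    for _ in range(steps):
        omega = pid.step(angle_error(target_deg, theta), dt)
        theta += omega * dt
    return theta


if __name__ == "__main__":
    print(f"final heading: {math.degrees(simulate(90.0)):.2f} deg")
```

In a real node, the value returned by `step` would go into the angular z field of the Twist published on /turtle1/cmd_vel. Wrapping the error matters: without it, a turtle at -170° asked to face 180° would rotate 350° instead of 10°.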

Configuration Reference

All settings live in config.env. Edit and rebuild to apply changes.

| Variable | Default | Description |
| --- | --- | --- |
| `GROQ_API_KEY` | (required) | Your Groq API key |
| `MIC_INDEX` | `1` | PyAudio device index of the reSpeaker |
| `WAKEWORD_MODEL` | `hey jarvis` | Wake word phrase |
| `WAKEWORD_THRESHOLD` | `0.5` | Detection sensitivity (0.0–1.0, lower = more sensitive) |
| `WAKEWORD_COOLDOWN` | `2` | Seconds before wake word can re-trigger |
| `RECORDING_SECONDS` | `4` | How long to record after wake word fires |
| `SAMPLE_RATE` | `16000` | Audio sample rate in Hz |
| `LLM_MODEL` | `llama-3.1-8b-instant` | Groq LLM model for intent parsing |
| `STT_MODEL` | `whisper-large-v3-turbo` | Groq Whisper model for transcription |
| `TTS_MODEL` | `canopylabs/orpheus-v1-english` | Groq TTS model |
| `TTS_VOICE` | `autumn` | Voice for speech output (`tara`, `leah`, `leo`, `dan`, `mia`) |
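WAKEWORD_THRESHOLD and WAKEWORD_COOLDOWN interact: a detection only counts when the score reaches the threshold and enough time has passed since the previous trigger. The sketch below models that gating as described in the table. The class name `WakeWordGate` is hypothetical and the project's wakeword.py may implement it differently.

```python
class WakeWordGate:
    """Accept a wake-word score only above threshold and outside the cooldown."""

    def __init__(self, threshold=0.5, cooldown=2.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self._last_trigger = None

    def update(self, score, now):
        """Return True if this score, observed at time `now` (seconds), triggers."""
        if score < self.threshold:
            return False
        if self._last_trigger is not None and now - self._last_trigger < self.cooldown:
            return False  # still cooling down from the previous trigger
        self._last_trigger = now
        return True


if __name__ == "__main__":
    gate = WakeWordGate(threshold=0.5, cooldown=2.0)
    for t, score in [(0.0, 0.3), (1.0, 0.8), (2.0, 0.9), (3.0, 0.9)]:
        print(f"t={t:.1f}s score={score:.1f} trigger={gate.update(score, t)}")
```

Lowering the threshold makes the gate fire on weaker scores (more sensitive, more false triggers); raising the cooldown prevents one utterance from firing twice.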

Resources

Built with:

Seeed Studio reSpeaker XVF3800 — mic array with XMOS XVF3800
ROS2 Humble — robot middleware
openwakeword — local wake word detection
Groq — ultra-fast Whisper STT, LLaMA LLM, Orpheus TTS
TurtleSim — ROS2 turtle simulator
