Add Voice Interaction to Your SO-ARM10x with reSpeaker Flex

Overview


The LeRobot SO-ARM Voice Controller lets you control an SO-ARM100 robotic arm with natural voice commands. It chains four stages: local wake word detection, Groq Whisper speech-to-text, LLaMA 3 language understanding, and Orpheus text-to-speech, so the arm can both act on a command and answer aloud. Built on the LeRobot framework, it runs on Ubuntu x86 systems and NVIDIA Jetson devices and takes voice input from a reSpeaker USB microphone array. You can define custom arm poses, gestures, and conversational triggers for research, education, and robotics development.

Hardware Required

SO-ARM101	reSpeaker Flex XVF3800 Circular	reComputer Super J4012


How It Works

You speak → Wake word detected → Audio recorded → Whisper STT → LLaMA LLM → Orpheus TTS speaks back → SO-ARM100 moves
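The flow above can be sketched as a single function that chains the stages. This is a minimal illustration under stated assumptions, not the code in pipeline.py: `handle_utterance`, the stage callables, and the action name `open_gripper` are hypothetical, and the real stages call Groq and the arm instead of the stand-ins used here.

```python
from typing import Callable, List, Tuple


def handle_utterance(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    understand: Callable[[str], Tuple[str, str]],
    speak: Callable[[str], None],
    move: Callable[[str], None],
) -> Tuple[str, str]:
    """Run one turn after the wake word fires: STT -> LLM -> TTS -> arm."""
    text = transcribe(audio)            # speech-to-text stage
    reply, action = understand(text)    # LLM stage: spoken reply + action name
    speak(reply)                        # text-to-speech stage
    move(action)                        # arm motion stage
    return reply, action


# Stand-in stages so the sketch runs without a microphone, API key, or arm.
log: List[Tuple[str, str]] = []
result = handle_utterance(
    b"fake-audio",
    transcribe=lambda audio: "open the gripper",
    understand=lambda text: ("Opening the gripper.", "open_gripper"),
    speak=lambda reply: log.append(("speak", reply)),
    move=lambda action: log.append(("move", action)),
)
print(result)
print(log)
```

Passing the stages in as callables keeps the ordering visible and lets you exercise the flow with fakes before any hardware is connected.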

Services Required

Service	Purpose	Cost
Groq	Whisper STT, LLaMA LLM, Orpheus TTS	Free tier is enough

Part 1 — Install LeRobot

Install Miniforge

For Jetson (ARM64):

wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-aarch64.sh
chmod +x Miniforge3-Linux-aarch64.sh
./Miniforge3-Linux-aarch64.sh
source ~/.bashrc

For x86 Ubuntu 22.04:

wget "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash Miniforge3-$(uname)-$(uname -m).sh
source ~/.bashrc
conda init --all

Create the Conda Environment

conda create -y -n lerobot python=3.10
conda activate lerobot

Clone and Install LeRobot

git clone https://github.com/KasunThushara/lerobot
conda install ffmpeg -c conda-forge
cd lerobot
pip install -e ".[feetech]"

Part 2 — Set Up the Arms

Configure Motor IDs

Each servo needs a unique ID assigned before assembly. Follow the official guide: Configure the Motors

Assemble the Arms

Follow the assembly tutorial for the SO-ARM100: Assembly Guide

Find the USB Ports

Plug in each arm and run this utility to identify which port belongs to which arm:

lerobot-find-port

Run it once per arm (plug one in at a time). Note down the port paths — typically /dev/ttyACM0 and /dev/ttyACM1.
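If you want to double-check which port appeared when an arm was plugged in, comparing two snapshots of the device list is enough. The helper below is a hypothetical sketch (`newly_attached` is not part of LeRobot) and uses made-up snapshots; on a real machine you could build them with `glob.glob("/dev/ttyACM*")` before and after connecting the arm.

```python
from typing import Iterable, List


def newly_attached(before: Iterable[str], after: Iterable[str]) -> List[str]:
    """Return the ports present in `after` but not in `before`, sorted."""
    return sorted(set(after) - set(before))


# Made-up snapshots taken before and after plugging in the second arm.
before = ["/dev/ttyACM0"]
after = ["/dev/ttyACM0", "/dev/ttyACM1"]
print(newly_attached(before, after))
```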

Calibrate Both Arms

Calibration maps raw motor values to normalized positions. Follow the guide for both the leader and follower arms: Calibration Guide

The calibration file will be saved automatically at:

~/.cache/huggingface/lerobot/calibration/robots/so_follower/<your_arm_id>.json
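A quick way to confirm that calibration finished is to check that this file exists. The sketch below only rebuilds the path shown above; `calibration_path` is a hypothetical helper, and the arm ID used in the example is the default listed in the configuration reference.

```python
from pathlib import Path
from typing import Optional


def calibration_path(arm_id: str, home: Optional[Path] = None) -> Path:
    """Build the follower calibration file path for the given arm ID."""
    base = home if home is not None else Path.home()
    return (
        base / ".cache" / "huggingface" / "lerobot" / "calibration"
        / "robots" / "so_follower" / f"{arm_id}.json"
    )


path = calibration_path("my_awesome_follower_arm")
print(path, "exists" if path.is_file() else "is missing")
```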

Part 3 — Set Up the Voice Controller

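The voice controller lives in the examples/voice_arm directory of the repository cloned in Part 1. The path below assumes you cloned it into your home directory; adjust it if you cloned elsewhere:

cd ~/lerobot/examples/voice_arm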

Install Dependencies

# System dependency required for PyAudio
sudo apt-get install -y portaudio19-dev

pip install -r requirements.txt

Download the Wake Word Model

This script downloads the pre-trained "Hey Jarvis" model from openwakeword into ~/.openwakeword/:

python download_model.py

Find Your Microphone Index

Plug in your ReSpeaker Flex, then run:

python list_mics.py

Example output:

Available audio INPUT devices:

  [0] bcm2835 Headphones      (rate=44100Hz)
  [1] ReSpeaker 4 Mic Array   (rate=16000Hz)
  [2] USB PnP Sound Device    (rate=16000Hz)

Note the index number next to your ReSpeaker — that's your MIC_INDEX.
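If you would rather pick the index programmatically, the listing can be parsed by device name. This sketch assumes the output format matches the example above exactly; `find_mic_index` is a hypothetical helper, not part of the project.

```python
import re
from typing import Optional

# Matches lines such as "  [1] ReSpeaker 4 Mic Array   (rate=16000Hz)".
DEVICE_LINE = re.compile(r"^\s*\[(\d+)\]\s+(.*?)\s+\(rate=\d+Hz\)\s*$")


def find_mic_index(listing: str, name_fragment: str) -> Optional[int]:
    """Return the index of the first device whose name contains the fragment."""
    for line in listing.splitlines():
        match = DEVICE_LINE.match(line)
        if match and name_fragment.lower() in match.group(2).lower():
            return int(match.group(1))
    return None


sample = """Available audio INPUT devices:

  [0] bcm2835 Headphones      (rate=44100Hz)
  [1] ReSpeaker 4 Mic Array   (rate=16000Hz)
  [2] USB PnP Sound Device    (rate=16000Hz)
"""
print(find_mic_index(sample, "respeaker"))
```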

Configure the Project

cp config.env.example config.env
nano config.env

At minimum, update these two values:

# Your Groq API key (required) — get one free at console.groq.com
GROQ_API_KEY=gsk_xxxxxxxxxxxxxxxxxxxxxxxx

# The number from list_mics.py
MIC_INDEX=1
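To sanity-check the file before the first run, you can parse it and confirm that both values are set. The parser below is a minimal sketch of reading KEY=VALUE lines; it is not necessarily how config.py loads settings, and `parse_env` and `missing_required` are hypothetical names.

```python
from typing import Dict, List, Sequence


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def missing_required(
    settings: Dict[str, str],
    required: Sequence[str] = ("GROQ_API_KEY", "MIC_INDEX"),
) -> List[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not settings.get(key)]


sample = """
# Your Groq API key
GROQ_API_KEY=gsk_example
MIC_INDEX=1
"""
settings = parse_env(sample)
print(settings, missing_required(settings))
```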

Part 4 — Define Your Arm Actions

Step 1 — Read Current Joint Positions

Move the arm physically into a pose you want to save, then run:

python read_positions.py

The script prints live normalized joint values as you move the arm. When you're happy with the pose, press Ctrl+C and the final position will be printed for you to copy.

Step 2 — Add the Pose to `robot_arm.py`

Open robot_arm.py and find the ACTION_MAP dictionary. Add your pose:


"my_custom_pose": _pose(**{
    "shoulder_pan.pos":  20.0,
    "shoulder_lift.pos": 40.0,
    "elbow_flex.pos":    60.0,
    "wrist_flex.pos":   -30.0,
    "gripper.pos":       80.0,
}),

For animated gestures (like a wave), use a list of poses. The steps run in order, with a pause of ARM_GESTURE_DELAY seconds between consecutive steps:

"wave_hi": [
    _pose(**{"shoulder_lift.pos": -70.0, "wrist_flex.pos":  60.0, ...}),
    _pose(**{"shoulder_lift.pos": -70.0, "wrist_flex.pos": -60.0, ...}),
    _HOME,  # return to neutral
],

Step 3 — Update the LLM System Prompt in `llm.py`

Add your new action to the valid actions list and trigger rules so the LLM knows about it:


Valid actions:
  my_custom_pose  = describe what it does

Trigger rules:
  - "your trigger phrase"  → my_custom_pose
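Because the LLM can return an action name that does not exist, it is worth validating its answer against the keys of ACTION_MAP before moving the arm. The sketch below assumes the model is prompted to answer with a JSON object containing `reply` and `action` fields and that `none` means "do not move"; that reply format, the `none` convention, and `parse_llm_reply` are assumptions, not something the project documents.

```python
import json
from typing import Iterable, Tuple


def parse_llm_reply(raw: str, valid_actions: Iterable[str]) -> Tuple[str, str]:
    """Return (reply, action); unknown or malformed actions become 'none'."""
    valid = set(valid_actions)
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # Not JSON: treat the whole text as speech and do not move the arm.
        return raw.strip(), "none"
    if not isinstance(data, dict):
        return raw.strip(), "none"
    reply = str(data.get("reply", ""))
    action = str(data.get("action", "none"))
    return reply, (action if action in valid else "none")


actions = ["my_custom_pose"]
print(parse_llm_reply('{"reply": "Sure.", "action": "my_custom_pose"}', actions))
print(parse_llm_reply('{"reply": "Hmm.", "action": "fly_away"}', actions))
```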

Run the Voice Controller

Activate the conda environment, then start the pipeline:

conda activate lerobot
python pipeline.py

You should see:

======================================================
  SO100 Arm Voice Controller — Ready
  Wake word  : hey jarvis
  LLM model  : llama-3.1-8b-instant
  STT model  : whisper-large-v3-turbo
  TTS voice  : autumn
  Arm port   : /dev/ttyACM0  id='my_awesome_follower_arm'
======================================================
[WakeWord] Listening for 'hey jarvis' ...

Now say "Hey Jarvis" and give a command!

Example Voice Commands

You say	What happens
"Hey Jarvis, open the gripper"	Gripper opens fully
"Hey Jarvis, grab it"	Gripper closes
"Hey Jarvis, go to pick up mode"	Arm moves to grasp pose
"Hey Jarvis, can you turn around"	Base rotates to the side
"Hey Jarvis, wave at the camera"	Arm waves, returns to neutral
"Hey Jarvis, go home"	All joints return to neutral

Project File Overview

examples/voice_arm/
├── pipeline.py          # Main entry point — orchestrates the full flow
├── robot_arm.py         # SO100 arm controller — add your poses here
├── llm.py               # LLM prompt — add your voice triggers here
├── wakeword.py          # Listens for "Hey Jarvis" in a background thread
├── audio_recorder.py    # Records audio after wake word fires
├── stt.py               # Sends audio to Groq Whisper → returns text
├── tts.py               # Sends reply to Groq Orpheus → plays audio
├── config.py            # Loads all settings from config.env
├── config.env.example   # Template — copy to config.env and fill in
├── read_positions.py    # Helper: read live joint positions for tuning poses
├── list_mics.py         # Helper: find your MIC_INDEX
└── download_model.py    # Downloads the openwakeword model files

Configuration Reference

Variable	Default	Description
`GROQ_API_KEY`	(required)	Your Groq API key
`WAKEWORD_MODEL`	`hey jarvis`	Wake word phrase
`MIC_INDEX`	`1`	PyAudio device index
`WAKEWORD_THRESHOLD`	`0.5`	Detection sensitivity (0.0–1.0)
`WAKEWORD_COOLDOWN`	`2`	Seconds between re-triggers
`RECORDING_SECONDS`	`3`	How long to record after wake word
`LLM_MODEL`	`llama-3.1-8b-instant`	Groq LLM model
`STT_MODEL`	`whisper-large-v3-turbo`	Groq Whisper model
`TTS_VOICE`	`autumn`	Voice for speech output
`ARM_PORT`	`/dev/ttyACM0`	Follower arm USB port
`ARM_ID`	`my_awesome_follower_arm`	Arm ID (matches calibration filename)
`ARM_MOVE_DELAY`	`1.5`	Seconds to wait after a pose move
`ARM_GESTURE_DELAY`	`0.4`	Seconds between gesture sequence steps
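Values read from config.env arrive as strings, so numeric settings need converting and range-checking before use. The following sketch shows one way to do that for a few of the variables above, using the defaults from the table; `VoiceConfig` and `build_config` are hypothetical names and do not describe how config.py is written.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class VoiceConfig:
    mic_index: int = 1
    wakeword_threshold: float = 0.5
    recording_seconds: float = 3.0
    arm_gesture_delay: float = 0.4


def build_config(env: Mapping[str, str]) -> VoiceConfig:
    """Convert string settings to typed values, falling back to defaults."""
    d = VoiceConfig()
    cfg = VoiceConfig(
        mic_index=int(env.get("MIC_INDEX", d.mic_index)),
        wakeword_threshold=float(env.get("WAKEWORD_THRESHOLD", d.wakeword_threshold)),
        recording_seconds=float(env.get("RECORDING_SECONDS", d.recording_seconds)),
        arm_gesture_delay=float(env.get("ARM_GESTURE_DELAY", d.arm_gesture_delay)),
    )
    if not 0.0 <= cfg.wakeword_threshold <= 1.0:
        raise ValueError("WAKEWORD_THRESHOLD must be between 0.0 and 1.0")
    if cfg.recording_seconds <= 0:
        raise ValueError("RECORDING_SECONDS must be positive")
    return cfg


cfg = build_config({"WAKEWORD_THRESHOLD": "0.3", "MIC_INDEX": "2"})
print(cfg)
```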

Troubleshooting

PyAudio fails to install

Install the PortAudio system library first:

sudo apt-get install -y portaudio19-dev

Wake word never triggers

Run list_mics.py again and confirm MIC_INDEX matches your ReSpeaker. Try lowering WAKEWORD_THRESHOLD to 0.3. Speak clearly within ~1 metre of the mic.

Arm not moving after a command

Check that ARM_PORT is correct (lerobot-find-port). Verify the calibration file exists at ~/.cache/huggingface/lerobot/calibration/robots/so_follower/<ARM_ID>.json.

Arm moves to wrong position

The default pose values in ACTION_MAP are starting estimates. Run read_positions.py, physically move the arm to the desired pose, and copy the printed values into robot_arm.py.

TTS / STT errors

Double-check GROQ_API_KEY in config.env. The Groq free tier has rate limits, so wait a few seconds between commands if you hit errors.

Audio plays but sounds distorted

On Raspberry Pi, set audio output to the correct device via raspi-config → System Options → Audio.
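For the rate-limit errors mentioned under TTS / STT errors, waiting and retrying can be automated. The wrapper below is a generic sketch of retry with exponential backoff; `call_with_retry` is a hypothetical helper, and it catches every exception for brevity, whereas real code should catch only the rate-limit error raised by your API client.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on failure with delays of base_delay * 2**attempt."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))
    raise RuntimeError("unreachable")


# Stand-in request that fails twice, then succeeds.
calls = {"count": 0}


def flaky_request() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("rate limited")
    return "ok"


delays: List[float] = []
result = call_with_retry(flaky_request, sleep=delays.append)  # record, don't wait
print(result, delays)
```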

Credits

Built with:

LeRobot — open-source robotics framework by Hugging Face
SO-ARM100 — low-cost open-source robotic arm by Seeed Studio
openwakeword — local wake word detection
Groq — ultra-fast Whisper STT, LLaMA LLM, and Orpheus TTS
ReSpeaker Flex — USB microphone array
