
Porting the MediaPipe Hand Gesture Recognition Model to reCamera

Introduction

This project ports the complete official Google MediaPipe hand gesture recognition pipeline to reCamera. It runs real-time gesture recognition on the device and streams the video and recognition results to a PC over UDP for visualization.

The system recognizes 8 gesture categories (None / Closed_Fist / Open_Palm / Pointing_Up / Thumb_Down / Thumb_Up / Victory / ILoveYou) and also outputs 21 hand landmarks and handedness (left or right hand). Typical application scenarios include:

  • Smart home gesture control: Control lights, curtains, and appliance switches through predefined gestures, without the need for voice or a phone app.
  • Industrial touch-free interaction: Workers wearing gloves or with both hands occupied can send commands to equipment via simple gestures.
  • Education and exhibition interaction: In science museums or exhibition halls, visitors can trigger multimedia content through gestures for an immersive experience.
  • Accessibility assistance: Offers gesture-based device control for users with hearing impairments or limited mobility.

Hardware Preparation

To run this demo, the following hardware is required:

  • One reCamera device (all reCamera variants are supported)
  • One PC (used to run the Python receiver for visualization; it must be on the same local network as the reCamera)

You can choose any version of reCamera according to your deployment needs:

  • reCamera 2002 series (Wi-Fi)
  • reCamera Gimbal
  • reCamera HQ PoE (Ethernet + PoE)

Note:
The PoE version does not support Wi-Fi and must be connected to the same local network via a PoE-enabled switch.


How It Works

Model Conversion Pipeline (TFLite → ONNX → cvimodel)

The models are downloaded in TFLite format from the official MediaPipe repository and must be converted to the .cvimodel format supported by the reCamera TPU:

MediaPipe TFLite (FLOAT16)
│ tf2onnx (--channel_format none, keep NHWC)
▼
ONNX (FLOAT32, NHWC) ← numerical reference (cos=1.0 vs TFLite)
│ tpu-mlir model_transform + model_deploy
├─ BF16
└─ INT8 (per-channel + real-data calibration)
▼
CVIMODEL (cv181x)
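The INT8 branch relies on per-channel quantization. As a generic illustration only (this is not tpu-mlir's code, and the function names are hypothetical), the sketch below quantizes a small weight tensor symmetrically with one scale per output channel and compares the result with a single per-tensor scale. It covers weights only; the real-data calibration that sets activation ranges is not modeled.

```python
def _symmetric_scale(values):
    """Scale that maps the largest |value| to 127 (symmetric INT8)."""
    max_abs = max(abs(v) for v in values)
    return max_abs / 127.0 if max_abs > 0 else 1.0

def _quantize(values, scale):
    # Round to the nearest integer level and clamp to the symmetric INT8 range.
    return [max(-127, min(127, round(v / scale))) for v in values]

def quantize_per_channel(weights):
    """weights: list of output channels, each a list of floats."""
    scales = [_symmetric_scale(ch) for ch in weights]
    q = [_quantize(ch, s) for ch, s in zip(weights, scales)]
    return q, scales

def quantize_per_tensor(weights):
    """One shared scale for the whole tensor, for comparison."""
    scale = _symmetric_scale([v for ch in weights for v in ch])
    return [_quantize(ch, scale) for ch in weights], scale

def dequantize_per_channel(q, scales):
    """Map integer levels back to floats using each channel's own scale."""
    return [[v * s for v in ch] for ch, s in zip(q, scales)]

w = [[1.0, -0.5], [0.02, 0.01]]
q_c, s_c = quantize_per_channel(w)
q_t, s_t = quantize_per_tensor(w)
print("per-channel:", q_c)
print("per-tensor: ", q_t)
```

With a per-tensor scale, the small second channel is rounded onto only a few integer levels (3 and 1 here), whereas a per-channel scale lets it use the full [-127, 127] range, which is why per-channel quantization usually loses less accuracy.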

Accuracy Verification

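After conversion, the models are verified through a three-way comparison (TFLite vs ONNX vs cvimodel). The table lists the cosine similarity (cos) of each model output for the BF16 and INT8 cvimodels:

Model | Output | BF16 cos | INT8 cos
--- | --- | --- | ---
detector | scores | 1.0000 | 0.9896
detector | boxes | 0.9999 | 0.9748
landmark | lm63 | 1.0000 | 0.9999
landmark | world63 | 0.9997 | 0.8098
embedder | embedding | 1.0000 | 0.9992
classifier | probs | 1.0000 | 0.9978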

Note: INT8 quantization noticeably reduces the accuracy of world63 (world-coordinate landmarks, cos = 0.81), but the end-to-end gesture classification still matches TFLite, so the predicted category remains reliable. If your application depends heavily on accurate world coordinates, use the BF16 version of the landmark model.
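The cos values above are cosine similarities between flattened output tensors. A minimal sketch of the metric, assuming each model's output has already been dumped to a flat list of floats (how the tensors are dumped is not shown here, and the sample values are illustrative, not measured):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two flattened output tensors."""
    if len(a) != len(b):
        raise ValueError("outputs must have the same number of elements")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for an all-zero output")
    return dot / (norm_a * norm_b)

# Illustrative values: a reference output vs. a quantized model's output.
reference = [0.12, -0.40, 0.88, 0.05]
quantized = [0.11, -0.41, 0.86, 0.07]
print(f"cos = {cosine_similarity(reference, quantized):.4f}")
```

Because cosine similarity ignores a uniform scale factor (an output multiplied by 2 still scores 1.0), a high value confirms that an output points in the right direction but not that its magnitude is preserved.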

Building the Demo

To build and run this example, you need to:

  1. Cross-compile the C++ program on your PC
  2. Run the compiled executable on the reCamera
  3. Run the Python receiver script on your PC

Step 1: Compile the C++ Program

Note:

Before building this solution, make sure you have configured the ReCamera-OS environment (version 0.2.1 or higher) according to the main project documentation, including the SDK path and the cross-compilation toolchain.

Set the cross-compilation toolchain environment variable:

export PATH=<toolchain_path>/host-tools/gcc/riscv64-linux-musl-x86_64/bin:$PATH   # replace <toolchain_path> with the directory that contains host-tools

Clone the repository and enter the solution directory to build:

git clone https://github.com/RobotXTeam/sscma-example-sg200x.git
cd sscma-example-sg200x/solutions/sesg-project/hand_gesture
export SG200X_SDK_PATH=<sdk_path>/sg2002_recamera_emmc   # replace <sdk_path> with the directory that contains sg2002_recamera_emmc
rm -rf build && mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS="-std=c++17" ..
make -j$(nproc)

The compiled executable is located at: build/hand_gesture

Step 2: Prepare the Model Files

This example requires four INT8-quantized .cvimodel files, which are already provided in the repository. If you need to convert the models yourself, refer to the Model Conversion Guide.

Model | Filename | Description
--- | --- | ---
Palm Detection | hand_detector_cv181x_int8.cvimodel | Model 1: SSD palm detection
Landmark Detection | hand_landmarks_detector_cv181x_int8.cvimodel | Model 2: 21 landmarks
Gesture Embedding | gesture_embedder_cv181x_int8.cvimodel | Model 3: 128-D embedding
Gesture Classification | canned_gesture_classifier_cv181x_int8.cvimodel | Model 4: 8-class classification

Upload the compiled executable and the model files to /home/recamera/ on the reCamera:

scp hand_gesture hand_detector_cv181x_int8.cvimodel hand_landmarks_detector_cv181x_int8.cvimodel \
gesture_embedder_cv181x_int8.cvimodel canned_gesture_classifier_cv181x_int8.cvimodel \
recamera@<reCamera_IP>:/home/recamera/ # Make sure the PC and reCamera are on the same network segment, then replace <reCamera_IP> with the corresponding IP address

Step 3: Configure the reCamera

Warning:

Before running the C++ program, you must stop the default Node-RED services, because they hold the camera. Run the following commands via SSH:

sudo /etc/init.d/S03node-red stop
sudo /etc/init.d/S91sscma-node stop
sudo /etc/init.d/S93sscma-supervisor stop

Step 4: Run the Executable on the reCamera

Log in to the reCamera via SSH and make the program executable:

cd /home/recamera/
chmod +x hand_gesture

Parameter Description

The program takes positional arguments in the order listed below:

Parameter | Description | Default
--- | --- | ---
palm_model | Palm detection model (required) | -
landmark_model | Landmark detection model (required) | -
embedder_model | Gesture embedding model (required) | -
classifier_model | Gesture classification model (required) | -
min_score | Palm detection score threshold | 0.5
udp_ip | PC IP address (enables UDP streaming) | -
udp_port | UDP port number | -
jpeg_w | JPEG streaming frame width | 320
jpeg_h | JPEG streaming frame height | 240
jpeg_fps | JPEG streaming frame rate | 10
skip_multi | With multiple hands (≥2), run inference once every N frames | 3
skip_single | With a single hand, run inference once every N frames | 1
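A hypothetical sketch of how skip_multi and skip_single could gate inference frame by frame, following the parameter descriptions above. The real scheduling logic in main.cpp may differ, for example in how it counts hands or how it treats frames with no hand; both choices below are assumptions.

```python
def should_run_inference(frame_index, hands_in_last_result, skip_multi=3, skip_single=1):
    """Hypothetical frame scheduler mirroring skip_multi / skip_single.

    Assumptions (not taken from the repository): the decision uses the
    number of hands found by the most recent inference, and zero hands is
    treated like the single-hand case.
    """
    if skip_multi < 1 or skip_single < 1:
        raise ValueError("skip intervals must be >= 1")
    interval = skip_multi if hands_in_last_result >= 2 else skip_single
    # Run inference only on every interval-th frame.
    return frame_index % interval == 0

# With two hands in view and the default skip_multi=3, frames 0 and 3 run.
print([should_run_inference(i, 2) for i in range(6)])
```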

Example Commands

Basic usage (no UDP streaming, local inference only):

sudo ./hand_gesture \
hand_detector_cv181x_int8.cvimodel \
hand_landmarks_detector_cv181x_int8.cvimodel \
gesture_embedder_cv181x_int8.cvimodel \
canned_gesture_classifier_cv181x_int8.cvimodel

Full usage (UDP streaming + custom parameters):

sudo ./hand_gesture \
hand_detector_cv181x_int8.cvimodel \
hand_landmarks_detector_cv181x_int8.cvimodel \
gesture_embedder_cv181x_int8.cvimodel \
canned_gesture_classifier_cv181x_int8.cvimodel \
0.5 \
192.168.XX.XX 5001 \
320 240 10 \
3 1
Note:
  1. Please replace 192.168.XX.XX with the actual IP address of the PC that is on the same network as your reCamera. UDP streaming is only enabled when both udp_ip and udp_port are provided.
  2. If the program displays "[Heartbeat] Before the first retrieveFrame(RGB888) call..." and then hangs, please restart the reCamera.
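Each value's meaning depends on its position on the command line. The following Python model of the argument layout is a hypothetical sketch built from the parameter table and the two example commands, not the program's actual C++ parser; it also encodes the rule that UDP streaming needs both udp_ip and udp_port. It can be handy for validating a command line before launching hand_gesture from a script.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class HandGestureArgs:
    palm_model: str
    landmark_model: str
    embedder_model: str
    classifier_model: str
    min_score: float = 0.5
    udp_ip: Optional[str] = None
    udp_port: Optional[int] = None
    jpeg_w: int = 320
    jpeg_h: int = 240
    jpeg_fps: int = 10
    skip_multi: int = 3
    skip_single: int = 1

    @property
    def udp_enabled(self) -> bool:
        # Streaming is enabled only when both udp_ip and udp_port are given.
        return self.udp_ip is not None and self.udp_port is not None

# Optional arguments, in the documented positional order, with their types.
_OPTIONAL = [("min_score", float), ("udp_ip", str), ("udp_port", int),
             ("jpeg_w", int), ("jpeg_h", int), ("jpeg_fps", int),
             ("skip_multi", int), ("skip_single", int)]

def parse_positional(argv: List[str]) -> HandGestureArgs:
    """Map positional arguments (without the program name) to named fields."""
    if len(argv) < 4:
        raise ValueError("four model paths are required")
    extra = argv[4:]
    if len(extra) > len(_OPTIONAL):
        raise ValueError("too many arguments")
    # zip stops at the shortest input, so omitted trailing values keep defaults.
    kwargs = {name: conv(value) for (name, conv), value in zip(_OPTIONAL, extra)}
    return HandGestureArgs(*argv[:4], **kwargs)
```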

Step 5: Run the Python Receiver on the PC

On your PC, make sure the required Python libraries are installed:

pip install opencv-python numpy

Enter the solution directory and run the receiver script:

cd sscma-example-sg200x/solutions/sesg-project/hand_gesture
python3 tools/udp_receiver.py 5001

The PC will display a real-time video window, including:

  • JPEG video stream
  • Palm detection box (blue rectangle)
  • 21 hand landmarks (red dots + connected skeleton)
  • Gesture classification label (gesture name and confidence shown in the upper-left corner)
  • Handedness (left/right hand) information

(Figure: real-time gesture recognition result on the PC side)

Expected Output

On the reCamera Terminal

Once running, the program prints performance and diagnostic logs similar to the following:

[Perf] FPS=5.88 (inference=2.94) | palm=120.7ms | landmark=169.1ms | gesture=0.6ms | total=290.4ms | avg_hands=1.00
[Gesture] Open_Palm (70%) [R] palm=(0.43,0.34,0.69,0.69) score=0.85
[LB-DIAG] #2 warpAffine sx=0.3000 sy=0.3000 tx=0.0 ty=24.0
[LB-DIAG] #2 canvas 192x192: nonzero=82944 min=0 max=255 mean=80.7
[DET-DIAG] setInput ret=0, run ret=0
[Gesture] Open_Palm (70%) [R] palm=(0.45,0.36,0.72,0.73) score=0.85
[Gesture] Open_Palm (70%) [R] palm=(0.45,0.36,0.72,0.73) score=0.85
[LB-DIAG] #2 warpAffine sx=0.3000 sy=0.3000 tx=0.0 ty=24.0
[LB-DIAG] #2 canvas 192x192: nonzero=82944 min=0 max=255 mean=82.0
[DET-DIAG] setInput ret=0, run ret=0
[Gesture] Open_Palm (60%) [R] palm=(0.45,0.41,0.72,0.77) score=0.88
[Gesture] Open_Palm (60%) [R] palm=(0.45,0.41,0.72,0.77) score=0.88
[LB-DIAG] #2 warpAffine sx=0.3000 sy=0.3000 tx=0.0 ty=24.0
[LB-DIAG] #2 canvas 192x192: nonzero=82944 min=0 max=255 mean=81.9
[DET-DIAG] setInput ret=0, run ret=0
[Gesture] Open_Palm (60%) [R] palm=(0.47,0.42,0.73,0.76) score=0.81
[Perf] FPS=5.93 (inference=2.97) | palm=120.6ms | landmark=177.2ms | gesture=0.6ms | total=298.4ms | avg_hands=1.00
[Gesture] Open_Palm (60%) [R] palm=(0.47,0.42,0.73,0.76) score=0.81
[LB-DIAG] #2 warpAffine sx=0.3000 sy=0.3000 tx=0.0 ty=24.0
[LB-DIAG] #2 canvas 192x192: nonzero=82944 min=0 max=255 mean=81.8

Note: The palm model requires a 192×192 input, which is below the minimum scaling resolution of the VPSS. Therefore, CH0 captures at 640×480 (a resolution the VPSS supports), and the program scales each frame down to 192×192 with a software letterbox before running the palm model.
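The [LB-DIAG] values in the log are consistent with a standard aspect-preserving letterbox from 640×480 to 192×192: scale = min(192/640, 192/480) = 0.3, the scaled image is 192×144, and the remaining 48 rows are split into 24-pixel bands above and below (ty=24). The logged nonzero count of 82944 also equals the element count of the 192×144 image region (192 × 144 × 3). A sketch of the forward parameters and of the inverse mapping that brings canvas coordinates back to normalized camera coordinates; the function names are hypothetical, and gesture_math.cpp may organize this differently.

```python
def letterbox_params(src_w, src_h, dst_w, dst_h):
    """Aspect-preserving fit of a src_w x src_h frame into dst_w x dst_h.

    Returns (scale, tx, ty): the uniform scale factor and the left/top
    padding in destination pixels (the image is centered).
    """
    scale = min(dst_w / src_w, dst_h / src_h)
    tx = (dst_w - src_w * scale) / 2.0
    ty = (dst_h - src_h * scale) / 2.0
    return scale, tx, ty

def canvas_to_source_normalized(x, y, src_w, src_h, dst_w, dst_h):
    """Map a pixel position on the letterboxed canvas back to [0, 1] source coordinates."""
    scale, tx, ty = letterbox_params(src_w, src_h, dst_w, dst_h)
    # Undo the padding offset, then the scale, then normalize by the source size.
    return (x - tx) / scale / src_w, (y - ty) / scale / src_h

scale, tx, ty = letterbox_params(640, 480, 192, 192)
print(f"sx=sy={scale:.4f} tx={tx:.1f} ty={ty:.1f}")  # sx=sy=0.3000 tx=0.0 ty=24.0
```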

Camera Access Error

If you see a "No camera" or "Camera device not found" error:

  • Make sure the Node-RED services are stopped (see Step 3)
  • Check the camera connection

UDP Connection Failure

If the PC does not receive data:

  • Confirm that the PC and reCamera are on the same network
  • Check the firewall settings on the PC
  • Make sure the UDP port is not blocked
  • Use ping to test the connectivity between the devices
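To separate network problems from receiver problems, a bare UDP listener that only counts incoming datagrams can be run on the PC in place of tools/udp_receiver.py (stop the receiver first, since both would try to bind the same port). This is a minimal standard-library sketch; it does not decode the project's packet format, and the helper names are hypothetical.

```python
import socket
import time

def open_listener(port, host="0.0.0.0"):
    """Bind a UDP socket on host:port (port 0 lets the OS pick a free port)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock

def count_datagrams(sock, duration_s=5.0, bufsize=65535):
    """Count datagrams and payload bytes received within duration_s seconds.

    Payloads are not decoded; this only checks that packets arrive.
    """
    deadline = time.monotonic() + duration_s
    packets = total_bytes = 0
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        sock.settimeout(remaining)  # never wait past the deadline
        try:
            data, _addr = sock.recvfrom(bufsize)
        except socket.timeout:
            break
        packets += 1
        total_bytes += len(data)
    return packets, total_bytes
```

For example, run `count_datagrams(open_listener(5001), 10.0)` in a Python shell on the PC while hand_gesture is streaming. If it returns (0, 0) even though udp_ip is set to this PC's address, the packets are not reaching the PC, which points to the network or firewall rather than the receiver script.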

Abnormal Gesture Recognition Confidence

If the reported gesture confidence is clearly wrong:

  • Confirm that the softmax patch the C++ code applies to the classifier output is implemented correctly; the cvimodel outputs raw logits, not probabilities.
  • Check that the ONNX output, which already includes a Softmax layer, was not mistakenly used in place of the cvimodel output (logits).
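Because the classifier cvimodel returns raw logits, a softmax has to be applied before a confidence is reported. A Python sketch of a numerically stable softmax plus label lookup; the label order is an assumption (taken from the category list in the introduction), and the function names are hypothetical.

```python
import math

# Assumed to match the order of the 8 categories listed in the introduction.
LABELS = ["None", "Closed_Fist", "Open_Palm", "Pointing_Up",
          "Thumb_Down", "Thumb_Up", "Victory", "ILoveYou"]

def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(logits):
    """Return (label, confidence) for one vector of 8 classifier logits."""
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]

logits = [0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(classify(logits))
```

Applying softmax twice (for example, to an output that already went through Softmax) flattens the distribution: for the logits above, the top confidence drops from about 0.95 to about 0.27. Skipping it entirely would instead report raw logits, which are not bounded to [0, 1].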

C++ Code Structure

hand_gesture/
├── main/
│   ├── main.cpp                   # Entry: get frame → mmap → inference → UDP push
│   ├── hand_detector.{h,cpp}      # Model 1: palm detection (SSD post-processing + NMS)
│   ├── hand_landmarker.{h,cpp}    # Model 2: 21 landmarks (ROI warpAffine)
│   ├── gesture_recognizer.{h,cpp} # Models 3+4: embedder + classifier (with softmax patch)
│   ├── gesture_math.{h,cpp}       # letterbox / math utilities
│   ├── engine_utils.h             # tensor packing helpers
│   └── hand_types.h               # data structures + UDP POD protocol
├── tools/udp_receiver.py          # Python host receiver
└── CMakeLists.txt
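hand_detector.{h,cpp} performs SSD post-processing followed by NMS. As a hedged illustration of the NMS step only (anchor decoding is omitted), here is plain greedy IoU-based NMS in Python. The 0.3 IoU threshold is an assumed value, min_score mirrors the documented default of 0.5, and the actual C++ code may use a different variant, such as weighted NMS, which averages overlapping boxes instead of discarding them.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), normalized

def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def greedy_nms(boxes: List[Box], scores: List[float],
               min_score: float = 0.5, iou_threshold: float = 0.3) -> List[int]:
    """Return indices of kept boxes, highest score first."""
    # Drop low-confidence candidates, then visit the rest from best to worst.
    order = sorted((i for i, s in enumerate(scores) if s >= min_score),
                   key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```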

Tech Support & Product Discussion

Thank you for choosing our products! We offer several communication channels so that your experience with our products is as smooth as possible; choose the one that best suits your preferences and needs.
