
Convert and Quantize AI Models

The AI model conversion tool of reCamera currently supports frameworks such as PyTorch, ONNX, TFLite, and Caffe. Models from other frameworks need to be converted into ONNX format. For instructions on how to convert models from other deep learning architectures to ONNX, you can refer to the official ONNX website: https://github.com/onnx/tutorials.
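For example, a PyTorch model can usually be exported to ONNX with torch.onnx.export. The snippet below is a minimal sketch (the tiny stand-in module, file names, opset version, and input/output names are illustrative; replace them with your own trained model and settings):

import torch
import torch.nn as nn

# Minimal sketch: export a PyTorch model to ONNX before feeding it to the conversion tool.
# A tiny stand-in module is used here; replace it with your own trained model.
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
model.eval()

dummy_input = torch.randn(1, 3, 640, 640)   # match your model's expected input shape
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    opset_version=17,                        # pick an opset supported by your tooling
    input_names=["images"],
    output_names=["output"],
)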

The flow diagram for deploying AI models on reCamera is shown below.

This article introduces how to use reCamera's AI model conversion tool through simple examples.

Set up the working environment

Method 1: Local Installation

First, check whether your current system environment meets the requirements of tpu_mlir.

If it does not, or if the installation fails, use Method 2 to install the model conversion tool.

Install tpu_mlir using pip:

pip install tpu_mlir

The dependencies required by tpu_mlir vary when handling models from different frameworks. For model files generated by ONNX or Torch, install the additional dependencies using the following commands:

pip install tpu_mlir[onnx]
pip install tpu_mlir[torch]

Currently, five configurations are supported: onnx, torch, tensorflow, caffe, and paddle. Or you can use the following command to install all dependencies:

pip install tpu_mlir[all]

When the tpu_mlir-{version}.whl file already exists locally, you can also use the following command to install it:

pip install path/to/tpu_mlir-{version}.whl[all]

Method 2: Installation in a Docker Image (for Incompatible Local Environments)

Download the required image from DockerHub by entering a command similar to the following:

docker pull sophgo/tpuc_dev:v3.1

If you are using Docker for the first time, you can run the following commands for installation and configuration (only needed for the first-time setup):

sudo apt install docker.io
sudo systemctl start docker
sudo systemctl enable docker
sudo groupadd docker
sudo usermod -aG docker $USER
newgrp docker

Make sure the installation package is in the current directory, and then create a container in the current directory as follows:

docker run --privileged --name MyName -v $PWD:/workspace -it sophgo/tpuc_dev:v3.1

* Replace "MyName" with the desired name for your container

Use pip to install tpu_mlir in the /workspace directory inside the Docker container, just like in Method 1:

pip install tpu_mlir[all]

Convert and Quantize AI Models to the cvimodel Format

reCamera has already adapted the YOLO series for local inference. Therefore, this section uses yolo11n.onnx as an example to demonstrate how to convert an ONNX model to the cvimodel. The cvimodel is the AI model format used for local inference on reCamera.

The method for converting and quantizing PyTorch, TFLite, and Caffe models is the same as that in this section.

Here is the download link for yolo11n.onnx. You can click the link to download the model and copy it to your workspace for further use.

Download the model:
Download yolo11n.onnx

After downloading the model, please place it into your workspace for the next steps.

Preparing the Workspace

Create a model_yolo11n directory and place both the model file and the image files inside it. The image files are usually taken from the model's training dataset and are used for calibration during the subsequent quantization step. Enter the following commands in the terminal:

git clone https://github.com/sophgo/tpu-mlir.git
cd tpu-mlir
source ./envsetup.sh
./build.sh
mkdir model_yolo11n && cd model_yolo11n
cp -rf ${REGRESSION_PATH}/dataset/COCO2017 .
cp -rf ${REGRESSION_PATH}/image .
mkdir workspace && cd workspace

ONNX to MLIR

The conversion from ONNX to MLIR is an intermediate step in the model transformation process. Before obtaining a model suitable for inference on reCamera, you need to first convert the ONNX model to the MLIR format. This MLIR file serves as a bridge to generate the final model optimized for reCamera's inference engine.

If the input is an image, you need to know how the model preprocesses it before performing the conversion. If the model uses preprocessed npz files as input, no preprocessing needs to be considered. The preprocessing is formulated as follows (x represents the input):

y = (x − mean) × scale

The official yolo11 takes RGB images and normalizes them to the range [0, 1]: each pixel value is multiplied by 1/255, which corresponds to a mean of 0.0, 0.0, 0.0 and a scale of 0.0039216, 0.0039216, 0.0039216. The mean and scale parameters differ from model to model, since they are determined by the normalization method used for each specific model.
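As a quick sanity check, the snippet below (a minimal sketch using OpenCV and NumPy, not part of the conversion tool) applies this preprocessing with the yolo11 values, mapping the 0–255 RGB pixels into [0, 1]:

import cv2
import numpy as np

# Minimal sketch of y = (x - mean) * scale with the yolo11 values (mean = 0, scale = 1/255).
img = cv2.imread("../image/dog.jpg")          # BGR, uint8, values in 0..255
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)    # the official yolo11 expects RGB
img = cv2.resize(img, (640, 640))             # model input size (aspect-ratio handling omitted)

mean = np.array([0.0, 0.0, 0.0], dtype=np.float32)
scale = np.array([0.0039216, 0.0039216, 0.0039216], dtype=np.float32)
y = (img.astype(np.float32) - mean) * scale   # values now lie in [0, 1]

print(y.min(), y.max())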

You can refer to the following model conversion command in terminal:

model_transform \
--model_name yolo11n \
--model_def yolo11n.onnx \
--input_shapes "[[1,3,640,640]]" \
--mean "0.0,0.0,0.0" \
--scale "0.0039216,0.0039216,0.0039216" \
--keep_aspect_ratio \
--pixel_format rgb \
--output_names "/model.23/cv2.0/cv2.0.2/Conv_output_0,/model.23/cv3.0/cv3.0.2/Conv_output_0,/model.23/cv2.1/cv2.1.2/Conv_output_0,/model.23/cv3.1/cv3.1.2/Conv_output_0,/model.23/cv2.2/cv2.2.2/Conv_output_0,/model.23/cv3.2/cv3.2.2/Conv_output_0" \
--test_input ../image/dog.jpg \
--test_result yolo11n_top_outputs.npz \
--mlir yolo11n.mlir

After converting to an mlir file, a ${model_name}_in_f32.npz file is generated, which serves as the input file for the subsequent conversion steps.
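These npz files are ordinary NumPy archives, so if you want to confirm what was generated you can inspect them with a few lines of Python (an optional check, not part of the conversion flow):

import numpy as np

# Peek at the tensors stored in the npz files produced by model_transform.
inputs = np.load("yolo11n_in_f32.npz")
outputs = np.load("yolo11n_top_outputs.npz")

for name in inputs.files:
    print("input :", name, inputs[name].shape, inputs[name].dtype)
for name in outputs.files:
    print("output:", name, outputs[name].shape, outputs[name].dtype)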

Regarding the selection of the --output_names parameter, the YOLO11 model conversion in this example does not choose the final output named output0. Instead, it selects the six outputs before the model's head as the parameter. You can import the ONNX file into Netron to view the model structure.

The operators in the YOLO head suffer a large accuracy loss after INT8 quantization. If output0 at the very end were chosen as the output name, mixed-precision quantization would be required.

Since the subsequent sections of this article will provide examples of mixed-precision quantization, and this section uses a single quantization precision for the example, the outputs before the head are chosen as parameters. By visualizing the ONNX model in Netron, you can see the positions of the six output names:
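If you prefer a scriptable alternative to Netron, the onnx Python package can list the tensor names. The filter below is only an illustration based on the names in this export, where the six tensors passed to --output_names are the Conv outputs of the cv2.x.2 and cv3.x.2 branches inside model.23:

import onnx

# List the graph outputs and locate the six head Conv outputs used in --output_names.
model = onnx.load("yolo11n.onnx")
print("graph outputs:", [o.name for o in model.graph.output])

for node in model.graph.node:
    out = node.output[0]
    if node.op_type == "Conv" and "/model.23/" in out and out.split("/")[-2].endswith(".2"):
        print(out)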

Description of Main Parameters for model_transform:

Parameter Name | Required? | Description
model_name | Yes | Specify the model name.
model_def | Yes | Specify the model definition file, such as '.onnx', '.tflite', or '.prototxt'.
input_shapes | No | Specify the input shape, e.g., [[1,3,640,640]]. A two-dimensional array that can support multiple inputs.
input_types | No | Specify the input types, such as int32. Use commas to separate multiple inputs. Default is float32.
resize_dims | No | Specify the dimensions to which the original image should be resized. If not specified, it is resized to the model's input size.
keep_aspect_ratio | No | Whether to keep the aspect ratio when resizing. Default is false; if true, missing areas are padded with zeros.
mean | No | Mean value for each channel of the image. Default is 0.0,0.0,0.0.
scale | No | Scale value for each channel of the image. Default is 1.0,1.0,1.0.
pixel_format | No | Image type, which can be one of 'rgb', 'bgr', 'gray', or 'rgbd'. Default is 'bgr'.
channel_format | No | Channel type for image input, which can be 'nhwc' or 'nchw'. For non-image inputs, use 'none'. Default is 'nchw'.
output_names | No | Specify output names. If not specified, the model's default output names are used.
test_input | No | Specify an input file for validation, such as an image, npy, or npz file. If not specified, no accuracy validation is performed.
test_result | No | Specify the output file for the validation result.
excepts | No | Specify network layers to exclude from validation, separated by commas.
mlir | Yes | Specify the output MLIR file name and path.

MLIR to F16 cvimodel

If you want to convert from mlir to F16-precision cvimodel, you can enter the following reference command in the terminal:

model_deploy \
--mlir yolo11n.mlir \
--quantize F16 \
--processor cv181x \
--test_input yolo11n_in_f32.npz \
--test_reference yolo11n_top_outputs.npz \
--fuse_preprocess \
--tolerance 0.99,0.99 \
--model yolo11n_1684x_f16.cvimodel

After a successful conversion, you will obtain an F16-precision cvimodel file that can be directly used for inference. If you need an INT8-precision or mixed-precision cvimodel file, please refer to the later sections of this article.

Description of Main Parameters for model_deploy:

Parameter Name | Required? | Description
mlir | Yes | MLIR file
quantize | Yes | Quantization type (F32/F16/BF16/INT8)
processor | Yes | Depends on the platform being used. The 2024 version of reCamera uses "cv181x" as the parameter.
calibration_table | No | Path to the calibration table. Required for INT8 quantization.
tolerance | No | Tolerance for the minimum similarity between MLIR quantized and MLIR fp32 inference results
test_input | No | Input file for validation, which can be an image, npy, or npz file. No validation is carried out if it is not specified.
test_reference | No | Reference data for validating the mlir tolerance (npz format). It is the result of each operator.
compare_all | No | Compare all tensors, if set
excepts | No | Names of network layers to exclude from validation, separated by commas
op_divide | No | Try to split larger ops into multiple smaller ops to save ion memory; suitable for a few specific models
model | Yes | Name of the output model file (including path)
skip_validation | No | Skip cvimodel correctness verification to speed up deployment; verification is on by default

After compilation, a file named yolo11n_1684x_f16.cvimodel is generated. The quantized model may have a slight loss in accuracy, but it will be more lightweight and have faster inference speed.

MLIR to INT8 cvimodel

Calibration table generation

Before converting to an INT8 model, you need to run calibration to obtain a calibration table. Depending on the situation, roughly 100 to 1000 input samples are needed. The calibration table is then used to generate a symmetric or asymmetric cvimodel. If the symmetric model already meets the requirements, the asymmetric one is generally not recommended, because its performance is slightly worse than that of the symmetric model. Here is an example that uses 100 images from COCO2017 to perform calibration:

run_calibration \
yolo11n.mlir \
--dataset ../COCO2017 \
--input_num 100 \
-o yolo11n_calib_table

After running the command above, a file named yolo11n_calib_table will be generated, which is used as the input file when compiling the INT8 model.
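The calibration table is a plain text file, so you can glance at the first few entries to confirm it was written (the exact layout may differ between tpu_mlir versions):

# Quick, optional look at the generated calibration table.
with open("yolo11n_calib_table") as f:
    for line in f.readlines()[:10]:
        print(line.rstrip())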

Description of Main Parameters for run_calibration:

Parameter | Required? | Description
N/A | Yes | Specify the MLIR file
dataset | No | Specify the input sample directory, where the path contains the corresponding images, npz, or npy files
data_list | No | Specify the sample list; either dataset or data_list must be selected
input_num | No | Specify the number of calibration samples; if set to 0, all samples are used
tune_num | No | Specify the number of tuning samples; default is 10
histogram_bin_num | No | Number of bins for the histogram; default is 2048
o | Yes | Output calibration table file

Compile to INT8 symmetric quantized cvimodel

After obtaining the yolo11n_calib_table file, run the following command to convert the model into an INT8 symmetric quantized cvimodel:

model_deploy \
--mlir yolo11n.mlir \
--quantize INT8 \
--quant_input \
--processor cv181x \
--calibration_table yolo11n_calib_table \
--test_input ../image/dog.jpg \
--test_reference yolo11n_top_outputs.npz \
--customization_format RGB_PACKED \
--fuse_preprocess \
--aligned_input \
--model yolo11n_1684x_int8_sym.cvimodel

After compilation, a file named yolo11n_1684x_int8_sym.cvimodel is generated. The model quantized to INT8 is more lightweight and has faster inference speed compared to models quantized to F16/BF16.

Quick Test

You can use Node-RED on reCamera for visualization to quickly verify the converted yolo11n_1684x_int8_sym.cvimodel. Simply set up a few nodes, as shown in the example video below:

You need to select the yolo11n_1684x_int8_sym.cvimodel in the model node for quick verification. Double-click the model node, click "Upload" to import the quantized model, then click "Done", and finally click "Deploy".

We can view the inference results of the INT8 quantized model in the preview node. The cvimodel obtained through correct conversion and quantization methods is still reliable:

tip

Currently, reCamera's Node-RED only supports preview testing for a limited number of models; we will adapt more models in the future. If you import a custom model into Node-RED, or if you do not set the specified output tensors as shown in our example, Node-RED's backend will not support preview testing, even if your cvimodel is correct.

We will release tutorials on preprocessing and postprocessing for various models, so you can write your own code to infer your custom cvimodel.

Mixed-Precision Quantization

When the precision of certain layers in a model is easily affected by quantization, but we still need faster inference speed, single-precision quantization may no longer be suitable. In such cases, mixed-precision quantization can better address the situation: for layers that are more sensitive to quantization, we can choose F16/BF16, while for layers with minimal precision loss, we can use INT8.

Next, we will use yolov5s.onnx as an example to demonstrate how to quickly convert and quantize the model into a mixed-precision cvimodel. Before reading this section, make sure you have gone through the previous sections of the article, as the operations in this section build upon the content covered earlier.

Here is the download link for yolov5s.onnx. You can click the link to download the model and copy it to your workspace for further use.

Download the model:
Download yolov5s.onnx

After downloading the model, please place it into your workspace for the next steps.

mkdir model_yolov5s && cd model_yolov5s
cp -rf ${REGRESSION_PATH}/dataset/COCO2017 .
cp -rf ${REGRESSION_PATH}/image .
mkdir workspace && cd workspace

The first step is still to convert the model to a .mlir file. Because mixed-precision quantization keeps the precision loss in the YOLO head minimal, unlike the previous approach we choose the model's final output name rather than the outputs before the head for the --output_names parameter. Visualize the ONNX model in Netron:

Since the normalization parameters of yolov5 are the same as those of yolo11, we can obtain the following command for model_transform:

model_transform \
--model_name yolov5s \
--model_def yolov5s.onnx \
--input_shapes [[1,3,640,640]] \
--mean 0.0,0.0,0.0 \
--scale "0.0039216,0.0039216,0.0039216" \
--keep_aspect_ratio \
--pixel_format rgb \
--output_names output \
--test_input ../image/dog.jpg \
--test_result yolov5s_top_outputs.npz \
--mlir yolov5s.mlir

Then we also need to generate the calibration table, and this step is the same as in the previous section:

run_calibration \
yolov5s.mlir \
--dataset ../COCO2017 \
--input_num 100 \
-o yolov5s_calib_table

Unlike the section where we converted the INT8 symmetric quantized yolo11 model, before executing model_deploy we need to generate a mixed-precision quantization table. The reference command is as follows:

run_qtable \
yolov5s.mlir \
--dataset ../COCO2017 \
--calibration_table yolov5s_calib_table \
--processor cv181x \
--min_layer_cos 0.99 \
--expected_cos 0.999 \
-o yolov5s_qtable

The parameter description for run_qtable is shown in the table below:

Parameter | Required? | Description
N/A | Yes | Specify the MLIR file
dataset | No | Specify the input sample directory, which contains images, npz, or npy files
data_list | No | Specify the sample list; either dataset or data_list must be selected
calibration_table | Yes | Input calibration table
processor | Yes | Depends on the platform being used. The 2024 version of reCamera uses "cv181x" as the parameter.
fp_type | No | Specify the floating-point precision type for mixed precision; supports auto, F16, F32, BF16; default is auto
input_num | No | Specify the number of input samples; default is 10
expected_cos | No | Specify the minimum expected cosine similarity of the final network output; default is 0.99
min_layer_cos | No | Specify the minimum cosine similarity for each layer's output; layers below this threshold use floating-point computation; default is 0.99
debug_cmd | No | Specify a debugging command string for development use; default is empty
global_compare_layers | No | Specify the layers to replace for final output comparison, e.g., 'layer1,layer2' or 'layer1:0.3,layer2:0.7'
loss_table | No | Specify the file name to save the loss values of all layers quantized to floating-point types; default is full_loss_table.txt

run_qtable checks each layer's cos after the layer's predecessor has been converted to the corresponding floating-point mode. If the cos is still smaller than the min_layer_cos parameter, the current layer and its direct successor layers are set to use floating-point computation.

run_qtable recalculates the cos of the entire network's output after setting each pair of adjacent layers to use floating-point computation. If the cos exceeds the specified expected_cos parameter, the search terminates. Therefore, setting a larger expected_cos will result in more layers being attempted for floating-point operations.
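The following Python pseudocode is only a conceptual sketch of this search, not the actual run_qtable implementation; the layers, successors, layer_cos, and output_cos callables stand in for the tool's internals:

from typing import Callable, Iterable, Set

def build_qtable(
    layers: Iterable[str],
    successors: Callable[[str], Iterable[str]],    # direct successor layers of a layer
    layer_cos: Callable[[str, Set[str]], float],   # per-layer cos given the current float set
    output_cos: Callable[[Set[str]], float],       # cos of the whole network's output
    min_layer_cos: float = 0.99,
    expected_cos: float = 0.999,
) -> Set[str]:
    """Return the set of layers to keep in floating point (conceptual sketch only)."""
    float_layers: Set[str] = set()
    for layer in layers:
        if layer_cos(layer, float_layers) < min_layer_cos:
            # Too much quantization error: run this layer and its direct
            # successors in F16/BF16 instead of INT8.
            float_layers.add(layer)
            float_layers.update(successors(layer))
            # After each change, re-check the cos of the final network output
            # and stop once it reaches expected_cos.
            if output_cos(float_layers) >= expected_cos:
                break
    return float_layers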

Finally, run model_deploy to obtain the mixed-precision cvimodel:

model_deploy \
--mlir yolov5s.mlir \
--quantize INT8 \
--quantize_table yolov5s_qtable \
--calibration_table yolov5s_calib_table \
--customization_format RGB_PACKED \
--fuse_preprocess \
--aligned_input \
--processor cv181x \
--model yolov5s_mix-precision.cvimodel

After obtaining yolov5s_mix-precision.cvimodel, we can use model_tool to view detailed information about the model:

model_tool --info yolov5s_mix-precision.cvimodel

Key information such as TensorMap and WeightMap will be printed to the terminal:

We can run an example in reCamera to verify the mixed-precision quantized YOLOv5 model. Pull the compiled test example:

git clone https://github.com/jjjadand/yolov5_Test_reCamera.git

Copy the compiled example and yolov5s_mix-precision.cvimodel to reCamera using software such as FileZilla. (You can review Getting Started with reCamera.)

After the copy is complete, run the following commands in the reCamera terminal:

cp /path/to/yolov5s_mix-precision.cvimodel /path/to/yolov5_Test_reCamera/solutions/sscma-model/build/
cd yolov5_Test_reCamera/solutions/sscma-model/build/
sudo ./sscma-model yolov5s_mix-precision.cvimodel Dog.jpg Out.jpg

Preview Out.jpg; the inference results of the mixed-precision quantized yolov5 model are as follows:

Resources

reCamera OS

reCamera Series

reCamera example

Tech Support & Product Discussion

Thank you for choosing our products! We are here to provide you with different support to ensure that your experience with our products is as smooth as possible. We offer several communication channels to cater to different preferences and needs.
