Getting Started with Deci on NVIDIA® Jetson Devices

The Deci platform enables you to manage, optimize, deploy, and serve models in your production environment with ease. You can keep using popular DL frameworks such as TensorFlow, PyTorch, Keras, and ONNX. All you need is the Deci web-based platform or the Deci Python client to run it from your code.

Deci provides:

  • Performance Acceleration – Accelerate model inference performance by 2x – 10x on any hardware, without compromising accuracy, by using Deci’s Automated Neural Architecture Construction (AutoNAC) technology.

  • Scaling on Any Hardware – Cut up to 80% of cloud computation costs and BOM to enable inference at scale, regardless of whether it’s from a private or public cloud, from your own server or from any computer, edge or mobile device.

  • Inference Benchmarking – Benchmark your models across any target hardware environment and batch size to find your model’s optimal Throughput, Latency, Memory Usage and Cloud Costs.

  • Model Packaging – Quickly and Easily Deploy to Production – Seamlessly deploy trained models from the Deci Lab to any production environment, including all environmental library dependencies in a single encapsulated container.

  • Model Serving – Deci’s proprietary deep-learning run-time inference engine can be deployed on your own machine (on any hardware – on-prem / edge / cloud). Deci provides the following options for deploying your Deci Optimized Model as a siloed efficient run-time server:

    • Deci’s Runtime Inference Container (RTiC), a containerized machine learning runtime engine.
    • Deci’s INFERY (from the word inference), a Python package that lets you run a model directly from your code.

Hardware supported

Deci is supported on the following Jetson-related hardware:

  • Kits by Seeed:

    • reComputer J1010 built with Jetson Nano
    • reComputer J1020 built with Jetson Nano
    • reComputer J2011 built with Jetson Xavier NX 8GB
    • reComputer J2012 built with Jetson Xavier NX 16GB
  • Carrier Boards by Seeed:

    • Jetson Mate
    • Jetson SUB Mini PC
    • Jetson Xavier AGX H01 Kit
    • A203 Carrier Board
    • A203 (Version 2) Carrier Board
    • A205 Carrier Board
    • A206 Carrier Board
  • Official Development Kits by NVIDIA:

    • NVIDIA® Jetson Nano Developer Kit
    • NVIDIA® Jetson Xavier NX Developer Kit
    • NVIDIA® Jetson AGX Xavier Developer Kit
    • NVIDIA® Jetson TX2 Developer Kit
    • NVIDIA® Jetson AGX Orin Developer Kit
  • Official SoMs by NVIDIA:

    • NVIDIA® Jetson Nano module
    • NVIDIA® Jetson Xavier NX module
    • NVIDIA® Jetson TX2 NX module
    • NVIDIA® Jetson TX2 module
    • NVIDIA® Jetson AGX Xavier module

If you own any of the above hardware, you can proceed to work with Deci on it.

Hardware prerequisites

Prepare the following hardware:

  • Any of the above Jetson devices running JetPack 4.6 (one way to check the JetPack version is shown after this list)
  • Monitor, keyboard, mouse (optional)
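
To confirm which JetPack release the device is running, one common way (assuming JetPack was installed from NVIDIA's apt sources, as on the stock SD card and SDK Manager images) is to query the nvidia-jetpack package:

apt-cache show nvidia-jetpack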

Sign Up for Deci Account

  • Step 1. Visit this page to sign up for a Deci account

  • Step 2. Enter the required details and finish the sign-up process

You will now be presented with the Deci Lab platform

Deci Lab model zoo with pre-optimized models

By default, Deci Lab includes the ResNet50 Baseline model, already loaded into the interface with a couple of optimizations for different hardware. That's not all: Deci also offers a massive collection of base models, with corresponding optimized versions for different hardware, in the Deci Model Zoo. Click Model Zoo and then List to view all the available models.

As an example, we will search for YOLOX in the search bar to view all the YOLOX models.

As you can see, there are base models such as YOLOX_Nano and YOLOX_Small, and optimized models such as YOLOX_Nano Jetson Nano Optimized and YOLOX_Nano Jetson Xavier Optimized.

Optimize your own model

As explained above, you can use pre-optimized models directly, without optimizing them manually. However, if you want to use your own model, you can upload it to Deci Lab and optimize it for your target hardware.

Step 1: On Deci Lab, click + New Model

Step 2: Choose an appropriate task according to your model. Here we have chosen Object Detection.

Step 3: Enter a name for the model and click Next

Step 4: Choose a model framework (ONNX in this case), upload a model in the chosen framework, and click Next. Here we have uploaded the yolov6n.onnx model.

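If your own model is still a PyTorch module rather than an ONNX file, the sketch below shows a generic export using torch.onnx.export. The network, file name, and input size here are placeholders for illustration only; the yolov6n.onnx used in this guide would typically come from the YOLOv6 project's own export tooling.

import torch
import torch.nn as nn

# Hypothetical stand-in network; substitute your own trained model here.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 8, 3, padding=1),
)
model.eval()

# Dummy input matching the model's expected shape (assumed here to be 1x3x640x640).
dummy_input = torch.randn(1, 3, 640, 640)

# Export to ONNX; the resulting .onnx file is what you would upload in this step.
torch.onnx.export(
    model,
    dummy_input,
    "my_model.onnx",
    opset_version=11,
    input_names=["input"],
    output_names=["output"],
)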

Step 5: Select the Primary hardware, Inference batch size, and Quantization level, and click Next

Step 6: Add performance goals and constraints. These are mainly used by AutoNAC, a feature included in the premium version; AutoNAC can dramatically increase model inference performance while decreasing model size. Even if you are not using AutoNAC, you need to fill in a value for Throughput; here we have set it to 40 (it can be an arbitrary value). Finally, click Start to begin the optimization process.

The optimization process will now show its progress as follows and will complete after a few minutes.

Compare model performance

We can either use the Deci Lab platform to compare the performance of the base and optimized models, or deploy the models onto the target hardware and run benchmarks there. Even though it is easier to visualize everything on Deci Lab, it is recommended to deploy the models and benchmark them on the target device to make sure the performance metrics are accurate for the specific hardware.

Visualize on Deci Lab

Here we will compare the YOLOX_Nano base model with the YOLOX_Nano Jetson Xavier NX Optimized model.

Step 1: Navigate to the Model Zoo and click Clone next to both the YOLOX_Nano base model and the YOLOX_Nano Jetson Xavier NX Optimized model

Step 2: On Deci Lab, click on the YOLOX_Nano model under MODEL_VERSIONS to go to the model insights section.

Step 3: Select Jetson Xavier as Target Hardware

Now you will see all the performance metrics for the YOLOX_Nano model as if it were deployed on a Jetson Xavier NX device.

Step 4: Go back to the Deci Lab homepage and click on the YOLOX_Nano Jetson Xavier NX Optimized model under MODEL_VERSIONS

Now you will see all the performance metrics for the YOLOX_Nano Jetson Xavier NX Optimized model as if it were deployed on a Jetson Xavier NX device.

Performance comparison

We can compare the results we obtained previously for the Jetson Xavier target hardware using the table below.

                        YOLOX_Nano      YOLOX_Nano Jetson Xavier NX Optimized
Accuracy                25.8            25.8
Throughput              62.8 fps        175.8 fps
Latency                 15.9361 ms      5.6897 ms
GPU memory footprint    1.05 MB         1.01 MB
Model size              3.66 MB         9.74 MB

As you can see, the main gain is in throughput: at 175.8 fps versus 62.8 fps, the optimized model is roughly 2.8 times faster than the base model.

Deploy on Jetson device and benchmark

We will now deploy the above two models onto a Jetson Xavier NX device and benchmark them to make sure we get accurate performance results.

Install INFERY

  • Step 1. Open a terminal window on the Jetson device and update the package list
sudo apt update 
  • Step 2. Install pip package manager
sudo apt install python3-pip
  • Step 3. Update pip to the latest version
python3 -m pip install -U pip
  • Step 4. Install INFERY for Jetson (a quick way to verify the installation is shown after these steps)
sudo python3 -m pip install https://deci-packages-public.s3.amazonaws.com/infery_jetson-3.2.2-cp36-cp36m-linux_aarch64.whl
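
To verify the installation, you can try importing INFERY from Python; a clean import with no errors means the package is ready to use (this quick check is only a suggestion, not part of Deci's official steps):

python3 -c "import infery; print('INFERY imported successfully')"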

Load the model

  • Step 1. On Deci Lab, hover your mouse over the model name and click Deploy from the pop-up

  • Step 2. Click Download model to download the model to your PC, and then copy the model file to the home directory of the Jetson device

  • Step 3. Open a terminal window on the Jetson device and execute
lakshanthad@nano:~$ python3
Python 3.6.9 (default, Dec 8 2021, 21:08:43)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import infery, numpy as np
  • Step 4. Copy the second command under LOAD MODEL in the Deci Lab Deploy Model window into the terminal window of the Jetson device (make sure Jetson is selected as the target hardware)

For example: model = infery.load(model_path='YOLOX_Nano.onnx', framework_type='onnx', inference_hardware='gpu')

Note: Make sure to adjust the model_path parameter according to where you copied the model earlier. If you copied the model file to the home directory, you can keep the path as it is.

You will see the following output if the model is loaded successfully:

infery_manager -INFO- Loading model YOLOX_Nano.onnx to the GPU
infery_manager -INFO- Successfully loaded YOLOX_Nano.onnx to the GPU.
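
Before benchmarking, you can optionally run a quick sanity inference with a dummy input. The sketch below assumes the model expects a single 3 x 416 x 416 image (the dimensions reported in the benchmark output further down) and that INFERY's predict() accepts a NumPy array:

# Random input batch of shape (1, 3, 416, 416) - assumed input size for YOLOX_Nano.
dummy_input = np.random.rand(1, 3, 416, 416).astype(np.float32)

# Single forward pass; predict() returns the model's outputs.
outputs = model.predict(dummy_input)
print(type(outputs))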

Measure performance of a model

To measure a model’s performance using INFERY, run the model.benchmark command from your application:

model.benchmark(batch_size=1)

The following will be the output for the YOLOX_Nano model

base_inferencer -INFO- Benchmarking the model in batch size 1 and dimensions [(3, 416, 416)]...
<ModelBenchmarks: {
"batch_size": 1,
"batch_inf_time": "13.63 ms",
"batch_inf_time_variance": "1.12 ms",
"memory": "3537.89 mb",
"pre_inference_memory_used": "3532.94 mb",
"post_inference_memory_used": "3537.89 mb",
"total_memory_size": "7765.41 mb",
"throughput": "73.36 fps",
"sample_inf_time": "13.63 ms",
"include_io": true,
"framework_type": "onnx",
"framework_version": "1.8.0",
"inference_hardware": "GPU",
"infery_version": "3.2.2",
"date": "18:23:57__07-06-2022",
"ctime": 1657112037,
"h_to_d_mean": null,
"d_to_h_mean": null,
"h_to_d_variance": null,
"d_to_h_variance": null
}>

where:

  • 'batch_size' – Specifies batch size that was used for benchmark.
  • 'batch_inf_time' – Specifies the latency for the entire batch.
  • 'sample_inf_time' – Specifies the latency for a single sample within the batch; equivalent to batch_inf_time divided by batch_size.
  • 'memory' – Specifies the memory footprint that the model utilizes while inferencing.
  • 'throughput' – Specifies the number of requests that are processed (forward passes) per second.
  • 'batch_inf_time_variance' – Specifies the variance of the batch inference times during the benchmark. If the variance is high, we recommend increasing the number passed to 'repetitions' to make the benchmarks more reliable (see the example after this list).

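If the reported 'batch_inf_time_variance' is high, you can increase the number of benchmark runs. A minimal sketch, assuming the 'repetitions' argument described in the list above:

model.benchmark(batch_size=1, repetitions=100)
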
Repeat the same steps for the YOLOX_Nano Jetson Xavier NX Optimized model, run the benchmark, and you will see results like the following:

base_inferencer -INFO- Benchmarking the model in batch size 1 and dimensions [(3, 416, 416)]...
<ModelBenchmarks: {
"batch_size": 1,
"batch_inf_time": "5.28 ms",
"batch_inf_time_variance": "0.05 ms",
"memory": "2555.62 mb",
"pre_inference_memory_used": "2559.38 mb",
"post_inference_memory_used": "2555.62 mb",
"total_memory_size": "7765.41 mb",
"throughput": "189.25 fps",
"sample_inf_time": "5.28 ms",
"include_io": true,
"framework_type": "trt",
"framework_version": "8.0.1.6",
"inference_hardware": "GPU",
"infery_version": "3.2.2",
"date": "18:30:05__07-06-2022",
"ctime": 1657112405,
"h_to_d_mean": "0.43 ms",
"d_to_h_mean": "0.20 ms",
"h_to_d_variance": "0.00 ms",
"d_to_h_variance": "0.00 ms"
}>

Performance comparison

We can mainly compare the throughput from these results:

              YOLOX_Nano    YOLOX_Nano Jetson Xavier NX Optimized
Throughput    73.36 fps     189.25 fps

It can be seen that the optimized model is roughly 2.6 times faster than the base model.

Resources

Tech Support & Product Discussion

Thank you for choosing our products! We are here to provide you with various kinds of support to ensure that your experience with our products is as smooth as possible. We offer several communication channels to cater to different preferences and needs.
