Deploy DeepSeek on reComputer Jetson with MLC
Introduction
DeepSeek is a cutting-edge AI model suite optimized for efficiency, accuracy, and real-time processing. With advanced optimization for edge computing, DeepSeek enables fast, low-latency AI inference directly on Jetson devices, reducing dependency on cloud computing while maximizing performance.
In a previous wiki, we provided a quick guide to deploying DeepSeek on Jetson. However, although the model was deployed successfully, it did not achieve optimal inference speed.
This wiki provides a step-by-step guide to deploying DeepSeek on reComputer Jetson devices with MLC for efficient AI inference on the edge.
Prerequisites
- A Jetson device with more than 8GB of memory.
- The Jetson device needs to be pre-flashed with JetPack 5.1.1 or a later version of the operating system.
In this wiki, we will accomplish the following tasks using the reComputer J4012 - Edge AI Computer with NVIDIA® Jetson™ Orin™ NX 16GB, but you can also try using other Jetson devices.

Getting Started
Hardware Connection
- Connect the Jetson device to the network, mouse, keyboard, and monitor.
Of course, you can also remotely access the Jetson device via SSH over the local network.
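For example, assuming the Jetson's login username is jetson (both the username and the IP address below are placeholders to replace with your own), you can connect from another computer on the same local network:
# Replace the username and IP address with those of your own Jetson device
ssh jetson@192.168.49.241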
Install and Configure Jetson's Docker
First, we need to follow the tutorial provided by the Jetson AI Lab to install Docker.
step1. Install the nvidia-container package.
sudo apt update
sudo apt install -y nvidia-container
If you flash Jetson Linux (L4T) R36.x (JetPack 6.x) on your Jetson using SDK Manager and then install nvidia-container using apt, note that on JetPack 6.x this package no longer automatically installs Docker.
Therefore, you need to run the following commands to install Docker manually and set it up.
sudo apt update
sudo apt install -y nvidia-container curl
curl https://get.docker.com | sh && sudo systemctl --now enable docker
sudo nvidia-ctk runtime configure --runtime=docker
step2. Restart the Docker service and add your user to the docker group.
sudo systemctl restart docker
sudo usermod -aG docker $USER
newgrp docker
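To confirm that your user can now talk to the Docker daemon without sudo, you can optionally run the following check:
# Should print the (possibly empty) container list without a permission error
docker ps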
step3. Add the default runtime in /etc/docker/daemon.json.
sudo apt install -y jq
sudo jq '. + {"default-runtime": "nvidia"}' /etc/docker/daemon.json | \
sudo tee /etc/docker/daemon.json.tmp && \
sudo mv /etc/docker/daemon.json.tmp /etc/docker/daemon.json
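To confirm the change, you can optionally pretty-print the file; the output should now contain a "default-runtime": "nvidia" entry alongside the existing "runtimes" section:
sudo jq . /etc/docker/daemon.json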
step4. Restart Docker.
sudo systemctl daemon-reload && sudo systemctl restart docker
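If everything went well, Docker should now report nvidia as its default runtime, which you can optionally verify with:
# Expected to show: Default Runtime: nvidia
sudo docker info | grep -i 'default runtime'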
Load and Run DeepSeek
We can refer to the Docker container provided by the Jetson AI Lab to quickly deploy the MLC-quantized DeepSeek model on Jetson.
Open the Jetson AI Lab website and find the deployment command:
Models --> Orin NX --> docker run --> copy
Before copying the installation command, we can modify the relevant parameters on the left side of the page.

Open a terminal window on the Jetson device, paste the installation command we just copied, and press the Enter key to run it.
When we see the following content in the terminal window, it means the DeepSeek model has been successfully loaded on the Jetson device.

At this point, we can open a new terminal window and enter the following command to test whether the model can perform inference correctly.
Please note: do not close the terminal window that is running the DeepSeek model.
curl http://0.0.0.0:9000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer none" \
-d '{
"model": "*",
"messages": [{"role":"user","content":"Why did the LLM cross the road?"}],
"temperature": 0.6,
"top_p": 0.95,
"stream": false,
"max_tokens": 100
}'
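Assuming the server follows the standard OpenAI response schema (a choices array containing a message object), you can also pipe a non-streaming request through jq (installed earlier) to print just the assistant's reply; the prompt here is only an illustration:
# Extract only the generated text from the JSON response
curl -s http://0.0.0.0:9000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer none" \
  -d '{"model":"*","messages":[{"role":"user","content":"Hello"}],"stream":false,"max_tokens":50}' \
  | jq -r '.choices[0].message.content'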

Install Open WebUI
sudo docker run -d --network=host \
-v ${HOME}/open-webui:/app/backend/data \
-e OLLAMA_BASE_URL=http://127.0.0.1:11434 \
--name open-webui \
--restart always \
ghcr.io/open-webui/open-webui:main
After the installation finishes, you can enter http://<ip_of_jetson>:8080 in the browser to open the Web UI.
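If the page does not load, you can check whether the container is running and inspect its logs (the container name open-webui comes from the command above):
sudo docker ps --filter name=open-webui
sudo docker logs -f open-webui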

Then, we need to configure the large language model inference engine for Open WebUI.
User (top right corner) --> Settings --> Admin Settings --> Connections
Change the OpenAI URL to the local MLC inference server where DeepSeek is already loaded.
For example, if the IP address of my Jetson device is 192.168.49.241, my URL should be http://192.168.49.241:9000/v1.
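Before saving, you can optionally confirm from another machine on the network that the server answers at that URL; an OpenAI-compatible server such as this one normally exposes a /v1/models endpoint that lists the loaded model (replace the IP address with your Jetson's):
curl http://192.168.49.241:9000/v1/models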

After saving the configuration, we can create a new chat window to experience the extremely fast inference speed of the local DeepSeek model!

Test Inference Speed
Here, we can use this Python script to roughly test the model's inference speed.
On the Jetson device, create a new Python file named test_inference_speed.py and fill it with the following code.
Then, execute the script by running the command python test_inference_speed.py in the terminal.
test_inference_speed.py
import time
import requests

# Endpoint of the local MLC inference server started earlier
url = "http://0.0.0.0:9000/v1/chat/completions"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer none"
}
data = {
    "model": "*",
    "messages": [{"role": "user", "content": "Why did the LLM cross the road?"}],
    "temperature": 0.6,
    "top_p": 0.95,
    "stream": True,  # stream the response so we can count chunks as they arrive
    "max_tokens": 1000
}

start_time = time.time()
response = requests.post(url, headers=headers, json=data, stream=True)

# Each non-empty streamed line roughly corresponds to one generated token,
# so this gives an approximate tokens-per-second figure.
token_count = 0
for chunk in response.iter_lines():
    if chunk:
        token_count += 1
        print(chunk)

end_time = time.time()
elapsed_time = end_time - start_time
tokens_per_second = token_count / elapsed_time

print(f"Total Tokens: {token_count}")
print(f"Elapsed Time: {elapsed_time:.3f} seconds")
print(f"Tokens per second: {tokens_per_second:.2f} tokens/second")

The results show that the inference speed of the MLC-compiled DeepSeek 1.5B model deployed on the Jetson Orin NX is approximately 60 tokens/s.
Effect Demonstration
In the demonstration video, the Jetson device operates at just under 20W yet achieves an impressive inference speed.
References
- https://www.jetson-ai-lab.com/models.html
- https://www.deepseek.com/
- https://wiki.seeedstudio.com/deploy_deepseek_on_jetson/
- https://www.seeedstudio.com/tag/nvidia.html
Tech Support & Product Discussion
Thank you for choosing our products! We are here to provide you with different support to ensure that your experience with our products is as smooth as possible. We offer several communication channels to cater to different preferences and needs.