Local RAG based on Jetson with LlamaIndex


Nowadays, more and more people are using large language models (LLMs) to solve everyday problems. However, LLMs can hallucinate and give users incorrect information when answering certain questions. Retrieval-Augmented Generation (RAG) reduces hallucinations by supplying the model with relevant reference data at query time, so using RAG to curb LLM hallucinations has become a common practice.


Here we introduce RAG based on Jetson, which uses LlamaIndex as the RAG framework, ChromaDB as the vector database, and the MLC LLM quantized Llama2-7b model for question answering. Because this RAG project runs entirely locally, it protects your data privacy and provides a low-latency communication experience.
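The overall flow of a RAG pipeline is: embed your documents, retrieve the passages most similar to the question, and prepend them to the LLM prompt. A minimal, dependency-free sketch of the retrieval step is shown below. It uses bag-of-words term-frequency vectors in place of a real embedding model, and a prompt string in place of the actual Llama2 call; the documents and question are illustrative, not part of this project.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words term-frequency vector.
    A real RAG pipeline would use a neural embedding model instead."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the question."""
    q = embed(question)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

docs = [
    "The reComputer is a Jetson-based edge AI device with up to 16GB of RAM.",
    "ChromaDB is an open-source vector database for storing embeddings.",
]
question = "How much RAM does the reComputer have?"
context = retrieve(question, docs)[0]
# The retrieved context is prepended to the prompt before calling the LLM.
prompt = f"Answer using this context:\n{context}\n\nQuestion: {question}"
print(context)
```

In the real project, LlamaIndex performs this retrieval against ChromaDB and then forwards the augmented prompt to the quantized Llama2-7b model.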

Hardware components

reComputer (based on Jetson with RAM >= 16GB)

Prepare the runtime environment

Step 1: Install MLC Jetson Container

# Install jetson-container and its requirements
git clone --depth=1
cd jetson-containers
pip install -r requirements.txt

Step 2: Install project

# Install RAG project
cd data
git clone

Step 3: Install the Llama2-7b model quantized by MLC LLM

# Install LLM model
sudo apt-get install git-lfs
cd RAG_based_on_Jetson
git clone

Step 4: Run the docker and install requirements

cd ../../
./run.sh $(./autotag mlc)
# Here you will enter the Docker, and the commands below will run inside the Docker
cd data/RAG_based_on_Jetson/
pip install -r requirements.txt
pip install chromadb==0.3.29

When you run `pip install chromadb==0.3.29`, pip may report a dependency error. It is safe to ignore this error.

Let's run it

# Run in the docker

Project Outlook

In this project, TXT and PDF documents were parsed into a vector database, and RAG was used to reduce the model's hallucinations on domain-specific questions. In the future, we plan to use multimodal models to support retrieval over images and videos.
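Parsing documents into a vector database usually starts with splitting the raw text into overlapping chunks, which are then embedded and stored. The following is a minimal, dependency-free sketch of such a chunker; the chunk size and overlap values are illustrative assumptions, not the project's actual settings.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character chunks for embedding.

    Overlap keeps sentences that straddle a chunk boundary
    retrievable from at least one chunk.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

doc = "RAG pipelines embed document chunks and store them in a vector database. " * 10
chunks = chunk_text(doc, chunk_size=200, overlap=50)
# Each chunk would then be embedded and inserted into ChromaDB.
```

In practice, LlamaIndex's document loaders and node parsers handle this splitting (including PDF extraction) before the chunks are written to ChromaDB.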
