Local RAG based on Jetson with LlamaIndex
Introduction
Nowadays, more and more people rely on large language models to solve everyday problems. However, these models can hallucinate and give users incorrect information for certain questions. RAG (retrieval-augmented generation) reduces hallucinations by supplying relevant data to the model, so using RAG to curb hallucinations in large language models has become a growing trend.
Here we introduce RAG based on Jetson, which uses LlamaIndex as the RAG framework, ChromaDB as the vector database, and a Llama2-7b model quantized by MLC LLM as the question-answering model. Because this RAG project runs entirely locally, it protects your data privacy and provides a low-latency communication experience.
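Before diving into the setup, here is a minimal sketch of how such a pipeline can be wired together, assuming the legacy llama_index (pre-0.10) API that matches the chromadb 0.3.x pin used later in this guide. The "docs" collection name and the ./data document folder are placeholder assumptions, and the project's actual RAG.py may be organized differently; note that by default LlamaIndex calls OpenAI for embeddings and generation, whereas this project substitutes local models.
# Illustrative sketch only, not the project's actual RAG.py
import chromadb
from llama_index import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.vector_stores import ChromaVectorStore
# Parse local TXT/PDF files into documents
documents = SimpleDirectoryReader("./data").load_data()
# Store document embeddings in a Chroma collection
chroma_client = chromadb.Client()
collection = chroma_client.create_collection("docs")
vector_store = ChromaVectorStore(chroma_collection=collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
# Build the index and answer a question using retrieved context
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
query_engine = index.as_query_engine()
print(query_engine.query("What do my documents say about Jetson?"))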
Hardware components
| reComputer (based on Jetson with RAM >= 16GB) |
| --- |
Prepare the runtime environment
Step 1: Install MLC Jetson Container
# Install jetson-container and its requirements
git clone --depth=1 https://github.com/dusty-nv/jetson-containers
cd jetson-containers
pip install -r requirements.txt
Step 2: Install the RAG project
# Install RAG project
cd data
git clone https://github.com/Seeed-Projects/RAG_based_on_Jetson.git
Step 3: Install the Llama2-7b model quantized by MLC LLM
# Install LLM model
sudo apt-get install git-lfs
cd RAG_based_on_Jetson
git clone https://huggingface.co/JiahaoLi/llama2-7b-MLC-q4f16-jetson-containers
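The cloned directory contains the Llama2-7b weights pre-quantized to q4f16 with MLC LLM. As a hedged sketch, weights like these can be loaded through MLC's Python chat API inside the container; the import name has changed across MLC releases, and the model path below simply mirrors the cloned folder, so both are assumptions.
# Hedged sketch: depending on the MLC version in the container, the import
# may be `from mlc_chat import ChatModule` or `from mlc_llm import ChatModule`;
# the model path below is an assumption based on the cloned folder name
from mlc_chat import ChatModule
cm = ChatModule(model="./llama2-7b-MLC-q4f16-jetson-containers")
print(cm.generate(prompt="Briefly explain retrieval-augmented generation."))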
Step 4: Run the docker and install requirements
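# Return to the jetson-containers root and start the MLC container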
cd ../../
./run.sh $(./autotag mlc)
# You are now inside the Docker container; run the commands below inside it
cd data/RAG_based_on_Jetson/
pip install -r requirements.txt
pip install chromadb==0.3.29
After you run pip install chromadb==0.3.29, pip may print dependency-conflict errors. These errors are safe to ignore; the pinned version still works for this project.
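To confirm the pinned ChromaDB still works despite those complaints, you can run a quick smoke test inside the container. The collection name and documents below are arbitrary examples, and the default embedding function assumes sentence-transformers is available in the container:
# Smoke test for the pinned chromadb==0.3.29 install
import chromadb
client = chromadb.Client()
collection = client.create_collection("smoke_test")
collection.add(
    documents=["Jetson is an edge AI platform.",
               "RAG grounds LLM answers in retrieved text."],
    ids=["doc1", "doc2"],
)
# The query should return the more similar of the two documents
print(collection.query(query_texts=["What is Jetson?"], n_results=1)["documents"])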
Let's run it
# Run in the docker
python3 RAG.py
Project Outlook
In this project, TXT and PDF documents were parsed into a vector database, and RAG was used to reduce the model's hallucinations on questions about that content. In the future, we will use multimodal models to support retrieval over images and videos.