llama_ros

llama.cpp (GGUF LLMs) and llava.cpp (GGUF VLMs) for ROS 2

https://github.com/mgonzs13/llama_ros

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.9%) to scientific vocabulary

Keywords

audio cpp embeddings ggml gguf gpt langchain llama llamacpp llava llavacpp llm multimodal rerank reranking ros2 vlm
Last synced: 6 months ago

Repository

llama.cpp (GGUF LLMs) and llava.cpp (GGUF VLMs) for ROS 2

Basic Info
  • Host: GitHub
  • Owner: mgonzs13
  • License: mit
  • Language: C++
  • Default Branch: main
  • Homepage:
  • Size: 13.4 MB
Statistics
  • Stars: 223
  • Watchers: 4
  • Forks: 40
  • Open Issues: 1
  • Releases: 100
Topics
audio cpp embeddings ggml gguf gpt langchain llama llamacpp llava llavacpp llm multimodal rerank reranking ros2 vlm
Created almost 3 years ago · Last pushed 6 months ago
Metadata Files
Readme License Citation

README.md

llama_ros

This repository provides a set of ROS 2 packages to integrate llama.cpp into ROS 2. Using the llama_ros packages, you can easily incorporate the powerful optimization capabilities of llama.cpp into your ROS 2 projects by running GGUF-based LLMs and VLMs. You can also use features from llama.cpp such as GBNF grammars and modify LoRAs in real-time.

[![License: MIT](https://img.shields.io/badge/GitHub-MIT-informational)](https://opensource.org/license/mit) [![GitHub release](https://img.shields.io/github/release/mgonzs13/llama_ros.svg)](https://github.com/mgonzs13/llama_ros/releases) [![Code Size](https://img.shields.io/github/languages/code-size/mgonzs13/llama_ros.svg?branch=main)](https://github.com/mgonzs13/llama_ros?branch=main) [![Last Commit](https://img.shields.io/github/last-commit/mgonzs13/llama_ros.svg)](https://github.com/mgonzs13/llama_ros/commits/main) [![GitHub issues](https://img.shields.io/github/issues/mgonzs13/llama_ros)](https://github.com/mgonzs13/llama_ros/issues) [![GitHub pull requests](https://img.shields.io/github/issues-pr/mgonzs13/llama_ros)](https://github.com/mgonzs13/llama_ros/pulls) [![Contributors](https://img.shields.io/github/contributors/mgonzs13/llama_ros.svg)](https://github.com/mgonzs13/llama_ros/graphs/contributors) [![Python Formatter Check](https://github.com/mgonzs13/llama_ros/actions/workflows/python-formatter.yml/badge.svg?branch=main)](https://github.com/mgonzs13/llama_ros/actions/workflows/python-formatter.yml?branch=main) [![C++ Formatter Check](https://github.com/mgonzs13/llama_ros/actions/workflows/cpp-formatter.yml/badge.svg?branch=main)](https://github.com/mgonzs13/llama_ros/actions/workflows/cpp-formatter.yml?branch=main)

| ROS 2 Distro | Branch | Build status | Docker Image | Documentation |
| :----------: | :----: | :----------: | :----------: | :-----------: |
| **Humble** | [`main`](https://github.com/mgonzs13/llama_ros/tree/main) | [Humble Build](https://github.com/mgonzs13/llama_ros/actions/workflows/humble-docker-build.yml?branch=main) | [humble](https://hub.docker.com/r/mgons/llama_ros/tags?name=humble) | [Doxygen](https://mgonzs13.github.io/llama_ros/latest) |
| **Iron** | [`main`](https://github.com/mgonzs13/llama_ros/tree/main) | [Iron Build](https://github.com/mgonzs13/llama_ros/actions/workflows/iron-docker-build.yml?branch=main) | [iron](https://hub.docker.com/r/mgons/llama_ros/tags?name=iron) | [Doxygen](https://mgonzs13.github.io/llama_ros/latest) |
| **Jazzy** | [`main`](https://github.com/mgonzs13/llama_ros/tree/main) | [Jazzy Build](https://github.com/mgonzs13/llama_ros/actions/workflows/jazzy-docker-build.yml?branch=main) | [jazzy](https://hub.docker.com/r/mgons/llama_ros/tags?name=jazzy) | [Doxygen](https://mgonzs13.github.io/llama_ros/latest) |
| **Kilted** | [`main`](https://github.com/mgonzs13/llama_ros/tree/main) | [Kilted Build](https://github.com/mgonzs13/llama_ros/actions/workflows/kilted-docker-build.yml?branch=main) | [kilted](https://hub.docker.com/r/mgons/llama_ros/tags?name=kilted) | [Doxygen](https://mgonzs13.github.io/llama_ros/latest) |
| **Rolling** | [`main`](https://github.com/mgonzs13/llama_ros/tree/main) | [Rolling Build](https://github.com/mgonzs13/llama_ros/actions/workflows/rolling-docker-build.yml?branch=main) | [rolling](https://hub.docker.com/r/mgons/llama_ros/tags?name=rolling) | [Doxygen](https://mgonzs13.github.io/llama_ros/latest) |

Table of Contents

  1. Related Projects
  2. Installation
  3. Docker
  4. Usage
  5. Demos

Related Projects

  • chatbot_ros → A chatbot, integrated into ROS 2, that uses whisper_ros to listen to people's speech and llama_ros to generate responses. The chatbot is controlled by a state machine created with YASMIN.
  • explainable_ros → A ROS 2 tool to explain the behavior of a robot. Using the LangChain integration, logs are stored in a vector database; RAG is then applied to retrieve relevant logs for user questions, which are answered with llama_ros.

Installation

To run llama_ros with CUDA, you must first install the CUDA Toolkit. Then you can compile llama_ros with --cmake-args -DGGML_CUDA=ON to enable CUDA support.

```shell
cd ~/ros2_ws/src
git clone https://github.com/mgonzs13/llama_ros.git
pip3 install -r llama_ros/requirements.txt
cd ~/ros2_ws
rosdep install --from-paths src --ignore-src -r -y
colcon build --cmake-args -DGGML_CUDA=ON # add this for CUDA
```

Docker

Build the llama_ros Docker image or download an image from DockerHub. You can choose whether to build llama_ros with CUDA (USE_CUDA) and select the CUDA version (CUDA_VERSION). Remember that you have to use DOCKER_BUILDKIT=0 to compile llama_ros with CUDA when building the image.

```shell
DOCKER_BUILDKIT=0 docker build -t llama_ros --build-arg USE_CUDA=1 --build-arg CUDA_VERSION=12-6 .
```

Run the Docker container. If you want to use CUDA, you have to install the NVIDIA Container Toolkit and add --gpus all.

```shell
docker run -it --rm --gpus all llama_ros
```

Usage

llama_cli

llama_ros includes commands to speed up testing GGUF-based LLMs within the ROS 2 ecosystem. The following commands are integrated into the ROS 2 command-line tools:

launch

This command launches an LLM from a YAML file. The YAML configuration is used to launch the LLM in the same way as a regular launch file. Here is an example of how to use it:

```shell
ros2 llama launch ~/ros2_ws/src/llama_ros/llama_bringup/models/StableLM-Zephyr.yaml
```

prompt

This command sends a prompt to a launched LLM. The command takes a string (the prompt) and accepts the following arguments:

  • (-r, --reset): Whether to reset the LLM before prompting
  • (-t, --temp): The temperature value
  • (--image-url): Image URL to send to a VLM (a sketch is shown after the example below)

Here is an example of how to use it:

```shell
ros2 llama prompt "Do you know ROS 2?" -t 0.0
```
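When prompting a VLM, an image can be attached with the --image-url argument described above. A minimal sketch, assuming a VLM has already been launched and using a placeholder URL:

```shell
ros2 llama prompt "Describe this image" -t 0.0 --image-url "https://example.com/image.jpg"
```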

Launch Files

First of all, you need to create a launch file to use llama_ros or llava_ros. This launch file will contain the main parameters to download the model from HuggingFace and configure it. Take a look at the following examples and the predefined launch files.

llama_ros (Python Launch)

Click to expand

```python
from launch import LaunchDescription
from llama_bringup.utils import create_llama_launch


def generate_launch_description():
    return LaunchDescription([
        create_llama_launch(
            n_ctx=2048,  # context of the LLM in tokens
            n_batch=8,  # batch size in tokens
            n_gpu_layers=0,  # layers to load in GPU
            n_threads=1,  # threads
            n_predict=2048,  # max tokens, -1 == inf
            model_repo="TheBloke/Marcoroni-7B-v3-GGUF",  # Hugging Face repo
            model_filename="marcoroni-7b-v3.Q4_K_M.gguf",  # model file in repo
            system_prompt_type="Alpaca"  # system prompt type
        )
    ])
```

```shell
ros2 launch llama_bringup marcoroni.launch.py
```

llama_ros (YAML Config)

Click to expand

```yaml
n_ctx: 2048 # context of the LLM in tokens
n_batch: 8 # batch size in tokens
n_gpu_layers: 0 # layers to load in GPU
n_threads: 1 # threads
n_predict: 2048 # max tokens, -1 == inf

model_repo: "cstr/Spaetzle-v60-7b-GGUF" # Hugging Face repo
model_filename: "Spaetzle-v60-7b-q4-k-m.gguf" # model file in repo
system_prompt_type: "Alpaca" # system prompt type
```

```python
import os
from launch import LaunchDescription
from llama_bringup.utils import create_llama_launch_from_yaml
from ament_index_python.packages import get_package_share_directory


def generate_launch_description():
    return LaunchDescription([
        create_llama_launch_from_yaml(os.path.join(
            get_package_share_directory("llama_bringup"), "models", "Spaetzle.yaml"))
    ])
```

```shell
ros2 launch llama_bringup spaetzle.launch.py
```

llama_ros (YAML Config + model shards)

Click to expand

```yaml
n_ctx: 2048 # context of the LLM in tokens
n_batch: 8 # batch size in tokens
n_gpu_layers: 0 # layers to load in GPU
n_threads: 1 # threads
n_predict: 2048 # max tokens, -1 == inf

model_repo: "Qwen/Qwen2.5-Coder-7B-Instruct-GGUF" # Hugging Face repo
model_filename: "qwen2.5-coder-7b-instruct-q4_k_m-00001-of-00002.gguf" # model shard file in repo
system_prompt_type: "ChatML" # system prompt type
```

```shell
ros2 llama launch Qwen2.yaml
```

llava_ros (Python Launch)

Click to expand

```python
from launch import LaunchDescription
from llama_bringup.utils import create_llama_launch


def generate_launch_description():
    return LaunchDescription([
        create_llama_launch(
            use_llava=True,  # enable llava
            n_ctx=8192,  # context of the LLM in tokens, use a huge context size to load images
            n_batch=512,  # batch size in tokens
            n_gpu_layers=33,  # layers to load in GPU
            n_threads=1,  # threads
            n_predict=8192,  # max tokens, -1 == inf
            model_repo="cjpais/llava-1.6-mistral-7b-gguf",  # Hugging Face repo
            model_filename="llava-v1.6-mistral-7b.Q4_K_M.gguf",  # model file in repo
            mmproj_repo="cjpais/llava-1.6-mistral-7b-gguf",  # Hugging Face repo
            mmproj_filename="mmproj-model-f16.gguf",  # mmproj file in repo
            system_prompt_type="Mistral"  # system prompt type
        )
    ])
```

```shell
ros2 launch llama_bringup llava.launch.py
```

llava_ros (YAML Config)

Click to expand

```yaml
use_llava: True # enable llava

n_ctx: 8192 # context of the LLM in tokens, use a huge context size to load images
n_batch: 512 # batch size in tokens
n_gpu_layers: 33 # layers to load in GPU
n_threads: 1 # threads
n_predict: 8192 # max tokens, -1 == inf

model_repo: "cjpais/llava-1.6-mistral-7b-gguf" # Hugging Face repo
model_filename: "llava-v1.6-mistral-7b.Q4_K_M.gguf" # model file in repo
mmproj_repo: "cjpais/llava-1.6-mistral-7b-gguf" # Hugging Face repo
mmproj_filename: "mmproj-model-f16.gguf" # mmproj file in repo
system_prompt_type: "mistral" # system prompt type
```

```python
import os
from launch import LaunchDescription
from llama_bringup.utils import create_llama_launch_from_yaml
from ament_index_python.packages import get_package_share_directory


def generate_launch_description():
    return LaunchDescription([
        create_llama_launch_from_yaml(os.path.join(
            get_package_share_directory("llama_bringup"), "models", "llava-1.6-mistral-7b-gguf.yaml"))
    ])
```

```shell
ros2 launch llama_bringup llava.launch.py
```

llava_ros (Audio)

Click to expand

```yaml
use_llava: True

n_ctx: 8192
n_batch: 512
n_gpu_layers: 29
n_threads: -1
n_predict: 8192

model_repo: "mradermacher/Qwen2-Audio-7B-Instruct-GGUF"
model_filename: "Qwen2-Audio-7B-Instruct.Q4_K_M.gguf"
mmproj_repo: "mradermacher/Qwen2-Audio-7B-Instruct-GGUF"
mmproj_filename: "Qwen2-Audio-7B-Instruct.mmproj-f16.gguf"
system_prompt_type: "ChatML"
```

```python
import os
from launch import LaunchDescription
from llama_bringup.utils import create_llama_launch_from_yaml
from ament_index_python.packages import get_package_share_directory


def generate_launch_description():
    return LaunchDescription([
        create_llama_launch_from_yaml(os.path.join(
            get_package_share_directory("llama_bringup"), "models", "Qwen2-Audio.yaml"))
    ])
```

```shell
ros2 launch llama_bringup llava.launch.py
```

LoRA Adapters

You can use LoRA adapters when launching LLMs. Using llama.cpp features, you can load multiple adapters and choose the scale to apply to each one. Here is an example of using LoRA adapters with Phi-3. You can list the LoRAs using the /llama/list_loras service and modify their scale values with the /llama/update_loras service; a sketch of calling these services from a node follows the YAML example below. A scale value of 0.0 means that LoRA is not used.

Click to expand

```yaml
n_ctx: 2048
n_batch: 8
n_gpu_layers: 0
n_threads: 1
n_predict: 2048

model_repo: "bartowski/Phi-3.5-mini-instruct-GGUF"
model_filename: "Phi-3.5-mini-instruct-Q4_K_M.gguf"

lora_adapters:
  - repo: "zhhan/adapter-Phi-3-mini-4k-instruct_code_writing"
    filename: "Phi-3-mini-4k-instruct-adaptor-f16-code_writer.gguf"
    scale: 0.5
  - repo: "zhhan/adapter-Phi-3-mini-4k-instruct_summarization"
    filename: "Phi-3-mini-4k-instruct-adaptor-f16-summarization.gguf"
    scale: 0.5

system_prompt_type: "Phi-3"
```
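The following is a minimal sketch of calling the LoRA services from a ROS 2 node, written in the style of the client examples below. The service type names (ListLoRAs, UpdateLoRAs) and their fields (loras, scale) are assumptions, not confirmed by this README; check llama_msgs for the exact service definitions.

```python
from rclpy.node import Node

# NOTE: the service types and field names below are assumptions, not confirmed
# by this README; check llama_msgs for the exact .srv definitions
from llama_msgs.srv import ListLoRAs, UpdateLoRAs


class LoRAExampleNode(Node):

    def __init__(self) -> None:
        super().__init__("lora_example_node")

        # create clients for the LoRA services
        self.list_client = self.create_client(ListLoRAs, "/llama/list_loras")
        self.update_client = self.create_client(UpdateLoRAs, "/llama/update_loras")

        # list the currently loaded LoRAs
        self.list_client.wait_for_service()
        loras = self.list_client.call(ListLoRAs.Request()).loras

        # disable the first LoRA by setting its scale to 0.0
        if loras:
            loras[0].scale = 0.0
            req = UpdateLoRAs.Request()
            req.loras = [loras[0]]
            self.update_client.wait_for_service()
            self.update_client.call(req)
```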

ROS 2 Clients

Both llama_ros and llava_ros provide ROS 2 interfaces to access the main functionalities of the models. Here are some examples of how to use them inside ROS 2 nodes. Moreover, take a look at the llama_demo_node.py and llava_demo_node.py demos.

Tokenize

Click to expand

```python
from rclpy.node import Node
from llama_msgs.srv import Tokenize


class ExampleNode(Node):

    def __init__(self) -> None:
        super().__init__("example_node")

        # create the client
        self.srv_client = self.create_client(Tokenize, "/llama/tokenize")

        # create the request
        req = Tokenize.Request()
        req.text = "Example text"

        # call the tokenize service
        self.srv_client.wait_for_service()
        tokens = self.srv_client.call(req).tokens
```

Detokenize

Click to expand

```python
from rclpy.node import Node
from llama_msgs.srv import Detokenize


class ExampleNode(Node):

    def __init__(self) -> None:
        super().__init__("example_node")

        # create the client
        self.srv_client = self.create_client(Detokenize, "/llama/detokenize")

        # create the request
        req = Detokenize.Request()
        req.tokens = [123, 123]

        # call the detokenize service
        self.srv_client.wait_for_service()
        text = self.srv_client.call(req).text
```

Embeddings

Click to expand

_Remember to launch llama_ros with embedding set to true to be able to generate embeddings with your LLM._

```python
from rclpy.node import Node
from llama_msgs.srv import Embeddings


class ExampleNode(Node):

    def __init__(self) -> None:
        super().__init__("example_node")

        # create the client
        self.srv_client = self.create_client(Embeddings, "/llama/generate_embeddings")

        # create the request
        req = Embeddings.Request()
        req.prompt = "Example text"
        req.normalize = True

        # call the embedding service
        self.srv_client.wait_for_service()
        embeddings = self.srv_client.call(req).embeddings
```

Generate Response

Click to expand

```python
import rclpy
from rclpy.node import Node
from rclpy.action import ActionClient
from llama_msgs.action import GenerateResponse


class ExampleNode(Node):

    def __init__(self) -> None:
        super().__init__("example_node")

        # create the client
        self.action_client = ActionClient(
            self, GenerateResponse, "/llama/generate_response")

        # create the goal and set the sampling config
        goal = GenerateResponse.Goal()
        goal.prompt = self.prompt
        goal.sampling_config.temp = 0.2

        # wait for the server and send the goal
        self.action_client.wait_for_server()
        send_goal_future = self.action_client.send_goal_async(goal)

        # wait for the server
        rclpy.spin_until_future_complete(self, send_goal_future)
        get_result_future = send_goal_future.result().get_result_async()

        # wait again and take the result
        rclpy.spin_until_future_complete(self, get_result_future)
        result: GenerateResponse.Result = get_result_future.result().result
```

Generate Response (llava)

Click to expand

```python
import cv2
from cv_bridge import CvBridge

import rclpy
from rclpy.node import Node
from rclpy.action import ActionClient
from llama_msgs.action import GenerateResponse


class ExampleNode(Node):

    def __init__(self) -> None:
        super().__init__("example_node")

        # create a cv bridge for the image
        self.cv_bridge = CvBridge()

        # create the client
        self.action_client = ActionClient(
            self, GenerateResponse, "/llama/generate_response")

        # create the goal and set the sampling config
        goal = GenerateResponse.Goal()
        goal.prompt = self.prompt
        goal.sampling_config.temp = 0.2

        # add your image to the goal
        image = cv2.imread("/path/to/your/image", cv2.IMREAD_COLOR)
        goal.images.append(self.cv_bridge.cv2_to_imgmsg(image))

        # wait for the server and send the goal
        self.action_client.wait_for_server()
        send_goal_future = self.action_client.send_goal_async(goal)

        # wait for the server
        rclpy.spin_until_future_complete(self, send_goal_future)
        get_result_future = send_goal_future.result().get_result_async()

        # wait again and take the result
        rclpy.spin_until_future_complete(self, get_result_future)
        result: GenerateResponse.Result = get_result_future.result().result
```

LangChain

There is a llama_ros integration for LangChain, so prompt engineering techniques can be applied. Here are some examples of how to use it.

llama_ros (Chain)

Click to expand

```python
import rclpy
from llama_ros.langchain import LlamaROS
from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser

rclpy.init()

# create the llama_ros llm for langchain
llm = LlamaROS()

# create a prompt template
prompt_template = "tell me a joke about {topic}"
prompt = PromptTemplate(
    input_variables=["topic"],
    template=prompt_template
)

# create a chain with the llm and the prompt template
chain = prompt | llm | StrOutputParser()

# run the chain
text = chain.invoke({"topic": "bears"})
print(text)

rclpy.shutdown()
```

llama_ros (Stream)

Click to expand

```python
import rclpy
from llama_ros.langchain import LlamaROS
from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser

rclpy.init()

# create the llama_ros llm for langchain
llm = LlamaROS()

# create a prompt template
prompt_template = "tell me a joke about {topic}"
prompt = PromptTemplate(
    input_variables=["topic"],
    template=prompt_template
)

# create a chain with the llm and the prompt template
chain = prompt | llm | StrOutputParser()

# run the chain
for c in chain.stream({"topic": "bears"}):
    print(c, flush=True, end="")

rclpy.shutdown()
```

llava_ros

Click to expand

```python
import rclpy
from llama_ros.langchain import LlamaROS

rclpy.init()

# create the llama_ros llm for langchain
llm = LlamaROS()

# bind the image_url
image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
llm = llm.bind(image_url=image_url).stream("Describe the image")

# run the llm
for c in llm:
    print(c, flush=True, end="")

rclpy.shutdown()
```

llama_ros_embeddings (RAG)

Click to expand

```python
import rclpy
from langchain_chroma import Chroma
from llama_ros.langchain import LlamaROSEmbeddings

rclpy.init()

# create the llama_ros embeddings for langchain
embeddings = LlamaROSEmbeddings()

# create a vector database and assign it
db = Chroma(embedding_function=embeddings)

# create the retriever
retriever = db.as_retriever(search_kwargs={"k": 5})

# add your texts
db.add_texts(texts=["your_texts"])

# retrieve documents
documents = retriever.invoke("your_query")
print(documents)

rclpy.shutdown()
```

llama_ros (Reranker)

Click to expand

```python
import rclpy
from llama_ros.langchain import LlamaROSReranker
from llama_ros.langchain import LlamaROSEmbeddings

from langchain_community.vectorstores import FAISS
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.retrievers import ContextualCompressionRetriever

rclpy.init()

# load the documents
documents = TextLoader("../state_of_the_union.txt").load()
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500, chunk_overlap=100)
texts = text_splitter.split_documents(documents)

# create the llama_ros embeddings
embeddings = LlamaROSEmbeddings()

# create the VD and the retriever
retriever = FAISS.from_documents(
    texts, embeddings).as_retriever(search_kwargs={"k": 20})

# create the compressor using the llama_ros reranker
compressor = LlamaROSReranker()
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)

# retrieve the documents
compressed_docs = compression_retriever.invoke(
    "What did the president say about Ketanji Jackson Brown"
)

for doc in compressed_docs:
    print("-" * 50)
    print(doc.page_content)
    print("\n")

rclpy.shutdown()
```

llama_ros (LLM + RAG + Reranker)

Click to expand

```python
import bs4
import rclpy

from langchain_chroma import Chroma
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_core.messages import SystemMessage
from langchain_core.prompts import ChatPromptTemplate, HumanMessagePromptTemplate
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.retrievers import ContextualCompressionRetriever

from llama_ros.langchain import ChatLlamaROS, LlamaROSEmbeddings, LlamaROSReranker

rclpy.init()

# load, chunk and index the contents of the blog
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header"))
    ),
)

docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
vectorstore = Chroma.from_documents(
    documents=splits, embedding=LlamaROSEmbeddings())

# retrieve and generate using the relevant snippets of the blog
retriever = vectorstore.as_retriever(search_kwargs={"k": 20})

# create prompt
prompt = ChatPromptTemplate.from_messages(
    [
        SystemMessage("You are an AI assistant that answers questions briefly."),
        HumanMessagePromptTemplate.from_template(
            "Taking into account the following information:{context}\n\n{question}"
        ),
    ]
)

# create rerank compression retriever
compressor = LlamaROSReranker(top_n=3)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)


def format_docs(docs):
    formatted_docs = ""

    for d in docs:
        formatted_docs += f"\n\n\t- {d.page_content}"

    return formatted_docs


# create and use the chain
rag_chain = (
    {"context": compression_retriever | format_docs,
        "question": RunnablePassthrough()}
    | prompt
    | ChatLlamaROS(temp=0.0)
    | StrOutputParser()
)

for c in rag_chain.stream("What is Task Decomposition?"):
    print(c, flush=True, end="")

rclpy.shutdown()
```

chat_llama_ros (Chat + VLM)

Click to expand

```python
import rclpy
from llama_ros.langchain import ChatLlamaROS
from langchain_core.messages import SystemMessage
from langchain_core.prompts import ChatPromptTemplate, HumanMessagePromptTemplate
from langchain_core.output_parsers import StrOutputParser

rclpy.init()

# create chat
chat = ChatLlamaROS(
    temp=0.2,
    penalty_last_n=8
)

# create prompt template with messages
prompt = ChatPromptTemplate.from_messages([
    SystemMessage("You are an AI that just answers with a single word."),
    HumanMessagePromptTemplate.from_template(template=[
        {"type": "text", "text": "<__media__>Who is the character in the middle of the image?"},
        {"type": "image_url", "image_url": "{image_url}"}
    ])
])

# create the chain
chain = prompt | chat | StrOutputParser()

# stream and print the LLM output
for text in chain.stream({"image_url": "https://pics.filmaffinity.com/Dragon_Ball_Bola_de_Dragaon_Serie_de_TV-973171538-large.jpg"}):
    print(text, end="", flush=True)
print("", end="\n", flush=True)

rclpy.shutdown()
```

chat_llama_ros (Chat + Audio)

Click to expand

```python
import sys
import time

import rclpy
from langchain_core.messages import SystemMessage
from langchain_core.prompts import ChatPromptTemplate, HumanMessagePromptTemplate
from langchain_core.output_parsers import StrOutputParser
from llama_ros.langchain import ChatLlamaROS


def main():

    if len(sys.argv) < 2:
        prompt = "What's that sound?"
    else:
        prompt = " ".join(sys.argv[1:])

    tokens = 0
    initial_time = -1
    eval_time = -1

    rclpy.init()
    chat = ChatLlamaROS(temp=0.0)

    prompt = ChatPromptTemplate.from_messages(
        [
            SystemMessage("You are an AI that answers questions."),
            HumanMessagePromptTemplate.from_template(
                template=[
                    {"type": "text", "text": f"<__media__>{prompt}"},
                    {"type": "image_url", "image_url": "{audio_url}"},
                ]
            ),
        ]
    )

    chain = prompt | chat | StrOutputParser()

    initial_time = time.time()
    for text in chain.stream(
        {
            "audio_url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen2-Audio/audio/glass-breaking-151256.mp3"
        }
    ):
        tokens += 1
        print(text, end="", flush=True)
        if eval_time < 0:
            eval_time = time.time()

    print("", end="\n", flush=True)

    end_time = time.time()
    print(f"Time to eval: {eval_time - initial_time} s")
    print(f"Prediction speed: {tokens / (end_time - eval_time)} t/s")

    rclpy.shutdown()


if __name__ == "__main__":
    main()
```

chat_llama_ros (Structured Output)

Click to expand

```python
from typing import Optional

import rclpy
from langchain_core.prompts import ChatPromptTemplate, HumanMessagePromptTemplate
from llama_ros.langchain import ChatLlamaROS
from pydantic import BaseModel, Field

rclpy.init()


class Joke(BaseModel):
    """Joke to tell user."""

    setup: str = Field(description="The setup of the joke")
    punchline: str = Field(description="The punchline to the joke")
    rating: Optional[int] = Field(
        default=None, description="How funny the joke is, from 1 to 10"
    )


chat = ChatLlamaROS(temp=0.6, penalty_last_n=8)

structured_chat = chat.with_structured_output(Joke, method="function_calling")

prompt = ChatPromptTemplate.from_messages(
    [
        HumanMessagePromptTemplate.from_template(
            template=[
                {"type": "text", "text": "{prompt}"},
            ]
        ),
    ]
)

chain = prompt | structured_chat

res = chain.invoke({"prompt": "Tell me a joke about cats"})
print(f"Response: {res}")

rclpy.shutdown()
```

chat_llama_ros (Tools)

Click to expand

The current implementation of Tools allows executing tools without requiring a model trained for that task.

```python
from random import randint

import rclpy
from langchain.tools import tool
from langchain_core.messages import HumanMessage
from llama_ros.langchain import ChatLlamaROS

rclpy.init()


@tool
def get_inhabitants(city: str) -> int:
    """Get the number of inhabitants of a city"""
    return randint(4_000_000, 8_000_000)


@tool
def get_curr_temperature(city: str) -> int:
    """Get the current temperature of a city"""
    return randint(20, 30)


chat = ChatLlamaROS(temp=0.6, penalty_last_n=8)

messages = [
    HumanMessage(
        "What is the current temperature in Madrid? And its inhabitants?"
    )
]

llm_tools = chat.bind_tools(
    [get_inhabitants, get_curr_temperature], tool_choice='any'
)

all_tools_res = llm_tools.invoke(messages)
messages.append(all_tools_res)

for tool in all_tools_res.tool_calls:
    selected_tool = {
        "get_inhabitants": get_inhabitants,
        "get_curr_temperature": get_curr_temperature
    }[tool['name']]

    tool_msg = selected_tool.invoke(tool)

    formatted_output = f"{tool['name']}({''.join(tool['args'].values())}) = {tool_msg.content}"

    tool_msg.additional_kwargs = {'args': tool['args']}
    messages.append(tool_msg)

res = llm_tools.invoke(messages)
print(f"Response: {res.content}")

rclpy.shutdown()
```

chat_llama_ros (Reasoning)

Click to expand

A reasoning model is required, such as DeepSeek-R1.

```python
import rclpy
from langchain_core.messages import HumanMessage
from llama_ros.langchain import ChatLlamaROS

rclpy.init()

chat = ChatLlamaROS(temp=0.6, penalty_last_n=8)

messages = [
    HumanMessage(
        "Here we have a book, a laptop, 9 eggs and a nail. Please tell me how to stack them onto each other in a stable manner."
    )
]

res = chat.invoke(messages)
print(f"Response: {res.content.strip()}")
print(f"Reasoning: {res.additional_kwargs['reasoning_content']}")

rclpy.shutdown()
```

chat_llama_ros (LangGraph)

Click to expand

```python
from random import randint

import rclpy
from langchain.tools import tool
from langchain_core.messages import HumanMessage
from langgraph.prebuilt import create_react_agent
from llama_ros.langchain import ChatLlamaROS

rclpy.init()


@tool
def get_inhabitants(city: str) -> int:
    """Get the number of inhabitants of a city"""
    return randint(4_000_000, 8_000_000)


@tool
def get_curr_temperature(city: str) -> int:
    """Get the current temperature of a city"""
    return randint(20, 30)


chat = ChatLlamaROS(temp=0.0)

agent_executor = create_react_agent(
    chat, [get_inhabitants, get_curr_temperature]
)

response = agent_executor.invoke(
    {
        "messages": [
            HumanMessage(
                content="What is the current temperature in Madrid? And its inhabitants?"
            )
        ]
    }
)

print(f"Response: {response['messages'][-1].content}")

rclpy.shutdown()
```

Demos

LLM Demo

```shell
ros2 launch llama_bringup spaetzle.launch.py
```

```shell
ros2 run llama_demos llama_demo_node
```

https://github.com/mgonzs13/llama_ros/assets/25979134/9311761b-d900-4e58-b9f8-11c8efefdac4

Embeddings Generation Demo

```shell
ros2 llama launch ~/ros2_ws/src/llama_ros/llama_bringup/models/bge-base-en-v1.5.yaml
```

```shell
ros2 run llama_demos llama_embeddings_demo_node
```

https://github.com/user-attachments/assets/7d722017-27dc-417c-ace7-bf6b747e4ced

Reranking Demo

```shell
ros2 llama launch ~/ros2_ws/src/llama_ros/llama_bringup/models/jina-reranker.yaml
```

```shell
ros2 run llama_demos llama_rerank_demo_node
```

https://github.com/user-attachments/assets/4b4adb4d-7c70-43ea-a2c1-9be57d211484

VLM Demo

```shell
ros2 launch llama_bringup minicpm-2.6.launch.py
```

```shell
ros2 run llama_demos llava_demo_node --ros-args -p prompt:="your prompt" -p image_url:="url of the image" -p use_image:="whether to send the image"
```

https://github.com/mgonzs13/llama_ros/assets/25979134/4a9ef92f-9099-41b4-8350-765336e3503c

Chat Template Demo

```shell
ros2 llama launch MiniCPM-2.6.yaml
```

Click to expand MiniCPM-2.6.yaml

```yaml
use_llava: True

n_ctx: 8192
n_batch: 512
n_gpu_layers: 20
n_threads: -1
n_predict: 8192

model_repo: "openbmb/MiniCPM-V-2_6-gguf"
model_filename: "ggml-model-Q4_K_M.gguf"
mmproj_repo: "openbmb/MiniCPM-V-2_6-gguf"
mmproj_filename: "mmproj-model-f16.gguf"
```

```shell
ros2 run llama_demos chatllama_demo_node
```

ChatLlamaROS demo

Chat Structured Output Demo

```shell
ros2 llama launch Qwen2.yaml
```

```shell
ros2 run llama_demos chatllama_structured_demo_node
```

Structured Output ChatLlama

Chat Tools Demo

```shell
ros2 llama launch Qwen2.yaml
```

```shell
ros2 run llama_demos chatllama_tools_demo_node
```

Tools ChatLlama

Chat Reasoning Demo (DeepSeek-R1)

```shell
ros2 llama launch DeepSeek-R1.yaml
```

```shell
ros2 run llama_demos chatllama_reasoning_demo_node
```

DeepSeekR1 ChatLlama

LangGraph Demo

```shell
ros2 llama launch Qwen2.yaml
```

Click to expand Qwen2.yaml

```yaml
n_ctx: 4096
n_batch: 256
n_gpu_layers: 29
n_threads: -1
n_predict: -1

model_repo: "Qwen/Qwen2.5-Coder-7B-Instruct-GGUF"
model_filename: "qwen2.5-coder-7b-instruct-q4_k_m-00001-of-00002.gguf"
```

```shell
ros2 run llama_demos chatllama_langgraph_demo_node
```

Langgraph ChatLlama

RAG Demo (LLM + chat template + RAG + Reranking + Stream)

```shell
ros2 llama launch ~/ros2_ws/src/llama_ros/llama_bringup/models/bge-base-en-v1.5.yaml
```

```shell
ros2 llama launch ~/ros2_ws/src/llama_ros/llama_bringup/models/jina-reranker.yaml
```

```shell
ros2 llama launch Qwen2.yaml
```

Click to expand Qwen2.yaml

```yaml
n_ctx: 4096
n_batch: 256
n_gpu_layers: 29
n_threads: -1
n_predict: -1

model_repo: "Qwen/Qwen2.5-Coder-3B-Instruct-GGUF"
model_filename: "qwen2.5-coder-3b-instruct-q4_k_m.gguf"

stopping_words: ["<|im_end|>"]
```

```shell
ros2 run llama_demos llama_rag_demo_node
```

https://github.com/user-attachments/assets/b4e3957d-1f92-427b-a1a8-cfc76737c0d6

Owner

  • Name: Miguel Ángel González Santamarta
  • Login: mgonzs13
  • Kind: user
  • Location: León
  • Company: University of León

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "González-Santamarta"
    given-names: "Miguel Á."
title: "llama_ros"
date-released: 2023-04-03
url: "https://github.com/mgonzs13/llama_ros"

GitHub Events

Total
  • Create event: 44
  • Issues event: 6
  • Release event: 32
  • Watch event: 69
  • Delete event: 20
  • Issue comment event: 25
  • Push event: 281
  • Pull request review comment event: 1
  • Pull request review event: 3
  • Pull request event: 30
  • Fork event: 11
Last Year
  • Create event: 44
  • Issues event: 6
  • Release event: 32
  • Watch event: 69
  • Delete event: 20
  • Issue comment event: 25
  • Push event: 281
  • Pull request review comment event: 1
  • Pull request review event: 3
  • Pull request event: 30
  • Fork event: 11

Committers

Last synced: 6 months ago

All Time
  • Total Commits: 877
  • Total Committers: 6
  • Avg Commits per committer: 146.167
  • Development Distribution Score (DDS): 0.017
Past Year
  • Commits: 389
  • Committers: 4
  • Avg Commits per committer: 97.25
  • Development Distribution Score (DDS): 0.033
Top Committers
  • Miguel Ángel González Santamarta (m****s@u****s): 862 commits
  • Alejandro González (5****4): 11 commits
  • smellslikeml (s****l): 1 commit
  • b0rh (b****h): 1 commit
  • Zahi Kakish (z****h@g****m): 1 commit
  • Alberto Tudela (a****a@g****m): 1 commit

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 9
  • Total pull requests: 42
  • Average time to close issues: 23 days
  • Average time to close pull requests: 2 days
  • Total issue authors: 8
  • Total pull request authors: 6
  • Average comments per issue: 1.0
  • Average comments per pull request: 0.81
  • Merged pull requests: 35
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 5
  • Pull requests: 30
  • Average time to close issues: about 15 hours
  • Average time to close pull requests: 3 days
  • Issue authors: 5
  • Pull request authors: 4
  • Average comments per issue: 1.4
  • Average comments per pull request: 1.07
  • Merged pull requests: 25
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • kyle-redyeti (2)
  • racket405 (1)
  • walkhan (1)
  • Harry-patter (1)
  • shengyuwoo (1)
  • semihc (1)
  • tangyubbb (1)
  • aidenyao (1)
Pull Request Authors
  • agonzc34 (20)
  • mgonzs13 (9)
  • ajtudela (8)
  • b0rh (2)
  • smellslikeml (2)
  • zmk5 (1)

Dependencies

requirements.txt pypi
  • langchain ==0.0.295