jarvision
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (12.5%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: EdmundNegan
- Language: Python
- Default Branch: main
- Size: 22.8 MB
Statistics
- Stars: 2
- Watchers: 1
- Forks: 1
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
JarVision: Intelligent Interface for Robotic Arm Control

JarVision is an intelligent interface developed for the UR3e robotic arm, integrating advanced computer vision and natural language processing (NLP) technologies. This project enables seamless human-robot interaction through natural language commands and real-time visual perception of the robot's environment.
Table of Contents
- Project Overview
- Features
- System Architecture
- Technologies Used
- Setup Instructions
- Usage
- Future Work
- Acknowledgments
Project Overview
JarVision bridges the gap between humans and robots by combining natural language processing with computer vision to empower the UR3e robotic arm to interpret complex user commands, interact dynamically with its environment, and execute tasks with precision.
Features
- Face and Object Detection: Utilizes OpenCV and YOLO algorithms for real-time face/object tracking and interaction.
- Environment Description: Generates detailed, context-aware descriptions of surroundings using OpenAI’s Vision API.
- Natural Language Command Processing: Leverages LangChain and OpenAI’s ChatGPT API to process nuanced user queries.
- Modular Design: Tools like `track_face` and `track_object` are integrated via LangChain for scalable and maintainable functionality.
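The modular tool design described above can be illustrated with a minimal sketch. This is plain Python showing the general pattern of named, dispatchable tools; the function bodies and registry here are hypothetical placeholders, not the repository's actual LangChain wiring.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Tool:
    """A named, self-contained capability the agent can invoke."""
    name: str
    description: str
    func: Callable[[str], str]

def track_face(target: str) -> str:
    # Placeholder body: the real tool would drive the camera/arm tracking loop.
    return f"tracking face: {target}"

def track_object(target: str) -> str:
    # Placeholder body: the real tool would run YOLO detection and follow the object.
    return f"tracking object: {target}"

# Registry mapping tool names to implementations, so new tools plug in
# without changing the dispatch logic.
TOOLS: Dict[str, Tool] = {
    t.name: t
    for t in [
        Tool("track_face", "Follow a detected face with the camera.", track_face),
        Tool("track_object", "Follow a detected object with the camera.", track_object),
    ]
}

def dispatch(tool_name: str, argument: str) -> str:
    """Route a parsed user command to the matching tool."""
    return TOOLS[tool_name].func(argument)
```

In the actual project, LangChain's agent would choose the tool from the user's natural-language request; the registry-plus-dispatch shape is what makes adding tools like environment description straightforward.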
System Architecture
The system architecture includes the following components:
1. Video and Audio Input: Captures the video feed from a camera mounted on the UR3e robotic arm and voice commands from a microphone.
2. LangChain Agent: Processes user commands and determines the appropriate action.
3. Tools: Includes functionalities such as face tracking, object tracking, environment description, and robot control.
4. Robot Control Output: Sends URScript commands to the robotic arm for task execution.
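The last step, sending URScript to the arm, can be sketched with the standard library. Universal Robots controllers accept URScript strings over a TCP socket (port 30002 is the secondary client interface, matching the PORT value used later in the setup); the `build_movej` helper and its default accelerations here are illustrative, not the project's actual control code.

```python
import socket

def build_movej(joints, a=1.0, v=0.5):
    """Format a URScript movej command for six joint angles (radians)."""
    joint_str = ", ".join(f"{q:.4f}" for q in joints)
    return f"movej([{joint_str}], a={a}, v={v})\n"

def send_urscript(host, command, port=30002):
    """Send one URScript line to the controller's client interface."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(command.encode("utf-8"))
```

Against URSim, `send_urscript` would target the IP address found in the simulator's "About" dialog, exactly as the setup steps below describe.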

Technologies Used
- Programming Languages: Python
- Computer Vision: OpenCV, YOLO
- Natural Language Processing: OpenAI’s ChatGPT API, LangChain
- Simulations: Universal Robots Offline Simulator (URSim)
- Hardware: UR3e robotic arm
- Others: Tavily API
Setup Instructions
1. Clone the repository:
```bash
git clone https://github.com/EdmundNegan/JarVision.git
```
2. Navigate to the project directory:
```bash
cd JarVision
```
3. Set up the Python environment:
```bash
python -m venv jarvision_env
source jarvision_env/bin/activate    # For Linux/Mac
jarvision_env\Scripts\activate       # For Windows
```
4. Install the required dependencies:
```bash
pip install -r requirements.txt
```

For Virtual Environment
5. Download the latest version of Oracle VirtualBox from https://www.virtualbox.org/ (or your preferred virtualization software).
6. Download the offline simulator for the UR3e robotic arm from Universal Robots: https://www.universal-robots.com/download/software-e-series/simulator-non-linux/offline-simulator-e-series-ur-sim-for-non-linux-5126-lts/.
7. Set the virtual machine network type to "Host-Only Ethernet Adapter".
8. Find the IP address of the robot in URSim by clicking URSim UR3 and selecting the "About" option.
9. Create a .env file in the project root directory with the following format:
```plaintext
OPENAI_API_KEY=
TAVILY_API_KEY=
HOST=' '
PORT='30002'
VOSK_MODEL='PATH_TO_VOSK_MODEL'
```
10. Set the payload to an arbitrary number like 0.1 kg and start the robot.
11. Run the main.py file to begin issuing commands to the robotic arm (select chatbot or voice mode and use the default camera settings).

For Physical Robot
5. Connect the UR3e robotic arm to a power supply and start up the system.
6. Connect the UR3e robotic arm to the host computer via an Ethernet cable, and connect the Azure Kinect camera to the host computer via a USB-C cable.
7. Find the IP address of the robot by selecting the "About" option.
8. Create a .env file in the project root directory with the following format:
```plaintext
OPENAI_API_KEY=
TAVILY_API_KEY=
HOST=' '
PORT='30002'
VOSK_MODEL='PATH_TO_VOSK_MODEL'
```
9. Go to the network settings and manually configure the IP assignment to the settings shown.
10. Run the main.py file to begin issuing commands to the physical robotic arm (select chatbot or voice mode and use the Kinect camera settings).
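The .env values above are read at startup. A minimal standard-library sketch of parsing such a file follows; the project most likely uses a library such as python-dotenv, so this parser is illustrative only.

```python
def load_env(path=".env"):
    """Parse simple KEY=value lines from a .env file into a dict.

    Blank lines and '#' comments are skipped; surrounding single or
    double quotes around values are stripped.
    """
    values = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip("'\"")
    return values
```

With this in place, `load_env()["PORT"]` would yield the `30002` client-interface port that the robot control code connects to.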
Future Work
- Upload short videos to the Vision API for enhanced environmental understanding.
- Integrate tracking functionalities (`track_face`, `track_object`) as modular LangChain tools.
- Refine spatial awareness for precise distance measurements (uncomment the depth configuration code in main.py and detection.py to enable this).
- Add a gripper and mobility to the robot.
Acknowledgments
This project was developed as part of an academic project under the guidance of Dr. Yu Wu. It leverages the capabilities of cutting-edge technologies like OpenAI’s APIs and LangChain.
Special thanks to the Universal Robots community for providing simulation tools and the OpenCV community for their vision resources.
Owner
- Name: Edmund Ngan
- Login: EdmundNegan
- Kind: user
- Repositories: 1
- Profile: https://github.com/EdmundNegan
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Ngan"
    given-names: "Ngan"
title: "JarVision Github Repository"
version: 2.0.4
date-released: 2024-12-15
url: "https://github.com/EdmundNegan/JarVision"
GitHub Events
Total
- Watch event: 2
- Member event: 1
- Push event: 12
- Create event: 5
Last Year
- Watch event: 2
- Member event: 1
- Push event: 12
- Create event: 5