jarvis

JARVIS, a system to connect LLMs with ML community. Paper: https://arxiv.org/pdf/2303.17580.pdf

https://github.com/microsoft/jarvis

Science Score: 64.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
    5 of 23 committers (21.7%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.2%) to scientific vocabulary

Keywords

deep-learning platform pytorch

Keywords from Contributors

transformers large-language-models
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: microsoft
  • License: mit
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 39.7 MB
Statistics
  • Stars: 24,321
  • Watchers: 380
  • Forks: 2,047
  • Open Issues: 127
  • Releases: 0
Topics
deep-learning platform pytorch
Created almost 3 years ago · Last pushed 7 months ago
Metadata Files
Readme Contributing License Code of conduct Citation Security Support

README.md

JARVIS

The mission of JARVIS is to explore artificial general intelligence (AGI) and deliver cutting-edge research to the whole community.

What's New

  • [2024.01.15] We release Easytool for easier tool usage.
  • [2023.11.30] We release TaskBench for evaluating task automation capability of LLMs.
  • [2023.07.28] We are now in the process of planning evaluation and project rebuilding. We will release a new version of Jarvis in the near future.
  • [2023.07.24] We released a light langchain version of Jarvis. See here.
  • [2023.04.16] Jarvis now supports the OpenAI service on the Azure platform and the GPT-4 model.
  • [2023.04.06] We added the Gradio demo and built the web API for /tasks and /results in server mode.
    • The Gradio demo is now hosted on Hugging Face Space. (Built with inference_mode=hybrid and local_deployment=standard)
    • The Web API /tasks and /results access intermediate results for Stage #1: task planning and Stage #1-3: model selection with execution results. See here.
  • [2023.04.03] We added the CLI mode and provided parameters for configuring the scale of local endpoints.
    • You can enjoy a lightweight experience with Jarvis without deploying the models locally. See here.
    • Just run python awesome_chat.py --config configs/config.lite.yaml to experience it.
  • [2023.04.01] We updated a version of code for building.

Overview

Language serves as an interface for LLMs to connect numerous AI models for solving complicated AI tasks!

See our paper: HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace, Yongliang Shen, Kaitao Song, Xu Tan, Dongsheng Li, Weiming Lu and Yueting Zhuang (the first two authors contribute equally)

We introduce a collaborative system that consists of an LLM as the controller and numerous expert models as collaborative executors (from HuggingFace Hub). The workflow of our system consists of four stages:

  • Task Planning: ChatGPT analyzes user requests to understand their intention and breaks them down into solvable tasks.
  • Model Selection: To solve the planned tasks, ChatGPT selects expert models hosted on Hugging Face based on their descriptions.
  • Task Execution: Each selected model is invoked and executed, and its results are returned to ChatGPT.
  • Response Generation: Finally, ChatGPT integrates the predictions of all models and generates the response.
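The four stages can be pictured as a small pipeline. The sketch below is only an illustration under stated assumptions: every function name, the toy planning rule, and the `catalog` and `runners` dictionaries are hypothetical stand-ins, not the project's code. In the real system an LLM performs planning, selection, and response generation, and Hugging Face models perform execution.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for the four stages; none of these names come
# from the JARVIS code base.
Task = Dict[str, str]


def plan_tasks(request: str) -> List[Task]:
    """Stage 1 (Task Planning): split a request into solvable tasks."""
    # Toy rule: each clause separated by " and " becomes one task.
    parts = request.split(" and ")
    return [{"id": str(i), "text": part.strip()} for i, part in enumerate(parts)]


def select_model(task: Task, catalog: Dict[str, str]) -> str:
    """Stage 2 (Model Selection): pick a model whose description matches."""
    for model_name, description in catalog.items():
        if description in task["text"]:
            return model_name
    return "fallback-model"


def execute(task: Task, model_name: str,
            runners: Dict[str, Callable[[str], str]]) -> str:
    """Stage 3 (Task Execution): run the chosen model and return its output."""
    runner = runners.get(model_name, lambda text: f"no result for: {text}")
    return runner(task["text"])


def generate_response(results: List[str]) -> str:
    """Stage 4 (Response Generation): merge all predictions into one reply."""
    return " | ".join(results)


def run_pipeline(request: str, catalog: Dict[str, str],
                 runners: Dict[str, Callable[[str], str]]) -> str:
    """Chain the four stages for a single user request."""
    results = []
    for task in plan_tasks(request):
        model_name = select_model(task, catalog)
        results.append(execute(task, model_name, runners))
    return generate_response(results)
```

Keeping each stage as its own function mirrors the controller/executor split: only `execute` touches a model, so the other three stages could be swapped for LLM calls without changing the loop.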

System Requirements

Default (Recommended)

For configs/config.default.yaml:

  • Ubuntu 16.04 LTS
  • VRAM >= 24GB
  • RAM > 12GB (minimal), 16GB (standard), 80GB (full)
  • Disk > 284GB
    • 42GB for damo-vilab/text-to-video-ms-1.7b
    • 126GB for ControlNet
    • 66GB for stable-diffusion-v1-5
    • 50GB for others
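
The disk figure is the sum of the four components listed above. A minimal sketch for checking a target directory before downloading models follows; the helper names are hypothetical, and it assumes decimal gigabytes (1 GB = 10^9 bytes), which the README does not specify.

```python
import shutil

# Per-component disk footprint in GB, copied from the list above.
DISK_GB = {
    "damo-vilab/text-to-video-ms-1.7b": 42,
    "ControlNet": 126,
    "stable-diffusion-v1-5": 66,
    "others": 50,
}


def required_disk_gb(components=DISK_GB):
    """Total disk space needed for the default configuration, in GB."""
    return sum(components.values())


def has_enough_disk(path=".", required_gb=None):
    """Return True if the filesystem holding `path` has more free space than required."""
    if required_gb is None:
        required_gb = required_disk_gb()
    # Assumption: decimal gigabytes (10**9 bytes).
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb > required_gb
```

Calling `required_disk_gb()` reproduces the 284GB total, and `has_enough_disk("server/models")` could be run before `download.sh` if that directory is where the models will live.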

Minimum (Lite)

For configs/config.lite.yaml:

  • Ubuntu 16.04 LTS
  • Nothing else

The configuration configs/config.lite.yaml does not require any expert models to be downloaded and deployed locally. However, it means that Jarvis is restricted to models running stably on HuggingFace Inference Endpoints.

Quick Start

First replace openai.key and huggingface.token in server/configs/config.default.yaml with your personal OpenAI Key and your Hugging Face Token, or put them in the environment variables OPENAI_API_KEY and HUGGINGFACE_ACCESS_TOKEN respectively. Then run the following commands:

For Server:

```bash
# setup env
cd server
conda create -n jarvis python=3.8
conda activate jarvis
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
pip install -r requirements.txt

# download models. Make sure that `git-lfs` is installed.
cd models
bash download.sh # required when `inference_mode` is `local` or `hybrid`.

# run server
cd ..
python models_server.py --config configs/config.default.yaml # required when `inference_mode` is `local` or `hybrid`
python awesome_chat.py --config configs/config.default.yaml --mode server # for text-davinci-003
```
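
Before launching the server, it can help to confirm that both credentials resolve. The sketch below is an assumption-laden illustration, not the project's loader: the function name is hypothetical, the nested layout (`openai` → `key`, `huggingface` → `token`) is inferred from the dotted names in this section, and the precedence (config value first, then environment variable) is a guess.

```python
import os


def resolve_credentials(config, env=None):
    """Return both credentials, preferring config values over environment variables.

    Assumptions: `config` is the parsed YAML as a dict, and a non-empty
    config value wins over the environment.
    """
    env = os.environ if env is None else env
    openai_key = config.get("openai", {}).get("key") or env.get("OPENAI_API_KEY")
    hf_token = (config.get("huggingface", {}).get("token")
                or env.get("HUGGINGFACE_ACCESS_TOKEN"))

    missing = [name for name, value in (("openai.key", openai_key),
                                        ("huggingface.token", hf_token))
               if not value]
    if missing:
        raise ValueError("missing credentials: " + ", ".join(missing))
    return {"openai_key": openai_key, "huggingface_token": hf_token}
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.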

Now you can access Jarvis' services by the Web API.

  • /hugginggpt --method POST, access the full service.
  • /tasks --method POST, access intermediate results for Stage #1.
  • /results --method POST, access intermediate results for Stage #1-3.

For example:

```bash
# request
curl --location 'http://localhost:8004/tasks' \
--header 'Content-Type: application/json' \
--data '{
    "messages": [
        {
            "role": "user",
            "content": "based on pose of /examples/d.jpg and content of /examples/e.jpg, please show me a new image"
        }
    ]
}'

# response
[{"args":{"image":"/examples/d.jpg"},"dep":[-1],"id":0,"task":"openpose-control"},{"args":{"image":"/examples/e.jpg"},"dep":[-1],"id":1,"task":"image-to-text"},{"args":{"image":"-0","text":"-1"},"dep":[1,0],"id":2,"task":"openpose-text-to-image"}]
```
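
The same request can be built from Python with only the standard library, and the planned tasks can be ordered by their `dep` field. This is a sketch under assumptions: the helper names are hypothetical, and reading a `dep` entry of `-1` as "no dependency" is an interpretation of the sample response above rather than documented behavior.

```python
import json
import urllib.request


def build_request(base_url, endpoint, user_message):
    """Build a POST request carrying one user message, like the curl call above."""
    payload = {"messages": [{"role": "user", "content": user_message}]}
    return urllib.request.Request(
        url=base_url.rstrip("/") + endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def execution_order(tasks):
    """Order task ids so that each task follows every task listed in its `dep`.

    Assumption: negative entries such as -1 mean "no dependency".
    """
    remaining = {task["id"]: {d for d in task["dep"] if d >= 0} for task in tasks}
    order = []
    while remaining:
        done = set(order)
        ready = sorted(tid for tid, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError("cyclic or unresolved dependencies")
        order.extend(ready)
        for tid in ready:
            del remaining[tid]
    return order
```

With a server running, `urllib.request.urlopen(build_request("http://localhost:8004", "/tasks", "..."))` sends the request, and the decoded JSON list can be passed to `execution_order`.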

For Web:

We provide a user-friendly web page. After starting awesome_chat.py in server mode, you can run the following commands to communicate with Jarvis in your browser:

  • You need to install nodejs and npm first.
  • [ IMPORTANT ] If you are running the web client on another machine, you need to set HUGGINGGPT_BASE_URL in web/src/config/index.ts to http://{LAN_IP_of_the_server}:{port}/.
  • If you want to use the video generation feature, you need to compile ffmpeg manually with H.264 support.
  • You can switch to ChatGPT by double-clicking the setting icon!

```bash
cd web
npm install
npm run dev
```

```bash
# Optional: Install ffmpeg
# This command needs to be executed without errors.
LD_LIBRARY_PATH=/usr/local/lib /usr/local/bin/ffmpeg -i input.mp4 -vcodec libx264 output.mp4
```
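
A lighter check than transcoding a file is to ask ffmpeg which encoders it was built with. The sketch below assumes the `ffmpeg` binary is on `PATH` (or that its path is passed in) and looks for `libx264` in the output of `ffmpeg -encoders`; the function names are hypothetical.

```python
import shutil
import subprocess


def has_libx264(encoders_text):
    """Return True if any line of an encoder listing names libx264."""
    return any("libx264" in line.split() for line in encoders_text.splitlines())


def ffmpeg_supports_h264(ffmpeg_path="ffmpeg"):
    """Run `ffmpeg -encoders` and report whether libx264 is listed."""
    if shutil.which(ffmpeg_path) is None:
        return False  # binary not found
    result = subprocess.run(
        [ffmpeg_path, "-hide_banner", "-encoders"],
        capture_output=True,
        text=True,
        check=False,
    )
    return has_libx264(result.stdout)
```

For the manual build described above, `ffmpeg_supports_h264("/usr/local/bin/ffmpeg")` targets the same binary as the shell command.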

For Gradio:

The Gradio demo is now hosted on Hugging Face Space. You can also run the following commands to start the demo locally:

```bash
python models_server.py --config configs/config.gradio.yaml
python run_gradio_demo.py --config configs/config.gradio.yaml

# or run the HF Space as a Docker image (Built with `inference_mode=hybrid` and `local_deployment=standard`)
docker run -it -p 7860:7860 --platform=linux/amd64 registry.hf.space/microsoft-hugginggpt:latest python app.py
```

For CLI:

You can also run Jarvis more easily in CLI mode:

```bash
cd server
python awesome_chat.py --config configs/config.default.yaml --mode cli
```

Examples of CLI mode:

Welcome to Jarvis! A collaborative system that consists of an LLM as the controller and numerous expert models as collaborative executors. Jarvis can plan tasks, schedule Hugging Face models, generate friendly responses based on your requests, and help you with many things. Please enter your request (`exit` to exit).

[ User ]: Given a collection of image A: /examples/a.jpg, B: /examples/b.jpg, C: /examples/c.jpg, please tell me how many zebras in these picture?

[ Jarvis ]: Based on your request, I have used several models to process the images you provided. For the image A: /examples/a.jpg, I used the image-to-text model nlpconnect/vit-gpt2-image-captioning to generate the text "a cat sitting on a window sill looking out". Then I used the object-detection model facebook/detr-resnet-50 to generate the image with predicted box '/images/a3dc.jpg', which contains two objects with labels 'potted plant' and 'cat'. For the image B: /examples/b.jpg, I used the image-to-text model nlpconnect/vit-gpt2-image-captioning to generate the text "two zebras standing in a field of dry grass". Then I used the object-detection model facebook/detr-resnet-50 to generate the image with predicted box '/images/f5df.jpg', which contains three objects with labels 'zebra'. For the image C: /examples/c.jpg, I used the image-to-text model nlpconnect/vit-gpt2-image-captioning to generate the text "a zebra standing in a field of tall grass". Then I used the object-detection model facebook/detr-resnet-50 to generate the image with predicted box '/images/69c7.jpg', which contains one object with label 'zebra'. Finally, I used the canny-control model canny-control to generate the image '/images/7e18.png' based on the image /examples/f.jpg. Therefore, there are four zebras in these pictures. Is there anything else I can help you with?

[ User ]: Please answer all the named entities in the sentence: Iron Man is a superhero appearing in American comic books published by Marvel Comics. The character was co-created by writer and editor Stan Lee, developed by scripter Larry Lieber, and designed by artists Don Heck and Jack Kirby.

[ Jarvis ]: Sure, I understand your request. Based on the inference results of the models, I will answer your request step by step. Firstly, I used the token-classification model dslim/bert-base-NER to recognize the named entities in the sentence. The inference result is that the named entities are Iron Man, Stan Lee, Larry Lieber, Don Heck and Jack Kirby. Then, I used the question-answering model deepset/roberta-base-squad2 to answer your request. The inference result is that there is no output since the context cannot be empty. Therefore, I cannot make it. I hope this answer your request. Is there anything else I can help you with?

[ User ]: Please generate a canny image based on /examples/f.jpg

[ Jarvis ]: Sure. I have generated a canny image based on /examples/f.jpg. To do this, I first used the image-to-text model nlpconnect/vit-gpt2-image-captioning to generate the text description of the image, which is "a herd of giraffes and zebras grazing in a field". Then I used the canny-control model to generate the canny image of the original image, which is stored in the path "/images/1ea2.png". Finally, I used the canny-text-to-image model lllyasviel/sd-controlnet-canny to generate the canny image based on the text description and the canny image, which is stored in the path "/images/ba66.png". Do you need anything else?

Configuration

The server-side configuration file is server/configs/config.default.yaml, and some parameters are presented as follows:

  • model: LLM, currently supports text-davinci-003. We are working on integrating more open-source LLMs.
  • inference_mode: mode of inference endpoints
    • local: only use the local inference endpoints
    • huggingface: only use the Hugging Face Inference Endpoints (free of local inference endpoints)
    • hybrid: use both local and Hugging Face inference endpoints
  • local_deployment: scale of locally deployed models, works under local or hybrid inference mode:
    • minimal (RAM>12GB, ControlNet only)
    • standard (RAM>16GB, ControlNet + Standard Pipelines)
    • full (RAM>42GB, All registered models)

On a personal laptop, we recommend the configuration inference_mode: hybrid and local_deployment: minimal. However, the available models under this setting may be limited due to the instability of remote Hugging Face Inference Endpoints.
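
The allowed values above lend themselves to a small pre-flight check. The sketch below is a hypothetical helper, not part of the project: it takes the parsed config as a plain dict and uses the RAM thresholds from the Configuration list (the System Requirements section quotes a different figure for `full`).

```python
INFERENCE_MODES = {"local", "huggingface", "hybrid"}

# Minimum RAM in GB per deployment scale, from the Configuration list above.
LOCAL_DEPLOYMENT_RAM_GB = {"minimal": 12, "standard": 16, "full": 42}


def validate_config(config, available_ram_gb=None):
    """Return a list of human-readable problems; an empty list means the config looks fine."""
    problems = []
    mode = config.get("inference_mode")
    if mode not in INFERENCE_MODES:
        problems.append(f"unknown inference_mode: {mode!r}")

    # local_deployment only matters when local endpoints are in use.
    if mode in ("local", "hybrid"):
        scale = config.get("local_deployment")
        if scale not in LOCAL_DEPLOYMENT_RAM_GB:
            problems.append(f"unknown local_deployment: {scale!r}")
        elif (available_ram_gb is not None
              and available_ram_gb <= LOCAL_DEPLOYMENT_RAM_GB[scale]):
            problems.append(
                f"{scale} needs more than {LOCAL_DEPLOYMENT_RAM_GB[scale]}GB RAM, "
                f"found {available_ram_gb}GB"
            )
    return problems
```

For the laptop recommendation above, `validate_config({"inference_mode": "hybrid", "local_deployment": "minimal"}, available_ram_gb=16)` returns an empty list.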

NVIDIA Jetson Embedded Device Support

A Dockerfile is included that provides experimental support for NVIDIA Jetson embedded devices. This image provides accelerated ffmpeg, pytorch, torchaudio, and torchvision dependencies. To build the docker image, ensure that the default docker runtime is set to 'nvidia'. A pre-built image is provided at https://hub.docker.com/r/toolboc/nv-jarvis.

```bash
# Build the docker image
docker build --pull --rm -f "Dockerfile.jetson" -t toolboc/nv-jarvis:r35.2.1 .
```

Due to memory requirements, JARVIS must run on Jetson AGX Orin family devices (a device with 64G of on-board RAM is preferred) with the config options set to:

  • inference_mode: local
  • local_deployment: standard

Models and configs are recommended to be provided through a volume mount from the host to the container as shown in the docker run step below. It is possible to uncomment the # Download local models section of the Dockerfile to build a container with models included.

Start the model server, awesome_chat, and web app on Jetson Orin AGX

```bash
# run the container which will automatically start the model server
docker run --name jarvis --net=host --gpus all -v ~/jarvis/configs:/app/server/configs -v ~/src/JARVIS/server/models:/app/server/models toolboc/nv-jarvis:r35.2.1

# (wait for model server to complete initialization)

# start awesome_chat.py
docker exec jarvis python3 awesome_chat.py --config configs/config.default.yaml --mode server

# start the web application (application will be accessible at http://localhost:9999)
docker exec jarvis npm run dev --prefix=/app/web
```
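
The "wait for model server" step can be scripted by polling a TCP port until it accepts connections. This is a generic sketch with a hypothetical helper name. The model server's port is not stated in this README, so it is left as a parameter to be read from your own config, and an open port only shows that the process is listening, not that every model has finished loading.

```python
import socket
import time


def wait_for_port(host, port, timeout_s=60.0, interval_s=1.0):
    """Poll until a TCP connection to (host, port) succeeds or the timeout expires.

    Returns True once a connection is accepted, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # create_connection raises OSError while nothing is listening.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

The same helper could confirm the web application from the last command with `wait_for_port("localhost", 9999)`.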

Citation

If you find this work useful in your method, you can cite the paper as below:

@inproceedings{shen2023hugginggpt,
  author = {Shen, Yongliang and Song, Kaitao and Tan, Xu and Li, Dongsheng and Lu, Weiming and Zhuang, Yueting},
  booktitle = {Advances in Neural Information Processing Systems},
  title = {HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace},
  year = {2023}
}
@article{shen2023taskbench,
  title   = {TaskBench: Benchmarking Large Language Models for Task Automation},
  author  = {Shen, Yongliang and Song, Kaitao and Tan, Xu and Zhang, Wenqi and Ren, Kan and Yuan, Siyu and Lu, Weiming and Li, Dongsheng and Zhuang, Yueting},
  journal = {arXiv preprint arXiv:2311.18760},
  year    = {2023}
}


@article{yuan2024easytool,
  title   = {EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction},
  author  = {Siyu Yuan and Kaitao Song and Jiangjie Chen and Xu Tan and Yongliang Shen and Ren Kan and Dongsheng Li and Deqing Yang},
  journal = {arXiv preprint arXiv:2401.06201},
  year    = {2024}
}

Owner

  • Name: Microsoft
  • Login: microsoft
  • Kind: organization
  • Email: opensource@microsoft.com
  • Location: Redmond, WA

Open source projects and samples from Microsoft

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you find this work useful in your method, you can cite the paper as below."
authors:
  - family-names: Shen
    given-names: Yongliang
  - family-names: Song
    given-names: Kaitao
  - family-names: Tan
    given-names: Xu
  - family-names: Li
    given-names: Dongsheng
  - family-names: Lu
    given-names: Weiming
  - family-names: Zhuang
    given-names: Yueting
title: "HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace"
version: 2.0.4
license: MIT
status: preprint
date-released: 2023-03-30
url: https://github.com/microsoft/JARVIS
preferred-citation:
  type: article
  authors:
    - family-names: Shen
      given-names: Yongliang
    - family-names: Song
      given-names: Kaitao
    - family-names: Tan
      given-names: Xu
    - family-names: Li
      given-names: Dongsheng
    - family-names: Lu
      given-names: Weiming
    - family-names: Zhuang
      given-names: Yueting
  journal: "arXiv preprint arXiv:2303.17580"
  title: "HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace"
  year: 2023
  url: https://arxiv.org/abs/2303.17580

GitHub Events

Total
  • Commit comment event: 5
  • Issues event: 32
  • Watch event: 956
  • Issue comment event: 18
  • Pull request review event: 3
  • Pull request event: 2
  • Fork event: 110
Last Year
  • Commit comment event: 5
  • Issues event: 32
  • Watch event: 956
  • Issue comment event: 18
  • Pull request review event: 3
  • Pull request event: 2
  • Fork event: 110

Committers

Last synced: 11 months ago

All Time
  • Total Commits: 89
  • Total Committers: 23
  • Avg Commits per committer: 3.87
  • Development Distribution Score (DDS): 0.573
Past Year
  • Commits: 11
  • Committers: 3
  • Avg Commits per committer: 3.667
  • Development Distribution Score (DDS): 0.273
Top Committers
Name Email Commits
Yongliang Shen s****l@z****n 38
Siyu Yuan s****1@m****n 11
KaiTao Song k****g@n****n 8
Microsoft Open Source m****e 5
Vacodwave 8****0@q****m 4
Zaki Machfj z****j@u****a 3
Mano Bharathi M 8****3 2
Xu Tan (谭旭) 4****u 2
toolboc t****c@g****m 2
0xcha05 1****5 1
Aryan Utkarsh 1****h 1
Boquan (Brian) Yin 5****t 1
Elie Brosset 3****a 1
Erik Dombi e****2@g****m 1
Ewout ter Hoeven E****n@s****l 1
Ikko Eltociear Ashimine e****r@g****m 1
Julien Chaumond j****n@h****o 1
Marc Weber m****4@g****m 1
Neil Goldader a****l@g****m 1
Ovik o****l@g****m 1
WangShaoyu1 1****9@1****m 1
Wei-Zao 8****o 1
minakokojima l****7@g****m 1
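
The summary figures can be reproduced from the per-committer counts above. The sketch assumes the Development Distribution Score is one minus the top committer's share of commits; that definition is not stated on this page, but it matches the reported all-time value. The function names are hypothetical.

```python
def average_commits(commits_per_committer):
    """Mean number of commits per committer."""
    return sum(commits_per_committer) / len(commits_per_committer)


def development_distribution_score(commits_per_committer):
    """Assumed definition: 1 - (commits by the top committer / total commits)."""
    total = sum(commits_per_committer)
    return 1 - max(commits_per_committer) / total


# Commit counts from the Top Committers list: nine committers with more
# than one commit, then fourteen committers with one commit each.
ALL_TIME = [38, 11, 8, 5, 4, 3, 2, 2, 2] + [1] * 14

print(sum(ALL_TIME), len(ALL_TIME))                         # 89 23
print(round(average_commits(ALL_TIME), 2))                  # 3.87
print(round(development_distribution_score(ALL_TIME), 3))   # 0.573
```

A score near 0 means one person wrote almost everything; the all-time value of 0.573 reflects that the top committer accounts for 38 of 89 commits.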
Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 171
  • Total pull requests: 55
  • Average time to close issues: 9 days
  • Average time to close pull requests: about 1 month
  • Total issue authors: 141
  • Total pull request authors: 45
  • Average comments per issue: 2.02
  • Average comments per pull request: 0.73
  • Merged pull requests: 19
  • Bot issues: 0
  • Bot pull requests: 2
Past Year
  • Issues: 26
  • Pull requests: 4
  • Average time to close issues: less than a minute
  • Average time to close pull requests: N/A
  • Issue authors: 24
  • Pull request authors: 2
  • Average comments per issue: 0.04
  • Average comments per pull request: 0.5
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • ekiwi111 (6)
  • liujie316316 (6)
  • jiaenqiu (5)
  • HaroldMitts (3)
  • iwoomi (2)
  • lovelucymuch (2)
  • 1nnovat1on (2)
  • elgiguiere (2)
  • hsju2021 (2)
  • StevenChao55 (2)
  • harshsingh32 (2)
  • Combustible-material (2)
  • Pied-Piper1 (2)
  • lychees (2)
  • samqin123 (2)
Pull Request Authors
  • mGFROMTHE5031 (6)
  • toolboc (2)
  • ragizaki (2)
  • syGOAT (2)
  • sairam9963 (2)
  • moritzbrantner (2)
  • princepride (2)
  • HardwayLinka (2)
  • dependabot[bot] (2)
  • Wei-Zao (1)
  • Toubat (1)
  • Redskull-127 (1)
  • elgiguiere (1)
  • MAGceKa (1)
  • fastflair (1)
Top Labels
Issue Labels
enhancement (2)
Pull Request Labels
dependencies (2)

Dependencies

hugginggpt/web/electron/package.json npm
  • electron ^23.1.0 development
  • electron-builder ^23.6.0 development
hugginggpt/web/package.json npm
  • @types/node ^16.11.56 development
  • @vitejs/plugin-vue ^3.0.3 development
  • @vue/tsconfig ^0.1.3 development
  • autoprefixer ^10.4.12 development
  • npm-run-all ^4.1.5 development
  • postcss ^8.4.18 development
  • tailwindcss ^3.2.1 development
  • typescript ~4.7.4 development
  • vite ^3.2.5 development
  • vue-tsc ^0.40.7 development
  • axios ^1.3.4
  • vue ^3.2.38
  • vue-router ^4.1.5
hugginggpt/server/requirements.txt pypi
  • accelerate ==0.18.0
  • asteroid ==0.6.0
  • controlnet_aux 78efc716868a7f5669c288233d65b471f542ce40
  • datasets ==2.11.0
  • diffusers 8c530fc2f6a76a2aefb6b285dce6df1675092ac6
  • espnet ==202301
  • espnet_model_zoo ==0.1.7
  • flask ==2.2.3
  • flask_cors ==3.0.10
  • gradio ==3.24.1
  • pydub ==0.25.1
  • pytesseract ==0.3.10
  • speechbrain ==0.5.14
  • tiktoken ==0.3.3
  • timm ==0.6.13
  • transformers c612628045822f909020f7eb6784c79700813eda
  • typeguard ==2.13.3
  • waitress ==2.1.2
taskbench/requirements.txt pypi
  • Levenshtein ==0.21.1
  • aiohttp ==3.8.1
  • click ==8.0.4
  • datasets ==2.14.5
  • emoji ==2.8.0
  • matplotlib ==3.7.1
  • networkx ==2.8.4
  • numpy ==1.23.5
  • pandas ==1.2.4
  • requests ==2.28.1
  • scikit-learn ==1.0.2
  • scipy ==1.10.0