mm-react

Official repo for MM-REACT

https://github.com/microsoft/mm-react

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    4 of 136 committers (2.9%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.0%) to scientific vocabulary

Keywords from Contributors

agents application fine-tuning llamaindex multi-agents rag vector-database cryptocurrencies closember cohere
Last synced: 10 months ago

Repository

Official repo for MM-REACT

Basic Info
Statistics
  • Stars: 955
  • Watchers: 19
  • Forks: 67
  • Open Issues: 20
  • Releases: 0
Created over 3 years ago · Last pushed over 2 years ago
Metadata Files
Readme Contributing License Code of conduct Citation Security

README.md

MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action

🔥 News

  • [2023.04.12] Upcoming change (by 2023.04.20): update the LLM to GPT-4 via the Azure OpenAI API.
  • [2023.03.27] Upcoming change (by 2023.03.31): update the LLM to GPT-3.5 Turbo via the Azure OpenAI API.
  • [2023.03.21] We built MM-REACT, a system paradigm that integrates ChatGPT with a pool of vision experts to achieve multimodal reasoning and action.
  • [2023.03.21] Feel free to explore various demo videos on our website!
  • [2023.03.21] Try our live demo!

🎶 Introduction

MM-REACT pairs ChatGPT with a pool of specialized vision experts to solve challenging visual understanding tasks through multimodal reasoning and action.

MM-REACT Design

  • To enable an image as input, we simply pass its file path to ChatGPT. The file path functions as a placeholder, allowing ChatGPT to treat the image as a black box.
  • Whenever a specific property, such as celebrity names or box coordinates, is required, ChatGPT is expected to ask a specific vision expert for the desired information.
  • The expert output is serialized as text and combined with the input to prompt ChatGPT again.
  • If no external experts are needed, we directly return the response to the user.
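To make the control flow above concrete, here is a minimal shell sketch of one round of the loop. It assumes a hypothetical reply format, `ASK <expert> <image_path>`, and a stub `run_expert` function; neither is MM-REACT's actual prompt format or tool interface, which live in the langchain-based code. The image path is passed around as an opaque string, mirroring how ChatGPT treats it as a placeholder.

```bash
#!/bin/sh
# Minimal sketch of one round of an MM-REACT-style loop.
# ASSUMPTION: the "ASK <expert> <image_path>" reply format and run_expert()
# are hypothetical stand-ins, not MM-REACT's real prompt format or tools.
# Image paths are assumed to contain no whitespace.

# Hypothetical vision expert: a real one would call a service such as
# Azure Computer Vision and serialize its JSON result as text.
run_expert() {
  printf '[%s output for %s]' "$1" "$2"
}

# handle_reply <model_reply>
# If the model asks for an expert, run it and print the observation that
# would be appended to the prompt for the next ChatGPT call; otherwise
# print the reply as the final answer for the user.
handle_reply() {
  case $1 in
    "ASK "*)
      set -f            # keep paths containing '*' or '?' from being globbed
      set -- $1         # split into: ASK <expert> <image_path>
      set +f
      printf 'OBSERVATION: %s\n' "$(run_expert "$2" "$3")"
      ;;
    *)
      printf 'FINAL: %s\n' "$1"
      ;;
  esac
}
```

In the real system the observation text is appended to the conversation and ChatGPT is called again; the loop ends once the model produces a reply that needs no expert.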

Getting Started

The MM-REACT code is based on langchain.

Please refer to langchain for instructions on installation and documentation.

Additional packages needed by MM-REACT (the `PIL` module is installed from the Pillow package):

```bash
pip install Pillow imagesize
```

Here is the list of resources you need to set up in Azure, along with their environment variables:

  1. Computer Vision service, for Tags, Objects, Faces and Celebrity.

```bash
export IMUN_URL="https://yourazureendpoint.cognitiveservices.azure.com/vision/v3.2/analyze"
export IMUN_PARAMS="visualFeatures=Tags,Objects,Faces"
export IMUN_CELEB_URL="https://yourazureendpoint.cognitiveservices.azure.com/vision/v3.2/models/celebrities/analyze"
export IMUN_CELEB_PARAMS=""
export IMUN_SUBSCRIPTION_KEY=
```
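As an illustration of how these variables fit together, the following sketch sends a local image to the two Computer Vision endpoints with curl. It assumes the standard Azure Cognitive Services REST conventions (subscription key in the `Ocp-Apim-Subscription-Key` header, raw image bytes sent with `Content-Type: application/octet-stream`). The helper names are hypothetical, and this is not taken from the MM-REACT code.

```bash
#!/bin/sh
# Sketch: call the Computer Vision endpoints configured above with curl.
# ASSUMPTION: helper names are hypothetical; the header and body conventions
# follow the Azure Computer Vision v3.2 REST API, not MM-REACT's own code.

# join_url <base_url> <query>: append the query string only if it is
# non-empty (IMUN_CELEB_PARAMS is exported as "").
join_url() {
  if [ -n "$2" ]; then
    printf '%s?%s\n' "$1" "$2"
  else
    printf '%s\n' "$1"
  fi
}

# analyze_image <image_path>: tags, objects and faces.
analyze_image() {
  curl -sS -X POST "$(join_url "$IMUN_URL" "$IMUN_PARAMS")" \
    -H "Ocp-Apim-Subscription-Key: $IMUN_SUBSCRIPTION_KEY" \
    -H "Content-Type: application/octet-stream" \
    --data-binary "@$1"
}

# analyze_celebrities <image_path>: celebrity recognition.
analyze_celebrities() {
  curl -sS -X POST "$(join_url "$IMUN_CELEB_URL" "$IMUN_CELEB_PARAMS")" \
    -H "Ocp-Apim-Subscription-Key: $IMUN_SUBSCRIPTION_KEY" \
    -H "Content-Type: application/octet-stream" \
    --data-binary "@$1"
}
```

For example, `analyze_image ./example.jpg` prints the JSON response for a local file.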

  1. Computer Vision service for dense captioning. This may need a different subscription key, from a region that supports the feature (e.g. West US).

```bash
export IMUN_URL2="https://yourazureendpoint.cognitiveservices.azure.com/computervision/imageanalysis:analyze"
export IMUN_PARAMS2="api-version=2023-02-01-preview&model-version=latest&features=denseCaptions"
export IMUN_SUBSCRIPTION_KEY2=
```

  1. Form Recognizer (OCR) prebuilt services

```bash
export IMUN_OCR_READ_URL="https://yourazureendpoint.cognitiveservices.azure.com/formrecognizer/documentModels/prebuilt-read:analyze"
export IMUN_OCR_RECEIPT_URL="https://yourazureendpoint.cognitiveservices.azure.com/formrecognizer/documentModels/prebuilt-receipt:analyze"
export IMUN_OCR_BC_URL="https://yourazureendpoint.cognitiveservices.azure.com/formrecognizer/documentModels/prebuilt-businessCard:analyze"
export IMUN_OCR_LAYOUT_URL="https://yourazureendpoint.cognitiveservices.azure.com/formrecognizer/documentModels/prebuilt-layout:analyze"
export IMUN_OCR_INVOICE_URL="https://yourazureendpoint.cognitiveservices.azure.com/formrecognizer/documentModels/prebuilt-invoice:analyze"
export IMUN_OCR_PARAMS="api-version=2022-08-31"
export IMUN_OCR_SUBSCRIPTION_KEY=
```
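Unlike the Computer Vision calls, the Form Recognizer `:analyze` operations are asynchronous: the POST returns `202 Accepted` with an `Operation-Location` header, and the result is fetched by polling that URL until its `status` is no longer `notStarted` or `running`. Below is a minimal sketch for the prebuilt read model using the `IMUN_OCR_*` variables above; the function names, the 1-second poll interval and the 30-attempt cap are illustrative assumptions, not MM-REACT's implementation.

```bash
#!/bin/sh
# Sketch: run the prebuilt-read model and poll for its result.
# ASSUMPTION: helper names, the 1 s poll interval and the 30-attempt cap
# are illustrative; they are not taken from MM-REACT.

# operation_location <header_file>: print the Operation-Location value.
# Header names are case-insensitive, and HTTP header lines end in CR LF.
operation_location() {
  grep -i '^operation-location:' "$1" | sed 's/^[^:]*:[[:space:]]*//' | tr -d '\r'
}

# ocr_read <image_path>: print the final JSON analysis result.
ocr_read() {
  headers=$(mktemp) || return 1
  curl -sS -D "$headers" -o /dev/null -X POST \
    "$IMUN_OCR_READ_URL?$IMUN_OCR_PARAMS" \
    -H "Ocp-Apim-Subscription-Key: $IMUN_OCR_SUBSCRIPTION_KEY" \
    -H "Content-Type: application/octet-stream" \
    --data-binary "@$1"
  op_url=$(operation_location "$headers")
  rm -f "$headers"
  [ -n "$op_url" ] || { echo "no Operation-Location header" >&2; return 1; }

  attempts=0
  while [ "$attempts" -lt 30 ]; do
    result=$(curl -sS "$op_url" -H "Ocp-Apim-Subscription-Key: $IMUN_OCR_SUBSCRIPTION_KEY")
    # Strip whitespace so the status check does not depend on JSON formatting.
    case $(printf '%s' "$result" | tr -d ' \t\r\n') in
      *'"status":"notStarted"'*|*'"status":"running"'*) sleep 1 ;;
      *) printf '%s\n' "$result"; return 0 ;;
    esac
    attempts=$((attempts + 1))
  done
  echo "timed out waiting for $op_url" >&2
  return 1
}
```

The same flow should apply to the receipt, business-card, layout and invoice URLs; only the POST target changes.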

  1. Bing search service

```bash
export BING_SEARCH_URL="https://api.bing.microsoft.com/v7.0/search"
export BING_SUBSCRIPTION_KEY=
```

  1. Bing Visual Search service (available under separate pricing)

```bash
export BING_VIS_SEARCH_URL="https://api.bing.microsoft.com/v7.0/images/visualsearch"
export BING_SUBSCRIPTION_KEY_VIS=
```
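Both Bing endpoints take the key in the `Ocp-Apim-Subscription-Key` header. Web Search is a GET with the query in the `q` parameter, while Visual Search expects a multipart form upload whose image part is named `image`. A hedged curl sketch follows; the function names and the default `count` of 5 are illustrative assumptions rather than MM-REACT settings.

```bash
#!/bin/sh
# Sketch: query the two Bing services configured above.
# ASSUMPTION: function names and the default result count are illustrative.

# bing_search <query> [count]: web search. With -G, curl appends the
# --data-urlencode pairs to the URL as a URL-encoded query string.
bing_search() {
  curl -sS -G "$BING_SEARCH_URL" \
    -H "Ocp-Apim-Subscription-Key: $BING_SUBSCRIPTION_KEY" \
    --data-urlencode "q=$1" \
    --data-urlencode "count=${2:-5}"
}

# bing_visual_search <image_path>: upload an image as multipart/form-data.
bing_visual_search() {
  curl -sS -X POST "$BING_VIS_SEARCH_URL" \
    -H "Ocp-Apim-Subscription-Key: $BING_SUBSCRIPTION_KEY_VIS" \
    -F "image=@$1"
}
```

Letting curl do the URL encoding avoids hand-escaping queries that contain spaces or punctuation.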

  1. Azure OpenAI service

```bash
export OPENAI_API_TYPE=azure
export OPENAI_API_VERSION=2022-12-01
export OPENAI_API_BASE=https://yourazureendpoint.openai.azure.com/
export OPENAI_API_KEY=
```

Note: At the time of writing, we use and test against a private endpoint. The public endpoint has since been released, and we plan to add support for it later.
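Since MM-REACT was tested against a private endpoint, the sketch below targets the documented Azure OpenAI REST interface instead: a POST to `/openai/deployments/<deployment>/completions?api-version=...` with the key in an `api-key` header. `AZURE_DEPLOYMENT_NAME` is a hypothetical variable introduced only for this example (it is not part of MM-REACT's setup), and the functions are illustrative helpers.

```bash
#!/bin/sh
# Sketch: call an Azure OpenAI completions deployment with the variables above.
# ASSUMPTION: AZURE_DEPLOYMENT_NAME is a hypothetical variable holding your
# model deployment name; the helper names are illustrative.

# completions_url: build the deployment URL; ${VAR%/} drops the trailing
# slash that OPENAI_API_BASE carries in the setup above.
completions_url() {
  printf '%s/openai/deployments/%s/completions?api-version=%s\n' \
    "${OPENAI_API_BASE%/}" "$AZURE_DEPLOYMENT_NAME" "$OPENAI_API_VERSION"
}

# azure_complete <json_body>: POST a JSON request body,
# e.g. '{"prompt": "Hello", "max_tokens": 16}'.
azure_complete() {
  curl -sS -X POST "$(completions_url)" \
    -H "api-key: $OPENAI_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$1"
}
```

The body is passed through as-is, so JSON escaping of the prompt is left to the caller.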

  1. Photo editing local service

```bash
export PHOTO_EDIT_ENDPOINT_URL="http://127.0.0.1:123/"
export PHOTO_EDIT_ENDPOINT_URL_SHORT=127.0.0.1
```
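With this many variables spread across services, a quick pre-flight check can save a confusing runtime failure. The sketch below lists every variable named in the setup above and reports the ones that are unset or empty. The function and list names are hypothetical, and `IMUN_CELEB_PARAMS` is left out because the setup intentionally exports it as an empty string.

```bash
#!/bin/sh
# Sketch: report which of the environment variables above are unset or empty.
# ASSUMPTION: missing_vars and REQUIRED_VARS are hypothetical helper names.

# missing_vars VAR...: print each named variable that is unset or empty;
# return 1 if any are missing.
missing_vars() {
  rc=0
  for name in "$@"; do
    # POSIX sh has no ${!name}; eval reads the variable whose name is in
    # $name. Only pass trusted variable names (such as the list below).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      printf '%s\n' "$name"
      rc=1
    fi
  done
  return "$rc"
}

# Every variable from the setup above except IMUN_CELEB_PARAMS.
REQUIRED_VARS="IMUN_URL IMUN_PARAMS IMUN_CELEB_URL IMUN_SUBSCRIPTION_KEY
IMUN_URL2 IMUN_PARAMS2 IMUN_SUBSCRIPTION_KEY2
IMUN_OCR_READ_URL IMUN_OCR_RECEIPT_URL IMUN_OCR_BC_URL IMUN_OCR_LAYOUT_URL
IMUN_OCR_INVOICE_URL IMUN_OCR_PARAMS IMUN_OCR_SUBSCRIPTION_KEY
BING_SEARCH_URL BING_SUBSCRIPTION_KEY BING_VIS_SEARCH_URL BING_SUBSCRIPTION_KEY_VIS
OPENAI_API_TYPE OPENAI_API_VERSION OPENAI_API_BASE OPENAI_API_KEY
PHOTO_EDIT_ENDPOINT_URL PHOTO_EDIT_ENDPOINT_URL_SHORT"
```

Usage: `missing_vars $REQUIRED_VARS || echo "set the variables listed above first"`; the list is left unquoted so the shell splits it into individual names.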

Sample code to run the conversational-mm-assistant agent against an image:

conversational-mm-assistant sample

Acknowledgement

We are highly inspired by langchain.

Citation

@article{yang2023mmreact,
  author = {Zhengyuan Yang* and Linjie Li* and Jianfeng Wang* and Kevin Lin* and Ehsan Azarnasab* and Faisal Ahmed* and Zicheng Liu and Ce Liu and Michael Zeng and Lijuan Wang^},
  title = {MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action},
  publisher = {arXiv},
  year = {2023},
}

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.

Owner

  • Name: Microsoft
  • Login: microsoft
  • Kind: organization
  • Email: opensource@microsoft.com
  • Location: Redmond, WA

Open source projects and samples from Microsoft

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Chase"
  given-names: "Harrison"
title: "LangChain"
date-released: 2022-10-17
url: "https://github.com/hwchase17/langchain"

GitHub Events

Total
  • Issues event: 1
  • Watch event: 28
  • Fork event: 2
Last Year
  • Issues event: 1
  • Watch event: 28
  • Fork event: 2

Committers

Last synced: about 1 year ago

All Time
  • Total Commits: 719
  • Total Committers: 136
  • Avg Commits per committer: 5.287
  • Development Distribution Score (DDS): 0.396
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Harrison Chase h****7@g****m 434
Ehsan Azar e****r@m****m 32
Samantha Whitmore w****a@g****m 17
Francisco Ingham f****m@g****m 9
Ankush Gola 9****1 9
Ikko Eltociear Ashimine e****r@g****m 8
Matt Robinson m****n@u****o 7
Nicolas n****9@g****m 6
Microsoft Open Source m****e 5
Hunter Gerlach H****h 4
Yong723 5****3 4
mrbean 4****n 4
Eugene Yurtsev e****v@g****m 4
blob42 c****t@b****z 4
Delip Rao d****p 4
Keiji Kanazawa g****a 3
Linjie Li l****2 3
Kacper Łukawski k****i 3
Nicholas Larus-Stone n****s@g****m 3
Sasmitha Manathunga 7****1 3
Zach Schillaci 4****7 3
Jon Luo 2****o 3
Dennis Antela Martinez d****a@g****m 3
Anton Troynikov a****n 3
Amos Ng me@a****g 3
Akash Samant 7****1 3
Scott Leibrand s****d@g****m 2
Sam Hogan s****r@g****m 2
Smit Shah w****8@g****m 2
Steven Hoelscher s****h@u****u 2
and 106 more...

Issues and Pull Requests

Last synced: about 1 year ago

All Time
  • Total issues: 16
  • Total pull requests: 59
  • Average time to close issues: 7 months
  • Average time to close pull requests: 10 days
  • Total issue authors: 9
  • Total pull request authors: 5
  • Average comments per issue: 1.56
  • Average comments per pull request: 0.15
  • Merged pull requests: 26
  • Bot issues: 0
  • Bot pull requests: 29
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • zairm21 (2)
  • jun0wanan (1)
  • johncorring (1)
  • krrishdholakia (1)
  • aryansid (1)
  • hui-tony-zk (1)
  • fengyuli-dev (1)
  • lukestanley (1)
  • yeyu2 (1)
Pull Request Authors
  • dashesy (23)
  • dependabot[bot] (17)
  • eltociear (1)
  • fproulx-boostsecurity (1)
  • avsthiago (1)
Top Labels
Issue Labels
Pull Request Labels
dependencies (17)