https://github.com/cliangyu/aibrowser

Science Score: 46.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
✓ Academic publication links: Links to arxiv.org
✓ Committers with academic emails: 1 of 1 committers (100.0%) from academic institutions
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (11.4%) to scientific vocabulary

Last synced: 9 months ago

Repository

Basic Info

Host: GitHub
Owner: cliangyu
Language: JavaScript
Default Branch: main
Size: 187 MB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created over 2 years ago · Last pushed over 2 years ago

Metadata Files

Readme

GPT-4V-Act: Chromium Copilot

Important Note: As GPT-4V(ision) has not yet been made publicly available, this project requires an active ChatGPT Plus subscription for multimodal prompting access. Note that the tactics this project uses to tap into an unofficial GPT-4V API may contravene the following clause of the ChatGPT Terms of Service:

2. (c) Restrictions: You may not ... (iv) except as permitted through the API, use any automated or programmatic method to extract data or output from the Services, including scraping, web harvesting, or web data extraction;

GPT-4V-Act is a multimodal AI assistant that combines GPT-4V(ision) with a web browser. It is designed to mirror the input and output of a human operator: primarily screen feedback and low-level mouse/keyboard interaction. The objective is a smooth handoff between human and computer operation, enabling tools that considerably improve the accessibility of any user interface (UI), aid workflow automation, and support automated UI testing.

https://github.com/ddupont808/GPT-4V-Act/assets/3820588/fbcde8d1-a7d6-4089-95f6-fd099cc98a0d

How it works

GPT-4V-Act leverages both GPT-4V(ision) and Set-of-Mark Prompting, together with a tailored auto-labeler. This auto-labeler assigns a unique numerical ID to each interactable UI element.

Given a task and a screenshot as input, GPT-4V-Act infers the next action required to accomplish the task. For mouse/keyboard output, it refers to the numerical labels, which map to exact pixel coordinates.
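The labeling idea described above can be illustrated with a minimal sketch. It is not the project's actual auto-labeler: the function names `labelElements` and `resolveLabel`, the plain-object element shape, and the choice of the bounding-box center as the click target are all assumptions made for illustration.

```javascript
// Hypothetical sketch, not the project's labeler. Each element is a plain
// object carrying a bounding box in viewport pixels (assumed shape).
function labelElements(elements) {
  // Keep only elements with a non-empty bounding box.
  const visible = elements.filter((el) => el.width > 0 && el.height > 0);
  // Assign sequential numeric IDs and precompute a click target (box center).
  return visible.map((el, index) => ({
    id: index,
    tag: el.tag,
    center: {
      x: Math.round(el.x + el.width / 2),
      y: Math.round(el.y + el.height / 2),
    },
  }));
}

// Translate a numeric label chosen by the model back into pixel coordinates.
function resolveLabel(labels, id) {
  const match = labels.find((label) => label.id === id);
  if (!match) {
    throw new Error(`No labeled element with id ${id}`);
  }
  return match.center;
}

const labels = labelElements([
  { tag: "input", x: 100, y: 20, width: 300, height: 40 },
  { tag: "div", x: 0, y: 0, width: 0, height: 0 }, // empty box, skipped
  { tag: "button", x: 420, y: 20, width: 80, height: 40 },
]);
console.log(resolveLabel(labels, 1)); // { x: 460, y: 40 }
```

In a real browser the bounding boxes would presumably come from the DOM rather than hand-written objects. Keeping the lookup as a pure function makes the label-to-coordinate mapping easy to test on its own.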

Get Started!

```bash
# Clone the repo
git clone https://github.com/ddupont808/GPT-4V-Act ai-browser

# Navigate to the repo directory
cd ai-browser

# Install the required packages
npm install

# Start the demo
npm start
```

Features

- Vision (Partial)
  - JS DOM auto-labeler (w/ COCO export)
  - AI auto-labeler
- Clicking
- Typing (Partial)
  - Typing characters (letters, numbers, strings)
  - Typing special keycodes (enter, pgup, pgdown)
- Scrolling
- Prompting user for more information
- Remembering information relevant to task

If you have ideas or feedback, or want to contribute, feel free to create an Issue or reach out to ddupont@mit.edu.

Demonstration Prompt

Below is an example of using the user interface to instruct the agent to "play a random song for me".

This is the prompt seen by GPT-4V and the corresponding output:

User

```markdown
task: play a random song for me

type ClickAction = { action: "click", element: number }
type TypeAction = { action: "type", element: number, text: string }
type ScrollAction = { action: "scroll", direction: "up" | "down" }
type RequestInfoFromUser = { action: "request-info", prompt: string }
type RememberInfoFromSite = { action: "remember-info", info: string }
type Done = { action: "done" }

response format

{
  briefExplanation: string,
  nextAction: ClickAction | TypeAction | ScrollAction | RequestInfoFromUser | RememberInfoFromSite | Done
}

instructions

observe the screenshot, and think about the next action

output your response in a json markdown code block
```

Assistant

```json
{
  "briefExplanation": "I'll type 'random song' into the search bar to find a song for you.",
  "nextAction": { "action": "type", "element": 7, "text": "random song" }
}
```
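A reply in this format has to be extracted from its code block and checked before any action is carried out. The sketch below shows one possible way to do that. It is an assumption, not code from the project: `parseAgentResponse`, `ACTION_FIELDS`, and the validation rules are hypothetical names and choices derived only from the type definitions in the prompt above.

```javascript
// Hypothetical sketch: names and validation rules are illustrative assumptions.
const FENCE = "`".repeat(3); // built at runtime so the marker never appears literally here

// Required fields per action, mirroring the type definitions in the prompt.
const ACTION_FIELDS = {
  click: { element: "number" },
  type: { element: "number", text: "string" },
  scroll: { direction: "string" },
  "request-info": { prompt: "string" },
  "remember-info": { info: "string" },
  done: {},
};

function parseAgentResponse(reply) {
  // Prefer the body of a fenced block (optionally tagged json); else parse the raw reply.
  const fenced = reply.match(new RegExp(FENCE + "(?:json)?\\s*([\\s\\S]*?)" + FENCE));
  const payload = JSON.parse(fenced ? fenced[1] : reply);

  const action = payload.nextAction;
  if (typeof payload.briefExplanation !== "string" || typeof action !== "object" || action === null) {
    throw new Error("Response must contain briefExplanation and nextAction");
  }
  if (!Object.prototype.hasOwnProperty.call(ACTION_FIELDS, action.action)) {
    throw new Error(`Unknown action: ${action.action}`);
  }
  for (const [field, expected] of Object.entries(ACTION_FIELDS[action.action])) {
    if (typeof action[field] !== expected) {
      throw new Error(`Action "${action.action}" needs ${expected} field "${field}"`);
    }
  }
  if (action.action === "scroll" && !["up", "down"].includes(action.direction)) {
    throw new Error('Scroll direction must be "up" or "down"');
  }
  return payload;
}

const reply = [
  FENCE + "json",
  '{ "briefExplanation": "Typing a query.", "nextAction": { "action": "type", "element": 7, "text": "random song" } }',
  FENCE,
].join("\n");
const parsed = parseAgentResponse(reply);
console.log(parsed.nextAction.action, parsed.nextAction.element);
```

Rejecting malformed replies before dispatch keeps a misparsed answer from turning into an unintended click or keystroke. A caller could catch the error and re-prompt the model instead.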

Owner

Name: Liangyu Chen
Login: cliangyu
Kind: user
Location: Singapore
Company: Nanyang Technological University

Website: cliangyu.com
Twitter: cliangyu_
Repositories: 1
Profile: https://github.com/cliangyu

Committers

Last synced: 11 months ago

All Time

Total Commits: 8
Total Committers: 1
Avg Commits per committer: 8.0
Development Distribution Score (DDS): 0.0

Past Year

Commits: 0
Committers: 0
Avg Commits per committer: 0.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Dillon DuPont	d**t@m**u	8

Committer Domains (Top 20 + Academic)

mit.edu: 1

Issues and Pull Requests

Last synced: 11 months ago

All Time

Total issues: 0
Total pull requests: 1
Average time to close issues: N/A
Average time to close pull requests: less than a minute
Total issue authors: 0
Total pull request authors: 1
Average comments per issue: 0
Average comments per pull request: 0.0
Merged pull requests: 1
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0
