https://github.com/ahwang16/grounded-intuition-gpt-vision
Resources for Grounded Intuition of GPT-Vision's Abilities with Scientific Images
Science Score: 20.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: links to arxiv.org
- ✓ Committers with academic emails: 1 of 1 committers (100.0%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (10.7%) to scientific vocabulary
Repository
Statistics
- Stars: 4
- Watchers: 1
- Forks: 0
- Open Issues: 1
- Releases: 0
Metadata Files
README.md

Overview
This is the GitHub repository for my recent article, Grounded Intuition of GPT-Vision's Abilities with Scientific Images.
A Colab notebook for running GPT-Vision on the API is now available.
This paper contributes:
- an in-depth qualitative analysis of the passages GPT-Vision generates for images from scientific papers,
- a formalized procedure for qualitative analysis based on grounded theory and thematic analysis in social science/HCI literature, and
- our images and generated passages for further research and reproducibility.
We used two prompts to generate passages for each image:
- Write alt text to describe this <type>.
- Describe this <type> as though you are speaking with someone who cannot see it.
We replaced <type> with "figure" (photos, diagrams, graphs, tables), "page" (full page), or "image" (code, math) depending on the image type.
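The substitution above is simple enough to sketch in a few lines of Python. This is a hypothetical illustration, not code from the repository: the category labels (`photo`, `diagram`, and so on), the `TYPE_WORD` and `TEMPLATES` tables, and the `build_prompts` helper are assumptions. Only the two prompt templates and the figure/page/image mapping come from the text; the `alt` and `desc` keys mirror the prompt names used in the passage file names.

```python
# Hypothetical sketch: build the two prompts used for each image.
# Category labels are illustrative, not identifiers from the repository.

# Word substituted for <type>, keyed by image category.
TYPE_WORD = {
    "photo": "figure",
    "diagram": "figure",
    "graph": "figure",
    "table": "figure",
    "page": "page",
    "code": "image",
    "math": "image",
}

# The two prompt templates, keyed by prompt name.
TEMPLATES = {
    "alt": "Write alt text to describe this <type>.",
    "desc": "Describe this <type> as though you are speaking with someone who cannot see it.",
}


def build_prompts(category: str) -> dict[str, str]:
    """Return both prompts with <type> filled in for the given image category."""
    word = TYPE_WORD[category]
    return {name: template.replace("<type>", word) for name, template in TEMPLATES.items()}


if __name__ == "__main__":
    for name, prompt in build_prompts("code").items():
        print(f"{name}: {prompt}")
```

For example, `build_prompts("code")["alt"]` yields `"Write alt text to describe this image."`.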
The images can be found in the images directory. Each file is named with the following convention:
<type>_<id>_<short-description>.png
with decimals in image IDs replaced by hyphens. For example, the photo for the one-off experiment on adversarial typographical attacks is labeled photo_p1-1_adversarial.png.
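Because the convention is regular, a file name can be split back into its parts. The following is a minimal sketch, assuming that neither the type nor the ID contains an underscore and that every hyphen in an ID stands for a decimal point; `parse_image_name` and `ImageName` are hypothetical names, not part of the repository.

```python
import re
from typing import NamedTuple


class ImageName(NamedTuple):
    image_type: str
    image_id: str      # hyphens converted back to decimal points
    description: str


# <type>_<id>_<short-description>.png
# Assumption: type and ID contain no underscores; the description may.
PATTERN = re.compile(r"^(?P<type>[^_]+)_(?P<id>[^_]+)_(?P<desc>.+)\.png$")


def parse_image_name(filename: str) -> ImageName:
    """Split an image file name into its type, original ID, and description."""
    match = PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename}")
    return ImageName(match["type"], match["id"].replace("-", "."), match["desc"])
```

Applied to the example above, `parse_image_name("photo_p1-1_adversarial.png")` returns `ImageName(image_type='photo', image_id='p1.1', description='adversarial')`.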
The generated passages for each prompt and image are located in the generated_passages directory and follow a similar naming convention, with the prompt name appended at the end. The passages for photo_p1-1_adversarial.png can be found in photo_p1-1_adversarial_alt.png and photo_p1-1_adversarial_desc.png.
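The mapping from an image to its passages can be sketched as below. This is a hypothetical helper, not repository code: it assumes the passage files live in a `generated_passages` directory, use the prompt names `alt` and `desc`, and keep the image's file extension exactly as in the example above. If the passages are stored in another format, the suffix would need adjusting.

```python
from pathlib import Path

# Prompt names appended to the image file name (from the README's example).
PROMPT_NAMES = ("alt", "desc")


def passage_paths(image_filename: str, passage_dir: str = "generated_passages") -> dict[str, Path]:
    """Map each prompt name to the expected passage file for one image.

    Assumption: the passage keeps the image's extension, mirroring the
    README's example; change image.suffix if the files use another format.
    """
    image = Path(image_filename)
    return {
        name: Path(passage_dir) / f"{image.stem}_{name}{image.suffix}"
        for name in PROMPT_NAMES
    }
```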
We're in the news!
- As OpenAI's Multimodal API Launches Broadly, Research Shows It's Still Flawed, TechCrunch
- ChatGPT-Maker OpenAI Hosts its First Big Tech Showcase as the AI Startup Faces Growing Competition, Associated Press
Suggested citation
If you would like to cite the paper or repository, you can use:
```bibtex
@misc{hwang_grounded_2023,
  title={Grounded Intuition of GPT-Vision's Abilities with Scientific Images},
  author={Alyssa Hwang and Andrew Head and Chris Callison-Burch},
  year={2023},
  eprint={2311.02069},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```
Owner
- Name: Alyssa Hwang
- Login: ahwang16
- Kind: user
- Company: University of Pennsylvania
- Website: alyssahwang.com
- Repositories: 2
- Profile: https://github.com/ahwang16
Committers
Last synced: 8 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Alyssa Hwang | a****6@s****u | 12 |
Issues and Pull Requests
Last synced: 8 months ago
All Time
- Total issues: 1
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 1
- Total pull request authors: 0
- Average comments per issue: 0.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- ahwang16 (1)