https://github.com/claromes/toolazytowritealt

alt text for lazy people

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.3%) to scientific vocabulary

Keywords

blip-2 clarifai language-model llm-hackathon streamlit vision-and-language
Last synced: 5 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: claromes
  • License: gpl-3.0
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 4.46 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Archived
Created over 2 years ago · Last pushed over 2 years ago
Metadata Files
Readme License Roadmap

README.md

🦥 too lazy to write alt

Streamlit LLM Hackathon general-english-image-caption-blip-2-6_7B

Generate and translate alt text using vision-language pre-training (VLP) and a large language model (LLM).

too lazy... is mobile-friendly, allows multiple images to be uploaded via URL or directly, offers translation into multiple languages, and includes a "copy to clipboard" button for each generated alt text.

Model

The BLIP-2 OPT 6.7B model is fine-tuned for image captioning. It combines the ViT-g image encoder with the OPT language model (6.7 billion parameters). The prompt "a photo of" is fed to the language model as the initial input, and the model is trained to generate the caption with a language modeling loss.

BLIP-2 is a resource-efficient approach to vision-language pre-training (VLP) that builds on frozen pretrained image encoders and frozen large language models (LLMs) such as OPT and FlanT5.

Limitations

The BLIP-2 image captioning model inherits the limitations and risks of language models, such as outputting offensive language, propagating social bias, or leaking private information. Its output can also be unsatisfactory for several reasons, including inaccurate knowledge from the language model, activation of an incorrect reasoning path, or a lack of up-to-date information about new image content. Performance is further limited by the quality and diversity of the training data and by how well the model generalizes to unseen images and captions. Remediation approaches include using instructions to guide generation, training on a filtered dataset with harmful content removed, and fine-tuning the model on a specific domain or task.

Evaluation

BLIP-2 achieves state-of-the-art performance on various vision-language tasks while training significantly fewer parameters during pre-training than other vision-language pre-training methods.


Known Issue

The model takes some time to load, which degrades the user experience. As a temporary workaround, the app displays a message that tells users the model is still loading and asks them to request the alt text again.
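
The app itself only shows a message and leaves the retry to the user. As an alternative, the retry could be automated. The following is a minimal sketch of that idea, not the repository's implementation: `generate_with_retry` and the `generate` callable are hypothetical names, and `generate` stands in for whatever function sends the image to the captioning model.

```python
import time


def generate_with_retry(generate, image, attempts=3, delay_seconds=2.0, sleep=time.sleep):
    """Call generate(image), retrying when it raises or returns nothing.

    `generate` is a hypothetical callable standing in for the captioning
    request; it is not a function from this repository.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            caption = generate(image)
            if caption:
                return caption
        except Exception as error:  # broad on purpose: this is only a sketch
            last_error = error
        if attempt < attempts:
            sleep(delay_seconds * attempt)  # wait longer after each failure
    raise RuntimeError(f"no caption after {attempts} attempts") from last_error
```

The `sleep` parameter is injectable so the wait can be skipped in tests. A real app would probably catch only the specific error raised while the model is loading.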

Specs

Supported Formats

PNG, JPG, JFIF, TIFF, BMP, WEBP, JPEG, TIF

Upload limits

  • Up to 128 images
  • Up to 20 MB per file
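
The format and upload limits above can be expressed as a small pre-upload check. This is a sketch under stated assumptions rather than the app's actual code: the name `validate_uploads` is hypothetical, each file is represented as a `(filename, size_in_bytes)` pair, and the 20 MB limit is interpreted as 20 × 1024 × 1024 bytes.

```python
from pathlib import Path

# Limits and formats copied from the Specs section.
SUPPORTED_EXTENSIONS = {"png", "jpg", "jfif", "tiff", "bmp", "webp", "jpeg", "tif"}
MAX_IMAGES = 128
MAX_FILE_BYTES = 20 * 1024 * 1024  # assumption: "20 MB" means 20 MiB


def validate_uploads(files):
    """Return a list of problems for (filename, size_in_bytes) pairs.

    An empty list means the batch passes every check.
    """
    problems = []
    if len(files) > MAX_IMAGES:
        problems.append(f"too many images: {len(files)} > {MAX_IMAGES}")
    for name, size in files:
        # Compare extensions case-insensitively and without the leading dot.
        ext = Path(name).suffix.lower().lstrip(".")
        if ext not in SUPPORTED_EXTENSIONS:
            problems.append(f"{name}: unsupported format")
        if size > MAX_FILE_BYTES:
            problems.append(f"{name}: larger than 20 MB")
    return problems
```

For example, `validate_uploads([("cat.PNG", 1024)])` returns an empty list, while a `.pdf` file or a file over the size limit each add one entry to the result.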

Development

Requirements

  • Python 3.8+
  • Clarifai PAT (personal access token)

Installation

$ git clone git@github.com:claromes/toolazytowritealt.git

$ cd toolazytowritealt

$ pip install -r requirements.txt

Create a .streamlit/secrets.toml file in the project directory and add the line PAT='YOUR_PAT_GOES_HERE', replacing the placeholder with your Clarifai PAT.

$ streamlit run app.py

The app will be served at http://localhost:8501
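
The secrets file from the installation steps can also be created from Python. The sketch below is only an illustration: `write_secrets` is a hypothetical helper that is not part of this repository, and it assumes the token contains no single quotes or newlines, so it can be written as a TOML literal string.

```python
from pathlib import Path


def write_secrets(project_root, pat):
    """Create .streamlit/secrets.toml under project_root with a PAT entry."""
    if "'" in pat or "\n" in pat:
        raise ValueError("PAT must not contain single quotes or newlines")
    secrets_path = Path(project_root) / ".streamlit" / "secrets.toml"
    # Create the .streamlit directory if it does not exist yet.
    secrets_path.parent.mkdir(parents=True, exist_ok=True)
    secrets_path.write_text(f"PAT='{pat}'\n", encoding="utf-8")
    return secrets_path
```

Streamlit exposes values from this file through `st.secrets`, so the app can presumably read the token as `st.secrets["PAT"]`. The file holds a credential and should stay out of version control.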

License

GNU General Public License v3.0

Owner

  • Login: claromes
  • Kind: user

Committers

Last synced: over 1 year ago

All Time
  • Total Commits: 46
  • Total Committers: 2
  • Avg Commits per committer: 23.0
  • Development Distribution Score (DDS): 0.043
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Claromes c****s@h****m 44
claromes c****a@h****m 2
Committer Domains (Top 20 + Academic)
hey.com: 2
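
The DDS values above are consistent with the commonly used definition of the score as one minus the top committer's share of all commits. The page does not state the formula, so this is an assumption, and the function name is hypothetical:

```python
def development_distribution_score(commits_per_committer):
    """Return 1 - (top committer's commits / total commits), or 0.0 with no commits."""
    total = sum(commits_per_committer)
    if total == 0:
        return 0.0
    return 1 - max(commits_per_committer) / total


# All-time figures from the table: 44 and 2 commits.
print(round(development_distribution_score([44, 2]), 3))  # 0.043
```

A score near 0 means one committer wrote almost all of the commits. An empty list, as in the past-year figures, gives 0.0.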

Issues and Pull Requests

Last synced: over 1 year ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0

Dependencies

requirements.txt pypi
  • clarifai ==9.8.0
  • googletrans ==4.0.0
  • streamlit ==1.26.0