Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found codemeta.json file)
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (12.2%) to scientific vocabulary
Repository
Repo of HawkLlama.
Basic Info
Statistics
- Stars: 16
- Watchers: 3
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
# HawkLlama
[🤗**Huggingface Model**](https://huggingface.co/AIM-ZJU/HawkLlama_8b) | [🗂️**Github**](https://github.com/aim-uofa/VLModel) | [📖**Technical Report**](assets/technical_report.pdf) | [🎮️**Demo**](http://115.236.57.99:30020/)
Zhejiang University, China
This is the official implementation of HawkLlama, an open-source multimodal large language model designed for real-world vision and language understanding applications. Our model features the following highlights.
HawkLlama-8B is constructed utilizing:
- Llama3-8B, the latest open-source large language model, trained on over 15 trillion tokens.
- SigLIP, an enhancement over CLIP employing sigmoid loss, which achieves superior performance in image recognition.
- An efficient vision-language connector that captures high-resolution details without increasing the number of visual tokens, which reduces the training overhead associated with high-resolution images.
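To make the "sigmoid loss" mentioned in the SigLIP bullet concrete, here is a minimal pure-Python sketch of a pairwise sigmoid loss: every image-text pair is treated as an independent binary classification, with matching pairs labeled +1 and all other pairs labeled -1. This is an illustration of the general idea, not code from HawkLlama or from the SigLIP release; the function names and the `temperature` and `bias` defaults are assumptions chosen for the example.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def pairwise_sigmoid_loss(image_emb, text_emb, temperature=10.0, bias=-10.0):
    """Sum the binary log-loss over all image-text pairs, averaged over images.

    image_emb and text_emb are lists of L2-normalized vectors; row i of each
    list is assumed to be a matching pair, and every other combination is
    treated as a negative pair.
    """
    n = len(image_emb)
    total = 0.0
    for i, img in enumerate(image_emb):
        for j, txt in enumerate(text_emb):
            similarity = sum(a * b for a, b in zip(img, txt))
            logit = temperature * similarity + bias
            label = 1.0 if i == j else -1.0
            total -= log_sigmoid(label * logit)
    return total / n


# Two toy image/text pairs: aligned embeddings should score a lower loss
# than the same embeddings with the texts swapped.
images = [[1.0, 0.0], [0.0, 1.0]]
texts_aligned = [[1.0, 0.0], [0.0, 1.0]]
texts_swapped = [[0.0, 1.0], [1.0, 0.0]]
print(pairwise_sigmoid_loss(images, texts_aligned))
print(pairwise_sigmoid_loss(images, texts_swapped))
```

Unlike a softmax-based contrastive loss, each pair contributes on its own, so no normalization across the batch is needed.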
For model training, we use the LLaVA-Pretrain dataset for pretraining and a mixed dataset curated for instruction tuning, which contains both multimodal and language-only data for supervised fine-tuning.
HawkLlama-8B is developed on the NeMo framework, which supports 3D parallelism and leaves room to scale in future extensions.
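As a rough illustration of what 3D parallelism implies for a GPU budget: it combines tensor, pipeline, and data parallelism, and the three degrees multiply to the total number of GPUs. The sketch below derives the data-parallel degree from the other two. The function name and the example values are hypothetical and do not reflect the configuration used to train HawkLlama-8B.

```python
def data_parallel_size(world_size: int, tensor_parallel: int, pipeline_parallel: int) -> int:
    """Return the data-parallel degree implied by a GPU count and TP/PP degrees."""
    model_parallel = tensor_parallel * pipeline_parallel
    if world_size % model_parallel != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by "
            f"tensor_parallel * pipeline_parallel = {model_parallel}"
        )
    return world_size // model_parallel


# Illustrative values only: 8 GPUs split as 2-way tensor and 2-way pipeline
# parallelism leave 2 data-parallel replicas.
print(data_parallel_size(world_size=8, tensor_parallel=2, pipeline_parallel=2))  # 2
```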
Our model is open-source and reproducible. Please check our technical report for more details.
Setup
Create the environment and activate it.
```Shell
conda create -n hawkllama python=3.10 -y
conda activate hawkllama
```
Clone and install this repo.
```Shell
git clone https://github.com/aim-uofa/VLModel.git
cd VLModel
pip install -e .
pip install -e third_party/VLMEvalKit
```
Model Weights
Please refer to our HuggingFace repository to download the pretrained model weights.
Inference
We provide example code for inference.
```Python
import torch
from PIL import Image
from HawkLlama.model import LlavaNextProcessor, LlavaNextForConditionalGeneration
from HawkLlama.utils.conversation import conv_llava_llama_3, DEFAULT_IMAGE_TOKEN

# Load the processor and the model weights from the HuggingFace Hub.
processor = LlavaNextProcessor.from_pretrained("AIM-ZJU/HawkLlama_8b")
model = LlavaNextForConditionalGeneration.from_pretrained(
    "AIM-ZJU/HawkLlama_8b", torch_dtype=torch.bfloat16, low_cpu_mem_usage=True
)
model.to("cuda:0")

# Load the input image as RGB.
image_file = "assets/coin.png"
image = Image.open(image_file).convert('RGB')

# Prepend the image placeholder token to the question.
prompt = "what coin is that?"
prompt = DEFAULT_IMAGE_TOKEN + "\n" + prompt

# Build a conversation with the user turn and an empty assistant turn to be generated.
conversation = conv_llava_llama_3.copy()
user_role_ind = 0
bot_role_ind = 1
conversation.append_message(conversation.roles[user_role_ind], prompt)
conversation.append_message(conversation.roles[bot_role_ind], "")
prompt = conversation.get_prompt()

# Tokenize, move the inputs to the GPU, and match the model's bfloat16 precision.
inputs = processor(prompt, image, return_tensors="pt").to("cuda:0")
inputs['pixel_values'] = inputs['pixel_values'].to(torch.bfloat16)
output = model.generate(**inputs, eos_token_id=processor.tokenizer.eos_token_id, max_new_tokens=2048, do_sample=False, use_cache=True)

print(processor.decode(output[0], skip_special_tokens=True))
```
Evaluation
The evaluation code is adapted from the VLMEvalKit codebase.
```bash
# single gpu
python third_party/VLMEvalKit/run.py --data MMBench_DEV_EN MMMU_DEV_VAL SEEDBench_IMG --model hawkllama_llama3_vlm --verbose
# multi-gpus
torchrun --nproc-per-node=8 third_party/VLMEvalKit/run.py --data MMBench_DEV_EN MMMU_DEV_VAL SEEDBench_IMG --model hawkllama_llama3_vlm --verbose
```
The results are shown below:
| Benchmark       | Our Method | LLaVA-Llama3-v1.1 | LLaVA-Next |
|-----------------|------------|-------------------|------------|
| MMMU val        | 37.8       | 36.8              | 36.9       |
| SEEDBench img   | 71.0       | 70.1              | 70.0       |
| MMBench-EN dev  | 70.6       | 70.4              | 68.0       |
| MMBench-CN dev  | 64.4       | 64.2              | 60.6       |
| CCBench         | 33.9       | 31.6              | 24.7       |
| AI2D test       | 65.6       | 70.0              | 67.1       |
| ScienceQA test  | 76.1       | 72.9              | 70.4       |
| HallusionBench  | 41.0       | 47.7              | 35.2       |
| MMStar          | 43.0       | 45.1              | 38.1       |
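
For a quick read of the results table, the short script below tallies which model has the highest score on each benchmark. The scores are copied from the table; the `RESULTS`, `MODELS`, and `leaders` names are only illustrative helpers and are not part of the repository.

```python
# Scores copied from the results table, in the order given by MODELS.
MODELS = ("Our Method", "LLaVA-Llama3-v1.1", "LLaVA-Next")
RESULTS = {
    "MMMU val": (37.8, 36.8, 36.9),
    "SEEDBench img": (71.0, 70.1, 70.0),
    "MMBench-EN dev": (70.6, 70.4, 68.0),
    "MMBench-CN dev": (64.4, 64.2, 60.6),
    "CCBench": (33.9, 31.6, 24.7),
    "AI2D test": (65.6, 70.0, 67.1),
    "ScienceQA test": (76.1, 72.9, 70.4),
    "HallusionBench": (41.0, 47.7, 35.2),
    "MMStar": (43.0, 45.1, 38.1),
}


def leaders(results):
    """Map each benchmark to the model with the highest score."""
    best = {}
    for benchmark, scores in results.items():
        top_index = max(range(len(scores)), key=lambda k: scores[k])
        best[benchmark] = MODELS[top_index]
    return best


best = leaders(RESULTS)
for model in MODELS:
    won = [b for b, m in best.items() if m == model]
    print(f"{model}: leads on {len(won)} of {len(RESULTS)}: {', '.join(won) or 'none'}")
```

On these numbers, HawkLlama leads on six of the nine benchmarks, while LLaVA-Llama3-v1.1 leads on AI2D test, HallusionBench, and MMStar.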
Training
See train with NeMo.
Demo
Welcome to try our demo!
License
For non-commercial academic use, this project is licensed under the 2-clause BSD License. For commercial use, please contact Chunhua Shen.
Acknowledgements
We express our appreciation to the following projects for their outstanding contributions in academia and code development: LLaVA, NeMo, VLMEvalKit and xtuner.
Owner
- Name: Advanced Intelligent Machines (AIM)
- Login: aim-uofa
- Kind: organization
- Location: China
- Repositories: 23
- Profile: https://github.com/aim-uofa
A research team at Zhejiang University, focusing on Computer Vision and broad AI research ...
GitHub Events
Total
- Watch event: 6
- Push event: 1
- Fork event: 1
Last Year
- Watch event: 6
- Push event: 1
- Fork event: 1
Issues and Pull Requests
Last synced: about 1 year ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Dependencies
- accelerate *
- einops *
- huggingface_hub *
- matplotlib *
- numpy ==1.23.4
- omegaconf *
- openai ==1.3.5
- opencv-python >=4.4.0.46
- openpyxl *
- pandas >=1.5.3
- pillow *
- portalocker *
- protobuf *
- pycocoevalcap *
- python-dotenv *
- requests *
- rich *
- seaborn *
- sentencepiece *
- sty *
- tabulate *
- tiktoken *
- timeout-decorator *
- torch ==2.1.2
- tqdm *
- transformers *
- typing_extensions ==4.7.1
- validators *
- visual_genome *
- xlsxwriter *
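
Several of the dependencies above are pinned to exact versions. The following is a small sketch, using only the standard library, of how one might check an environment against those pins. The `PINNED` table is copied from the list; `check_pins` is a hypothetical helper, not something shipped with the repository. Local build suffixes such as `+cu121` are ignored when comparing.

```python
from importlib.metadata import PackageNotFoundError, version

# Exact pins copied from the dependency list above.
PINNED = {
    "numpy": "1.23.4",
    "openai": "1.3.5",
    "torch": "2.1.2",
    "typing_extensions": "4.7.1",
}


def check_pins(pins, get_version=version):
    """Return {package: (expected, installed_or_None)} for each unsatisfied pin."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None
        # Drop a local version suffix (e.g. "2.1.2+cu121") before comparing.
        public = installed.split("+")[0] if installed is not None else None
        if public != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    problems = check_pins(PINNED)
    for name, (expected, installed) in problems.items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
    if not problems:
        print("All pinned packages match.")
```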