https://github.com/bytedance/salmonn
SALMONN family: A suite of advanced multi-modal LLMs
Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ✓ Committers with academic emails: 3 of 6 committers (50.0%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (10.6%) to scientific vocabulary
Repository
SALMONN family: A suite of advanced multi-modal LLMs
Basic Info
- Host: GitHub
- Owner: bytedance
- License: apache-2.0
- Default Branch: main
- Homepage: https://bytedance.github.io/SALMONN/
- Size: 58.4 MB
Statistics
- Stars: 1,304
- Watchers: 27
- Forks: 101
- Open Issues: 20
- Releases: 0
Metadata Files
README.md
SALMONN family: A suite of advanced multi-modal LLMs

🚀🚀 Welcome to the repo of SALMONN!
The SALMONN model family consists of a series of advanced multi-modal large language models. For more details, please refer to the corresponding branches.
- [ICML 2025] video-SALMONN-o1
- video-SALMONN 2
- [ICASSP 2025 & ACL 2025] SALMONN for speech quality assessment
- [ICML 2024] video-SALMONN
- [ICLR 2024] SALMONN
🔥 News
- [2025-07-08] We have open-sourced video-SALMONN 2! video-SALMONN 2 is a powerful audio-visual LLM that generates high-quality audio-visual video captions and achieves competitive performance on general video QA benchmarks.
- [2025-06-01] We have open-sourced the QualiSpeech dataset, a speech quality assessment dataset with natural language reasoning. You can use QualiSpeech to develop your own audio LLM for speech quality assessment or to evaluate the low-level speech perception capabilities of existing audio LLMs. Feel free to download it here!
- [2025-03-03] We have released the data processing scripts and finetuned model checkpoints for SALMONN for speech quality assessment! See here!
- [2024-09-04] We have released the model and inference code for video-SALMONN! See here!
- [2024-05-28] 🧳 We have released all the annotations (including 600k SQA/AQA data and 50k audio-based storytelling data) for the 3-stage training of SALMONN! Feel free to download them here!
- [2024-04-07] 🤖 We have released all the code you need to train your own SALMONN! Try some cool things!
- [2024-01-16] 💖 Our paper was accepted by ICLR 2024!
- [2023-11-13] 🎁 We have released a 7B version of SALMONN at tsinghua-ee/SALMONN-7B and built the 7B demo here!
- [2023-10-08] ✨ We have released the model checkpoint and the inference code for SALMONN-13B!
📖 Paper List
```
@inproceedings{sun2025videosalmonno1,
  title={{video-SALMONN-o1}: Reasoning-enhanced Audio-visual Large Language Model},
  author={Guangzhi Sun and Yudong Yang and Jimin Zhuang and Changli Tang and Yixuan Li and Wei Li and Zejun MA and Chao Zhang},
  booktitle={ICML},
  year={2025}
}

@article{tang2025video,
  title={{video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models}},
  author={Changli Tang and Yixuan Li and Yudong Yang and Jimin Zhuang and Guangzhi Sun and Wei Li and Zejun Ma and Chao Zhang},
  journal={arXiv preprint arXiv:2506.15220},
  year={2025},
}

@inproceedings{wang2024enabling,
  title={Enabling Auditory Large Language Models for Automatic Speech Quality Evaluation},
  author={Wang, Siyin and Yu, Wenyi and Yang, Yudong and Tang, Changli and Li, Yixuan and Zhuang, Jimin and Chen, Xianzhao and Tian, Xiaohai and Zhang, Jun and Sun, Guangzhi and others},
  booktitle={Proc. ICASSP},
  address={Hyderabad},
  year={2025}
}

@inproceedings{wang2025qualispeech,
  title={QualiSpeech: A Speech Quality Assessment Dataset with Natural Language Reasoning and Descriptions},
  author={Wang, Siyin and Yu, Wenyi and Chen, Xianzhao and Tian, Xiaohai and Zhang, Jun and Sun, Guangzhi and others},
  booktitle={Proc. ACL},
  address={Vienna},
  year={2025}
}

@inproceedings{sun2024videosalmonn,
  title={video-{SALMONN}: Speech-Enhanced Audio-Visual Large Language Models},
  author={Guangzhi Sun and Wenyi Yu and Changli Tang and Xianzhao Chen and Tian Tan and Wei Li and Lu Lu and Zejun MA and Yuxuan Wang and Chao Zhang},
  booktitle={Forty-first International Conference on Machine Learning},
  year={2024},
  url={https://openreview.net/forum?id=nYsh5GFIqX}
}

@inproceedings{tang2024salmonn,
  title={SALMONN: Towards Generic Hearing Abilities for Large Language Models},
  author={Changli Tang and Wenyi Yu and Guangzhi Sun and Xianzhao Chen and Tian Tan and Wei Li and Lu Lu and Zejun MA and Chao Zhang},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024},
  url={https://openreview.net/forum?id=14rn7HpKVk}
}
```
Owner
- Name: Bytedance Inc.
- Login: bytedance
- Kind: organization
- Location: Singapore
- Website: https://opensource.bytedance.com
- Twitter: ByteDanceOSS
- Repositories: 255
- Profile: https://github.com/bytedance
GitHub Events
Total
- Create event: 5
- Commit comment event: 1
- Issues event: 52
- Watch event: 264
- Delete event: 3
- Member event: 1
- Issue comment event: 59
- Push event: 23
- Pull request review event: 1
- Pull request event: 17
- Fork event: 27
Last Year
- Create event: 5
- Commit comment event: 1
- Issues event: 52
- Watch event: 264
- Delete event: 3
- Member event: 1
- Issue comment event: 59
- Push event: 23
- Pull request review event: 1
- Pull request event: 17
- Fork event: 27
Committers
Last synced: 9 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Changli Tang | 8****6 | 26 |
| Yu-Doit | 5****t | 14 |
| Brian Sun | g****4@n****k | 8 |
| Brian Sun | g****4@c****k | 5 |
| chan-ming | 6****g | 4 |
| tangchangli | t****i@b****m | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 78
- Total pull requests: 40
- Average time to close issues: 23 days
- Average time to close pull requests: 20 days
- Total issue authors: 61
- Total pull request authors: 19
- Average comments per issue: 1.18
- Average comments per pull request: 0.3
- Merged pull requests: 19
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 36
- Pull requests: 22
- Average time to close issues: 8 days
- Average time to close pull requests: about 8 hours
- Issue authors: 30
- Pull request authors: 7
- Average comments per issue: 0.86
- Average comments per pull request: 0.0
- Merged pull requests: 8
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- cathyliucx (4)
- yt605155624 (4)
- SaraAlthubaiti (3)
- David19970306 (3)
- zhanghanweii (3)
- peggyxpxu (2)
- deniro21 (2)
- mohitd404 (2)
- Dinxin (2)
- qixueweigitbub (2)
- URRealHero (1)
- JustinYuu (1)
- tuanad121 (1)
- andeyeluguo (1)
- ridingmower (1)
Pull Request Authors
- BriansIDP (7)
- shubham-gupta-30 (4)
- hawkoli1987 (4)
- TCL606 (4)
- apu52 (3)
- teinhonglo (2)
- mohitd404 (2)
- chan-ming (2)
- alienishi (2)
- ayushrakesh (1)
- cotitan (1)
- HimanshuMahto (1)
- eltociear (1)
- Killer2OP (1)
- denglelaibh (1)