https://github.com/bytedance/salmonn

SALMONN family: A suite of advanced multi-modal LLMs

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    3 of 6 committers (50.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.6%) to scientific vocabulary

Keywords

audio audio-processing audio-visual-understanding bytedance iclr2024 icml-2024 large-language-models multi-modal music research speech speech-recognition tsinghua-university video video-understanding
Last synced: 5 months ago

Repository

Statistics
  • Stars: 1,304
  • Watchers: 27
  • Forks: 101
  • Open Issues: 20
  • Releases: 0
Created over 2 years ago · Last pushed 6 months ago
Metadata Files
  • Readme
  • License
  • Code of conduct

README.md

SALMONN family: A suite of advanced multi-modal LLMs

🚀🚀 Welcome to the repo of SALMONN!

The SALMONN model family consists of a series of advanced multi-modal large language models. For more details, please refer to the corresponding branches.

🔥 News

  • [2025-07-08] We have open-sourced video-SALMONN 2! video-SALMONN 2 is a powerful audio-visual LLM that generates high-quality audio-visual video captions and achieves competitive performance on general video QA benchmarks.
  • [2025-06-01] We have open-sourced the QualiSpeech dataset, a speech quality assessment dataset with natural language reasoning. You can use QualiSpeech to develop your own audio LLM for speech quality assessment or to evaluate the low-level speech perception capabilities of existing audio LLMs. Feel free to download it here!
  • [2025-03-03] We have released the data processing scripts and finetuned model checkpoints for SALMONN for speech quality assessment! See here!
  • [2024-09-04] We have released the model and inference code for video-SALMONN! See here!
  • [2024-05-28] 🧳 We have released all the annotations (including 600k SQA/AQA data and 50k audio-based storytelling data) for the 3-stage training of SALMONN! Feel free to download them here!
  • [2024-04-07] 🤖 We have released all the code you need to train your own SALMONN! Try some cool things!
  • [2024-01-16] 💖 Our paper was accepted by ICLR 2024!
  • [2023-11-13] 🎁 We have released a 7B version of SALMONN at tsinghua-ee/SALMONN-7B and built the 7B demo here!
  • [2023-10-08] ✨ We have released the model checkpoint and the inference code for SALMONN-13B!

📖 Paper List

```
@inproceedings{sun2025videosalmonno1,
  title={{video-SALMONN-o1}: Reasoning-enhanced Audio-visual Large Language Model},
  author={Guangzhi Sun and Yudong Yang and Jimin Zhuang and Changli Tang and Yixuan Li and Wei Li and Zejun MA and Chao Zhang},
  booktitle={ICML},
  year={2025}
}

@article{tang2025video,
  title={{video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models}},
  author={Changli Tang and Yixuan Li and Yudong Yang and Jimin Zhuang and Guangzhi Sun and Wei Li and Zejun Ma and Chao Zhang},
  journal={arXiv preprint arXiv:2506.15220},
  year={2025}
}

@inproceedings{wang2024enabling,
  title={Enabling Auditory Large Language Models for Automatic Speech Quality Evaluation},
  author={Wang, Siyin and Yu, Wenyi and Yang, Yudong and Tang, Changli and Li, Yixuan and Zhuang, Jimin and Chen, Xianzhao and Tian, Xiaohai and Zhang, Jun and Sun, Guangzhi and others},
  booktitle={Proc. ICASSP},
  address={Hyderabad},
  year={2025}
}

@inproceedings{wang2025qualispeech,
  title={QualiSpeech: A Speech Quality Assessment Dataset with Natural Language Reasoning and Descriptions},
  author={Wang, Siyin and Yu, Wenyi and Chen, Xianzhao and Tian, Xiaohai and Zhang, Jun and Sun, Guangzhi and others},
  booktitle={Proc. ACL},
  address={Vienna},
  year={2025}
}

@inproceedings{sun2024videosalmonn,
  title={video-{SALMONN}: Speech-Enhanced Audio-Visual Large Language Models},
  author={Guangzhi Sun and Wenyi Yu and Changli Tang and Xianzhao Chen and Tian Tan and Wei Li and Lu Lu and Zejun MA and Yuxuan Wang and Chao Zhang},
  booktitle={Forty-first International Conference on Machine Learning},
  year={2024},
  url={https://openreview.net/forum?id=nYsh5GFIqX}
}

@inproceedings{tang2024salmonn,
  title={SALMONN: Towards Generic Hearing Abilities for Large Language Models},
  author={Changli Tang and Wenyi Yu and Guangzhi Sun and Xianzhao Chen and Tian Tan and Wei Li and Lu Lu and Zejun MA and Chao Zhang},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024},
  url={https://openreview.net/forum?id=14rn7HpKVk}
}
```

Owner

  • Name: Bytedance Inc.
  • Login: bytedance
  • Kind: organization
  • Location: Singapore

GitHub Events

Total
  • Create event: 5
  • Commit comment event: 1
  • Issues event: 52
  • Watch event: 264
  • Delete event: 3
  • Member event: 1
  • Issue comment event: 59
  • Push event: 23
  • Pull request review event: 1
  • Pull request event: 17
  • Fork event: 27
Last Year
  • Create event: 5
  • Commit comment event: 1
  • Issues event: 52
  • Watch event: 264
  • Delete event: 3
  • Member event: 1
  • Issue comment event: 59
  • Push event: 23
  • Pull request review event: 1
  • Pull request event: 17
  • Fork event: 27

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 58
  • Total Committers: 6
  • Avg Commits per committer: 9.667
  • Development Distribution Score (DDS): 0.552
Past Year
  • Commits: 34
  • Committers: 4
  • Avg Commits per committer: 8.5
  • Development Distribution Score (DDS): 0.5
Top Committers
Name (email): commits
  • Changli Tang (8****6): 26
  • Yu-Doit (5****t): 14
  • Brian Sun (g****4@n****k): 8
  • Brian Sun (g****4@c****k): 5
  • chan-ming (6****g): 4
  • tangchangli (t****i@b****m): 1
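
The all-time figures above can be reproduced from the per-committer counts. The sketch below assumes that the Development Distribution Score is one minus the top committer's share of all commits. The page does not state this definition, but it matches the listed 0.552. The function name `commit_stats` and the dictionary labels are hypothetical, and the two "Brian Sun" rows are treated as separate committers because they use different email identities.

```python
def commit_stats(commits_by_committer):
    """Return (total commits, committers, average per committer, DDS).

    DDS is ASSUMED to be 1 - (top committer's commits / total commits);
    this definition is inferred from the listed numbers, not stated by the page.
    """
    counts = list(commits_by_committer.values())
    total = sum(counts)
    committers = len(counts)
    average = total / committers
    dds = 1 - max(counts) / total
    return total, committers, round(average, 3), round(dds, 3)


# Commit counts copied from the Top Committers list (labels are illustrative).
all_time = {
    "Changli Tang": 26,
    "Yu-Doit": 14,
    "Brian Sun (first email)": 8,
    "Brian Sun (second email)": 5,
    "chan-ming": 4,
    "tangchangli": 1,
}

all_time_stats = commit_stats(all_time)
print(all_time_stats)
```

Under that assumption the result agrees with the All Time block: 58 commits, 6 committers, about 9.667 commits per committer, and a DDS of about 0.552.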

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 78
  • Total pull requests: 40
  • Average time to close issues: 23 days
  • Average time to close pull requests: 20 days
  • Total issue authors: 61
  • Total pull request authors: 19
  • Average comments per issue: 1.18
  • Average comments per pull request: 0.3
  • Merged pull requests: 19
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 36
  • Pull requests: 22
  • Average time to close issues: 8 days
  • Average time to close pull requests: about 8 hours
  • Issue authors: 30
  • Pull request authors: 7
  • Average comments per issue: 0.86
  • Average comments per pull request: 0.0
  • Merged pull requests: 8
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • cathyliucx (4)
  • yt605155624 (4)
  • SaraAlthubaiti (3)
  • David19970306 (3)
  • zhanghanweii (3)
  • peggyxpxu (2)
  • deniro21 (2)
  • mohitd404 (2)
  • Dinxin (2)
  • qixueweigitbub (2)
  • URRealHero (1)
  • JustinYuu (1)
  • tuanad121 (1)
  • andeyeluguo (1)
  • ridingmower (1)
Pull Request Authors
  • BriansIDP (7)
  • shubham-gupta-30 (4)
  • hawkoli1987 (4)
  • TCL606 (4)
  • apu52 (3)
  • teinhonglo (2)
  • mohitd404 (2)
  • chan-ming (2)
  • alienishi (2)
  • ayushrakesh (1)
  • cotitan (1)
  • HimanshuMahto (1)
  • eltociear (1)
  • Killer2OP (1)
  • denglelaibh (1)
Top Labels
Issue Labels
  • documentation (2)
Pull Request Labels
  • documentation (7)
  • enhancement (1)