Labbé, E., Pellegrini, T., & Pinquier, J. (2023). CoNeTTE: An efficient Audio Captioning system leveraging multiple datasets with Task Embedding. arXiv preprint. https://doi.org/10.48550/arXiv.2309.00454