multitext-nllb-seed
https://github.com/common-parallel-corpora/multitext-nllb-seed
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (6.4%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: common-parallel-corpora
- Language: Makefile
- Default Branch: main
- Size: 19.8 MB
Statistics
- Stars: 0
- Watchers: 6
- Forks: 0
- Open Issues: 1
- Releases: 0
Metadata Files
README.md
multi-text nllb-seed
This package provides a multi-text version of the original bi-text nllb-seed dataset. It contains:
- a manually edited consensus eng_Latn reference file,
- order_files matching the lines of each original data file to the new eng_Latn reference (minimizing the edit distance between matched pairs),
- re-ordered bitext data files,
- and the resulting multitext nllb-seed dataset.
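
As a rough illustration of the matching idea, the sketch below pairs each line of a consensus English reference with the closest English line of one original bitext file, then reuses the resulting order to re-align the target side. It is not the repository's actual tooling: the function names (`edit_distance`, `build_order`, `reorder`), the in-memory order representation, and the sample sentences are all assumptions made for the example, and the nearest-match strategy is a simplification that does not enforce a one-to-one assignment.

```python
from typing import List, Sequence


def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance, computed with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def build_order(reference: Sequence[str], original_eng: Sequence[str]) -> List[int]:
    """For each reference line, return the index of the closest original line.

    Hypothetical helper: a plain nearest match, so two reference lines
    could in principle select the same original line.
    """
    return [
        min(range(len(original_eng)),
            key=lambda k: edit_distance(ref, original_eng[k]))
        for ref in reference
    ]


def reorder(lines: Sequence[str], order: Sequence[int]) -> List[str]:
    """Re-order one side of a bitext file so it follows the reference order."""
    return [lines[k] for k in order]


# Toy data (invented for illustration, not taken from the dataset).
reference = ["The cat sat.", "Dogs bark loudly."]
original_eng = ["Dogs bark loudly", "The cat sat ."]
original_tgt = ["tgt-dogs", "tgt-cat"]

order = build_order(reference, original_eng)
reordered_tgt = reorder(original_tgt, order)
```

Once every language's target side has been re-ordered against the same reference, line `i` of each re-ordered file corresponds to line `i` of the reference, which is what makes the collection multi-text rather than a set of independent bitexts. For a full dataset, a one-to-one assignment would likely be preferable to the nearest-match shortcut used here.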

Acknowledgement
- Moussa Koulako Bala Doumbouya
- Abdoulaye Sow
- Christopher D Manning
Acknowledgement
This work was done as part of a project supported by N'ko USA Inc., Friasoft, Meta Platforms, Inc. and Stanford University.
License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Owner
- Name: common-parallel-corpora
- Login: common-parallel-corpora
- Kind: organization
- Repositories: 1
- Profile: https://github.com/common-parallel-corpora