Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (6.4%) to scientific vocabulary
Last synced: 9 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: common-parallel-corpora
  • Language: Makefile
  • Default Branch: main
  • Size: 19.8 MB
Statistics
  • Stars: 0
  • Watchers: 6
  • Forks: 0
  • Open Issues: 1
  • Releases: 0
Created about 3 years ago · Last pushed over 2 years ago
Metadata Files
Readme Citation

README.md

multi-text nllb-seed

This package provides a multi-text version of the original bi-text nllb-seed dataset. It contains:

  • a manually edited consensus eng_Latn reference file,
  • order_files that match the lines of each original data file to the lines of the new eng_Latn reference, chosen to minimize the edit distance between matched pairs,
  • re-ordered bi-text data files, and
  • the resulting multi-text nllb-seed dataset.
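The matching step behind the order files can be illustrated with a small Python sketch. It assumes that an order file is a one-to-one assignment between the lines of the new eng_Latn reference and the eng_Latn side of an original bi-text, chosen to minimize the total character-level edit distance. The README does not specify the exact procedure or file format, and the names levenshtein, order_file and reorder are hypothetical. The brute-force search over permutations only works for a handful of lines; a corpus of realistic size would need a proper assignment solver (for example scipy.optimize.linear_sum_assignment) or a different matching strategy.

```python
from itertools import permutations


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def order_file(reference: list[str], original: list[str]) -> list[int]:
    """Hypothetical helper: order[i] is the index of the original line matched
    to reference line i, chosen so the summed edit distance over all pairs is
    minimal. Brute force over permutations, so only usable for tiny inputs."""
    assert len(reference) == len(original)
    cost = [[levenshtein(r, o) for o in original] for r in reference]
    best = min(
        permutations(range(len(original))),
        key=lambda perm: sum(cost[i][j] for i, j in enumerate(perm)),
    )
    return list(best)


def reorder(lines: list[str], order: list[int]) -> list[str]:
    """Apply an order file: the i-th output line is lines[order[i]]."""
    return [lines[j] for j in order]


# Toy example (invented sentences, not taken from nllb-seed).
reference = ["The cat sleeps.", "It is raining today.", "Good morning!"]
original_eng = ["Good morning", "It's raining today.", "The cat is sleeping."]
order = order_file(reference, original_eng)
print(order)  # [2, 1, 0]
print(reorder(original_eng, order))
```

Under the same assumption, applying one order to both sides of a bi-text, e.g. reorder(original_other, order) for a hypothetical list original_other holding the non-English side, would keep sentence pairs aligned while bringing every file into the reference order, which is what makes the line-aligned multi-text possible.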

Acknowledgement

  • Moussa Koulako Bala Doumbouya
  • Abdoulaye Sow
  • Christopher D Manning

This work was done as part of a project supported by N'ko USA Inc., Friasoft, Meta Platforms, Inc., and Stanford University.

License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Owner

  • Name: common-parallel-corpora
  • Login: common-parallel-corpora
  • Kind: organization
