Updated 6 months ago

https://github.com/commoncrawl/web-languages • Rank 7.7 • Science 26%

Crowd-sourced lists of URLs to help Common Crawl crawl under-resourced languages. See https://github.com/commoncrawl/web-languages-code/ for the code.

Updated 6 months ago

colibri-utils • Rank 1.8 • Science 26%

NLP utilities that rely on Colibri Core. Currently only language identification is provided.