Updated 6 months ago

multilang-probe • Rank 3.6 • Science 77%

A solution to detect languages and type characters in a multilingual setting.

Updated 6 months ago

llmebench • Rank 10.5 • Science 54%

Benchmarking Large Language Models

Updated 6 months ago

polydedupe • Rank 6.4 • Science 57%

PolyDeDupe: Multi-Lingual Data Deduplication

Updated 6 months ago

turkish-question-generation • Rank 3.9 • Science 57%

Automated question generation and question answering from Turkish texts using text-to-text transformers

Updated 6 months ago

scribesalad • Rank 4.4 • Science 26%

A collection of YouTube video transcripts: podcasts (Joe Rogan Experience, Tim Ferriss, Jocko Podcast, ...) and lectures (YaleCourses, MIT lectures, ...). A big transcript salad spanning history, geography, science, politics, filmmaking, and more.

Updated 5 months ago

https://github.com/bigscience-workshop/data-preparation • Rank 8.1 • Science 13%

Code used for sourcing and cleaning the BigScience ROOTS corpus

Updated 6 months ago

allophant • Science 44%

A multilingual phoneme recognizer capable of generalizing zero-shot to unseen phoneme inventories.

Updated 6 months ago

universalpython • Science 52%

Write Python in any human language. UniversalPython is a transpiler that lets you write Python code in human languages such as Urdu, German, Czech, and more; the code is then translated to Python.

Updated 5 months ago

https://github.com/ai4bharat/indicinstruct • Science 10%

Code repository for "Introducing Airavata: Hindi Instruction-tuned LLM"

Updated 5 months ago

https://github.com/asreview/asreview-multilingual-feature-extractor • Science 23%

A model extension for ASReview: a multilingual feature extractor based on distiluse-base-multilingual-cased-v1.