ecosyste.ms
Projects with CITATION.cff
Updated 4 months ago
curator
Science score: 44%
Scalable data preprocessing and curation toolkit for LLMs
data
data-curation
data-prep
data-preparation
data-processing
data-processing-pipelines
data-quality
datacuration
datarecipes
deduplication
fast-data-processing
fine-tuning
large-language-models
large-scale-data-processing
llm
llm-data-quality
llmapps
python
semantic-deduplication