most-different-text-selection
Use embedding data from LLMs to determine the most different text in a given corpus.
https://github.com/alon-albalak/data-selection-survey
A Survey on Data Selection for Language Models