simplemma
A simple multilingual lemmatizer for Python, designed for speed and efficiency.
token-wars-dataviz
A data visualisation in `matplotlib` of the number of parameters in major LLMs as well as the number of tokens of text they were trained on.
klmbr
A prompt pre-processing technique to break through the barrier of entropy while generating text with LLMs.
transform-emr
A transformer decoder model that frames event prediction from electronic medical records (EMR) as a sequential text generation problem. This project is part of my thesis research.
double-jeopardy-in-llms
Code for "Double Jeopardy and Climate Impact in the Use of Large Language Models." Includes scripts for analyzing socio-economic disparities, tokenization inefficiencies, and LLM utility using FLORES-200, Ethnologue, WDI, and GPT-4 APIs.