retomaton
PyTorch code for the RetoMaton paper: "Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval" (ICML 2022)
eo
EOLANG, an Experimental Pure Object-Oriented Programming Language Based on 𝜑-Calculus
@digitallinguistics/dlx2html
A JavaScript library for converting linguistic data to HTML
mel-cepstral-distance
A Python library for computing the Mel-Cepstral Distance (Mel-Cepstral Distortion, MCD) between two inputs. This implementation is based on the method proposed by Robert F. Kubichek in "Mel-Cepstral Distance Measure for Objective Speech Quality Assessment".
kipper
The Kipper programming language for comprehensive and all-round type safety on the web 🦊✨ Made at HTL Leonding & JKU Linz
Eclipse Golo
Eclipse Golo - Published in JOSS (2016)
knn-transformers
PyTorch + HuggingFace code for RetoMaton: "Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval" (ICML 2022), including an implementation of kNN-LM and kNN-MT
markovjunior
Probabilistic language based on pattern matching and constraint propagation, 153 examples
rascal
The implementation of the Rascal meta-programming language (including interpreter, type checker, parser generator, compiler and JVM based run-time system)
@digitallinguistics/scription2dlx
A JavaScript library that converts scription text files to the Data Format for Digital Linguistics
slm-code-generation
TensorFlow code for the neural network presented in the paper: "Structural Language Models of Code" (ICML'2020)
conversationalign
An R package for analyzing linguistic alignment between partners in conversation transcripts
are-we-fast-yet
Are We Fast Yet? Comparing Language Implementations with Objects, Closures, and Arrays
dissertation
My Ph.D. dissertation in linguistics at the University of California, Santa Barbara
ucto
Unicode tokeniser. Ucto tokenizes text files: it separates words from punctuation and splits sentences. It offers several other basic preprocessing steps, such as changing case, all of which you can use to make your text suited for further processing such as indexing, part-of-speech tagging, or machine translation. Ucto comes with tokenisation rules for several languages and can be easily extended to suit other languages. It has been incorporated for tokenizing Dutch text in Frog, our Dutch morpho-syntactic processor. http://ilk.uvt.nl/ucto
https://github.com/clarkzjw/one-two-three...infinity
📏 Calculating the sum from one to a billion in different programming languages, inspired by https://github.com/leachim6/hello-world
https://github.com/cldf/cldf
CLDF: Cross-Linguistic Data Formats - the specification
comp396
Analyzing Language Bias Between French and English in Conventional Multilingual Sentiment Analysis Models
instate
instate: predict the state of residence from a last name using the Indian electoral rolls
https://github.com/apn-pucky/metamorph
Metamorph generates new text through repeated translation
fern
A human-readable and modifiable data-expression language with minimal clutter.
@effekt-lang/effekt
A language with lexical effect handlers and lightweight effect polymorphism
mini-lang
The example mini programming language written for the "Write a language in a week" series.
https://github.com/airscripts/analscript
A modern approach for writing anally fast stuff.
double-jeopardy-in-llms
Code for "Double Jeopardy and Climate Impact in the Use of Large Language Models." Includes scripts for analyzing socio-economic disparities, tokenization inefficiencies, and LLM utility using FLORES-200, Ethnologue, WDI, and GPT-4 APIs.