arkhe
Tools for cleaning rectangular data - :exclamation: This is a read-only mirror from https://codeberg.org/tesselle/arkhe
pytrack
A Map-Matching-based Python Toolbox for Vehicle Trajectory Reconstruction
https://github.com/johnkerl/miller
Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
https://github.com/buchananja/dpyp
A convenience tool for small-scale data pipelines in Python
synr
An R package for handling synesthesia consistency test data. Explore, validate and summarize data.
rotating-photo-tree
An example lesson repository for use in lesson template screencasts
https://github.com/baimamboukar/python_data_cleaning
Data cleaning automation for emails in csv and excel files
https://github.com/erictleung/2017-new-coder-survey
:beginner: Code to help clean and format the 2017 New Coder Survey by freeCodeCamp
https://github.com/erictleung/2018-new-coder-survey
:beginner: Code to wrangle data from the 2018 New Coder Survey by freeCodeCamp
https://github.com/desbordante/desbordante-core
Desbordante is a high-performance data profiler that can discover many different patterns in data using various algorithms. It also allows running data cleaning scenarios with these algorithms. Desbordante has a console version and an easy-to-use web application.
datalark
Like the mudlark finding treasures on the foreshore, the datalark seeks treasures hidden within messy data!
https://github.com/cdcgov/clean-genes
A Rust crate that automatically cleans up a gene alignment by trimming to ORF and identifying and/or removing problematic sequences.
fastqrepair
A pipeline that can be used to recover corrupted FASTQ.gz files, drop or fix non-compliant reads, remove unpaired reads, and reorder reads that became disordered
tutorials-early
Tutorials to learn reading, cleaning and validating case data, and converting line list data to incidence for visualizing epidemic curves.
pydvl
pyDVL is a library of stable implementations of algorithms for data valuation and influence function computation
data2neo
Data2Neo is a library that simplifies the conversion of data in relational format to a graph knowledge database.
equitystack
A structured repository of Python scripts and Jupyter notebooks for development sector data workflows — including public health, gender equity, women's economic empowerment (WEE), education, and MEL (Monitoring, Evaluation, and Learning). Includes plug-and-play templates, sample data, test coverage, and Colab-ready execution.
https://github.com/OpenDCAI/DataFlow
Easy data preparation with the latest LLM-based operators and pipelines.
https://github.com/csu-agricultural-water-quality-program/als-data-cleaning-tool
A tool developed in R that processes water analysis results exported from the ALS WEBTRIEVE™ data portal. The exported data are cleaned, merged, and written to archival (e.g., CSV) or visual (e.g., HTML) formats.
cleansumstats
Convert GWAS summary statistics (sumstat) files into a common format with a common reference for positions, rsIDs, and effect alleles.
mierda
The Multidimensional Insufficient Effort Responding Detection Approach (mIERda) for Psychometric and Survey Data