Updated 6 months ago

arkhe • Rank 3.0 • Science 77%

Tools for cleaning rectangular data. This is a read-only mirror of https://codeberg.org/tesselle/arkhe

Updated 6 months ago

authoritative • Rank 8.5 • Science 44%

Clean Author Names from R Packages DESCRIPTION Files

Updated 6 months ago

validate • Rank 19.5 • Science 23%

Professional data validation for the R environment

Updated 6 months ago

synr • Rank 6.4 • Science 23%

An R package for handling synesthesia consistency test data. Explore, validate and summarize data.

Updated 5 months ago

errorlocate • Rank 15.0 • Science 13%

Find and replace erroneous fields in data using validation rules

Updated 6 months ago

deductive • Rank 12.6 • Science 13%

Methods for deductive data correction and imputation

Updated 6 months ago

rotating-photo-tree • Rank 0.0 • Science 18%

An example lesson repository for use in lesson template screencasts

Updated 4 months ago

https://github.com/erictleung/2017-new-coder-survey • Rank 1.1 • Science 13%

Code to help clean and format the 2017 New Coder Survey by freeCodeCamp

Updated 4 months ago

https://github.com/erictleung/2018-new-coder-survey • Rank 1.1 • Science 13%

Code to wrangle data from the 2018 New Coder Survey by freeCodeCamp

Updated 5 months ago

https://github.com/desbordante/desbordante-core • Science 49%

Desbordante is a high-performance data profiler capable of discovering many different patterns in data using various algorithms. It also supports running data cleaning scenarios with these algorithms. Desbordante has a console version and an easy-to-use web application.

Updated 6 months ago

datalark • Science 54%

Like the mudlark finding treasures on the foreshore, the datalark seeks treasures hidden within messy data!

Updated 5 months ago

https://github.com/cdcgov/clean-genes • Science 26%

A Rust crate that automatically cleans up a gene alignment by trimming to ORF and identifying and/or removing problematic sequences.

Updated 6 months ago

fastqrepair • Science 57%

A pipeline that can be used to recover corrupted FASTQ.gz files, drop or fix noncompliant reads, remove unpaired reads, and settle reads that became disordered

Updated 6 months ago

tutorials-early • Science 44%

Tutorials to learn reading, cleaning and validating case data, and converting line list data to incidence for visualizing epidemic curves.

Updated 6 months ago

pydvl • Science 36%

pyDVL is a library of stable implementations of algorithms for data valuation and influence function computation

Updated 6 months ago

data2neo • Science 54%

Data2Neo is a library that simplifies the conversion of data in relational format to a graph knowledge database.

Updated 6 months ago

cleanepi • Science 67%

R package to clean and standardize epidemiological data

Updated 6 months ago

equitystack • Science 49%

A structured repository of Python scripts and Jupyter notebooks for development sector data workflows — including public health, gender equity, women's economic empowerment (WEE), education, and MEL (Monitoring, Evaluation, and Learning). Includes plug-and-play templates, sample data, test coverage, and Colab-ready execution.

Updated 6 months ago

mrclean-greedy • Science 52%

A greedy algorithm for cleaning a data file.

Updated 5 months ago

https://github.com/csu-agricultural-water-quality-program/als-data-cleaning-tool • Science 26%

A tool developed in R that takes water analysis results exported from the ALS WEBTRIEVE™ data portal, then cleans, merges, and exports them into archival (e.g., CSV) or visual (e.g., HTML) formats.

Updated 6 months ago

cleansumstats • Science 54%

Convert GWAS sumstat files into a common format with a common reference for positions, rsids and effect alleles.

Updated 6 months ago

mierda • Science 57%

The Multidimensional Insufficient Effort Responding Detection Approach (mIERda) for Psychometric and Survey Data