visdat
visdat: Visualising Whole Data Frames - Published in JOSS (2017)
FSharpGephiStreamer
FSharpGephiStreamer: An idiomatic bridge between F# and network visualization - Published in JOSS (2019)
Arabica
Arabica: A Python package for exploratory analysis of text data - Published in JOSS (2024)
SmartEDA
SmartEDA: An R Package for Automated Exploratory Data Analysis - Published in JOSS (2019)
Powering single-cell analyses in the browser with WebAssembly
Powering single-cell analyses in the browser with WebAssembly - Published in JOSS (2023)
Plotrr
Plotrr: Functions for making visual exploratory data analysis with nested data easier. - Published in JOSS (2017)
llnl-thicket
GenomicSuperSignature
Interpretation of RNAseq experiments through robust, efficient comparison to public databases
scattertext
Beautiful visualizations of how language differs among document types.
ydata-profiling
One-line data quality profiling and exploratory data analysis for Pandas and Spark DataFrames.
skimpy
skimpy is a lightweight tool that provides summary statistics about variables in data frames within the console.
edarf
edarf: Exploratory Data Analysis using Random Forests - Published in JOSS (2016)
geothermal_esda
This repository contains exploratory spatial data analysis (ESDA) functions and scripts. The functions are designed for geothermal spatial datasets but also apply to other spatial datasets.
https://github.com/bayer-group/bic-subscreen
Interactive visualization to explore subgroup effects
https://github.com/awslabs/amazon-accessible-rl-sdk
A2RL is a Python library for offline reinforcement learning
https://github.com/desbordante/desbordante-core
Desbordante is a high-performance data profiler capable of discovering many different patterns in data using various algorithms. It also supports running data-cleaning scenarios with these algorithms. Desbordante has a console version and an easy-to-use web application.
masters-thesis
Monitoring parallel file system usage in a high-performance computer cluster
https://github.com/dadananjesha/eda-case-study
EDA Case Study is an exploratory data analysis project designed to uncover insights from a dataset through thorough visualization and statistical analysis.
tutorials-early
Tutorials on reading, cleaning, and validating case data, and on converting line list data to incidence for visualizing epidemic curves.
https://github.com/nagapv/edexplore
A simple widget for interactive EDA / QA. Works on top of Pandas [in Jupyter Notebook] using IPyWidgets with a sprinkle of Regex.
bicausality
A framework for inferring causality in binary data using techniques from frequent pattern mining and estimation statistics. Given a set of individual vectors S={x}, where x(i) is a realization of binary variable i, the framework infers empirical causal relations between binary variables i and j from S in the form of a causal graph G=(V,E).
movis
MOVIS: A Multi-Omics Software Solution for Multi-modal Time-Series Clustering, Embedding, and Visualizing Tasks, by Aleksandar Anžel, Dominik Heider, and Georges Hattab
https://github.com/atharvapathak/telecom_churn_case_study
Build a classification model for reducing the churn rate for a telecom company