pyhelpers
PyHelpers: An open-source toolkit that facilitates data manipulation tasks for Python users
https://github.com/habedi/feature-factory
A feature engineering library for Rust 🦀 with Python bindings 🐍
https://github.com/buchananja/dpyp
A convenience tool for small-scale data pipelines in Python
https://github.com/alexandersouthan/pypreprocessing
pyPreprocessing is especially useful for preprocessing datasets such as Raman, infrared, and UV/Vis spectra, as well as HPLC data and many other types of data. It includes baseline correction, smoothing, filtering, normalization, and transformation.
house-price-prediction-regression-modeling
House price prediction involves analyzing data points to estimate the value of a residential property using statistical techniques such as regression analysis and machine learning algorithms. It is useful for both buyers and sellers in making informed decisions based on market trends and property values.
https://github.com/desbordante/desbordante-core
Desbordante is a high-performance data profiler capable of discovering many different patterns in data using various algorithms. It also lets users run data cleaning scenarios using these algorithms. Desbordante has a console version and an easy-to-use web application.
https://github.com/amr-yasser226/intrusion-detection-kaggle
End-to-end pipeline for multi-class cyber-attack detection using per-flow network features: data profiling, deduplication, skew correction, outlier treatment, feature engineering, imbalance handling, and tree-based modeling (XGBoost, LightGBM, CatBoost, stacking), with a final Kaggle submission scoring 0.9146 public / 0.9163 private.
datalark
Like the mudlark finding treasures on the foreshore, the datalark seeks treasures hidden within messy data!
swansf-datapreprocessing-sampling-notebooks
These notebooks provide a comprehensive workflow, from start to finish, for processing and analyzing the SWAN-SF dataset. They include detailed steps for reading the dataset files, performing full preprocessing, and executing classification.