Projects

Updated 6 months ago

pyhelpers • Rank 10.9 • Science 67%

PyHelpers: An open-source toolkit for facilitating Python users' data manipulation tasks

data-manipulation data-preprocessing py-utils python python-utilities python-utility python-utils utilities

Updated 6 months ago

atlantic • Rank 9.3 • Science 67%

Atlantic: Automated Data Preprocessing Framework for Machine Learning

automation automl automl-pipeline data-preprocessing data-science feature-selection label-encoder machine-learning onehot-encoder predictive-maintenance predictive-modeling preprocessing-pipeline python scikit-learn

Updated 6 months ago

https://github.com/habedi/feature-factory • Rank 9.9 • Science 26%

A feature engineering library for Rust 🦀 with Python bindings 🐍

data-preprocessing data-science feature-engineering feature-selection machine-learning python python-library rust rust-library

Updated 6 months ago

https://github.com/buchananja/dpyp • Rank 4.4 • Science 26%

A convenience tool for small-scale data pipelines in Python

data data-analysis data-cleaning data-engineering data-pipeline data-preprocessing data-processing data-science pandas pipeline

Updated 6 months ago

https://github.com/alexandersouthan/pypreprocessing • Rank 7.8 • Science 10%

Especially useful for preprocessing datasets such as Raman, infrared, and UV/Vis spectra, as well as HPLC data and many other types of data. pyPreprocessing includes baseline correction, smoothing, filtering, normalization, and transformation.

baseline-correction data-analysis data-preprocessing python smoothing spectroscopy

Updated 6 months ago

house-price-prediction-regression-modeling • Science 44%

House price prediction involves analyzing data points to estimate the value of a residential property using statistical techniques such as regression analysis and machine learning algorithms. It is useful for both buyers and sellers in making informed decisions based on market trends and property values.

data-preprocessing linear-regression model-training predict-house-prices

Updated 6 months ago

https://github.com/desbordante/desbordante-core • Science 49%

Desbordante is a high-performance data profiler that can discover many different patterns in data using various algorithms. It also lets users run data-cleaning scenarios with these algorithms. Desbordante has a console version and an easy-to-use web application.

anomaly-detection correlations data-analytics data-cleaning data-cleansing data-engineering data-exploration data-mining data-mining-algorithms data-preprocessing data-profiling data-science data-wrangling exploratory-data-analysis feature-engineering feature-extraction feature-selection knowledge-discovery spreadsheets tabular-data

Updated 5 months ago

https://github.com/amr-yasser226/intrusion-detection-kaggle • Science 26%

End-to-end pipeline for multi-class cyber-attack detection using per-flow network features: data profiling, deduplication, skew-correction, outlier treatment, feature engineering, imbalance handling, and tree-based modeling (XGBoost, LightGBM, CatBoost, stacking), with a final Kaggle submission scoring 0.9146 public / 0.9163 private.

catboost cyber-security data-preprocessing ensemble-learning feature-engineering imbalanced-data jupyter-notebooks kaggle lightgbm machine-learning outlier-detection random-forest xgboost

Updated 6 months ago

datalark • Science 54%

Like the mudlark finding treasures on the foreshore, the datalark seeks treasures hidden within messy data!

data-cleaning data-preparation data-preprocessing data-transformation rstats-package

Updated 6 months ago

swansf-datapreprocessing-sampling-notebooks • Science 57%

These notebooks provide a comprehensive workflow, from start to finish, for processing and analyzing the SWAN-SF dataset. They include detailed steps for reading the dataset files, performing full preprocessing, and executing classification.

data-preprocessing deep-learning gru imputation lstm machine-learning multivariate-timeseries normalization pandas python sampling smote solar-flare-prediction time-series-analysis time-series-classification timegan
