Updated 6 months ago

pyhelpers • Rank 10.9 • Science 67%

PyHelpers: An open-source toolkit for facilitating Python users' data manipulation tasks

Updated 6 months ago

https://github.com/alexandersouthan/pypreprocessing • Rank 7.8 • Science 10%

pyPreprocessing is especially useful for preprocessing datasets such as Raman, infrared, and UV/Vis spectra, as well as HPLC data and many other types of data. It includes baseline correction, smoothing, filtering, normalization, and transformation.

Updated 6 months ago

house-price-prediction-regression-modeling • Science 44%

House price prediction involves analyzing data points to estimate the value of a residential property using statistical techniques such as regression analysis and machine learning algorithms. It is useful for both buyers and sellers in making informed decisions based on market trends and property values.

Updated 6 months ago

https://github.com/desbordante/desbordante-core • Science 49%

Desbordante is a high-performance data profiler that can discover many different patterns in data using various algorithms. It can also run data cleaning scenarios using these algorithms. Desbordante has a console version and an easy-to-use web application.

Updated 5 months ago

https://github.com/amr-yasser226/intrusion-detection-kaggle • Science 26%

End-to-end pipeline for multi-class cyber-attack detection using per-flow network features: data profiling, deduplication, skew-correction, outlier treatment, feature engineering, imbalance handling, and tree-based modeling (XGBoost, LightGBM, CatBoost, stacking), with a final Kaggle submission scoring 0.9146 public / 0.9163 private.

Updated 6 months ago

datalark • Science 54%

Like the mudlark finding treasures on the foreshore, the datalark seeks treasures hidden within messy data!

Updated 6 months ago

swansf-datapreprocessing-sampling-notebooks • Science 57%

These notebooks provide an end-to-end workflow for processing and analyzing the SWAN-SF dataset. They include detailed steps for reading the dataset files, performing full preprocessing, and running classification.