Scientific Software
Updated 6 months ago

Feature-engine — Peer-reviewed • Rank 24.3 • Science 93%

Feature-engine: A Python package for feature engineering for machine learning - Published in JOSS (2021)

Mathematics Economics
Scientific Software · Peer-reviewed
Updated 6 months ago

geometricus • Rank 9.1 • Science 67%

A structure-based, alignment-free embedding approach for proteins. Can be used as input to machine learning algorithms.

Updated 6 months ago

tsfel • Rank 19.2 • Science 39%

An intuitive library to extract features from time series.

Updated 6 months ago
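Time-series feature extractors such as tsfel compute summary features over a series or window. The sketch below is a stdlib-only illustration of a few common statistical and temporal features. It is not tsfel's API: the function name `basic_features` and the chosen feature set are hypothetical.

```python
import math
import statistics


def basic_features(series: list[float]) -> dict[str, float]:
    """Compute a few common summary features of a univariate series.

    Hypothetical illustration: this does not reproduce tsfel's API or its
    exact feature definitions.
    """
    n = len(series)
    if n < 2:
        raise ValueError("series needs at least two points")
    mean = statistics.fmean(series)
    # Population standard deviation of the values
    std = statistics.pstdev(series)
    # Root mean square of the raw values
    rms = math.sqrt(sum(x * x for x in series) / n)
    # Number of times consecutive values cross the mean
    centered = [x - mean for x in series]
    crossings = sum(1 for a, b in zip(centered, centered[1:]) if a * b < 0)
    # Mean absolute first difference, a simple roughness measure
    mean_abs_diff = sum(abs(b - a) for a, b in zip(series, series[1:])) / (n - 1)
    return {
        "mean": mean,
        "std": std,
        "rms": rms,
        "mean_crossings": float(crossings),
        "mean_abs_diff": mean_abs_diff,
    }
```

For example, `basic_features([1, 3, 1, 3])` gives a mean of 2.0, a standard deviation of 1.0, and three mean crossings.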

zoish • Rank 8.1 • Science 44%

Zoish is a Python package that streamlines machine learning by leveraging SHAP values for feature selection and interpretability, making model development more efficient and user-friendly

Updated 6 months ago

https://github.com/apache/hamilton • Rank 12.1 • Science 36%

Apache Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. It runs and scales everywhere Python does.

Updated 6 months ago

ballet • Rank 10.6 • Science 33%

☀️🦶 A lightweight framework for collaborative, open-source feature engineering

Updated 6 months ago

upgini • Rank 17.5 • Science 26%

Data search & enrichment library for Machine Learning → Easily find and add relevant features to your ML & AI pipeline from hundreds of public and premium external data sources, including open & commercial LLMs

Updated 6 months ago

https://github.com/gperdrizet/ensembleset • Rank 5.0 • Science 26%

Ensemble dataset generator for tabular data prediction and modeling projects.

Updated 6 months ago

https://github.com/functime-org/functime • Rank 16.5 • Science 13%

Time-series machine learning at scale. Built with Polars for embarrassingly parallel feature extraction and forecasts on panel data.

Updated 6 months ago

https://github.com/ajayarunachalam/msda • Rank 4.9 • Science 10%

Library for multi-dimensional, multi-sensor, uni/multivariate time series data analysis, unsupervised feature selection, and unsupervised deep anomaly detection, plus a prototype of explainable AI for the anomaly detector.

Updated 6 months ago

https://github.com/chris-santiago/tsfeast • Science 10%

A collection of Scikit-Learn compatible time series transformers and tools.

Updated 6 months ago

https://github.com/andrei-vataselu/data-science-snippets • Science 26%

🧰 Essential EDA and Data Cleaning Helpers for Any DataFrame. This collection of functions is designed to accelerate exploratory data analysis (EDA), quickly surface data quality issues, and offer high-level insights into the structure and content of your dataset.

Updated 5 months ago
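Two of the most common data quality checks that EDA helpers like those in data-science-snippets surface are missing values per column and exact duplicate rows. The snippet collection works on DataFrames; the sketch below uses a plain list of dicts so it stays dependency-free, and the helper name `data_quality_report` is hypothetical. It assumes `None` marks a missing value and does not treat NaN as missing.

```python
def data_quality_report(rows: list[dict], columns: list[str]) -> dict:
    """Count missing values per column and exact duplicate rows.

    Hypothetical helper: treats None (or an absent key) as missing,
    omits NaN handling, and compares rows on the listed columns only.
    """
    missing = {c: sum(1 for row in rows if row.get(c) is None) for c in columns}
    # A row is a duplicate if an identical tuple of column values was seen before
    seen = set()
    duplicate_rows = 0
    for row in rows:
        key = tuple(row.get(c) for c in columns)
        if key in seen:
            duplicate_rows += 1
        else:
            seen.add(key)
    return {"n_rows": len(rows), "missing": missing, "duplicate_rows": duplicate_rows}
```

On `[{"a": 1, "b": None}, {"a": 1, "b": None}, {"a": 2, "b": "x"}]`, this reports two missing values in `b` and one duplicate row.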

https://github.com/amr-yasser226/machine-learning-for-network-intrusion-detection • Science 26%

A complete pipeline for network intrusion detection comparing label encoding and one-hot encoding, with SMOTE resampling, feature selection, and ensemble modeling using scikit-learn and XGBoost. This was phase one of our university's "CSAI 253- Machine Learning" course.

Updated 5 months ago
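The intrusion-detection pipeline above compares label encoding with one-hot encoding. A minimal sketch of the difference, using hypothetical protocol names as example data: label encoding assigns each category an integer, which imposes an artificial order (here `icmp < tcp < udp`) that some models may misread, while one-hot encoding avoids any order at the cost of one column per category. scikit-learn provides `OrdinalEncoder`, `LabelEncoder`, and `OneHotEncoder` for real use; this stdlib version only illustrates the idea.

```python
def label_encode(values: list[str]) -> tuple[list[int], dict[str, int]]:
    """Map each category to an integer code (sorted order, an arbitrary choice)."""
    mapping = {cat: i for i, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def one_hot_encode(values: list[str]) -> tuple[list[list[int]], list[str]]:
    """Map each category to a 0/1 indicator vector with one column per category."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # exactly one "hot" position per row
        rows.append(row)
    return rows, categories


protocols = ["tcp", "udp", "icmp", "tcp"]
codes, mapping = label_encode(protocols)   # [1, 2, 0, 1]
onehot, cats = one_hot_encode(protocols)   # columns: icmp, tcp, udp
```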

https://github.com/atharvapathak/sales_forecasting_project • Science 13%

Forecasted product sales using time series models such as Holt-Winters and SARIMA, as well as causal methods such as regression. Evaluated model performance with forecasting metrics such as MAE, RMSE, and MAPE, and concluded that the linear regression model produced the best (lowest) MAPE of the models compared.

Updated 5 months ago
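The sales-forecasting project above compares models by MAE, RMSE, and MAPE. The sketch below uses the standard textbook definitions of these metrics; it is not taken from the project, whose exact implementation is not shown here.

```python
import math


def _check(actual: list[float], forecast: list[float]) -> int:
    # Both sequences must be non-empty and aligned point by point
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and of equal length")
    return len(actual)


def mae(actual: list[float], forecast: list[float]) -> float:
    """Mean absolute error, in the units of the data."""
    n = _check(actual, forecast)
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / n


def rmse(actual: list[float], forecast: list[float]) -> float:
    """Root mean squared error; penalizes large errors more heavily than MAE."""
    n = _check(actual, forecast)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)


def mape(actual: list[float], forecast: list[float]) -> float:
    """Mean absolute percentage error, in percent; undefined if any actual is 0."""
    n = _check(actual, forecast)
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / n
```

With actual values `[100, 200, 300, 400]` and forecasts `[110, 190, 330, 400]`, MAE is 12.5, RMSE is about 16.58, and MAPE is 6.25%. Lower is better for all three metrics.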

https://github.com/amr-yasser226/intrusion-detection-kaggle • Science 26%

End-to-end pipeline for multi-class cyber-attack detection using per-flow network features: data profiling, deduplication, skew-correction, outlier treatment, feature engineering, imbalance handling, and tree-based modeling (XGBoost, LightGBM, CatBoost, stacking), with a final Kaggle submission scoring 0.9146 public / 0.9163 private.

Updated 6 months ago

fedora-framework • Science 67%

The Fedora Framework is an evolutionary feature engineering framework designed to optimize features for machine learning tasks

Updated 6 months ago

asaca-automatic-speech-analysis-for-cognitive-assessment • Science 44%

An automatic system that extracts PRAAT-like speech features from raw speech WAV files and, at the same time, produces high-quality transcriptions with a low word error rate (WER < 10).

Updated 6 months ago
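The ASACA description quotes a word error rate (WER). The standard definition is WER = (S + D + I) / N, where S, D, and I are the word substitutions, deletions, and insertions in the minimum edit path from the reference to the hypothesis, and N is the number of reference words. A minimal sketch of that definition using word-level Levenshtein distance follows; it is not necessarily how the project scores its transcriptions, and reading the "< 10" threshold as a percentage (multiply the result by 100) is an assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference prefix so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match if cost == 0
            )
        prev = curr
    return prev[-1] / len(ref)
```

Dropping one word from a six-word reference, e.g. `word_error_rate("the cat sat on the mat", "the cat sat on mat")`, gives 1/6, or about 16.7%.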

https://github.com/desbordante/desbordante-core • Science 49%

Desbordante is a high-performance data profiler capable of discovering many different patterns in data using various algorithms. It also allows users to run data cleaning scenarios based on these algorithms. Desbordante has a console version and an easy-to-use web application.

Updated 6 months ago
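Functional dependencies (FDs) are one well-known kind of pattern that data profilers such as Desbordante discover: X → Y holds when every value of the columns X determines a single value of the columns Y. Discovery algorithms search many candidate column sets efficiently; the sketch below only validates one given FD by brute-force grouping, and the function name `fd_violations` and the example rows are hypothetical. Violations often point to rows that need cleaning, such as inconsistent spellings.

```python
from collections import defaultdict


def fd_violations(rows: list[dict], lhs: list[str], rhs: list[str]) -> dict:
    """Check whether the functional dependency lhs -> rhs holds in rows.

    Returns each LHS value tuple that maps to more than one distinct RHS
    value tuple; an empty result means the dependency holds exactly.
    """
    targets = defaultdict(set)
    for row in rows:
        key = tuple(row[c] for c in lhs)
        targets[key].add(tuple(row[c] for c in rhs))
    return {key: vals for key, vals in targets.items() if len(vals) > 1}


rows = [
    {"zip": "10001", "city": "New York", "name": "A"},
    {"zip": "10001", "city": "New York", "name": "B"},
    {"zip": "94105", "city": "San Francisco", "name": "C"},
    {"zip": "94105", "city": "SF", "name": "D"},  # inconsistent spelling
]
```

Here `fd_violations(rows, ["zip"], ["city"])` flags zip `94105`, which maps to both "San Francisco" and "SF", while `name → city` holds.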

behavioralproject • Science 44%

Classifies typically developing (TD) vs. autism spectrum disorder (ASD) subjects according to the SRS behavioral report severity score. The ABIDE II dataset is used for training and testing, and Freesurfer v6 is used for sMRI volume preprocessing and feature extraction.

Updated 6 months ago

breast_cancer_diagnosis_ml • Science 57%

This project demonstrates the use of machine learning models to predict breast cancer diagnoses. The repository covers the entire workflow from data preprocessing and feature engineering to model training and evaluation, providing insights into diagnosis prediction with various ML models.

Updated 6 months ago

context-engineering • Science 26%

Explore cutting-edge research in context engineering with insights from top institutions. Enhance AI performance with practical techniques. 🌟📂

Updated 6 months ago

https://github.com/data-prompt-query/dpq • Science 26%

dpq is an open-source Python library that makes prompt-based data transformations and feature engineering easy.