Scientific Software
Updated 6 months ago

MLxtend — Peer-reviewed • Rank 26.9 • Science 95%

MLxtend: Providing machine learning and data science utilities and extensions to Python's scientific computing stack - Published in JOSS (2018)

Scientific Software
Updated 6 months ago

Learning from Crowds with Crowd-Kit — Peer-reviewed • Rank 17.3 • Science 100%

Learning from Crowds with Crowd-Kit - Published in JOSS (2024)

Artificial Intelligence and Machine Learning
Scientific Software · Peer-reviewed
Scientific Software
Updated 6 months ago

PyClustering — Peer-reviewed • Rank 20.2 • Science 93%

PyClustering: Data Mining Library - Published in JOSS (2019)

Scientific Software
Updated 6 months ago

NiaARM — Peer-reviewed • Rank 11.4 • Science 98%

NiaARM: A minimalistic framework for Numerical Association Rule Mining - Published in JOSS (2022)

Scientific Software
Updated 6 months ago

WordTokenizers.jl — Peer-reviewed • Rank 13.2 • Science 95%

WordTokenizers.jl: Basic tools for tokenizing natural language in Julia - Published in JOSS (2020)

Engineering (40%) Earth and Environmental Sciences (40%)
Scientific Software · Peer-reviewed
Scientific Software
Updated 6 months ago

scikit-hubness — Peer-reviewed • Rank 8.4 • Science 93%

scikit-hubness: Hubness Reduction and Approximate Neighbor Search - Published in JOSS (2020)

Scientific Software
Updated 6 months ago

latentcor — Peer-reviewed • Rank 4.4 • Science 95%

latentcor: An R Package for estimating latent correlations from mixed data types - Published in JOSS (2021)

Updated 6 months ago

pypots • Rank 21.6 • Science 77%

A Python toolkit/library for reality-centric machine/deep learning and data mining on partially-observed time series, including SOTA neural network models for scientific analysis tasks of imputation/classification/clustering/forecasting/anomaly detection/cleaning on incomplete industrial (irregularly-sampled) multivariate TS with NaN missing values

Updated 6 months ago

lexicalrichness • Rank 15.5 • Science 67%

A module to compute textual lexical richness (aka lexical diversity).

Updated 6 months ago

lightgbm • Rank 33.1 • Science 46%

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.

Updated 6 months ago

https://github.com/alan-turing-institute/clevercsv • Rank 21.7 • Science 57%

CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement for the builtin CSV module with improved dialect detection, and comes with a handy command line application for working with CSV files.

Updated 6 months ago

ast_monitor • Rank 10.1 • Science 67%

AST-Monitor is a wearable Raspberry Pi computer for cyclists

Scientific Software
Updated 6 months ago

PySPOD — Peer-reviewed • Rank 11.3 • Science 59%

PySPOD: A Python package for Spectral Proper Orthogonal Decomposition (SPOD) - Published in JOSS (2021)

Economics (40%)
Scientific Software · Peer-reviewed
Updated 6 months ago

sport-activities-features • Rank 12.6 • Science 57%

A minimalistic toolbox for extracting features from sports activity files written in Python

Updated 6 months ago

scibasic • Rank 17.4 • Science 49%

sciBASIC# is a dialect derived from the native VB.NET language, written for data scientists.

Updated 6 months ago

https://github.com/chaoss/grimoirelab-perceval • Rank 18.4 • Science 46%

Send Sir Perceval on a quest to retrieve and gather data from software repositories.

Updated 6 months ago

welly • Rank 9.1 • Science 54%

Welly helps with well loading, wireline logs, log quality, data science

Updated 6 months ago

tested • Rank 2.1 • Science 59%

The best way to test "it" is to watch someone else try to break "it".

Updated 6 months ago

svgdigitizer • Rank 10.5 • Science 49%

(x,y) Data Points from SVG files

Scientific Software
Updated 6 months ago

CAZy-parser a way to extract information from the Carbohydrate-Active enZYmes Database — Peer-reviewed • Rank 8.8 • Science 49%

CAZy-parser a way to extract information from the Carbohydrate-Active enZYmes Database - Published in JOSS (2016)

Biology (34%)
Scientific Software · Peer-reviewed
Updated 6 months ago

getDEE2 • Rank 11.5 • Science 26%

Programmatic access to the DEE2 RNA expression dataset.

Updated 4 months ago

https://github.com/erictleung/ml-portfolio • Rank 1.4 • Science 10%

Experiment with various machine learning algorithms on various data sets from the University of California, Irvine (UCI) Machine Learning Repository (http://archive.ics.uci.edu/ml/index.html)

Updated 6 months ago

pygrinder • Science 67%

PyGrinder: a Python toolkit for grinding data beans into incomplete ones for real-world data simulation by introducing missing values with different missingness patterns, including MCAR (missing completely at random), MAR (missing at random), MNAR (missing not at random), subsequence missing, and block missing

Updated 6 months ago

arctic3d • Science 67%

Automatic Retrieval and ClusTering of Interfaces in Complexes from 3D structural information

Updated 6 months ago

https://github.com/cvjena/libmaxdiv • Science 10%

Implementation of the Maximally Divergent Intervals algorithm for Anomaly Detection in multivariate spatio-temporal time-series.

Updated 6 months ago

awesome-arm-in-smart-agriculture • Science 67%

A collection of literature on the use of association rule mining methods in smart agriculture

Updated 4 months ago

https://github.com/emptymalei/mini-lab • Science 13%

Some code snippets used to explain stuff to myself in my personal data science wiki

Updated 6 months ago

https://github.com/desbordante/desbordante-core • Science 49%

Desbordante is a high-performance data profiler capable of discovering many different patterns in data using various algorithms. It also allows running data-cleaning scenarios with these algorithms. Desbordante has a console version and an easy-to-use web application.

Updated 6 months ago

tap-clinicaltrials • Science 44%

Singer tap for ClinicalTrials.gov study records data.

Updated 6 months ago

https://github.com/grrvlr/tsmd • Science 26%

The TSMD project brings together Motif Discovery methods for Time Series, aiming to compare their performance through well-defined research questions and to simplify their practical use. It provides both guidelines for selecting the most suitable methods based on the data, and accessible implementations of the most relevant approaches.

Updated 6 months ago

tsb-uad • Science 26%

An End-to-End Benchmark Suite for Univariate Time-Series Anomaly Detection

Updated 6 months ago

core-periphery-hypergraphs • Science 57%

[KDD 2022] Official Code Release for "Core-periphery Models for Hypergraphs"

Updated 6 months ago

https://github.com/cn-tu/py-outlier-detection-stream-data • Science 13%

Outlier Detection in Stream Data with Python. CN contact: Félix Iglesias

Updated 6 months ago

https://github.com/avallecam/cdcper • Science 13%

Miscellaneous custom functions for analysis tasks at CDC Perú

Updated 6 months ago

data-mining • Science 18%

Data mining notebooks and scripts

Updated 6 months ago

https://github.com/equinor/timeseriesanalysis • Science 26%

Library that combines control engineering, dynamic simulation and machine learning on time series. Developed to describe industrial processes and automation. Lightweight, robust and fast for use in advanced analytics. Built on .NET to run anywhere.

Updated 6 months ago

https://github.com/amr-yasser226/data-mining-and-information-retrieval • Science 26%

Revision notes and MCQs for DSAI 201 – Data Mining and Information Retrieval. Includes lecture summaries, algorithm overviews, and practice questions to support course preparation and review.

Updated 6 months ago

nuggets • Science 57%

R package for searching of patterns in subspaces described with elementary conjunctions

Updated 6 months ago

https://github.com/robelgium/msnoise • Science 59%

A Python Package for Monitoring Seismic Velocity Changes using Ambient Seismic Noise | http://www.msnoise.org

Updated 6 months ago

tsdb • Science 67%

A Python toolbox that loads 172 public time series datasets for machine/deep learning with a single line of code. Datasets come from multiple domains, including healthcare, finance, power, traffic, and weather.