Projects | Open Source Science

Scientific Software

Updated 10 months ago

gym-electric-motor (GEM) — Peer-reviewed • Rank 14.2 • Science 98%

gym-electric-motor (GEM): A Python toolbox for the simulation of electric drive systems - Published in JOSS (2021)

benchmark converters electric-drive electrical-engineering gym-environment machinelearning motor-models openai openai-gym openai-gym-environments pmsm reinforcement-learning

Mathematics

Updated 10 months ago

DeepBench — Peer-reviewed • Rank 8.6 • Science 95%

DeepBench: A simulation package for physical benchmarking data - Published in JOSS (2025)

benchmark simulation

Mathematics

Updated 10 months ago

ctbench - compile-time benchmarking and analysis — Peer-reviewed • Rank 4.9 • Science 95%

ctbench - compile-time benchmarking and analysis - Published in JOSS (2023)

benchmark clang compilation data-analysis data-visualization gcc metaprogramming

Updated 10 months ago

fastcrypto • Rank 22.3 • Science 67%

Common cryptographic library used in software at Mysten Labs.

benchmark blockchain bls crypto cryptography ed25519 zkp

Updated 10 months ago

yaib • Rank 10.1 • Science 77%

🧪 Yet Another ICU Benchmark: a holistic framework for the standardization of clinical prediction model experiments. Provide custom datasets, cohorts, prediction tasks, endpoints, preprocessing, and models. Paper: https://arxiv.org/abs/2306.05109

amsterdamumcdb benchmark clinical-data clinical-ml deep-learning ehr eicu-crd framework hirid-dataset icu machine-learning mimic-iii mimic-iv patient-monitoring time-series

Updated 10 months ago

mmaction2 • Rank 21.3 • Science 64%

OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark

action-recognition ava benchmark deep-learning i3d non-local openmmlab posec3d pytorch slowfast spatial-temporal-action-detection temporal-action-localization tsm tsn uniformerv2 video-classification video-understanding x3d

Updated 10 months ago

bigcodebench • Rank 15.3 • Science 64%

[ICLR'25] BigCodeBench: Benchmarking Code Generation Towards AGI

agent agents benchmark chatgpt claude-3 code-generation deepseek function-calling gemini gpt-4 instruction-following large-language-models llm program-synthesis tool-use

Updated 10 months ago

asreview-insights • Rank 12.3 • Science 67%

Tools such as plots and metrics to analyze (simulated) reviews for ASReview LAB

active-learning asreview benchmark discovery machine-learning plot statistics utrecht-university visualization

Updated 10 months ago

tiny_qa_benchmark_pp • Rank 2.1 • Science 77%

Tiny QA Benchmark++: a micro-benchmark suite (52-item gold + on-demand multilingual synthetic packs), generator CLI, and CI-ready eval harness for ultra-fast LLM smoke-testing and regression-catching.

benchmark dataset evaluation huggingface-datasets litellm llm llm-testing llmops qa-dataset smoke-test synthetic-data tinybenchmarks

Updated 10 months ago

proteinworkshop • Rank 11.6 • Science 67%

Benchmarking framework for protein representation learning. Includes a large number of pre-training and downstream task datasets, models and training/task utilities. (ICLR 2024)

benchmark dataset deep-learning lightning pretraining protein protein-structure pytorch

Updated 10 months ago

SciMLBenchmarks • Rank 11.5 • Science 67%

Scientific machine learning (SciML) benchmarks, AI for science, and (differential) equation solvers. Covers Julia, Python (PyTorch, JAX), MATLAB, and R.

ai ai-for-science benchmark dae differential-equations differentialequations jax julia matlab nerual-differential-equations neural-ode ode partial-differential-equations pde python pytorch scientific-machine-learning sciml sde

Updated 10 months ago

mmpose • Rank 24.4 • Science 54%

OpenMMLab Pose Estimation Toolbox and Benchmark.

animal-pose-estimation benchmark cpm crowdpose face-keypoint freihand hand-pose-estimation higher-hrnet hourglass hrnet human-pose mmpose mpii mspn ochuman pose-estimation pytorch rsn rtmpose udp

Updated 10 months ago

seb • Rank 11.3 • Science 67%

A Scandinavian Benchmark for sentence embeddings

benchmark low-resource-nlp natural-language-processing nlp scandinavian

Updated 10 months ago

trajectopy • Rank 11.1 • Science 67%

Trajectopy - Trajectory Evaluation in Python

alignment benchmark comparison evaluation mapping metrics robotics trajectory trajectory-analysis

Updated 10 months ago

babelstream • Rank 10.7 • Science 67%

STREAM, for lots of devices written in many programming models

benchmark cuda gpgpu gpu hpc kokkos memory-bandwidth openacc opencl openmp parallel-processing raja sycl

Updated 10 months ago

benchexec • Rank 18.5 • Science 59%

BenchExec: A Framework for Reliable Benchmarking and Resource Measurement

benchmark benchmark-framework benchmarking cgroups linux python resource-measurement

Updated 10 months ago

amlb • Rank 13.2 • Science 64%

OpenML AutoML Benchmarking Framework

automl benchmark machine-learning

Updated 10 months ago

tasksource • Rank 11.4 • Science 64%

Dataset collection and preprocessing framework for NLP extreme multitask learning

benchmark bigbench crossfit curated-datasets dataset-collection discriminative extreme-mtl extreme-multi-task-learning glue huggingface instruction-tuning meta-learning multi-task-learning multi-task-learning-scaling natural-language-inference nlp preprocessings reward-modeling scaling text-classification

Updated 10 months ago

fluidx3d • Rank 8.4 • Science 67%

The fastest and most memory-efficient lattice Boltzmann CFD software, running on all GPUs and CPUs via OpenCL. Free for non-commercial use.

benchmark cfd computational-fluid-dynamics fluid-dynamics fluid-simulation fluid-solver gpgpu gpu gpu-computing high-performance-computing hpc interactive-visualization lattice-boltzmann lbm opencl physics raytracing scientific-computing scientific-visualization simulation

Updated 10 months ago

BenchmarkPlots • Rank 20.1 • Science 54%

A benchmarking framework for the Julia language

benchmark julia julia-language

Updated 10 months ago

compression_benchmark • Rank 4.8 • Science 67%

Benchmarking FASTQ compression with 'mature' compression algorithms

benchmark bioinformatics compression fastq

Updated 10 months ago

https://github.com/cheind/py-motmetrics • Rank 22.6 • Science 49%

📊 Benchmark multiple object trackers (MOT) in Python

benchmark clear-mot-metrics metrics mot mot-challenge object-detection object-tracking tracker

Updated 10 months ago

lrebench • Rank 4.2 • Science 67%

[EMNLP 2022 Findings] Towards Realistic Low-resource Relation Extraction: A Benchmark with Empirical Baseline Study

benchmark chinese data-augmentation data-augumentation dataset efficient emnlp few-shot information-extraction kg knowledge-graph knowprompt long-tail low-resource lrebench prompt re relation-extraction self-training

Updated 10 months ago

rl4co • Rank 16.6 • Science 54%

A PyTorch library for all things Reinforcement Learning (RL) for Combinatorial Optimization (CO)

attention attention-model benchmark combinatorial-optimization cvrp electronic-design-automation hydra neural-combinatorial-optimization operations-research pytorch-lightning reinforcement-learning scheduling tensordict torchrl tsp vehicle-routing-problem

Updated 10 months ago

benchmarks-acoustic-propagation • Rank 3.0 • Science 67%

Coupled model development for acoustic propagation through multilayer systems for particle-velocity sensors

acoustics benchmark romsoc

Updated 10 months ago

pytorch-benchmark • Rank 11.9 • Science 57%

Easily benchmark PyTorch model FLOPs, latency, throughput, allocated GPU memory, and energy consumption

benchmark deep-learning flops gpu jetson python pytorch timing-analysis

Updated 10 months ago

hyperfine • Rank 14.9 • Science 54%

A command-line benchmarking tool

benchmark cli command-line rust terminal tool

Updated 10 months ago

fastrag • Rank 14.7 • Science 54%

Efficient Retrieval Augmentation and Generation Framework

benchmark colbert diffusion generative-ai information-retrieval knowledge-graph llm multi-modal nlp question-answering semantic-search sentence-transformers summarization transformers

Updated 10 months ago

eval-suite • Rank 9.2 • Science 59%

[ACL 2024] User-friendly evaluation framework: Eval Suite & Benchmarks: UHGEval, HaluEval, HalluQA, etc.

benchmark ceval chatgpt dataset evaluation gpt-3 gpt-4 hallucination hallucination-detection hallucination-evaluation hallucinations huggingface huggingface-transformers large-language-models llm openai openai-api qwen

Updated 10 months ago

logpai • Rank 18.1 • Science 49%

A machine learning toolkit for log parsing [ICSE'19, DSN'16]

anomaly-detection benchmark log log-analysis log-mining log-parser log-parsing

Updated 10 months ago

py-torchbenchmark • Rank 12.5 • Science 54%

TorchBench is a collection of open source benchmarks used to evaluate PyTorch performance.

benchmark pytorch

Updated 10 months ago

@stdlib/bench-harness • Rank 11.1 • Science 54%

Benchmark harness.

bench benchmark harness javascript lib library measure node node-js nodejs perf performance standard stdlib tap

Updated 10 months ago

matbench • Rank 14.5 • Science 49%

Matbench: Benchmarks for materials science property prediction

benchmark chemistry condensed-matter data-science machine-learning machine-learning-algorithms materials-science physics

Updated 10 months ago

benchmarl • Rank 8.2 • Science 54%

BenchMARL is a library for benchmarking Multi-Agent Reinforcement Learning (MARL). BenchMARL allows users to quickly compare different MARL algorithms, tasks, and models while being systematically grounded in its two core tenets: reproducibility and standardization.

benchmark machine-learning marl multi-agent multi-agent-reinforcement-learning pytorch reinforcement-learning rl robotics torch

Updated 10 months ago

h5bench • Rank 7.9 • Science 54%

A benchmark suite for measuring HDF5 performance.

benchmark hdf5 hpc

Updated 10 months ago

evo • Rank 22.0 • Science 36%

Python package for the evaluation of odometry and SLAM

benchmark euroc evaluation kitti mapping metrics odometry robotics ros ros2 slam trajectory trajectory-analysis trajectory-evaluation tum

Updated 10 months ago

qcd • Rank 4.0 • Science 54%

Quantum Circuit Designer: A gymnasium-based set of environments for benchmarking reinforcement learning for quantum circuit design.

benchmark circuit-design gymnasium quantum-computing reinforcement-learning

Updated 10 months ago

muld • Rank 3.8 • Science 54%

The Multitask Long Document Benchmark

benchmark long-texts nlp

Updated 10 months ago

xfinder • Rank 6.6 • Science 51%

[ICLR 2025] xFinder: Large Language Models as Automated Evaluators for Reliable Evaluation

benchmark cc-by-nc-nd-4 chatglm dataset evaluation gpt judge-model key-answer-extraction large-language-models llm llm-as-a-judge llm-as-evaluator lm-evaluation open-compass phi qwen regex reliability reliable-evaluation xfinder

Updated 10 months ago

aptv2 • Rank 3.3 • Science 54%

The official repo for the extension of [NeurIPS'22] "APT-36K: A Large-scale Benchmark for Animal Pose Estimation and Tracking": https://github.com/pandorgan/APT-36K

animal-pose-estimation benchmark dataset deep-learning few-shot-learning pose-estimation pose-tracking pre-training transfer-learning vision-transformer

Updated 10 months ago

springwebsite • Rank 3.2 • Science 54%

Website code of Spring benchmark

benchmark dataset django optical-flow scene-flow stereo

Updated 10 months ago

jreferral • Rank 3.2 • Science 54%

An open-source tool that recommends the most energy-efficient JVM configuration for Java software

benchmark energy energy-consumption energy-efficiency java jvm optimization

Updated 10 months ago

@stdlib/utils-timeit • Rank 2.9 • Science 54%

Time a snippet.

bench benchmark clock javascript measure node node-js nodejs perf performance stdlib tic time timeit timer toc util utilities utility utils

Updated 10 months ago

@stdlib/bench • Rank 11.0 • Science 44%

Benchmark.

bench benchmark harness javascript lib library measure node node-js nodejs perf performance standard stdlib tap

Updated 10 months ago

birdset • Rank 7.0 • Science 46%

A benchmark dataset collection for bird sound classification

avian benchmark bioacoustics deeplearning

Updated 10 months ago

https://github.com/csgillespie/benchmarkme • Rank 16.6 • Science 36%

Crowd sourced benchmarking

benchmark cran r

Updated 10 months ago

are-we-fast-yet • Rank 8.2 • Science 44%

Are We Fast Yet? Comparing Language Implementations with Objects, Closures, and Arrays

benchmark benchmarking comparison dynamic-languages language language-implementations performance

Updated 10 months ago

opencl-benchmark • Rank 5.5 • Science 44%

A small OpenCL benchmark program to measure peak GPU/CPU performance.

bandwidth benchmark benchmarking flops gpgpu gpu gpu-computing high-performance-computing hpc opencl tool tools

Updated 10 months ago

https://github.com/beuth-erdelt/benchmark-experiment-host-manager • Rank 9.9 • Science 39%

This Python tool helps manage DBMS benchmarking experiments in a Kubernetes-based HPC cluster environment. It lets users configure hardware and software setups so that tests can easily be repeated over varying configurations.

aws benchmark cluster database dbms k8s kubernetes python tpc-c tpc-h ycsb

Updated 10 months ago

https://github.com/bark-simulator/bark • Rank 15.4 • Science 33%

Open-Source Framework for Development, Simulation and Benchmarking of Behavior Planning Algorithms for Autonomous Driving

artificial-intelligence autonomous-driving autonomous-vehicles bark bark-simulator benchmark deep-reinforcement-learning machine-learning multi-agent reinforcement-learning research robotics self-driving-car simulation simulator verification

Updated 10 months ago

tax-retrieval-benchmark • Rank 0.7 • Science 44%

An implementation of the TaxRetrievalBenchmark task for the 🤗 Massive Text Embedding Benchmark (MTEB) framework.

benchmark droit embeddings fiscal fiscalite information-retrieval mteb rag retrieval retrieval-augmented-generation sbert semantic-search sentence-embeddings sentence-transformers stp tax taxation

Updated 10 months ago

https://github.com/bio-phys/mdbenchmark • Rank 10.6 • Science 33%

Quickly generate, start and analyze benchmarks for molecular dynamics simulations.

benchmark cli computational-chemistry gromacs high-performance-computing molecular-dynamics namd python simulation

Updated 10 months ago

polybench • Rank 7.5 • Science 36%

Multivariate polynomial arithmetic benchmark tests.

benchmark mathematics multivariate-polynomials

Updated 9 months ago

https://github.com/google-deepmind/physics-iq-benchmark • Rank 6.9 • Science 36%

Benchmarking physical understanding in generative video models

benchmark generative-models physical-understanding video-generation

Updated 10 months ago

leakdb • Rank 5.2 • Science 36%

LeakDB (Leakage Diagnosis Benchmark) is a realistic leakage dataset for water distribution networks. The dataset is comprised of a large number of artificially created but realistic leakage scenarios, on different water distribution networks, under varying conditions. A scoring algorithm in MATLAB code is provided to evaluate the results of different algorithms.

benchmark dataset leakage

Updated 10 months ago

hdnom • Rank 14.5 • Science 26%

🔮 Benchmarking and visualization toolkit for penalized Cox models

benchmark high-dimensional-data linear-regression nomogram-visualization penalized-cox-models survival-analysis

Updated 10 months ago

https://github.com/cbg-ethz/bmi • Rank 4.8 • Science 33%

Mutual information estimators and benchmark

benchmark estimator mutual-information python

Updated 10 months ago

https://github.com/brucewlee/h-test • Rank 0.7 • Science 33%

[ACL 2024] Language Models Don't Learn the Physical Manifestation of Language

benchmark evaluation language-model

Updated 10 months ago

https://github.com/crowdstrike/cloud-resource-estimator • Rank 5.5 • Science 26%

Cloud deployment size calculation utilities

benchmark cloud-auditing crowdstrike crowdstrike-falcon crowdstrike-horizon cspm cspm-benchmark falcon horizon

Updated 10 months ago

https://github.com/cdjellen/otbench • Rank 5.0 • Science 26%

Effective Benchmarks for Optical Turbulence Modeling

benchmark machine-learning optical-turbulence

Updated 10 months ago

sceneflow_from_blender • Rank 2.8 • Science 28%

Get 3D motion vectors / scene flow directly from Blender

benchmark blender dataset optical-flow scene-flow stereo

Updated 10 months ago

https://github.com/awadell1/pkgjogger.jl • Rank 4.2 • Science 26%

Take your packages for a jog!

benchmark benchmark-framework benchmarking github-actions julia julia-language

Updated 10 months ago

spring_utils • Rank 1.4 • Science 28%

Additional utility code for the Spring dataset and benchmark

benchmark dataset optical-flow pytorch scene-flow

Updated 10 months ago

https://github.com/aim-uofa/geobench • Rank 4.8 • Science 23%

A toolbox for benchmarking SOTA discriminative and generative geometry estimation models.

benchmark monocular-depth-estimation monocular-surface-normal-estimation

Updated 8 months ago

https://github.com/lquenti/blackheap • Rank 1.6 • Science 26%

A blackbox approach to I/O modelling. (Migrated to Codeberg)

benchmark fuse hpc kernel-density-estimation linear-regression performance-analysis statistics

Updated 10 months ago

https://github.com/citiususc/blinkg • Rank 0.7 • Science 26%

BLINKG: Benchmark for LLM-Integrated Knowledge Graph Generation

benchmark knowledge-graph llms llms-benchmarking mappings

Updated 10 months ago

https://github.com/avik-pal/deeplearningbenchmarks • Rank 3.3 • Science 23%

Benchmarks across Deep Learning Frameworks in Julia and Python

benchmark computer-vision conv2d flux gpu julia machine-learning pytorch

Updated 10 months ago

https://github.com/ai-forever/ruscode • Rank 2.1 • Science 23%

Official repository for RusCode benchmark dataset (NAACL 2025)

benchmark dataset kandinsky naacl2025 russian-dataset text-to-image

Updated 10 months ago

https://github.com/jurgisp/memory-maze • Rank 12.3 • Science 10%

Evaluating long-term memory of reinforcement learning algorithms

benchmark reinforcement-learning research

Updated 10 months ago

https://github.com/crate/tsperf • Rank 4.4 • Science 13%

TSPERF Time Series Database Benchmark Suite. Framework for evaluating and comparing the performance of time series databases, in the spirit of TimescaleDB's TSBS.

benchmark benchmark-suite benchmarks cratedb database database-benchmarking database-perfomance database-performance-analysis influxdb mongodb mssql odbc postgresql time-series timescaledb timeseries timeseries-data timeseries-database timestream tsbs

Updated 10 months ago

https://github.com/yegor256/plum • Rank 3.8 • Science 13%

Programming language ultimate metrics (PLUM) collected automatically from GitHub, Google Scholar, Twitter, etc.

analytics benchmark metrics research-project statistics

Updated 10 months ago

pglib-opf • Rank 5.9 • Science 10%

Benchmarks for the Optimal Power Flow Problem

benchmark dataset matpower optimal-power-flow

Updated 10 months ago

cellbench • Rank 5.6 • Science 10%

R package for benchmarking single cell analysis methods

benchmark bioinformatics r

Updated 10 months ago

https://github.com/aliireza/ddio-bench • Rank 5.2 • Science 10%

Reexamining Direct Cache Access to Optimize I/O Intensive Applications for Multi-hundred-gigabit Networks

benchmark dca ddio

Updated 10 months ago

https://github.com/ceed/benchmarks • Rank 5.2 • Science 10%

CEED Benchmarks

benchmark ceed exascale-computing high-order

Updated 10 months ago

https://github.com/cvanaret/nonconvex_solver_comparison • Rank 1.9 • Science 13%

This repo collects results of nonlinear optimization solvers on standard benchmark problems

augmented-lagrangian-method benchmark benchmarking filtersqp interior-point-algorithms ipopt newtons-method nonlinear-optimization nonlinear-optimization-algorithms nonlinear-programming nonlinear-programming-algorithms optimization performance-profile sequential-quadratic-programming snopt software-benchmarking sqp uno-solver

Updated 10 months ago

https://github.com/bblodfon/paad-survival-bench • Rank 1.8 • Science 13%

Benchmark survival ML models against a multimodal TCGA dataset

benchmark curatedtcgadata mlr3 mlr3proba survival-prediction tcga

Updated 10 months ago

https://github.com/bblodfon/ml-course-2022 • Rank 0.7 • Science 13%

Benchmarking ML classification models on a spam dataset

benchmark classification ml-course mlr3 spam spam-classification

Updated 10 months ago

foundation-model-benchmarking-tool • Science 54%

Foundation model benchmarking tool. Run any model on any AWS platform and benchmark for performance across instance type and serving stack options.

bedrock benchmark benchmarking deepseek deepseek-r1 evaluation-metrics foundation-models g5 g6 g6e generative-ai inferentia llama2 llama3 p4d p5 sagemaker trainium

Updated 10 months ago

qpbenchmark • Science 44%

Benchmark for quadratic programming solvers available in Python

benchmark optimization quadratic-programming solvers

Updated 10 months ago

https://github.com/bethgelab/model-vs-human • Science 36%

Benchmark your model on out-of-distribution datasets with carefully collected human comparison data (NeurIPS 2021 Oral)

benchmark pytorch robustness tensorflow toolbox

Updated 10 months ago

hyphi-gym • Science 54%

A Gymnasium benchmark suite for evaluating the robustness and multi-task performance of reinforcement learning algorithms in various discrete and continuous environments.

benchmark gridworld gym gymnasium maze mujoco openai reinforcement-learning robustness

Updated 10 months ago

llm-jp-eval • Science 26%

Modified llm-jp-eval with API and HF scripts for LFMs.

benchmark evaluation liquid-ai llm llm-jp-eval

Updated 10 months ago

graph-scaling • Science 67%

A sampling based method for scaling graph (network) data sets.

benchmark forest-fire-sampling graph-datasets graph-sampling graph-scaling graphs random-edge-sampling random-node-sampling sampling totally-induced-edge-sampling

Updated 10 months ago

many-types-4-py-dataset • Science 41%

ManyTypes4Py: A benchmark Python dataset for machine learning-based type inference

benchmark clean dataset machine-learning manytypes4py msr mt4py python type-annotations type-checked type-inference visible-type-hints

Updated 10 months ago

https://github.com/ai4healthuol/mds-ed • Science 49%

Repository for the paper 'MDS-ED: Multimodal Decision Support in the Emergency Department – a benchmark dataset based on MIMIC-IV'.

benchmark datasets deep-learning ecg healthcare medical-dataset multimodal waveforms

Updated 10 months ago

neteasecrowd-dataset • Science 54%

NetEaseCrowd dataset, a collection of data obtained from the You Ling crowdsourcing platform, Fuxi AI Lab, NetEase.

benchmark crowdsourcing data-centric-ai dataset truth-inference

Updated 10 months ago

robustbench • Science 54%

RobustBench: a standardized adversarial robustness benchmark [NeurIPS 2021 Benchmarks and Datasets Track]

adversarial-machine-learning adversarial-robustness benchmark model-zoo

Updated 10 months ago

stochastic-benchmark • Science 36%

Repository for Stochastic Optimization Solvers Benchmark code

benchmark optimization

Updated 10 months ago

https://github.com/axect/scientific_bench • Science 26%

Benchmark some scientific computations for various languages & libraries

benchmark cpp eigen3 julia languages nim numpy python rust scientific-computing

Updated 10 months ago

supergleber • Science 39%

German Language Understanding Evaluation Benchmark @NAACL24

benchmark german llm

Updated 10 months ago

benchmark-privesc-linux • Science 54%

A comprehensive local Linux Privilege-Escalation Benchmark

benchmark benchmarks linux linux-shell privilege-escalation

Updated 10 months ago

variantbenchmarking • Science 57%

Pipeline to evaluate and validate the accuracy of variant calling methods in genomic research

benchmark draft nextflow nf-core pipeline structural-variants variant-calling workflow

Updated 10 months ago

https://github.com/cedrickchee/dawnbench-analysis • Science 13%

DAWNBench analysis of CIFAR-10 time-to-accuracy.

benchmark benchmark-scripts cifar10 dawnbench deeplearning machinelearning

Updated 10 months ago

https://github.com/bytedance/web-bench • Science 36%

Web-Bench is a benchmark designed to evaluate the performance of LLMs in actual Web development.

benchmark

Updated 10 months ago

tsb-uad • Science 26%

An End-to-End Benchmark Suite for Univariate Time-Series Anomaly Detection

anomaly-detection anomaly-detection-algorithm benchmark data-mining data-science datasets python python3 time-series time-series-analysis

Updated 10 months ago

https://github.com/compnet/signedbenchmark • Science 13%

Benchmark to study partitioning problems on signed graphs

benchmark graph-partitioning signed-graphs

Updated 10 months ago

https://github.com/aksw/frankgraphbench • Science 49%

FranKGraphBench is a framework that allows KG-aware RSs to be benchmarked in a reproducible and easy-to-implement manner. It was first created during Google Summer of Code 2023 for data integration between DBpedia and some standard RS datasets in a reproducible framework.

benchmark knowledge-graph recommender-system

Updated 10 months ago

https://github.com/amazon-science/memerag • Science 36%

MEMERAG: A Multilingual End-to-End Meta-Evaluation Benchmark for Retrieval Augmented Generation

benchmark evaluation rag

Updated 10 months ago

https://github.com/bytedance/shot2story • Science 10%

A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions.

benchmark dataset large-language-models research video-captioning video-language video-language-pretraining video-question-answering video-story video-story-generation video-summarization vision-language