adl-benchmarks-index

A list of example analysis challenges and solved implementations of them in various languages

https://github.com/iris-hep/adl-benchmarks-index

Science Score: 64.0%

This score indicates how likely this project is to be science-related based on various indicators:

- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: Links to zenodo.org
- ✓ Committers with academic emails: 5 of 9 committers (55.6%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (10.1%) to scientific vocabulary

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: iris-hep
Default Branch: master
Homepage:
Size: 63.5 KB

Statistics

Stars: 13
Watchers: 9
Forks: 9
Open Issues: 9
Releases: 1

Created about 7 years ago · Last pushed about 2 years ago

Metadata Files

Readme Citation

Introduction

This repository maintains a list of commonly agreed-upon benchmark analysis tasks that can be used to exemplify, test, and compare different languages and approaches used for analysis. It also lists public data files on which these benchmarks can be run and the repositories containing actual implementations of them.

Functionality benchmarks

1. Plot the E_T^miss of all events.
2. Plot the p_T of all jets.
3. Plot the p_T of jets with |η| < 1.
4. Plot the E_T^miss of events that have at least two jets with p_T > 40 GeV.
5. Plot the E_T^miss of events that have an opposite-charge muon pair with an invariant mass between 60 and 120 GeV.
6. For events with at least three jets, plot the p_T of the trijet four-momentum that has the invariant mass closest to 172.5 GeV in each event and plot the maximum b-tagging discriminant value among the jets in this trijet.
7. Plot the scalar sum in each event of the p_T of jets with p_T > 30 GeV that are not within 0.4 in ΔR of any light lepton with p_T > 10 GeV.
8. For events with at least three light leptons and a same-flavor opposite-charge light lepton pair, find such a pair that has the invariant mass closest to 91.2 GeV in each event and plot the transverse mass of the system consisting of the missing transverse momentum and the highest-p_T light lepton not in this pair.
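
As an illustration of what these tasks ask for, the fifth task in the list above (E_T^miss of events with an opposite-charge muon pair in a mass window) can be sketched in plain Python. This is a minimal sketch and not taken from any of the listed implementations: the event layout (dictionaries with `met` and `muons` keys), the function names, and the exclusive handling of the window boundaries are all assumptions made for the example.

```python
import math
from itertools import combinations

MUON_MASS = 0.105658  # GeV, muon rest mass


def four_vector(pt, eta, phi, mass):
    """Convert (pt, eta, phi, mass) to Cartesian (E, px, py, pz)."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return energy, px, py, pz


def invariant_mass(p1, p2):
    """Invariant mass of the sum of two (pt, eta, phi, mass) particles."""
    e, px, py, pz = (a + b for a, b in zip(four_vector(*p1), four_vector(*p2)))
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


def has_opposite_charge_pair(muons, low=60.0, high=120.0):
    """True if any opposite-charge muon pair has low < mass < high (GeV).

    Treating the window as exclusive is an assumption of this sketch.
    """
    for m1, m2 in combinations(muons, 2):
        if m1["charge"] * m2["charge"] >= 0:
            continue  # same-sign pair, skip
        mass = invariant_mass(
            (m1["pt"], m1["eta"], m1["phi"], MUON_MASS),
            (m2["pt"], m2["eta"], m2["phi"], MUON_MASS),
        )
        if low < mass < high:
            return True
    return False


def selected_met(events):
    """Return the E_T^miss values that would be filled into the histogram."""
    return [ev["met"] for ev in events if has_opposite_charge_pair(ev["muons"])]


# Hypothetical toy events; real input would come from the NanoAOD file.
events = [
    {"met": 12.5, "muons": [
        {"pt": 45.6, "eta": 0.0, "phi": 0.0, "charge": +1},
        {"pt": 45.6, "eta": 0.0, "phi": math.pi, "charge": -1},
    ]},
    {"met": 30.0, "muons": [
        {"pt": 45.6, "eta": 0.0, "phi": 0.0, "charge": +1},
        {"pt": 45.6, "eta": 0.0, "phi": math.pi, "charge": +1},
    ]},
    {"met": 8.0, "muons": []},
]
met_values = selected_met(events)
```

Only the first toy event passes the selection, since the second contains a same-sign pair and the third has no muons. A real implementation would read columns from the input file and fill a histogram instead of returning a list.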

For the motivations behind these benchmarks, see motivation.md. For a technical reference of the terms used in the benchmarks, see reference.md.
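
Two of the terms used in the benchmarks, ΔR and transverse mass, lend themselves to a short sketch. The code below uses the commonly used definitions (ΔR as the distance in the η-φ plane with Δφ wrapped, and the massless approximation for transverse mass); reference.md remains the authoritative definition for this repository, and the function names and the dictionary layout of jets and leptons are assumptions made for illustration.

```python
import math


def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance in the eta-phi plane."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))


def transverse_mass(lep_pt, lep_phi, met, met_phi):
    """Transverse mass of a lepton + missing-momentum system (massless approximation)."""
    dphi = delta_phi(lep_phi, met_phi)
    return math.sqrt(2.0 * lep_pt * met * (1.0 - math.cos(dphi)))


def cleaned_jet_pt_sum(jets, leptons, jet_pt_min=30.0, lep_pt_min=10.0, dr_min=0.4):
    """Scalar sum of jet p_T for jets not within dr_min of any selected lepton."""
    good_leptons = [lep for lep in leptons if lep["pt"] > lep_pt_min]
    total = 0.0
    for jet in jets:
        if jet["pt"] <= jet_pt_min:
            continue
        # Keep the jet only if every selected lepton is at least dr_min away.
        if all(
            delta_r(jet["eta"], jet["phi"], lep["eta"], lep["phi"]) >= dr_min
            for lep in good_leptons
        ):
            total += jet["pt"]
    return total


# Hypothetical toy objects for a single event.
jets = [
    {"pt": 50.0, "eta": 0.0, "phi": 0.0},  # overlaps with the 25 GeV lepton
    {"pt": 40.0, "eta": 1.0, "phi": 1.0},  # only near a lepton below threshold
    {"pt": 20.0, "eta": 2.0, "phi": 2.0},  # below the jet p_T threshold
]
leptons = [
    {"pt": 25.0, "eta": 0.1, "phi": 0.1},
    {"pt": 5.0, "eta": 1.0, "phi": 1.0},
]
ht = cleaned_jet_pt_sum(jets, leptons)
mt = transverse_mass(40.0, 0.0, 40.0, math.pi)
```

In this toy event only the 40 GeV jet survives the cleaning, and a 40 GeV lepton back-to-back with 40 GeV of missing transverse momentum gives a transverse mass of about 80 GeV.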

Input data files

Converted to NanoAOD from 2012 CMS open data:
- root://eospublic.cern.ch//eos/root-eos/benchmark/Run2012B_SingleMu.root (16 GiB, 53 million events)

Language implementations

| Repository | Language | Description |
|------------|----------|-------------|
| opendata-benchmarks | RDataFrame | RDataFrame is a component of ROOT that provides a high-level interface for analyzing TTrees and other data formats. Each task is solved with a simpler syntax useful in interpreted ROOT macros as well as a fully compiled C++ syntax for best performance. |
| nail | NAIL (Natural Analysis Implementation Language) | |
| groot | Go | Part of the Go-HEP project, groot is a pure Go package that provides read/write access to ROOT files. |
| coffea | Python + Numpy | Coffea builds on numpy and awkward-array for columnar data analysis in Python. |
| bamboo | Python + RDataFrame | The bamboo analysis framework provides a high-level Python interface to RDataFrame (technically an embedded domain-specific language). |
| queryosity | C++ | Queryosity is a (semi-)structured data analysis library with support for arbitrary data types. |
| Rumble | JSONiq (an XQuery dialect for JSON data) | Most data in ROOT files can be exposed in the JSON data model and can thus be processed by JSONiq. This implementation is targeted to be run on Rumble, a JSONiq implementation on top of Spark, but could be run by any other JSONiq processor. |
| BigQuery | BigQuery's dialect of SQL | SQL is arguably the most widespread language for querying structured data. Since SQL:1999, it supports arrays and structured types and is thus, in principle, suited for typical HEP analyses, though not many implementations support these features. BigQuery's dialect is based on SQL:2011, supports the mentioned features, and has a few additional language constructs that make queries more concise. |
| PrestoDB | PrestoDB's dialect of SQL | Like BigQuery, Presto has some support for arrays and structured types; however, it only has limited support for nested queries and a more verbose syntax than BigQuery. |
| Amazon Athena | Athena's dialect of SQL | Athena is a fully-managed Query-as-a-Service system based on PrestoDB with attractive scalability and pricing but a few more limitations than Presto (most importantly, no support for user-defined functions). |
| SQL++ (AsterixDB) | SQL++ | AsterixDB is a Big Data platform specialized for semi-structured data. Its query language is thus designed to deal with nested data intuitively. |
| UnROOT.jl | Julia | Pure Julia implementation utilizing packages developed by JuliaHEP as a demonstration of ease of use, flexibility, and peak performance at the same time for end-user analysis. |
| Snowflake | Snowflake's dialect of SQL | Snowflake is a fully-managed Query-as-a-Service system that boasts high performance and scalability as a pure in-cloud database. Moreover, Snowflake adds support for the powerful VARIANT data type, specifically designed to efficiently store and process semi-structured data. |

Adding new benchmarks, data, or implementations

Additional benchmarks or public data files can be suggested as GitHub issues on this project to start a discussion within the HSF Data Analysis Working Group community.
Suggested modifications to the layout of this repository are also welcome as new GitHub issues.
If you would like to add a repository with a new implementation of the benchmarks, go ahead and submit a pull request with the proposed changes.

Owner

Name: IRIS-HEP
Login: iris-hep
Kind: organization

Website: http://iris-hep.org/
Repositories: 32
Profile: https://github.com/iris-hep

Institute for Research and Innovation in Software for High Energy Physics

Citation (CITATION.cff)

cff-version: 1.2.0
message: If you use this software, please cite it as below.
title: ADL Functionality Benchmarks Index
abstract: A list of example analysis challenges and solved implementations of them in various languages
authors:
  - family-names: Proffitt
    given-names: Mason
  - family-names: Müller
    given-names: Ingo
  - family-names: Graur
    given-names: Dan
  - family-names: Adamec
    given-names: Mat
  - family-names: Ling
    given-names: Jerry
  - family-names: David
    given-names: Pieter
  - family-names: Guiraud
    given-names: Enrico
  - family-names: Binet
    given-names: Sebastien
doi: 10.5281/zenodo.5131286
repository-code: "https://github.com/iris-hep/adl-benchmarks-index"

Committers

Last synced: 11 months ago

All Time

Total Commits: 56
Total Committers: 9
Avg Commits per committer: 6.222
Development Distribution Score (DDS): 0.429

Past Year

Commits: 0
Committers: 0
Avg Commits per committer: 0.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Mason Proffitt	m**t@c**h	32
taehyounpark	t**k@i**m	7
Ingo Mueller	i**r@i**h	7
Mat Adamec	m**c@g**m	3
Pieter David	p**d@g**m	2
Jerry Ling	p**n@j**v	2
Sebastien Binet	b**t@c**h	1
Enrico Guiraud	e**d@c**h	1
DanGraur	d**r@i**h	1

Committer Domains (Top 20 + Academic)

cern.ch: 3, inf.ethz.ch: 2, jling.dev: 1

Issues and Pull Requests

Last synced: 11 months ago

All Time

Total issues: 20
Total pull requests: 29
Average time to close issues: 3 months
Average time to close pull requests: 12 days
Total issue authors: 6
Total pull request authors: 9
Average comments per issue: 2.65
Average comments per pull request: 1.34
Merged pull requests: 29
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 2
Average time to close issues: N/A
Average time to close pull requests: 30 days
Issue authors: 0
Pull request authors: 1
Average comments per issue: 0
Average comments per pull request: 4.0
Merged pull requests: 2
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

masonproffitt (7)
ingomueller-net (6)
gordonwatts (2)
nsmith- (1)
benkrikler (1)
matthewfeickert (1)

Pull Request Authors

masonproffitt (5)
ingomueller-net (5)
taehyounpark (2)
mat-adamec (2)
sbinet (1)
eguiraud (1)
pieterdavid (1)
Moelf (1)
DanGraur (1)
