cathode

CATHODE program from arxiv.org/abs/2109.00546

https://github.com/hepml-anomalydetection/cathode


Basic Info

Host: GitHub
Owner: HEPML-AnomalyDetection
License: gpl-3.0
Language: Jupyter Notebook
Default Branch: main
Homepage:
Size: 60.4 MB


CATHODE

CATHODE is an anomaly detection algorithm that combines the strengths of ANODE and CWoLa. It trains a density estimator on the sidebands, samples artificial data points in the signal region, trains a classifier to distinguish the artificial samples from the real signal-region data, and then uses that same classifier to separate signal (the anomaly) from background.

To see the definition of signal region and sideband region please see: SB-SR
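The four steps can be sketched on toy data. This is only an illustration under strong assumptions, not the code in this repository: the "density estimator" is a linear-Gaussian fit instead of a normalizing flow, the "classifier" is a per-bin ratio of data to samples (which approximates the likelihood ratio an ideal classifier would learn), and the window `SR_LO`/`SR_HI`, the toy distributions, and all function names are made up for this example.

```python
import random

random.seed(0)
SR_LO, SR_HI = 3.3, 3.7  # toy signal-region window in the resonant variable m (illustrative)

def make_events(n_bkg, n_sig):
    """Return (m, x, is_signal) tuples; the label is used only for the final check."""
    events = []
    for _ in range(n_bkg):
        m = random.uniform(2.5, 4.5)
        events.append((m, random.gauss(0.5 * m, 1.0), False))
    for _ in range(n_sig):
        events.append((random.uniform(SR_LO, SR_HI), random.gauss(4.5, 0.3), True))
    return events

events = make_events(20000, 300)
sr = [e for e in events if SR_LO <= e[0] < SR_HI]        # signal region
sb = [e for e in events if not (SR_LO <= e[0] < SR_HI)]  # sidebands

# 1) "Density estimator" fitted on sidebands only: x | m ~ Normal(a + b*m, sigma).
n = len(sb)
mean_m = sum(e[0] for e in sb) / n
mean_x = sum(e[1] for e in sb) / n
b = sum((e[0] - mean_m) * (e[1] - mean_x) for e in sb) / sum((e[0] - mean_m) ** 2 for e in sb)
a = mean_x - b * mean_m
sigma = (sum((e[1] - a - b * e[0]) ** 2 for e in sb) / n) ** 0.5

# 2) Sample artificial background in the SR (m values reused from SR data, oversampled 4x).
oversample = 4
samples = [random.gauss(a + b * e[0], sigma) for e in sr for _ in range(oversample)]

# 3) "Classifier" stand-in: per-bin ratio of SR data to samples.
def bin_index(x, width=0.5):
    return int(x // width)

def counts(values):
    c = {}
    for v in values:
        k = bin_index(v)
        c[k] = c.get(k, 0) + 1
    return c

data_counts = counts(e[1] for e in sr)
sample_counts = counts(samples)

def score(x):
    k = bin_index(x)
    # +1 in numerator and denominator keeps sparsely populated bins finite
    return (data_counts.get(k, 0) + 1) / (sample_counts.get(k, 0) / oversample + 1)

# 4) The same score ranks events: signal should score higher than background.
sig_scores = [score(e[1]) for e in sr if e[2]]
bkg_scores = [score(e[1]) for e in sr if not e[2]]
mean_sig = sum(sig_scores) / len(sig_scores)
mean_bkg = sum(bkg_scores) / len(bkg_scores)
print(f"mean score: signal {mean_sig:.2f}, background {mean_bkg:.2f}")
```

The score is built without ever using the truth labels; they enter only in the last step to check that signal events are ranked above background.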

performance comparison

Note: This repo serves to reproduce the results from the CATHODE paper. An easier-to-use code base, intended for illustrating CATHODE and adapting it to your own projects, has since been set up here.

Follow the instructions below to reproduce the results. The steps "Train the ANODE model", "Mix data and samples", "Train the classifier", and "Evaluation" can be called separately as described below. Alternatively, the script run_all.py can be used to run the full pipeline in one call.

Citation

If you use CATHODE for your research, please cite:
- "Classifying anomalies through outer density estimation", by Anna Hallin, Joshua Isaacson, Gregor Kasieczka, Claudius Krause, Benjamin Nachman, Tobias Quadfasel, Matthias Schlaffer, David Shih, and Manuel Sommerhalder. Phys. Rev. D 106, 055006 (2022), arXiv:2109.00546.

Data preparation

(can be skipped if one starts directly from the preprocessed samples here)

To get the datasets:

    wget https://zenodo.org/record/4536377/files/events_anomalydetection_v2.features.h5
    wget https://zenodo.org/record/5759087/files/events_anomalydetection_qcd_extra_inneronly_features.h5

To preprocess:

    python run_data_preparation_LHCORD.py

To scan over different signal injections and/or different data splits, use the --S_over_B and --seed options, respectively. The results in the paper for the scan towards lower S/B ratios were obtained by varying the seed from 1 to 10.
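A scan over seeds and signal injections could be scripted along the following lines. This is a minimal sketch that only builds and prints the command lines. The S/B values are placeholders, and passing `--S_over_B` a plain number is an assumption; check `python run_data_preparation_LHCORD.py --help` for the expected format and for how to set a separate output directory per run.

```python
import shlex

def preparation_commands(seeds, s_over_b_values):
    """Build one preprocessing command per (S/B, seed) combination."""
    commands = []
    for s_over_b in s_over_b_values:
        for seed in seeds:
            commands.append([
                "python", "run_data_preparation_LHCORD.py",
                "--S_over_B", str(s_over_b),
                "--seed", str(seed),
            ])
    return commands

# Seeds 1 to 10 as in the paper; the S/B values below are placeholders.
commands = preparation_commands(range(1, 11), [0.006, 0.003])
for cmd in commands:
    print(shlex.join(cmd))
```

Each list can be handed to `subprocess.run(cmd, check=True)` once the printed commands look right.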

Running the full pipeline

Use the script run_all.py to run the full pipeline in one go. The flag --mode selects the analysis type: CATHODE, ANODE, CWoLa, idealized_AD, or supervised. The remaining arguments are explained when calling python run_all.py -h. In general, arguments concerning the density estimator step start with --DE_, and arguments concerning the classifier step start with --cf_.

The command to produce the most up-to-date performance is:

    python run_all.py --data_dir separated_data/ --mode CATHODE --cf_separate_val_set --no_extra_signal --cf_n_samples 400000 --cf_realistic_conditional --cf_oversampling --cf_no_logit --cf_use_class_weights --cf_save_model --cf_n_runs 1
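The prefix convention makes it easy to see which pipeline step each flag of a long command affects. The helper below is a hypothetical convenience for this README and not part of the repository; it simply sorts the flags of a command string by prefix.

```python
import shlex

COMMAND = (
    "python run_all.py --data_dir separated_data/ --mode CATHODE "
    "--cf_separate_val_set --no_extra_signal --cf_n_samples 400000 "
    "--cf_realistic_conditional --cf_oversampling --cf_no_logit "
    "--cf_use_class_weights --cf_save_model --cf_n_runs 1"
)

def group_by_stage(command):
    """Sort the flags of a run_all.py command by the step they configure."""
    groups = {"density estimator": [], "classifier": [], "general": []}
    for token in shlex.split(command):
        if token.startswith("--DE_"):
            groups["density estimator"].append(token)
        elif token.startswith("--cf_"):
            groups["classifier"].append(token)
        elif token.startswith("--"):
            groups["general"].append(token)
    return groups

groups = group_by_stage(COMMAND)
for stage, flags in groups.items():
    print(f"{stage}: {' '.join(flags) or '(defaults)'}")
```

For the command above, every density estimator setting is left at its default, while the classifier step receives eight explicit flags.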

Train the ANODE model

The corresponding script is run_ANODE_training.py

Mix data and samples

The corresponding script is run_classifier_data_creation.py

Train the classifier

The corresponding script is run_classifier_training.py

Evaluation

The evaluation leading to the main plots is shown in plotting_notebook.ipynb.

Alternatively, a short script like

```python
from evaluation_utils import full_single_evaluation

# Placeholder paths: adjust to your own classifier data and output folders.
data_savedir = 'classifier_data_folder/'
preds_dir = 'classifier_output_folder/'

_ = full_single_evaluation(data_savedir, preds_dir, n_ensemble_epochs=10,
                           sic_range=(0, 20), savefig='result_SIC')
```

will plot the resulting SIC curve to file.
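For reference, the significance improvement characteristic (SIC) is the signal efficiency divided by the square root of the background efficiency, evaluated at each classifier threshold. The function below is a minimal standalone sketch of that definition. It is not the implementation in evaluation_utils, the function name is made up, and tied scores are not treated specially.

```python
import math

def sic_curve(scores, labels):
    """Return (tpr, sic) lists, lowering the threshold one event at a time.

    scores: classifier outputs (higher means more signal-like)
    labels: 1 for signal, 0 for background
    """
    n_sig = sum(labels)
    n_bkg = len(labels) - n_sig
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    tprs, sics = [], []
    for i in order:
        if labels[i]:
            tp += 1
        else:
            fp += 1
        if fp == 0:
            continue  # SIC is undefined while no background event passes
        tpr = tp / n_sig
        fpr = fp / n_bkg
        tprs.append(tpr)
        sics.append(tpr / math.sqrt(fpr))
    return tprs, sics

scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0, 0, 0]
tprs, sics = sic_curve(scores, labels)
print(f"max SIC = {max(sics):.3f}")
```

A SIC above 1 means that cutting on the classifier output increases the expected significance relative to applying no cut.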

Benchmarks

Classic ANODE

The most up-to-date command for the ANODE benchmark is:

    python run_all.py --no_extra_signal --data_dir separated_data/ --mode ANODE

CWoLa

The most up-to-date command for the CWoLa Hunting benchmark is:

    python run_all.py --data_dir separated_data/ --mode CWoLa --cf_separate_val_set --no_extra_signal --cf_oversampling --cf_no_logit --cf_use_class_weights --cf_save_model

Idealized AD

The most up-to-date command for the idealized anomaly detector benchmark is:

    python run_all.py --data_dir separated_data/ --mode idealized_AD --cf_separate_val_set --no_extra_signal --cf_no_logit --cf_oversampling --cf_use_class_weights --cf_save_model --cf_extra_bkg

Supervised

The most up-to-date command for the fully supervised benchmark is:

    python run_all.py --data_dir separated_data/ --mode supervised --cf_separate_val_set --no_extra_signal --cf_no_logit --cf_oversampling --cf_save_model --cf_extra_bkg
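The four benchmark commands share `--data_dir separated_data/` and `--no_extra_signal` and differ only in `--mode` and the classifier flags. To run them in a batch, the flags can be collected in a small table, as in the sketch below. The names `COMMON`, `MODE_FLAGS`, and `benchmark_command` are made up for this example, and the flag order differs from the commands above on the assumption that run_all.py accepts its options in any order.

```python
import shlex

COMMON = ["python", "run_all.py", "--data_dir", "separated_data/", "--no_extra_signal"]

# Classifier flags per benchmark, copied from the commands in this section.
MODE_FLAGS = {
    "ANODE": [],
    "CWoLa": ["--cf_separate_val_set", "--cf_oversampling", "--cf_no_logit",
              "--cf_use_class_weights", "--cf_save_model"],
    "idealized_AD": ["--cf_separate_val_set", "--cf_no_logit", "--cf_oversampling",
                     "--cf_use_class_weights", "--cf_save_model", "--cf_extra_bkg"],
    "supervised": ["--cf_separate_val_set", "--cf_no_logit", "--cf_oversampling",
                   "--cf_save_model", "--cf_extra_bkg"],
}

def benchmark_command(mode):
    """Return the argument list for one benchmark mode."""
    return COMMON + ["--mode", mode] + MODE_FLAGS[mode]

for mode in MODE_FLAGS:
    print(shlex.join(benchmark_command(mode)))
```

Since these commands use the default output directories and model names, running several of them back to back calls for setting those explicitly (see the --help output of run_all.py).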

Side remarks

All the listed scripts provide documentation on how to use them by calling python [SCRIPT].py --help. In particular, the above example commands use default input/output directories and model names, which should be adjusted to custom choices when multiple studies are performed.

Owner

Name: HEPML-AnomalyDetection
Login: HEPML-AnomalyDetection
Kind: organization

Repositories: 1
Profile: https://github.com/HEPML-AnomalyDetection
