cathode
CATHODE program from arxiv.org/abs/2109.00546
CATHODE
CATHODE is a new anomaly detection algorithm that combines the strengths of ANODE and CWoLa. It trains a density estimator on the sidebands, samples artificial data points in the signal region, and trains a classifier to distinguish the artificial data from the real signal-region data. The same classifier is then used to separate signal (the anomaly) from background.
For the definitions of the signal region and the sideband region, see: SB-SR
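
The four steps above can be illustrated with a toy sketch that uses only the Python standard library. It is not the code in this repo: a Gaussian with a mean linear in m stands in for the density estimator, a histogram ratio stands in for the classifier, and the toy data, the signal-region window, and all names are assumptions made for illustration.

```python
import random

random.seed(0)
SR_LOW, SR_HIGH = 4.0, 6.0  # hypothetical signal-region window in the resonant variable m

def in_sr(m):
    return SR_LOW <= m <= SR_HIGH

# Toy events (m, x, is_signal); is_signal is only used for the final check.
background = [(m, random.gauss(0.1 * m, 1.0), 0)
              for m in (random.uniform(0.0, 10.0) for _ in range(20000))]
signal = [(random.gauss(5.0, 0.3), random.gauss(3.0, 0.3), 1) for _ in range(400)]
events = background + signal
sideband = [e for e in events if not in_sr(e[0])]
sr_data = [e for e in events if in_sr(e[0])]

# Step 1: "density estimator" for p(x | m), fitted on the sidebands only.
# Stand-in: Gaussian whose mean is linear in m (least squares), constant width.
n = len(sideband)
mean_m = sum(m for m, _, _ in sideband) / n
mean_x = sum(x for _, x, _ in sideband) / n
slope = (sum((m - mean_m) * (x - mean_x) for m, x, _ in sideband)
         / sum((m - mean_m) ** 2 for m, _, _ in sideband))
intercept = mean_x - slope * mean_m
sigma = (sum((x - (intercept + slope * m)) ** 2 for m, x, _ in sideband) / n) ** 0.5

# Step 2: sample artificial background in the signal region (interpolation).
# Simplification: reuse the observed SR m values, oversampled by a factor of 4.
samples = [random.gauss(intercept + slope * m, sigma)
           for m, _, _ in sr_data for _ in range(4)]

# Step 3: "classifier" between SR data and samples.
# Stand-in: per-bin ratio data / (data + samples) of normalized histograms.
X_MIN, X_MAX, N_BINS = -4.0, 6.0, 20

def bin_index(x):
    i = int((x - X_MIN) / (X_MAX - X_MIN) * N_BINS)
    return min(max(i, 0), N_BINS - 1)

def histogram(values):
    counts = [0] * N_BINS
    for v in values:
        counts[bin_index(v)] += 1
    return [c / len(values) for c in counts]

h_data = histogram([x for _, x, _ in sr_data])
h_samp = histogram(samples)

def anomaly_score(x):
    d, s = h_data[bin_index(x)], h_samp[bin_index(x)]
    return 0.5 if d + s == 0 else d / (d + s)

# Step 4: use the same score to separate signal from background.
sig_scores = [anomaly_score(x) for _, x, is_sig in sr_data if is_sig == 1]
bkg_scores = [anomaly_score(x) for _, x, is_sig in sr_data if is_sig == 0]
mean_sig = sum(sig_scores) / len(sig_scores)
mean_bkg = sum(bkg_scores) / len(bkg_scores)
print(f"mean score: signal {mean_sig:.2f}, background {mean_bkg:.2f}")
```

In this toy setup the score sits near 0.5 wherever the signal-region data look like the interpolated background and rises where the data show an excess, which is the behavior the classifier step relies on.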

Note: This repo serves to reproduce the results of the CATHODE paper. An easier-to-use code base, intended to illustrate CATHODE and to help adopt it for your own projects, has since been set up here.
Follow the instructions below to reproduce the results. The steps "Train the ANODE model", "Mix data and samples", "Train the classifier", and "Evaluation" can be called separately as described below. Alternatively, the script run_all.py can be used to run the full pipeline in one call.
Citation
If you use CATHODE for your research, please cite:
- "Classifying anomalies through outer density estimation",
By Anna Hallin, Joshua Isaacson, Gregor Kasieczka, Claudius Krause, Benjamin Nachman,
Tobias Quadfasel, Matthias Schlaffer, David Shih, and Manuel Sommerhalder.
Phys. Rev. D 106, 055006 (2022).
Data preparation
(can be skipped if one starts directly from the preprocessed samples here)
To get the datasets:
wget https://zenodo.org/record/4536377/files/events_anomalydetection_v2.features.h5
wget https://zenodo.org/record/5759087/files/events_anomalydetection_qcd_extra_inneronly_features.h5
To preprocess:
python run_data_preparation_LHCORD.py
To scan over different signal injections and/or different splits, use the --S_over_B and --seed options, respectively. The results in the paper for the scan down to lower S/B ratios were obtained by varying the seed from 1 to 10.
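
Such a scan could be scripted roughly as follows. This is a sketch under assumptions: the S/B values are hypothetical placeholders, and it assumes --S_over_B accepts a plain number. The commands are only printed, so output directories can be adjusted (see python run_data_preparation_LHCORD.py --help) before anything is run.

```python
import shlex

def data_prep_commands(s_over_b_values, seeds):
    """Build one data-preparation command per (S/B, seed) combination."""
    commands = []
    for s_over_b in s_over_b_values:
        for seed in seeds:
            commands.append([
                "python", "run_data_preparation_LHCORD.py",
                "--S_over_B", str(s_over_b),
                "--seed", str(seed),
            ])
    return commands

# Hypothetical S/B values; the seeds 1..10 follow the text above.
commands = data_prep_commands([0.006, 0.003], range(1, 11))
for cmd in commands:
    # To execute instead of print: subprocess.run(cmd, check=True)
    print(shlex.join(cmd))
```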
Running the full pipeline
Use the script run_all.py to run the full pipeline in one go. The flag --mode, with the options CATHODE, ANODE, CWoLa, idealized_AD, or supervised, specifies which analysis type will be run. Additional arguments are explained when calling python run_all.py -h. In general, arguments concerning the density estimator step start with --DE_, and arguments concerning the classifier step start with --cf_.
The command to produce the most up-to-date performance is:
python run_all.py --data_dir separated_data/ --mode CATHODE --cf_separate_val_set --no_extra_signal --cf_n_samples 400000 --cf_realistic_conditional --cf_oversampling --cf_no_logit --cf_use_class_weights --cf_save_model --cf_n_runs 1
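
To see which pipeline step each flag in a command addresses, the --DE_ / --cf_ prefix convention can be applied mechanically. The helper below is a hypothetical illustration and not part of the repo; it only inspects the command string and does not run anything.

```python
import shlex

COMMAND = (
    "python run_all.py --data_dir separated_data/ --mode CATHODE "
    "--cf_separate_val_set --no_extra_signal --cf_n_samples 400000 "
    "--cf_realistic_conditional --cf_oversampling --cf_no_logit "
    "--cf_use_class_weights --cf_save_model --cf_n_runs 1"
)

def group_flags(command):
    """Sort the flags of a command line into pipeline steps by their prefix."""
    groups = {"density_estimator": [], "classifier": [], "general": []}
    for token in shlex.split(command):
        if not token.startswith("--"):
            continue  # skip the program name, script name, and flag values
        if token.startswith("--DE_"):
            groups["density_estimator"].append(token)
        elif token.startswith("--cf_"):
            groups["classifier"].append(token)
        else:
            groups["general"].append(token)
    return groups

groups = group_flags(COMMAND)
for step, flags in groups.items():
    print(f"{step}: {' '.join(flags) or '(defaults only)'}")
```

For the command above, every step-specific flag belongs to the classifier step, so the density estimator runs with its default settings.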
Train the ANODE model
The corresponding script is run_ANODE_training.py
Mix data and samples
The corresponding script is run_classifier_data_creation.py
Train the classifier
The corresponding script is run_classifier_training.py
Evaluation
The evaluation leading to the main plots is shown in plotting_notebook.ipynb.
Alternatively, a short script like

```
from evaluation_utils import full_single_evaluation

data_savedir = 'classifier_data_folder/'
predsdir = 'classifier_output_folder/'

_ = full_single_evaluation(data_savedir, predsdir, n_ensemble_epochs=10, sic_range=(0, 20), savefig='result_SIC')
```

will plot the resulting SIC curve to file.
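
For reference, the significance improvement characteristic (SIC) is commonly defined as the signal efficiency divided by the square root of the background efficiency. The following standalone sketch computes it from classifier scores and truth labels. It is an illustration of that definition only, with made-up scores, and makes no claim about how the evaluation function in this repo handles ensembling, ties, or low-statistics regions.

```python
import math

def sic_curve(scores, labels):
    """Return (tpr, sic) pairs, one per event, scanning thresholds from high to low score."""
    n_sig = sum(labels)
    n_bkg = len(labels) - n_sig
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    points = []
    for i in order:
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        if fp > 0:  # SIC is undefined while the background efficiency is zero
            tpr, fpr = tp / n_sig, fp / n_bkg
            points.append((tpr, tpr / math.sqrt(fpr)))
    return points

# Made-up example: label 1 = signal, label 0 = background.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0, 0, 0]
points = sic_curve(scores, labels)
max_sic = max(sic for _, sic in points)
print(f"max SIC = {max_sic:.3f}")
```

A SIC value above 1 means that cutting on the score improves the expected significance relative to applying no cut at all.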
Benchmarks
Classic ANODE
The most up-to-date command for the ANODE benchmark is:
python run_all.py --no_extra_signal --data_dir separated_data/ --mode ANODE
CWoLa
The most up-to-date command for the CWoLa Hunting benchmark is:
python run_all.py --data_dir separated_data/ --mode CWoLa --cf_separate_val_set --no_extra_signal --cf_oversampling --cf_no_logit --cf_use_class_weights --cf_save_model
Idealized AD
The most up-to-date command for the idealized anomaly detector benchmark is:
python run_all.py --data_dir separated_data/ --mode idealized_AD --cf_separate_val_set --no_extra_signal --cf_no_logit --cf_oversampling --cf_use_class_weights --cf_save_model --cf_extra_bkg
Supervised
The most up-to-date command for the fully supervised benchmark is:
python run_all.py --data_dir separated_data/ --mode supervised --cf_separate_val_set --no_extra_signal --cf_no_logit --cf_oversampling --cf_save_model --cf_extra_bkg
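
The benchmark commands differ only in a few switches. A small table like the one below, with flags copied from the commands above, makes those differences explicit. The helper names are hypothetical, and the sketch assumes the order of flags on the command line does not matter.

```python
COMMON = ["python", "run_all.py", "--data_dir", "separated_data/"]

# Boolean switches per benchmark mode, copied from the commands above.
BENCHMARK_FLAGS = {
    "ANODE": {"--no_extra_signal"},
    "CWoLa": {"--cf_separate_val_set", "--no_extra_signal", "--cf_oversampling",
              "--cf_no_logit", "--cf_use_class_weights", "--cf_save_model"},
    "idealized_AD": {"--cf_separate_val_set", "--no_extra_signal", "--cf_no_logit",
                     "--cf_oversampling", "--cf_use_class_weights", "--cf_save_model",
                     "--cf_extra_bkg"},
    "supervised": {"--cf_separate_val_set", "--no_extra_signal", "--cf_no_logit",
                   "--cf_oversampling", "--cf_save_model", "--cf_extra_bkg"},
}

def build_command(mode):
    """Assemble the argument list for one benchmark (flags in sorted order)."""
    return COMMON + ["--mode", mode] + sorted(BENCHMARK_FLAGS[mode])

def only_in(mode_a, mode_b):
    """Flags used by mode_a but not by mode_b."""
    return BENCHMARK_FLAGS[mode_a] - BENCHMARK_FLAGS[mode_b]

for mode in BENCHMARK_FLAGS:
    print(" ".join(build_command(mode)))
print("idealized_AD vs supervised:", only_in("idealized_AD", "supervised"))
```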
Side remarks
All the listed scripts provide documentation on how to use them by calling python [SCRIPT].py --help. In particular, the above example commands use default input/output directories and model names, which should be adjusted to custom choices when multiple studies are performed.
Owner
- Name: HEPML-AnomalyDetection
- Login: HEPML-AnomalyDetection
- Kind: organization
- Repositories: 1
- Profile: https://github.com/HEPML-AnomalyDetection