simplify

Create and validate simplified likelihoods from full likelihoods.

https://github.com/eschanet/simplify

Keywords

cern hep-ex inference likelihood particlephysics statistics

Keywords from Contributors

histograms

Last synced: 6 months ago

Repository

Basic Info

Host: GitHub
Owner: eschanet
License: bsd-3-clause
Language: Python
Default Branch: master
Homepage:
Size: 11.6 MB

Statistics

Stars: 6
Watchers: 2
Forks: 2
Open Issues: 4
Releases: 0

Created over 5 years ago · Last pushed over 4 years ago

Metadata Files

Readme Contributing License

simplify

A Python package that creates simplified likelihoods from full likelihoods. The method is documented in the ATLAS PUB note "Implementation of simplified likelihoods in HistFactory for searches for supersymmetry" and in chapter 10 of Eric Schanet's PhD thesis. Currently, only one format is implemented for simplified likelihoods, but the idea is to support additional forms of (not so) simplified likelihoods.

Introduction

In high energy physics (HEP), searches for new physics are typically interested in making inferences about a probabilistic model given some observed collision data. This approach can be formalised using a statistical model f(x|φ), i.e. a parametric family of probability density functions (PDFs) describing the probability of observing data x given some model parameters φ. The likelihood function L(φ) then refers to the value of f as a function of φ given fixed x.
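To make the distinction between f(x|φ) and L(φ) concrete, here is a minimal sketch for a single-bin counting experiment. It assumes a Poisson model with expected rate μ·s + b, where the signal strength μ plays the role of φ. The yields and function names are illustrative assumptions and not part of simplify.

```python
import math


def poisson_log_pdf(n_obs: int, rate: float) -> float:
    """Log of the Poisson probability of observing n_obs events given an expected rate."""
    return n_obs * math.log(rate) - rate - math.lgamma(n_obs + 1)


def log_likelihood(mu: float, n_obs: int, signal: float, background: float) -> float:
    """log L(mu): the same expression as the PDF, read as a function of mu at fixed data."""
    return poisson_log_pdf(n_obs, mu * signal + background)


# Hypothetical yields: 5 expected signal events, 50 expected background events, 57 observed.
n_obs, signal, background = 57, 5.0, 50.0

# Scan the signal strength on a grid and keep the value that maximises the likelihood.
grid = [i / 100 for i in range(0, 301)]
best_mu = max(grid, key=lambda mu: log_likelihood(mu, n_obs, signal, background))
print(f"best-fit mu on the grid: {best_mu:.2f}")
```

For this toy model the scan lands on the analytic maximum μ = (n − b)/s = 1.4. A real HistFactory likelihood multiplies many such Poisson terms together with constraint terms for the nuisance parameters.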

For binned data, the HistFactory template for building statistical models and likelihoods finds ample usage in HEP.

Although many searches for supersymmetry (SUSY) are sensitive to a variety of beyond the Standard Model (BSM) physics models, for reasons of computational cost and complexity they are often only interpreted in a limited set of simplified models. While statistical inference and interpretation of limits on individual SUSY production and decay topologies is straightforward and very convenient, their lack of model complexity leads to poor approximations to the true underlying constraints on the respective model parameters of a more complete SUSY model. In order to investigate realistic SUSY scenarios, large-scale re-interpretation efforts scanning a large number of dimensions are needed, resulting in a significant increase in computational cost for the statistical inference.

The approximation method put forward in the ATLAS PUB note Implementation of simplified likelihoods in HistFactory for searches for supersymmetry and implemented in this repository introduces the notion of simplified likelihoods that come with low computational cost but high statistical precision, therefore offering a viable solution for large-scale re-interpretation efforts over large model spaces.

Installation

Follow good practice and start by creating a virtual environment, e.g. using venv

```console
python3 -m venv simplify
```

and then activating it

```console
source simplify/bin/activate
```

Default installation from PyPI

You can install simplify directly from PyPI with

```console
python3 -m pip install simplify[contrib]
```

Notice that simplify is supported and tested for Python 3.7, Python 3.8, and Python 3.9.

Development installation

If you want to contribute to simplify, install the development version of the package. Fork the repository, clone your fork, and then install from local resources with

```console
python3 -m pip install --ignore-installed -U -e .[complete]
```

Note that you might have to wrap .[complete] in quotes depending on your shell.

Next, set up the git pre-commit hook for Black

```console
pre-commit install
```

Now you should be able to run all the tests with

```console
python3 -m pytest
```

How to run

You can use simplify either through your command line, or integrate it directly into your scripts.

CLI

Run with e.g.

```console
simplify convert < fullLH.json > simplifiedLH.json
```

or e.g.

```console
curl http://foo/likelihood.json | simplify convert
```

where fullLH.json is the full likelihood you want to convert into a simplified likelihood. Simplify is able to read/write from/to stdin/stdout.

Hit simplify --help for detailed information on the CLI.
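Because the CLI reads from stdin and writes to stdout, it can also be driven from a script. The following is a minimal sketch using only the standard library. The helper name `convert_likelihood` and its `runner` argument are assumptions made for illustration, and the simplify CLI has to be on your PATH for the real call to succeed.

```python
import subprocess
from pathlib import Path


def convert_likelihood(full_path, simplified_path, runner=subprocess.run):
    """Equivalent of `simplify convert < full_path > simplified_path`.

    `runner` is injectable only so the helper can be exercised without the CLI installed.
    """
    with Path(full_path).open("rb") as fin, Path(simplified_path).open("wb") as fout:
        # check=True raises CalledProcessError if the CLI exits with a non-zero status
        return runner(["simplify", "convert"], stdin=fin, stdout=fout, check=True)


# Example call, assuming fullLH.json exists in the working directory:
# convert_likelihood("fullLH.json", "simplifiedLH.json")
```

Passing file handles rather than reading the JSON into memory mirrors the shell redirection shown above.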

In Python script

You can also use simplify in a Python script, e.g. to create some validation and cross-check plots and tables.

```py
import json

import pyhf

import simplify

# set the computational backend to pyhf and load LH
pyhf.set_backend(pyhf.tensorlib, "minuit")
with open("likelihood.json", "r") as f:
    spec = json.load(f)

# ws from full LH
ws = pyhf.Workspace(spec)

# get model and data for each ws we just created
# use polynomial interpolation and exponential extrapolation
# for nuisance params
model = ws.model(
    modifier_settings={
        "normsys": {"interpcode": "code4"},
        "histosys": {"interpcode": "code4p"},
    }
)
data = ws.data(model)

# run fit
fit_result = simplify.fitter.fit(ws)

# plot the pulls
plt = simplify.plot.pulls(fit_result, "plots/")

# plot correlation matrix
plt = simplify.plot.correlation_matrix(fit_result, "plots/", pruning_threshold=0.1)

# get a yieldstable in nice LaTeX format
tables = simplify.plot.yieldsTable(ws, "plots/", fit_result)
```

Example Likelihood

Let's go through an example likelihood. We'll use the full likelihood of an ATLAS search for direct production of electroweakinos in final states with one lepton and a Higgs boson (10.1140/epjc/s10052-020-8050-3). The full likelihood in JSON format as specified by ATL-PHYS-PUB-2019-029 is publicly available to download from doi.org/10.17182. It contains the full statistical model of the original analysis given the full observed dataset from Run-2 of the LHC.

You can either download the likelihood by hand from HEPData, or just let pyhf do the work for you by using

```console
pyhf contrib download https://doi.org/10.17182/hepdata.90607.v3/r3 1Lbb-likelihoods && cd 1Lbb-likelihoods
```

From there, provided you have already set up simplify (which also sets up pyhf), you can produce a simplified likelihood of this analysis with

```console
simplify convert < BkgOnly.json > simplify_BkgOnly.json
```

And you're done. Well, at least you've got yourself a simplified version of that likelihood, which approximates the total background using a single background sample that is set to the post-fit total background determined from the full likelihood. The uncertainties (the expensive part) are approximated using only the final uncertainty on the background estimate in each bin of the analysis.
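The structure described above can be sketched as a small builder that writes one background sample and one per-bin uncertainty per channel into a HistFactory-style JSON layout. This is not simplify's actual implementation. The sample name `Bkg`, the modifier names, the `shapesys` modifier type, the `mu_Sig` parameter of interest and all numbers are assumptions chosen for illustration.

```python
import json


def build_simplified_spec(postfit_yields, postfit_uncertainties, observations):
    """Build a HistFactory-style spec with a single background sample per channel.

    All three arguments map a channel name to a list of per-bin numbers.
    """
    channels = []
    for name, yields in postfit_yields.items():
        channels.append(
            {
                "name": name,
                "samples": [
                    {
                        "name": "Bkg",  # hypothetical sample name
                        "data": list(yields),
                        "modifiers": [
                            {
                                "name": f"total_error_{name}",  # hypothetical modifier name
                                "type": "shapesys",  # assumed: one uncorrelated uncertainty per bin
                                "data": list(postfit_uncertainties[name]),
                            }
                        ],
                    }
                ],
            }
        )
    return {
        "channels": channels,
        "observations": [{"name": n, "data": list(d)} for n, d in observations.items()],
        # hypothetical POI; a signal sample carrying it would be patched in later
        "measurements": [{"name": "simplified", "config": {"poi": "mu_Sig", "parameters": []}}],
        "version": "1.0.0",
    }


# Toy post-fit numbers for a three-bin signal region and a one-bin control region.
spec = build_simplified_spec(
    postfit_yields={"SR": [12.3, 4.5, 1.2], "CR": [150.0]},
    postfit_uncertainties={"SR": [2.1, 1.0, 0.4], "CR": [9.0]},
    observations={"SR": [14, 3, 1], "CR": [152]},
)
print(json.dumps(spec, indent=2))
```

The point of the sketch is the size of the result: whatever the full model looked like, each bin ends up with one yield and one uncertainty.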

If you think about it, this gives you quite a simple likelihood function. Let's compare them quickly. We can inspect the full likelihood with pyhf, looking only at the first 17 lines, which contain the summary of what pyhf spits out:

```console
pyhf inspect BkgOnly.json | head -n 17
```

This should give you

```console
       Summary
  ------------------
     channels  8
      samples  9
   parameters  115
    modifiers  115

     channels  nbins
   ----------  -----
  SRHMEM_mct2    3
  SRLMEM_mct2    3
  SRMMEM_mct2    3
  STCREM_cuts    1
  TRHMEM_cuts    1
  TRLMEM_cuts    1
  TRMMEM_cuts    1
    WREM_cuts    1
```

Think about this for a second. You've got 8 channels with a total of 14 bins. Each bin contains information about 9 samples, and each event rate for each sample is subject to a total of 115 additional parameters (the uncertainties of the model). This makes for quite a complicated likelihood function.
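The counting in the paragraph above can be reproduced directly from the JSON. Below is a minimal sketch that walks a HistFactory-style spec and tallies channels, bins, samples and modifiers. The helper name `summarise` and the toy spec are assumptions for illustration, and how pyhf itself counts modifiers and parameters may differ in detail.

```python
def summarise(spec):
    """Return channel, bin, sample and modifier counts for a HistFactory-style spec."""
    samples, modifiers, nbins = set(), set(), {}
    for channel in spec["channels"]:
        # every sample in a channel shares the same binning, so the first one is enough
        nbins[channel["name"]] = len(channel["samples"][0]["data"])
        for sample in channel["samples"]:
            samples.add(sample["name"])
            for modifier in sample["modifiers"]:
                modifiers.add((modifier["name"], modifier["type"]))
    return {
        "channels": len(nbins),
        "bins": sum(nbins.values()),
        "samples": len(samples),
        "modifiers": len(modifiers),
    }


# Toy spec with two channels; in practice you would json.load() a file such as BkgOnly.json.
toy_spec = {
    "channels": [
        {
            "name": "SR",
            "samples": [
                {"name": "ttbar", "data": [5.0, 3.0, 1.0],
                 "modifiers": [{"name": "lumi", "type": "lumi", "data": None}]},
                {"name": "wjets", "data": [2.0, 1.0, 0.5],
                 "modifiers": [{"name": "lumi", "type": "lumi", "data": None},
                               {"name": "mu_W", "type": "normfactor", "data": None}]},
            ],
        },
        {
            "name": "CR",
            "samples": [
                {"name": "wjets", "data": [40.0],
                 "modifiers": [{"name": "mu_W", "type": "normfactor", "data": None}]},
            ],
        },
    ]
}
summary = summarise(toy_spec)
print(summary)
```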

On to the simplified one then.

```console
pyhf inspect simplify_BkgOnly.json | head -n 17
```

gives us

```console
       Summary
  ------------------
     channels  8
      samples  1
   parameters  1
    modifiers  1

     channels  nbins
   ----------  -----
  SRHMEM_mct2    3
  SRLMEM_mct2    3
  SRMMEM_mct2    3
  STCREM_cuts    1
  TRHMEM_cuts    1
  TRLMEM_cuts    1
  TRMMEM_cuts    1
    WREM_cuts    1
```

i.e., we still have the original number of channels and bins (this is what drives our sensitivity, so we don't want to compromise here), but we end up with only one sample and one uncertainty per bin.

It's not surprising, then, that the computational performance of the two differs considerably. Let's have a look at a benchmark for this specific analysis:

Figure: Benchmark of ANA-SUSY-2019-08

Ignore the green bars for now and focus on the orange and blue ones instead. The orange (blue) ones show the wall times in seconds for the full (simplified) likelihood. In their fastest configurations, the simplified likelihood is two orders of magnitude faster than the full likelihood.

But this isn't worth anything if the approximation isn't a good one. So let's have a look at how it performs. All the original signal models investigated by the analysis are contained in the 1Lbb-likelihoods as a JSON patch file. Just patch each one onto the full and simplified likelihood, perform statistical inference using pyhf and then plot the results:
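Patching a signal model onto a background-only likelihood amounts to applying JSON Patch operations (RFC 6902) to the workspace. The sketch below implements only the "add" operation in plain Python to show what happens to the spec. It assumes the signal patches insert a signal sample into each channel, and the toy spec and patch are hypothetical. For real work, a dedicated library such as the jsonpatch package is the better choice.

```python
import copy


def apply_add_ops(document, patch):
    """Apply the "add" operations of a JSON Patch (RFC 6902) to a copy of `document`."""
    result = copy.deepcopy(document)
    for op in patch:
        if op["op"] != "add":
            raise NotImplementedError(f"unsupported operation: {op['op']}")
        # split the JSON Pointer into reference tokens and undo the ~1 and ~0 escapes
        tokens = [t.replace("~1", "/").replace("~0", "~") for t in op["path"].split("/")[1:]]
        target = result
        for token in tokens[:-1]:
            target = target[int(token)] if isinstance(target, list) else target[token]
        last = tokens[-1]
        if isinstance(target, list):
            # "-" appends, an index inserts before the element currently at that position
            target.insert(len(target) if last == "-" else int(last), op["value"])
        else:
            target[last] = op["value"]
    return result


# Toy background-only spec and a hypothetical signal patch for its single channel.
background_only = {
    "channels": [{"name": "SR", "samples": [{"name": "Bkg", "data": [10.0], "modifiers": []}]}]
}
signal_patch = [
    {
        "op": "add",
        "path": "/channels/0/samples/0",
        "value": {
            "name": "signal",
            "data": [3.0],
            "modifiers": [{"name": "mu_Sig", "type": "normfactor", "data": None}],
        },
    }
]
with_signal = apply_add_ops(background_only, signal_patch)
print([s["name"] for s in with_signal["channels"][0]["samples"]])
```

The same patch can be applied to both the full and the simplified background-only workspace as long as the channel layout is identical, which is why the simplified likelihood keeps the original channels and bins.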

Figure: Performance of ANA-SUSY-2019-08

Given the two orders of magnitude we gain in computational speed, this small loss in statistical precision is impressive! Within the one standard deviation uncertainties, there is basically no difference between the two contours!

P.S. I skipped quite a few steps to get to this figure. All of the necessary tools and scripts are available (and sometimes described) in my pyhf_inference_tools.

Owner

Name: Eric Schanet
Login: eschanet
Kind: user
Location: Zurich

Website: eschanet.com
Twitter: eschanet
Repositories: 3
Profile: https://github.com/eschanet

Data Scientist | Particle Physicist | Computing Enthusiast

GitHub Events

Total

Issues event: 1
Issue comment event: 1
Create event: 1

Last Year

Issues event: 1
Issue comment event: 1
Create event: 1

Committers

Last synced: almost 3 years ago

All Time

Total Commits: 155
Total Committers: 5
Avg Commits per committer: 31.0
Development Distribution Score (DDS): 0.316

Top Committers

Name	Email	Commits
Eric Schanet	e**t@c**h	106
Eric Schanet	e**t@g**m	35
Matthew Feickert	m**t@c**h	9
Eric Schanet	E**t@c**h	4
Giordon Stark	k**g@g**m	1

Committer Domains (Top 20 + Academic)

cern.ch: 3

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 6
Total pull requests: 15
Average time to close issues: 3 months
Average time to close pull requests: 8 days
Total issue authors: 4
Total pull request authors: 3
Average comments per issue: 2.0
Average comments per pull request: 1.73
Merged pull requests: 14
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 1
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 1
Pull request authors: 0
Average comments per issue: 1.0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

lukasheinrich (2)
matthewfeickert (2)
WolfgangWaltenberger (1)
eschanet (1)

Pull Request Authors

matthewfeickert (8)
eschanet (6)
kratsg (1)

Top Labels

Issue Labels

enhancement (2)

Pull Request Labels

fix (2)
documentation (1)
API (1)

Packages

Total packages: 2
Total downloads:
- pypi 373 last-month
Total docker downloads: 14

Total dependent packages: 0
(may contain duplicates)
Total dependent repositories: 6
(may contain duplicates)
Total versions: 11
Total maintainers: 3

pypi.org: simplify

Produce simplified likelihoods of different formats

Documentation: https://simplify.readthedocs.io/
License: BSD 3-Clause
Latest release: 0.1.10
published over 4 years ago

Versions: 5
Dependent Packages: 0
Dependent Repositories: 5
Downloads: 333 Last month
Docker Downloads: 14

Rankings

Dependent repos count: 6.7%

Dependent packages count: 10.0%

Downloads: 14.4%

Average: 14.7%

Forks count: 19.1%

Stargazers count: 23.1%

Maintainers (3)

matthewfeickert kratsg eschanet

Last synced: 6 months ago

pypi.org: simplify-hep

Produce simplified likelihoods of different formats

Documentation: https://simplify-hep.readthedocs.io/
License: BSD 3-Clause
Latest release: 0.1.5
published almost 5 years ago

Versions: 6
Dependent Packages: 0
Dependent Repositories: 1
Downloads: 40 Last month

Rankings

Dependent packages count: 10.0%

Forks count: 19.1%

Stargazers count: 21.5%

Dependent repos count: 21.7%

Average: 24.5%

Downloads: 50.2%

Maintainers (1)

eschanet

Last synced: 6 months ago

simplify

Science Score: 33.0%
