https://github.com/cern-it-innovation/gqc
Guided Quantum Compression (GQC) network for simultaneous dimensionality reduction and classification of high-dimensional data.
Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: found 5 DOI reference(s) in README
- ✓ Academic publication links: links to iop.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (16.5%) to scientific vocabulary
Repository
Basic Info
Statistics
- Stars: 4
- Watchers: 3
- Forks: 5
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Guided Quantum Compression for Higgs Identification
Many data sets are too complex for currently available quantum computers. Consequently, quantum machine learning applications conventionally resort to dimensionality reduction algorithms, e.g., auto-encoders, before passing data through the quantum models. We show that using a classical auto-encoder as an independent preprocessing step can significantly decrease the classification performance of a quantum machine learning algorithm. To ameliorate this issue, we design an architecture that unifies the preprocessing and quantum classification algorithms into a single trainable model: the guided quantum compression model. The utility of this model is demonstrated by using it to identify the Higgs boson in proton-proton collisions at the LHC, where the conventional approach proves ineffective. In contrast, the guided quantum compression model excels at solving this classification problem, achieving good accuracy. Additionally, the model developed herein shows better performance compared to the classical benchmark when using only low-level kinematic features.
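As a rough sketch of the idea (illustrative only, not the authors' implementation; the layer sizes, names, and loss weighting below are assumptions), the model can be pictured as an auto-encoder whose latent space also feeds a classifier, with the two objectives optimised jointly:

```python
import torch.nn as nn
import torch.nn.functional as F

class GuidedCompressionSketch(nn.Module):
    """Toy stand-in for the GQC idea: an auto-encoder whose latent
    space also feeds a classifier, trained with a joint objective."""

    def __init__(self, n_features=16, latent_dim=4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_features, 64), nn.ELU(), nn.Linear(64, latent_dim)
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64), nn.ELU(), nn.Linear(64, n_features)
        )
        # A classical head stands in for the quantum classifier here.
        self.classifier = nn.Sequential(nn.Linear(latent_dim, 1), nn.Sigmoid())

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), self.classifier(z).squeeze(-1)

def joint_loss(x, y, x_hat, y_hat, clf_weight=1.0):
    # The reconstruction term drives the compression, while the
    # classification term guides the latent space towards features
    # that separate signal from background.
    return F.mse_loss(x_hat, x) + clf_weight * F.binary_cross_entropy(y_hat, y)
```

Training both terms together is what distinguishes the guided model from running an auto-encoder as an independent preprocessing step.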
This repository contains the source code for the paper Guided quantum compression for high dimensional data classification.
If you use or build on any part of this code, please cite it as:
@article{Belis_2024,
  title={Guided quantum compression for high dimensional data classification},
  volume={5},
  ISSN={2632-2153},
  url={http://dx.doi.org/10.1088/2632-2153/ad5fdd},
  DOI={10.1088/2632-2153/ad5fdd},
  number={3},
  journal={Machine Learning: Science and Technology},
  publisher={IOP Publishing},
  author={Belis, Vasilis and Odagiu, Patrick and Grossi, Michele and Reiter, Florentin and Dissertori, Günther and Vallecorsa, Sofia},
  year={2024},
  month=jul,
  pages={035010}
}
Installing Dependencies
We strongly recommend using conda to install the dependencies for this repo.
If you have conda, go into the folder with the code you want to run, create an
environment from the .yml file in that folder (conda env create -f <file>.yml),
and activate it (conda activate <env>). Now you can run the code! See the
Running the Code section for further instructions.
If you do not want to use conda, here is a list of the packages you
would need to install:
Pre-processing
* numpy
* pandas
* pytables
* matplotlib
* scikit-learn
Auto-encoders
* numpy
* matplotlib
* scikit-learn
* pytorch (follow the official installation instructions for your platform)
* torchinfo
* pykeops
* g++ compiler version >= 7
* cudatoolkit version >= 10
* geomloss
Pennylane VQC
* numpy
* matplotlib
* scikit-learn
* pytorch (follow the official installation instructions for your platform)
* torchinfo
* pykeops
* g++ compiler version >= 7
* cudatoolkit version >= 10
* geomloss
* pennylane
* pennylane-qiskit
* pennylane-lightning[gpu]
* NVidia cuQuantum SDK
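For orientation, here is a minimal sketch of a variational quantum classifier built with the stack above, using PennyLane's PyTorch interface; the qubit count, embedding, and ansatz are illustrative assumptions, not the exact architecture from the paper:

```python
import pennylane as qml

n_qubits, n_layers = 4, 2
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev, interface="torch")
def circuit(inputs, weights):
    # Encode the compressed latent features as qubit rotations,
    # then apply a trainable entangling ansatz.
    qml.AngleEmbedding(inputs, wires=range(n_qubits))
    qml.StronglyEntanglingLayers(weights, wires=range(n_qubits))
    return qml.expval(qml.PauliZ(0))

# Wrap the circuit as a torch layer so gradients can flow through
# both the circuit and the upstream compression network.
weight_shapes = {"weights": (n_layers, n_qubits, 3)}
vqc = qml.qnn.TorchLayer(circuit, weight_shapes)
```

On a GPU, swapping the device for the one provided by pennylane-lightning[gpu] (qml.device("lightning.gpu", wires=n_qubits)) runs the simulation on the cuQuantum backend.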
The pykeops package is required to run the Sinkhorn auto-encoder. However, it is a tricky package to manage, so make sure that the gcc and g++ compilers on your path are compatible with the CUDA version you are running. We recommend conda for exactly this reason, since it sets the environment variables such that everything is configured correctly and pykeops can compile against CUDA.
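Once the environment is set up, a quick sanity check along these lines confirms that pykeops compiles and that the Sinkhorn loss from geomloss is usable (the tensor shapes here are illustrative):

```python
import torch
import pykeops
from geomloss import SamplesLoss

# Check that pykeops can compile its torch bindings before training.
pykeops.test_torch_bindings()

# Sinkhorn divergence between two point clouds, differentiable in x,
# as used by a Sinkhorn auto-encoder to compare sample distributions.
sinkhorn = SamplesLoss("sinkhorn", p=2, blur=0.05)
x = torch.randn(128, 8, requires_grad=True)
y = torch.randn(128, 8)
sinkhorn(x, y).backward()
```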
If you encounter any bugs, please contact us at the email addresses listed on this repository.
Running the Code
The data preprocessing scripts are run from inside the preprocessing folder. These scripts were customised for the specific data set the authors are using; for access to this data, please contact us.
The preprocessing scripts produce normalised numpy arrays saved to three different files for training, validation, and testing.
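The exact scripts are tied to the authors' data set, but the shape of their output can be reproduced along these lines (the file names and normalisation choice are illustrative assumptions):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

def save_splits(features, labels, outdir="."):
    # Split into train/validation/test, fit the scaler on the
    # training set only, and save one normalised array per split.
    x_train, x_tmp, y_train, y_tmp = train_test_split(
        features, labels, test_size=0.2, random_state=42
    )
    x_valid, x_test, y_valid, y_test = train_test_split(
        x_tmp, y_tmp, test_size=0.5, random_state=42
    )
    scaler = MinMaxScaler().fit(x_train)
    for name, x, y in [
        ("train", x_train, y_train),
        ("valid", x_valid, y_valid),
        ("test", x_test, y_test),
    ]:
        np.save(f"{outdir}/x_{name}.npy", scaler.transform(x))
        np.save(f"{outdir}/y_{name}.npy", y)
```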
The scripts to launch the autoencoder training on the data are in the bin
folder. Look for the run.snip files to see the basic run cases for the
code and customise from there.
Owner
- Name: CERN-IT-INNOVATION
- Login: CERN-IT-INNOVATION
- Kind: organization
- Repositories: 6
- Profile: https://github.com/CERN-IT-INNOVATION
GitHub Events
Total
- Watch event: 1
- Push event: 3
- Fork event: 3
Last Year
- Watch event: 1
- Push event: 3
- Fork event: 3
Dependencies
- cmake >=3.21.4
- geomloss >=0.2.4
- matplotlib >=3.4.3
- numpy >=1.21.4
- optuna >=2.10.0
- pandas >=1.3.4
- pykeops >=1.5
- pyparsing <3
- qiskit >=0.32.0
- qiskit-machine-learning >=0.2.1
- scikit-learn >=1.0.1
- tables >=3.6.1
- torch >=1.10.0
- torchaudio >=0.10.0
- torchinfo >=1.5.3
- torchvision >=0.11.1