eqmarl
This is the repository for the paper "eQMARL: Entangled Quantum Multi-Agent Reinforcement Learning for Distributed Cooperation over Quantum Channels" published in ICLR 2025
eQMARL: Entangled Quantum Multi-Agent Reinforcement Learning for Distributed Cooperation over Quantum Channels
This repository is the official implementation of "eQMARL: Entangled Quantum Multi-Agent Reinforcement Learning for Distributed Cooperation over Quantum Channels", published in the Thirteenth International Conference on Learning Representations (ICLR) 2025.
See Citation section for BibTeX reference.
https://github.com/user-attachments/assets/7daf9eac-4b95-4d33-88be-d93856b48622
See eqmarl-vis repository for visualizations.
Installation
The codebase is provided as an installable Python package called eqmarl. To install the package via pip, you can run:
```bash
# Navigate to eqmarl source folder.
$ cd path/to/eqmarl/
# Install eqmarl package.
$ python -m pip install .
```
You can verify that the package was successfully installed by running:
```bash
$ python -c "import importlib.metadata; version=importlib.metadata.version('eqmarl'); print(version)"
1.0.0
```
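The same check can be done from a script. The following is a minimal sketch that wraps the one-liner above in a helper that does not raise when the package is missing; the function name `installed_version` is a hypothetical helper and not part of `eqmarl`.

```python
import importlib.metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("eqmarl")
    if version is None:
        print("eqmarl is not installed in this environment")
    else:
        print(f"eqmarl {version}")
```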
Requirements
If instead you just want to install the requirements without the package, you can run:
```bash
$ python -m pip install -r requirements.txt -r requirements-dev.txt
```
Notes on Tensorflow Quantum installation with Anaconda
Installation of this repo can be a little finicky because of the requirements for tensorflow-quantum on various systems.
If you are using Anaconda to manage Python on macOS, be aware that the version of Python may have been built using an outdated version of macOS. To check this, you can run:
```bash
$ python -c "from distutils import util; print(util.get_platform())"
macosx-10.9-x86_64
```
Notice that in the above example we see the installation of Python was built against macosx-10.9-x86_64, whereas the wheel for tensorflow-quantum requires macosx-12.1-x86_64 or later.
To circumvent this, you can download the wheel for tensorflow-quantum==0.7.2 from https://pypi.org/project/tensorflow-quantum/0.7.2/#files and rename the file from tensorflow_quantum-0.7.2-cp39-cp39-macosx_12_1_x86_64.whl to tensorflow_quantum-0.7.2-cp39-cp39-macosx_10_9_x86_64.whl. Once you've done that, you can install the wheel via:
```bash
# Activate your environment.
$ conda activate myenv
# Install wheel file manually.
$ python -m pip install tensorflow_quantum-0.7.2-cp39-cp39-macosx_10_9_x86_64.whl
```
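The platform check and the rename can also be scripted. The sketch below is illustrative only: `platform_tag` and `retag_wheel` are hypothetical helper names, and it uses the standard-library `sysconfig.get_platform()` instead of `distutils`, since `distutils` was removed in Python 3.12.

```python
import sysconfig


def platform_tag() -> str:
    """Platform the running interpreter was built for, written as a wheel tag."""
    # sysconfig.get_platform() returns e.g. "macosx-10.9-x86_64";
    # wheel filenames use "_" in place of "-" and ".".
    return sysconfig.get_platform().replace("-", "_").replace(".", "_")


def retag_wheel(filename: str, new_platform: str) -> str:
    """Return `filename` with its trailing platform tag replaced by `new_platform`."""
    stem, ext = filename.rsplit(".", 1)
    if ext != "whl":
        raise ValueError(f"not a wheel file: {filename}")
    # Wheel names end in "-{python tag}-{abi tag}-{platform tag}".
    prefix, _old_platform = stem.rsplit("-", 1)
    return f"{prefix}-{new_platform}.whl"


if __name__ == "__main__":
    original = "tensorflow_quantum-0.7.2-cp39-cp39-macosx_12_1_x86_64.whl"
    print("interpreter platform tag:", platform_tag())
    print("renamed wheel:", retag_wheel(original, "macosx_10_9_x86_64"))
```

Renaming only changes what pip will accept; it does not change what the wheel was compiled against, so this remains a workaround rather than a supported install path.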
Training
To train using the frameworks in the paper, run this command:
```bash
$ python ./scripts/experiment_runner.py ./experiments/<experiment_name>.yml
```
This invokes the experiment_runner.py script, which runs experiments based on YAML configurations.
Note that the option -r/--n-train-rounds can be used to train over multiple seed rounds (defaults to 1 round).
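To launch the runner from Python, for example to queue several configurations, the command line can be assembled as in the sketch below. `build_train_command` is a hypothetical helper; placing `-r` after the configuration path is an assumption about how the script parses its arguments, so adjust the order if the runner rejects it.

```python
import sys
from typing import List


def build_train_command(experiment_name: str, n_train_rounds: int = 1) -> List[str]:
    """Assemble the experiment_runner.py command line for one experiment."""
    if n_train_rounds < 1:
        raise ValueError("n_train_rounds must be at least 1")
    return [
        sys.executable,                           # the current Python interpreter
        "./scripts/experiment_runner.py",
        f"./experiments/{experiment_name}.yml",   # YAML configuration to run
        "-r", str(n_train_rounds),                # number of seed rounds
    ]


if __name__ == "__main__":
    command = build_train_command("coingame_maa2c_mdp_eqmarl_psi+", n_train_rounds=10)
    # Pass `command` to subprocess.run(command, check=True) from the repository root.
    print(" ".join(command))
```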
The experiment configuration for each of the frameworks discussed in the paper is described as a YAML file in the experiments folder.
The full list of experiments is as follows:
Experiment YAML File | Environment | Description
--- | --- | ---
coingame_maa2c_mdp_eqmarl_noentanglement.yml | $\texttt{CoinGame-2}$ | MDP experiment using $\texttt{eQMARL}$ with $\texttt{None}$ entanglement and $L=5$ VQC layers.
coingame_maa2c_mdp_eqmarl_phi+.yml | $\texttt{CoinGame-2}$ | MDP experiment using $\texttt{eQMARL}$ with $\Phi^{+}$ entanglement and $L=5$ VQC layers.
coingame_maa2c_mdp_eqmarl_phi-.yml | $\texttt{CoinGame-2}$ | MDP experiment using $\texttt{eQMARL}$ with $\Phi^{-}$ entanglement and $L=5$ VQC layers.
coingame_maa2c_mdp_eqmarl_psi+.yml | $\texttt{CoinGame-2}$ | MDP experiment using $\texttt{eQMARL}$ with $\Psi^{+}$ entanglement and $L=5$ VQC layers.
coingame_maa2c_mdp_eqmarl_psi-.yml | $\texttt{CoinGame-2}$ | MDP experiment using $\texttt{eQMARL}$ with $\Psi^{-}$ entanglement and $L=5$ VQC layers.
coingame_maa2c_mdp_fctde.yml | $\texttt{CoinGame-2}$ | MDP experiment using $\texttt{fCTDE}$ with $h=12$ hidden units.
coingame_maa2c_mdp_qfctde.yml | $\texttt{CoinGame-2}$ | MDP experiment using $\texttt{qfCTDE}$ with $L=5$ VQC layers.
coingame_maa2c_mdp_sctde.yml | $\texttt{CoinGame-2}$ | MDP experiment using $\texttt{sCTDE}$ with $h=12$ hidden units.
coingame_maa2c_pomdp_eqmarl_noentanglement.yml | $\texttt{CoinGame-2}$ | POMDP experiment using $\texttt{eQMARL}$ with $\texttt{None}$ entanglement and $L=5$ VQC layers.
coingame_maa2c_pomdp_eqmarl_phi+.yml | $\texttt{CoinGame-2}$ | POMDP experiment using $\texttt{eQMARL}$ with $\Phi^{+}$ entanglement and $L=5$ VQC layers.
coingame_maa2c_pomdp_eqmarl_phi-.yml | $\texttt{CoinGame-2}$ | POMDP experiment using $\texttt{eQMARL}$ with $\Phi^{-}$ entanglement and $L=5$ VQC layers.
coingame_maa2c_pomdp_eqmarl_psi+.yml | $\texttt{CoinGame-2}$ | POMDP experiment using $\texttt{eQMARL}$ with $\Psi^{+}$ entanglement and $L=5$ VQC layers.
coingame_maa2c_pomdp_eqmarl_psi-.yml | $\texttt{CoinGame-2}$ | POMDP experiment using $\texttt{eQMARL}$ with $\Psi^{-}$ entanglement and $L=5$ VQC layers.
coingame_maa2c_pomdp_fctde.yml | $\texttt{CoinGame-2}$ | POMDP experiment using $\texttt{fCTDE}$ with $h=12$ hidden units.
coingame_maa2c_pomdp_qfctde.yml | $\texttt{CoinGame-2}$ | POMDP experiment using $\texttt{qfCTDE}$ with $L=5$ VQC layers.
coingame_maa2c_pomdp_sctde.yml | $\texttt{CoinGame-2}$ | POMDP experiment using $\texttt{sCTDE}$ with $h=12$ hidden units.
coingame_maa2c_mdp_eqmarl_psi+_L2.yml | $\texttt{CoinGame-2}$ | MDP experiment using $\texttt{eQMARL}$ with $\Psi^{+}$ entanglement and $L=2$ VQC layers.
coingame_maa2c_mdp_eqmarl_psi+_L10.yml | $\texttt{CoinGame-2}$ | MDP experiment using $\texttt{eQMARL}$ with $\Psi^{+}$ entanglement and $L=10$ VQC layers.
coingame_maa2c_mdp_qfctde_L2.yml | $\texttt{CoinGame-2}$ | MDP experiment using $\texttt{qfCTDE}$ with $L=2$ VQC layers.
coingame_maa2c_mdp_qfctde_L10.yml | $\texttt{CoinGame-2}$ | MDP experiment using $\texttt{qfCTDE}$ with $L=10$ VQC layers.
coingame_maa2c_mdp_fctde_size3.yml | $\texttt{CoinGame-2}$ | MDP experiment using $\texttt{fCTDE}$ with $h=3$ hidden units.
coingame_maa2c_mdp_fctde_size6.yml | $\texttt{CoinGame-2}$ | MDP experiment using $\texttt{fCTDE}$ with $h=6$ hidden units.
coingame_maa2c_mdp_fctde_size24.yml | $\texttt{CoinGame-2}$ | MDP experiment using $\texttt{fCTDE}$ with $h=24$ hidden units.
coingame_maa2c_mdp_sctde_size3.yml | $\texttt{CoinGame-2}$ | MDP experiment using $\texttt{sCTDE}$ with $h=3$ hidden units.
coingame_maa2c_mdp_sctde_size6.yml | $\texttt{CoinGame-2}$ | MDP experiment using $\texttt{sCTDE}$ with $h=6$ hidden units.
coingame_maa2c_mdp_sctde_size24.yml | $\texttt{CoinGame-2}$ | MDP experiment using $\texttt{sCTDE}$ with $h=24$ hidden units.
coingame_maa2c_pomdp_eqmarl_psi+_L2.yml | $\texttt{CoinGame-2}$ | POMDP experiment using $\texttt{eQMARL}$ with $\Psi^{+}$ entanglement and $L=2$ VQC layers.
coingame_maa2c_pomdp_eqmarl_psi+_L10.yml | $\texttt{CoinGame-2}$ | POMDP experiment using $\texttt{eQMARL}$ with $\Psi^{+}$ entanglement and $L=10$ VQC layers.
coingame_maa2c_pomdp_qfctde_L2.yml | $\texttt{CoinGame-2}$ | POMDP experiment using $\texttt{qfCTDE}$ with $L=2$ VQC layers.
coingame_maa2c_pomdp_qfctde_L10.yml | $\texttt{CoinGame-2}$ | POMDP experiment using $\texttt{qfCTDE}$ with $L=10$ VQC layers.
coingame_maa2c_pomdp_fctde_size3.yml | $\texttt{CoinGame-2}$ | POMDP experiment using $\texttt{fCTDE}$ with $h=3$ hidden units.
coingame_maa2c_pomdp_fctde_size6.yml | $\texttt{CoinGame-2}$ | POMDP experiment using $\texttt{fCTDE}$ with $h=6$ hidden units.
coingame_maa2c_pomdp_fctde_size24.yml | $\texttt{CoinGame-2}$ | POMDP experiment using $\texttt{fCTDE}$ with $h=24$ hidden units.
coingame_maa2c_pomdp_sctde_size3.yml | $\texttt{CoinGame-2}$ | POMDP experiment using $\texttt{sCTDE}$ with $h=3$ hidden units.
coingame_maa2c_pomdp_sctde_size6.yml | $\texttt{CoinGame-2}$ | POMDP experiment using $\texttt{sCTDE}$ with $h=6$ hidden units.
coingame_maa2c_pomdp_sctde_size24.yml | $\texttt{CoinGame-2}$ | POMDP experiment using $\texttt{sCTDE}$ with $h=24$ hidden units.
cartpole_maa2c_mdp_eqmarl_psi+.yml | $\texttt{CartPole}$ | MDP experiment using $\texttt{eQMARL}$ with $\Psi^{+}$ entanglement and $L=5$ VQC layers.
cartpole_maa2c_mdp_fctde.yml | $\texttt{CartPole}$ | MDP experiment using $\texttt{fCTDE}$ with $h=12$ hidden units.
cartpole_maa2c_mdp_qfctde.yml | $\texttt{CartPole}$ | MDP experiment using $\texttt{qfCTDE}$ with $L=5$ VQC layers.
cartpole_maa2c_mdp_sctde.yml | $\texttt{CartPole}$ | MDP experiment using $\texttt{sCTDE}$ with $h=12$ hidden units.
cartpole_maa2c_pomdp_eqmarl_psi+.yml | $\texttt{CartPole}$ | POMDP experiment using $\texttt{eQMARL}$ with $\Psi^{+}$ entanglement and $L=5$ VQC layers.
cartpole_maa2c_pomdp_fctde.yml | $\texttt{CartPole}$ | POMDP experiment using $\texttt{fCTDE}$ with $h=12$ hidden units.
cartpole_maa2c_pomdp_qfctde.yml | $\texttt{CartPole}$ | POMDP experiment using $\texttt{qfCTDE}$ with $L=5$ VQC layers.
cartpole_maa2c_pomdp_sctde.yml | $\texttt{CartPole}$ | POMDP experiment using $\texttt{sCTDE}$ with $h=12$ hidden units.
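The file names in the table above appear to follow an `<environment>_<algorithm>_<dynamics>_<framework>[_<variant>...]` pattern. That reading is inferred from the table and is not a documented naming scheme; the sketch below splits a file name on that assumption, and `ExperimentName` and `parse_experiment_filename` are hypothetical names.

```python
from typing import NamedTuple, Tuple


class ExperimentName(NamedTuple):
    environment: str           # e.g. "coingame" or "cartpole"
    algorithm: str             # e.g. "maa2c"
    dynamics: str              # "mdp" or "pomdp"
    framework: str             # e.g. "eqmarl", "qfctde", "sctde", "fctde"
    variant: Tuple[str, ...]   # e.g. ("psi+", "L10") or ("size24",); may be empty


def parse_experiment_filename(filename: str) -> ExperimentName:
    """Split an experiment YAML file name into its underscore-separated fields."""
    stem = filename[: -len(".yml")] if filename.endswith(".yml") else filename
    parts = stem.split("_")
    if len(parts) < 4:
        raise ValueError(f"unexpected experiment file name: {filename}")
    environment, algorithm, dynamics, framework, *variant = parts
    return ExperimentName(environment, algorithm, dynamics, framework, tuple(variant))


if __name__ == "__main__":
    print(parse_experiment_filename("coingame_maa2c_pomdp_eqmarl_psi+_L10.yml"))
    print(parse_experiment_filename("cartpole_maa2c_mdp_fctde.yml"))
```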
Results
The actor-critic models trained using the frameworks described in the paper achieved the performance outlined in the sections below.
Pre-trained models can be found in the supplementary materials, within a folder called pre_trained_models/, that accompanies this repository.
<!-- Note that under the same folder structure as experiment_output. -->
The training result metrics for all models reported in the paper are listed under the experiment_output folder.
Each experiment was conducted over 10 seeds (using the -r 10 option as discussed in the Training section).
All figures reported in the paper can be generated using the Jupyter notebook figure_generator.ipynb, which references the figure configurations outlined in the figures folder.
Entanglement Style Comparison
The training results for the comparison of entanglement styles outlined in the paper are given in the table below:
Dynamics | Entanglement | Score: 20 | Score: 25 | Score: Max (value)
--- | --- | --- | --- | ---
MDP | $\Psi^{+}$ | 568 | 2332 | 2942 (25.67)
MDP | $\Psi^{-}$ | 595 | 1987 | 2849 (25.45)
MDP | $\Phi^{+}$ | 612 | 1883 | 2851 (25.51)
MDP | $\Phi^{-}$ | 691 | 2378 | 2984 (25.23)
MDP | $\mathtt{None}$ | 839 | 2337 | 2495 (25.12)
POMDP | $\Psi^{+}$ | 1049 | 1745 | 2950 (26.28)
POMDP | $\Psi^{-}$ | 1206 | 2114 | 2999 (25.95)
POMDP | $\Phi^{+}$ | 1269 | - | 2992 (24.1)
POMDP | $\Phi^{-}$ | 1838 | - | 2727 (22.8)
POMDP | $\mathtt{None}$ | 1069 | 1955 | 2841 (26.39)
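For reference, $\Phi^{\pm}$ and $\Psi^{\pm}$ in the table above denote the four two-qubit Bell states. The sketch below writes out their textbook state vectors with NumPy and checks that they are orthonormal. It is only an illustration of the notation; it does not show how this repository prepares the states inside its circuits.

```python
import numpy as np

# Computational basis for two qubits, ordered |00>, |01>, |10>, |11>.
KET_00, KET_01, KET_10, KET_11 = np.eye(4)

# Textbook definitions of the four Bell states.
BELL_STATES = {
    "phi+": (KET_00 + KET_11) / np.sqrt(2),
    "phi-": (KET_00 - KET_11) / np.sqrt(2),
    "psi+": (KET_01 + KET_10) / np.sqrt(2),
    "psi-": (KET_01 - KET_10) / np.sqrt(2),
}


def gram_matrix(states):
    """Matrix of pairwise inner products <a|b> between the given state vectors."""
    vectors = np.stack(list(states.values()))
    return vectors.conj() @ vectors.T


if __name__ == "__main__":
    # An identity Gram matrix means the four states are mutually orthogonal and normalized.
    print(np.round(gram_matrix(BELL_STATES), 6))
```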
The figures that aggregate the metric performance for each of the experiments are given in the table below:
Figure | Dynamics | Metric
--- | --- | ---
figmaa2cmdpentanglementcompare-undiscounted_reward.pdf | MDP | Score
figmaa2cmdpentanglementcompare-coins_collected.pdf | MDP | Total coins collected
figmaa2cmdpentanglementcompare-owncoinrate.pdf | MDP | Own coin rate
figmaa2cmdpentanglementcompare-owncoinscollected.pdf | MDP | Own coins collected
figmaa2cpomdpentanglementcompare-undiscounted_reward.pdf | POMDP | Score
figmaa2cpomdpentanglementcompare-coins_collected.pdf | POMDP | Total coins collected
figmaa2cpomdpentanglementcompare-owncoinrate.pdf | POMDP | Own coin rate
figmaa2cpomdpentanglementcompare-owncoinscollected.pdf | POMDP | Own coins collected
CoinGame experiments
The training results for the comparison of the frameworks in the $\texttt{CoinGame-2}$ environment outlined in the paper are given in the table below:
Dynamics | Framework | Score: 20 | Score: 25 | Score: Max (value) | Own coin rate: 0.95 | Own coin rate: 1.0 | Own coin rate: Max (value)
--- | --- | --- | --- | --- | --- | --- | ---
MDP | $\texttt{eQMARL-}\Psi^{+}$ | 568 | 2332 | 2942 (25.67) | 376 | 2136 | 2136 (1.0)
MDP | $\texttt{qfCTDE}$ | 678 | - | 2378 (23.38) | 397 | - | 2832 (0.9972)
MDP | $\texttt{sCTDE}$ | 1640 | 2615 | 2631 (25.3) | 1511 | - | 2637 (0.9864)
MDP | $\texttt{fCTDE}$ | 1917 | - | 2925 (23.67) | 1700 | - | 2909 (0.9857)
POMDP | $\texttt{eQMARL-}\Psi^{+}$ | 1049 | 1745 | 2950 (26.28) | 773 | - | 2533 (0.9997)
POMDP | $\texttt{qfCTDE}$ | 1382 | 2124 | 2871 (26.09) | 1038 | 2887 | 2887 (1.0)
POMDP | $\texttt{sCTDE}$ | 1738 | 2750 | 2999 (25.33) | 1588 | - | 2956 (0.9894)
POMDP | $\texttt{fCTDE}$ | 1798 | 2658 | 2824 (25.49) | 1574 | - | 2963 (0.9894)
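One way to read the table above is that each threshold column (for example `Score: 20`) gives the first training epoch at which the metric reaches that threshold, `-` means the threshold was never reached, and `Max (value)` pairs the epoch of the maximum with its value. That interpretation is an assumption, and the helpers below (`first_epoch_reaching`, `epoch_of_max`) are hypothetical names showing how such columns could be derived from a per-epoch metric series.

```python
from typing import Optional, Sequence, Tuple

import numpy as np


def first_epoch_reaching(values: Sequence[float], threshold: float) -> Optional[int]:
    """Index of the first epoch whose value is >= threshold, or None if never reached."""
    series = np.asarray(values, dtype=float)
    hits = np.flatnonzero(series >= threshold)
    return int(hits[0]) if hits.size else None


def epoch_of_max(values: Sequence[float]) -> Tuple[int, float]:
    """Epoch index of the maximum value, together with that value."""
    series = np.asarray(values, dtype=float)
    epoch = int(np.argmax(series))
    return epoch, float(series[epoch])


if __name__ == "__main__":
    scores = [1.0, 5.0, 19.5, 20.2, 18.0, 25.1]   # made-up series for illustration
    print(first_epoch_reaching(scores, 20))        # threshold reached
    print(first_epoch_reaching(scores, 30))        # threshold never reached
    print(epoch_of_max(scores))
```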
The figures that aggregate the metric performance for each of the experiments are given in the table below:
Figure | Dynamics | Metric
--- | --- | ---
figmaa2cmdp-undiscounted_reward.pdf | MDP | Score
figmaa2cmdp-coins_collected.pdf | MDP | Total coins collected
figmaa2cmdp-owncoinrate.pdf | MDP | Own coin rate
figmaa2cmdp-owncoinscollected.pdf | MDP | Own coins collected
figmaa2cpomdp-undiscounted_reward.pdf | POMDP | Score
figmaa2cpomdp-coins_collected.pdf | POMDP | Total coins collected
figmaa2cpomdp-owncoinrate.pdf | POMDP | Own coin rate
figmaa2cpomdp-owncoinscollected.pdf | POMDP | Own coins collected
CartPole experiments
The training results for the comparison of the frameworks in the $\texttt{CartPole}$ environment outlined in the paper are given in the tables below:
Dynamics | Framework | Reward: Mean | Reward: Std. Dev. | Reward: 95% CI
--- | --- | --- | --- | ---
MDP | $\texttt{eQMARL-}\Psi^{+}$ | 79.11 | 50.62 | (77.40, 81.16)
MDP | $\texttt{qfCTDE}$ | 121.35 | 110.13 | (118.29, 125.12)
MDP | $\texttt{sCTDE}$ | 16.38 | 35.97 | (16.29, 16.48)
MDP | $\texttt{fCTDE}$ | 15.15 | 24.17 | (15.09, 15.22)
POMDP | $\texttt{eQMARL-}\Psi^{+}$ | 82.28 | 44.24 | (80.60, 83.89)
POMDP | $\texttt{qfCTDE}$ | 79.03 | 44.06 | (76.80, 80.98)
POMDP | $\texttt{sCTDE}$ | 40.56 | 37.36 | (38.17, 43.70)
POMDP | $\texttt{fCTDE}$ | 13.93 | 29.84 | (13.62, 14.19)

Dynamics | Framework | Reward: Mean (value) | Reward: Max (value)
--- | --- | --- | ---
MDP | $\texttt{eQMARL-}\Psi^{+}$ | 166 (79.11) | 555 (134.16)
MDP | $\texttt{qfCTDE}$ | 189 (121.35) | 810 (262.43)
MDP | $\texttt{sCTDE}$ | 9 (16.38) | 931 (23.59)
MDP | $\texttt{fCTDE}$ | 9 (15.15) | 38 (18.55)
POMDP | $\texttt{eQMARL-}\Psi^{+}$ | 251 (82.28) | 770 (127.6)
POMDP | $\texttt{qfCTDE}$ | 276 (79.03) | 648 (137.66)
POMDP | $\texttt{sCTDE}$ | 680 (40.56) | 999 (167.32)
POMDP | $\texttt{fCTDE}$ | 9 (13.93) | 999 (28.66)
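The README does not state how the 95% confidence intervals in the first table were computed. As one common option, the sketch below summarizes a reward series with its mean, sample standard deviation, and a percentile-bootstrap confidence interval for the mean. The bootstrap choice, the `summarize` name, and its parameters are all assumptions for illustration, so the numbers it produces need not match the reported ones.

```python
from typing import Dict, Sequence

import numpy as np


def summarize(values: Sequence[float], n_resamples: int = 2000,
              confidence: float = 0.95, seed: int = 0) -> Dict[str, object]:
    """Mean, sample std. dev., and percentile-bootstrap CI of the mean."""
    data = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    # Resample with replacement: one row of indices per bootstrap replicate.
    indices = rng.integers(0, data.size, size=(n_resamples, data.size))
    bootstrap_means = data[indices].mean(axis=1)
    alpha = (1.0 - confidence) / 2.0
    lower, upper = np.quantile(bootstrap_means, [alpha, 1.0 - alpha])
    return {
        "mean": float(data.mean()),
        "std": float(data.std(ddof=1)),
        "ci": (float(lower), float(upper)),
    }


if __name__ == "__main__":
    rewards = [10, 12, 9, 11, 13, 10, 12]   # made-up rewards for illustration
    print(summarize(rewards))
```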
The figures that aggregate the metric performance for each of the experiments are given in the table below:
Figure | Dynamics | Metric
--- | --- | ---
figcartpolemaa2cmdp-rewardmean.pdf | MDP | Average reward
figcartpolemaa2cpomdp-rewardmean.pdf | POMDP | Average reward
MiniGrid experiments
The training results for the comparison of the frameworks in the $\texttt{MiniGrid}$ environment outlined in the paper are given in the tables below:
Dynamics | Framework | Reward: Mean (value) | Reward: 95% CI | Number of Trainable Critic Parameters
--- | --- | --- | --- | ---
POMDP | $\texttt{fCTDE}$ | -63.04 | (-65.16, -61.06) | 29,601
POMDP | $\texttt{qfCTDE}$ | -85.86 | (-87.03, -84.72) | 3,697
POMDP | $\texttt{sCTDE}$ | -88.02 | (-88.69, -87.10) | 29,801
POMDP | $\texttt{eQMARL}-\Psi^+$ | -13.32 | (-14.68, -11.91) | 3,697
The figures that aggregate the metric performance for each of the experiments are given in the table below:
Figure | Dynamics | Metric
--- | --- | ---
figminigrid-rewardmean.pdf | POMDP | Average reward
Ablation experiments
The training results for the ablation experiments in the $\texttt{CoinGame-2}$ environment outlined in the paper are given in the tables below:
Dynamics | Framework | Parameters | Score: Mean | Score: Std. Dev. | Score: 95% CI | Own coin rate: Mean | Own coin rate: Std. Dev. | Own coin rate: 95% CI
--- | --- | --- | --- | --- | --- | --- | --- | ---
MDP | $\texttt{fCTDE-3}$ | 223 | 2.42 | 2.35 | (2.35, 2.49) | 0.6720 | 0.2024 | (0.6685, 0.6769)
MDP | $\texttt{fCTDE-6}$ | 445 | 7.41 | 3.46 | (7.19, 7.65) | 0.7658 | 0.1414 | (0.7610, 0.7712)
MDP | $\texttt{fCTDE-12}$ | 889 | 12.36 | 4.41 | (12.09, 12.67) | 0.8202 | 0.1379 | (0.8139, 0.8262)
MDP | $\texttt{fCTDE-24}$ | 1777 | 17.63 | 2.58 | (17.25, 17.91) | 0.8823 | 0.0751 | (0.8770, 0.8875)
MDP | $\texttt{sCTDE-3}$ | 229 | 3.24 | 3.09 | (3.16, 3.33) | 0.6852 | 0.1991 | (0.6821, 0.6897)
MDP | $\texttt{sCTDE-6}$ | 457 | 8.54 | 3.67 | (8.29, 8.78) | 0.7857 | 0.1327 | (0.7804, 0.7924)
MDP | $\texttt{sCTDE-12}$ | 913 | 14.18 | 2.69 | (13.90, 14.60) | 0.8504 | 0.0928 | (0.8454, 0.8553)
MDP | $\texttt{sCTDE-24}$ | 1825 | 18.18 | 2.41 | (17.84, 18.53) | 0.8936 | 0.0673 | (0.8896, 0.8979)
MDP | $\texttt{qfCTDE-L2}$ | 121 | 6.58 | 3.92 | (6.47, 6.66) | 0.8482 | 0.1921 | (0.8435, 0.8518)
MDP | $\texttt{qfCTDE-L5}$ | 265 | 19.41 | 6.23 | (19.23, 19.59) | 0.9398 | 0.1020 | (0.9366, 0.9426)
MDP | $\texttt{qfCTDE-L10}$ | 505 | 22.08 | 2.22 | (21.91, 22.26) | 0.9691 | 0.0247 | (0.9665, 0.9723)
MDP | $\texttt{eQMARL-}\Psi^{+}\texttt{-L2}$ | 121 | 5.38 | 3.74 | (5.30, 5.46) | 0.8271 | 0.2213 | (0.8234, 0.8300)
MDP | $\texttt{eQMARL-}\Psi^{+}\texttt{-L5}$ | 265 | 21.11 | 2.65 | (20.92, 21.35) | 0.9640 | 0.0347 | (0.9601, 0.9667)
MDP | $\texttt{eQMARL-}\Psi^{+}\texttt{-L10}$ | 505 | 22.45 | 2.23 | (22.28, 22.62) | 0.9719 | 0.0219 | (0.9685, 0.9745)
POMDP | $\texttt{fCTDE-3}$ | 169 | 2.98 | 2.47 | (2.91, 3.05) | 0.7082 | 0.1890 | (0.7039, 0.7123)
POMDP | $\texttt{fCTDE-6}$ | 337 | 7.15 | 3.06 | (6.95, 7.37) | 0.7711 | 0.1388 | (0.7658, 0.7781)
POMDP | $\texttt{fCTDE-12}$ | 673 | 13.46 | 3.24 | (13.09, 13.76) | 0.8443 | 0.1026 | (0.8396, 0.8506)
POMDP | $\texttt{fCTDE-24}$ | 1345 | 17.38 | 2.65 | (17.06, 17.73) | 0.8889 | 0.0752 | (0.8840, 0.8945)
POMDP | $\texttt{sCTDE-3}$ | 175 | 2.68 | 2.60 | (2.61, 2.74) | 0.6834 | 0.1942 | (0.6792, 0.6866)
POMDP | $\texttt{sCTDE-6}$ | 349 | 6.35 | 3.53 | (6.18, 6.54) | 0.7677 | 0.1488 | (0.7633, 0.7725)
POMDP | $\texttt{sCTDE-12}$ | 697 | 13.70 | 2.79 | (13.44, 13.99) | 0.8466 | 0.0985 | (0.8411, 0.8515)
POMDP | $\texttt{sCTDE-24}$ | 1393 | 17.97 | 2.60 | (17.67, 18.25) | 0.8948 | 0.0723 | (0.8898, 0.9004)
POMDP | $\texttt{qfCTDE-L2}$ | 745 | 12.34 | 7.56 | (12.09, 12.60) | 0.8335 | 0.2058 | (0.8277, 0.8386)
POMDP | $\texttt{qfCTDE-L5}$ | 817 | 16.79 | 4.66 | (16.45, 17.04) | 0.9040 | 0.1135 | (0.8994, 0.9091)
POMDP | $\texttt{qfCTDE-L10}$ | 937 | 18.14 | 4.28 | (17.83, 18.31) | 0.9476 | 0.0660 | (0.9443, 0.9508)
POMDP | $\texttt{eQMARL-}\Psi^{+}\texttt{-L2}$ | 745 | 17.14 | 3.98 | (16.77, 17.47) | 0.8834 | 0.1106 | (0.8769, 0.8896)
POMDP | $\texttt{eQMARL-}\Psi^{+}\texttt{-L5}$ | 817 | 18.49 | 3.91 | (18.23, 18.80) | 0.9226 | 0.0831 | (0.9172, 0.9272)
POMDP | $\texttt{eQMARL-}\Psi^{+}\texttt{-L10}$ | 937 | 19.09 | 3.44 | (18.86, 19.46) | 0.9485 | 0.0603 | (0.9458, 0.9523)

Framework | Ablation Selection | Model | MDP dynamics | POMDP dynamics
--- | --- | --- | --- | ---
$\texttt{eQMARL}$ | $L=5$ | Actor | 136 | 412
$\texttt{eQMARL}$ | $L=5$ | Critic | 265 (132 per agent, 1 central) | 817 (408 per agent, 1 central)
$\texttt{qfCTDE}$ | $L=5$ | Actor | 136 | 412
$\texttt{qfCTDE}$ | $L=5$ | Critic | 265 | 817
$\texttt{fCTDE}$ | $h=12$ | Actor | 496 | 388
$\texttt{fCTDE}$ | $h=12$ | Critic | 889 | 673
$\texttt{sCTDE}$ | $h=12$ | Actor | 496 | 388
$\texttt{sCTDE}$ | $h=12$ | Critic | 913 (444 per agent, 25 central) | 697 (336 per agent, 25 central)
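The split critic totals in the table above are consistent with two agents plus one central component, i.e. total = 2 × (per-agent parameters) + (central parameters). The sketch below checks that arithmetic against the reported numbers. The two-agent count is an assumption read off the $\texttt{CoinGame-2}$ name, and `total_critic_parameters` is a hypothetical helper.

```python
N_AGENTS = 2  # assumption: CoinGame-2 is played by two agents

# (framework, dynamics) -> (per-agent parameters, central parameters, reported total)
SPLIT_CRITICS = {
    ("eQMARL", "MDP"): (132, 1, 265),
    ("eQMARL", "POMDP"): (408, 1, 817),
    ("sCTDE", "MDP"): (444, 25, 913),
    ("sCTDE", "POMDP"): (336, 25, 697),
}


def total_critic_parameters(per_agent: int, central: int, n_agents: int = N_AGENTS) -> int:
    """Total trainable critic parameters for a critic split across agents."""
    return n_agents * per_agent + central


if __name__ == "__main__":
    for (framework, dynamics), (per_agent, central, reported) in SPLIT_CRITICS.items():
        computed = total_critic_parameters(per_agent, central)
        status = "matches" if computed == reported else "differs from"
        print(f"{framework} {dynamics}: computed {computed} {status} reported {reported}")
```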
The figures that aggregate the metric performance for each of the experiments are given in the table below:
Figure | Dynamics | Metric
--- | --- | ---
figcoingame2maa2cmdpablationeqmarlpsi+-undiscounted_reward.pdf | MDP | Score
figcoingame2maa2cmdpablationeqmarlpsi+-coins_collected.pdf | MDP | Total coins collected
figcoingame2maa2cmdpablationeqmarlpsi+-owncoinrate.pdf | MDP | Own coin rate
figcoingame2maa2cmdpablationeqmarlpsi+-owncoinscollected.pdf | MDP | Own coins collected
figcoingame2maa2cmdpablationqfctde-undiscountedreward.pdf | MDP | Score
figcoingame2maa2cmdpablationqfctde-coinscollected.pdf | MDP | Total coins collected
figcoingame2maa2cmdpablationqfctde-owncoin_rate.pdf | MDP | Own coin rate
figcoingame2maa2cmdpablationqfctde-owncoins_collected.pdf | MDP | Own coins collected
figcoingame2maa2cmdpablationfctde-undiscountedreward.pdf | MDP | Score
figcoingame2maa2cmdpablationfctde-coinscollected.pdf | MDP | Total coins collected
figcoingame2maa2cmdpablationfctde-owncoin_rate.pdf | MDP | Own coin rate
figcoingame2maa2cmdpablationfctde-owncoins_collected.pdf | MDP | Own coins collected
figcoingame2maa2cmdpablationsctde-undiscountedreward.pdf | MDP | Score
figcoingame2maa2cmdpablationsctde-coinscollected.pdf | MDP | Total coins collected
figcoingame2maa2cmdpablationsctde-owncoin_rate.pdf | MDP | Own coin rate
figcoingame2maa2cmdpablationsctde-owncoins_collected.pdf | MDP | Own coins collected
Citation
If you use the code in this repository for your research or publication, please cite our paper published in ICLR 2025 using the following BibTeX entry (also available in CITATION.bib):
```bibtex
@inproceedings{derieux2025eqmarl,
title={e{QMARL}: Entangled Quantum Multi-Agent Reinforcement Learning for Distributed Cooperation over Quantum Channels},
author={Alexander DeRieux and Walid Saad},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=cR5GTis5II},
doi={10.48550/arXiv.2405.17486}
}
```
Owner
- Name: NEWS@VT
- Login: news-vt
- Kind: organization
- Website: https://www.netsciwis.com/
- Repositories: 4
- Profile: https://github.com/news-vt
Network sciEnce, Wireless, and Security laboratory at Virginia Tech (NEWS@VT)
Dependencies
- jupyter * development
- cirq *
- gymnasium >=0.26,<1.0
- matplotlib *
- minigrid *
- numpy *
- pandas *
- qutip *
- scipy *
- seaborn *
- sympy *
- tensorflow ==2.7.0
- tensorflow-quantum ==0.7.2
- tqdm *