eqmarl

This is the repository for the paper "eQMARL: Entangled Quantum Multi-Agent Reinforcement Learning for Distributed Cooperation over Quantum Channels" published in ICLR 2025

https://github.com/news-vt/eqmarl

Keywords

machine-learning multi-agent-reinforcement-learning quantum-computing quantum-entanglement

Repository

Basic Info
  • Host: GitHub
  • Owner: news-vt
  • License: cc-by-4.0
  • Language: Jupyter Notebook
  • Default Branch: main
  • Homepage:
  • Size: 119 MB
Statistics
  • Stars: 5
  • Watchers: 1
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Created over 2 years ago · Last pushed 11 months ago
Metadata Files
Readme License Citation

README.md

eQMARL: Entangled Quantum Multi-Agent Reinforcement Learning for Distributed Cooperation over Quantum Channels

ICLR 2025 arXiv OpenReview

License: CC-BY-4.0

python tensorflow

This repository is the official implementation of "eQMARL: Entangled Quantum Multi-Agent Reinforcement Learning for Distributed Cooperation over Quantum Channels", published in the Thirteenth International Conference on Learning Representations (ICLR) 2025.

See Citation section for BibTeX reference.

https://github.com/user-attachments/assets/7daf9eac-4b95-4d33-88be-d93856b48622

See eqmarl-vis repository for visualizations.

Installation

The codebase is provided as an installable Python package called eqmarl. To install the package via pip, you can run:

```bash
# Navigate to eqmarl source folder.
$ cd path/to/eqmarl/

# Install eqmarl package.
$ python -m pip install .
```

You can verify that the package was successfully installed by running:

```bash
$ python -c "import importlib.metadata; version=importlib.metadata.version('eqmarl'); print(version)"
1.0.0
```
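The same check can be done from inside Python. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of the eqmarl package.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("eqmarl")
    print(found if found is not None else "eqmarl is not installed in this environment")
```

Returning `None` instead of raising makes the helper convenient in setup scripts that should report a missing install rather than crash.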

Requirements

If instead you just want to install the requirements without the package, you can run:

```bash
$ python -m pip install -r requirements.txt -r requirements-dev.txt
```

Notes on Tensorflow Quantum installation with Anaconda

Installing this repo can be a little finicky because of the requirements for tensorflow-quantum on various systems.

If you are using Anaconda to manage Python on macOS, be aware that the version of Python may have been built using an outdated version of macOS. To check this, you can run:

```bash
$ python -c "from distutils import util; print(util.get_platform())"
macosx-10.9-x86_64
```

In the example above, Python was built against macosx-10.9-x86_64, whereas the wheel for tensorflow-quantum requires macosx-12.1-x86_64 or later.
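The comparison described above can be scripted. The sketch below uses `sysconfig.get_platform()` from the standard library, which reports the same kind of platform tag (and remains available on Python 3.12+, where `distutils` has been removed). The helper names and the `REQUIRED_MACOS` constant are hypothetical; the `(12, 1)` threshold is taken from the wheel name mentioned in this section.

```python
import re
import sysconfig

# Minimum macOS target expected by the tensorflow-quantum 0.7.2 wheel (from the text above).
REQUIRED_MACOS = (12, 1)


def macos_target(platform_tag: str):
    """Parse 'macosx-10.9-x86_64' into (10, 9); return None for non-macOS tags."""
    match = re.match(r"macosx-(\d+)\.(\d+)-", platform_tag)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def needs_wheel_rename(platform_tag: str) -> bool:
    """True when Python was built for a macOS target older than the wheel expects."""
    target = macos_target(platform_tag)
    return target is not None and target < REQUIRED_MACOS


if __name__ == "__main__":
    tag = sysconfig.get_platform()
    print(tag, "-> rename workaround needed:", needs_wheel_rename(tag))
```

On non-macOS platforms the tag does not start with `macosx-`, so the check simply reports that no workaround is needed.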

To circumvent this, you can download the wheel for tensorflow-quantum==0.7.2 from https://pypi.org/project/tensorflow-quantum/0.7.2/#files and rename the file from tensorflow_quantum-0.7.2-cp39-cp39-macosx_12_1_x86_64.whl to tensorflow_quantum-0.7.2-cp39-cp39-macosx_10_9_x86_64.whl. Once you've done that, you can install the wheel via:

```bash
# Activate your environment.
$ conda activate myenv

# Install wheel file manually.
$ python -m pip install tensorflow_quantum-0.7.2-cp39-cp39-macosx_10_9_x86_64.whl
```
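The rename step can also be scripted. This is a minimal sketch, assuming the wheel has already been downloaded; the function names are hypothetical, and the two platform tags are the ones quoted in this section. It copies the file rather than renaming it in place, so the original download is preserved.

```python
import shutil
from pathlib import Path

OLD_TAG = "macosx_12_1_x86_64"
NEW_TAG = "macosx_10_9_x86_64"


def retagged_name(wheel_name: str) -> str:
    """Swap the platform tag in a wheel filename, leaving everything else untouched."""
    if OLD_TAG not in wheel_name:
        raise ValueError(f"expected platform tag {OLD_TAG!r} in {wheel_name!r}")
    return wheel_name.replace(OLD_TAG, NEW_TAG)


def copy_with_new_tag(wheel_path: Path) -> Path:
    """Copy the downloaded wheel next to itself under the retagged filename."""
    target = wheel_path.with_name(retagged_name(wheel_path.name))
    shutil.copyfile(wheel_path, target)
    return target
```

Calling `copy_with_new_tag(Path("tensorflow_quantum-0.7.2-cp39-cp39-macosx_12_1_x86_64.whl"))` would produce the file that the `pip install` command above expects.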

Training

To train using the frameworks in the paper, run this command:

```bash
$ python ./scripts/experiment_runner.py ./experiments/<experiment_name>.yml
```

This invokes the experiment_runner.py script, which runs experiments based on YAML configurations. Note that the option -r/--n-train-rounds can be used to train over multiple seed rounds (defaults to 1 round). The experiment configuration for each of the frameworks discussed in the paper is described as a YAML file in the experiments folder.
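To queue several experiments with the same number of seed rounds, the command above can be wrapped in a short driver. The following is a sketch, not part of the repository: `runner_command` and `run_all` are hypothetical helpers, it assumes it is run from the repository root, and it assumes the `--n-train-rounds` option may follow the configuration path.

```python
import subprocess
import sys
from pathlib import Path


def runner_command(experiment: str, n_train_rounds: int = 1) -> list:
    """Build the argument list for one experiment_runner.py invocation."""
    config = (Path("experiments") / f"{experiment}.yml").as_posix()
    return [
        sys.executable,
        "scripts/experiment_runner.py",
        config,
        "--n-train-rounds",
        str(n_train_rounds),
    ]


def run_all(experiments, n_train_rounds: int = 1) -> None:
    """Run each experiment in turn, stopping on the first failure."""
    for name in experiments:
        subprocess.run(runner_command(name, n_train_rounds), check=True)
```

For example, `run_all(["coingame_maa2c_mdp_eqmarl_psi+", "coingame_maa2c_mdp_qfctde"], n_train_rounds=10)` would train both configurations over 10 seed rounds each.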

The full list of experiments is as follows:

Experiment YAML File | Environment | Description
--- | --- | ---
`coingame_maa2c_mdp_eqmarl_noentanglement.yml` | $\texttt{CoinGame-2}$ | MDP experiment using $\texttt{eQMARL}$ with $\texttt{None}$ entanglement and $L=5$ VQC layers.
`coingame_maa2c_mdp_eqmarl_phi+.yml` | $\texttt{CoinGame-2}$ | MDP experiment using $\texttt{eQMARL}$ with $\Phi^{+}$ entanglement and $L=5$ VQC layers.
`coingame_maa2c_mdp_eqmarl_phi-.yml` | $\texttt{CoinGame-2}$ | MDP experiment using $\texttt{eQMARL}$ with $\Phi^{-}$ entanglement and $L=5$ VQC layers.
`coingame_maa2c_mdp_eqmarl_psi+.yml` | $\texttt{CoinGame-2}$ | MDP experiment using $\texttt{eQMARL}$ with $\Psi^{+}$ entanglement and $L=5$ VQC layers.
`coingame_maa2c_mdp_eqmarl_psi-.yml` | $\texttt{CoinGame-2}$ | MDP experiment using $\texttt{eQMARL}$ with $\Psi^{-}$ entanglement and $L=5$ VQC layers.
`coingame_maa2c_mdp_fctde.yml` | $\texttt{CoinGame-2}$ | MDP experiment using $\texttt{fCTDE}$ with $h=12$ hidden units.
`coingame_maa2c_mdp_qfctde.yml` | $\texttt{CoinGame-2}$ | MDP experiment using $\texttt{qfCTDE}$ with $L=5$ VQC layers.
`coingame_maa2c_mdp_sctde.yml` | $\texttt{CoinGame-2}$ | MDP experiment using $\texttt{sCTDE}$ with $h=12$ hidden units.
`coingame_maa2c_pomdp_eqmarl_noentanglement.yml` | $\texttt{CoinGame-2}$ | POMDP experiment using $\texttt{eQMARL}$ with $\texttt{None}$ entanglement and $L=5$ VQC layers.
`coingame_maa2c_pomdp_eqmarl_phi+.yml` | $\texttt{CoinGame-2}$ | POMDP experiment using $\texttt{eQMARL}$ with $\Phi^{+}$ entanglement and $L=5$ VQC layers.
`coingame_maa2c_pomdp_eqmarl_phi-.yml` | $\texttt{CoinGame-2}$ | POMDP experiment using $\texttt{eQMARL}$ with $\Phi^{-}$ entanglement and $L=5$ VQC layers.
`coingame_maa2c_pomdp_eqmarl_psi+.yml` | $\texttt{CoinGame-2}$ | POMDP experiment using $\texttt{eQMARL}$ with $\Psi^{+}$ entanglement and $L=5$ VQC layers.
`coingame_maa2c_pomdp_eqmarl_psi-.yml` | $\texttt{CoinGame-2}$ | POMDP experiment using $\texttt{eQMARL}$ with $\Psi^{-}$ entanglement and $L=5$ VQC layers.
`coingame_maa2c_pomdp_fctde.yml` | $\texttt{CoinGame-2}$ | POMDP experiment using $\texttt{fCTDE}$ with $h=12$ hidden units.
`coingame_maa2c_pomdp_qfctde.yml` | $\texttt{CoinGame-2}$ | POMDP experiment using $\texttt{qfCTDE}$ with $L=5$ VQC layers.
`coingame_maa2c_pomdp_sctde.yml` | $\texttt{CoinGame-2}$ | POMDP experiment using $\texttt{sCTDE}$ with $h=12$ hidden units.
`coingame_maa2c_mdp_eqmarl_psi+_L2.yml` | $\texttt{CoinGame-2}$ | MDP experiment using $\texttt{eQMARL}$ with $\Psi^{+}$ entanglement and $L=2$ VQC layers.
`coingame_maa2c_mdp_eqmarl_psi+_L10.yml` | $\texttt{CoinGame-2}$ | MDP experiment using $\texttt{eQMARL}$ with $\Psi^{+}$ entanglement and $L=10$ VQC layers.
`coingame_maa2c_mdp_qfctde_L2.yml` | $\texttt{CoinGame-2}$ | MDP experiment using $\texttt{qfCTDE}$ with $L=2$ VQC layers.
`coingame_maa2c_mdp_qfctde_L10.yml` | $\texttt{CoinGame-2}$ | MDP experiment using $\texttt{qfCTDE}$ with $L=10$ VQC layers.
`coingame_maa2c_mdp_fctde_size3.yml` | $\texttt{CoinGame-2}$ | MDP experiment using $\texttt{fCTDE}$ with $h=3$ hidden units.
`coingame_maa2c_mdp_fctde_size6.yml` | $\texttt{CoinGame-2}$ | MDP experiment using $\texttt{fCTDE}$ with $h=6$ hidden units.
`coingame_maa2c_mdp_fctde_size24.yml` | $\texttt{CoinGame-2}$ | MDP experiment using $\texttt{fCTDE}$ with $h=24$ hidden units.
`coingame_maa2c_mdp_sctde_size3.yml` | $\texttt{CoinGame-2}$ | MDP experiment using $\texttt{sCTDE}$ with $h=3$ hidden units.
`coingame_maa2c_mdp_sctde_size6.yml` | $\texttt{CoinGame-2}$ | MDP experiment using $\texttt{sCTDE}$ with $h=6$ hidden units.
`coingame_maa2c_mdp_sctde_size24.yml` | $\texttt{CoinGame-2}$ | MDP experiment using $\texttt{sCTDE}$ with $h=24$ hidden units.
`coingame_maa2c_pomdp_eqmarl_psi+_L2.yml` | $\texttt{CoinGame-2}$ | POMDP experiment using $\texttt{eQMARL}$ with $\Psi^{+}$ entanglement and $L=2$ VQC layers.
`coingame_maa2c_pomdp_eqmarl_psi+_L10.yml` | $\texttt{CoinGame-2}$ | POMDP experiment using $\texttt{eQMARL}$ with $\Psi^{+}$ entanglement and $L=10$ VQC layers.
`coingame_maa2c_pomdp_qfctde_L2.yml` | $\texttt{CoinGame-2}$ | POMDP experiment using $\texttt{qfCTDE}$ with $L=2$ VQC layers.
`coingame_maa2c_pomdp_qfctde_L10.yml` | $\texttt{CoinGame-2}$ | POMDP experiment using $\texttt{qfCTDE}$ with $L=10$ VQC layers.
`coingame_maa2c_pomdp_fctde_size3.yml` | $\texttt{CoinGame-2}$ | POMDP experiment using $\texttt{fCTDE}$ with $h=3$ hidden units.
`coingame_maa2c_pomdp_fctde_size6.yml` | $\texttt{CoinGame-2}$ | POMDP experiment using $\texttt{fCTDE}$ with $h=6$ hidden units.
`coingame_maa2c_pomdp_fctde_size24.yml` | $\texttt{CoinGame-2}$ | POMDP experiment using $\texttt{fCTDE}$ with $h=24$ hidden units.
`coingame_maa2c_pomdp_sctde_size3.yml` | $\texttt{CoinGame-2}$ | POMDP experiment using $\texttt{sCTDE}$ with $h=3$ hidden units.
`coingame_maa2c_pomdp_sctde_size6.yml` | $\texttt{CoinGame-2}$ | POMDP experiment using $\texttt{sCTDE}$ with $h=6$ hidden units.
`coingame_maa2c_pomdp_sctde_size24.yml` | $\texttt{CoinGame-2}$ | POMDP experiment using $\texttt{sCTDE}$ with $h=24$ hidden units.
`cartpole_maa2c_mdp_eqmarl_psi+.yml` | $\texttt{CartPole}$ | MDP experiment using $\texttt{eQMARL}$ with $\Psi^{+}$ entanglement and $L=5$ VQC layers.
`cartpole_maa2c_mdp_fctde.yml` | $\texttt{CartPole}$ | MDP experiment using $\texttt{fCTDE}$ with $h=12$ hidden units.
`cartpole_maa2c_mdp_qfctde.yml` | $\texttt{CartPole}$ | MDP experiment using $\texttt{qfCTDE}$ with $L=5$ VQC layers.
`cartpole_maa2c_mdp_sctde.yml` | $\texttt{CartPole}$ | MDP experiment using $\texttt{sCTDE}$ with $h=12$ hidden units.
`cartpole_maa2c_pomdp_eqmarl_psi+.yml` | $\texttt{CartPole}$ | POMDP experiment using $\texttt{eQMARL}$ with $\Psi^{+}$ entanglement and $L=5$ VQC layers.
`cartpole_maa2c_pomdp_fctde.yml` | $\texttt{CartPole}$ | POMDP experiment using $\texttt{fCTDE}$ with $h=12$ hidden units.
`cartpole_maa2c_pomdp_qfctde.yml` | $\texttt{CartPole}$ | POMDP experiment using $\texttt{qfCTDE}$ with $L=5$ VQC layers.
`cartpole_maa2c_pomdp_sctde.yml` | $\texttt{CartPole}$ | POMDP experiment using $\texttt{sCTDE}$ with $h=12$ hidden units.

Results

The actor-critic models trained using the frameworks described in the paper achieved the performance outlined in the sections below. Pre-trained models can be found in the supplementary materials, within a folder called pre_trained_models/, that accompanies this repository.

The training result metrics for all models reported in the paper are listed under the experiment_output folder. Each experiment was conducted over 10 seeds (using the -r 10 option as discussed in the Training section). All figures reported in the paper can be generated using the Jupyter notebook figure_generator.ipynb, which references the figure configurations outlined in the figures folder.

Entanglement Style Comparison

The training results for the comparison of entanglement styles outlined in the paper are given in the table below:

Dynamics | Entanglement | Score: 20 | Score: 25 | Score: Max (value)
--- | --- | --- | --- | ---
MDP | $\Psi^{+}$ | 568 | 2332 | 2942 (25.67)
MDP | $\Psi^{-}$ | 595 | 1987 | 2849 (25.45)
MDP | $\Phi^{+}$ | 612 | 1883 | 2851 (25.51)
MDP | $\Phi^{-}$ | 691 | 2378 | 2984 (25.23)
MDP | $\mathtt{None}$ | 839 | 2337 | 2495 (25.12)
POMDP | $\Psi^{+}$ | 1049 | 1745 | 2950 (26.28)
POMDP | $\Psi^{-}$ | 1206 | 2114 | 2999 (25.95)
POMDP | $\Phi^{+}$ | 1269 | - | 2992 (24.1)
POMDP | $\Phi^{-}$ | 1838 | - | 2727 (22.8)
POMDP | $\mathtt{None}$ | 1069 | 1955 | 2841 (26.39)
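For reference, the four entanglement labels in the table above are the names of the standard two-qubit Bell states. The definitions below are the textbook ones; how eQMARL prepares these states and distributes them among agents is described in the paper and is not reproduced here.

$$
|\Phi^{\pm}\rangle = \frac{1}{\sqrt{2}}\left(|00\rangle \pm |11\rangle\right), \qquad
|\Psi^{\pm}\rangle = \frac{1}{\sqrt{2}}\left(|01\rangle \pm |10\rangle\right)
$$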

The figures that aggregate the metric performance for each of the experiments are given in the table below:

Figure | Dynamics | Metric
--- | --- | ---
`fig_maa2c_mdp_entanglement_compare-undiscounted_reward.pdf` | MDP | Score
`fig_maa2c_mdp_entanglement_compare-coins_collected.pdf` | MDP | Total coins collected
`fig_maa2c_mdp_entanglement_compare-own_coin_rate.pdf` | MDP | Own coin rate
`fig_maa2c_mdp_entanglement_compare-own_coins_collected.pdf` | MDP | Own coins collected
`fig_maa2c_pomdp_entanglement_compare-undiscounted_reward.pdf` | POMDP | Score
`fig_maa2c_pomdp_entanglement_compare-coins_collected.pdf` | POMDP | Total coins collected
`fig_maa2c_pomdp_entanglement_compare-own_coin_rate.pdf` | POMDP | Own coin rate
`fig_maa2c_pomdp_entanglement_compare-own_coins_collected.pdf` | POMDP | Own coins collected

CoinGame experiments

The training results for the comparison of the frameworks in the $\texttt{CoinGame-2}$ environment outlined in the paper are given in the table below:

Dynamics | Framework | Score: 20 | Score: 25 | Score: Max (value) | Own coin rate: 0.95 | Own coin rate: 1.0 | Own coin rate: Max (value)
--- | --- | --- | --- | --- | --- | --- | ---
MDP | $\texttt{eQMARL-}\Psi^{+}$ | 568 | 2332 | 2942 (25.67) | 376 | 2136 | 2136 (1.0)
MDP | $\texttt{qfCTDE}$ | 678 | - | 2378 (23.38) | 397 | - | 2832 (0.9972)
MDP | $\texttt{sCTDE}$ | 1640 | 2615 | 2631 (25.3) | 1511 | - | 2637 (0.9864)
MDP | $\texttt{fCTDE}$ | 1917 | - | 2925 (23.67) | 1700 | - | 2909 (0.9857)
POMDP | $\texttt{eQMARL-}\Psi^{+}$ | 1049 | 1745 | 2950 (26.28) | 773 | - | 2533 (0.9997)
POMDP | $\texttt{qfCTDE}$ | 1382 | 2124 | 2871 (26.09) | 1038 | 2887 | 2887 (1.0)
POMDP | $\texttt{sCTDE}$ | 1738 | 2750 | 2999 (25.33) | 1588 | - | 2956 (0.9894)
POMDP | $\texttt{fCTDE}$ | 1798 | 2658 | 2824 (25.49) | 1574 | - | 2963 (0.9894)

The figures that aggregate the metric performance for each of the experiments are given in the table below:

Figure | Dynamics | Metric
--- | --- | ---
`fig_maa2c_mdp-undiscounted_reward.pdf` | MDP | Score
`fig_maa2c_mdp-coins_collected.pdf` | MDP | Total coins collected
`fig_maa2c_mdp-own_coin_rate.pdf` | MDP | Own coin rate
`fig_maa2c_mdp-own_coins_collected.pdf` | MDP | Own coins collected
`fig_maa2c_pomdp-undiscounted_reward.pdf` | POMDP | Score
`fig_maa2c_pomdp-coins_collected.pdf` | POMDP | Total coins collected
`fig_maa2c_pomdp-own_coin_rate.pdf` | POMDP | Own coin rate
`fig_maa2c_pomdp-own_coins_collected.pdf` | POMDP | Own coins collected

CartPole experiments

The training results for the comparison of the frameworks in the $\texttt{CartPole}$ environment outlined in the paper are given in the tables below:

Dynamics | Framework | Reward: Mean | Reward: Std. Dev. | Reward: 95% CI
--- | --- | --- | --- | ---
MDP | $\texttt{eQMARL-}\Psi^{+}$ | 79.11 | 50.62 | (77.40, 81.16)
MDP | $\texttt{qfCTDE}$ | 121.35 | 110.13 | (118.29, 125.12)
MDP | $\texttt{sCTDE}$ | 16.38 | 35.97 | (16.29, 16.48)
MDP | $\texttt{fCTDE}$ | 15.15 | 24.17 | (15.09, 15.22)
POMDP | $\texttt{eQMARL-}\Psi^{+}$ | 82.28 | 44.24 | (80.60, 83.89)
POMDP | $\texttt{qfCTDE}$ | 79.03 | 44.06 | (76.80, 80.98)
POMDP | $\texttt{sCTDE}$ | 40.56 | 37.36 | (38.17, 43.70)
POMDP | $\texttt{fCTDE}$ | 13.93 | 29.84 | (13.62, 14.19)

Dynamics | Framework | Reward: Mean (value) | Reward: Max (value)
--- | --- | --- | ---
MDP | $\texttt{eQMARL-}\Psi^{+}$ | 166 (79.11) | 555 (134.16)
MDP | $\texttt{qfCTDE}$ | 189 (121.35) | 810 (262.43)
MDP | $\texttt{sCTDE}$ | 9 (16.38) | 931 (23.59)
MDP | $\texttt{fCTDE}$ | 9 (15.15) | 38 (18.55)
POMDP | $\texttt{eQMARL-}\Psi^{+}$ | 251 (82.28) | 770 (127.6)
POMDP | $\texttt{qfCTDE}$ | 276 (79.03) | 648 (137.66)
POMDP | $\texttt{sCTDE}$ | 680 (40.56) | 999 (167.32)
POMDP | $\texttt{fCTDE}$ | 9 (13.93) | 999 (28.66)
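This README does not state how the mean, standard deviation, and 95% CI columns of the first CartPole table were computed. As an illustration only, the sketch below shows one common way to produce such a summary from a list of reward values: a percentile bootstrap over the mean. The bootstrap itself, the sample standard deviation, and the names `bootstrap_ci` and `summarize` are all assumptions, not the authors' method.

```python
import random
import statistics


def bootstrap_ci(values, n_resamples=2000, confidence=0.95, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    tail = (1.0 - confidence) / 2.0
    lower = means[round(tail * n_resamples)]
    upper = means[round((1.0 - tail) * n_resamples) - 1]
    return lower, upper


def summarize(values):
    """Return (mean, sample standard deviation, 95% CI of the mean)."""
    return statistics.fmean(values), statistics.stdev(values), bootstrap_ci(values)
```

A bootstrap interval need not be symmetric around the mean, which is consistent with the slightly asymmetric intervals reported in the table, but that observation alone does not confirm the method used.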

The figures that aggregate the metric performance for each of the experiments are given in the table below:

Figure | Dynamics | Metric
--- | --- | ---
`fig_cartpole_maa2c_mdp-reward_mean.pdf` | MDP | Average reward
`fig_cartpole_maa2c_pomdp-reward_mean.pdf` | POMDP | Average reward

MiniGrid experiments

The training results for the comparison of the frameworks in the $\texttt{MiniGrid}$ environment outlined in the paper are given in the tables below:

Dynamics | Framework | Reward: Mean (value) | Reward: 95% CI | Number of Trainable Critic Parameters
--- | --- | --- | --- | ---
POMDP | $\texttt{fCTDE}$ | -63.04 | (-65.16, -61.06) | 29,601
POMDP | $\texttt{qfCTDE}$ | -85.86 | (-87.03, -84.72) | 3,697
POMDP | $\texttt{sCTDE}$ | -88.02 | (-88.69, -87.10) | 29,801
POMDP | $\texttt{eQMARL}-\Psi^+$ | -13.32 | (-14.68, -11.91) | 3,697

The figures that aggregate the metric performance for each of the experiments are given in the table below:

Figure | Dynamics | Metric
--- | --- | ---
`fig_minigrid-reward_mean.pdf` | POMDP | Average reward

Ablation experiments

The training results for the ablation experiments in the $\texttt{CoinGame-2}$ environment outlined in the paper are given in the tables below:

Dynamics | Framework | Parameters | Score: Mean | Score: Std. Dev. | Score: 95% CI | Own coin rate: Mean | Own coin rate: Std. Dev. | Own coin rate: 95% CI
--- | --- | --- | --- | --- | --- | --- | --- | ---
MDP | $\texttt{fCTDE-3}$ | 223 | 2.42 | 2.35 | (2.35, 2.49) | 0.6720 | 0.2024 | (0.6685, 0.6769)
MDP | $\texttt{fCTDE-6}$ | 445 | 7.41 | 3.46 | (7.19, 7.65) | 0.7658 | 0.1414 | (0.7610, 0.7712)
MDP | $\texttt{fCTDE-12}$ | 889 | 12.36 | 4.41 | (12.09, 12.67) | 0.8202 | 0.1379 | (0.8139, 0.8262)
MDP | $\texttt{fCTDE-24}$ | 1777 | 17.63 | 2.58 | (17.25, 17.91) | 0.8823 | 0.0751 | (0.8770, 0.8875)
MDP | $\texttt{sCTDE-3}$ | 229 | 3.24 | 3.09 | (3.16, 3.33) | 0.6852 | 0.1991 | (0.6821, 0.6897)
MDP | $\texttt{sCTDE-6}$ | 457 | 8.54 | 3.67 | (8.29, 8.78) | 0.7857 | 0.1327 | (0.7804, 0.7924)
MDP | $\texttt{sCTDE-12}$ | 913 | 14.18 | 2.69 | (13.90, 14.60) | 0.8504 | 0.0928 | (0.8454, 0.8553)
MDP | $\texttt{sCTDE-24}$ | 1825 | 18.18 | 2.41 | (17.84, 18.53) | 0.8936 | 0.0673 | (0.8896, 0.8979)
MDP | $\texttt{qfCTDE-L2}$ | 121 | 6.58 | 3.92 | (6.47, 6.66) | 0.8482 | 0.1921 | (0.8435, 0.8518)
MDP | $\texttt{qfCTDE-L5}$ | 265 | 19.41 | 6.23 | (19.23, 19.59) | 0.9398 | 0.1020 | (0.9366, 0.9426)
MDP | $\texttt{qfCTDE-L10}$ | 505 | 22.08 | 2.22 | (21.91, 22.26) | 0.9691 | 0.0247 | (0.9665, 0.9723)
MDP | $\texttt{eQMARL-}\Psi^{+}\texttt{-L2}$ | 121 | 5.38 | 3.74 | (5.30, 5.46) | 0.8271 | 0.2213 | (0.8234, 0.8300)
MDP | $\texttt{eQMARL-}\Psi^{+}\texttt{-L5}$ | 265 | 21.11 | 2.65 | (20.92, 21.35) | 0.9640 | 0.0347 | (0.9601, 0.9667)
MDP | $\texttt{eQMARL-}\Psi^{+}\texttt{-L10}$ | 505 | 22.45 | 2.23 | (22.28, 22.62) | 0.9719 | 0.0219 | (0.9685, 0.9745)
POMDP | $\texttt{fCTDE-3}$ | 169 | 2.98 | 2.47 | (2.91, 3.05) | 0.7082 | 0.1890 | (0.7039, 0.7123)
POMDP | $\texttt{fCTDE-6}$ | 337 | 7.15 | 3.06 | (6.95, 7.37) | 0.7711 | 0.1388 | (0.7658, 0.7781)
POMDP | $\texttt{fCTDE-12}$ | 673 | 13.46 | 3.24 | (13.09, 13.76) | 0.8443 | 0.1026 | (0.8396, 0.8506)
POMDP | $\texttt{fCTDE-24}$ | 1345 | 17.38 | 2.65 | (17.06, 17.73) | 0.8889 | 0.0752 | (0.8840, 0.8945)
POMDP | $\texttt{sCTDE-3}$ | 175 | 2.68 | 2.60 | (2.61, 2.74) | 0.6834 | 0.1942 | (0.6792, 0.6866)
POMDP | $\texttt{sCTDE-6}$ | 349 | 6.35 | 3.53 | (6.18, 6.54) | 0.7677 | 0.1488 | (0.7633, 0.7725)
POMDP | $\texttt{sCTDE-12}$ | 697 | 13.70 | 2.79 | (13.44, 13.99) | 0.8466 | 0.0985 | (0.8411, 0.8515)
POMDP | $\texttt{sCTDE-24}$ | 1393 | 17.97 | 2.60 | (17.67, 18.25) | 0.8948 | 0.0723 | (0.8898, 0.9004)
POMDP | $\texttt{qfCTDE-L2}$ | 745 | 12.34 | 7.56 | (12.09, 12.60) | 0.8335 | 0.2058 | (0.8277, 0.8386)
POMDP | $\texttt{qfCTDE-L5}$ | 817 | 16.79 | 4.66 | (16.45, 17.04) | 0.9040 | 0.1135 | (0.8994, 0.9091)
POMDP | $\texttt{qfCTDE-L10}$ | 937 | 18.14 | 4.28 | (17.83, 18.31) | 0.9476 | 0.0660 | (0.9443, 0.9508)
POMDP | $\texttt{eQMARL-}\Psi^{+}\texttt{-L2}$ | 745 | 17.14 | 3.98 | (16.77, 17.47) | 0.8834 | 0.1106 | (0.8769, 0.8896)
POMDP | $\texttt{eQMARL-}\Psi^{+}\texttt{-L5}$ | 817 | 18.49 | 3.91 | (18.23, 18.80) | 0.9226 | 0.0831 | (0.9172, 0.9272)
POMDP | $\texttt{eQMARL-}\Psi^{+}\texttt{-L10}$ | 937 | 19.09 | 3.44 | (18.86, 19.46) | 0.9485 | 0.0603 | (0.9458, 0.9523)

Framework | Ablation Selection | Model | MDP dynamics | POMDP dynamics
--- | --- | --- | --- | ---
$\texttt{eQMARL}$ | $L=5$ | Actor | 136 | 412
$\texttt{eQMARL}$ | $L=5$ | Critic | 265 (132 per agent, 1 central) | 817 (408 per agent, 1 central)
$\texttt{qfCTDE}$ | $L=5$ | Actor | 136 | 412
$\texttt{qfCTDE}$ | $L=5$ | Critic | 265 | 817
$\texttt{fCTDE}$ | $h=12$ | Actor | 496 | 388
$\texttt{fCTDE}$ | $h=12$ | Critic | 889 | 673
$\texttt{sCTDE}$ | $h=12$ | Actor | 496 | 388
$\texttt{sCTDE}$ | $h=12$ | Critic | 913 (444 per agent, 25 central) | 697 (336 per agent, 25 central)
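The split critic totals in the parameter table above can be checked with simple arithmetic: each total equals the per-agent count times the number of agents, plus the central count. The sketch below assumes two agents (inferred from the "-2" in CoinGame-2, and consistent with the listed totals); the function and constant names are hypothetical.

```python
# Two agents are assumed, following the "-2" in CoinGame-2.
N_AGENTS = 2

# (per-agent critic parameters, central parameters) as listed in the table above.
SPLIT_CRITICS = {
    ("eQMARL", "MDP"): (132, 1),
    ("eQMARL", "POMDP"): (408, 1),
    ("sCTDE", "MDP"): (444, 25),
    ("sCTDE", "POMDP"): (336, 25),
}


def total_critic_parameters(per_agent: int, central: int, n_agents: int = N_AGENTS) -> int:
    """Total trainable critic parameters for a critic split across agents."""
    return n_agents * per_agent + central


if __name__ == "__main__":
    for (framework, dynamics), (per_agent, central) in SPLIT_CRITICS.items():
        print(framework, dynamics, total_critic_parameters(per_agent, central))
```

The results (265, 817, 913, 697) match the critic totals in the table, and the eQMARL totals equal those of the corresponding qfCTDE critics.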

The figures that aggregate the metric performance for each of the experiments are given in the table below:

Figure | Dynamics | Metric
--- | --- | ---
`fig_coingame2_maa2c_mdp_ablation_eqmarl_psi+-undiscounted_reward.pdf` | MDP | Score
`fig_coingame2_maa2c_mdp_ablation_eqmarl_psi+-coins_collected.pdf` | MDP | Total coins collected
`fig_coingame2_maa2c_mdp_ablation_eqmarl_psi+-own_coin_rate.pdf` | MDP | Own coin rate
`fig_coingame2_maa2c_mdp_ablation_eqmarl_psi+-own_coins_collected.pdf` | MDP | Own coins collected
`fig_coingame2_maa2c_mdp_ablation_qfctde-undiscounted_reward.pdf` | MDP | Score
`fig_coingame2_maa2c_mdp_ablation_qfctde-coins_collected.pdf` | MDP | Total coins collected
`fig_coingame2_maa2c_mdp_ablation_qfctde-own_coin_rate.pdf` | MDP | Own coin rate
`fig_coingame2_maa2c_mdp_ablation_qfctde-own_coins_collected.pdf` | MDP | Own coins collected
`fig_coingame2_maa2c_mdp_ablation_fctde-undiscounted_reward.pdf` | MDP | Score
`fig_coingame2_maa2c_mdp_ablation_fctde-coins_collected.pdf` | MDP | Total coins collected
`fig_coingame2_maa2c_mdp_ablation_fctde-own_coin_rate.pdf` | MDP | Own coin rate
`fig_coingame2_maa2c_mdp_ablation_fctde-own_coins_collected.pdf` | MDP | Own coins collected
`fig_coingame2_maa2c_mdp_ablation_sctde-undiscounted_reward.pdf` | MDP | Score
`fig_coingame2_maa2c_mdp_ablation_sctde-coins_collected.pdf` | MDP | Total coins collected
`fig_coingame2_maa2c_mdp_ablation_sctde-own_coin_rate.pdf` | MDP | Own coin rate
`fig_coingame2_maa2c_mdp_ablation_sctde-own_coins_collected.pdf` | MDP | Own coins collected

Citation

If you use the code in this repository for your research or publication, please cite our paper published in ICLR 2025 using the following BibTeX entry (also available in CITATION.bib):

```bibtex
@inproceedings{derieux2025eqmarl,
  title={e{QMARL}: Entangled Quantum Multi-Agent Reinforcement Learning for Distributed Cooperation over Quantum Channels},
  author={Alexander DeRieux and Walid Saad},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=cR5GTis5II},
  doi={10.48550/arXiv.2405.17486}
}
```

Owner

  • Name: NEWS@VT
  • Login: news-vt
  • Kind: organization

Network sciEnce, Wireless, and Security laboratory at Virginia Tech (NEWS@VT)

Dependencies

pyproject.toml pypi
requirements-dev.txt pypi
  • jupyter * development
requirements.txt pypi
  • cirq *
  • gymnasium >=0.26,<1.0
  • matplotlib *
  • minigrid *
  • numpy *
  • pandas *
  • qutip *
  • scipy *
  • seaborn *
  • sympy *
  • tensorflow ==2.7.0
  • tensorflow-quantum ==0.7.2
  • tqdm *