augmentednet
A Roman Numeral Analysis Network with Synthetic Training Examples and Additional Tonal Tasks
Science Score: 57.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 5 DOI reference(s) in README
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (11.9%) to scientific vocabulary
Repository
Basic Info
Statistics
- Stars: 38
- Watchers: 2
- Forks: 8
- Open Issues: 17
- Releases: 25
Metadata Files
README.md
AugmentedNet
AugmentedNet is an automatic Roman numeral analysis neural network.
The network was developed by Néstor Nápoles López as part of his PhD research. It was first described in a co-authored ISMIR paper in 2021, and later in the body of the dissertation.
It has been used to power the analysis features in at least the following projects:
- Sibelius
- Vimu.app
- MusicLang
The version documented in the PhD dissertation matches the v1.9.1 release of this repository.
The older version of the model described in the ISMIR paper matches the v1.0.0 release of this repository.
In general, v1.9.1 gives better results, and you are encouraged to use (and compare against) that version.
PhD Dissertation
Nápoles López, Néstor. 2022. Automatic Roman Numeral Analysis in Symbolic Music Representations. PhD Thesis, McGill University. https://escholarship.mcgill.ca/concern/theses/qr46r6307.
bibtex
@phdthesis{napoleslopez22automatic,
type = {{PhD} {Thesis}},
title = {Automatic {Roman} {Numeral} {Analysis} in {Symbolic} {Music} {Representations}},
url = {https://escholarship.mcgill.ca/concern/theses/qr46r6307},
school = {McGill University},
author = {Nápoles López, Néstor},
month = dec,
year = {2022}
}
ISMIR Paper
N. Nápoles López, M. Gotham, and I. Fujinaga, "AugmentedNet: A Roman Numeral Analysis Network with Synthetic Training Examples and Additional Tonal Tasks," in Proceedings of the 22nd International Society for Music Information Retrieval Conference, 2021, pp. 404-411. https://doi.org/10.5281/zenodo.5624533
bibtex
@inproceedings{napoleslopez21augmentednet,
author = {Nápoles López, Néstor and Gotham, Mark and Fujinaga, Ichiro},
title = {{AugmentedNet: A Roman Numeral Analysis Network
with Synthetic Training Examples and Additional
Tonal Tasks}},
booktitle = {{Proceedings of the 22nd International Society for
Music Information Retrieval Conference}},
year = 2021,
pages = {404-411},
publisher = {ISMIR},
address = {Online},
month = nov,
venue = {Online},
doi = {10.5281/zenodo.5624533},
url = {https://doi.org/10.5281/zenodo.5624533}
}
Try out the pre-trained network
Clone the repository, create a virtual environment, and install the Python dependencies.
```bash
git clone https://github.com/napulen/AugmentedNet.git
cd AugmentedNet
python3 -m venv .env
source .env/bin/activate
(.env) pip install -r requirements.txt
```
In my experience, `pip` is sometimes unable to install specific package versions, depending on your environment. This `requirements.txt` was tested on a vanilla Ubuntu 20.04, both on native Linux and on Windows 10 WSL2. A Docker `tensorflow/tensorflow:2.5-gpu` image should also work.
Run the pre-trained model for inference on a MusicXML file
bash
python -m AugmentedNet.inference AugmentedNetv.hdf5 <input_file>.musicxml
Two files will be generated:
- `<input_file>_annotated.xml`: an annotated MusicXML file
- `<input_file>_annotated.csv`: a csv file with the predictions for every time step
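To inspect the per-time-step predictions programmatically, the csv file can be read with Python's standard `csv` module. The following is only a minimal sketch: the column names `offset` and `key`, the sample rows, and the helper functions are assumptions made for illustration, since the actual header depends on the output tasks of the model.

```python
import csv
import io
from collections import Counter


def load_predictions(csv_file):
    """Read a predictions csv into a list of dicts, one per time step."""
    return list(csv.DictReader(csv_file))


def most_common_label(rows, column):
    """Return the most frequent value found in the given column."""
    return Counter(row[column] for row in rows).most_common(1)[0][0]


# Hypothetical sample standing in for <input_file>_annotated.csv;
# the real column names may differ.
sample = io.StringIO("offset,key\n0.0,C\n0.5,C\n1.0,G\n")
rows = load_predictions(sample)
print(len(rows), most_common_label(rows, "key"))
```

With a real file, open it with `open(path, newline="")` and pass the file object to `load_predictions` instead of the in-memory sample.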
Training the network from scratch
Clone the repository recursively (needed to collect the third-party datasets), create a virtual environment, and install the Python dependencies.
```bash
git clone --recursive https://github.com/napulen/AugmentedNet.git
cd AugmentedNet
python3 -m venv .env
source .env/bin/activate
(.env) pip install -r requirements.txt
```
Using accompanying data
To save you some time, we include the preprocessed tsv files of the real data, as well as the synthetic block-chord templates for texturization. These are available in the release of the latest version.
bash
wget https://github.com/napulen/AugmentedNet/releases/latest/download/dataset.zip
unzip dataset.zip
Now you are ready to train the network.
Generating synthetic examples
There are two ways to generate texturizations: at the tsv-level (legacy) and at the numpy-level (newer).
Texturizations at the tsv-level (legacy)
You can generate one texturization per file in the dataset with this script
python
(.env) python -m AugmentedNet.dataset_tsv_generator --synthesize --texturize
This is how I originally trained v1.0.0. Because each real file contributes one synthetic texturization, the network was trained with exactly twice the amount of real training data.
Texturizations at the npz-level (newer)
After v1.5.0, the tsv-dataset only includes the templates (i.e., block chords). The texturization is done when encoding the numpy arrays for the neural network, right before training.
There are two options for texturization in this way:
1. Generate one texturization per file
2. Generate one texturization per transposition
In the second approach, every time you transpose a synthetic example to a different key, you re-texturize it. The amount of "new" training examples seen by the network is much larger this way.
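The difference between the two options can be pictured with a toy sketch. It is not the repository's implementation: the texture names, the `texturize` helper, and the transposition intervals below are all hypothetical, and a real texturization operates on the notes of a block-chord template rather than on a label.

```python
import random

# Hypothetical texture pattern names, for illustration only.
TEXTURES = ["block", "alberti", "arpeggio", "syncopated"]


def texturize(template, rng):
    """Pair a block-chord template with a randomly chosen texture."""
    return (template, rng.choice(TEXTURES))


def per_file(templates, transpositions, rng):
    """Option 1: texturize once per file, then reuse it in every key."""
    examples = []
    for template in templates:
        textured = texturize(template, rng)
        for interval in transpositions:
            examples.append((textured, interval))
    return examples


def per_transposition(templates, transpositions, rng):
    """Option 2: re-texturize every time the template is transposed."""
    return [
        (texturize(template, rng), interval)
        for template in templates
        for interval in transpositions
    ]


rng = random.Random(0)
templates = ["file_a", "file_b"]
transpositions = range(-3, 4)  # 7 hypothetical transposition intervals
once = per_file(templates, transpositions, rng)
each = per_transposition(templates, transpositions, rng)
print(len(once), len(each))
```

Both options yield the same number of examples. In the first, every transposition of a file shares a single texture, whereas in the second each transposition draws its own, which is where the extra variety comes from.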
You can control those settings when training the network.
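```bash
# No synthetic examples
(.env) python -m AugmentedNet.train <compulsory_args>

# Synthetic examples, one texturization per file
(.env) python -m AugmentedNet.train <compulsory_args>

# Synthetic examples, one texturization per transposition
(.env) python -m AugmentedNet.train <compulsory_args>
```
See the next section for the `<compulsory_args>`. To find the options that select between these strategies, check the CLI parameters listed by `python -m AugmentedNet.train --help`.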
At the moment, the code for generating the texturizations is not easy to use on its own, if that is all you want to do. If you need it, raise an issue or reach out, and I'll do my best to help with your use case.
Training the network
If you want to train the network, the minimum call looks like this.
bash
(.env) python -m AugmentedNet.train debug testexperiment
The code is integrated with mlflow. In the training script, `debug` and `testexperiment` refer to the experiment and run names passed down to mlflow. You can access more CLI parameters by running `python -m AugmentedNet.train --help`.
After training the network, you will get a path to the trained hdf5 model, which looks something like this:
The trained model is available in: .model_checkpoint/debug/testexperiment-220101T000000/81-6.000-0.78.hdf5
You can use that trained model for inference, using the same workflow shown above.
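If several runs have accumulated, a small helper can locate the most recent checkpoint. This is a sketch under an assumption: it takes the directory layout from the example path above (`.model_checkpoint/<experiment>/<run>/<file>.hdf5`) and simply picks the file with the latest modification time. The function name `latest_checkpoint` is hypothetical and not part of the repository.

```python
from pathlib import Path


def latest_checkpoint(root, experiment):
    """Return the most recently modified .hdf5 file of an experiment, or None."""
    candidates = sorted(
        Path(root, experiment).glob("*/*.hdf5"),
        key=lambda path: path.stat().st_mtime,
    )
    return candidates[-1] if candidates else None


# Prints None when no checkpoint has been written yet.
print(latest_checkpoint(".model_checkpoint", "debug"))
```

The returned path can then be passed to the inference command in place of the pre-trained model file.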
About the AugmentedNet
The neural network architecture
The architecture is a CRNN (Convolutional Recurrent Neural Network) with an alternative representation of pitch spelling at the input.
More information about the neural network architecture can be found in the paper.

Organization of the repo
This repository is organized in the following way
- AugmentedNet has all the source code of the network
- img the image diagrams of the network and code organization
- misc useful, but non-essential, stand-alone scripts that I wrote while developing this project
- notebooks Jupyter notebook playgrounds used throughout the project (e.g., data exploration)
- test unit tests for all relevant modules of the network
The AugmentedNet source code
The general organization of the code is summarized by the following diagram.

Each of the blue rectangles roughly corresponds to a Python module.
The inputs of the network are pairs of (MusicXML, RomanText) files.
The input pairs are converted into pandas DataFrame objects, stored as .tsv files.
Later on, these are encoded in a representation that can be dispatched to the neural network.
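As a rough illustration of that last step, the sketch below reads a tab-separated table and one-hot encodes one of its columns. It rests on several assumptions: one-hot encoding is shown only as a common choice and is not necessarily what the repository uses, the column names and the truncated key vocabulary are hypothetical, and the standard `csv` module stands in for pandas to keep the example dependency-free.

```python
import csv
import io

# Hypothetical, truncated label vocabulary for illustration.
KEYS = ["C", "G", "D", "a", "e"]


def one_hot(label, vocabulary):
    """Encode a label as a list with a single 1 at the label's index."""
    vector = [0] * len(vocabulary)
    vector[vocabulary.index(label)] = 1
    return vector


def encode_column(tsv_file, column, vocabulary):
    """Read a tab-separated file and one-hot encode one of its columns."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    return [one_hot(row[column], vocabulary) for row in reader]


# Hypothetical two-row excerpt; the real .tsv files have more columns.
sample = io.StringIO("offset\tkey\n0.0\tC\n0.5\tG\n")
encoded = encode_column(sample, "key", KEYS)
print(encoded)
```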
The module documentation is located here.
Experiments
Visualizing the results with mlflow
All the experiments presented in the paper were monitored using mlflow.
If you want to visualize the experiments with the mlflow ui:
- `pip install mlflow`
- Download our mlruns with the AugmentedNet experiments
- Unzip anywhere
- Run `mlflow ui` from the terminal; make sure that `./mlruns/` is reachable from the current directory
- Visit `localhost:5000`
- That's it! The experiments should be available in the browser
For extra convenience, I also uploaded the logs to TensorBoard.dev.
Below are the tables from the paper, along with a link to see the runs of each model in TensorBoard.dev.
Paper results and tensorboard visualizations
AugmentedNet configurations
These are the results for the four different configurations of the AugmentedNet.
| Model | Key | Deg. | Qual. | Inv. | Root | RN |
|----------------------------|---------------|---------------|---------------|---------------|---------------|---------------|
| AugmentedNet6 | 82.7 | 64.4 | 76.6 | 77.4 | 82.5 | 43.3 |
| AugmentedNet6+ | 83.0 | 65.1 | 77.5 | 78.6 | 83.0 | 44.6 |
| AugmentedNet11 | 81.3 | 64.2 | 77.2 | 76.1 | 82.9 | 43.1 |
| AugmentedNet11+ | 83.7 | 66.0 | 77.6 | 77.2 | 83.2 | 45.0 |
Visualize experiments in TensorBoard.dev!
6 and 11 indicate the number of tasks in the multitask learning layout.
+ indicates the use of synthetic training data.
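To read the effect of the synthetic data off the table, the per-task difference between each model and its `+` counterpart can be computed directly. The numbers below are copied from the table above; the `synthetic_gain` helper is just an illustrative name.

```python
TASKS = ["Key", "Deg.", "Qual.", "Inv.", "Root", "RN"]
RESULTS = {
    "AugmentedNet6": [82.7, 64.4, 76.6, 77.4, 82.5, 43.3],
    "AugmentedNet6+": [83.0, 65.1, 77.5, 78.6, 83.0, 44.6],
    "AugmentedNet11": [81.3, 64.2, 77.2, 76.1, 82.9, 43.1],
    "AugmentedNet11+": [83.7, 66.0, 77.6, 77.2, 83.2, 45.0],
}


def synthetic_gain(base):
    """Per-task difference between a model and its '+' counterpart."""
    plain, plus = RESULTS[base], RESULTS[base + "+"]
    # Round to one decimal to match the precision of the table.
    return {task: round(b - a, 1) for task, a, b in zip(TASKS, plain, plus)}


for base in ("AugmentedNet6", "AugmentedNet11"):
    print(base, synthetic_gain(base))
```

In both layouts every task improves with synthetic data. The RN accuracy gains 1.3 points with 6 tasks and 1.9 points with 11 tasks.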
AugmentedNet vs. other models
These are the results for the best AugmentedNet configuration (11+) against other models.
| Test set | Training set | Model | Key | Degree | Quality | Inversion | Root | ComRN | RNconv | RNalt |
|-------------|--------------|-------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
| Full test set | Full dataset | AugN | 82.9 | 67.0 | 79.7 | 78.8 | 83.0 | 65.6 | 46.4 | 51.5 |
| WiR | Full dataset | AugN | 81.8 | 69.2 | 85.9 | 90.3 | 90.3 | 70.2 | 56.4 | 62.4 |
| HaydnSun | Full dataset | AugN | 81.2 | 62.9 | 80.2 | 82.7 | 86.5 | 60.4 | 48.6 | 52.1 |
| ABC | Full dataset | AugN | 83.6 | 65.6 | 78.0 | 76.9 | 78.9 | 62.6 | 44.5 | 48.4 |
| TAVERN | Full dataset | AugN | 88.7 | 60.0 | 77.4 | 78.8 | 81.5 | 66.3 | 42.6 | 52.9 |
| WTC | Full dataset | AugN | 77.2 | 69.7 | 75.0 | 74.4 | 82.7 | 61.7 | 46.2 | 47.9 |
| WTCcrossval | BPS+WTC | AugN | 85.1(4.0) | 62.9(5.5) | 69.1(1.9) | 70.1(3.7) | 79.2(1.8) | 59.9(3.4) | 42.9(4.2) | 46.9(4.7) |
| WTCcrossval | BPS+WTC | CS21 | 56.3(2.5) | - | - | - | - | - | 26.0(1.7) | - |
| BPS | Full dataset | AugN | 85.0 | 73.4 | 79.0 | 73.4 | 84.4 | 68.3 | 45.4 | 49.3 |
| BPS | All data | Mi20 | 82.9 | 68.3 | 76.6 | 72.0 | - | - | 42.8 | - |
| BPS | BPS+WTC | AugN | 82.9 | 70.9 | 80.7 | 72.0 | 85.3 | 67.6 | 44.1 | 47.5 |
| BPS | BPS+WTC | CS21 | 79.0 | - | - | - | - | - | 41.7 | - |
| BPS | BPS | AugN | 83.0 | 71.2 | 80.3 | 71.1 | 84.1 | 68.5 | 44.0 | 47.4 |
| BPS | BPS | Mi20 | 80.6 | 66.5 | 76.3 | 68.1 | - | - | 39.1 | - |
| BPS | BPS | CS19 | 78.4 | 65.1 | 74.6 | 62.1 | - | - | - | - |
| BPS | BPS | CS18 | 66.7 | 51.8 | 60.6 | 59.1 | - | - | 25.7 | - |
Owner
- Name: Néstor Nápoles López
- Login: napulen
- Kind: user
- Location: Montreal, Québec
- Company: Avid Technology (Sibelius)
- Website: https://napulen.github.io/
- Twitter: napulen
- Repositories: 8
- Profile: https://github.com/napulen
PhD in Music Technology by McGill University. Senior Software Developer at Avid Technology.
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Nápoles López"
    given-names: "Néstor"
    orcid: "https://orcid.org/0000-0001-7347-2613"
  - family-names: "Gotham"
    given-names: "Mark"
    orcid: "https://orcid.org/0000-0003-0722-3074"
  - family-names: "Fujinaga"
    given-names: "Ichiro"
    orcid: "https://orcid.org/0000-0003-2524-8582"
title: "AugmentedNet (source code)"
version: 1.0.0
doi: ""
date-released: 2021-08-05
url: "https://github.com/napulen/AugmentedNet"
GitHub Events
Total
- Watch event: 5
- Fork event: 1
Last Year
- Watch event: 5
- Fork event: 1
Issues and Pull Requests
Last synced: about 1 year ago
All Time
- Total issues: 54
- Total pull requests: 46
- Average time to close issues: 20 days
- Average time to close pull requests: about 1 hour
- Total issue authors: 5
- Total pull request authors: 4
- Average comments per issue: 0.81
- Average comments per pull request: 0.33
- Merged pull requests: 43
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 1
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 1
- Pull request authors: 0
- Average comments per issue: 1.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- napulen (49)
- adityac95 (2)
- MarkGotham (1)
- luto65 (1)
- clariguy (1)
Pull Request Authors
- napulen (43)
- clariguy (1)
- giamic (1)
- Alicelavander (1)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- actions/checkout v2 composite
- actions/setup-python v2 composite
- codecov/codecov-action v1 composite
- JamesIves/github-pages-deploy-action 4.1.4 composite
- actions/checkout v2.3.1 composite
- actions/setup-python v2 composite
- Flask ==2.0.3
- GitPython ==3.1.27
- Jinja2 ==3.0.3
- Keras-Preprocessing ==1.1.2
- Mako ==1.1.6
- Markdown ==3.3.6
- MarkupSafe ==2.1.0
- Pillow ==9.0.1
- PyYAML ==6.0
- Pygments ==2.11.2
- SQLAlchemy ==1.4.32
- Werkzeug ==2.0.3
- absl-py ==0.15.0
- alembic ==1.7.6
- appdirs ==1.4.4
- asttokens ==2.0.5
- astunparse ==1.6.3
- backcall ==0.2.0
- black ==20.8b1
- cachetools ==5.0.0
- certifi ==2021.10.8
- chardet ==4.0.0
- charset-normalizer ==2.0.12
- click ==8.0.4
- cloudpickle ==2.0.0
- coverage ==6.3.1
- cycler ==0.11.0
- databricks-cli ==0.16.4
- debugpy ==1.5.1
- decorator ==5.1.1
- docker ==5.0.3
- entrypoints ==0.4
- executing ==0.8.3
- flatbuffers ==1.12
- fonttools ==4.30.0
- gast ==0.4.0
- gitdb ==4.0.9
- google-auth ==2.6.0
- google-auth-oauthlib ==0.4.6
- google-pasta ==0.2.0
- greenlet ==1.1.2
- grpcio ==1.34.1
- gunicorn ==20.1.0
- h5py ==3.1.0
- idna ==3.3
- importlib-metadata ==4.11.2
- importlib-resources ==5.4.0
- ipykernel ==6.9.2
- ipython ==8.1.1
- itsdangerous ==2.1.0
- jedi ==0.18.1
- joblib ==1.1.0
- jsonpickle ==2.1.0
- jupyter-client ==7.1.2
- jupyter-core ==4.9.2
- keras-nightly ==2.5.0.dev2021032900
- kiwisolver ==1.4.0
- matplotlib ==3.5.1
- matplotlib-inline ==0.1.3
- mido ==1.2.10
- mlflow ==1.23.1
- more-itertools ==8.12.0
- music21 ==6.7.1
- mypy-extensions ==0.4.3
- nest-asyncio ==1.5.4
- numpy ==1.19.5
- oauthlib ==3.2.0
- opt-einsum ==3.3.0
- packaging ==21.3
- pandas ==1.4.1
- parso ==0.8.3
- pathspec ==0.9.0
- pdoc ==10.0.1
- pexpect ==4.8.0
- pickleshare ==0.7.5
- prometheus-client ==0.13.1
- prometheus-flask-exporter ==0.18.7
- prompt-toolkit ==3.0.28
- protobuf ==3.19.4
- psutil ==5.9.0
- ptyprocess ==0.7.0
- pure-eval ==0.2.2
- pyasn1 ==0.4.8
- pyasn1-modules ==0.2.8
- pyparsing ==3.0.7
- python-dateutil ==2.8.2
- pytz ==2021.3
- pyzmq ==22.3.0
- querystring-parser ==1.2.4
- regex ==2022.3.2
- requests ==2.27.1
- requests-oauthlib ==1.3.1
- rsa ==4.8
- scipy ==1.8.0
- seaborn ==0.11.2
- six ==1.15.0
- smmap ==5.0.0
- sqlparse ==0.4.2
- stack-data ==0.2.0
- tabulate ==0.8.9
- tensorboard ==2.8.0
- tensorboard-data-server ==0.6.1
- tensorboard-plugin-wit ==1.8.1
- tensorflow ==2.5.0
- tensorflow-estimator ==2.5.0
- termcolor ==1.1.0
- toml ==0.10.2
- tornado ==6.1
- traitlets ==5.1.1
- typed-ast ==1.5.2
- typing-extensions ==3.7.4.3
- urllib3 ==1.26.8
- wcwidth ==0.2.5
- webcolors ==1.11.1
- websocket-client ==1.3.1
- wrapt ==1.12.1
- zipp ==3.7.0