calamr
CALAMR: Component ALignment for Abstract Meaning Representation
This repository contains the code and data for the paper CALAMR: Component ALignment for Abstract Meaning Representation. The code aligns the components of a bipartite source and summary AMR graph. The results are useful as a semantic graph similarity score (like SMATCH), to find the summarized portion of a document (as AMR nodes, edges and subgraphs), or to find the portion of the source that represents the summary.
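To make the comparison with SMATCH concrete, here is a minimal sketch of a triple-overlap F1 score between a source graph and a summary graph. It is not the CALAMR alignment algorithm and not the full Smatch metric: Smatch also searches for the best variable mapping, whereas this sketch assumes both graphs already share variable names. The function name `triple_f1` and the toy triples are hypothetical.

```python
def triple_f1(source: set, summary: set) -> float:
    """Return the F1 of exact triple overlap between two graphs.

    Assumption: variables are already aligned, so triples can be compared
    directly.  Smatch proper searches over variable mappings instead.
    """
    matched = len(source & summary)
    if matched == 0:
        return 0.0
    precision = matched / len(summary)  # share of summary triples found in source
    recall = matched / len(source)      # share of source triples covered by summary
    return 2 * precision * recall / (precision + recall)


# Toy graphs as (variable, relation, value) triples.
source = {
    ("w", "instance", "want-01"),
    ("b", "instance", "boy"),
    ("g", "instance", "go-02"),
    ("w", ":ARG0", "b"),
    ("w", ":ARG1", "g"),
    ("g", ":ARG0", "b"),
}
summary = {
    ("w", "instance", "want-01"),
    ("b", "instance", "boy"),
    ("w", ":ARG0", "b"),
}

score = triple_f1(source, summary)
print(f"{score:.3f}")
```

All three summary triples appear in the source (precision 1.0) but cover only half of it (recall 0.5), so a single overlap score hides which part of the source was summarized. That gap is what a component-level alignment is meant to expose.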
Inclusion in Your Projects
The purpose of this repository is to reproduce the results in the paper. If you want to align AMR graphs for your own work, please refer to the zensols.calamr repository, which has reusable code and examples. If you use this library, the PropBank API or the PropBank curated database, please cite our paper.
Documentation
The recommended reading order for this project:
- The conference slides
- The abstract and introduction of the paper CALAMR: Component ALignment for Abstract Meaning Representation
- Overview and implementation guide
- Full documentation
- API reference
Reproducing the Results
To reproduce the results from the paper, first process the corpus. These next steps create the document summarization and parser metrics.
Preprocess the corpus with the following steps:
- Install a Python 3.10.8 virtual environment on Linux. Note that this version of the code assumes Linux, but the new version does not.
- Clone this repository: git clone https://github.com/uic-nlp-lab/calamr
- Enter the repository and create the release directory where the corpora are to be installed: cd calamr && mkdir download
- Download the AMR Release 3.0 and copy it into the download directory: cp .../path/to/download/amr_annotation_3.0_LDC2020T02.tgz download
- To reproduce the results that compare with earlier work on the AMR Release 1.0 corpus, place that corpus file in the download directory as well.
- Install the environment: ./bin/install.sh <path to Python home directory>. If you use conda, create a new conda 3.10.8 environment and set it to the Python home directory it creates (not including the bin/python3 directory).
- Check that the previous step successfully created a new Python environment in the directory pyenv. Also make sure it cloned the amr_coref repository and applied the patch successfully.
- Create the sentence type/align merged corpus file: ./bin/prep.sh mergeanons
- Create the mismatch corpus (please contact the authors for the original corpus file used in the experiments, as the random seed was not set): ./bin/prep.sh mismatchcorp
- Create the parser output of the corpora: ./bin/prep.sh parsecorp
- Create the JAMR output for the corpora. This is a manual process, which includes downloading and installing the JAMR parser. We created this file manually, but will provide it for requests that include proof of purchase of the AMR Release 3.0 corpus.
- Score documents and pairs (document table): ./bin/prep.sh score
- Align documents: ./bin/prep.sh align
- Output alignment statistics: ./bin/prep.sh alignstats
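The scripted steps above can be chained in a small driver. The following is a sketch only, not part of the repository: the function names (`preflight`, `build_commands`, `run_all`) are hypothetical, the action names and the archive file name are copied from the steps above, and the manual JAMR step is left out because it has no script action. It defaults to a dry run that only prints the commands.

```python
import subprocess
import sys
from pathlib import Path

# Actions copied from the steps above, in the order they are listed.
# The JAMR step is manual and therefore has no entry here.
PREP_ACTIONS = (
    "mergeanons",
    "mismatchcorp",
    "parsecorp",
    "score",
    "align",
    "alignstats",
)

# Corpus archive name taken from the download step above.
AMR3_ARCHIVE = "amr_annotation_3.0_LDC2020T02.tgz"


def preflight(repo: Path) -> list:
    """Return a list of problems likely to make the steps fail early."""
    problems = []
    if sys.version_info[:2] != (3, 10):
        found = sys.version.split()[0]
        problems.append(f"expected Python 3.10, found {found}")
    if not (repo / "download" / AMR3_ARCHIVE).is_file():
        problems.append(f"missing download/{AMR3_ARCHIVE}")
    if not (repo / "bin" / "prep.sh").is_file():
        problems.append("missing bin/prep.sh (is this the repository root?)")
    return problems


def build_commands(actions=PREP_ACTIONS) -> list:
    """Build one ./bin/prep.sh command line per action, keeping the order."""
    return [["./bin/prep.sh", action] for action in actions]


def run_all(repo: Path, dry_run: bool = True) -> None:
    """Print each command; when dry_run is False, also execute it."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError at the first failing step.
            subprocess.run(cmd, cwd=repo, check=True)


if __name__ == "__main__":
    repo = Path(".")
    for problem in preflight(repo):
        print(f"warning: {problem}")
    run_all(repo, dry_run=True)
```

Stopping at the first failure matters here because each later action presumably consumes the output of the earlier ones. Switch `dry_run` to `False` only after the preflight warnings are resolved.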
To recreate the example diagrams from the paper
The micro corpus consists of short examples that illustrate the alignment algorithm via component diagrams. You can add your own sentences to the AMR parser input and rerun the micro corpus create and align steps below.
Follow the steps for creating the virtual environment in the Reproducing the Results section, and then:
1. Create the AMR micro corpus: ./bin/micro.sh createcorp
2. Align the micro corpus graphs: ./bin/micro.sh align
Attribution
This project, or reference model code, uses:
- Python 3.10
- amrlib for AMR parsing.
- amr_coref for AMR co-reference
- zensols.amr for AMR features and summarization data structures.
- Sentence-BERT embeddings
- zensols.propbankdb and zensols.deepnlp for PropBank embeddings
- zensols.nlparse for natural language features and NLP scoring
- Smatch and WLK for scoring.
Citation
If you use this project in your research please use the following BibTeX entry:
@inproceedings{landes-di-eugenio-2024-calamr-component,
    title = "{CALAMR}: Component {AL}ignment for {A}bstract {M}eaning {R}epresentation",
    author = "Landes, Paul and
      Di Eugenio, Barbara",
    editor = "Calzolari, Nicoletta and
      Kan, Min-Yen and
      Hoste, Veronique and
      Lenci, Alessandro and
      Sakti, Sakriani and
      Xue, Nianwen",
    booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)",
    month = may,
    year = "2024",
    address = "Torino, Italy",
    publisher = "ELRA and ICCL",
    url = "https://aclanthology.org/2024.lrec-main.236",
    pages = "2622--2637"
}
License
Copyright (c) 2023 - 2024 Paul Landes
Owner
- Name: Natural Language Processing Lab @UIC
- Login: uic-nlp-lab
- Kind: organization
- Location: United States of America
- Website: https://nlp.lab.uic.edu/
- Repositories: 7
- Profile: https://github.com/uic-nlp-lab
851 S Morgan St. Chicago, IL 60607
Citation (CITATION.cff)
cff-version: 1.2.0
title: 'CALAMR: Component ALignment for Abstract Meaning Representation'
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
date-released: 2024-05-19
repository-code: https://github.com/uic-nlp-lab/calamr
authors:
  - given-names: Paul
    family-names: Landes
    email: landes@mailc.net
    affiliation: University of Illinois at Chicago
    orcid: 'https://orcid.org/0000-0003-0985-0864'
preferred-citation:
  type: conference-paper
  authors:
    - given-names: Paul
      family-names: Landes
      email: landes@mailc.net
      affiliation: University of Illinois at Chicago
      orcid: 'https://orcid.org/0000-0003-0985-0864'
    - given-names: Barbara
      family-names: Di Eugenio
      affiliation: University of Illinois at Chicago
  title: 'CALAMR: Component ALignment for Abstract Meaning Representation'
  url: https://aclanthology.org/2024.lrec-main.236/
  year: 2024
  conference:
    name: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
    city: Torino
    country: IT
    date-start: 2024-05-20
    date-end: 2024-05-25
Dependencies
- actions/checkout v2.4.0 composite
- actions/setup-python v2 composite
- Flask ==2.2.5
- Jinja2 ==3.1.2
- MarkupSafe ==2.1.1
- PyYAML ==5.4
- Pygments ==2.13.0
- Werkzeug ==2.2.3
- XlsxWriter ==3.0.8
- amrlib ==0.7.1
- ansi2html ==1.8.0
- asttokens ==2.1.0
- blis ==0.7.9
- cached-property ==1.5.2
- cachetools ==5.2.0
- catalogue ==2.0.8
- certifi ==2022.9.24
- charset-normalizer ==2.1.1
- click ==8.0.4
- cloudpickle ==2.2.1
- configparser ==5.2.0
- cycler ==0.11.0
- cymem ==2.0.7
- dash ==2.13.0
- dash-bootstrap-components ==1.5.0
- dash-core-components ==2.0.0
- dash-html-components ==2.0.0
- dash-table ==5.0.0
- decorator ==5.1.1
- et-xmlfile ==1.1.0
- executing ==1.2.0
- filelock ==3.8.0
- fonttools ==4.38.0
- frozendict ==2.3.4
- fsspec ==2023.9.2
- future ==0.18.3
- gensim ==4.3.1
- graphviz ==0.20.1
- h5py ==3.7.0
- huggingface-hub ==0.18.0
- hyperopt ==0.2.7
- idna ==3.4
- igraph ==0.10.3
- interlap ==0.2.7
- ipython ==8.6.0
- itsdangerous ==2.1.2
- joblib ==1.2.0
- jsonpickle ==3.0.1
- kiwisolver ==1.4.4
- langcodes ==3.3.0
- lxml ==4.9.1
- matplotlib ==3.5.3
- matplotlib-inline ==0.1.6
- msgpack ==1.0.4
- msgpack-numpy ==0.4.8
- murmurhash ==1.0.9
- nest-asyncio ==1.5.6
- networkx ==2.8.8
- nltk ==3.7
- numpy ==1.23.4
- openpyxl ==3.1.2
- packaging ==21.3
- pandas ==1.4.4
- parse ==1.19.0
- parso ==0.8.3
- pathy ==0.6.2
- patool ==1.12
- pexpect ==4.8.0
- plac ==1.3.5
- plotly ==5.17.0
- preshed ==3.0.8
- prompt-toolkit ==3.0.32
- protobuf ==3.20.3
- ptyprocess ==0.7.0
- pure-eval ==0.2.2
- py4j ==0.10.9.7
- pydantic ==1.8.2
- pyemd ==0.5.1
- pyparsing ==3.0.9
- python-dateutil ==2.8.2
- pytz ==2022.6
- pyvis ==0.2.1
- regex ==2022.10.31
- requests ==2.28.1
- retrying ==1.3.4
- scikit-learn ==1.1.3
- scipy ==1.9.3
- screeninfo ==0.8.1
- sentence-transformers ==2.2.2
- sentencepiece ==0.1.97
- six ==1.16.0
- smart-open ==5.2.1
- spacy ==3.2.4
- spacy-legacy ==3.0.10
- spacy-loggers ==1.0.3
- srsly ==2.4.5
- stack-data ==0.6.0
- tabulate ==0.8.10
- tenacity ==8.2.3
- texttable ==1.6.7
- thinc ==8.0.17
- threadpoolctl ==3.1.0
- tokenizers ==0.12.1
- torch ==1.13.1
- torchvision ==0.14.1
- tqdm ==4.65.0
- traitlets ==5.5.0
- transformers ==4.32.1
- typer ==0.4.2
- typing_extensions ==4.4.0
- urllib3 ==1.26.12
- waitress ==2.1.2
- wasabi ==0.10.1
- wcwidth ==0.2.5
- word2number ==1.1
- zensols.pybuild ==0.1.1
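Because every dependency above is pinned to an exact version, it can help to confirm that the active environment matches before running the experiments. The sketch below uses only the standard library; the helper `check_pins` is hypothetical, and `PINS` is a small subset copied from the list above that you would extend as needed.

```python
from importlib.metadata import PackageNotFoundError, version

# A few pins copied from the dependency list above; extend as needed.
PINS = {
    "amrlib": "0.7.1",
    "spacy": "3.2.4",
    "torch": "1.13.1",
    "transformers": "4.32.1",
    "sentence-transformers": "2.2.2",
}


def check_pins(pins: dict) -> dict:
    """Map each mismatching package to a (pinned, installed) pair.

    The installed value is None when the package is not installed.
    """
    mismatches = {}
    for name, pinned in pins.items():
        try:
            # Drop a local build suffix such as "+cu117" before comparing.
            installed = version(name).split("+")[0]
        except PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


if __name__ == "__main__":
    for name, (pinned, installed) in sorted(check_pins(PINS).items()):
        print(f"{name}: pinned {pinned}, installed {installed or 'not installed'}")
```

An empty result means every listed package matches its pin. Any printed line names a package to reinstall at the pinned version.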