calamr

CALAMR: Component ALignment for Abstract Meaning Representation

https://github.com/uic-nlp-lab/calamr

Keywords

abstract-meaning-representation academic-paper amr summarization

Repository

Basic Info

Host: GitHub
Owner: uic-nlp-lab
License: mit
Language: Python
Default Branch: uic_lab_repo
Homepage:
Size: 2.94 MB

Statistics

Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Releases: 0

Created almost 2 years ago · Last pushed over 1 year ago

CALAMR: Component ALignment for Abstract Meaning Representation

This repository contains the code and data for the paper CALAMR: Component ALignment for Abstract Meaning Representation. The code aligns the components of a bipartite source and summary AMR graph. The results are useful as a semantic graph similarity score (like SMATCH) or to find the summarized portion (as AMR nodes, edges and subgraphs) of a document or the portion of the source that represents the summary.

Inclusion in Your Projects

The purpose of this repository is to reproduce the results in the paper. If you want to align AMR graphs for your own work, please refer to the zensols.calamr repository, which has reusable code and examples. If you use this library, the PropBank API, or the PropBank curated database, please cite our paper.

Documentation

The recommended reading order for this project:

The conference slides
The abstract and introduction of the paper CALAMR: Component ALignment for Abstract Meaning Representation
Overview and implementation guide
Full documentation
API reference

Reproducing the Results

To reproduce the results from the paper, first process the corpus. These next steps create the document summarization and parser metrics.

Preprocess the corpus with the following steps:

Install a Python 3.10.8 virtual environment on Linux. Note that this version of the code assumes Linux, but the newer version does not.
Clone this repository: git clone https://github.com/uic-nlp-lab/calamr
Enter the repository and create the download directory where the corpora will be installed: cd calamr && mkdir download
Download the AMR Release 3.0 corpus and copy it into the download directory: cp .../path/to/download/amr_annotation_3.0_LDC2020T02.tgz download
To reproduce the results that compare with earlier work on the AMR Release 1.0 corpus, place that corpus file in the download directory as well.
Install the environment: ./bin/install.sh <path to Python home directory>. If you use conda, create a new conda environment with Python 3.10.8 and pass the Python home directory it creates (not including the bin/python3 portion of the path).
Check that the previous step successfully created a new Python environment in the pyenv directory. Also make sure it cloned the amr_coref repository and applied the patch successfully.
Create the sentence type/align merged corpus file: ./bin/prep.sh mergeanons
Create the mismatch corpus (please contact the authors for the original corpus file used in the experiments as the random seed was not set): ./bin/prep.sh mismatchcorp
Create the parser output of the corpora: ./bin/prep.sh parsecorp
Create the JAMR output for the corpora. This is a manual process, which includes downloading and installing the JAMR parser. We created this file manually, but will provide it for requests that include proof of purchase of the AMR Release 3.0 corpus.
Score documents and pairs (document table): ./bin/prep.sh score
Align documents: ./bin/prep.sh align
Output alignment statistics: ./bin/prep.sh alignstats
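
The steps above can be sketched as a small driver script. This is a minimal sketch and not part of the repository: the function names (preflight, prep_commands, run_prep) are hypothetical, and the only things taken from the steps above are the ./bin/prep.sh sub-command names, the download directory, and the AMR 3.0 archive name. It assumes it is run from the repository root and defaults to a dry run that only prints the commands. The steps are split around the manual JAMR step, which the script does not attempt to automate.

```python
from __future__ import annotations

import platform
import subprocess
import sys
from pathlib import Path

# Sub-commands of ./bin/prep.sh in the order listed above, split around
# the manual JAMR step (which has to be done by hand in between).
STEPS_BEFORE_JAMR = ["mergeanons", "mismatchcorp", "parsecorp"]
STEPS_AFTER_JAMR = ["score", "align", "alignstats"]

# Archive name as given in the download step above.
AMR3_ARCHIVE = "amr_annotation_3.0_LDC2020T02.tgz"


def preflight(repo: Path) -> list[str]:
    """Return human-readable problems; an empty list means ready to run."""
    problems = []
    if platform.system() != "Linux":
        problems.append("this version of the code assumes Linux")
    if sys.version_info[:2] != (3, 10):
        problems.append(
            f"expected Python 3.10.x, found {platform.python_version()}")
    if not (repo / "download" / AMR3_ARCHIVE).is_file():
        problems.append(f"missing download/{AMR3_ARCHIVE}")
    if not (repo / "bin" / "prep.sh").is_file():
        problems.append("missing bin/prep.sh (is this the repository root?)")
    return problems


def prep_commands(steps: list[str]) -> list[list[str]]:
    """Build one ./bin/prep.sh invocation per step name."""
    return [["./bin/prep.sh", step] for step in steps]


def run_prep(repo: Path, steps: list[str], dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it in order."""
    for cmd in prep_commands(steps):
        print("$", " ".join(cmd))
        if not dry_run:
            # check=True stops at the first failing step.
            subprocess.run(cmd, cwd=repo, check=True)
```

A possible use is to call preflight(Path.cwd()) first, fix anything it reports, then call run_prep(Path.cwd(), STEPS_BEFORE_JAMR, dry_run=False), create the JAMR output by hand, and finish with run_prep(Path.cwd(), STEPS_AFTER_JAMR, dry_run=False).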

To recreate the example diagrams from the paper

The micro corpus is a set of short examples that illustrate the alignment algorithm with component diagrams. You can add your own sentences to the AMR parser input and rerun the micro corpus create and align steps below.

Follow the steps for creating the virtual environment in the result reproduction section, and then:

Create the AMR micro corpus: ./bin/micro.sh createcorp
Align the micro corpus graphs: ./bin/micro.sh align

Attribution

This project, or reference model code, uses:

Python 3.10
amrlib for AMR parsing.
amr_coref for AMR co-reference
zensols.amr for AMR features and summarization data structures.
Sentence-BERT embeddings
zensols.propbankdb and zensols.deepnlp for PropBank embeddings
zensols.nlparse for natural language features and NLP scoring
Smatch and WLK for scoring.

Citation

If you use this project in your research, please use the following BibTeX entry:

@inproceedings{landes-di-eugenio-2024-calamr-component,
    title = "{CALAMR}: Component {AL}ignment for {A}bstract {M}eaning {R}epresentation",
    author = "Landes, Paul and Di Eugenio, Barbara",
    editor = "Calzolari, Nicoletta and Kan, Min-Yen and Hoste, Veronique and Lenci, Alessandro and Sakti, Sakriani and Xue, Nianwen",
    booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)",
    month = may,
    year = "2024",
    address = "Torino, Italy",
    publisher = "ELRA and ICCL",
    url = "https://aclanthology.org/2024.lrec-main.236",
    pages = "2622--2637"
}

License

MIT License

Owner

Name: Natural Language Processing Lab @UIC
Login: uic-nlp-lab
Kind: organization
Location: United States of America

Website: https://nlp.lab.uic.edu/
Repositories: 7
Profile: https://github.com/uic-nlp-lab

851 S Morgan St. Chicago, IL 60607

Citation (CITATION.cff)

cff-version: 1.2.0
title: 'CALAMR: Component ALignment for Abstract Meaning Representation'
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
date-released: 2024-05-19
repository-code: https://github.com/uic-nlp-lab/calamr
authors:
  - given-names: Paul
    family-names: Landes
    email: landes@mailc.net
    affiliation: University of Illinois at Chicago
    orcid: 'https://orcid.org/0000-0003-0985-0864'
preferred-citation:
  type: conference-paper
  authors:
    - given-names: Paul
      family-names: Landes
      email: landes@mailc.net
      affiliation: University of Illinois at Chicago
      orcid: 'https://orcid.org/0000-0003-0985-0864'
    - given-names: Barbara
      family-names: Di Eugenio
      affiliation: University of Illinois at Chicago
  title: 'CALAMR: Component ALignment for Abstract Meaning Representation'
  url: https://aclanthology.org/2024.lrec-main.236/
  year: 2024
  conference:
    name: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
    city: Torino
    country: IT
    date-start: 2024-05-20
    date-end: 2024-05-25

Dependencies

.github/workflows/test.yml actions

actions/checkout v2.4.0 composite
actions/setup-python v2 composite

src/python/requirements.txt pypi

Flask ==2.2.5
Jinja2 ==3.1.2
MarkupSafe ==2.1.1
PyYAML ==5.4
Pygments ==2.13.0
Werkzeug ==2.2.3
XlsxWriter ==3.0.8
amrlib ==0.7.1
ansi2html ==1.8.0
asttokens ==2.1.0
blis ==0.7.9
cached-property ==1.5.2
cachetools ==5.2.0
catalogue ==2.0.8
certifi ==2022.9.24
charset-normalizer ==2.1.1
click ==8.0.4
cloudpickle ==2.2.1
configparser ==5.2.0
cycler ==0.11.0
cymem ==2.0.7
dash ==2.13.0
dash-bootstrap-components ==1.5.0
dash-core-components ==2.0.0
dash-html-components ==2.0.0
dash-table ==5.0.0
decorator ==5.1.1
et-xmlfile ==1.1.0
executing ==1.2.0
filelock ==3.8.0
fonttools ==4.38.0
frozendict ==2.3.4
fsspec ==2023.9.2
future ==0.18.3
gensim ==4.3.1
graphviz ==0.20.1
h5py ==3.7.0
huggingface-hub ==0.18.0
hyperopt ==0.2.7
idna ==3.4
igraph ==0.10.3
interlap ==0.2.7
ipython ==8.6.0
itsdangerous ==2.1.2
joblib ==1.2.0
jsonpickle ==3.0.1
kiwisolver ==1.4.4
langcodes ==3.3.0
lxml ==4.9.1
matplotlib ==3.5.3
matplotlib-inline ==0.1.6
msgpack ==1.0.4
msgpack-numpy ==0.4.8
murmurhash ==1.0.9
nest-asyncio ==1.5.6
networkx ==2.8.8
nltk ==3.7
numpy ==1.23.4
openpyxl ==3.1.2
packaging ==21.3
pandas ==1.4.4
parse ==1.19.0
parso ==0.8.3
pathy ==0.6.2
patool ==1.12
pexpect ==4.8.0
plac ==1.3.5
plotly ==5.17.0
preshed ==3.0.8
prompt-toolkit ==3.0.32
protobuf ==3.20.3
ptyprocess ==0.7.0
pure-eval ==0.2.2
py4j ==0.10.9.7
pydantic ==1.8.2
pyemd ==0.5.1
pyparsing ==3.0.9
python-dateutil ==2.8.2
pytz ==2022.6
pyvis ==0.2.1
regex ==2022.10.31
requests ==2.28.1
retrying ==1.3.4
scikit-learn ==1.1.3
scipy ==1.9.3
screeninfo ==0.8.1
sentence-transformers ==2.2.2
sentencepiece ==0.1.97
six ==1.16.0
smart-open ==5.2.1
spacy ==3.2.4
spacy-legacy ==3.0.10
spacy-loggers ==1.0.3
srsly ==2.4.5
stack-data ==0.6.0
tabulate ==0.8.10
tenacity ==8.2.3
texttable ==1.6.7
thinc ==8.0.17
threadpoolctl ==3.1.0
tokenizers ==0.12.1
torch ==1.13.1
torchvision ==0.14.1
tqdm ==4.65.0
traitlets ==5.5.0
transformers ==4.32.1
typer ==0.4.2
typing_extensions ==4.4.0
urllib3 ==1.26.12
waitress ==2.1.2
wasabi ==0.10.1
wcwidth ==0.2.5
word2number ==1.1
zensols.pybuild ==0.1.1
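
Because the results depend on these exact pins, it can help to confirm that the environment created by the install step matches them. The following is a minimal sketch and not part of the repository: parse_pins and mismatches are hypothetical helper names, and the parser assumes each line has the form name==version, with or without the space shown in the listing above. It uses only the standard library's importlib.metadata.

```python
from __future__ import annotations

import re
from importlib import metadata

# Matches "name==version" and "name ==version"; other lines are skipped.
PIN = re.compile(r"^\s*([A-Za-z0-9_.\-]+)\s*==\s*([^\s#]+)")


def parse_pins(text: str) -> dict[str, str]:
    """Map package name to pinned version for every pinned line in text."""
    pins = {}
    for line in text.splitlines():
        match = PIN.match(line)
        if match:
            pins[match.group(1)] = match.group(2)
    return pins


def mismatches(pins: dict[str, str]) -> dict[str, tuple[str, str | None]]:
    """Map package name to (pinned, installed) for each unsatisfied pin.

    The installed value is None when the package is not installed.
    """
    out = {}
    for name, wanted in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != wanted:
            out[name] = (wanted, found)
    return out
```

Run with the interpreter from the pyenv directory, mismatches(parse_pins(open("src/python/requirements.txt").read())) should return an empty dictionary if every pin is satisfied.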

src/python/setup.py pypi
