https://github.com/disi-unibo-nlp/ddegk

Implementation of Deep Divergence Event Graph Kernels

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: mdpi.com
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.4%) to scientific vocabulary

Keywords

biomedical-text-mining event-embedding event-extraction graph-kernels graph-neural-networks graph-representation-learning graph-similarity-learning
Last synced: 4 months ago

Repository

Implementation of Deep Divergence Event Graph Kernels

Basic Info
  • Host: GitHub
  • Owner: disi-unibo-nlp
  • Language: Jupyter Notebook
  • Default Branch: main
  • Homepage:
  • Size: 10.9 MB
Statistics
  • Stars: 1
  • Watchers: 2
  • Forks: 0
  • Open Issues: 1
  • Releases: 0
Topics
biomedical-text-mining event-embedding event-extraction graph-kernels graph-neural-networks graph-representation-learning graph-similarity-learning
Created over 4 years ago · Last pushed about 4 years ago
Metadata Files
Readme

README.md

DDEGK

Code and data accompanying the MDPI Sensors 2021 paper "Unsupervised Event Graph Representation and Similarity Learning on Biomedical Literature".

DDEGK is an unsupervised and inductive method that maps events into low-dimensional vectors (task-agnostic, whole-graph embeddings) reflecting their structural and semantic similarities. It is designed to be highly interpretable, with a cross-graph isomorphic attention mechanism trained to preserve node and edge attributes. By merging deep learning architectures and natural language processing with symbolic structures and graph theory, it provides a powerful tool for automatically recognizing similarities between biomedical interactions mentioned in scientific publications, enabling their aggregation, quantification, and retrieval. It leverages deep graph kernels without requiring the computation of the entire kernel matrix, providing greater scalability. Specifically, DDEGK compares each event against a small set of anchor events, learning embeddings based on target-anchor divergences. The learned embeddings make event-centered operations simpler and faster than comparable operations on the graphs themselves.
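As a rough illustration of this anchor-based scheme (the function and variable names below are ours, not the repository's API), a target graph's embedding is simply the vector of its divergence scores against the chosen anchors:

# Illustrative sketch only: each scorer stands in for a pre-trained per-anchor
# divergence model; the real Node-to-Edges encoders are far richer than this.
def embed_event_graph(target_graph, anchor_divergence_fns):
    """Compose the embedding of a target event graph as the vector of its
    divergence scores from each anchor (one pre-trained scorer per anchor)."""
    return [score(target_graph) for score in anchor_divergence_fns]

# Toy usage: with three anchors, the embedding is a 3-dimensional vector.
toy_scorers = [lambda g, k=k: float(abs(len(g) - k)) for k in (2, 5, 9)]
print(embed_event_graph(["trigger", "theme", "cause"], toy_scorers))  # [1.0, 2.0, 6.0]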

Overview

DDEGK overview. Structural and semantic divergence scores from a set of anchor event graphs are used to compose the vector representation of a target event graph. Divergence is measured through pre-trained Node-to-Edges encoder models, one for each anchor.



DDEGK divergence prediction with cross-graph attention. Anchor-based target event graph encoder for divergence prediction: attention layers map the target event graph nodes onto the anchor graph while remaining aware of node and edge attributes.
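As a very small sketch of the cross-graph attention idea (the shapes, names, and plain dot-product scoring are assumptions for illustration, not the actual model, which also conditions on node and edge attributes):

import torch
import torch.nn.functional as F

# Illustrative only: soft-align each target node to the anchor graph's nodes
# by attending over node embeddings.
def cross_graph_attention(target_nodes, anchor_nodes):
    # target_nodes: (n_target, d), anchor_nodes: (n_anchor, d)
    scores = target_nodes @ anchor_nodes.T        # pairwise node similarities
    attn = F.softmax(scores, dim=-1)              # each target node -> distribution over anchor nodes
    aligned = attn @ anchor_nodes                 # anchor-side view of each target node
    return attn, aligned

t, a = torch.randn(5, 64), torch.randn(3, 64)     # 5 target nodes, 3 anchor nodes
attn, aligned = cross_graph_attention(t, a)       # attn: (5, 3), aligned: (5, 64)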

Datasets

We applied and evaluated DDEGK on nine real-world datasets originally designed for biomedical event extraction, mainly introduced by the ongoing BioNLP-ST series: ST09, GE11, EPI11, ID11, MLEE, GE13, CG13, PC13, and GRO13. The original corpora (6 GB of uncompressed .txt, .a1, and .a2 files) and the samples used in the experiments are available for download.
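For reference, a BioNLP Standoff triple looks roughly like the following (a made-up minimal example; offsets are character positions in the .txt file):

example.txt:
The protein p53 activates transcription of p21.

example.a1 (entities: id, type, start/end offsets, surface form):
T1	Protein 12 15	p53
T2	Protein 43 46	p21

example.a2 (event trigger plus the event with its role arguments):
T3	Positive_regulation 16 25	activates
E1	Positive_regulation:T3 Theme:T2 Cause:T1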

Get started

Installing dependencies

Dependencies are listed in requirements.txt and can be installed with pip install -r requirements.txt.
Python 3.6 is required.
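For example, on a Unix-like system (using a virtual environment is our suggestion, not a project requirement):

python3.6 -m venv venv
source venv/bin/activate
pip install -r requirements.txt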

Compute event graph embeddings

  1. Prepare the dataset. Create a folder under data named after your dataset. It must contain BioNLP Standoff files, i.e., triples composed of a text document (.txt), entity annotations (.a1), and event annotations (.a2), as in the format example above. Alternatively, it is possible to provide .ann files.
  2. Run DDEGK. Execute python -m src.events_embedding --help to see all the available parameters.
  3. Analyze the results. The output will be saved to results/<datasetname>/ddegk/results.json. The provided Jupyter notebooks can be used to visualize the results (see the loading sketch after the example command below).

An example command:

python -m src.events_embedding --dataset=test --node-embedding-coeff=1 --node-label-coeff=1 --edge-label-coeff=1 --prototype-choice=random --num-prototypes=2
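Once the run finishes, the output can be inspected like this (the exact schema of results.json is not documented here, so this sketch only loads the file and peeks at its top-level structure):

import json

with open("results/test/ddegk/results.json") as f:   # path produced by the example command above
    results = json.load(f)

print(type(results))
if isinstance(results, dict):
    print(list(results.keys())[:10])                  # peek at the available fields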

Cite

If you found this repository useful, please consider citing the following paper:

Frisoni G., Moro G., Carlassare G. and Carbonaro A. "Unsupervised Event Graph Representation and Similarity Learning on Biomedical Literature." Sensors, 2021.

@article{frisoni2021ddegk,
  title={Unsupervised Event Graph Representation and Similarity Learning on Biomedical Literature},
  author={Frisoni, Giacomo and Moro, Gianluca and Carlassare, Giulio and Carbonaro, Antonella},
  journal={Sensors},
  year={2021}
}

We thank Eleonora Bertoni for her valuable help in preparing the datasets, implementing the baseline, and conducting the experiments.

Owner

  • Name: DISI UniBo NLP
  • Login: disi-unibo-nlp
  • Kind: user
  • Location: Italy

NLU Research Group @ University of Bologna @ Department of Computer Science and Engineering (DISI)

Dependencies

requirements.txt pypi
  • absl-py ==0.7.0
  • networkx ==2.3
  • numpy ==1.17
  • scipy ==1.3.0
  • sklearn ==0.0
  • tensorflow ==1.14
  • tf_slim ==1.1.0
  • torch ==1.10.0
  • tqdm ==4.32.2
  • transformers ==4.11.3