corpus-annotation-graph-builder

Corpus Annotation Graph builder (CAG) is an architectural framework that employs the build-and-annotate pattern for creating a graph.

https://github.com/dlr-sc/corpus-annotation-graph-builder

Science Score: 85.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 12 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
    5 of 10 committers (50.0%) from academic institutions
  • Institutional organization owner
    Organization dlr-sc has institutional domain (www.dlr.de)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.9%) to scientific vocabulary

Keywords

annotation python
Last synced: 6 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: DLR-SC
  • License: mit
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 3.81 MB
Statistics
  • Stars: 12
  • Watchers: 4
  • Forks: 1
  • Open Issues: 7
  • Releases: 4
Topics
annotation python
Created about 3 years ago · Last pushed about 2 years ago
Metadata Files
Readme License Citation

README.md

Welcome to the Corpus Annotation Graph Builder (CAG)


CAG is a Python library offering an architectural framework for applying the build-and-annotate pattern when building graphs.


Official Documentation.

Corpus Annotation Graph builder (CAG) is an architectural framework that employs the build-and-annotate pattern for creating a graph. CAG is built on top of ArangoDB and its Python drivers (PyArango). The build-and-annotate pattern consists of two phases (see the figure below):

  • (1) Build: data is collected from different sources (e.g., publication databases, online encyclopedias, news feeds, web portals, electronic libraries, repositories, media platforms) and preprocessed to build the core nodes, which we call Objects of Interest (OOIs). The component responsible for this phase is the Graph-Creator.
  • (2) Annotate: annotations are extracted from the OOIs, and corresponding annotation nodes are created and linked to the core nodes. The component responsible for this phase is the Graph-Annotator.

[Figure: cag]
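To make the two phases concrete, here is a minimal, self-contained sketch of the build-and-annotate pattern. It is not CAG's API: the classes InMemoryGraph, GraphCreator, and GraphAnnotator, the collection names Documents, Keywords, and HasKeyword, and the toy keyword extractor are hypothetical illustrations that only borrow the component names from the description above. A plain dictionary stands in for ArangoDB, with edges stored as documents carrying ArangoDB-style _from/_to references to vertex ids.

```python
import re
from dataclasses import dataclass, field


# Hypothetical in-memory stand-in for ArangoDB: vertices are keyed by
# "<collection>/<_key>", and edges carry "_from"/"_to" references to those ids.
@dataclass
class InMemoryGraph:
    vertices: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def upsert_vertex(self, collection: str, key: str, **attrs) -> str:
        doc_id = f"{collection}/{key}"
        # Create the vertex on first sight, then merge in any new attributes.
        self.vertices.setdefault(doc_id, {"_id": doc_id, "_key": key}).update(attrs)
        return doc_id

    def add_edge(self, collection: str, from_id: str, to_id: str) -> None:
        self.edges.append({"collection": collection, "_from": from_id, "_to": to_id})


class GraphCreator:
    """Phase 1 (build): turn raw source records into core OOI nodes."""

    def __init__(self, graph: InMemoryGraph):
        self.graph = graph

    def build(self, records: list) -> list:
        ooi_ids = []
        for rec in records:
            # Light preprocessing: collapse repeated whitespace in the text.
            text = " ".join(rec["text"].split())
            ooi_ids.append(
                self.graph.upsert_vertex("Documents", rec["id"], text=text, source=rec["source"])
            )
        return ooi_ids


class GraphAnnotator:
    """Phase 2 (annotate): extract annotations from OOIs and link them."""

    def __init__(self, graph: InMemoryGraph, min_len: int = 6):
        self.graph = graph
        self.min_len = min_len

    def extract(self, text: str) -> set:
        # Toy extractor (an assumption): long lowercase words stand in for
        # real NLP annotations such as entities or topics.
        return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) >= self.min_len}

    def annotate(self, ooi_ids: list) -> None:
        for ooi_id in ooi_ids:
            for term in sorted(self.extract(self.graph.vertices[ooi_id]["text"])):
                # Shared terms reuse one annotation node, linking several OOIs to it.
                term_id = self.graph.upsert_vertex("Keywords", term, label=term)
                self.graph.add_edge("HasKeyword", ooi_id, term_id)


graph = InMemoryGraph()
oois = GraphCreator(graph).build([
    {"id": "d1", "source": "news", "text": "Graphs  connect annotated documents"},
    {"id": "d2", "source": "wiki", "text": "Annotated corpora need graphs"},
])
GraphAnnotator(graph).annotate(oois)
print(len(graph.edges))  # 7
```

In this sketch the annotator only reads OOI nodes and adds new nodes and edges, so further annotation types could be layered on later without re-running the build phase.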

This framework aims to offer researchers a flexible but unified and reproducible way of organizing and maintaining their interlinked document collections in a Corpus Annotation Graph.

Installation

Direct install via pip

The package can be installed directly via pip:

pip install cag

This allows you to use the cag module from any local Python script. The two main packages are cag.framework and cag.view_wrapper.

Manual cloning

Clone the repository, go to the root folder and then run:

pip install -e .

Citation

If you use CAG, please cite:

@inproceedings{el-baff-etal-2023-corpus,
  title = "Corpus Annotation Graph Builder ({CAG}): An Architectural Framework to Create and Annotate a Multi-source Graph",
  author = "El Baff, Roxanne  and
    Hecking, Tobias  and
    Hamm, Andreas  and
    Korte, Jasper W.  and
    Bartsch, Sabine",
  booktitle = "Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations",
  month = may,
  year = "2023",
  address = "Dubrovnik, Croatia",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2023.eacl-demo.28",
  pages = "248--255"
}

Usage

  • After the installation, a project scaffold can be created with the command cag start-project
  • Graph Creation [jupyter notebook]
  • Graph Annotation [jupyter notebook]

Zenodo refs

Latest Version

  • v1.6.0 DOI

Previous Version

  • v1.5.17 DOI
  • v1.5.0 DOI
  • v1.4.0 DOI

Owner

  • Name: DLR Institute for Software Technology
  • Login: DLR-SC
  • Kind: organization
  • Email: opensource@dlr.de
  • Location: Cologne, Berlin, Braunschweig, Oberpfaffenhofen, Bremen

German Aerospace Center (DLR)

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: >-
  Corpus Annotation Graph Builder (CAG): An Architectural
  Framework to Build and Annotate a Multi-source Graph
message: >-
  If you use CAG in your research, please cite it using
  these metadata.
type: software
authors:
  - family-names: El Baff
    given-names: Roxanne
    affiliation: German Aerospace Center (DLR)
    orcid: 'https://orcid.org/0000-0001-6661-8661'
  - family-names: Hecking
    given-names: Tobias
    affiliation: German Aerospace Center (DLR)
    orcid: 'https://orcid.org/0000-0003-0833-7989'
  - family-names: Hamm
    given-names: Andreas
    affiliation: German Aerospace Center (DLR)
    orcid: 'https://orcid.org/0000-0001-5854-851X'
  - family-names: Korte
    given-names: Jasper W.
    affiliation: German Aerospace Center (DLR)
    orcid: 'https://orcid.org/0000-0002-5559-8842'
  - family-names: Bartsch
    given-names: Sabine
    affiliation: Technical University of Darmstadt
repository-code: 'https://github.com/DLR-SC/corpus-annotation-graph-builder'
url: 'https://cagraph.info'
abstract: >-
  CAG is a Python Library offering an architectural
  framework to employ the build-annotate pattern when
  building Graphs.
keywords:
  - graph
  - architectural framework
  - graph creator
  - graph annotator
license: MIT
version: 1.4.0
date-released: '2023-06-03'
references:
 - conference: "The 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023): System Demonstrations"
 - year: 2023
 - month: 5

GitHub Events

Total
  • Issues event: 2
  • Watch event: 1
Last Year
  • Issues event: 2
  • Watch event: 1

Committers

Last synced: 8 months ago

All Time
  • Total Commits: 234
  • Total Committers: 10
  • Avg Commits per committer: 23.4
  • Development Distribution Score (DDS): 0.62
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers (name, masked email, commits)
  • El Baff, Roxanne (r****f@d****e): 89
  • roxanneelbaff (r****f@g****m): 62
  • Benedikt Kantz (b****z@d****e): 46
  • Norman Nabhan (n****n@d****e): 26
  • Norman Müller (1****r): 4
  • Hecking, Tobias (t****g@d****e): 2
  • sant_si (s****m@d****e): 2
  • johannes_honeder (4****r): 1
  • Niklas Frondorf (2****6): 1
  • Dominik Opitz (3****m): 1
Committer Domains (Top 20 + Academic)
dlr.de: 5
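The aggregate committer figures in the All Time block can be reproduced from the per-committer counts in the Top Committers list. The sketch below assumes the Development Distribution Score is defined as 1 minus the top committer's share of all commits; that assumed definition reproduces the reported 0.62 here. The two El Baff entries are kept separate, as they are listed separately.

```python
# Commit counts per committer, copied from the "Top Committers" list.
commits = {
    "El Baff, Roxanne": 89,
    "roxanneelbaff": 62,
    "Benedikt Kantz": 46,
    "Norman Nabhan": 26,
    "Norman Müller": 4,
    "Hecking, Tobias": 2,
    "sant_si": 2,
    "johannes_honeder": 1,
    "Niklas Frondorf": 1,
    "Dominik Opitz": 1,
}

total = sum(commits.values())   # total commits across all committers
avg = total / len(commits)      # average commits per committer
# Assumed definition: DDS = 1 - (commits of the top committer / total commits)
dds = 1 - max(commits.values()) / total

print(total, round(avg, 1), round(dds, 2))  # 234 23.4 0.62
```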

Issues and Pull Requests

Last synced: 8 months ago

All Time
  • Total issues: 29
  • Total pull requests: 24
  • Average time to close issues: 17 days
  • Average time to close pull requests: 6 days
  • Total issue authors: 5
  • Total pull request authors: 5
  • Average comments per issue: 0.79
  • Average comments per pull request: 0.25
  • Merged pull requests: 22
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 2
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 0.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • roxanneelbaff (15)
  • muelldlr (10)
  • destructivedata (2)
  • 0x6e66 (1)
  • johndolier (1)
Pull Request Authors
  • roxanneelbaff (14)
  • muelldlr (7)
  • johndolier (1)
  • TechDom (1)
  • 0x6e66 (1)
Top Labels
Issue Labels
  • enhancement (4)
  • priority::medium (4)
  • bug (3)
  • priority::high (2)
  • priority::low (2)
  • question (2)
  • documentation (1)
  • help wanted (1)
Pull Request Labels

Dependencies

.github/workflows/main.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v3 composite
  • arangodb 3.9.2 docker
requirements.txt pypi
  • dataclasses >=0.6
  • empath >=0.89
  • networkx >=2.8.5
  • nltk >=3.4.5
  • pyArango >=2.0.1
  • pytest >=7.1.2
  • python-arango >=7.4.1
  • python-slugify *
  • pyvis >=0.2.1
  • rich *
  • spacy >=3.4.1
  • spacy_arguing_lexicon >=0.0.3
  • tomli >=2.0.1
  • tqdm >=4.43.0
  • typer *