the-corpus-as-a-network

Turning source documents into a graph with NLP

https://github.com/maehr/the-corpus-as-a-network

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.5%) to scientific vocabulary

Keywords

named-entity-recognition natural-language-processing social-network-analysis
Last synced: 6 months ago

Repository

Turning source documents into a graph with NLP

Basic Info
Statistics
  • Stars: 8
  • Watchers: 2
  • Forks: 0
  • Open Issues: 1
  • Releases: 1
Topics
named-entity-recognition natural-language-processing social-network-analysis
Created about 3 years ago · Last pushed over 1 year ago
Metadata Files
Readme License Citation Codemeta

README.md

The corpus as a network

Turning source documents into a graph with NLP

Moritz Mähr (University of Bern)

December 12, 2022

Lecture series "Einblicke in die Digital Humanities" (fall semester 2022)

Abstract: A corpus was compiled for the research project "The Evolution of Internet Governance" at the University of Bern. The born-digital sources date from between 1969 and 1999 and are relatively homogeneous. This homogeneity made it possible to build different network representations (graphs) of the human and non-human actors, locations, and events mentioned in the corpus using NLP (rule-based annotations as well as automated Named Entity Recognition). This lecture covers the process of annotating the corpus and constructing bipartite graphs.
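The pipeline described in the abstract (annotate documents, link each document to the entities it mentions, then derive an entity network) can be illustrated with a minimal, standard-library-only sketch. It is not the notebook's implementation: the project lists spaCy and networkx as dependencies, whereas this sketch swaps in a plain dictionary lookup for the rule-based annotation and a dict of sets for the bipartite graph. The gazetteer entries, document ids, texts, and function names are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical rule-based gazetteer: surface form -> entity label.
GAZETTEER = {
    "Alice Example": "ACTOR",
    "Bob Example": "ACTOR",
    "Example Network": "ACTOR",  # a non-human actor
    "Bern": "LOCATION",
}

# Hypothetical toy corpus: document id -> text.
CORPUS = {
    "doc-1": "Alice Example and Bob Example discussed the Example Network.",
    "doc-2": "Bob Example travelled to Bern.",
    "doc-3": "Alice Example reported on the Example Network from Bern.",
}


def annotate(text):
    """Return the (entity, label) pairs whose surface form occurs in the text."""
    return {(name, label) for name, label in GAZETTEER.items() if name in text}


def build_bipartite(corpus):
    """Bipartite graph as a mapping: document node -> set of entity nodes."""
    return {doc_id: annotate(text) for doc_id, text in corpus.items()}


def project_entities(doc_to_entities):
    """One-mode projection onto entities.

    The weight of an entity pair is the number of documents mentioning both.
    """
    weights = defaultdict(int)
    for entities in doc_to_entities.values():
        names = sorted(name for name, _ in entities)
        for a, b in combinations(names, 2):
            weights[(a, b)] += 1
    return dict(weights)


edges = build_bipartite(CORPUS)
cooccurrence = project_entities(edges)
```

Here `edges` holds the document-entity edges of the bipartite graph, and `cooccurrence` is its projection onto the entity side. With a library such as networkx, the same structure could be loaded into a graph object for layout and analysis.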

Installation

Use the package manager poetry to install the dependencies.

poetry install

Usage

poetry run jupyter notebook notebooks/the-corpus-as-a-network.ipynb

| Nbviewer | Jupyter Notebook | Jupyter Lab | HTML |
| --- | --- | --- | --- |
| the-corpus-as-a-network.ipynb | Binder | Binder | HTML |

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Versioning

We use SemVer for versioning. For the versions available, see the tags on this repository.

Authors and acknowledgment

  • Moritz Mähr - Initial work - maehr

See also the list of contributors who participated in this project.

License

BSD-3-clause

Owner

  • Name: Moritz Mähr
  • Login: maehr
  • Kind: user
  • Location: Bern & Basel
  • Company: @DHBern & @Stadt-Geschichte-Basel

#DH #STS #NLP #SNA #graphs #DigitalHistory #HistoryOfComputing 👷 associate researcher @DHBern and digital lead @Stadt-Geschichte-Basel

Citation (CITATION.cff)

cff-version: 1.2.0
title: The corpus as a network
message: >-
  If you use this dataset, please cite it using the
  metadata from this file.
type: dataset
authors:
  - given-names: Moritz
    family-names: Mähr
    email: moritz.maehr@unibe.ch
    affiliation: University of Bern
    orcid: 'https://orcid.org/0000-0002-1367-1618'
version: 1.0.0
doi: 10.5281/zenodo.7430555
date-released: 2022-12-12

CodeMeta (codemeta.json)

{
  "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
  "@type": "SoftwareSourceCode",
  "license": "https://spdx.org/licenses/BSD-3-Clause",
  "codeRepository": "https://github.com/maehr/the-corpus-as-a-network",
  "dateCreated": "2022-12-12",
  "datePublished": "2022-12-12",
  "dateModified": "2022-12-12",
  "issueTracker": "https://github.com/maehr/the-corpus-as-a-network/issues",
  "name": "The corpus as a network",
  "version": "0.1.0",
  "description": "For the research project \"The Evolution of Internet Governance\" at the University of Bern, a corpus was compiled. The born digital sources date from the years between 1969 and 1999 and are relatively homogeneous. This allowed to build different network representations (graphs) of the indicated human and non-human actors, locations and events from the corpus using NLP (rule-based annotations as well as automated Named Entity Recognition). The process of annotating the corpus and constructing bipartite graphs is the subject of this lecture.",
  "applicationCategory": "Digital Humanities",
  "developmentStatus": "wip",
  "referencePublication": "https://zenodo.org/10.5281/zenodo.7430555",
  "programmingLanguage": [
    "Python3"
  ],
  "author": [
    {
      "@type": "Person",
      "@id": "https://orcid.org/0000-0002-1367-1618",
      "givenName": "Moritz",
      "familyName": "Mähr",
      "email": "moritz.maehr@unibe.ch",
      "affiliation": {
        "@type": "Organization",
        "name": "Digital Humanities, University of Bern"
      }
    }
  ]
}

GitHub Events

Total
  • Watch event: 1
Last Year
  • Watch event: 1

Issues and Pull Requests

Last synced: over 1 year ago

All Time
  • Total issues: 0
  • Total pull requests: 15
  • Average time to close issues: N/A
  • Average time to close pull requests: 11 days
  • Total issue authors: 0
  • Total pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.53
  • Merged pull requests: 6
  • Bot issues: 0
  • Bot pull requests: 15
Past Year
  • Issues: 0
  • Pull requests: 15
  • Average time to close issues: N/A
  • Average time to close pull requests: 11 days
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.53
  • Merged pull requests: 6
  • Bot issues: 0
  • Bot pull requests: 15
Top Authors
Issue Authors
Pull Request Authors
  • dependabot[bot] (22)
Top Labels
Issue Labels
Pull Request Labels
dependencies (22)

Dependencies

poetry.lock pypi
  • 136 dependencies
pyproject.toml pypi
  • jupyter ^1.0.0
  • matplotlib ^3.6.2
  • networkx ^2.8.8
  • pandas ^1.5.2
  • pyarrow ^10.0.1
  • python ^3.10
  • requests ^2.28.1
  • rich ^12.6.0
  • scipy ^1.9.3
  • spacy ^3.4.3
requirements.txt pypi
  • jupyter ==1.0.0
  • matplotlib ==3.6.2
  • networkx ==2.8.8
  • pandas ==1.5.2
  • pyarrow ==10.0.1
  • requests ==2.28.1
  • rich ==12.6.0
  • scipy ==1.9.3
  • spacy ==3.4.3