the-corpus-as-a-network
Turning source documents into a graph with NLP
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: Links to: zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (11.5%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: maehr
- License: bsd-3-clause
- Language: Jupyter Notebook
- Default Branch: main
- Homepage: https://maehr.github.io/the-corpus-as-a-network/
- Size: 5.33 MB
Statistics
- Stars: 8
- Watchers: 2
- Forks: 0
- Open Issues: 1
- Releases: 1
Metadata Files
README.md
The corpus as a network
Turning source documents into a graph with NLP
Moritz Mähr (University of Bern)
December 12, 2022
Lecture series "Einblicke in die Digital Humanities" (fall semester 2022)
Abstract: For the research project "The Evolution of Internet Governance" at the University of Bern, a corpus was compiled. The born-digital sources date from 1969 to 1999 and are relatively homogeneous. This made it possible to build different network representations (graphs) of the human and non-human actors, locations, and events named in the corpus using NLP (rule-based annotation as well as automated named entity recognition). This lecture covers the process of annotating the corpus and constructing bipartite graphs.
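The pipeline described in the abstract (annotate documents, link each document to the entities it mentions, then derive entity-to-entity relations) can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the three documents, the regex rules, and all function names are hypothetical, and it uses only the Python standard library instead of the spaCy and networkx packages the project depends on.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical mini-corpus; document IDs and texts are invented for illustration.
corpus = {
    "doc1": "Jon Postel of ISI wrote to the IETF about the IANA.",
    "doc2": "The IETF met in Geneva; Jon Postel did not attend.",
    "doc3": "ICANN took over IANA functions in 1998.",
}

# Rule-based annotation: hand-written patterns mapped to entity labels.
# These rules are assumptions, not the project's real annotation rules.
RULES = {
    r"\bJon Postel\b": "PERSON",
    r"\b(?:IETF|IANA|ICANN|ISI)\b": "ORG",
    r"\bGeneva\b": "LOC",
}


def annotate(text):
    """Return the set of (entity, label) pairs that RULES match in text."""
    found = set()
    for pattern, label in RULES.items():
        for match in re.finditer(pattern, text):
            found.add((match.group(0), label))
    return found


def build_bipartite(corpus):
    """Build both sides of a document-entity bipartite graph as adjacency sets."""
    doc_to_entities = {}
    entity_to_docs = defaultdict(set)
    for doc_id, text in corpus.items():
        entities = {name for name, _ in annotate(text)}
        doc_to_entities[doc_id] = entities
        for entity in entities:
            entity_to_docs[entity].add(doc_id)
    return doc_to_entities, dict(entity_to_docs)


def project_entities(doc_to_entities):
    """One-mode projection: edge weight = number of documents two entities share."""
    weights = defaultdict(int)
    for entities in doc_to_entities.values():
        for a, b in combinations(sorted(entities), 2):
            weights[(a, b)] += 1
    return dict(weights)


doc_to_entities, entity_to_docs = build_bipartite(corpus)
cooccurrence = project_entities(doc_to_entities)
```

In this sketch, documents and entities are the two node sets of the bipartite graph, and `cooccurrence` is its projection onto the entity side. A graph library would let the same edge list be analyzed and drawn directly; plain dictionaries are used here only to keep the example self-contained.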
Installation
Use the package manager poetry to install the dependencies.
bash
poetry install
Usage
bash
poetry run jupyter notebook notebooks/the-corpus-as-a-network.ipynb
| Nbviewer | Jupyter Notebook | Jupyter Lab | HTML |
| --- | --- | --- | --- |
| the-corpus-as-a-network.ipynb | | | HTML |
Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Versioning
We use SemVer for versioning. For the versions available, see the tags on this repository.
Authors and acknowledgment
- Moritz Mähr - Initial work - maehr
See also the list of contributors who participated in this project.
License
Owner
- Name: Moritz Mähr
- Login: maehr
- Kind: user
- Location: Bern & Basel
- Company: @DHBern & @Stadt-Geschichte-Basel
- Website: moritzmaehr.ch
- Repositories: 43
- Profile: https://github.com/maehr
#DH #STS #NLP #SNA #graphs #DigitalHistory #HistoryOfComputing 👷 associate researcher @DHBern and digital lead @Stadt-Geschichte-Basel
Citation (CITATION.cff)
cff-version: 1.2.0
title: The corpus as a network
message: >-
If you use this dataset, please cite it using the
metadata from this file.
type: dataset
authors:
- given-names: Moritz
family-names: Mähr
email: moritz.maehr@unibe.ch
affiliation: University of Bern
orcid: 'https://orcid.org/0000-0002-1367-1618'
version: 1.0.0
doi: 10.5281/zenodo.7430555
date-released: 2022-12-12
CodeMeta (codemeta.json)
{
"@context": "https://doi.org/10.5063/schema/codemeta-2.0",
"@type": "SoftwareSourceCode",
"license": "https://spdx.org/licenses/BSD-3-Clause",
"codeRepository": "https://github.com/maehr/the-corpus-as-a-network",
"dateCreated": "2022-12-12",
"datePublished": "2022-12-12",
"dateModified": "2022-12-12",
"issueTracker": "https://github.com/maehr/the-corpus-as-a-network/issues",
"name": "The corpus as a network",
"version": "0.1.0",
"description": "For the research project \"The Evolution of Internet Governance\" at the University of Bern, a corpus was compiled. The born digital sources date from the years between 1969 and 1999 and are relatively homogeneous. This allowed to build different network representations (graphs) of the indicated human and non-human actors, locations and events from the corpus using NLP (rule-based annotations as well as automated Named Entity Recognition). The process of annotating the corpus and constructing bipartite graphs is the subject of this lecture.",
"applicationCategory": "Digital Humanities",
"developmentStatus": "wip",
"referencePublication": "https://zenodo.org/10.5281/zenodo.7430555",
"programmingLanguage": [
"Python3"
],
"author": [
{
"@type": "Person",
"@id": "https://orcid.org/0000-0002-1367-1618",
"givenName": "Moritz",
"familyName": "Mähr",
"email": "moritz.maehr@unibe.ch",
"affiliation": {
"@type": "Organization",
"name": "Digital Humanities, University of Bern"
}
}
]
}
GitHub Events
Total
- Watch event: 1
Last Year
- Watch event: 1
Issues and Pull Requests
All Time
- Total issues: 0
- Total pull requests: 15
- Average time to close issues: N/A
- Average time to close pull requests: 11 days
- Total issue authors: 0
- Total pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.53
- Merged pull requests: 6
- Bot issues: 0
- Bot pull requests: 15
Past Year
- Issues: 0
- Pull requests: 15
- Average time to close issues: N/A
- Average time to close pull requests: 11 days
- Issue authors: 0
- Pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.53
- Merged pull requests: 6
- Bot issues: 0
- Bot pull requests: 15
Top Authors
Issue Authors
Pull Request Authors
- dependabot[bot] (22)
Dependencies
- 136 dependencies
- jupyter ^1.0.0
- matplotlib ^3.6.2
- networkx ^2.8.8
- pandas ^1.5.2
- pyarrow ^10.0.1
- python ^3.10
- requests ^2.28.1
- rich ^12.6.0
- scipy ^1.9.3
- spacy ^3.4.3
- jupyter ==1.0.0
- matplotlib ==3.6.2
- networkx ==2.8.8
- pandas ==1.5.2
- pyarrow ==10.0.1
- requests ==2.28.1
- rich ==12.6.0
- scipy ==1.9.3
- spacy ==3.4.3