germaparlpy

The GermaParlPy Python package provides functionality to deserialize, serialize, manage, and query the GermaParlTEI corpus and derived corpora.

https://github.com/nolram567/germaparlpy

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.6%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
Statistics
  • Stars: 3
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 11 months ago · Last pushed 9 months ago

README.md

GermaParlPy

The GermaParlPy Python package provides functionality to deserialize, serialize, manage, and query the GermaParlTEI[^1] corpus and derived corpora.

The GermaParlTEI corpus comprises the plenary protocols of the German Bundestag (parliament), encoded in XML according to the TEI standard. The current version covers the first 19 legislative periods, encompassing transcribed speeches from the Bundestag's constituent session on 7 September 1949 to the final sitting of the Angela Merkel era in 2021. This makes it a valuable resource for research in various scientific disciplines.

For detailed information on the library, visit the official website at https://nolram567.github.io/GermaParlPy/.

Use Cases

Potential use cases range from the examination of research questions in political science, history or linguistics to the compilation of training data sets for AI.

In addition, this library makes it possible to access the GermaParl corpus in Python and to apply powerful NLP libraries such as spaCy or Gensim to it. Previously, the corpus could only be accessed through the polmineR package in the R programming language.
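As a rough illustration of the hand-off to such NLP libraries, the sketch below turns plain speech texts into lowercase token lists, the input shape that many Gensim models accept. It uses only the standard library. The function name `tokenize` and the two sample sentences are placeholders introduced here for illustration; they are not part of GermaParlPy and are not taken from the corpus.

```python
import re
from collections import Counter


def tokenize(text):
    """Split a text into lowercase word tokens (Unicode-aware, so umlauts survive)."""
    return re.findall(r"\w+", text.lower())


# Placeholder texts standing in for speeches retrieved from the corpus.
speeches = [
    "Die Würde des Menschen ist unantastbar.",
    "Die Sitzung ist eröffnet.",
]

# One token list per speech; this nested list can be passed on to other libraries.
token_lists = [tokenize(speech) for speech in speeches]

# A simple frequency count over all tokens as a first sanity check.
frequencies = Counter(token for tokens in token_lists for token in tokens)
print(frequencies.most_common(3))
```

For serious linguistic work, a proper tokenizer such as the one shipped with spaCy is likely a better choice than this regular expression.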

Installation

GermaParlPy is available on PyPI:

```sh
pip install germaparlpy
```
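To confirm that the installation succeeded, one option is to query the installed distribution metadata from Python. The helper name `installed_version` is introduced here for illustration only; `importlib.metadata` is part of the standard library from Python 3.8 onwards.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("germaparlpy")
    print(f"germaparlpy {found}" if found else "germaparlpy is not installed")
```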

API Reference

Click here for the full API Reference.

XML Structure

Click here to learn more about the XML Structure of the underlying corpus GermaParlTEI[^1].
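To give a feel for working with TEI-style XML outside the library, here is a minimal sketch that reads speeches with the standard library's `xml.etree.ElementTree`. The sample document, the element names (`sp`, `p`, `stage`) and the attribute names (`who`, `party`) are simplified assumptions made for this example; consult the XML structure documentation for the actual GermaParlTEI schema. The sketch does not use GermaParlPy's own API.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for a plenary protocol file.
SAMPLE = """<TEI>
  <text>
    <body>
      <div>
        <sp who="Speaker A" party="X">
          <p>First paragraph.</p>
          <stage>(Applause)</stage>
          <p>Second paragraph.</p>
        </sp>
        <sp who="Speaker B" party="Y">
          <p>A reply.</p>
        </sp>
      </div>
    </body>
  </text>
</TEI>"""


def extract_speeches(xml_text):
    """Collect speaker, party and joined paragraph text for every <sp> element."""
    root = ET.fromstring(xml_text)
    result = []
    for sp in root.iter("sp"):
        # Only direct <p> children are read; <stage> remarks are skipped.
        paragraphs = [p.text.strip() for p in sp.findall("p") if p.text]
        result.append(
            {
                "who": sp.get("who"),
                "party": sp.get("party"),
                "text": " ".join(paragraphs),
            }
        )
    return result


for speech in extract_speeches(SAMPLE):
    print(speech["who"], "-", speech["text"])
```

Real TEI files often declare an XML namespace, in which case the tag names passed to `iter` and `findall` would need the namespace prefix.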

Tutorials

I have prepared three example scripts that showcase how to use GermaParlPy and illustrate potential use cases. You can find the scripts in the /example directory.

Contributing

Contributions and feedback are welcome! Feel free to file an issue or open a pull request.

License

The code is licensed under the MIT License.

The GermaParl corpus, which is not part of this repository, is licensed under a CLARIN PUB+BY+NC+SA license.

Credits

Developed by Marlon-Benedikt George.

The underlying data set, the GermaParl corpus, was compiled and released by Blätte & Leonhardt (2024)[^1]. See also their R library polmineR, developed in the context of the PolMine Project, which served as an inspiration for this library.

[^1]: Blaette, A., and C. Leonhardt. GermaParl Corpus of Plenary Protocols. v2.2.0-rc1, Zenodo, 22 July 2024, doi:10.5281/zenodo.12795193

Owner

  • Login: Nolram567
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
title: GermaParlPy
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Marlon-Benedikt
    family-names: George
    orcid: 'https://orcid.org/0009-0000-4381-7035'
identifiers:
  - type: doi
    value: 10.5281/zenodo.15180629
repository-code: 'https://github.com/Nolram567/GermaParlPy'
url: 'https://nolram567.github.io/GermaParlPy/'
repository: 'https://pypi.org/project/germaparlpy/'
abstract: >-
  The GermaParlPy Python package provides functionality to
  deserialize, serialize, manage, and query the GermaParlTEI
  corpus by Blätte & Leonhardt (2024) and derived corpora.

  The GermaParlTEI corpus comprises the plenary protocols of
  the German Bundestag (parliament), encoded in XML
  according to the TEI standard. The current version covers
  the first 19 legislative periods, encompassing transcribed
  speeches from the Bundestag's constituent session on 7
  September 1949 to the final sitting of the Angela Merkel
  era in 2021. This makes it a valuable resource for
  research in various scientific disciplines.
license: MIT
version: 1.0.4
date-released: '2025-05-18'

GitHub Events

Total
  • Release event: 1
  • Watch event: 1
  • Public event: 1
  • Push event: 16
  • Create event: 4
Last Year
  • Release event: 1
  • Watch event: 1
  • Public event: 1
  • Push event: 16
  • Create event: 4

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 25 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 3
  • Total maintainers: 1
pypi.org: germaparlpy

  • Versions: 3
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 25 Last month
Rankings
Dependent packages count: 9.3%
Average: 31.0%
Dependent repos count: 52.6%
Maintainers (1)
Last synced: 6 months ago