Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (8.4%) to scientific vocabulary
Keywords
Repository
German Parliamentary Corpus (GerParCor)
Basic Info
Statistics
- Stars: 23
- Watchers: 3
- Forks: 8
- Open Issues: 1
- Releases: 0
Topics
Metadata Files
README.md

GerParCor
German Parliamentary Corpus (GerParCor)
Abstract
In 2022, GerParCor was collected and published: the largest German-language corpus of parliamentary protocols, spanning three centuries and covering the national and federal levels of Germany, Austria, Switzerland and Liechtenstein. GerParCor made available, for the first time, various parliamentary protocols that had not existed in digital form and could not be retrieved and processed in a uniform manner. The corpus was also preprocessed using NLP methods and made available in XMI format. In this paper, GerParCor is significantly updated: all new parliamentary protocols are included, and further protocols not previously covered are added and preprocessed, so that the corpus now reaches back to 1797. Besides integrating a new, state-of-the-art NLP preprocessing pipeline suited to large text corpora, this update gives an overview of the further reuse of GerParCor by presenting various provisioning capabilities, such as APIs.
GerParCor is available via https://gerparcor.texttechnologylab.org
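Since the corpus is distributed in XMI format, the raw text of a protocol can be recovered with nothing more than an XML parser. The following is a minimal sketch, assuming an uncompressed UIMA XMI file in which the document text is stored in the `sofaString` attribute of a `cas:Sofa` element (the usual UIMA convention). The function name `read_sofa_text` and the miniature sample document are hypothetical and only serve as illustration; the actual GerParCor files may be compressed or carry additional views and annotations.

```python
import io
import xml.etree.ElementTree as ET

# Namespace used by UIMA for CAS elements in XMI serializations.
CAS_NS = "http:///uima/cas.ecore"


def read_sofa_text(xmi_source):
    """Return the document text (sofaString) of the first Sofa in a UIMA XMI file.

    `xmi_source` may be a file path or a binary file-like object.
    """
    root = ET.parse(xmi_source).getroot()
    sofa = root.find(f"{{{CAS_NS}}}Sofa")
    if sofa is None:
        raise ValueError("no cas:Sofa element found in XMI document")
    return sofa.get("sofaString", "")


# Hypothetical miniature XMI document, for illustration only.
SAMPLE_XMI = """<?xml version="1.0" encoding="UTF-8"?>
<xmi:XMI xmlns:xmi="http://www.omg.org/XMI"
         xmlns:cas="http:///uima/cas.ecore" xmi:version="2.0">
  <cas:NULL xmi:id="0"/>
  <cas:Sofa xmi:id="1" sofaNum="1" sofaID="_InitialView"
            mimeType="text" sofaString="Die Sitzung ist eröffnet."/>
  <cas:View sofa="1" members=""/>
</xmi:XMI>
"""

text = read_sofa_text(io.BytesIO(SAMPLE_XMI.encode("utf-8")))
print(text)
```

For access to the NLP annotations themselves, a UIMA-aware library such as `dkpro-cassis` (listed among the dependencies) together with the matching type system is the more appropriate tool; the sketch above only extracts the plain text.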
GerParCor 2022
GerParCor 2022 is available via http://lrec2022.gerparcor.texttechnologylab.org
| # | Parliament | Sessions | From | Until | Status / Download |
| --- | --- | --- | --- | --- | --- |
| 1 | Reichstag (NG + Zoll) | 1990 | 02/25/1867 | 05/24/1895 | Download |
| 2 | Reichstag (Empire) | 2183 | 12/03/1895 | 10/26/1918 | Download |
| 3 | Weimar Republic | 1328 | 02/06/1919 | 12/09/1932 | Download |
| 4 | Third Reich | 20 | 03/21/1933 | 04/24/1942 | Download |
| 5 | Bundesrat | 1008 | 09/07/1949 | 10/08/2021 | Download |
| 6 | Bundestag | 4158 | 09/07/1949 | 09/07/2021 | Download |
| 7 | Baden-Württemberg | 412 | 06/05/1984 | 09/29/2021 | Download |
| 8 | Bayern | 2221 | 12/16/1946 | 10/14/2021 | Download |
| 9 | Berlin | 582 | 04/02/1989 | 09/16/2021 | Download |
| 10 | Brandenburg | 442 | 10/26/1990 | 08/27/2021 | Download |
| 11 | Bremen | 1102 | 07/04/1995 | 09/16/2021 | Download |
| 12 | Hamburg | 586 | 10/08/1997 | 11/03/2021 | Download |
| 13 | Hessen | 1297 | 02/04/1947 | 09/29/2021 | Download |
| 14 | Mecklenburg-Vorpommern | 659 | 10/26/1990 | 06/11/2021 | Download |
| 15 | Niedersachsen | 1109 | 06/22/1982 | 09/15/2021 | Download |
| 16 | Nordrhein-Westfalen | 2041 | 05/21/1947 | 10/08/2021 | Download |
| 17 | Rheinland-Pfalz | 1562 | 07/24/1947 | 09/22/2021 | Download |
| 18 | Saarland | 876 | 07/23/1959 | 09/15/2021 | Download |
| 19 | Sachsen | 690 | 10/27/1990 | 11/18/2021 | Download |
| 20 | Sachsen-Anhalt | 607 | 10/28/1990 | 09/17/2021 | Download |
| 21 | Schleswig-Holstein | 1776 | 02/26/1946 | 02/11/2021 | Download |
| 22 | Thüringen | 761 | 10/25/1990 | 11/19/2021 | Download |
| 23 | Liechtenstein | 504 | 03/13/1997 | 11/06/2021 | Download |
| 24 | Nationalrat (AT) | 4267 | 10/21/1918 | 05/17/2021 | Download |
| 25 | Nationalrat (CH) | 368 | 12/06/1999 | 12/09/2021 | Download |
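The From and Until columns use the month/day/year order (for example, 02/25/1867 is 25 February 1867), which is easy to misread as day/month/year. As a small sketch of how the table could be processed, the following parses three rows copied from the table above and summarizes their coverage. The helper names `parse_us_date` and `coverage` are hypothetical and not part of the GerParCor tooling.

```python
from datetime import datetime

# A few rows copied from the table: (parliament, sessions, from, until).
ROWS = [
    ("Reichstag (NG + Zoll)", 1990, "02/25/1867", "05/24/1895"),
    ("Bundestag", 4158, "09/07/1949", "09/07/2021"),
    ("Liechtenstein", 504, "03/13/1997", "11/06/2021"),
]


def parse_us_date(value):
    """Parse a date written as MM/DD/YYYY into a datetime.date."""
    return datetime.strptime(value, "%m/%d/%Y").date()


def coverage(rows):
    """Map each parliament to its session count, date range, and span in days."""
    summary = {}
    for name, sessions, start, end in rows:
        first, last = parse_us_date(start), parse_us_date(end)
        summary[name] = {
            "sessions": sessions,
            "first": first,
            "last": last,
            "days": (last - first).days,
        }
    return summary


summary = coverage(ROWS)
total_sessions = sum(entry["sessions"] for entry in summary.values())

for name, entry in summary.items():
    print(f"{name}: {entry['sessions']} sessions, {entry['first']} to {entry['last']}")
print("Total sessions in sample:", total_sessions)
```

Using an explicit format string rather than a locale-dependent parser keeps the month/day order unambiguous.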
Cite
If you use the project or the corpus, please cite it as follows:
G. Abrami, M. Bagci, L. Hammerla, and A. Mehler, “German Parliamentary Corpus (GerParCor),” in Proceedings of the Language Resources and Evaluation Conference, Marseille, France, 2022, pp. 1900-1906. [Link] [PDF]
G. Abrami, M. Bagci, and A. Mehler, “German Parliamentary Corpus (GerParCor) Reloaded,” in Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), Torino, Italy, 2024, pp. 7707-7716. [Link] [PDF]
BibTeX
```
@InProceedings{Abrami:Bagci:Hammerla:Mehler:2022,
  author    = {Abrami, Giuseppe and Bagci, Mevl\"{u}t and Hammerla, Leon and Mehler, Alexander},
  title     = {German Parliamentary Corpus (GerParCor)},
  booktitle = {Proceedings of the Language Resources and Evaluation Conference},
  month     = {June},
  year      = {2022},
  address   = {Marseille, France},
  publisher = {European Language Resources Association},
  pages     = {1900--1906},
  url       = {https://aclanthology.org/2022.lrec-1.202}
}

@inproceedings{Abrami:et:al:2024,
  address   = {Torino, Italy},
  author    = {Abrami, Giuseppe and Bagci, Mevl{\"u}t and Mehler, Alexander},
  booktitle = {Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
  editor    = {Calzolari, Nicoletta and Kan, Min-Yen and Hoste, Veronique and Lenci, Alessandro and Sakti, Sakriani and Xue, Nianwen},
  month     = {may},
  pages     = {7707--7716},
  publisher = {ELRA and ICCL},
  title     = {{G}erman Parliamentary Corpus ({G}er{P}ar{C}or) Reloaded},
  url       = {https://aclanthology.org/2024.lrec-main.681},
  year      = {2024}
}
```
Owner
- Name: Text Technology Lab
- Login: texttechnologylab
- Kind: organization
- Location: Frankfurt am Main
- Website: https://www.texttechnologylab.org
- Twitter: ttlab_ffm
- Repositories: 77
- Profile: https://github.com/texttechnologylab
The Text Technology Lab, headed by Prof. Alexander Mehler, is part of the Department of Computer Science and Mathematics at the Goethe Universität in Frankfurt.
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this corpus, please cite it as below."
authors:
  - family-names: "Abrami"
    given-names: "Giuseppe"
    orcid: "https://orcid.org/0000-0002-7084-4909"
  - family-names: "Bagci"
    given-names: "Mevlüt"
    orcid: "https://orcid.org/0009-0007-6160-5571"
  - family-names: "Mehler"
    given-names: "Alexander"
    orcid: "https://orcid.org/0000-0003-2567-7539"
title: "German Parliamentary Corpus (GerParCor)"
version: 1.0
date-released: 2024-05-22
license: AGPLv3
preferred-citation:
  authors:
    - family-names: "Abrami"
      given-names: "Giuseppe"
      orcid: "https://orcid.org/0000-0002-7084-4909"
    - family-names: "Bagci"
      given-names: "Mevlüt"
      orcid: "https://orcid.org/0009-0007-6160-5571"
    - family-names: "Mehler"
      given-names: "Alexander"
      orcid: "https://orcid.org/0000-0003-2567-7539"
  title: "{G}erman Parliamentary Corpus ({G}er{P}ar{C}or) Reloaded"
  year: 2024
  collection-title: "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)"
  publisher:
    name: "ELRA and ICCL"
  pages: "7707--7716"
  url: "https://aclanthology.org/2024.lrec-main.681"
  collection-type: proceedings
  conference:
    date-end: "2024-05-22"
    name: "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)"
  type: conference-paper
url: "https://github.com/texttechnologylab/GerParCor"
GitHub Events
Total
- Issues event: 2
- Watch event: 2
- Issue comment event: 1
- Push event: 2
- Fork event: 1
Last Year
- Issues event: 2
- Watch event: 2
- Issue comment event: 1
- Push event: 2
- Fork event: 1
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 1
- Total pull requests: 0
- Average time to close issues: about 2 months
- Average time to close pull requests: N/A
- Total issue authors: 1
- Total pull request authors: 0
- Average comments per issue: 1.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 1
- Pull requests: 0
- Average time to close issues: about 2 months
- Average time to close pull requests: N/A
- Issue authors: 1
- Pull request authors: 0
- Average comments per issue: 1.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- ziorufus (1)
Dependencies
- com.github.texttechnologylab.textimager-uima:textimager-uima-spacy a1a0b0e94fdb448eb23327c760cb06f80f75c436
- com.github.texttechnologylab.textimager-uima:textimager-uima-types 0.3.0.2
- com.github.texttechnologylab:UIMATypeSystem 1d23e466bc
- com.github.texttechnologylab:Utilities 27866a7214
- com.github.texttechnologylab:textimager-client dirty-tei-dcd9b4979e-1
- com.google.api-client:google-api-client 1.30.9
- org.codehaus.plexus:plexus-utils 2.0.6
- org.jsoup:jsoup 1.14.2
- Pillow ==9.0.0
- Pillow ==8.4.0
- bs4 *
- certifi ==2021.10.8
- cycler ==0.11.0
- dkpro-cassis *
- editdistpy ==0.1.3
- fonttools ==4.28.3
- kiwisolver ==1.3.2
- matplotlib ==3.5.1
- numpy ==1.22.1
- opencv-python ==4.5.5.62
- pdf2image ==1.16.0
- pytesseract ==0.3.8
- python-dateutil ==2.8.2
- requests *
- scipy ==1.7.3
- selenium *
- symspellpy ==6.7.6
- textract *
- tqdm ==4.62.3
- tqdm *
- urllib3 *
- xmi-reader *