https://github.com/clarin-eric/pressmint

PressMint: Interoperable Corpora of Historical Newspapers

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (6.0%) to scientific vocabulary

Keywords

clarin corpus-data historical-data linguistic-dataset newspaper tei
Last synced: 5 months ago

Repository

Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 1
  • Open Issues: 1
  • Releases: 0
Created 8 months ago · Last pushed 8 months ago
Metadata Files
  • Readme
  • Contributing

README.md

PressMint: Interoperable Corpora of Historical Newspapers

The CLARIN PressMint project plans to compile corpora of historical newspapers for a number of countries and languages.

PressMint corpora are to be interoperable, i.e. encoded to a common PressMint schema, which is a customisation of the TEI Guidelines, with various downstream formats (TSV, CoNLL-U, JSON etc.) also available. The same scripts should be able to process the common data in any PressMint corpus, even though the corpora differ in the kinds of information they include.
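As a rough illustration of what deriving a downstream format from a TEI encoding can look like, the sketch below turns a tiny TEI fragment into TSV. The sample document, the `tei_to_tsv` function and the column names are hypothetical and are not taken from the PressMint schema or scripts, which are not yet published in this repository; only the TEI namespace and the standard TEI elements `s`, `w` and `pc` are assumed.

```python
import csv
import io
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical sample, not taken from any PressMint corpus.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body><p>
    <s xml:id="s1">
      <w lemma="the">The</w>
      <w lemma="paper">papers</w>
      <w lemma="arrive">arrived</w>
      <pc>.</pc>
    </s>
  </p></body></text>
</TEI>"""


def tei_to_tsv(tei_xml: str) -> str:
    """Return one TSV row per token: sentence id, word form, lemma."""
    root = ET.fromstring(tei_xml)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["sent_id", "form", "lemma"])
    for sent in root.iter(f"{{{TEI_NS}}}s"):
        sent_id = sent.get(XML_ID, "")
        for tok in sent:
            # Words carry a lemma attribute; punctuation falls back to its form.
            form = (tok.text or "").strip()
            writer.writerow([sent_id, form, tok.get("lemma", form)])
    return out.getvalue()


print(tei_to_tsv(SAMPLE))
```

Because the function only relies on elements that any TEI-encoded, tokenised corpus could share, the same code would run unchanged on corpora that carry additional, corpus-specific markup, which is the kind of interoperability the paragraph above describes.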

The PressMint Git workflow, scripts and documentation will be based on the ParlaMint project, which builds richly annotated corpora of parliamentary proceedings for a large number of countries and autonomous regions.

This Git repository is, as yet, a stub with content still to be added. Note that there are several branches for different parts of the development.

The repository contains the following directories:

  • The Samples directory contains one subdirectory per contributing (CLARIN) country. It will eventually include samples for all variants and formats of the PressMint corpora.

Owner

  • Name: CLARIN ERIC
  • Login: clarin-eric
  • Kind: organization
  • Email: trac@clarin.eu
  • Location: Utrecht, The Netherlands

CLARIN central source code hub

GitHub Events

Total
  • Member event: 6
  • Push event: 16
  • Pull request event: 3
  • Create event: 4
Last Year
  • Member event: 6
  • Push event: 16
  • Pull request event: 3
  • Create event: 4