Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (6.9%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: HTR-School-Vienna
  • License: cc-by-sa-4.0
  • Default Branch: main
  • Size: 266 KB
Statistics
  • Stars: 0
  • Watchers: 3
  • Forks: 2
  • Open Issues: 0
  • Releases: 0
Created about 2 years ago · Last pushed over 1 year ago
Metadata Files
Readme License Citation

README.md

Description

The participants of the Winter School of Handwritten Text Recognition of Medieval Manuscripts Latin / Greek / Czech, Byzantine Greek Group, trained the Transkribus model "15th c. liturgical" for Byzantine Greek. The model was trained on images of the codex Dresden, SLUB A. 151. This repository contains a) transcriptions of 66 images set as "ground truth" and b) automatic transcriptions of the rest of the pages (to be added later). The automatic transcriptions have an error rate of 22.3%. Most of the errors are wrong accents and breathings. To keep the text searchable despite these errors, try fuzzy search. Alternatively, you can remove the accents from the txt file (e.g. https://dev.to/djemos/removeaccents-py-5dmd).
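As a rough illustration of both suggestions, the sketch below strips accents and breathings with Python's standard `unicodedata` module and does a simple word-level fuzzy match with `difflib`. It is not the script linked above. The function names `strip_diacritics` and `fuzzy_find` and the `cutoff` value are assumptions made for this example. Because it drops every combining mark, it also removes the iota subscript and diaeresis.

```python
import difflib
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove accents and breathings by dropping Unicode combining marks."""
    # NFD splits each precomposed letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


def fuzzy_find(query: str, text: str, cutoff: float = 0.8) -> list[str]:
    """Return the words of `text` that approximately match `query`.

    Diacritics and case are ignored; `cutoff` is a similarity ratio
    between 0 and 1 (hypothetical default, tune it for your data).
    """
    needle = strip_diacritics(query).casefold()
    hits = []
    for word in text.split():
        candidate = strip_diacritics(word).casefold()
        ratio = difflib.SequenceMatcher(None, needle, candidate).ratio()
        if ratio >= cutoff:
            hits.append(word)
    return hits
```

For example, `strip_diacritics("Κύριε ἐλέησον")` returns `"Κυριε ελεησον"`, and `fuzzy_find("κυριε", "Κύριε ἐλέησον")` returns `["Κύριε"]`. Applying the same normalisation to both the query and the transcription means a wrongly accented word in the automatic transcription can still be found.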

Origin of the data:

  • Images available under http://digital.slub-dresden.de/id345703170
  • Description or citation of transcription guidelines ***

Data organisation

The transcriptions of the 66 pages marked as "ground truth" are in the separate file "groundtruth". All transcriptions, whether checked and marked as ground truth by the editors or not, are in the file "alltranscriptions".

How to cite

This dataset was created by Angelos Zaloumis, Canan Arıkan-Caba, Carole Hofstetter, Eirini Afentoulidou, Ekaterini Mitsiou, Emanuele Scieri, Georgi Mitov, Konstantina Tsakona, Kyriaco Nikias, Louiza Argyriou, Panagiotis Leontaridis. The digitisation is not copyright free, but the transcription is. However, properly annotating a corpus takes time and is a task that should be recognised. If you use any item from this corpus as ground truth, cite the dataset using the following information.

Copy citation BibTeX from Zenodo

Copyright and licence

This dataset was created as part of the Winter School of Handwritten Text Recognition of Medieval Manuscripts 2023/2024 in Vienna, at the Österreichische Akademie der Wissenschaften, Institut für Mittelalterforschung. All transcriptions are licensed under the Creative Commons Attribution-ShareAlike 4.0 licence (CC BY-SA 4.0). Images were provided by the Saxon State and University Library (SLUB) and are licensed under the Public Domain Mark - No Copyright Protection.

Owner

  • Name: HTR School Vienna
  • Login: HTR-School-Vienna
  • Kind: organization

Citation (citation.cff)

cff-version: 1.2.0
message: "If you use this dataset, please cite it as below."
authors:

  - family-names: Afentoulidou
    given-names: Eirini
    orcid: https://orcid.org/0000-0001-7290-7865
  - family-names: Arıkan-Caba
    given-names: Canan
    orcid: https://orcid.org/0000-0003-3281-5178
  - family-names: Hofstetter
    given-names: Carole
    orcid: https://orcid.org/0000-0002-0961-7491
  - family-names: Mitov
    given-names: Georgi
    orcid: https://orcid.org/0000-0001-8971-4971
  - family-names: Leontaridis
    given-names: Panagiotis
    orcid: https://orcid.org/0009-0009-1820-8567
  - family-names: Argyriou
    given-names: Louiza
    orcid: https://orcid.org/0009-0004-8591-2355
  - family-names: Tsakona
    given-names: Konstantina
    orcid: https://orcid.org/0009-0002-9406-1981
  - family-names: Zaloumis
    given-names: Angelos
    orcid: https://orcid.org/0009-0008-1314-6638
  - family-names: Nikias
    given-names: Kyriaco
    orcid: https://orcid.org/0000-0002-3769-8866
  - family-names: Scieri
    given-names: Emanuele
    orcid: https://orcid.org/0009-0009-1182-1449
title: "Transkribus - 2023--byzantine-greek"
version: 1.0.0
identifiers:
  - type: doi
    value: 10.5281/zenodo.1234
date-released: 2024-01-30

GitHub Events

Total
  • Push event: 1
Last Year
  • Push event: 1