Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (6.1%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: HTRogene
  • Default Branch: main
  • Size: 330 MB
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 3
Created over 1 year ago · Last pushed about 1 year ago
Metadata Files
Readme Citation

README.md

HTRogène - Medieval Latin


Introduction

HTRogène is an exploratory project funded by Biblissima+ that aims to develop generic models for the automatic transcription of medieval and early modern manuscripts.
This repository holds the Medieval Latin corpus, providing ground-truth data for Handwritten Text Recognition (HTR) and layout segmentation.
The dataset is intended to support the training of robust HTR models for Latin manuscripts.

| Shelfmark | Links | Type | Century | Color Pages | Main Zones | Lines | Characters | Genre |
|---|---|---|---|---|---|---|---|---|
| Paris, BnF, lat. 17226 | B | prose | 7 | | 20 | 602 | 7085 | Narratives |
| Saint-Omer, BM 764 | B | prose | 9 | | 10 | 253 | 9582 | Narratives |
| Angers, Archives départementales de Maine-et-Loire - H(039) 2 n° 176-177 | | prose | 11 | | 1 | 37 | 3341 | Documents of practice |
| Angers, Archives départementales de Maine-et-Loire - H(045) 1 n° 4 | | prose | 11 | | 1 | 52 | 2776 | Documents of practice |
| Le Havre, Bm 332 | B | prose | 11 | | 10 | 336 | 13387 | Narratives |
| Semur-en-Auxois, Bibliothèque municipale, Ms. 1 | B | prose | 11 | | 15 | 230 | 9555 | Narratives |
| Archives départementales des Yvelines, 45H8 17 | | prose | 12 | | 1 | 41 | 1662 | Documents of practice |
| Archives départementales des Yvelines, 45H8 8 | | prose | 12 | | 1 | 7 | 350 | Documents of practice |
| Besançon, Bibliothèque diocésaine - Par. 03 | | prose | 12 | | 2 | 23 | 2163 | Documents of practice |
| Bruges. Bibliothèque publique, Ms. 403 | B | prose | 12 | | 10 | 445 | 16693 | Narratives |
| Laval, Archives de la Mayenne, H 154 | | prose | 12 | | 9 | 388 | 10372 | Documents of practice |
| Liege, Archives de l'État, T51.12 | | prose | 12 | | 1 | 5 | 295 | Documents of practice |
| Liege, Archives de l'État, T51.13 | | prose | 12 | | 1 | 25 | 1925 | Documents of practice |
| Liege, Archives de l'État, T51.14 | | prose | 12 | | 1 | 11 | 913 | Documents of practice |
| Paris, Bibliothèque de l'Ecole nationale supérieure des Beaux-Arts - Mn.Mas 38 | B | prose | 12 | | 1 | 39 | 2130 | Documents of practice |
| Auxerre, Archives départementales de l'Yonne - H 2404 | B | prose | 13 | | 1 | 13 | 1086 | Documents of practice |
| Cambridge, Corpus Christi College, MS 29 | B | prose | 13 | | 19 | 1608 | 41477 | Narratives |
| Graz, Universitätsbibliothek, Ms. 1265 | | prose | 13 | | 20 | 868 | 33793 | Treatises |
| Nice, AM, AA 1/04 | | prose | 13 | | 1 | 9 | 811 | Documents of practice |
| Saint-Omer, BM 716, Tome 7 | B | prose | 13 | | 10 | 445 | 16780 | Narratives |
| Paris, Archives nationales, LL 106B | | prose | 14 | | 12 | 171 | 4359 | Documents of practice |
| Paris, Archives nationales, LL 108A | | prose | 14 | | 25 | 561 | 15275 | Documents of practice |
| Paris, Archives nationales, LL 108B | | prose | 14 | | 26 | 372 | 12401 | Documents of practice |
| Paris, BnF, lat. 15168 | B | prose | 14 | | 20 | 887 | 28072 | Treatises |
| Paris, Archives nationales, LL 110 | | prose | 15 | | 17 | 305 | 9405 | Documents of practice |
| Paris, Archives nationales, LL 125 | | prose | 15 | | 10 | 436 | 18555 | Documents of practice |
| Paris, BIU Santé, Médecine, 5119 | B | prose | 15 | | 11 | 331 | 9880 | Treatises |
| Ghent, UL, HS.0011 | B | prose | 16 | | 15 | 1128 | 31383 | Treatises |
| Paris, BnF, Smith-Lesouëf 35 | B | prose | 16 | | 6 | 127 | 4600 | Treatises |

Dataset Overview

The dataset comprises carefully selected manuscripts, each containing approximately 10 columns of text (equivalent to 5 bi-column pages or 10 single-column pages).
The data adheres to the Segmonto guidelines, ensuring consistency and compatibility with other datasets following the same standards.
Each image is accompanied by two XML files:

  • Files suffixed with .chocomufin.xml are normalized for compliance with broader datasets.
  • The other XML files contain repository-specific information.

We recommend using the normalized .chocomufin.xml files for most applications.
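As a minimal sketch of how the two kinds of XML file might be told apart when loading the dataset, the snippet below walks a local checkout and separates files by suffix. The function name `split_xml_files` and the `data` directory are hypothetical; only the `.chocomufin.xml` suffix comes from this README.

```python
from pathlib import Path

# Suffix used by the normalized files, as described in this README.
NORMALIZED_SUFFIX = ".chocomufin.xml"


def split_xml_files(root):
    """Return (normalized, repository_specific) lists of XML paths under root."""
    normalized, specific = [], []
    # rglob searches all subdirectories; sorting keeps the order deterministic.
    for path in sorted(Path(root).rglob("*.xml")):
        if path.name.endswith(NORMALIZED_SUFFIX):
            normalized.append(path)
        else:
            specific.append(path)
    return normalized, specific


if __name__ == "__main__":
    # "data" is a placeholder for wherever the repository was cloned.
    normalized, specific = split_xml_files("data")
    print(f"{len(normalized)} normalized files, {len(specific)} repository-specific files")
```

Matching on the full suffix rather than on `Path.suffix` matters here, because `Path.suffix` would return `.xml` for both kinds of file.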

Total number of pages

133

Regions

  • MainZone (277)
  • TableZone (32)
  • MarginTextZone (261)
  • StampZone (14)
  • NumberingZone (84)
  • SealZone (3)
  • DropCapitalZone (129)
  • RunningTitleZone (38)
  • DigitizationArtefactZone (56)
  • GraphicZone (5)
  • DamageZone (14)

Lines

  • DefaultLine (9179)
  • HeadingLine (302)
  • InterlinearLine (274)
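Region and line counts like those above could be recomputed from the annotation files. The sketch below assumes the files are ALTO XML in which Segmonto labels are declared as `OtherTag` elements and referenced through `TAGREFS` attributes; this README does not state the file format, so treat that layout, the helper `count_segmonto_labels`, and the inline `SAMPLE` document as assumptions for illustration only.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local(tag):
    """Strip the XML namespace from an element tag."""
    return tag.rsplit("}", 1)[-1]


def count_segmonto_labels(alto_xml):
    """Count zone and line labels in one ALTO document given as a string."""
    root = ET.fromstring(alto_xml)
    # Map tag IDs (e.g. "BT1") to their labels (e.g. "MainZone").
    labels = {
        el.get("ID"): el.get("LABEL")
        for el in root.iter()
        if local(el.tag) == "OtherTag"
    }
    zones, lines = Counter(), Counter()
    for el in root.iter():
        name = local(el.tag)
        if name not in ("TextBlock", "TextLine"):
            continue
        for ref in (el.get("TAGREFS") or "").split():
            label = labels.get(ref)
            if label is not None:
                (zones if name == "TextBlock" else lines)[label] += 1
    return zones, lines


# Hypothetical miniature document, not taken from the dataset.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Tags>
    <OtherTag ID="BT1" LABEL="MainZone"/>
    <OtherTag ID="BT2" LABEL="NumberingZone"/>
    <OtherTag ID="LT1" LABEL="DefaultLine"/>
    <OtherTag ID="LT2" LABEL="HeadingLine"/>
  </Tags>
  <Layout><Page><PrintSpace>
    <TextBlock ID="b1" TAGREFS="BT1">
      <TextLine ID="l1" TAGREFS="LT2"/>
      <TextLine ID="l2" TAGREFS="LT1"/>
      <TextLine ID="l3" TAGREFS="LT1"/>
    </TextBlock>
    <TextBlock ID="b2" TAGREFS="BT2">
      <TextLine ID="l4" TAGREFS="LT1"/>
    </TextBlock>
  </PrintSpace></Page></Layout>
</alto>"""

zones, lines = count_segmonto_labels(SAMPLE)
print(dict(zones))
print(dict(lines))
```

If the files turn out to be PAGE XML instead, the label lookup would need to read the region and line type information from that schema rather than from `TAGREFS`.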

Funding and Support

This project is funded by Biblissima+, an observatory for medieval and Renaissance written cultural heritage.
Biblissima+ focuses on the study of the circulation of books and the transmission of texts from the 8th to 18th centuries.
Learn more at the Biblissima+ project page.

License

This dataset is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).
You are free to share and adapt the material, provided appropriate credit is given.

Citation

If you use this dataset in your research, please cite it as follows:

Hermand, F., Brootcorne, M., Vlachou-Efstathiou, M., Boschetti, F., Fischer, F., Chagué, A., & Clérice, T. HTRogène, Medieval Latin corpus of ground-truth for Handwritten Text Recognition and Layout Segmentation [Data set]. https://github.com/HTRogene/latin

Acknowledgments

We extend our gratitude to the transcribers and supervisors who contributed to the creation of this dataset.

Special thanks to Biblissima+ for their financial support and commitment to advancing the study of medieval manuscripts.

For more information about the HTRogène project and other related resources, please visit the Biblissima+ project page.

Owner

  • Name: HTRogene
  • Login: HTRogene
  • Kind: organization

GitHub Events

Total
  • Release event: 3
  • Watch event: 1
  • Push event: 6
  • Public event: 1
  • Create event: 3
Last Year
  • Release event: 3
  • Watch event: 1
  • Push event: 6
  • Public event: 1
  • Create event: 3