Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.8%) to scientific vocabulary
Last synced: 9 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: HTRomance-Project
  • Default Branch: main
  • Size: 369 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 10
Created over 3 years ago · Last pushed over 1 year ago
Metadata Files
Readme Citation

README.md

HTRomance, Medieval French corpus of ground-truth for Handwritten Text Recognition and Layout Segmentation


Introduction

This ground-truth dataset was built to provide generic data for training a strong and reliable model for HTR of Latin manuscripts. Each manuscript should contribute around 10 columns (5 two-column pages or 10 single-column pages).

Data follow the Segmonto guidelines.

[!NOTE] The repository contains two XML files per image. The ones suffixed with .chocomufin.xml are normalized in order to be compliant with other datasets following the same guidelines. The others are more specific to this repository. We recommend using the normalized documents.

Credits

  • Transcriptions: Noé Leroy
  • Supervision and manuscript selection: Ariane Pinche & Jean-Baptiste Camps
  • Project management: Thibault Clérice & Alix Chagué

Transcription guidelines

The transcription guidelines are described in a paper available on HAL and published in the Journal for Open Humanities Data. The paper details the selection process, the transcription methods and choices, and the output (mainly the Generic CREMMA Model for Medieval Manuscripts (Latin and Old French) for Kraken).

Data

ALTO and images can be found in the directory called data/. Each subfolder of data/ corresponds to a single manuscript, identified by its shelfmark.
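The layout described above, combined with the `.chocomufin.xml` suffix of the normalized files, can be used to collect the recommended files per manuscript. The following is a minimal sketch, not part of the repository: the function name `normalized_files_by_manuscript` is hypothetical, and it assumes the XML files sit directly inside each manuscript subfolder of `data/`.

```python
from collections import defaultdict
from pathlib import Path

# Suffix of the normalized ALTO files, as stated in the note above.
NORMALIZED_SUFFIX = ".chocomufin.xml"


def normalized_files_by_manuscript(data_dir):
    """Map each manuscript folder name to its sorted normalized ALTO files.

    Assumption: XML files are stored directly in data/<manuscript>/,
    with no deeper nesting.
    """
    files = defaultdict(list)
    for path in sorted(Path(data_dir).glob("*/*" + NORMALIZED_SUFFIX)):
        files[path.parent.name].append(path)
    return dict(files)


if __name__ == "__main__":
    # Print how many normalized files each manuscript folder contains.
    for manuscript, paths in normalized_files_by_manuscript("data").items():
        print(manuscript, len(paths))
```

Files without the `.chocomufin.xml` suffix (the repository-specific XML and the images) are ignored by the glob pattern.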

| Shelfmark | Links | Range | Type | Century | Color | Pages | Main Zones | Lines | Characters | Genre | Content |
|---|---|---|---|---|---|---|---|---|---|---|---|
| BnF, NAF 23686 | | 112ra-114rb | prose | 13 | | 5 | 10 | 424 | 17817 | | Vie de saint Alexis |
| BnF, fr. 1443 | | 1ra-3rb | vers | 13 | | 5 | 11 | 418 | 10829 | | Garin le Loherain |
| BnF, fr. 1553 | | 506r-508v | vers | 13 | | 5 | 10 | 506 | 11154 | | Le Meunier d'Arleux |
| BnF, fr. 1635 | | fol. 4v-5v | vers | 13 | | 3 | 7 | 217 | 4833 | | Testament de l'âne |
| BnF, fr. 12581 | | 373r-375v | vers | 13 | | 4 | 8 | 306 | 9289 | | Li Fabliaus des Treces |
| BnF, fr. 20050 | | 4r-5v | vers | 13 | | 4 | 4 | 84 | 3793 | | Le chansonnier de saint Germain |
| BnF, fr. 1669 | | 1r-3v | prose | 13 | | 3 | 11 | 484 | 10183 | Narratives | roman |
| BnF, fr. 747 | | 1v-2v | prose | 13 | | 1 | 2 | 91 | 4351 | | Estoire du Roman del Saint Graal |
| BnF, fr. 104 | | 1r-2v | prose | 13 | | 4 | 10 | 404 | 15398 | | Roman de Tristan |
| BnF, fr. 2168 | | 88rb | vers | 13 | | 5 | 10 | 370 | 7964 | | Le sacristain |
| BnF, NAF 10039 | | 1r-3r | verse | 13 | | 4 | 4 | 118 | 3165 | | Roman d'Aspremont |
| BnF, fr. 1450 | | 1r-2v | verse | 13 | | 4 | 14 | 711 | 14855 | | Roman de Troie |
| BnF, fr. 17229 | | 127r-129r | prose | 13 | | 3 | 12 | 479 | 12511 | | Legendier |
| BnF, fr. 23117 | | 299vc-304rb | prose | 13 | | 5 | 21 | 736 | 19858 | | Vie de saint Martin |
| BnF, fr. 6447 | | 270r-271v | prose | 13 | | 4 | 8 | 383 | 13246 | | Vie de saint Martin |
| BnF, fr. 2173 | | 96r-97v | vers | 13 | | 4 | 8 | 240 | 5269 | | La Mal Honte |
| BnF, fr. 19152 | | 120vd-122rc | vers | 13 | | 4 | 13 | 529 | 11087 | | C'est li Romanz des Braies |
| BnF, fr. 12615 | | 230v-231r | vers | 13 | | 2 | 5 | 65 | 3336 | | chansonnier de Noailles _ Chanson d'amour d'Adam le bossu |
| BnF, fr. 12603 | | 203ra-205ra | vers | 13 | | 5 | 16 | 442 | 14125 | | Fierabras |
| BnF, fr. 12554 | | 1r-2v | prose | 14 | | 4 | 4 | 184 | 7183 | Narratives | roman |
| BnF, fr. 5024 | | 1r-3r | prose | 14 | | 4 | 16 | 238 | 10631 | | Le formulaire d'Odart Morchesne |
| BnF, fr. 13568 | | 1r-3v | historique | 14 | | 5 | 10 | 199 | 3373 | | Mémoires de saint Louis |
| BnF, fr. 13568 | | 1r-5r | prose | 14 | | 5 | 10 | 199 | 3373 | | Mémoires de Froissart |
| BnF, fr. 574 | | 4v-5v | religieux | 14 | | 3 | 6 | 116 | 2026 | | Image du monde |
| BnF, Arsenal, ms. 3525 | | 88v-91v | vers | 14 | | 7 | 8 | 185 | 4377 | | Dit des trois Dames de Paris_ |
| BnF, fr. 12558 | | 1ra-3ra | vers | 14 | | 5 | 10 | 440 | 14016 | | Chevalier du cygne |
| BnF, fr. 840 | | 266r-267v | didactique | 14 | | 4 | 8 | 256 | 6374 | | Art de Dictier |
| BnF, fr. 619 | | 1ra-4vb | prose | 14 | | 6 | 12 | 356 | 11147 | | Gaston Phébus, Livre de chasse |
| BnF, Arsenal, ms. 3346 | | 1r-3v | prose | 15 | | 5 | 11 | 286 | 7194 | | Garin le lorrain |
| BnF, fr. 1357 | | 1v-5r | prose | 15 | | 4 | 10 | 320 | 12682 | | Simon de Phares, Recueil des plus celebres astrologues |
| BnF, fr. 2701 | | 121r-121v | prose | 15 | | 2 | 11 | 551 | 11654 | | Epitre de Juvénal des ursins |
| BnF, Arsenal, ms. 3350 | | 1v-3v | - | 15 | | 4 | 8 | 271 | 7959 | Narratives | _ |
| BnF, fr. 11610 | | 1r-4r | prose | 15 | | 7 | 8 | 166 | 5431 | | Roman du comte d'Artois. |
| BnF, fr. 1881 | | 93r-95r | verse | 16 | | 4 | 11 | 194 | 3941 | Narratives | Vie de saint Alexis |
| BnF, fr. 1881 | | 93r-96r | vers | 16 | | 4 | 11 | 194 | 3941 | Narratives | chanson |

Metrics

Total number of pages

147

Regions

  • MainZone (338)
  • DropCapitalZone (220)
  • NumberingZone (70)
  • RunningTitleZone (15)
  • MarginTextZone (41)
  • StampZone (12)
  • DamageZone (4)
  • GraphicZone (17)
  • QuireMarksZone (2)
  • MusicZone (6)

Lines

  • DefaultLine (11045)
  • HeadingLine (75)
  • InterlinearLine (38)
  • DropCapitalLine (4)
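Counts like the ones above can in principle be recomputed from the ALTO files. The sketch below is an illustration under assumptions, not the repository's own tooling: it assumes the files follow the common ALTO 4 pattern in which zone and line types are declared as `OtherTag` elements (with `ID` and `LABEL` attributes) and referenced from `TextBlock` and `TextLine` elements through `TAGREFS`. The function name `count_segmonto_labels` is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def _local(tag):
    """Strip the XML namespace from an element tag."""
    return tag.rsplit("}", 1)[-1]


def count_segmonto_labels(alto_xml):
    """Count region and line labels in one ALTO document (str or bytes).

    Assumption: labels are declared as <OtherTag ID=... LABEL=...> and
    referenced through the TAGREFS attribute of TextBlock / TextLine.
    """
    root = ET.fromstring(alto_xml)
    # Map tag identifiers to their human-readable labels.
    labels = {
        el.get("ID"): el.get("LABEL")
        for el in root.iter()
        if _local(el.tag) == "OtherTag"
    }
    regions, lines = Counter(), Counter()
    for el in root.iter():
        kind = _local(el.tag)
        if kind not in ("TextBlock", "TextLine"):
            continue
        target = regions if kind == "TextBlock" else lines
        # TAGREFS may hold several space-separated identifiers.
        for ref in (el.get("TAGREFS") or "").split():
            if ref in labels:
                target[labels[ref]] += 1
    return regions, lines
```

Summing the two counters over every file of the corpus would give corpus-level totals. Whether they match the figures listed here depends on which of the two XML variants is read and on how the published metrics were produced, neither of which this sketch can confirm.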

Funding

This project was funded by the Bibliothèque nationale de France through the Datalab's 2022 call for projects for 2023.

Cite the project

Clérice, T., Chagué, A., Gille-Levenson, M., Brisville-Fertin, O., Pinche, A., Camps, J., Fischer, F., Boschetti, F., Guadagnini, E., Guilhem Couffignal, G., Canteaut, O., Romary, L., Reboul, M., Perreaux, N., Poibeau, T., Smith, M., Norindr, J., Glaise, A., Navas Farré, M., Bordier, J., Leroy, N., Alba, R., & Rubin, G. HTRomance [Data set]. https://htromance-project.github.io/

@misc{Clerice_HTRomance,
  author = {Clérice, Thibault and Chagué, Alix and Gille-Levenson, Matthias and Brisville-Fertin, Olivier and Pinche, Ariane and Camps, Jean-Baptiste and Fischer, Franz and Boschetti, Federico and Guadagnini, Elisa and Guilhem Couffignal, Gilles and Canteaut, Olivier and Romary, Laurent and Reboul, Marianne and Perreaux, Nicolas and Poibeau, Thierry and Smith, Marc and Norindr, Jade and Glaise, Anthony and Navas Farré, Marina and Bordier, Julie and Leroy, Noé and Alba, Rachele and Rubin, Giorgia},
  title = {{HTRomance}},
  url = {https://htromance-project.github.io/}
}

Infrastructure

This project relied on the CREMMA infrastructure.

Owner

  • Name: HTRomance-Project
  • Login: HTRomance-Project
  • Kind: organization

GitHub Events

Total
  • Release event: 1
  • Push event: 6
  • Create event: 1
Last Year
  • Release event: 1
  • Push event: 6
  • Create event: 1

Dependencies

.github/workflows/build.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
  • andymckay/get-gist-action master composite
.github/workflows/release.yml actions
  • actions/checkout v2 composite
  • rymndhng/release-on-push-action master composite
.github/workflows/test.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
.github/workflows/validate-cff.yml actions
  • actions/checkout v3 composite
  • dieghernan/cff-validator v3 composite