Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (4.8%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: DEEDS-Project
  • Language: Python
  • Default Branch: main
  • Size: 321 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 11
Created almost 3 years ago · Last pushed over 1 year ago
Metadata Files
Readme Citation

README.md

DEEDS HTR Dataset


A collection of transcriptions of medieval Latin cartularies, used to train HTR models to read medieval texts.

Context / Project

  • This repository houses a compilation of transcriptions from medieval Latin cartularies, curated as training data for HTR models that read medieval charter texts.
  • This dataset was created using eScriptorium, as an interface for HTR ground truth production, and Kraken, an HTR and layout engine.
  • Composed primarily of English cartulary material from the 9th to the 16th centuries
  • Focuses more on material from the 10th to the 15th centuries, to improve the model's ability to read more complex scripts (earlier scripts are already read relatively well)
  • The dataset is built mostly from pre-existing transcriptions
    • Supplemented with machine-generated transcriptions that were corrected manually
  • The dataset comprises transcription data from the following cartularies/manuscripts:
    • British Library, Cotton MS Nero E VI
    • Bibliothèque Virtuelle des Manuscrits Médiévaux, Cartulaire de l'abbaye du Mont-Saint-Michel, AVRANCHES, Bibliothèque municipale, 0210, 1154-1158
    • Bodleian Library, Founder's and Benefactors' book of Tewkesbury Abbey, Bodleian Library MS. Top. Glouc. d. 2, 16th century
    • Christ Church Library, Cartulary of Eynsham Abbey, Christ Church MS 341, 1196-1197
    • Thomas Fisher Rare Book Library, Confirmatio chartarum, Charta de foresta, et alia statuta regum Henrici III, Edwardi I, et Edwardi II, fisher2:134, 1316-1422
    • British Library, Cartulary of Reading Abbey, Egerton MS 3031, 1190-1199
    • Archives Departementales de Saone-et-Loire, Cartulary of the bishopric of Autun, G 443, 13th century
    • Trinity College Dublin, Cartulary of Torre Abbey, IE TCD MS 524, 13th century
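To inspect the ground truth, for example to tally files, regions, lines and characters, a short script can walk the transcription files. This is a minimal sketch, assuming the transcriptions are stored as ALTO XML (one of the formats eScriptorium exports and Kraken trains on) with each line's text in the `CONTENT` attribute of its `String` elements. The folder path and the helper names `AltoStats`, `count_alto` and `collect_stats` are hypothetical, not part of the repository.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from pathlib import Path


def local_name(tag: str) -> str:
    """Strip the XML namespace, e.g. '{...alto/ns-v4#}TextLine' -> 'TextLine'."""
    return tag.rsplit("}", 1)[-1]


@dataclass
class AltoStats:
    files: int = 0
    regions: int = 0
    lines: int = 0
    characters: int = 0


def count_alto(root: ET.Element, stats: AltoStats) -> None:
    """Add the regions, lines and characters of one ALTO document to stats."""
    stats.files += 1
    for el in root.iter():
        name = local_name(el.tag)
        if name == "TextBlock":  # a layout region
            stats.regions += 1
        elif name == "TextLine":
            stats.lines += 1
        elif name == "String":
            # Only CONTENT is counted: spaces encoded as separate SP
            # elements between words are not included.
            stats.characters += len(el.get("CONTENT", ""))


def collect_stats(folder: str) -> AltoStats:
    """Assumption: every *.xml file under folder is an ALTO transcription."""
    stats = AltoStats()
    for path in sorted(Path(folder).rglob("*.xml")):
        count_alto(ET.parse(path).getroot(), stats)
    return stats
```

For example, `collect_stats("data/")` (a hypothetical path) would return the four totals. Matching on local tag names rather than a fixed namespace keeps the count independent of the ALTO schema version.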

Dataset

| Shelfmark | Holding Institution | Script | Folios | Content |
|---|---|---|---|---|
| British Library, Cotton MS Nero E VI | British Library | Secretary (double-compartment a, looped d and tironian et, forked r, final s is a closed loop) | 200r-204r; 205r; 206r-209r; 210r; 211r; 212r; 213r | Cartulary of the Order of the Hospital of St. John of Jerusalem in England (title); produced on behalf of the prior and brethren of the Hospital of St. John of Jerusalem and begun under Grand Prior Robert Botyll; Clerkenwell, Essex, England (composed at); Latin (language); vellum (material). Content: Prima Camera of the Hospitaller Cartulary (ff. 3-288), counties: Reynham (200r), Morehall (200v), Westhurrok/Purflete (201-203v), Turrok Grey (204), Chaureth (205-214v). Two hands: the first until fol. 204 (Turrok Grey), the second from Chaureth (205r) onwards. |
| AVRANCHES, Bibliothèque municipale, 0210 | Bibliothèque Virtuelle des Manuscrits Médiévaux | Praegothica | 5r-9v | Cartulaire du Mont Saint Michel (title); a Montois monk or abbot (author); Mont St Michel, France (composed at); Latin (language); parchment (material). Content: original part of the cartulary (ff. 1-108), literary text of the Revelatio (ff. 5-10). |
| Bodleian Library MS. Top. Glouc. d. 2 | Bodleian Library, University of Oxford | 6v-7v, 12r: Secretary; 9r-11v: imitation of a 13th-century hand (cursiva anglicana) | 6v-7v; 9r-12r | Founder's and benefactors' book of Tewkesbury Abbey (title); several hands; Tewkesbury, Gloucestershire, Benedictine abbey of St Mary the Virgin, England (composed at); Latin (language); parchment (material). Content: charter of William fitz Count, with his portrait (ff. 6-7v); chronicle of founders and benefactors (ff. 8-40v). |
| Christ Church MS 341 | Christ Church, University of Oxford | Late protogothic bookhands | 11r; 13v-14r; 15v; 16r; 21r | Cartulary of Eynsham Abbey (title); monks of Eynsham (authors); Eynsham, England (composed at); Latin (language); parchment (material). Rubric: Carta Regis Æthelredi de fundatione huius Ecclesie (ff. 7-45), cartulary of Eynsham abbey. |
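The folio references in the table above mix single sides and ranges in recto (r) / verso (v) notation, so `6v-7v` covers 6v, 7r and 7v. A minimal sketch for expanding such references into individual folio sides, assuming every range runs over consecutive sides and entries are separated by semicolons or commas; `expand_folios` and its helpers are hypothetical names, not part of the repository.

```python
import re
from typing import List

# One folio side: a folio number followed by r (recto) or v (verso), e.g. "204r".
SIDE = re.compile(r"^(\d+)([rv])$")


def side_index(side: str) -> int:
    """Map '1r' -> 0, '1v' -> 1, '2r' -> 2, ... so sides can be enumerated."""
    m = SIDE.match(side.strip())
    if m is None:
        raise ValueError(f"not a folio side: {side!r}")
    folio, face = int(m.group(1)), m.group(2)
    return (folio - 1) * 2 + (0 if face == "r" else 1)


def index_to_side(i: int) -> str:
    """Inverse of side_index: 0 -> '1r', 1 -> '1v', 2 -> '2r', ..."""
    return f"{i // 2 + 1}{'r' if i % 2 == 0 else 'v'}"


def expand_folios(spec: str) -> List[str]:
    """Expand e.g. '200r-204r; 205r' into every folio side it covers, in order."""
    sides: List[str] = []
    for part in re.split(r"[;,]", spec):
        part = part.strip()
        if not part:  # tolerate trailing separators such as '213r;'
            continue
        if "-" in part:
            start, end = (side_index(p) for p in part.split("-", 1))
            sides.extend(index_to_side(i) for i in range(start, end + 1))
        else:
            sides.append(index_to_side(side_index(part)))
    return sides
```

For instance, `expand_folios("13v-14r")` yields `['13v', '14r']`. Mapping each side to a single integer makes a range a plain `range()` over consecutive sides, which avoids special-casing ranges that start on a verso.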

Collaborators

  • École Pratique des Hautes Études
  • École Nationale des Chartes
  • ALMAnaCH, Inria

Funding

This project is funded by the Social Sciences and Humanities Research Council of Canada (SSHRC) through an Insight Grant (code 1350911) for the project Text as Image, Image as Text: Charter integrity and topic modelling.

Additional funding is provided by the University of Toronto's Work-Study program through its Career & Co-Curricular Learning Network.

Transcription guidelines

The project follows the transcription guidelines of the CREMMA Medieval datasets, which are described in a paper available on HAL.

Owner

  • Name: DEEDS-Project
  • Login: DEEDS-Project
  • Kind: organization

GitHub Events

Total
  • Release event: 3
  • Push event: 6
  • Create event: 3
Last Year
  • Release event: 3
  • Push event: 6
  • Create event: 3

Dependencies

.github/workflows/build.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
  • andymckay/get-gist-action master composite
.github/workflows/release.yml actions
  • actions/checkout v2 composite
  • rymndhng/release-on-push-action master composite
.github/workflows/test.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
.github/workflows/validate-cff.yml actions
  • actions/checkout v3 composite
  • dieghernan/cff-validator v3 composite