htr-dataset
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (4.8%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: DEEDS-Project
- Language: Python
- Default Branch: main
- Size: 321 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 11
Metadata Files
README.md
DEEDS HTR Dataset
A collection of transcriptions for medieval cartularies written in Latin used in the training of HTR models to read medieval texts.
Context / Project
- This repository houses a compilation of transcriptions from medieval cartularies written in Latin, specifically curated to serve as training data for HTR models dedicated to deciphering medieval charter texts written in Latin.
- This dataset was created using eScriptorium, as an interface for HTR ground truth production, and Kraken, an HTR and layout engine.
- Composed primarily of English cartulary material ranging from the 9th to the 16th centuries
- Focuses more heavily on material from the 10th to the 15th centuries, to develop the machine's capacity to read more complex scripts (the machine already performs relatively well on earlier scripts)
- Dataset is mostly made from pre-existing transcriptions
- Supplemented with machine-generated transcriptions that were corrected manually
- This dataset comprises transcription data from the following cartularies/manuscripts:
- British Library, Cotton MS Nero E VI
- Bibliothèque Virtuelle des Manuscrits Médiévaux, Cartulaire de l'abbaye du Mont-Saint-Michel, AVRANCHES, Bibliothèque municipale, 0210, 1154-1158
- Bodleian Library, Founder's and Benefactors' book of Tewkesbury Abbey, Bodleian Library MS. Top. Glouc. d. 2, 16th century
- Christ Church Library, Cartulary of Eynsham Abbey, Christ Church MS 341, 1196-1197
- Thomas Fisher Rare Book Library, Confirmatio chartarum, Charta de foresta, et alia statuta regum Henrici III, Edwardi I, et Edwardi II, fisher2:134, 1316-1422
- British Library, Cartulary of Reading Abbey, Egerton MS 3031, 1190-1199
- Archives Departementales de Saone-et-Loire, Cartulary of the bishopric of Autun, G 443, 13th century
- Trinity College Dublin, Cartulary of Torre Abbey, IE TCD MS 524, 13th century
Dataset
| Shelfmark | Holding Institution | Script | Folios | Content |
|---|---|---|---|---|
| British Library, MS_Nero E VI | British Library | Secretary (double-compartment a, looped d and tironian et, forked r, final s is a closed loop) | 200r-204r; 205r; 206r-209r; 210r; 211r; 212r; 213r | Cartulary of the Order of the Hospital of St. John of Jerusalem in England (title); behalf of the prior and brethren of the Hospital of St. John of Jerusalem and begun under Grand Prior Robert Botyll (produced on); Clerkenwell, Essex, England (composed at); Latin (language); vellum (material). Content: Prima Camera of the Hospitaller Cartulary (ff. 3-288), counties: Reynham (200r), Morehall (200v), Westhurrok/Purflete (201-203v), Turrok Grey (204), Chaureth (205-214v). Two hands: first hand until fol. 204 (Turrok Grey), second hand from Chaureth (205r). |
| AVRANCHES, Bibliothèque municipale, 0210 | Bibliothèque Virtuelle des Manuscrits Médiévaux | Praegothica | 5r-9v | Cartulaire du Mont Saint Michel (title); montois monk or abbot (author); Mont St Michel, France (composed at); Latin (language); parchment (material). Content: original part of the cartulary (ff. 1-108), literary text of the Revelatio (ff. 5-10). |
| Bodleian Library MS. Top. Glouc. d. 2 | Oxford Library | 6v-7v; 12r: Secretary. 9r-11v: imitates a 13th c. hand: cursiva anglicana imitation | 6v-7v; 9r-12r | Founder's and benefactors' book of Tewkesbury Abbey (title); several hands; Tewkesbury, Gloucestershire, Benedictine abbey of St Mary the Virgin, England (composed at); Latin (language); parchment (material). Charter of William fitz Count, with his portrait (ff. 6-7v); Chronicle of founders and benefactors (ff. 8-40v). |
| Christ Church MS 341 | Christ Church, University of Oxford | Late protogothic bookhands | 11r; 13v-14r; 15v, 16r; 21r | Cartulary of Eynsham Abbey (title); Monks of Eynsham (authors); Eynsham, England (composed at); Latin (language); parchment (material). Rubric: Carta Regis Æthelredi de fundatione huius Ecclesie (ff. 7-45), cartulary of Eynsham abbey. |
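The folio ranges listed in the table (for example `6v-7v; 9r-12r`) are compact, so it can help to expand them into individual page sides when matching transcriptions to images. The following is a minimal sketch, not part of the repository: the function name `expand_folios` is hypothetical, and it assumes every entry carries an explicit `r` (recto) or `v` (verso) suffix and that entries are separated by `;` or `,`.

```python
import re

# A folio side is a number followed by r (recto) or v (verso), e.g. "13v".
FOLIO = re.compile(r"^(\d+)([rv])$")


def _index(folio):
    """Map a folio side to a sequential index (recto before verso)."""
    match = FOLIO.match(folio.strip())
    if match is None:
        raise ValueError(f"not a folio side: {folio!r}")
    number, side = int(match.group(1)), match.group(2)
    return 2 * number + (1 if side == "v" else 0)


def _label(index):
    """Inverse of _index: turn a sequential index back into e.g. '13v'."""
    return f"{index // 2}{'v' if index % 2 else 'r'}"


def expand_folios(spec):
    """Expand a spec such as '6v-7v; 9r-12r' into a list of folio sides."""
    sides = []
    for part in re.split(r"[;,]", spec):
        part = part.strip()
        if not part:  # tolerate a trailing separator
            continue
        start, _, end = part.partition("-")
        first = _index(start)
        last = _index(end) if end else first
        if last < first:
            raise ValueError(f"descending range: {part!r}")
        sides.extend(_label(i) for i in range(first, last + 1))
    return sides


if __name__ == "__main__":
    print(expand_folios("6v-7v"))  # ['6v', '7r', '7v']
```

Ranges without a side suffix, such as the `ff. 201-203v` references in the content descriptions, are deliberately rejected with a `ValueError` rather than guessed at.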
Collaborators
- École Pratique des Hautes Études
- École Nationale des Chartes
- ALMAnaCH, Inria
Funding
This project is funded by the Social Sciences and Humanities Research Council of Canada (SSHRC), under the project "Text as Image, Image as Text: Charter integrity and topic modelling", an Insight Grant under the code 1350911.
Additional funding is provided by the University of Toronto's Work-Study program through its Career & Co-Curricular Learning Network.
Transcription guidelines
The transcription guidelines are described in a paper available on HAL: the project follows the guidelines from the CREMMA Medieval datasets.
Owner
- Name: DEEDS-Project
- Login: DEEDS-Project
- Kind: organization
- Repositories: 1
- Profile: https://github.com/DEEDS-Project
GitHub Events
Total
- Release event: 3
- Push event: 6
- Create event: 3
Last Year
- Release event: 3
- Push event: 6
- Create event: 3
Dependencies
- actions/checkout v2 composite
- actions/setup-python v2 composite
- andymckay/get-gist-action master composite
- actions/checkout v2 composite
- rymndhng/release-on-push-action master composite
- actions/checkout v2 composite
- actions/setup-python v2 composite
- actions/checkout v3 composite
- dieghernan/cff-validator v3 composite