genauto-td-htr
Ground Truth generated by GenAuto project for French Civil registry "Tables Décennales"
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (6.8%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: jpmjpmjpm
- License: cc-by-4.0
- Default Branch: master
- Size: 2.72 MB
Statistics
- Stars: 3
- Watchers: 1
- Forks: 1
- Open Issues: 1
- Releases: 0
Metadata Files
README.md
GenAuto TD Corpus
Description
Ground Truth dataset for French handwritten pages of Civil Registry "Tables Décennales"
Content
150 images and their Alto XML files, divided into three sub-corpora.
Only first names, last names and dates are transcribed, and only for the birth sections of the documents.
The Alto files contain:
- Segmentation of the transcribed texts.
- Transcription of the texts.
- Polygonalization of the transcribed text zones (performed with the kraken OCR engine).
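As an illustration of how these three kinds of information might be read back from a file, here is a minimal sketch using only the Python standard library. It assumes the usual ALTO layout, where each `TextLine` holds `String` elements with a `CONTENT` attribute and a `Shape/Polygon` element with a `POINTS` attribute. The embedded sample, its IDs, its coordinates and its transcription are hand-written for this example and are not taken from the corpus; the actual files may be organised differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written ALTO fragment; real files in the corpus may differ.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout>
    <Page>
      <PrintSpace>
        <TextBlock ID="block_1">
          <TextLine ID="line_1">
            <Shape><Polygon POINTS="10 30 200 32 200 60 10 58"/></Shape>
            <String CONTENT="Martin Louis 1^er janvier 1885"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def read_lines(alto_xml):
    """Return one dict per TextLine: its ID, transcription and polygon points."""
    root = ET.fromstring(alto_xml)
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextLine":
            continue
        text_parts = []
        polygon = []
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "String":
                text_parts.append(child.get("CONTENT", ""))
            elif name == "Polygon":
                # POINTS is assumed to be a flat "x1 y1 x2 y2 ..." list;
                # if a line holds several polygons, the last one is kept.
                raw = child.get("POINTS", "").replace(",", " ").split()
                coords = [int(float(v)) for v in raw]
                polygon = list(zip(coords[0::2], coords[1::2]))
        lines.append(
            {"id": elem.get("ID"), "text": " ".join(text_parts), "polygon": polygon}
        )
    return lines


for line in read_lines(SAMPLE_ALTO):
    print(line["id"], line["text"], line["polygon"])
```

Matching on local tag names rather than a fixed namespace is a deliberate choice here, so that the sketch does not depend on which ALTO schema version a given file declares.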
| # | name | nb of images | GT for segmenter? | GT for recognizer? | link(s) to source images |
|-----|:--------------|:------------:|:-----------------:|:------------------:|------------------------------------------------------------------------:|
| 1 | sermaises | (69) | y | y | Archives départementales du Loiret (Sermaises) |
| 2 | rom-1883-1892 | (41) | y | y | Archives départementales de l'Aube (Romilly-sur-Seine) |
| 3 | rom-1893-1902 | (40) | y | y | Archives départementales de l'Aube (Romilly-sur-Seine) |
Annotation system
Superscripted portions of text are preceded by "^": for example, "1er" is transcribed as "1^er".
If several words are superscripted, each word starts with a "^".
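The convention above could be handled in post-processing roughly as follows. This is a sketch under one assumption that the README does not state: a superscripted portion is taken to run from the "^" up to the next whitespace or the next "^". The function names are hypothetical and not part of the dataset.

```python
import re

# Assumption: a superscript starts at "^" and ends at the next whitespace or "^".
SUPERSCRIPT = re.compile(r"\^([^\s^]+)")


def strip_markers(text):
    """Drop the "^" markers and keep the plain text, e.g. for name matching."""
    return SUPERSCRIPT.sub(r"\1", text)


def to_html(text):
    """Render each superscripted portion with an HTML <sup> element."""
    return SUPERSCRIPT.sub(r"<sup>\1</sup>", text)


def superscripted_parts(text):
    """List the superscripted portions found in a transcription."""
    return SUPERSCRIPT.findall(text)


print(strip_markers("1^er janvier"))
print(to_html("1^er janvier"))
print(superscripted_parts("^de ^la"))
```

Keeping the marker in the ground truth and converting only at the point of use means the same transcription can serve both recognizer training and display.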
How to cite
This dataset was built by Jean-François Boutet and Jean-Pierre Merx.
The original works and their digitization are all copyright-free, but properly annotating a corpus takes time and is a task that should be recognized. If you use any item from this corpus of ground truth, cite the dataset using the following information:
```
title: 'GenAuto TD Corpus'
url: 'https://github.com/jpmjpmjpm/genauto-td-htr.git'
project-name: 'GenAuto'
project-website: ''
authors:
  - name: 'Boutet'
    surname: 'Jean-François'
    roles:
      - 'transcriber'
      - 'aligner'
  - name: 'Merx'
    surname: 'Jean-Pierre'
    roles:
      - 'transcriber'
      - 'aligner'
      - 'project-manager'
description: '150 transcribed images from "Tables Décennales" French Civil Registry. Those come from Sermaises and Romilly-sur-Seine municipalities. '
language: 'French'
other-languages:
  - "Optional"
script: 'Latin'
script-type: 'only-manuscript'
time: 1792--1902
hands:
  - count: 'less-than-11'
    precision: 'estimated'
license:
  - {name: 'CC-BY 4.0', url: 'https://creativecommons.org/licenses/by/4.0/'}
format: 'Alto-XML'
volume:
  - {count: "300", metric: "pages"}
  - {count: "150", metric: "images"}
```
This work is licensed under a Creative Commons Attribution 4.0 International License.
Owner
- Name: Jean-Pierre Merx
- Login: jpmjpmjpm
- Kind: user
- Location: Paris, France
- Company: AIctivate
- Website: http://www.mathcounterexamples.net/
- Repositories: 1
- Profile: https://github.com/jpmjpmjpm
A software engineer who turned to sales, but who is still passionate about technology. Amateur mathematician. See more at https://fr.linkedin.com/in/jpmerx.
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use those data, please cite it as below."
authors:
  - family-names: "Boutet"
    given-names: "Jean-François"
  - family-names: "Merx"
    given-names: "Jean-Pierre"
    orcid: "https://orcid.org/0000-0001-5545-2993"
title: "GenAuto TD Corpus"
version: 1.0.0
doi: 10.5281/zenodo.5507403
date-released: 2021-09-14
url: "https://github.com/jpmjpmjpm/genauto-td-htr.git"
GitHub Events
Total
- Watch event: 1
Last Year
- Watch event: 1
