genauto-td-htr

Ground Truth generated by GenAuto project for French Civil registry "Tables Décennales"

https://github.com/jpmjpmjpm/genauto-td-htr

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (6.8%) to scientific vocabulary

Repository


Basic Info
  • Host: GitHub
  • Owner: jpmjpmjpm
  • License: cc-by-4.0
  • Default Branch: master
  • Size: 2.72 MB
Statistics
  • Stars: 3
  • Watchers: 1
  • Forks: 1
  • Open Issues: 1
  • Releases: 0
Created over 4 years ago · Last pushed almost 2 years ago
Metadata Files
Readme License Citation

README.md


GenAuto TD Corpus

Description

Ground Truth dataset for French handwritten pages of Civil Registry "Tables Décennales"

Content

150 images and Alto XML files divided into 3 sub-corpora.

Only first names, last names and dates are transcribed and only for birth sections of the documents.

The Alto files contain:

  • Segmentation of the transcribed texts.
  • Transcription of the texts.
  • Polygonalization of the transcribed text zones (performed by kraken OCR solution).
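
As a rough illustration of how these three layers can be read back from one file, the sketch below extracts the transcription and polygon of each text line with Python's standard library. It assumes the usual ALTO layout (`TextLine` elements holding `String` elements with a `CONTENT` attribute and a `Shape`/`Polygon` with a `POINTS` attribute); the function name `read_alto_lines` and the `SAMPLE` document are invented for this example and are not taken from the corpus.

```python
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Drop the '{namespace}' prefix so the code does not depend on the ALTO version."""
    return tag.rsplit("}", 1)[-1]


def read_alto_lines(xml_text: str):
    """Return one (transcription, polygon) pair per TextLine element."""
    root = ET.fromstring(xml_text)
    lines = []
    for elem in root.iter():
        if local(elem.tag) != "TextLine":
            continue
        words, polygon = [], []
        for child in elem.iter():
            name = local(child.tag)
            if name == "String":
                words.append(child.get("CONTENT", ""))
            elif name == "Polygon" and not polygon:
                # Keep the first polygon found; coordinates may be separated
                # by spaces or commas depending on the tool that wrote the file.
                raw = child.get("POINTS", "").replace(",", " ").split()
                coords = [int(float(v)) for v in raw]
                polygon = list(zip(coords[0::2], coords[1::2]))
        lines.append((" ".join(words), polygon))
    return lines


# Invented sample, not an excerpt from the dataset.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine ID="line_1">
      <Shape><Polygon POINTS="10 20 110 20 110 60 10 60"/></Shape>
      <String CONTENT="Martin Louis 1^er janvier 1884"/>
    </TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""

for text, polygon in read_alto_lines(SAMPLE):
    print(text, polygon)
```

Matching on local tag names rather than a fixed namespace is a deliberate choice here, since the namespace URI differs between ALTO versions.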

| # | name | nb of images | GT for segmenter? | GT for recognizer? | link(s) to source images |
|-----|:--------------|:------------:|:-----------------:|:------------------:|------------------------------------------------------------------------:|
| 1 | sermaises | (69) | y | y | Archives départementales du Loiret (Sermaises) |
| 2 | rom-1883-1892 | (41) | y | y | Archives départementales de l'Aube (Romilly-sur-Seine) |
| 3 | rom-1893-1902 | (40) | y | y | Archives départementales de l'Aube (Romilly-sur-Seine) |

Annotation system

Superscripted portions of text are preceded by "^". For example, "1er" written with a superscripted "er" is transcribed as "1^er". If several words are superscripted, each word starts with a "^".
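
For downstream use, the caret convention can be converted or stripped with a short regular expression. The sketch below is only an illustration: it assumes that a superscript runs from the "^" to the next whitespace character, which the convention above suggests but does not state explicitly, and the helper names `to_html` and `to_plain` are invented for this example.

```python
import re

# Assumption: a superscripted run starts at "^" and ends at the next whitespace.
SUPERSCRIPT = re.compile(r"\^(\S+)")


def to_html(text: str) -> str:
    """Wrap each superscripted run in <sup> tags (no HTML escaping is done)."""
    return SUPERSCRIPT.sub(r"<sup>\1</sup>", text)


def to_plain(text: str) -> str:
    """Remove the "^" markers and keep the characters themselves."""
    return SUPERSCRIPT.sub(r"\1", text)


print(to_html("1^er janvier 1884"))
print(to_plain("1^er janvier 1884"))
```

Under this assumption, punctuation attached to a superscripted word (for instance a trailing comma) would be treated as part of the superscript, so the pattern may need tightening for real transcriptions.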

How to cite

This dataset was built by Jean-François Boutet and Jean-Pierre Merx.

The original works and their digitization are all copyright-free, but properly annotating a corpus takes time and is a task that should be recognized. If you use any item from this corpus of ground truth, cite the dataset using the following information:

```
title: 'GenAuto TD Corpus'
url: 'https://github.com/jpmjpmjpm/genauto-td-htr.git'
project-name: 'GenAuto'
project-website: ''
authors:
  - name: 'Boutet'
    surname: 'Jean-François'
    roles:
      - 'transcriber'
      - 'aligner'
  - name: 'Merx'
    surname: 'Jean-Pierre'
    roles:
      - 'transcriber'
      - 'aligner'
      - 'project-manager'
description: '150 transcribed images from "Tables Décennales" French Civil Registry. Those come from Sermaises and Romilly-sur-Seine municipalities. '
language: 'French'
other-languages:
  - "Optional"
script: 'Latin'
script-type: 'only-manuscript'
time: 1792--1902
hands:
  - count: 'less-than-11'
    precision: 'estimated'
license:
  - {name: 'CC-BY 4.0', url: 'https://creativecommons.org/licenses/by/4.0/'}
format: 'Alto-XML'
volume:
  - {count: "300", metric: "pages"}
  - {count: "150", metric: "images"}
```

This work is licensed under a Creative Commons Attribution 4.0 International License.


Owner

  • Name: Jean-Pierre Merx
  • Login: jpmjpmjpm
  • Kind: user
  • Location: Paris, France
  • Company: AIctivate

A software engineer who turned to sales, but who is still passionate about technology. Amateur mathematician. See more at https://fr.linkedin.com/in/jpmerx.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use those data, please cite it as below."
authors:
- family-names: "Boutet"
  given-names: "Jean-François"
- family-names: "Merx"
  given-names: "Jean-Pierre"
  orcid: "https://orcid.org/0000-0001-5545-2993"
title: "GenAuto TD Corpus"
version: 1.0.0
doi: 10.5281/zenodo.5507403
date-released: 2021-09-14
url: "https://github.com/jpmjpmjpm/genauto-td-htr.git"

GitHub Events

Total
  • Watch event: 1
Last Year
  • Watch event: 1