timeuscorpus

Ground truth datasets for HTR of French 18th- and 19th-century documents, produced by the ANR project TIME US

https://github.com/htr-united/timeuscorpus

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.7%) to scientific vocabulary

Keywords

dataset french htr
Last synced: 6 months ago

Repository

Ground truth datasets for HTR of French 18th- and 19th-century documents, produced by the ANR project TIME US

Basic Info
  • Host: GitHub
  • Owner: HTR-United
  • License: cc-by-4.0
  • Default Branch: main
  • Homepage:
  • Size: 223 MB
Statistics
  • Stars: 2
  • Watchers: 2
  • Forks: 0
  • Open Issues: 2
  • Releases: 3
Topics
dataset french htr
Created about 5 years ago · Last pushed about 2 years ago
Metadata Files
Readme License Citation

README.md

TIMEUS CORPUS

Description

Ground truth datasets for HTR of French 18th- and 19th-century documents, produced by the ANR project TIME US.

Content

Data are stored in the data/ folder. Each folder is organized as follows:

- all the images are at the root level
- ALTO XML versions are in the alto/ folder
- PAGE XML versions are in the page/ folder

| # | name | nb of images | GT for segmenter? | GT for recognizer? | description |
| --- | :---- | :---: | :---: | :---: | :--- |
| 1 | cphparistissage1858 | (159) | n | y | Registers from the Prud'hommes Court for the Textile Industry in Paris, January to June 1858 |
| 2 | cphparistissage1878 | (89) | n | y | Registers from the Prud'hommes Court for the Textile Industry in Paris, January 1878 |
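The folder layout described above can be walked with a few lines of Python. The following is a minimal sketch, not part of the corpus tooling: the function names are hypothetical, and it assumes that each XML file in alto/ shares its file stem with the corresponding image and that images use one of the listed extensions. Neither assumption is stated in the README, so check them against the actual files. Tag names are matched without their namespace so that the sketch does not depend on a particular ALTO schema version.

```python
from pathlib import Path
from typing import Dict, List, Optional
import xml.etree.ElementTree as ET

# Assumption: the image formats are not documented, so this set is a guess.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def pair_images_with_alto(subcorpus: Path) -> Dict[Path, Optional[Path]]:
    """Map each image at the root of a sub-corpus folder to alto/<stem>.xml.

    Assumption: the ALTO file has the same stem as the image.
    Images without a matching file are mapped to None.
    """
    pairs: Dict[Path, Optional[Path]] = {}
    for image in sorted(subcorpus.iterdir()):
        if image.is_file() and image.suffix.lower() in IMAGE_SUFFIXES:
            candidate = subcorpus / "alto" / f"{image.stem}.xml"
            pairs[image] = candidate if candidate.exists() else None
    return pairs


def read_alto_lines(alto_file: Path) -> List[str]:
    """Return the transcription of every TextLine in an ALTO file.

    The CONTENT attributes of the String elements of a line are joined
    with single spaces.
    """
    lines: List[str] = []
    for element in ET.parse(alto_file).iter():
        if local_name(element.tag) == "TextLine":
            words = [
                child.get("CONTENT", "")
                for child in element.iter()
                if local_name(child.tag) == "String"
            ]
            lines.append(" ".join(words))
    return lines


if __name__ == "__main__":
    root = Path("data") / "cphparistissage1858"
    if root.is_dir():
        for image, alto in pair_images_with_alto(root).items():
            count = len(read_alto_lines(alto)) if alto else 0
            print(f"{image.name}: {count} transcribed line(s)")
```

Reporting unmatched images as None rather than skipping them makes it easy to spot pages that have no transcription yet. The same pairing logic would apply to the page/ folder, but reading PAGE XML needs a different parser because its element names differ from ALTO.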

Annotation system

... <!-- todo -->

How to cite

This dataset was built within the ANR project TIME US. It is maintained by Alix Chagué (@alix-tz). The original documents are copyright-free, as are the digitization and the transcription. However, digitizing archives and properly annotating a corpus takes time, and this work should be recognized. If you use any item from this ground truth corpus, cite the dataset using the following information:

Chagué, A., Champougny, K., Meissel, N., Genero, J., Skilbeck-Gaborit, E., Vanneau, L., Bey, L., Le Fourner, V., Albert, A., Riondet, C., & Martini, M. Time Us Corpus [Data set]. https://github.com/HTR-United/timeuscorpus

@misc{Chague_Time_Us_Corpus,
  author = {Chagué, Alix and Champougny, Kévin and Meissel, Nina and Genero, Jean-Damien and Skilbeck-Gaborit, Eden and Vanneau, Laurie and Bey, Laura and Le Fourner, Victoria and Albert, Anaïs and Riondet, Charles and Martini, Manuela},
  title = {{Time Us Corpus}},
  url = {https://github.com/HTR-United/timeuscorpus}
}

This work is licensed under a Creative Commons Attribution 4.0 International License.

Owner

  • Name: HTR United
  • Login: HTR-United
  • Kind: organization
  • Location: France

Dependencies

.github/workflows/htr-united.yaml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
  • andymckay/get-gist-action master composite