Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (6.9%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: truckthomas
  • Default Branch: main
  • Size: 695 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 1
Created over 1 year ago · Last pushed over 1 year ago
Metadata Files
Readme Citation

README.md

FONDUE-JAZZHOT-20TH-PRINT-ZINE

This repository is a work in progress for Thomas Gauffroy-Naudin's DH certificate project, supervised by Simon Gabay at the University of Geneva.

It currently contains a sample corpus for training an OCR model to transcribe 20th-century print periodicals, specifically using data from the French jazz magazine JAZZ HOT from 1966 to 1980. The sources primarily focus on the news and program columns of these magazines.

Description:

- A detailed list of texts is provided below.
- The "data" folder contains XML files for training the model, along with the corresponding PNG page images.
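As a minimal sketch of how the XML transcriptions could be matched to their page images, the snippet below pairs files by name. It assumes that each XML file and its PNG share the same file stem (for example `page_01.xml` and `page_01.png`), which this README does not state; `pair_pages` and the folder path passed to it are hypothetical names used only for illustration.

```python
from pathlib import Path


def pair_pages(data_dir):
    """Return (xml_path, png_path) pairs whose file stems match.

    Assumption: a transcription and its page image share the same stem.
    XML files without a matching PNG are skipped.
    """
    data_dir = Path(data_dir)
    # Index every PNG under the folder by its stem for quick lookup.
    pngs = {p.stem: p for p in data_dir.rglob("*.png")}
    pairs = []
    for xml_path in sorted(data_dir.rglob("*.xml")):
        png_path = pngs.get(xml_path.stem)
        if png_path is not None:
            pairs.append((xml_path, png_path))
    return pairs
```

Calling `pair_pages("data")` from the repository root would then list the pages usable for training, provided the naming assumption holds.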

"CORPUS_TRAINING" folder

| id     | date    | pages | notes |
|--------|---------|-------|-------|
| JH6602 | 1966-02 | 2     |       |
| JH6612 | 1966-12 | 2     |       |
| JH6704 | 1967-04 | 5     |       |
| JH6803 | 1968-03 | 4     |       |
| JH6804 | 1968-04 | 1     |       |
| JH6811 | 1968-11 | 5     |       |
| JH6902 | 1969-02 | 1     |       |
| JH6910 | 1969-10 | 1     |       |
| JH6912 | 1969-12 | 7     |       |
| JH7009 | 1970-09 | 1     |       |
| JH7101 | 1971-01 | 2     |       |
| JH7202 | 1972-02 | 3     |       |
| JH7304 | 1973-04 | 6     |       |
| JH7403 | 1974-03 | 4     |       |
| JH7501 | 1975-01 | 4     |       |
| JH7605 | 1976-05 | 3     |       |
| JH7702 | 1977-02 | 4     |       |
| JH7804 | 1978-04 | 6     |       |
| JH7909 | 1979-09 | 1     |       |
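The identifiers in the table appear to follow a `JH` + two-digit year + two-digit month pattern. The sketch below decodes an id into its date and sums the page counts listed above. The pattern is inferred from the table rather than documented, the `19` century prefix is an assumption that fits the listed issues, and `issue_date` and `PAGES` are hypothetical names.

```python
import re

# Inferred from the table: "JH" followed by YYMM. Not a documented format.
ID_PATTERN = re.compile(r"^JH(\d{2})(\d{2})$")

# Page counts copied from the table above.
PAGES = {
    "JH6602": 2, "JH6612": 2, "JH6704": 5, "JH6803": 4, "JH6804": 1,
    "JH6811": 5, "JH6902": 1, "JH6910": 1, "JH6912": 7, "JH7009": 1,
    "JH7101": 2, "JH7202": 3, "JH7304": 6, "JH7403": 4, "JH7501": 4,
    "JH7605": 3, "JH7702": 4, "JH7804": 6, "JH7909": 1,
}


def issue_date(issue_id):
    """Convert an id such as 'JH6602' to a 'YYYY-MM' string.

    Assumption: all issues are from the 20th century, so '19' is prefixed.
    """
    match = ID_PATTERN.match(issue_id)
    if match is None:
        raise ValueError(f"unexpected id format: {issue_id!r}")
    year, month = match.groups()
    if not 1 <= int(month) <= 12:
        raise ValueError(f"invalid month in id: {issue_id!r}")
    return f"19{year}-{month}"


total_issues = len(PAGES)
total_pages = sum(PAGES.values())
```

With the values from the table, `total_issues` is 19 and `total_pages` is 62.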

How to cite

See the htr-united.yml file.

Licences

Images are personal scans of sources from Thomas Gauffroy-Naudin's archive. Transcriptions are released under the CC BY-NC 4.0 license.

Owner

  • Login: truckthomas
  • Kind: user

GitHub Events

Total
  • Member event: 1
  • Push event: 26
Last Year
  • Member event: 1
  • Push event: 26