Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.7%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: PonteIneptique
  • License: mpl-2.0
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 4.93 MB
Statistics
  • Stars: 0
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 1
Created over 3 years ago · Last pushed over 3 years ago
Metadata Files
  • Readme
  • License
  • Citation

README.md

neNequitia

neNequitia is a tool for evaluating the character error rate (CER) of HTR output without ground truth, to help design transcription campaigns in HTR data creation projects. By providing insight into the estimated performance of models, users can focus on seemingly badly transcribed manuscripts or improve middling results.
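To make the goal concrete, here is a minimal sketch of the quantity being estimated: the line-level CER, and its mapping into the four error-rate ranges (0–10%, 10–25%, 25–50%, 50–100%) that the paper's classifier predicts. The function names are illustrative, not part of the nenequitia API; neNequitia itself predicts the range without access to the ground truth.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def cer(prediction: str, ground_truth: str) -> float:
    """Character error rate of a predicted line against its ground truth."""
    return levenshtein(prediction, ground_truth) / max(len(ground_truth), 1)

def cer_bucket(rate: float) -> str:
    """Map a CER to the four error-rate ranges used in the CHR 2022 paper."""
    if rate < 0.10:
        return "<10%"
    if rate < 0.25:
        return "10-25%"
    if rate < 0.50:
        return "25-50%"
    return "50-100%"

# One substitution in an 11-character line: CER ≈ 9.1%, lowest bucket.
print(cer_bucket(cer("lorem ipsvm", "lorem ipsum")))  # → <10%
```

In production, ground truth is unavailable, so the paper's encoders (GRU with attention, BiLSTM, TextCNN) predict `cer_bucket` directly from the transcribed line.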

Cite

Use the CITATION.cff file or the following BibTeX entry:

```bibtex
@inproceedings{clerice:hal-03828529,
  TITLE = {{Ground-truth Free Evaluation of HTR on Old French and Latin Medieval Literary Manuscripts}},
  AUTHOR = {Cl{\'e}rice, Thibault},
  URL = {https://hal-enc.archives-ouvertes.fr/hal-03828529},
  BOOKTITLE = {{Computational Humanities Research Conference (CHR) 2022}},
  ADDRESS = {Antwerp, Belgium},
  YEAR = {2022},
  MONTH = Dec,
  KEYWORDS = {HTR ; OCR Quality Evaluation ; Historical languages ; Spelling Variation},
  PDF = {https://hal-enc.archives-ouvertes.fr/hal-03828529/file/CHR2022___State_of_HTR.pdf},
  HAL_ID = {hal-03828529},
  HAL_VERSION = {v1},
}
```

Install

Use pip install -r requirements.txt

Structure

  • Jupyter notebooks are used for analyzing and running experiments.
  • The nenequitia module is a stand-alone module for development.

Data

Most of the data and models for the paper are available on the release page (https://github.com/PonteIneptique/neNequitia/releases/tag/chr2022-release).

The list of manuscripts, their automatic transcription with the best model, the full ground truth of the paper in XML format, and neNequitia's predictions for the automatic transcriptions can be found at: https://zenodo.org/record/7234399#.Y1-d_L7MJhE

License

Mozilla Public License 2.0

Owner

  • Name: Thibault Clérice
  • Login: PonteIneptique
  • Kind: user
  • Location: Chantilly, France
  • Company: PSL ENS - Lattice

Simply working on stuff.

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: NeNequitia
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Thibault
    family-names: Clerice
    email: thibault.clerice+citationcff@chartes.psl.eu
    orcid: 'https://orcid.org/0000-0003-1852-9204'
    affiliation: Centre Jean Mabillon
identifiers:
  - type: doi
    value: 10.5281/zenodo.7233985
    description: Zenodo Release of the CHR paper version
repository-code: 'https://github.com/PonteIneptique/neNequitia'
abstract: >-
  As more and more projects openly release ground
  truth for handwritten text recognition (HTR), we
  expect the quality of automatic transcription to
  improve on unseen data. Getting models robust to
  scribal and material changes is a necessary step
  for specific data mining tasks. However, evaluation
  of HTR results requires ground truth to compare
  prediction statistically. In the context of modern
  languages, successful attempts to evaluate quality
  have been done using lexical features or n-grams.
  This, however, proves difficult in the context of
  spelling variation that both Old French and Latin
  have, even more so in the context of sometime
  heavily abbreviated manuscripts. We propose a new
  method based on deep learning where we attempt to
  categorize each line error rate into four error
  rate ranges (0 < 10% < 25% < 50% < 100%) using
  three different encoder (GRU with Attention,
  BiLSTM, TextCNN). To train these models, we propose
  a new dataset engineering approach using early
  stopped model, as an alternative to rule-based fake
  predictions. Our model largely outperforms the
  n-gram approach. We also provide an example
  application to qualitatively analyse our
  classifier, using classification on new prediction
  on a sample of 1,800 manuscripts ranging from the
  9th century to the 15th.
license: MPL-2.0
version: Paper
date-released: '2022-10-31'
preferred-citation:
  title: "Ground-truth Free Evaluation of HTR on Old French and Latin Medieval Literary Manuscripts"
  authors:
    - given-names: Thibault
      family-names: Clerice
      email: thibault.clerice+citationcff@chartes.psl.eu
      orcid: 'https://orcid.org/0000-0003-1852-9204'
      affiliation: Centre Jean Mabillon
  type: conference-paper
  collection-type: proceedings
  collection-title: "Proceedings of the Conference on Computational Humanities Research 2022"
  url: "https://hal-enc.archives-ouvertes.fr/hal-03828529"
  conference:
    name: "CHR 2022: Computational Humanities Research Conference"
    date-start: "2022-12-12"
    country: "Belgium"
    city: "Antwerp"
    alias: "CHR2022"

GitHub Events

Issues and Pull Requests

Last synced: 11 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0