nenequitia
Science Score: 54.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ✓ Academic publication links: links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (8.7%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: PonteIneptique
- License: mpl-2.0
- Language: Jupyter Notebook
- Default Branch: main
- Size: 4.93 MB
Statistics
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
neNequitia
neNequitia is a tool for evaluating CER (character error rate) without ground truth, to help design transcription campaigns when creating HTR datasets. By providing insight into the estimated accuracy of models, users can focus on seemingly badly transcribed manuscripts or improve medium-quality results.
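For context, CER is conventionally computed against a reference transcription as the character-level edit distance divided by the reference length. A self-contained sketch of that baseline metric (the function names are illustrative, not part of the nenequitia API, which estimates CER *without* such a reference):

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Character-level edit distance between a reference and a hypothesis."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1]

def cer(reference: str, prediction: str) -> float:
    """Character error rate: edits needed, normalized by reference length."""
    return levenshtein(reference, prediction) / max(len(reference), 1)
```

For example, `cer("abcd", "abce")` is 0.25: one substitution over four reference characters.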
Cite
Use the CITATION.cff file or the following BibTeX entry:
@inproceedings{clerice:hal-03828529,
TITLE = {{Ground-truth Free Evaluation of HTR on Old French and Latin Medieval Literary Manuscripts}},
AUTHOR = {Cl{\'e}rice, Thibault},
URL = {https://hal-enc.archives-ouvertes.fr/hal-03828529},
BOOKTITLE = {{Computational Humanities Research Conference (CHR) 2022}},
ADDRESS = {Antwerp, Belgium},
YEAR = {2022},
MONTH = Dec,
KEYWORDS = {HTR ; OCR Quality Evaluation ; Historical languages ; Spelling Variation},
PDF = {https://hal-enc.archives-ouvertes.fr/hal-03828529/file/CHR2022___State_of_HTR.pdf},
HAL_ID = {hal-03828529},
HAL_VERSION = {v1},
}
Install
Use pip install -r requirements.txt
Structure
- Jupyter notebooks are used for analyzing and running experiments.
- The nenequitia module is a stand-alone module for development.
Data
Most of the data and models for the paper are available on the release page: https://github.com/PonteIneptique/neNequitia/releases/tag/chr2022-release
The list of manuscripts, their automatic transcriptions with the best model, the full ground truth of the paper in XML format, and the NeNequitia predictions for the automatic transcriptions are available at https://zenodo.org/record/7234399#.Y1-d_L7MJhE
License
Mozilla Public License 2.0
Owner
- Name: Thibault Clérice
- Login: PonteIneptique
- Kind: user
- Location: Chantilly, France
- Company: PSL ENS - Lattice
- Website: https://twitter.com/ponteineptique
- Twitter: ponteineptique
- Repositories: 81
- Profile: https://github.com/PonteIneptique
Simply working on stuff.
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: NeNequitia
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Thibault
    family-names: Clerice
    email: thibault.clerice+citationcff@chartes.psl.eu
    orcid: 'https://orcid.org/0000-0003-1852-9204'
    affiliation: Centre Jean Mabillon
identifiers:
  - type: doi
    value: 10.5281/zenodo.7233985
    description: Zenodo Release of the CHR paper version
repository-code: 'https://github.com/PonteIneptique/neNequitia'
abstract: >-
  As more and more projects openly release ground truth for handwritten
  text recognition (HTR), we expect the quality of automatic transcription
  to improve on unseen data. Getting models robust to scribal and material
  changes is a necessary step for specific data mining tasks. However,
  evaluation of HTR results requires ground truth to compare predictions
  statistically. In the context of modern languages, successful attempts
  to evaluate quality have been done using lexical features or n-grams.
  This, however, proves difficult in the context of the spelling variation
  that both Old French and Latin have, even more so in the context of
  sometimes heavily abbreviated manuscripts. We propose a new method based
  on deep learning where we attempt to categorize each line error rate
  into four error rate ranges (0 < 10% < 25% < 50% < 100%) using three
  different encoders (GRU with Attention, BiLSTM, TextCNN). To train
  these models, we propose a new dataset engineering approach using early
  stopped models, as an alternative to rule-based fake predictions. Our
  model largely outperforms the n-gram approach. We also provide an
  example application to qualitatively analyse our classifier, using
  classification on new predictions on a sample of 1,800 manuscripts
  ranging from the 9th century to the 15th.
license: MPL-2.0
version: Paper
date-released: '2022-10-31'
preferred-citation:
  title: "Ground-truth Free Evaluation of HTR on Old French and Latin Medieval Literary Manuscripts"
  authors:
    - given-names: Thibault
      family-names: Clerice
      email: thibault.clerice+citationcff@chartes.psl.eu
      orcid: 'https://orcid.org/0000-0003-1852-9204'
      affiliation: Centre Jean Mabillon
  type: conference-paper
  collection-type: proceedings
  collection-title: "Proceedings of the Conference on Computational Humanities Research 2022"
  url: "https://hal-enc.archives-ouvertes.fr/hal-03828529"
  conference:
    name: "CHR 2022: Computational Humanities Research Conference"
    date-start: "2022-12-12"
    country: "Belgium"
    city: "Antwerp"
    alias: "CHR2022"
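The abstract describes categorizing each line's error rate into four ranges (0-10%, 10-25%, 25-50%, 50-100%). A minimal sketch of that binning step; the exact boundary handling and label strings here are assumptions for illustration, not the repository's actual code:

```python
# Upper bounds and labels for the four CER ranges described in the paper.
# The top bound is slightly above 1.0 so that estimated CERs of exactly
# 100% (or slightly more, for over-long predictions) still get binned.
BINS = [(0.10, "0-10%"), (0.25, "10-25%"), (0.50, "25-50%"), (1.01, "50-100%")]

def cer_bin(cer: float) -> str:
    """Map a (possibly estimated) character error rate to its range label."""
    for upper, label in BINS:
        if cer < upper:
            return label
    return BINS[-1][1]  # fall back to the worst range for out-of-range values
```

For example, a line with an estimated CER of 0.30 would land in the "25-50%" range, flagging it for manual review before better-transcribed lines.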
Issues and Pull Requests
Last synced: 11 months ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0