nenequitia
Science Score: 54.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ✓ Academic publication links: links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (8.7%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: PonteIneptique
- License: mpl-2.0
- Language: Jupyter Notebook
- Default Branch: main
- Size: 4.93 MB
Statistics
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
neNequitia
neNequitia is a tool for evaluating CER (character error rate) without ground truth, to help design transcription campaigns when creating HTR datasets. By providing insight into the estimated accuracy of models, users can focus on seemingly badly transcribed manuscripts or improve medium-quality results.
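For context, CER is conventionally computed against a reference transcription as the character-level edit distance divided by the reference length. A self-contained sketch of that baseline metric (the function names are illustrative, not part of the nenequitia API, which estimates CER *without* such a reference):

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Character-level edit distance between a reference and a hypothesis."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1]

def cer(reference: str, prediction: str) -> float:
    """Character error rate: edits needed, normalized by reference length."""
    return levenshtein(reference, prediction) / max(len(reference), 1)
```

For example, `cer("abcd", "abce")` is 0.25: one substitution over four reference characters.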
Cite
Use the CITATION.cff file or the following BibTeX entry:
@inproceedings{clerice:hal-03828529,
TITLE = {{Ground-truth Free Evaluation of HTR on Old French and Latin Medieval Literary Manuscripts}},
AUTHOR = {Cl{\'e}rice, Thibault},
URL = {https://hal-enc.archives-ouvertes.fr/hal-03828529},
BOOKTITLE = {{Computational Humanities Research Conference (CHR) 2022}},
ADDRESS = {Antwerp, Belgium},
YEAR = {2022},
MONTH = Dec,
KEYWORDS = {HTR ; OCR Quality Evaluation ; Historical languages ; Spelling Variation},
PDF = {https://hal-enc.archives-ouvertes.fr/hal-03828529/file/CHR2022___State_of_HTR.pdf},
HAL_ID = {hal-03828529},
HAL_VERSION = {v1},
}
Install
Use pip install -r requirements.txt
Structure
- Jupyter notebooks are used for analyzing and running experiments.
- The nenequitia module is a stand-alone module for development.
Data
Most of the data and models for the paper are available on the release page: https://github.com/PonteIneptique/neNequitia/releases/tag/chr2022-release
The list of manuscripts, their automatic transcriptions with the best model, the full ground truth of the paper in XML format, and the NeNequitia predictions for the automatic transcriptions are available at https://zenodo.org/record/7234399#.Y1-d_L7MJhE
License
Mozilla Public License 2.0
Owner
- Name: Thibault Clérice
- Login: PonteIneptique
- Kind: user
- Location: Chantilly, France
- Company: PSL ENS - Lattice
- Website: https://twitter.com/ponteineptique
- Twitter: ponteineptique
- Repositories: 81
- Profile: https://github.com/PonteIneptique
Simply working on stuff.
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: NeNequitia
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Thibault
    family-names: Clerice
    email: thibault.clerice+citationcff@chartes.psl.eu
    orcid: 'https://orcid.org/0000-0003-1852-9204'
    affiliation: Centre Jean Mabillon
identifiers:
  - type: doi
    value: 10.5281/zenodo.7233985
    description: Zenodo Release of the CHR paper version
repository-code: 'https://github.com/PonteIneptique/neNequitia'
abstract: >-
  As more and more projects openly release ground truth for handwritten
  text recognition (HTR), we expect the quality of automatic transcription
  to improve on unseen data. Getting models robust to scribal and material
  changes is a necessary step for specific data mining tasks. However,
  evaluation of HTR results requires ground truth to compare predictions
  statistically. In the context of modern languages, successful attempts
  to evaluate quality have been done using lexical features or n-grams.
  This, however, proves difficult in the context of the spelling variation
  that both Old French and Latin have, even more so in the context of
  sometimes heavily abbreviated manuscripts. We propose a new method based
  on deep learning where we attempt to categorize each line error rate
  into four error rate ranges (0 < 10% < 25% < 50% < 100%) using three
  different encoders (GRU with Attention, BiLSTM, TextCNN). To train
  these models, we propose a new dataset engineering approach using early
  stopped models, as an alternative to rule-based fake predictions. Our
  model largely outperforms the n-gram approach. We also provide an
  example application to qualitatively analyse our classifier, using
  classification on new predictions on a sample of 1,800 manuscripts
  ranging from the 9th century to the 15th.
license: MPL-2.0
version: Paper
date-released: '2022-10-31'
preferred-citation:
  title: "Ground-truth Free Evaluation of HTR on Old French and Latin Medieval Literary Manuscripts"
  authors:
    - given-names: Thibault
      family-names: Clerice
      email: thibault.clerice+citationcff@chartes.psl.eu
      orcid: 'https://orcid.org/0000-0003-1852-9204'
      affiliation: Centre Jean Mabillon
  type: conference-paper
  collection-type: proceedings
  collection-title: "Proceedings of the Conference on Computational Humanities Research 2022"
  url: "https://hal-enc.archives-ouvertes.fr/hal-03828529"
  conference:
    name: "CHR 2022: Computational Humanities Research Conference"
    date-start: "2022-12-12"
    country: "Belgium"
    city: "Antwerp"
    alias: "CHR2022"
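The abstract describes categorizing each line's error rate into four ranges (0-10%, 10-25%, 25-50%, 50-100%). A minimal sketch of that binning step; the exact boundary handling and label strings here are assumptions for illustration, not the repository's actual code:

```python
# Upper bounds and labels for the four CER ranges described in the paper.
# The top bound is slightly above 1.0 so that estimated CERs of exactly
# 100% (or slightly more, for over-long predictions) still get binned.
BINS = [(0.10, "0-10%"), (0.25, "10-25%"), (0.50, "25-50%"), (1.01, "50-100%")]

def cer_bin(cer: float) -> str:
    """Map a (possibly estimated) character error rate to its range label."""
    for upper, label in BINS:
        if cer < upper:
            return label
    return BINS[-1][1]  # fall back to the worst range for out-of-range values
```

For example, a line with an estimated CER of 0.30 would land in the "25-50%" range, flagging it for manual review before better-transcribed lines.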
Issues and Pull Requests
Last synced: 11 months ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0