Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org, zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.7%) to scientific vocabulary
Last synced: 7 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: FactoDeepLearning
  • License: other
  • Language: Python
  • Default Branch: master
  • Size: 1.68 MB
Statistics
  • Stars: 4
  • Watchers: 1
  • Forks: 1
  • Open Issues: 5
  • Releases: 0
Created over 5 years ago · Last pushed 10 months ago
Metadata Files
Readme License Citation

README.md

Vertical Attention Network: an end-to-end model for handwritten text recognition at paragraph level.

This project is under CeCILL-C license (full details in LICENSE_CeCILL-C.md).

This repository is a public implementation of the paper: "End-to-end Handwritten Paragraph Text Recognition Using a Vertical Attention Network".

To discover my other works, here is my academic page.

The paper is available at https://arxiv.org/abs/2012.03868.


It focuses on Optical Character Recognition (OCR) applied at line and paragraph levels.

We obtained the following results at line level:

| Dataset  | CER (%) | WER (%) |
|:--------:|:-------:|:-------:|
| IAM      | 4.97    | 16.31   |
| RIMES    | 3.08    | 8.14    |
| READ2016 | 4.10    | 16.29   |

For the paragraph level, here are the results:

| Dataset  | CER (%) | WER (%) |
|:--------:|:-------:|:-------:|
| IAM      | 4.45    | 14.55   |
| RIMES    | 1.91    | 6.72    |
| READ2016 | 3.59    | 13.94   |
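The CER and WER values above are percentages derived from the Levenshtein edit distance between the recognized text and the ground truth (the repository lists the editdistance package in its requirements; the stdlib-only version below is just an illustrative sketch, not the project's own evaluation code):

```python
# Character Error Rate (CER) and Word Error Rate (WER) as percentages,
# computed from the Levenshtein edit distance between reference and hypothesis.

def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (characters or words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution
        prev = cur
    return prev[-1]

def cer(ref, hyp):
    """Character Error Rate in percent."""
    return 100 * edit_distance(ref, hyp) / max(len(ref), 1)

def wer(ref, hyp):
    """Word Error Rate in percent, computed over whitespace-split words."""
    ref_words, hyp_words = ref.split(), hyp.split()
    return 100 * edit_distance(ref_words, hyp_words) / max(len(ref_words), 1)
```

For example, `cer("hello", "helo")` is 20.0 (one deletion over five reference characters).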

Pretrained model weights are available here and here.

Table of contents:

1. Getting Started
2. Datasets
3. Training And Evaluation

Getting Started

The implementation has been tested with Python 3.6.

Clone the repository:

git clone https://github.com/FactoDeepLearning/VerticalAttentionOCR.git

Install the dependencies:

pip install -r requirements.txt

Datasets

This section covers the datasets used in the paper: download and formatting instructions are provided so the experiments can be replicated.

IAM

Details

IAM corresponds to English grayscale handwriting images (from the LOB corpus). We provide a script to format this dataset into the commonly used split so results can be compared. The different splits are as follows:

|           | train | validation | test  |
|:---------:|:-----:|:----------:|:-----:|
| line      | 6,482 | 976        | 2,915 |
| paragraph | 747   | 116        | 336   |

Download

  • Register at the FKI's webpage
  • Download the dataset here
  • Move the following files into the folder Datasets/raw/IAM/
    • formsA-D.tgz
    • formsE-H.tgz
    • formsI-Z.tgz
    • lines.tgz
    • ascii.tgz
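Once the archives are in place, extraction can be scripted. Below is a minimal stdlib sketch; only the folder layout and archive names come from the list above, and the per-archive target folders are an assumption (the repository's own format_datasets.py handles the actual formatting):

```python
import tarfile
from pathlib import Path

# Archive names exactly as listed above.
IAM_ARCHIVES = ["formsA-D.tgz", "formsE-H.tgz", "formsI-Z.tgz",
                "lines.tgz", "ascii.tgz"]

def extract_iam(raw_dir="Datasets/raw/IAM"):
    """Extract every IAM archive present in raw_dir; return the names extracted."""
    raw = Path(raw_dir)
    done = []
    for name in IAM_ARCHIVES:
        archive = raw / name
        if archive.is_file():
            with tarfile.open(archive, "r:gz") as tar:
                tar.extractall(raw / Path(name).stem)  # e.g. Datasets/raw/IAM/lines/
            done.append(name)
    return done
```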

RIMES

Details

RIMES corresponds to French grayscale handwriting images. We provide a script to format this dataset into the commonly used split so results can be compared. The different splits are as follows:

|           | train | validation | test |
|:---------:|:-----:|:----------:|:----:|
| line      | 9,947 | 1,333      | 778  |
| paragraph | 1,400 | 100        | 100  |

Download

  • Fill in the a2ia user agreement form available here and send it by email to rimesnda@a2ia.com. You will receive a username and a password by email
  • Log in and download the data from here
  • Move the following files into the folder Datasets/raw/RIMES/
    • eval2011annotated.xml
    • eval2011gray.tar
    • training2011gray.tar
    • training_2011.xml

READ 2016

Details

READ 2016 corresponds to Early Modern German RGB handwriting images. We provide a script to format this dataset into the commonly used split so results can be compared. The different splits are as follows:

|           | train | validation | test  |
|:---------:|:-----:|:----------:|:-----:|
| line      | 8,349 | 1,040      | 1,138 |
| paragraph | 1,584 | 179        | 197   |

Download

  • From root folder:

```
cd Datasets/raw
mkdir READ_2016
cd READ_2016
wget https://zenodo.org/record/1164045/files/{Test-ICFHR-2016.tgz,Train-And-Val-ICFHR-2016.tgz}
```

Format the datasets

  • Comment/uncomment the following lines in the main block of the script "format_datasets.py" according to your needs, then run it

```
if __name__ == "__main__":

    # format_IAM_line()
    # format_IAM_paragraph()

    # format_RIMES_line()
    # format_RIMES_paragraph()

    # format_READ2016_line()
    # format_READ2016_paragraph()
```

  • This will generate well-formatted datasets, usable by the training scripts.

Training And Evaluation

You need a properly formatted dataset to train a model; please refer to the Datasets section.

Two scripts are provided to train line-level and paragraph-level models respectively: OCR/line_OCR/ctc/main_line_ctc.py and OCR/document_OCR/v_attention/main_pg_va.py

Training a model generates output files; they are located in the output folder OCR/line_OCR/ctc/outputs/#TrainingName or OCR/document_OCR/v_attention/outputs/#TrainingName.

The output files are split into two subfolders: "checkpoints" and "results". "checkpoints" contains the model weights for the last trained epoch and for the epoch with the best validation CER. "results" contains the TensorBoard logs for losses and metrics, as well as text files listing the hyperparameters used and the evaluation results.
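When scripting around a finished run, the relevant folders can be located from this layout. The sketch below assumes only the folder structure described above; the `best*` checkpoint file-name pattern is a hypothetical placeholder, so check your own outputs folder for the actual names:

```python
from pathlib import Path

def run_folders(output_root, training_name):
    """Return the 'checkpoints' and 'results' subfolders of a training run,
    following the output layout described above."""
    out = Path(output_root) / training_name
    return out / "checkpoints", out / "results"

def best_checkpoint(output_root, training_name):
    """Find a best-validation-CER checkpoint, assuming its file name starts
    with 'best' (hypothetical pattern; adjust to your actual outputs)."""
    ckpts, _ = run_folders(output_root, training_name)
    matches = sorted(ckpts.glob("best*")) if ckpts.is_dir() else []
    return matches[0] if matches else None
```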

Training can use the apex package for mixed-precision training and Distributed Data Parallel for usage on multiple GPUs.

All hyperparameters are specified and editable in the training scripts (their meanings are described in comments).

Evaluation is performed immediately after training ends (training stops when the maximum elapsed time is reached or after the maximum number of epochs specified in the training script).

Citation

```bibtex
@ARTICLE{Coquenet2023a,
  author={Coquenet, Denis and Chatelain, Clement and Paquet, Thierry},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  title={End-to-end Handwritten Paragraph Text Recognition Using a Vertical Attention Network},
  year={2023},
  volume={45},
  number={1},
  pages={508-524},
  doi={10.1109/TPAMI.2022.3144899}
}
```

License

This project is under the CeCILL-C license.

Owner

  • Login: FactoDeepLearning
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
title: 'VAN: Vertical Attention Network'
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: 'Denis '
    family-names: Coquenet
    orcid: 'https://orcid.org/0000-0001-5203-9423'
  - name: Université de Rouen Normandie
  - name: INSA Rouen
  - name: LITIS
identifiers:
  - type: url
    value: 'https://arxiv.org/abs/2012.03868'
repository-code: 'https://github.com/FactoDeepLearning/VAN/'
license: CECILL-C

GitHub Events

Total
  • Watch event: 2
  • Push event: 1
  • Fork event: 1
Last Year
  • Watch event: 2
  • Push event: 1
  • Fork event: 1

Dependencies

requirements.txt pypi
  • Markdown ==3.3.3
  • MarkupSafe ==1.1.1
  • PasteDeploy ==2.1.1
  • Pillow ==8.0.1
  • PyWavelets ==1.1.1
  • SQLAlchemy ==1.3.20
  • WTForms ==2.3.3
  • WebOb ==1.8.6
  • Werkzeug ==1.0.1
  • absl-py ==0.11.0
  • anykeystore ==0.2
  • apex ==0.1
  • cachetools ==4.1.1
  • certifi ==2020.11.8
  • chardet ==3.0.4
  • cryptacular ==1.5.5
  • cycler ==0.10.0
  • decorator ==4.4.2
  • defusedxml ==0.6.0
  • editdistance ==0.5.3
  • future ==0.18.2
  • google-auth ==1.23.0
  • google-auth-oauthlib ==0.4.2
  • grpcio ==1.33.2
  • hupper ==1.10.2
  • idna ==2.10
  • imageio ==2.9.0
  • importlib-metadata ==2.0.0
  • kiwisolver ==1.3.1
  • matplotlib ==3.3.3
  • networkx ==2.5
  • numpy ==1.19.4
  • oauthlib ==3.1.0
  • opencv-python ==4.4.0.46
  • pbkdf2 ==1.3
  • plaster ==1.0
  • plaster-pastedeploy ==0.7
  • protobuf ==3.13.0
  • pyasn1 ==0.4.8
  • pyasn1-modules ==0.2.8
  • pyparsing ==2.4.7
  • pyramid ==1.10.5
  • pyramid-mailer ==0.15.1
  • python-dateutil ==2.8.1
  • python3-openid ==3.2.0
  • repoze.sendmail ==4.4.1
  • requests ==2.25.0
  • requests-oauthlib ==1.3.0
  • rsa ==4.6
  • scikit-image ==0.17.2
  • scipy ==1.5.4
  • six ==1.15.0
  • tensorboard ==2.4.0
  • tensorboard-plugin-wit ==1.7.0
  • tifffile ==2020.9.3
  • torch ==1.6.0
  • torchvision ==0.7.0
  • tqdm ==4.51.0
  • transaction ==3.0.0
  • translationstring ==1.4
  • urllib3 ==1.26.2
  • velruse ==1.1.1
  • venusian ==3.0.0
  • wtforms-recaptcha ==0.3.2
  • zipp ==3.4.0
  • zope.deprecation ==4.4.0
  • zope.interface ==5.2.0
  • zope.sqlalchemy ==1.3