fasterdan

https://github.com/factodeeplearning/fasterdan

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 1 DOI reference(s) in README
✓ Academic publication links: Links to arxiv.org, ieee.org, zenodo.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (12.7%) to scientific vocabulary

Last synced: 11 months ago

Repository

Basic Info

Host: GitHub
Owner: FactoDeepLearning
License: other
Language: Python
Default Branch: main
Size: 128 KB

Statistics

Stars: 2
Watchers: 1
Forks: 0
Open Issues: 2
Releases: 0

Created over 3 years ago · Last pushed about 1 year ago

Metadata Files

Readme License Citation

Faster DAN: Multi-target Queries with Document Positional Encoding for End-to-end Handwritten Document Recognition

This project is under CeCILL-C license (full details in LICENSE_CECILL-C.md).

This repository is a public implementation of the paper: "Faster DAN: Multi-target Queries with Document Positional Encoding for End-to-end Handwritten Document Recognition", International Conference on Document Analysis and Recognition, 2023.

The paper is available on arXiv.

Pretrained model weights are available here and here

Table of contents:
1. Getting Started
2. Datasets
3. Training And Evaluation

Getting Started

We used Python 3.10.4, PyTorch 1.12.0 and CUDA 10.2.

Clone the repository:

git clone https://github.com/FactoDeepLearning/FasterDAN.git

Install the dependencies in a conda environment:

conda create --name fdan
conda activate fdan
cd FasterDAN
pip install -e .
cd faster_dan
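After installation, it can help to confirm that the pinned packages resolved to the expected versions. The following is a minimal sketch using only the standard library. The helper name `check_pins` is hypothetical (it is not part of this repository), and the pins are the ones listed for setup.py (torch 1.12.1, torchvision 0.13.1).

```python
from importlib import metadata

# Pins as listed for this repository's setup.py.
PINNED = {"torch": "1.12.1", "torchvision": "0.13.1"}


def check_pins(pins):
    """Return {package: (expected, installed)} for every mismatch.

    `installed` is None when the package is not installed at all.
    (Hypothetical helper, not part of the repository.)
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Ignore local build suffixes such as "+cu102" when comparing.
        if installed is None or installed.split("+")[0] != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {installed}")
```

Run inside the activated `fdan` environment, the script prints nothing when both pins match.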

Datasets

We used three datasets in the paper: RIMES 2009, READ 2016 and MAURDOR.

The page-level RIMES dataset was distributed during the evaluation campaign of 2009.

The MAURDOR dataset was distributed during the evaluation campaign of 2013. It is now available here.

The READ 2016 dataset corresponds to the one used in the ICFHR 2016 competition on handwritten text recognition. It can be found here.

Raw dataset files must be placed in Datasets/raw/{dataset_name}, where dataset_name is "READ 2016", "RIMES" or "Maurdor".
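Before running the formatters, the expected layout can be checked with a few lines of Python. This is a sketch under stated assumptions: the helper `missing_raw_datasets` is hypothetical, and the folder names are taken literally from the sentence above, so adjust them if your local copies are named differently.

```python
from pathlib import Path

# Folder names as quoted in this README (assumed to match the folders on disk).
DATASET_NAMES = ("READ 2016", "RIMES", "Maurdor")


def missing_raw_datasets(root, names=DATASET_NAMES):
    """Return the dataset names that have no non-empty folder under root/Datasets/raw.

    (Hypothetical helper, not part of the repository.)
    """
    raw_dir = Path(root) / "Datasets" / "raw"
    missing = []
    for name in names:
        folder = raw_dir / name
        # A dataset counts as present only if its folder exists and contains something.
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(name)
    return missing
```

For example, `missing_raw_datasets(".")` called from the repository root returns the list of datasets that still need to be downloaded, or an empty list when all three are in place.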

Training And Evaluation

Step 1: Download the datasets and place the raw files in the following folder: Datasets/raw/{dataset_name}

Step 2: Format the dataset

python3 Datasets/dataset_formatters/read2016_formatter.py
python3 Datasets/dataset_formatters/rimes_formatter.py
python3 Datasets/dataset_formatters/maurdor_formatter.py

Step 3: Add any fonts you want as .ttf files in the Fonts folder

Step 4: Generate the synthetic line dataset and pretrain on it

cd OCR/line_OCR/ctc/
python3 main_syn_line.py # generation
python3 main_line_ctc_syn.py # training

Two lines in this script must be adapted to the dataset in use:

model.generate_syn_line_dataset("READ_2016_syn_line")
dataset_name = "READ_2016"

Weights and evaluation results are stored in OCR/line_OCR/ctc/outputs

Step 5: Train the Faster DAN / DAN

cd OCR/document_OCR/faster_dan/
python3 main_faster_dan.py # faster dan
python3 main_std_dan.py # original dan

Weights and evaluation results are stored in OCR/document_OCR/dan/outputs

Remarks (for pre-training and training)

Scripts are given for the READ 2016 dataset and must be adapted for RIMES 2009 and MAURDOR (mostly the dataset_name parameter and the pretraining paths).
All hyperparameters are specified and editable in the training scripts (their meanings are given in comments).
Evaluation is performed immediately after training ends (training stops when the maximum elapsed time is reached or after the maximum number of epochs specified in the training script).
The output files are split into two subfolders: "checkpoints" and "results".
"checkpoints" contains the model weights for the last trained epoch and for the epoch with the best CER on the validation set.
"results" contains the TensorBoard logs for loss and metrics, as well as text files with the hyperparameters used and the evaluation results.
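To make the stopping rule and the "best CER" checkpoint concrete, here is a minimal, self-contained sketch. It is not the repository's training code: `run_epoch` and `validate` are hypothetical callbacks, and the CER shown is the common definition (total character edit distance divided by total reference length), which may differ in detail from the normalization used in the scripts.

```python
import time


def edit_distance(ref, hyp):
    """Levenshtein distance between two strings (dynamic programming, two rows)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(references, hypotheses):
    """Character error rate: total edit distance over total reference length."""
    total_dist = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_len = sum(len(r) for r in references)
    return total_dist / total_len


def train(run_epoch, validate, max_epochs, max_seconds):
    """Stop on an epoch budget or a time budget; track the lowest validation CER.

    `run_epoch(epoch)` and `validate(epoch)` are hypothetical callbacks
    supplied by the caller; `validate` returns the validation CER.
    """
    start = time.monotonic()
    best_epoch, best_cer = None, float("inf")
    for epoch in range(max_epochs):
        if time.monotonic() - start >= max_seconds:
            break  # time budget exhausted
        run_epoch(epoch)
        val_cer = validate(epoch)
        if val_cer < best_cer:
            # This is the point where a "best" checkpoint would be written.
            best_epoch, best_cer = epoch, val_cer
    return best_epoch, best_cer
```

In this sketch the "last" checkpoint would correspond to the final epoch that ran, and the "best" checkpoint to the epoch returned by `train`.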

Citation

@inproceedings{Coquenet2023fasterdan,
    author = {Coquenet, Denis and Chatelain, Clément and Paquet, Thierry},
    title = {Faster DAN: Multi-target Queries with Document Positional Encoding for End-to-end Handwritten Document Recognition},
    booktitle = {International Conference on Document Analysis and Recognition (ICDAR)},
    year = {2023},
    pages = {182--199},
    series = {Lecture Notes in Computer Science},
    volume = {14190},
    doi = {10.1007/978-3-031-41685-9_12},
    url = {https://arxiv.org/abs/2301.10593},
}

License

This project is under CeCILL-C license.

Owner

Login: FactoDeepLearning
Kind: user

Website: https://factodeeplearning.github.io/
Repositories: 4
Profile: https://github.com/FactoDeepLearning

Citation (CITATION.cff)

cff-version: 1.2.0
title: 'Faster DAN'
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: 'Denis '
    family-names: Coquenet
    orcid: 'https://orcid.org/0000-0001-5203-9423'
  - name: Université de Rouen Normandie
  - name: INSA Rouen
  - name: LITIS
identifiers:
  - type: url
    value: 'https://arxiv.org/abs/2301.10593'
repository-code: 'https://github.com/FactoDeepLearning/FasterDAN/'
license: CECILL-C

GitHub Events

Total

Issues event: 1
Watch event: 6
Issue comment event: 1
Public event: 1
Push event: 2
Fork event: 1

Last Year

Issues event: 1
Watch event: 6
Issue comment event: 1
Public event: 1
Push event: 2
Fork event: 1

Dependencies

setup.py pypi

editdistance *
fonttools *
networkx *
opencv-python *
pillow *
pyunpack *
scikit-learn *
tensorboard *
torch ==1.12.1
torchvision ==0.13.1
tqdm *
