character-queries

The official implementation of "Character Queries: A Transformer-based Approach to On-Line Handwritten Character Segmentation"

https://github.com/jungomi/character-queries

Last synced: 6 months ago · JSON representation ·

Repository

The official implementation of "Character Queries: A Transformer-based Approach to On-Line Handwritten Character Segmentation"

Basic Info

Host: GitHub
Owner: jungomi
License: mit
Language: Python
Default Branch: main
Homepage: https://arxiv.org/abs/2309.03072
Size: 85.9 KB

Statistics

Stars: 8
Watchers: 2
Forks: 2
Open Issues: 1
Releases: 0

Created almost 3 years ago · Last pushed about 2 years ago

Metadata Files

Readme License Citation

Character Queries: A Transformer-based Approach to On-Line Handwritten Character Segmentation

Model Overview

Models

Download the model checkpoints:

| Model | #params | IAM Test Set T | IAM Test Set F | VNOnDB Test Set | |------------------------------------------------------|---------|----------------|----------------|-----------------| | Character Query Transformer | 6.47M | 92.28 | 95.11 | 92.06 |

Ground Truth

The ground truth segmentation annotations for IAM-OnDB and HANDS-VNOnDB can be downloaded from SWITCHDrive or with the direct links to each file in the table below.

Note: This does not contain the on-line handwriting dataset themselves, but only the ground truth segmentation annotations. The IAM-OnDB and HANDS-VNOnDB datasets need to be downloaded separately.

| Dataset | File needed | GT Train | GT Validation | GT Test | |------------------------|-------------------------------------|------------------------------------------------------|----------------------------------------------------------------|--------------------------------------------------------------------------------------| | IAM-OnDB | lineStrokes-all.tar.gz | trainset_segmented.json | testsetvsegmented.json | testsettsegmented.json, testsetfsegmented.json | | HANDS-VNOnDB | InkData_word.zip | InkDatawordtrain_segmented.json | InkDatawordvalidation_segmented.json | InkDatawordtest_segmented.json |

Direct downloads when clicking on the zip/tar.gz file requires you to be logged in on their website, it might be necessary to click on the dataset link an log in from there.

Convert to JSON

In order to use the ground truth, it needs to be combined with the corresponding data points from the respective datasets. The data points are stored in a single archive (zip/tar.gz) in InkML files, to make it simpler to use for the data loading, they are converted to individual JSON files, which contain both the segmentation annotations as well as all necessary point information. It can be converted with the convert_gt.py script as follows:

```sh

IAM-OnDB (train, validation, test set T and test set F)

python convertgt.py -d data/iam/lineStrokes-all.tar.gz -s data/gt/iam/trainsetsegmented.json path/to/gt/iam/testsetvsegmented.json data/gt/iam/testsettsegmented.json data/gt/iam/testsetfsegmented.json -o data/converted/iam

HANDS-VNOnDB (train, validation and test set)

python convertgt.py -d data/vnondb/InkDataword.zip -s data/gt/vnondb/InkDatawordtrainsegmented.json data/gt/vnondb/InkDatawordvalidationsegmented.json data/gt/vnondb/InkDatawordtest_segmented.json -o data/converted/vnondb -t vnondb ```

After this there is one directory for each subset and a corresponding .tsv file, which can be used as an index. The following file structure is produced with the aforementioned commands:

``` data/converted/ ├── iam │ ├── testsetfsegmented/ │ │ ├── a01-013z-01.xml.json │ │ ... │ ├── testsetfsegmented.tsv │ ├── testsettsegmented/ │ │ ├── a01-000u-03.xml.json │ │ ... │ ├── testsettsegmented.tsv │ ├── testsetvsegmented/ │ │ ├── a01-003-01.xml.json │ │ ... │ ├── testsetvsegmented.tsv │ ├── trainsetsegmented/ │ │ ├── a01-001w-02.xml.json │ │ ... │ └── trainsetsegmented.tsv └── vnondb ├── InkDatawordtestsegmented/ │ ├── 20151208014671051.inkml0.json │ ... ├── InkDatawordtestsegmented.tsv ├── InkDatawordtrainsegmented/ │ ├── 201406030003BCCTC.inkml0.json │ ... ├── InkDatawordtrainsegmented.tsv ├── InkDatawordvalidationsegmented/ │ ├── 20151224014178181.inkml0.json │ ... └── InkDatawordvalidation_segmented.tsv

```

Requirements

The dependencies can automatically be installed with the install_requirements.py script.

sh python install_requirements.py

It installs all dependencies listed in requirements.txt and dev dependencies (checker, linter, formatter).

Optionally, the targets to install can be specified as arguments:

```sh

Install all dependencies

python install_requirements.py deps dev

Equivalent to the above (default)

python install_requirements.py all ```

For convenience: all = [deps, dev].

Manually with pip

All dependencies can be installed manually with pip.

sh pip install -r requirements.txt

On Windows the PyTorch packages may not be available on PyPi, hence you need to point to the official PyTorch registry:

sh pip install -r requirements.txt -f https://download.pytorch.org/whl/torch_stable.html

If you'd like to use a different installation method or another CUDA version with PyTorch follow the instructions on PyTorch - Getting Started.

Usage

Training

Training is done with the train.py script:

sh python train.py --name some-name --train-gt /path/to/gt.tsv --validation-gt /path/to/gt.tsv difficult=/another/path/some-gt.tsv --chars /path/to/chars.tsv --fp16 --ema

The --name option is used to give it a name, otherwise the current date and time is used as a name. Bes resume from the given checkpoint, if not specified it starts fresh.

Multiple validation datasets can be specified, optionally with a name, --validation-gt /path/to/gt.tsv difficult=/another/path/some-gt.tsv would use two validation sets. When no name is specified, the name of the ground truth file and its parent directory is used. In the previous example the two sets would have the names: to/gt and difficult. The best checkpoints are determined by the average across all validation sets.

In order to know which characters are available to the model, the --chars option needs to be the path to a TSV file with a list of characters, where each character is on a new line. It is a TSV file because that allows to have multiple columns, where the first column is the character and any additional column will be ignored, which can be useful when storing statistics of the occurring characters within the dataset in the additional columns. A simple text file with just the characters, each on its own line, works just as well.

Modern GPUs contain Tensor Cores (starting from V100 and RTX series) which enable mixed precision calculation, using optimised fp16 operations while still keeping the fp32 weights and therefore precision.

It can be enabled by setting the --fp16 flag.

Other GPUs without Tensor Cores do not benefit from using mixed precision since they only do fp32 operations and you may find it even becoming slower.

The --ema flag enables the Exponential Moving Average (EMA) of the model parameters, which helps stabilise the final model and is recommend to always use.

For all options see python train.py --help.

Training a Character Query Transformer

Most of the default values are set for the character queries, so only a few additional arguments are needed besides specifying the datasets.

sh python train.py \ --name character-queries-iam-and-vnondb \ --gt-train data/converted/combined-iam-vnondb/train.tsv \ --gt-validation \ IAM_Validation=data/converted/iam/testset_v_segmented.tsv \ VNONDB_Validation=data/converted/vnondb/InkData_word_validation_segmented.tsv \ --chars data/converted/combined-iam-vnondb/chars.tsv \ -b $BATCH_SIZE \ --fp16 \ --ema \ --features-normalise

Training an LSTM

For the LSTM more options need to be changed.

sh python train.py \ --name lstm-iam-and-vnondb \ --gt-train data/converted/combined-iam-vnondb/train.tsv \ --gt-validation \ IAM_Validation=data/converted/iam/testset_v_segmented.tsv \ VNONDB_Validation=data/converted/vnondb/InkData_word_validation_segmented.tsv \ --chars data/converted/combined-iam-vnondb/chars.tsv \ -b $BATCH_SIZE \ --fp16 \ --ema \ --features x:delta y:delta index stroke global_index ctc_spike:embed \ -l 3e-3 \ --lr-warmup 5 \ --activation relu \ -m rnn

Logs

During the training various types of logs are created with Lavd and everything can be found in log/ and is grouped by the experiment name.

Summary
Checkpoints
Top 5 Checkpoints
Event logs

To visualise the logged data run:

sh lavd log/

Exporting Model

A model can be exported (JIT compiled) such that it can be used in Python or C++ directly without having to manually define the models. It can be loaded directly with torch.jit.load in Python or with the equivalent function torch::jit::load in C++.

sh python export_model.py -c log/some-name/best/ -o exported/best-model.ptc

When a directory is given to -c/--checkpoint instead of the model checkpoint directly, it will automatically look for the model.pt file in that directory.

The exported model will be saved where to the path given to -o/--output or if not specified, the model will be saved as exported/{model-kind}-{date}.ptc, e.g. exported/rnn-2022-03-22.ptc. It is recommended to use the file extension .ptc, where the c stands for compiled, in order to easily distinguish the exported models from saved checkpoints.

Development

To ensure consistency in the code, the following tools are used and also verified in CI:

ruff: Linting
mypy: Type checking
black: Formatting
isort: Import sorting / formatting

All of these tools are installed with the installation script, when all dependencies are installed and are also available with the dev group:

```sh python install_requirements.py

Or only the these dev tools

python install_requirements.py dev ```

It is recommended to have an editor configured such that it uses these tools, for example with the Python language server, which uses the Language Server Protocol (LSP), which allows you to easily see the errors / warnings and also format the code (potentially, automatically on save) and other helpful features.

Almost all configurations are kept at their default, but because of conflicts, a handful of them needed to be changed. These modified options are configured in pyproject.toml, hence if your editor does not agree with CI, it is most likely due to the config not being respected, or by using a different tool that may be used as a substitute.

Pre-Commit Hooks

All checks can be run on each commit with the Python package pre-commit.

First it needs to be installed:

sh pip install pre-commit

And afterwards the git pre-commit hooks need to be created:

sh pre-commit install

From now on, the hook will run the checks automatically for the changes in the commit (not all files).

However, you can run the checks manually on all files if needed with the -a/--all flag:

sh pre-commit run --all

Debugger

Python's included debugger pdb does not work for multi-processing and just crashes when the breakpoint is reached. There is a workaround to make it work with multiple processes, which is included here, but it is far from pleasant to use since the same TTY is shared and often alternates, making the debugging session frustrating, especially since the readline features do not work with this workaround.

A much better debugger uses the Debugger Adapter Protocol (DAP) for remote debugging, which allows to have a full debugging experience from any editor that supports DAP. In order to enable this debugger you need to have debugpy installed.

sh pip install debugpy

To start a debugging sessions, a breakpoint needs to be set with custom breakpoint function defined in debugger.py:

```py from debugger import breakpoint

...

breakpoint("Optional Message") ```

This will automatically enable the debugger at the specified port (default: 5678) and for every additional process, it will simply create a different session, with the port incremented by one.

If debugpy is not installed, it will fall back to the multi-processing version of PDB..

Should your editor not support DAP (e.g. PyCharm doesn't and probably won't ever), it is easiest to use VSCode for this.

License

The code and models are released under the MIT License and the ground truth annotation data is licensed under CC BY 4.0.

Citation

If you find our work helpful for your research, please consider citing the following BibTeX entry.

bibtex @inproceedings{jungo2023characterqueries, author={Jungo, Michael and Wolf, Beat and Maksai, Andrii and Musat, Claudiu and Fischer, Andreas}, title={Character Queries: A Transformer-Based Approach to On-line Handwritten Character Segmentation}, doi={10.1007/978-3-031-41676-7_6}, booktitle={Document Analysis and Recognition - ICDAR 2023}, editor={Fink, Gernot A. and Jain, Rajiv and Kise, Koichi and Zanibbi, Richard}, year={2023}, publisher={Springer Nature Switzerland}, pages={98--114}, isbn={978-3-031-41676-7} }

Owner

Name: Michael Jungo
Login: jungomi
Kind: user
Location: Fribourg, Switzerland

Repositories: 33
Profile: https://github.com/jungomi

Citation (CITATION.cff)

cff-version: 1.2.0
message: If you use this software, please cite it as below.
repository-code: "https://github.com/jungomi/character-queries"
title: "Character Queries: A Transformer-Based Approach to On-line Handwritten Character Segmentation"
authors:
  - given-names: Michael
    family-names: Jungo
    email: michael.jungo@hefr.ch
  - given-names: Beat
    family-names: Wolf
    email: beat.wolf@hefr.ch
  - given-names: Andrii
    family-names: Maksai
    email: amaksai@google.com
  - given-names: Claudiu
    family-names: Musat
    email: cmusat@google.com
  - given-names: Andreas
    family-names: Fischer
    email: andreas.fischer@hefr.ch
preferred-citation:
  type: conference-paper
  title: "Character Queries: A Transformer-Based Approach to On-line Handwritten Character Segmentation"
  authors:
    - given-names: Michael
      family-names: Jungo
      email: michael.jungo@hefr.ch
    - given-names: Beat
      family-names: Wolf
      email: beat.wolf@hefr.ch
    - given-names: Andrii
      family-names: Maksai
      email: amaksai@google.com
    - given-names: Claudiu
      family-names: Musat
      email: cmusat@google.com
    - given-names: Andreas
      family-names: Fischer
      email: andreas.fischer@hefr.ch
  doi: 10.1007/978-3-031-41676-7_6
  volume-title: Document Analysis and Recognition - ICDAR 2023
  year: 2023
  publisher:
    name: Springer Nature Switzerland
  pages: 98--114
  isbn: 978-3-031-41676-7
  keywords:
    - On-Line Handwriting
    - Digital Ink
    - Character Segmentation
    - Transformer

GitHub Events

Total

Issues event: 2
Issue comment event: 1

Last Year

Issues event: 2
Issue comment event: 1

Issues and Pull Requests

Last synced: 11 months ago

All Time

Total issues: 3
Total pull requests: 0
Average time to close issues: 32 minutes
Average time to close pull requests: N/A
Total issue authors: 2
Total pull request authors: 0
Average comments per issue: 2.67
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 1
Pull requests: 0
Average time to close issues: about 1 hour
Average time to close pull requests: N/A
Issue authors: 1
Pull request authors: 0
Average comments per issue: 1.0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0