Multimodal Semi-Supervised Learning for Text Recognition (SemiMTR)

https://github.com/amazon-science/semimtr-text-recognition

Science Score: 36.0%

This score indicates how likely this project is to be science-related, based on the following indicators:

  • CITATION.cff file
  • codemeta.json file (found)
  • .zenodo.json file (found)
  • DOI references
  • Academic publication links (found: arxiv.org, scholar.google)
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity (low: 9.5%)

Keywords

computer-vision consistency-regularization contrastive-learning deep-learning ocr pytorch scene-text-recognition self-supervised-learning semi-supervised-learning text-recognition
Last synced: 5 months ago

Repository

Multimodal Semi-Supervised Learning for Text Recognition (SemiMTR)

Basic Info
  • Host: GitHub
  • Owner: amazon-science
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Size: 1.23 MB
Statistics
  • Stars: 83
  • Watchers: 4
  • Forks: 12
  • Open Issues: 1
  • Releases: 0
Created over 3 years ago · Last pushed over 2 years ago
Metadata Files
Readme · Contributing · License · Code of conduct

README.md

Multimodal Semi-Supervised Learning for Text Recognition

The official code implementation of SemiMTR. Quick links: Paper | Pretrained Models | SeqCLR Paper | Citation | Demo.

Aviad Aberdam, Roy Ganz, Shai Mazor, Ron Litman

We introduce a multimodal semi-supervised learning algorithm for text recognition, customized for modern vision-language multimodal architectures. To this end, we present a unified one-stage pretraining method for the vision model that suits scene text recognition. In addition, we offer a sequential, character-level consistency regularization in which each modality teaches itself. Extensive experiments demonstrate state-of-the-art performance on multiple scene text recognition benchmarks.
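To make the consistency-regularization idea concrete, here is a minimal, illustrative PyTorch sketch of a character-level consistency loss between a teacher view (weak augmentation) and a student view (strong augmentation) of the same word image. This is our simplification, not the repository's exact loss: the tensor shapes, the KL objective, and the stop-gradient on the teacher are assumptions.

```
import torch
import torch.nn.functional as F

def char_consistency_loss(teacher_logits, student_logits):
    """Character-level consistency: the student matches the teacher's
    per-character distributions. Shapes: (batch, seq_len, num_classes)."""
    # Teacher predictions act as soft targets; no gradient flows into them.
    with torch.no_grad():
        teacher_probs = F.softmax(teacher_logits, dim=-1)
    student_log_probs = F.log_softmax(student_logits, dim=-1)
    # KL divergence summed over sequence positions and the alphabet,
    # then averaged over the batch.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")

# Example: batch of 2 word images, max word length 25, 37 character classes.
t = torch.randn(2, 25, 37)
s = torch.randn(2, 25, 37, requires_grad=True)
loss = char_consistency_loss(t, s)
loss.backward()
```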

Figures

Figure 1: SemiMTR vision model pretraining: Contrastive learning

Figure 2: SemiMTR model fine-tuning: Consistency regularization
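Figure 1 refers to contrastive pretraining of the vision model. For intuition only, the sketch below shows a standard image-level NT-Xent contrastive loss over two augmented views; SemiMTR's actual objective is sequence-level (per character frame, in the spirit of SeqCLR), and all names here are ours.

```
import torch
import torch.nn.functional as F

def info_nce_loss(z1, z2, temperature=0.5):
    """Generic NT-Xent contrastive loss over two augmented views.
    z1, z2: (batch, dim) feature projections of the same images."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    z = torch.cat([z1, z2], dim=0)            # (2B, dim)
    sim = z @ z.t() / temperature             # scaled cosine similarities
    sim.fill_diagonal_(float("-inf"))         # exclude self-similarity
    batch = z1.size(0)
    # The positive for sample i is its other view, offset by `batch`.
    targets = torch.cat([torch.arange(batch) + batch, torch.arange(batch)])
    return F.cross_entropy(sim, targets)
```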

Getting Started

Run the demo with a pretrained model via the Open in Colab notebook.

Dependencies

  • Inference and the demo require PyTorch >= 1.7.1
  • For training and evaluation, install the dependencies:

```
pip install -r requirements.txt
```
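As a quick sanity check (ours, not part of the repo), you can confirm the installed PyTorch satisfies the inference requirement above:

```
import torch

# torch.__version__ may carry a local suffix such as "+cu113"; strip it first.
# (Pre-release tags like "a0" would need extra handling; this is only a sketch.)
installed = tuple(int(p) for p in torch.__version__.split("+")[0].split(".")[:3])
assert installed >= (1, 7, 1), f"PyTorch >= 1.7.1 required, found {torch.__version__}"
print(f"PyTorch {torch.__version__} OK")
```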

Pretrained Models

Download pretrained models:

Pretrained vision models:

Pretrained language model:

To fine-tune SemiMTR without running the vision and language pretraining yourself, place the above models in a workdir directory, as follows:

```
workdir
├── semimtr_vision_model_real_l_and_u.pth
├── abinet_language_model.pth
└── semimtr_real_l_and_u.pth
```
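A small helper (ours, not from the repo) to verify the expected checkpoints are in place before fine-tuning; the file names are exactly those in the tree above:

```
from pathlib import Path

# Checkpoint names as listed in the directory tree above.
EXPECTED = [
    "semimtr_vision_model_real_l_and_u.pth",
    "abinet_language_model.pth",
    "semimtr_real_l_and_u.pth",
]

workdir = Path("workdir")
missing = [name for name in EXPECTED if not (workdir / name).exists()]
if missing:
    raise FileNotFoundError(f"Missing checkpoints in {workdir}: {missing}")
print("All pretrained checkpoints found.")
```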

SemiMTR Models Accuracy

|Training Data|IIIT|SVT|IC13|IC15|SVTP|CUTE|Avg.|COCO|RCTW|Uber|ArT|LSVT|MLT19|ReCTS|Avg.|
|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|
|Synth (ABINet)|96.4|93.2|95.1|82.1|89.0|89.2|91.2|63.1|59.7|39.6|68.3|59.5|85.0|86.7|52.0|
|Real-L+U|97.0|95.8|96.1|84.7|90.7|94.1|92.8|72.2|76.1|58.5|71.6|77.1|90.4|92.4|65.4|
|Real-L+U+Synth|97.4|96.8|96.5|84.7|92.9|95.1|93.3|73.0|75.7|58.6|72.4|77.5|90.4|93.1|65.8|
|Real-L+U+TextOCR|97.3|97.7|96.9|86.0|92.2|94.4|93.7|73.8|77.7|58.6|73.5|78.3|91.3|93.3|66.1|

Datasets

  • Download the preprocessed LMDB dataset for training and evaluation (Link); a sample-reading sketch follows this list.
  • For training the language model, download WikiText-103 (Link).
  • The final structure of the data directory can be found in DATA.md.
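To inspect a downloaded LMDB dataset, the sketch below reads one sample. It assumes the num-samples / image-%09d / label-%09d key convention used by ABINet-style scene-text LMDB datasets, which this repository builds on; the path is a placeholder, and the keys may differ in your copy.

```
import io
import lmdb
from PIL import Image

# Placeholder path; point this at one of the downloaded LMDB directories.
env = lmdb.open("path/to/lmdb_dataset", readonly=True, lock=False)
with env.begin() as txn:
    n = int(txn.get(b"num-samples").decode())
    print(f"{n} samples")
    # Indices are 1-based under this key convention (assumption).
    img_bytes = txn.get(b"image-000000001")
    label = txn.get(b"label-000000001").decode()
    img = Image.open(io.BytesIO(img_bytes)).convert("RGB")
    print(label, img.size)
```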

Training

  1. Pretrain the vision model:
     CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --config configs/semimtr_pretrain_vision_model.yaml
  2. Pretrain the language model:
     CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --config configs/pretrain_language_model.yaml
  3. Train SemiMTR:
     CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --config configs/semimtr_finetune.yaml

Note:

  • You can set the checkpoint paths for the vision and language models separately to start from specific pretrained models, or set them to None to train from scratch (see the config sketch below).
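For illustration, such an override might look like the sketch below. The field names are our assumption, modeled on the ABINet config layout this repository inherits; check the shipped configs (e.g., configs/semimtr_finetune.yaml) for the real keys.

```
model:
  vision:
    checkpoint: workdir/semimtr_vision_model_real_l_and_u.pth  # or None to train from scratch
  language:
    checkpoint: workdir/abinet_language_model.pth  # or None to train from scratch
```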

Training ABINet

  1. Pretrain the vision model:
     CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --config configs/abinet_pretrain_vision_model.yaml
  2. Pretrain the language model:
     CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --config configs/pretrain_language_model.yaml
  3. Train ABINet:
     CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --config configs/abinet_finetune.yaml

Evaluation

CUDA_VISIBLE_DEVICES=0 python main.py --config configs/semimtr_finetune.yaml --run_only_test

Arguments:

  • --checkpoint /path/to/checkpoint: path of the model to evaluate
  • --test_root /path/to/dataset: path of the evaluation dataset
  • --model_eval [alignment|vision]: which sub-model to evaluate
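Putting the flags together, a full evaluation call might look like this (the flags are the documented ones above; the checkpoint and dataset paths are illustrative):

CUDA_VISIBLE_DEVICES=0 python main.py --config configs/semimtr_finetune.yaml --run_only_test --checkpoint workdir/semimtr_real_l_and_u.pth --test_root data/evaluation --model_eval alignment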

Citation

If you find our method useful for your research, please cite

```
@article{aberdam2022multimodal,
  title={Multimodal Semi-Supervised Learning for Text Recognition},
  author={Aberdam, Aviad and Ganz, Roy and Mazor, Shai and Litman, Ron},
  journal={arXiv preprint arXiv:2205.03873},
  year={2022}
}

@inproceedings{aberdam2021sequence,
  title={Sequence-to-sequence contrastive learning for text recognition},
  author={Aberdam, Aviad and Litman, Ron and Tsiper, Shahar and Anschel, Oron and Slossberg, Ron and Mazor, Shai and Manmatha, R and Perona, Pietro},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={15302--15312},
  year={2021}
}
```

Acknowledgements

This implementation is based on the repository ABINet.

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.

Contact

Feel free to contact us with any questions: Aviad Aberdam

Owner

  • Name: Amazon Science
  • Login: amazon-science
  • Kind: organization

GitHub Events

Total
  • Watch event: 2
Last Year
  • Watch event: 2

Committers

Last synced: 8 months ago

All Time
  • Total Commits: 43
  • Total Committers: 3
  • Avg Commits per committer: 14.333
  • Development Distribution Score (DDS): 0.116
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
| Name | Email | Commits |
|-|-|-|
| Aviad Aberdam | a****m@a****m | 38 |
| Aviad Aberdam | a****m@g****m | 4 |
| Amazon GitHub Automation | 5****o | 1 |

Issues and Pull Requests

Last synced: 8 months ago

All Time
  • Total issues: 18
  • Total pull requests: 0
  • Average time to close issues: 4 days
  • Average time to close pull requests: N/A
  • Total issue authors: 12
  • Total pull request authors: 0
  • Average comments per issue: 2.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • YusenZhang826 (2)
  • dikubab (2)
  • csguoh (1)
  • leduy-it (1)
  • qiutzh (1)
  • icecream-Tnak (1)
  • wulinunu (1)
  • WongVi (1)
  • YuNie24 (1)
  • shantzhou (1)
  • yangxcccscsa (1)
  • baolongliu (1)