https://github.com/amazon-science/semimtr-text-recognition
Multimodal Semi-Supervised Learning for Text Recognition (SemiMTR)
Science Score: 36.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: links to arxiv.org, scholar.google
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (9.5%) to scientific vocabulary
Keywords
Repository
Multimodal Semi-Supervised Learning for Text Recognition (SemiMTR)
Basic Info
Statistics
- Stars: 83
- Watchers: 4
- Forks: 12
- Open Issues: 1
- Releases: 0
Topics
Metadata Files
README.md
Multimodal Semi-Supervised Learning for Text Recognition
The official code implementation of SemiMTR. Paper | Pretrained Models | SeqCLR Paper | Citation | Demo
Aviad Aberdam, Roy Ganz, Shai Mazor, Ron Litman
We introduce a multimodal semi-supervised learning algorithm for text recognition, which is customized for modern vision-language multimodal architectures. To this end, we present a unified one-stage pretraining method for the vision model, which suits scene text recognition. In addition, we offer a sequential, character-level, consistency regularization in which each modality teaches itself. Extensive experiments demonstrate state-of-the-art performance on multiple scene text recognition benchmarks.
Figures
Figure 1: SemiMTR vision model pretraining: Contrastive learning
Figure 2: SemiMTR model fine-tuning: Consistency regularization
Getting Started
Run Demo with Pretrained Model
Dependencies
- Inference and the demo require PyTorch >= 1.7.1
- For training and evaluation, install the dependencies:
  pip install -r requirements.txt
Pretrained Models
Download pretrained models:
Pretrained vision models:
Pretrained language model:
To fine-tune SemiMTR without re-running vision and language pretraining, place the above models in a workdir directory, as follows:
workdir
├── semimtr_vision_model_real_l_and_u.pth
├── abinet_language_model.pth
└── semimtr_real_l_and_u.pth
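The layout above can be assembled with a short shell sketch. The file names come from the tree; the assumption that the checkpoints were downloaded to the current directory is ours, so adjust the paths as needed:

```shell
# Create the workdir layout expected by the fine-tuning configs.
# Assumes the three .pth checkpoints were downloaded to the current directory.
mkdir -p workdir
for f in semimtr_vision_model_real_l_and_u.pth \
         abinet_language_model.pth \
         semimtr_real_l_and_u.pth; do
  if [ -f "$f" ]; then
    mv "$f" workdir/
  else
    echo "missing: $f (download it first)"
  fi
done
```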
SemiMTR Models Accuracy
| Training Data | IIIT | SVT | IC13 | IC15 | SVTP | CUTE | Avg. | COCO | RCTW | Uber | ArT | LSVT | MLT19 | ReCTS | Avg. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Synth (ABINet) | 96.4 | 93.2 | 95.1 | 82.1 | 89.0 | 89.2 | 91.2 | 63.1 | 59.7 | 39.6 | 68.3 | 59.5 | 85.0 | 86.7 | 52.0 |
| Real-L+U | 97.0 | 95.8 | 96.1 | 84.7 | 90.7 | 94.1 | 92.8 | 72.2 | 76.1 | 58.5 | 71.6 | 77.1 | 90.4 | 92.4 | 65.4 |
| Real-L+U+Synth | 97.4 | 96.8 | 96.5 | 84.7 | 92.9 | 95.1 | 93.3 | 73.0 | 75.7 | 58.6 | 72.4 | 77.5 | 90.4 | 93.1 | 65.8 |
| Real-L+U+TextOCR | 97.3 | 97.7 | 96.9 | 86.0 | 92.2 | 94.4 | 93.7 | 73.8 | 77.7 | 58.6 | 73.5 | 78.3 | 91.3 | 93.3 | 66.1 |
Datasets
- Download the preprocessed LMDB datasets for training and evaluation. Link
- For training the language model, download WikiText-103. Link
- The final structure of the data directory can be found in DATA.md.
Training
- Pretrain the vision model:
  CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --config configs/semimtr_pretrain_vision_model.yaml
- Pretrain the language model:
  CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --config configs/pretrain_language_model.yaml
- Train SemiMTR:
  CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --config configs/semimtr_finetune.yaml
Note:
- You can set the checkpoint path for the vision and language models separately to start from a specific pretrained model, or set it to None to train from scratch.
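The three training stages above can be chained in one script. A minimal sketch follows; the actual `python main.py` call is left commented out so the snippet also runs outside the repository:

```shell
# Run the SemiMTR pipeline stage by stage on 4 GPUs.
export CUDA_VISIBLE_DEVICES=0,1,2,3
for cfg in configs/semimtr_pretrain_vision_model.yaml \
           configs/pretrain_language_model.yaml \
           configs/semimtr_finetune.yaml; do
  echo "stage: $cfg"
  # python main.py --config "$cfg"   # uncomment when running inside the repo
done
```

Each stage reads its own YAML config, so swapping in the ABINet configs from the next section gives the baseline pipeline with the same loop.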
Training ABINet
- Pretrain the vision model:
  CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --config configs/abinet_pretrain_vision_model.yaml
- Pretrain the language model:
  CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --config configs/pretrain_language_model.yaml
- Train ABINet:
  CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --config configs/abinet_finetune.yaml
Evaluation
CUDA_VISIBLE_DEVICES=0 python main.py --config configs/semimtr_finetune.yaml --run_only_test
Arguments:
- --checkpoint /path/to/checkpoint: set the path of the evaluation model
- --test_root /path/to/dataset: set the path of the evaluation dataset
- --model_eval [alignment|vision]: which sub-model to evaluate
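Putting the flags together, an evaluation call might look like the sketch below. The checkpoint and dataset paths are placeholders of our choosing, not paths the repository guarantees:

```shell
# Hypothetical paths; substitute your own checkpoint and LMDB test set.
CHECKPOINT=workdir/semimtr_real_l_and_u.pth
TEST_ROOT=data/evaluation

# Compose the evaluation command from the arguments listed above.
CMD="python main.py --config configs/semimtr_finetune.yaml --run_only_test"
CMD="$CMD --checkpoint $CHECKPOINT --test_root $TEST_ROOT --model_eval alignment"
echo "$CMD"   # run the printed command from the repository root
```

Passing --model_eval vision instead evaluates only the vision sub-model, which is useful for checking the pretraining stage in isolation.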
Citation
If you find our method useful for your research, please cite:
```
@article{aberdam2022multimodal,
  title={Multimodal Semi-Supervised Learning for Text Recognition},
  author={Aberdam, Aviad and Ganz, Roy and Mazor, Shai and Litman, Ron},
  journal={arXiv preprint arXiv:2205.03873},
  year={2022}
}

@inproceedings{aberdam2021sequence,
  title={Sequence-to-sequence contrastive learning for text recognition},
  author={Aberdam, Aviad and Litman, Ron and Tsiper, Shahar and Anschel, Oron and Slossberg, Ron and Mazor, Shai and Manmatha, R and Perona, Pietro},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={15302--15312},
  year={2021}
}
```
Acknowledgements
This implementation is based on the repository ABINet.
Security
See CONTRIBUTING for more information.
License
This project is licensed under the Apache-2.0 License.
Contact
Feel free to contact us with any questions: Aviad Aberdam
Owner
- Name: Amazon Science
- Login: amazon-science
- Kind: organization
- Website: https://amazon.science
- Twitter: AmazonScience
- Repositories: 80
- Profile: https://github.com/amazon-science
GitHub Events
Total
- Watch event: 2
Last Year
- Watch event: 2
Committers
Last synced: 8 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Aviad Aberdam | a****m@a****m | 38 |
| Aviad Aberdam | a****m@g****m | 4 |
| Amazon GitHub Automation | 5****o | 1 |
Committer Domains (Top 20 + Academic)
Issues and Pull Requests
Last synced: 8 months ago
All Time
- Total issues: 18
- Total pull requests: 0
- Average time to close issues: 4 days
- Average time to close pull requests: N/A
- Total issue authors: 12
- Total pull request authors: 0
- Average comments per issue: 2.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- YusenZhang826 (2)
- dikubab (2)
- csguoh (1)
- leduy-it (1)
- qiutzh (1)
- icecream-Tnak (1)
- wulinunu (1)
- WongVi (1)
- YuNie24 (1)
- shantzhou (1)
- yangxcccscsa (1)
- baolongliu (1)