https://github.com/alixunxing/dig

Official PyTorch implementation of `Reading and Writing: Discriminative and Generative Modeling for Self-Supervised Text Recognition`

https://github.com/alixunxing/dig

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (2.9%) to scientific vocabulary
Last synced: 10 months ago · JSON representation

Repository

Official PyTorch implementation of `Reading and Writing: Discriminative and Generative Modeling for Self-Supervised Text Recognition`

Basic Info
  • Host: GitHub
  • Owner: alixunxing
  • License: other
  • Default Branch: main
  • Homepage:
  • Size: 121 KB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Fork of ayumiymk/DiG
Created about 3 years ago · Last pushed over 3 years ago

https://github.com/alixunxing/DiG/blob/main/

# Official PyTorch implementation of [DiG](https://arxiv.org/pdf/2207.00193)

This repository is built upon [MoCo V3](https://github.com/facebookresearch/moco-v3) and [MAE](https://github.com/pengzhiliang/MAE-pytorch), thanks very much!

## Data preparation
All datasets for pre-training and fine-tuning are processed from public datasets.

Unlabeled Real Data CC-OCR
Synthetic Text Data SynthText, Synth90k (Baiduyun with passwd: wi05)
Annotated Real Data TextOCR, OpenImageTextV5
Scene Text Recognition Benchmarks IIIT5k, SVT, IC13, IC15, SVTP, CUTE, COCOText, CTW, Total-Text, HOST, WOST
Handwritten Text Training Data CVL, IAM
Handwritten Text Recognition Benchmarks CVL, IAM
## Setup ``` conda env create -f environment.yml ``` ## Run 1. Pre-training ```bash # Set the path to save checkpoints OUTPUT_DIR='output/pretrain_dig' # path to imagenet-1k train set DATA_PATH='/path/to/pretrain_data/' # batch_size can be adjusted according to the graphics card OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=8 run_mae_pretraining_moco.py \ --image_alone_path ${DATA_PATH} \ --mask_ratio 0.7 \ --batch_size 128 \ --opt adamw \ --output_dir ${OUTPUT_DIR} \ --epochs 10 \ --warmup_steps 5000 \ --max_len 25 \ --num_view 2 \ --moco_dim 256 \ --moco_mlp_dim 4096 \ --moco_m 0.99 \ --moco_m_cos \ --moco_t 0.2 \ --num_windows 4 \ --contrast_warmup_steps 0 \ --contrast_start_epoch 0 \ --loss_weight_pixel 1. \ --loss_weight_contrast 0.1 \ --only_mim_on_ori_img \ --weight_decay 0.1 \ --opt_betas 0.9 0.999 \ --model pretrain_simmim_moco_ori_vit_small_patch4_32x128 \ --patchnet_name no_patchtrans \ --encoder_type vit \ ``` 2. Fine-tuning ```bash # Set the path to save checkpoints OUTPUT_DIR='output/' # path to imagenet-1k set DATA_PATH='/path/to/finetune_data' # path to pretrain model MODEL_PATH='/path/to/pretrain/checkpoint.pth' # batch_size can be adjusted according to the graphics card OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=8 --master_port 10041 run_class_finetuning.py \ --model simmim_vit_small_patch4_32x128 \ --data_path ${DATA_PATH} \ --eval_data_path ${DATA_PATH} \ --finetune ${MODEL_PATH} \ --output_dir ${OUTPUT_DIR} \ --batch_size 256 \ --opt adamw \ --opt_betas 0.9 0.999 \ --weight_decay 0.05 \ --data_set image_lmdb \ --nb_classes 97 \ --smoothing 0. \ --max_len 25 \ --epochs 10 \ --warmup_epochs 1 \ --drop 0.1 \ --attn_drop_rate 0.1 \ --drop_path 0.1 \ --dist_eval \ --lr 1e-4 \ --num_samples 1 \ --fixed_encoder_layers 0 \ --decoder_name tf_decoder \ --use_abi_aug \ --num_view 2 \ ``` 3. Evaluation ```bash # Set the path to save checkpoints OUTPUT_DIR='output/' # path to imagenet-1k set DATA_PATH='/path/to/test_data' # path to finetune model MODEL_PATH='/path/to/finetune/checkpoint.pth' # batch_size can be adjusted according to the graphics card OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=$opt_nproc_per_node --master_port 10040 run_class_finetuning.py \ --model simmim_vit_small_patch4_32x128 \ --data_path ${DATA_PATH} \ --eval_data_path ${DATA_PATH} \ --output_dir ${OUTPUT_DIR} \ --batch_size 512 \ --opt adamw \ --opt_betas 0.9 0.999 \ --weight_decay 0.05 \ --data_set image_lmdb \ --nb_classes 97 \ --smoothing 0. \ --max_len 25 \ --resume ${MODEL_PATH} \ --eval \ --epochs 20 \ --warmup_epochs 2 \ --drop 0.1 \ --attn_drop_rate 0.1 \ --dist_eval \ --num_samples 1000000 \ --fixed_encoder_layers 0 \ --decoder_name tf_decoder \ --beam_width 0 \ ``` ## Result | model | pretrain | finetune | average accuracy | weight | |:--------:|:--------:|:--------:|:--------:| :--------:| | vit-small | 10e | 10e | 85.21% | [pretrain](https://1drv.ms/u/s!AgwG2MwdV23ckOhlLmStGZ03RSQLMA?e=WN9fJ9) [finetune](https://1drv.ms/u/s!AgwG2MwdV23ckOhm29tonOUPja4yXQ?e=ed3Cfs)| ## Citation If you find this project helpful for your research, please cite the following paper: ``` @inproceedings{DiG, author = {Mingkun Yang and Minghui Liao and Pu Lu and Jing Wang and Shenggao Zhu and Hualin Luo and Qi Tian and Xiang Bai}, editor = {Jo{\~{a}}o Magalh{\~{a}}es and Alberto Del Bimbo and Shin'ichi Satoh and Nicu Sebe and Xavier Alameda{-}Pineda and Qin Jin and Vincent Oria and Laura Toni}, title = {Reading and Writing: Discriminative and Generative Modeling for Self-Supervised Text Recognition}, booktitle = {{MM} '22: The 30th {ACM} International Conference on Multimedia, Lisboa, Portugal, October 10 - 14, 2022}, pages = {4214--4223}, publisher = {{ACM}}, year = {2022}, url = {https://doi.org/10.1145/3503161.3547784}, doi = {10.1145/3503161.3547784}, timestamp = {Fri, 14 Oct 2022 14:25:06 +0200}, biburl = {https://dblp.org/rec/conf/mm/YangLLWZLTB22.bib}, bibsource = {dblp computer science bibliography, https://dblp.org} } ``` ## License This project is under the CC-BY-NC 4.0 license. See [LICENSE](https://github.com/ayumiymk/DiG/blob/main/LICENSE) for details. If you are going to use it in a product, we suggest you [contact us](xbai@hust.edu.cn) regarding possible patent issues.

Owner

  • Login: alixunxing
  • Kind: user

GitHub Events

Total
Last Year