https://github.com/amazon-science/textadain-robust-recognition

TextAdaIN: Paying Attention to Shortcut Learning in Text Recognizers


Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org, scholar.google
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.3%) to scientific vocabulary

Keywords

deep-learning handwriting-recognition ocr pytorch regularization scene-text-recognition shortcut-learning text-recognition
Last synced: 5 months ago

Repository

TextAdaIN: Paying Attention to Shortcut Learning in Text Recognizers

Basic Info
  • Host: GitHub
  • Owner: amazon-science
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Size: 690 KB
Statistics
  • Stars: 21
  • Watchers: 3
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Topics
deep-learning handwriting-recognition ocr pytorch regularization scene-text-recognition shortcut-learning text-recognition
Created over 3 years ago · Last pushed over 3 years ago


## TextAdaIN: Paying Attention to Shortcut Learning in Text Recognizers
This is the official PyTorch implementation of [TextAdaIN](https://arxiv.org/abs/2105.03906) (ECCV 2022).

**[Oren Nuriel](https://scholar.google.com/citations?hl=en&user=x3j-9RwAAAAJ),
[Sharon Fogel](https://scholar.google.com/citations?hl=en&user=fJHpwNkAAAAJ),
[Ron Litman](https://scholar.google.com/citations?hl=en&user=69GY5dEAAAAJ)**

TextAdaIN creates local distortions in the feature map that prevent the network from overfitting to local statistics. It does so by viewing each feature map as a sequence of elements and deliberately mismatching fine-grained feature statistics between elements in a mini-batch.


![TextAdaIN](./figures/teaser_fig_v3.svg)
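Below is a minimal PyTorch sketch of this idea. It is an illustration, not the repository's actual `TextAdaIN.py`: the window count `k`, the application probability `p`, and all names are assumed defaults.

```
import torch
import torch.nn as nn


class TextAdaINSketch(nn.Module):
    """Illustrative sketch (not the repo's exact code): view each feature
    map as a sequence of k windows along the width axis and, during
    training, re-style each window with the first- and second-order
    statistics of the matching window from another sample in the batch."""

    def __init__(self, k=5, p=0.1, eps=1e-5):  # k and p are assumed values
        super().__init__()
        self.k, self.p, self.eps = k, p, eps

    def forward(self, x):  # x: (B, C, H, W)
        if not self.training or torch.rand(()) > self.p:
            return x  # identity at inference time
        b, c, h, w = x.shape
        w_trim = (w // self.k) * self.k          # drop remainder columns
        head, tail = x[..., :w_trim], x[..., w_trim:]
        # (B, C, H, k, W/k): split the width into k contiguous windows
        win = head.reshape(b, c, h, self.k, w_trim // self.k)
        mu = win.mean(dim=(2, 4), keepdim=True)
        sigma = win.var(dim=(2, 4), keepdim=True, unbiased=False).add(self.eps).sqrt()
        perm = torch.randperm(b, device=x.device)
        # normalize each window, then apply a permuted sample's statistics
        mixed = (win - mu) / sigma * sigma[perm] + mu[perm]
        return torch.cat([mixed.reshape(b, c, h, w_trim), tail], dim=-1)
```

Restricting the statistic swap to windows along the width (the sequence axis) keeps the corruption local, which is what discourages the recognizer from leaning on local statistics.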





## Overcoming the shortcut
Below we see the attention maps of a text recognizer before and after applying local corruptions to the input image.
Each example shows the input image (bottom), the attention map (top), and the model prediction (left). Each row of the attention map is a time step, showing the attention used for one character prediction. (a) The baseline model, which uses local statistics as a shortcut, misreads the corrupted images. (b) Our proposed method overcomes this shortcut and improves performance under both standard and challenging test conditions.

![Attention](./figures/attn_viz.svg)

## Integrating into your favorite text recognizer backbone 
Sample code for the class can be found in [TextAdaIN.py](./TextAdaIN.py).

As the module has no learnable weights and is not applied during inference, a model trained with it can be loaded with or without the module.

```
# In the __init__ of your PyTorch module. TextAdaIN has no learnable
# weights and is a no-op at inference, so checkpoints load with or
# without it.
self.text_adain = TextAdaIN()

# In the forward pass, between the convolution and the batch norm:
out = self.conv(out)
out = self.text_adain(out)
out = self.bn(out)
```
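
Because TextAdaIN registers no parameters or buffers, a checkpoint trained with it loads into a model built without it, and vice versa. The wrapper below is a hypothetical illustration of that property (the `ConvBlock` name and shapes are assumptions, not code from this repository):

```
import torch.nn as nn
from TextAdaIN import TextAdaIN  # the class shipped in this repository


class ConvBlock(nn.Module):
    """Hypothetical block used only to demonstrate checkpoint compatibility."""

    def __init__(self, c_in, c_out, use_textadain=True):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, kernel_size=3, padding=1)
        self.text_adain = TextAdaIN() if use_textadain else nn.Identity()
        self.bn = nn.BatchNorm2d(c_out)

    def forward(self, out):
        out = self.conv(out)
        out = self.text_adain(out)
        return self.bn(out)


# TextAdaIN holds no parameters, so both variants share one state dict:
with_adain = ConvBlock(3, 64, use_textadain=True)
without = ConvBlock(3, 64, use_textadain=False)
without.load_state_dict(with_adain.state_dict())  # loads cleanly
```

Assigning the module to its own attribute, rather than wrapping the convolution in an `nn.Sequential`, keeps parameter names such as `conv.weight` unchanged, which is what makes the two checkpoints interchangeable.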


## Results
Below are results (word accuracy, %) across scene text and handwriting benchmarks and multiple architectures, with and without TextAdaIN. Applying TextAdaIN to state-of-the-art recognizers consistently improves accuracy.

| Method | Scene Text, Regular (5,529) | Scene Text, Irregular (3,010) | Handwritten, IAM (17,990) | Handwritten, RIMES (7,734) |
| --- | --- | --- | --- | --- |
| Baek et al. (CTC) | 88.7 | 72.9 | 80.6 | 87.8 |
| + TextAdaIN | 89.5 (+0.8) | 73.8 (+0.9) | 81.5 (+0.9) | 90.7 (+2.9) |
| Baek et al. (Attn) | 92.0 | 77.4 | 82.7 | 90.2 |
| + TextAdaIN | 92.2 (+0.2) | 77.7 (+0.3) | 84.1 (+1.4) | 93.0 (+2.8) |
| Litman et al. | 93.6 | 83.0 | 85.7 | 93.3 |
| + TextAdaIN | 94.2 (+0.6) | 83.4 (+0.4) | 87.3 (+1.6) | 94.4 (+1.1) |
| Fang et al. | 93.9 | 82.0 | 85.4 | 92.0 |
| + TextAdaIN | 94.2 (+0.3) | 82.8 (+0.8) | 86.3 (+0.9) | 93.0 (+1.0) |
## Experiments - Plug n' play

### Standard Text Recognizer
To run with the [Baek et al](https://github.com/clovaai/deep-text-recognition-benchmark) framework, insert the TextAdaIN module into the ResNet backbone after every convolutional layer in the [feature extractor](https://github.com/clovaai/deep-text-recognition-benchmark/blob/master/modules/feature_extraction.py), as described above. Then run the command line as instructed in the [training & evaluation section](https://github.com/clovaai/deep-text-recognition-benchmark#training-and-evaluation). For scene text we use the original configurations. When training on handwriting datasets we run with the following configuration:

```
python train.py --train_data --valid_data --select_data / --batch_ratio 1 --Transformation TPS --FeatureExtraction ResNet --SequenceModeling BiLSTM --Prediction Attn --exp-name handwriting --sensitive --rgb --num_iter 200000 --batch_size 128 --textadain
```

### ABINet
To run with [ABINet](https://github.com/FangShancheng/ABINet), insert the TextAdaIN module into the ResNet backbone after every convolutional layer in the [feature extractor](https://github.com/FangShancheng/ABINet/blob/main/modules/resnet.py), as described above. Then run the command line as instructed in the [training section](https://github.com/FangShancheng/ABINet#training). Please refer to the implementation details in the paper for further information.

## Citation
If you find this work useful, please consider citing it:

```
@article{nuriel2021textadain,
  title={TextAdaIN: Paying Attention to Shortcut Learning in Text Recognizers},
  author={Nuriel, Oren and Fogel, Sharon and Litman, Ron},
  journal={arXiv preprint arXiv:2105.03906},
  year={2021}
}
```

## Security
See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information.

## License
This project is licensed under the Apache-2.0 License.

Owner

  • Name: Amazon Science
  • Login: amazon-science
  • Kind: organization


Issues and Pull Requests

Last synced: over 1 year ago

All Time
  • Total issues: 2
  • Total pull requests: 0
  • Average time to close issues: 3 months
  • Average time to close pull requests: N/A
  • Total issue authors: 1
  • Total pull request authors: 0
  • Average comments per issue: 1.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • wdp-007 (2)