https://github.com/bytedance/sptsv2

The official implementation of SPTS v2: Single-Point Text Spotting


Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org, sciencedirect.com, ieee.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (7.4%) to scientific vocabulary

Keywords

artificial-intelligence computer-vision deep-learning ocr research
Last synced: 5 months ago

Repository

The official implementation of SPTS v2: Single-Point Text Spotting

Basic Info
  • Host: GitHub
  • Owner: bytedance
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 675 KB
Statistics
  • Stars: 136
  • Watchers: 5
  • Forks: 18
  • Open Issues: 15
  • Releases: 0
Archived
Topics
artificial-intelligence computer-vision deep-learning ocr research
Created over 2 years ago · Last pushed over 2 years ago
Metadata Files
Readme License

README.md

SPTS v2: Single-Point Scene Text Spotting

The official implementation of SPTS v2: Single-Point Text Spotting. SPTSv2 tackles scene text spotting as an end-to-end sequence prediction task, requires only extremely low-cost single-point annotations, and achieves 19× faster inference speed. Below is the overall architecture of SPTSv2.

[Figure: overall architecture of SPTSv2]
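
To make the sequence-prediction formulation concrete, here is a minimal, hypothetical sketch of how a text instance could be serialized into discrete tokens: a quantized center point followed by character tokens. The bin count, charset, and vocabulary layout are illustrative assumptions, not the repo's actual configuration.

```
# Illustrative sketch of single-point sequence construction (NOT the official
# SPTSv2 code): each instance becomes [x_bin, y_bin, char_1, ..., char_n], and
# the full target is the concatenation of all instances plus an EOS token.
# NUM_BINS, CHARS, and the token layout are assumptions for this demo.
from typing import List

CHARS = "abcdefghijklmnopqrstuvwxyz0123456789"
NUM_BINS = 1000                 # coordinate quantization bins (assumed)
CHAR_OFFSET = NUM_BINS          # character tokens follow coordinate tokens
EOS = CHAR_OFFSET + len(CHARS)  # end-of-sequence token

def quantize(coord: float, size: int) -> int:
    """Map an absolute pixel coordinate to a bin index in [0, NUM_BINS)."""
    return min(int(coord / size * NUM_BINS), NUM_BINS - 1)

def instance_to_tokens(x: float, y: float, text: str, w: int, h: int) -> List[int]:
    """Serialize one text instance as a point followed by its transcription."""
    tokens = [quantize(x, w), quantize(y, h)]
    tokens += [CHAR_OFFSET + CHARS.index(c) for c in text.lower() if c in CHARS]
    return tokens

# Example: one word "spts" whose single-point annotation is (320, 120)
# in a 640x480 image.
sequence = instance_to_tokens(320, 120, "spts", 640, 480) + [EOS]
print(sequence)  # [500, 250, 1018, 1015, 1019, 1018, 1036]
```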

Environment

We recommend using Anaconda to manage environments. Run the following commands to install the dependencies:

```
conda create -n sptsv2 python=3.7 -y
conda activate sptsv2
conda install pytorch==1.8.1 torchvision==0.9.1 torchaudio==0.8.1 -c pytorch
git clone git@github.com:bytedance/SPTSv2.git
cd SPTSv2
pip install -r requirements.txt
```

Dataset

Please download and extract the datasets into the data folder, following the file structure below.

```
data
├─CTW1500
│ ├─annotations
│ │   test_ctw1500_maxlen25.json
│ │   train_ctw1500_maxlen25_v2.json
│ ├─ctwtest_text_image
│ └─ctwtrain_text_image
├─icdar2013
│ │ ic13_test.json
│ │ ic13_train.json
│ ├─test_images
│ └─train_images
├─icdar2015
│ │ ic15_test.json
│ │ ic15_train.json
│ ├─test_images
│ └─train_images
├─inversetext
│ │ test_poly.json
│ └─test_images
├─mlt2017
│ │ train.json
│ └─MLT_train_images
├─syntext1
│ │ train.json
│ └─syntext_word_eng
├─syntext2
│ │ train.json
│ └─emcs_imgs
└─totaltext
  │ test.json
  │ train.json
  ├─test_images
  └─train_images
```
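
If it helps, here is a small standalone helper (not part of the repository) that sanity-checks a few of the expected annotation files before training; the paths simply mirror the tree above.

```
# Optional convenience script to verify the dataset layout shown above.
# It is NOT part of the SPTSv2 repository.
from pathlib import Path

EXPECTED = [
    "CTW1500/annotations/train_ctw1500_maxlen25_v2.json",
    "CTW1500/annotations/test_ctw1500_maxlen25.json",
    "icdar2013/ic13_train.json",
    "icdar2015/ic15_train.json",
    "inversetext/test_poly.json",
    "mlt2017/train.json",
    "syntext1/train.json",
    "syntext2/train.json",
    "totaltext/train.json",
]

def check_data_root(root: str = "data") -> bool:
    """Print any missing annotation files; return True if the layout is complete."""
    missing = [p for p in EXPECTED if not (Path(root) / p).exists()]
    for p in missing:
        print(f"missing: {root}/{p}")
    return not missing

if __name__ == "__main__":
    print("layout OK" if check_data_root() else "layout incomplete")
```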

Train and finetune

The model training in the original paper uses 16 GPUs (2 nodes, 8 A100 GPUs per node). Below are the instructions for training on a single machine with 8 GPUs, which can be easily adapted to multi-node training following the PyTorch Distributed documentation.

You can download our pretrained weights from Google Drive or BaiduNetDisk (password: 3pcu), or pretrain the model from scratch using the run.sh file. For finetuning, set --resume and --finetune in run.sh; a sketch of the usual distinction between the two flags follows.
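
For orientation, here is a hedged sketch of what --resume and --finetune typically mean in training scripts like this one: resuming restores the model, optimizer state, and epoch counter, while finetuning loads only the model weights and starts a fresh schedule. The checkpoint keys and exact behavior below are assumptions; check run.sh and the training code for the real semantics.

```
# Hypothetical resume-vs-finetune checkpoint loading; the "model"/"optimizer"/
# "epoch" keys are common PyTorch conventions, not verified against SPTSv2.
import torch

def load_checkpoint(path, model, optimizer=None, finetune=False):
    ckpt = torch.load(path, map_location="cpu")
    # Finetuning tolerates missing/extra keys (e.g., a re-initialized head).
    model.load_state_dict(ckpt["model"], strict=not finetune)
    if finetune:
        return 0  # fresh optimizer and schedule, start at epoch 0
    if optimizer is not None and "optimizer" in ckpt:
        optimizer.load_state_dict(ckpt["optimizer"])
    return ckpt.get("epoch", -1) + 1  # resume from the next epoch
```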

Inference and visualization

The trained models can be obtained after finishing the above steps. You can also download the models for the Total-Text, SCUT-CTW1500, ICDAR2013, ICDAR2015, and inversetext datasets from GoogleDrive or BaiduNetDisk (password: 2k2m). Then you can use test.sh or predict.py to output results and visualizations.

[Figure: visualization of spotting results]
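
As a companion to the serialization sketch above, this hypothetical snippet shows how a predicted token sequence could be decoded back into (point, transcription) pairs; the actual decoding lives in predict.py and may differ.

```
# Illustrative inverse of the earlier serialization sketch (NOT predict.py):
# read two coordinate bins, then character tokens, per instance until EOS.
from typing import List

CHARS = "abcdefghijklmnopqrstuvwxyz0123456789"
NUM_BINS, CHAR_OFFSET = 1000, 1000
EOS = CHAR_OFFSET + len(CHARS)

def tokens_to_instances(tokens: List[int], w: int, h: int):
    """Decode [x_bin, y_bin, chars..., x_bin, y_bin, chars..., EOS]."""
    results, i = [], 0
    while i + 1 < len(tokens) and tokens[i] != EOS:
        x = (tokens[i] + 0.5) / NUM_BINS * w      # bin center back to pixels
        y = (tokens[i + 1] + 0.5) / NUM_BINS * h
        i += 2
        text = ""
        while i < len(tokens) and CHAR_OFFSET <= tokens[i] < EOS:
            text += CHARS[tokens[i] - CHAR_OFFSET]
            i += 1
        results.append(((x, y), text))
    return results

print(tokens_to_instances([500, 250, 1018, 1015, 1019, 1018, 1036], 640, 480))
# [((320.32, 120.24), 'spts')]
```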

Evaluation

First, download the ground-truth files (GoogleDrive, BaiduNetDisk password: 35tr) and lexicons (GoogleDrive, BaiduNetDisk password: 9eml), and extract them into the evaluation folder.

```
evaluation
│ eval.py
├─gt
│ ├─gt_ctw1500
│ ├─gt_ic13
│ ├─gt_ic15
│ └─gt_totaltext
└─lexicons
  ├─ctw1500
  ├─ic13
  ├─ic15
  └─totaltext
```

We provide two evaluation scripts: eval_ic15.py for evaluating the ICDAR2015 dataset, and eval.py for the other benchmarks. The command for evaluating the inference result of Total-Text is:

```
python evaluation/eval.py \
    --result_path ./output/totaltext_val.json \
    # --with_lexicon \   # uncomment this line if you want to evaluate with lexicons.
    # --lexicon_type 0   # used for ICDAR2013 and ICDAR2015. 0: Generic; 1: Weak; 2: Strong.
```
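
For intuition, here is a toy sketch of the point-based end-to-end matching idea behind single-point evaluation: a prediction counts as correct if its point falls within a distance threshold of an unmatched ground-truth point and the transcription matches exactly. The official protocol in eval.py (including its thresholds and matching rules) may differ.

```
# Toy end-to-end metric sketch; NOT the official eval.py protocol.
import math

def evaluate(preds, gts, dist_thresh=50.0):
    """preds/gts: lists of ((x, y), text). Returns (precision, recall, f1)."""
    used, matched = set(), 0
    for (px, py), ptext in preds:
        best, best_d = None, dist_thresh
        for j, ((gx, gy), gtext) in enumerate(gts):
            d = math.hypot(px - gx, py - gy)
            if j not in used and d < best_d and ptext == gtext:
                best, best_d = j, d
        if best is not None:
            used.add(best)
            matched += 1
    p = matched / len(preds) if preds else 0.0
    r = matched / len(gts) if gts else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

print(evaluate([((320, 120), "spts")], [((322, 118), "spts")]))  # (1.0, 1.0, 1.0)
```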

Performance

The end-to-end recognition performances of SPTSv2 on five public benchmarks are:

| Dataset | Strong | Weak | Generic |
| ------- | ------ | ---- | ------- |
| ICDAR 2013 | 93.9 | 91.8 | 88.6 |
| ICDAR 2015 | 82.3 | 77.7 | 72.6 |

| Dataset | None | Full |
| ------- | ---- | ---- |
| Total-Text | 75.5 | 84.0 |
| inversetext | 63.5 | 74.9 |
| SCUT-CTW1500 | 63.6 | 84.3 |

Citation

```
@inproceedings{peng2022spts,
  title={SPTS: Single-Point Text Spotting},
  author={Peng, Dezhi and Wang, Xinyu and Liu, Yuliang and Zhang, Jiaxin and Huang, Mingxin and Lai, Songxuan and Zhu, Shenggao and Li, Jing and Lin, Dahua and Shen, Chunhua and Bai, Xiang and Jin, Lianwen},
  booktitle={Proceedings of the 30th ACM International Conference on Multimedia},
  year={2022}
}

@article{liu2023spts,
  title={SPTS v2: Single-Point Scene Text Spotting},
  author={Liu, Yuliang and Zhang, Jiaxin and Peng, Dezhi and Huang, Mingxin and Wang, Xinyu and Tang, Jingqun and Huang, Can and Lin, Dahua and Shen, Chunhua and Bai, Xiang and Jin, Lianwen},
  journal={arXiv preprint arXiv:2301.01635},
  year={2023}
}
```

Copyright

This repository can only be used for non-commercial research purposes.

For commercial use, please contact Jiaxin Zhang (zhangjiaxin.zjx1995@bytedance.com).

Acknowledgement

We sincerely thank Stable-Pix2Seq, Pix2Seq, DETR, Swin-Transformer, SPTS, and ABCNet for their excellent work.

Owner

  • Name: Bytedance Inc.
  • Login: bytedance
  • Kind: organization
  • Location: Singapore

GitHub Events

Total
  • Watch event: 11
  • Fork event: 4
Last Year
  • Watch event: 11
  • Fork event: 4

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 20
  • Total pull requests: 2
  • Average time to close issues: 8 days
  • Average time to close pull requests: 1 day
  • Total issue authors: 13
  • Total pull request authors: 1
  • Average comments per issue: 1.95
  • Average comments per pull request: 0.5
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 0.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • 123jwz (4)
  • jasper-cell (3)
  • elhamjan4311 (1)
  • ZhengyaoFang (1)
  • zhangjx123 (1)
  • babble2 (1)
  • madajie9 (1)
  • Gavinic (1)
  • 52Hzaaa (1)
  • KlayMa527 (1)
  • ocrhei (1)
  • 1037419569 (1)
Pull Request Authors
  • Zerohertz (1)
Top Labels
Issue Labels
Pull Request Labels