pyramidtabnet

Official PyTorch implementation of PyramidTabNet: Transformer-based Table Recognition in Image-based Documents

https://github.com/muhd-umer/pyramidtabnet

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: found codemeta.json file
○ .zenodo.json file
✓ DOI references: found 2 DOI reference(s) in README
✓ Academic publication links: links to arxiv.org, springer.com, nature.com
○ Committers with academic emails
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (15.4%) to scientific vocabulary

Keywords

computer-vision deep-learning document-analysis implementation pytorch table-detection table-structure-recognition

Keywords from Contributors

mesh interpretability sequences projection interactive optim hacking network-simulation

Last synced: 6 months ago

Repository

Official PyTorch implementation of PyramidTabNet: Transformer-based Table Recognition in Image-based Documents

Basic Info

Host: GitHub
Owner: muhd-umer
License: mit
Language: Python
Default Branch: main
Homepage: https://doi.org/10.1007/978-3-031-41734-4_26
Size: 93 MB

Statistics

Stars: 25
Watchers: 1
Forks: 2
Open Issues: 0
Releases: 1

Topics

computer-vision deep-learning document-analysis implementation pytorch table-detection table-structure-recognition

Created over 3 years ago · Last pushed over 1 year ago

Metadata Files

Readme License Citation

PyramidTabNet

PyramidTabNet: Transformer-Based Table Recognition in Image-Based Documents
Muhammad Umer, Muhammad Ahmed Mohsin, Adnan Ul-Hasan, and Faisal Shafait
Presented at ICDAR 2023: International Conference on Document Analysis and Recognition
Springer Link

In this paper, we introduce PyramidTabNet (PTN), a method that builds upon the convolution-less Pyramid Vision Transformer to detect tables in document images. Furthermore, we present a tabular image generative augmentation technique to effectively train the architecture. The proposed augmentation process consists of three steps, namely clustering, fusion, and patching, which together generate new document images containing tables. Our proposed pipeline demonstrates significant performance improvements for table detection on several standard datasets. Additionally, it achieves performance comparable to state-of-the-art methods on structure recognition tasks.

Dependencies

It is recommended to create a new virtual environment so that updates/downgrades of packages do not break other projects.

Environment characteristics
python = 3.9.12
torch = 1.11.0
cuda = 11.3
torchvision = 0.12.0

conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch
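To confirm that an environment matches the pinned versions listed above, a small check along these lines may help. This is a minimal sketch, not a script shipped with the repo: it reads installed versions with the standard-library `importlib.metadata` and ignores local build tags such as `+cu113`. The names `EXPECTED`, `find_mismatches`, and `installed_versions` are hypothetical, and the CUDA version is not checked here.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Pinned versions taken from the "Environment characteristics" list above.
EXPECTED = {"torch": "1.11.0", "torchvision": "0.12.0"}
EXPECTED_PYTHON = "3.9.12"


def find_mismatches(installed, expected=EXPECTED):
    """Return {package: (installed, expected)} for packages that differ.

    `installed` maps package names to version strings (None if missing).
    Local build tags such as "+cu113" are ignored when comparing.
    """
    mismatches = {}
    for name, want in expected.items():
        have = installed.get(name)
        base = have.split("+")[0] if have else None
        if base != want:
            mismatches[name] = (have, want)
    return mismatches


def installed_versions(names):
    """Look up installed distribution versions; None if not installed."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    current_python = ".".join(map(str, sys.version_info[:3]))
    print("python:", current_python, "(expected " + EXPECTED_PYTHON + ")")
    print("mismatches:", find_mismatches(installed_versions(EXPECTED)))
```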

This repo uses toolboxes provided by OpenMMLab to train and test models. Head over to the official documentation of MMDetection for installation instructions if you want to train your own model.
Alternatively, if you only want to test the model, you can install mmdet as a third-party package. Run:

pip install -r requirements.txt

After all the packages have been successfully installed, install mmcv by executing the following commands:

pip install -U openmim
mim install mmcv-full==1.6.0

Alternatively, you can install mmcv using pip as:

pip install mmcv-full==1.6.0 -f https://download.openmmlab.com/mmcv/dist/cu113/torch1.11/index.html
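The wheel index URL above encodes the CUDA and PyTorch versions (cu113, torch1.11). The hypothetical helper below follows the same URL pattern to show how the URL might be derived for another combination; whether a wheel actually exists for that combination should be confirmed against the MMCV installation documentation. Using `cpu` as the tag for CPU-only builds is an assumption based on that documentation.

```python
def mmcv_find_links(cuda: str, torch: str) -> str:
    """Build an OpenMMLab wheel index URL for use with `pip install -f`.

    Follows the pattern of the URL above: CUDA "11.3" -> "cu113",
    torch "1.11.0" -> "torch1.11". Pass cuda="cpu" for CPU-only builds
    (assumed tag).
    """
    cu_tag = "cpu" if cuda == "cpu" else "cu" + cuda.replace(".", "")
    # Drop any local build tag such as "+cu113", keep only major.minor.
    major, minor = torch.split("+")[0].split(".")[:2]
    return (
        "https://download.openmmlab.com/mmcv/dist/"
        f"{cu_tag}/torch{major}.{minor}/index.html"
    )


if __name__ == "__main__":
    url = mmcv_find_links("11.3", "1.11.0")
    print(f"pip install mmcv-full==1.6.0 -f {url}")
```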

Datasets

We provide the test set of cTDaR - TRACK A in COCO JSON format by default (for evaluation). You can access the full cTDaR dataset from the following publicly available GitHub repo: cTDaR - All Tracks. Other public datasets can be downloaded and placed in the data/ directory for training/evaluation.
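COCO JSON stores annotations as `images`, `annotations`, and `categories` lists, with each box given as `[x, y, width, height]`. As a minimal sketch (not part of the repo), the snippet below groups boxes by image and converts them to corner format. The sample file name and the `table` category are hypothetical, and the actual annotation file name in the data directory is not specified here.

```python
import json
from collections import defaultdict


def load_table_boxes(coco):
    """Group COCO bounding boxes by image file name.

    COCO stores each box as [x, y, width, height]; this converts them to
    [x1, y1, x2, y2] corner format.
    """
    files = {img["id"]: img["file_name"] for img in coco["images"]}
    boxes = defaultdict(list)
    for ann in coco["annotations"]:
        x, y, w, h = ann["bbox"]
        boxes[files[ann["image_id"]]].append([x, y, x + w, y + h])
    return dict(boxes)


# Tiny inline example (hypothetical content); for a real file use
# `with open(path) as f: coco = json.load(f)`.
sample = json.loads("""{
  "images": [{"id": 1, "file_name": "page_001.jpg", "width": 1000, "height": 1400}],
  "annotations": [{"id": 7, "image_id": 1, "category_id": 1, "bbox": [50, 200, 400, 300]}],
  "categories": [{"id": 1, "name": "table"}]
}""")
print(load_table_boxes(sample))  # {'page_001.jpg': [[50, 200, 450, 500]]}
```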

Data Augmentation

Refer to the augmentation/ directory for instructions on using the scripts to generate new document images.

Run

The following sections provide instructions to evaluate and/or train PyramidTabNet on your own data.
Note: It is recommended to execute the scripts from the project root so that the relative paths to the test set resolve correctly.

Training

Refer to Data Augmentation to generate additional training samples to improve model performance. ❤️
Before firing up the train.py script, make sure to configure the data keys in the config file.
Refer to the MMDetection documentation for more details on how to modify the keys.

python model/train.py path/to/config/file --gpu-id 0
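For reference, the data keys of an MMDetection 2.x config (the generation compatible with mmcv-full 1.6.0) typically look like the fragment below. This is an illustrative assumption, not PyramidTabNet's own config: the paths, the `table` class name, and the batch sizes are placeholders, and keys not shown (such as each split's `pipeline`) are assumed to be inherited from a base config.

```python
# Hypothetical data section of an MMDetection 2.x config file.
# Paths, class names and batch sizes are placeholders.
dataset_type = "CocoDataset"
classes = ("table",)
data_root = "data/my_dataset/"

data = dict(
    samples_per_gpu=2,  # images per GPU per iteration
    workers_per_gpu=2,  # dataloader workers per GPU
    train=dict(
        type=dataset_type,
        classes=classes,
        ann_file=data_root + "annotations/train.json",
        img_prefix=data_root + "images/train/",
    ),
    val=dict(
        type=dataset_type,
        classes=classes,
        ann_file=data_root + "annotations/val.json",
        img_prefix=data_root + "images/val/",
    ),
    test=dict(
        type=dataset_type,
        classes=classes,
        ann_file=data_root + "annotations/val.json",
        img_prefix=data_root + "images/val/",
    ),
)
```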

Alternatively, you can launch training on multiple GPUs using the following script:

bash model/dist_train.sh ${CONFIG_FILE} \
    ${GPU_NUM} \
    [optional args]

Evaluation

Download links for the fine-tuned weights are available in the Weights & Metrics section.
Execute test.py with the appropriate command-line arguments. Example usage:

python model/test.py --config-file path/to/config/file \
    --input path/to/directory \
    --weights path/to/finetuned/checkpoint \
    --device "cuda"
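Detection precision and recall are usually computed by matching predicted boxes to ground-truth boxes at an IoU threshold. The sketch below shows one common variant, greedy one-to-one matching in descending score order. It is not necessarily the exact protocol used by test.py, and the 0.5 default threshold is an assumption.

```python
def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall_f1(preds, gts, thr=0.5):
    """Greedy one-to-one matching of predictions to ground-truth boxes.

    `preds` is a list of (box, score); higher-scoring predictions are
    matched first. A prediction is a true positive if the best still
    unmatched ground-truth box overlaps it with IoU >= thr.
    """
    matched = set()
    tp = 0
    for box, _score in sorted(preds, key=lambda p: p[1], reverse=True):
        best_j, best_iou = None, thr
        for j, gt in enumerate(gts):
            if j in matched:
                continue
            overlap = iou(box, gt)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j is not None:
            matched.add(best_j)
            tp += 1
    fp, fn = len(preds) - tp, len(gts) - tp
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if gts else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1
```

For example, two correct detections plus one spurious box against two ground-truth tables give precision 2/3, recall 1.0, and F1 0.8.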

Inference

To perform end-to-end table analysis (visualize detections) on a single image/test directory, execute main.py. Download the weights from Weights & Metrics and place them in the weights/ directory. Example usage:

python main.py --config-file path/to/config/file \
    --input path/to/input/image_or_directory \
    --weights-dir path/to/weights/directory \
    --device "cuda"
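main.py, td.py, and tsr.py accept either a single image or a directory for --input. A hypothetical helper for resolving such an argument into a list of image files might look like this; the extension set is an assumption, and this is not code from the repo.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match your data.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff"}


def collect_images(input_path):
    """Return a sorted list of image paths for a file or a directory.

    A single file is returned as-is; for a directory, only its direct
    children with a known image extension (case-insensitive) are kept.
    """
    path = Path(input_path)
    if path.is_file():
        return [path]
    if path.is_dir():
        return sorted(
            p for p in path.iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
        )
    raise FileNotFoundError(f"No such file or directory: {input_path}")
```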

Detection Inference

To perform table detection on a single image/test directory, execute td.py. Example usage:

python model/td.py --config-file path/to/config/file \
    --input path/to/input/image_or_directory \
    --weights path/to/detection/weights \
    --device "cuda" \
    --save
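If you load a detector through MMDetection 2.x's `init_detector` and `inference_detector` (an assumption; td.py may wrap this differently), each per-image result is a list with one `(N, 5)` array per class holding `[x1, y1, x2, y2, score]`, or a `(bbox_result, segm_result)` tuple for models with a mask head. The hypothetical helper below keeps the confident boxes of one class; the 0.5 threshold and class index 0 are assumptions.

```python
def filter_tables(result, score_thr=0.5, class_id=0):
    """Keep boxes of one class whose confidence is at least `score_thr`.

    Assumes the MMDetection 2.x per-image result layout: a list with one
    entry per class, each an (N, 5) array-like of [x1, y1, x2, y2, score].
    For (bbox_result, segm_result) tuples only the boxes are used.
    """
    if isinstance(result, tuple):
        result = result[0]
    return [list(row[:4]) for row in result[class_id] if row[4] >= score_thr]
```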

Recognition Inference

To perform structure recognition on a single image/test directory, execute tsr.py. Example usage:

python model/tsr.py --config-file path/to/config/file \
    --input path/to/input/image_or_directory \
    --structure-weights path/to/structure/weights \
    --cell-weights path/to/cell/weights \
    --device "cuda" \
    --save

Weights & Metrics

Evaluation metrics are displayed in the following tables. Note: place the downloaded weights in the weights/ directory so that the scripts can be run with their default paths.

To download all the weights, execute:

bash weights/get_weights.sh
bash weights/fine_tuned.sh

Table Detection

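| Model | Dataset | Precision | Recall | F1 | Link |
| --- | --- | --- | --- | --- | --- |
| PyramidTabNet | ICDAR 2017-POD | 99.8 | 99.3 | 99.5 | Link |
| PyramidTabNet | ICDAR 2019 | - | - | 98.7 | Link |
| PyramidTabNet | UNLV | 97.7 | 94.9 | 96.3 | Link |
| PyramidTabNet | Marmot | 92.1 | 98.2 | 95.1 | Link |
| PyramidTabNet | TableBank | 98.9 | 98.2 | 98.5 | Link |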

Table Structure Recognition

| Model | Dataset | Precision | Recall | F1 |
| --- | --- | --- | --- | --- |
| PyramidTabNet | ICDAR 2013 | 92.3 | 95.3 | 93.8 |
| PyramidTabNet | SciTSR | 98.4 | 99.1 | 98.7 |
| PyramidTabNet | FinTabNet | 93.2 | 88.6 | 90.8 |

Note: the FinTabNet fine-tuned model is for cell detection.
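The third metric column in both tables is consistent with the F1-score, the harmonic mean F1 = 2PR / (P + R) of precision P and recall R. The check below recomputes it from the reported precision and recall values (ICDAR 2019 is omitted because only one value is reported); each result rounds to the listed figure.

```python
# (precision, recall, reported F1) copied from the tables above.
REPORTED = {
    "ICDAR 2017-POD": (99.8, 99.3, 99.5),
    "UNLV": (97.7, 94.9, 96.3),
    "Marmot": (92.1, 98.2, 95.1),
    "TableBank": (98.9, 98.2, 98.5),
    "ICDAR 2013": (92.3, 95.3, 93.8),
    "SciTSR": (98.4, 99.1, 98.7),
    "FinTabNet": (93.2, 88.6, 90.8),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


for name, (p, r, reported) in REPORTED.items():
    print(f"{name:15s} computed F1 = {f1(p, r):.1f} (reported {reported})")
```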

Common Issues

Machines running Microsoft Windows may run into issues with mmcv imports; follow the installation guide in the official MMCV documentation to resolve them. Example error:

ModuleNotFoundError: No module named 'mmcv._ext'
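To check whether the compiled extension can be located without triggering the full import error, a small probe like the following may help. It is a hypothetical helper built on the standard-library `importlib.util.find_spec`. A commonly reported cause of this error is having the lite `mmcv` package installed instead of `mmcv-full`, or an mmcv-full build that does not match the installed PyTorch/CUDA versions.

```python
import importlib.util


def has_module(name):
    """Return True if `name` (e.g. "mmcv._ext") can be located for import.

    find_spec imports parent packages first, so a missing parent raises
    ModuleNotFoundError, which is treated here as "not available".
    """
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


if __name__ == "__main__":
    print("mmcv._ext available:", has_module("mmcv._ext"))
```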

For table detection, if you get an error of the following form:

Error(s) in loading state_dict for TDModel: Missing key(s) in state_dict

Resolve it by passing the correct config file (the one matching the weights you are loading) to --config-file.
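A missing-key error means the model built from the config expects parameters that the checkpoint does not contain. Comparing the two key sets shows which modules disagree; the hypothetical helper below does this with plain sets. With PyTorch, the inputs could be `model.state_dict().keys()` and the keys of the checkpoint's weights (MMDetection checkpoints usually store them under a `state_dict` entry), but that loading step is left out here, and the example key names are made up.

```python
def compare_state_dict_keys(model_keys, checkpoint_keys):
    """Mimic the missing/unexpected key report of a strict load_state_dict.

    Missing keys are expected by the model but absent from the checkpoint;
    unexpected keys are present in the checkpoint but unknown to the model.
    """
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    return {
        "missing": sorted(model_keys - checkpoint_keys),
        "unexpected": sorted(checkpoint_keys - model_keys),
    }


if __name__ == "__main__":
    # Made-up key names for illustration only.
    model = ["backbone.patch_embed.proj.weight", "roi_head.bbox_head.0.fc_cls.weight"]
    ckpt = ["backbone.patch_embed.proj.weight", "bbox_head.fc_cls.weight"]
    print(compare_state_dict_keys(model, ckpt))
```

Many missing keys sharing one prefix (for example a whole head module) usually indicate that the config describes a different architecture than the one the checkpoint was trained with.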

BibTeX

If you find this work useful for your research, please cite our paper:

@inproceedings{umer2023pyramidtabnet,
  title={PyramidTabNet: Transformer-Based Table Recognition in Image-Based Documents},
  author={Umer, Muhammad and Mohsin, Muhammad Ahmed and Ul-Hasan, Adnan and Shafait, Faisal},
  booktitle={International Conference on Document Analysis and Recognition},
  pages={420--437},
  year={2023},
  organization={Springer}
}

Acknowledgements

Special thanks to the following, without whom this repo would not have been possible:

The MMDetection team for creating their amazing framework, which pushes state-of-the-art computer vision research forward and let us experiment with and build various models easily.
The authors of Pyramid Vision Transformer (PVT v2) for their wonderful contribution to advancing computer vision.
The authors of Craft Text Detector for their awesome text detection repository.
The author of mAP Repo for providing a straightforward script to evaluate object detection metrics for deep learning models.
Google Colaboratory for providing free high-end GPU resources for research and development. The entire code base was developed on their platform and would not have been possible without it.

Owner

Name: Muhammad Umer
Login: muhd-umer
Kind: user
Location: Pakistan
Company: Student at https://nust.edu.pk/

Repositories: 2
Profile: https://github.com/muhd-umer

Always eager to learn anything related to ML/AI systems; Has a lot of fun observing loss curves.

GitHub Events

Total

Watch event: 5

Last Year

Watch event: 5

Committers

Last synced: over 1 year ago

All Time

Total Commits: 107
Total Committers: 3
Avg Commits per committer: 35.667
Development Distribution Score (DDS): 0.495

Past Year

Commits: 5
Committers: 1
Avg Commits per committer: 5.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Muhammad Umer	8****r	54
Muhammad Umer	t**r@g**m	52
dependabot[bot]	4****]	1

Issues and Pull Requests

Last synced: 10 months ago

All Time

Total issues: 0
Total pull requests: 3
Average time to close issues: N/A
Average time to close pull requests: less than a minute
Total issue authors: 0
Total pull request authors: 2
Average comments per issue: 0
Average comments per pull request: 0.33
Merged pull requests: 2
Bot issues: 0
Bot pull requests: 2

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0
