checkbox-detection
Checkbox Detection Model for Scanned Documents
Science Score: 57.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ✓ DOI references: found 4 DOI reference(s) in README
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (13.8%) to scientific vocabulary
Repository
Checkbox Detection Model for Scanned Documents
Basic Info
- Host: GitHub
- Owner: LynnHaDo
- License: agpl-3.0
- Language: Jupyter Notebook
- Default Branch: main
- Homepage: https://huggingface.co/spaces/linhdo/checkbox-detector
- Size: 3.09 MB
Statistics
- Stars: 65
- Watchers: 2
- Forks: 3
- Open Issues: 3
- Releases: 0
Metadata Files
README.md
Checkbox Detection
Checkbox detection model built on YOLOv8
Updates
In this project, I provide two models (a classification model and a detection model) fine-tuned from the existing YOLOv8 weights. Both are uploaded to the project's Hugging Face Space. If you use or fine-tune the models in any part of your work, please cite this repository. Thank you, and don't forget to give this repo a 🌟!
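As a rough illustration of how the detection model might be used, here is a minimal sketch built on the Ultralytics `YOLO` API. It assumes the weights file has already been downloaded from the Hugging Face Space to a local path. The default path `detector-model.pt`, the confidence threshold, and the helper names `detect_checkboxes` and `count_by_class` are all hypothetical and not part of this repository.

```python
from collections import Counter


def detect_checkboxes(image_path, weights_path="detector-model.pt", conf=0.25):
    """Run the detector on one image.

    Returns a list of (class_name, confidence, [x0, y0, x1, y1]) tuples.
    weights_path and conf are illustrative defaults, not project settings.
    """
    # Imported lazily so the pure-Python helper below works without ultralytics.
    from ultralytics import YOLO

    model = YOLO(weights_path)
    result = model.predict(image_path, conf=conf, verbose=False)[0]
    detections = []
    for box in result.boxes:
        class_name = result.names[int(box.cls)]
        detections.append((class_name, float(box.conf), box.xyxy[0].tolist()))
    return detections


def count_by_class(detections):
    """Count detections per class name."""
    return Counter(class_name for class_name, _, _ in detections)
```

Calling `count_by_class(detect_checkboxes("scan.png"))` would then give a per-class tally for one scanned page; the class names come from whatever is stored in the weights file.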
About The Project
The biggest challenge I faced in approaching this problem was the lack of public datasets containing documents with checkbox annotations. Existing datasets offer either images of isolated checkboxes or images of scanned documents, but not the two combined. As a result, the solution came down to generating a sufficiently large annotated dataset of documents with checkboxes.
Although the idea of using the Copy-Paste technique to augment data is simple, making the augmented dataset work well with the existing YOLO architecture was the most difficult part and took a lot of trial and error. Throughout this process, I experimented with different ways of pasting checkboxes onto documents: pasting boxes contiguously in horizontal and vertical directions, pasting distractors, adding "background" images, pasting while avoiding text blocks (using the Document Layout Analysis model I created), and so on. I ended up with over 10,000 images for the training dataset. To test the model's performance, an additional 150 human-annotated documents are used as the validation dataset. The annotations are in YOLO format: each line holds a class ID followed by the normalized bounding box (x-center, y-center, width, height).

sh
1 0.402831 0.965 0.048906 0.032
0 0.904762 0.856 0.018018 0.014
0 0.189189 0.7005 0.036036 0.029
0 0.388031 0.2395 0.037323 0.029
1 0.0199485 0.2185 0.037323 0.029
1 0.741313 0.96 0.046332 0.032
1 0.677606 0.1045 0.047619 0.041
0 0.956242 0.9045 0.041184 0.033
1 0.838481 0.6575 0.037323 0.029
1 0.837838 0.514 0.03861 0.03
0 0.0456885 0.8305 0.032175 0.027
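To make the label format and the text-avoiding paste step more concrete, here is a minimal sketch of how a paste position could be sampled and turned into a label line like the ones above. It is not the generation code used in this project: the rejection-sampling strategy, the function names, and the page and box sizes are all assumptions, and the actual pixel copy of the checkbox crop onto the page is left out.

```python
import random


def boxes_overlap(a, b):
    """Return True if two (x0, y0, x1, y1) pixel boxes intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def sample_paste_box(page_w, page_h, box_w, box_h, occupied, rng, max_tries=100):
    """Pick a random paste position that overlaps none of the occupied boxes.

    Returns an (x0, y0, x1, y1) pixel box, or None if no free spot was found.
    """
    for _ in range(max_tries):
        x0 = rng.randint(0, page_w - box_w)
        y0 = rng.randint(0, page_h - box_h)
        candidate = (x0, y0, x0 + box_w, y0 + box_h)
        if not any(boxes_overlap(candidate, other) for other in occupied):
            return candidate
    return None


def to_yolo_line(class_id, box, page_w, page_h):
    """Convert a pixel box into a YOLO label line (normalized center and size)."""
    x0, y0, x1, y1 = box
    x_center = (x0 + x1) / 2 / page_w
    y_center = (y0 + y1) / 2 / page_h
    width = (x1 - x0) / page_w
    height = (y1 - y0) / page_h
    return f"{class_id} {x_center:.6g} {y_center:.6g} {width:.6g} {height:.6g}"


def from_yolo_line(line, page_w, page_h):
    """Parse a YOLO label line into (class_id, (x0, y0, x1, y1)) in pixels."""
    class_id, x_center, y_center, width, height = line.split()
    xc, yc = float(x_center) * page_w, float(y_center) * page_h
    w, h = float(width) * page_w, float(height) * page_h
    return int(class_id), (xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2)


# Hypothetical usage: a 777x1000 page whose top half was flagged as text.
rng = random.Random(0)
text_blocks = [(0, 0, 777, 500)]
pasted = sample_paste_box(777, 1000, 29, 29, text_blocks, rng)
if pasted is not None:
    label = to_yolo_line(0, pasted, 777, 1000)
    text_blocks.append(pasted)  # keep later pastes from landing on this one
```

Appending each pasted box to the occupied list is one simple way to keep checkboxes from overlapping each other as well as the text.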
The model was trained on a P100 GPU for 200 epochs. Under the supervision and mentorship of my advisor, I achieved notable inference results: the model reached relatively high precision and recall after ~100 epochs.
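For readers who want to set up a similar training run, the following is a hedged sketch using the Ultralytics API. Only the 200-epoch count comes from the description above. The `yolov8n.pt` starting weights, the 640-pixel image size, the `images/train` and `images/val` layout, and the helper names are assumptions for illustration, not the settings actually used here.

```python
def build_dataset_yaml(root, class_names):
    """Return the text of an Ultralytics dataset config for the given classes.

    The images/train and images/val layout is an assumed convention.
    """
    lines = [
        f"path: {root}",
        "train: images/train",
        "val: images/val",
        "names:",
    ]
    lines += [f"  {index}: {name}" for index, name in enumerate(class_names)]
    return "\n".join(lines) + "\n"


def train_detector(data_yaml, weights="yolov8n.pt", epochs=200, image_size=640):
    """Fine-tune pretrained YOLOv8 weights on the dataset described by data_yaml.

    weights and image_size are illustrative defaults.
    """
    # Imported lazily: only needed when training actually runs.
    from ultralytics import YOLO

    model = YOLO(weights)
    return model.train(data=data_yaml, epochs=epochs, imgsz=image_size)
```

The class names passed to `build_dataset_yaml` must follow the order of the class IDs in the label files (0 first, then 1). The README does not state which ID corresponds to which checkbox state, so that mapping has to be confirmed against the data.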

Built With
- YOLOv8: 8.0.153
- Gradio
- Hugging Face Space
- Kaggle
Prerequisites
For generating data
- opencv-python: 4.7.0
- matplotlib: 3.7.1
- numpy: 1.25.2
- albumentations: 1.3.1
For training
- ultralytics: 8.0.153
- gradio
- torch
- ruamel.yaml
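As a convenience, the pinned versions above can be compared against the current environment with a short script. This is only a sketch: the `PINNED` table copies the versions listed here, the helper name `find_mismatches` is hypothetical, and the prefix comparison is a deliberate simplification (it accepts wheels such as `4.7.0.72` for `opencv-python` but is not a full version parser).

```python
from importlib import metadata

# Versions copied from the prerequisites list; packages without a pin are omitted.
PINNED = {
    "opencv-python": "4.7.0",
    "matplotlib": "3.7.1",
    "numpy": "1.25.2",
    "albumentations": "1.3.1",
    "ultralytics": "8.0.153",
}


def find_mismatches(pinned):
    """Return {package: installed_version_or_None} for packages that differ from the pins."""
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        # Prefix match, so "4.7.0.72" satisfies a "4.7.0" pin.
        if installed is None or not installed.startswith(wanted):
            mismatches[name] = installed
    return mismatches
```

Running `find_mismatches(PINNED)` returns an empty dictionary when every pinned package is installed at the listed version.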
Installation
- Clone the repo
sh
git clone https://github.com/LynnHaDo/Checkbox-Detection.git
- Install packages
sh
pip install opencv-python
pip install matplotlib
pip install numpy
pip install albumentations
pip install ultralytics
pip install gradio
pip install torch
pip install ruamel.yaml
- Dataset:
- Source documents: RVL-CDIP
- Checkboxes images: currently not publicly available
Works Cited
- Ultralytics YOLOv8
sh
authors:
  - family-names: Jocher
    given-names: Glenn
    orcid: "https://orcid.org/0000-0001-5950-6979"
  - family-names: Chaurasia
    given-names: Ayush
    orcid: "https://orcid.org/0000-0002-7603-6750"
  - family-names: Qiu
    given-names: Jing
    orcid: "https://orcid.org/0000-0003-3783-7069"
title: "YOLO by Ultralytics"
version: 8.0.0
date-released: 2023-1-10
license: AGPL-3.0
url: "https://github.com/ultralytics/ultralytics"
- RVL-CDIP dataset
sh
@inproceedings{harley2015icdar,
title = {Evaluation of Deep Convolutional Nets for Document Image Classification and Retrieval},
author = {Adam W Harley and Alex Ufkes and Konstantinos G Derpanis},
booktitle = {International Conference on Document Analysis and Recognition ({ICDAR})},
year = {2015}
}
- Doclaynet-base dataset
sh
@inproceedings{doclaynet2022,
title = {DocLayNet: A Large Human-Annotated Dataset for Document-Layout Segmentation},
doi = {10.1145/3534678.3539043},
url = {https://doi.org/10.1145/3534678.3539043},
author = {Pfitzmann, Birgit and Auer, Christoph and Dolfi, Michele and Nassar, Ahmed S and Staar, Peter W J},
year = {2022},
isbn = {9781450393850},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
booktitle = {Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
pages = {3743--3751},
numpages = {9},
location = {Washington DC, USA},
series = {KDD '22}
}
- XFUND dataset
sh
@inproceedings{xu-etal-2022-xfund,
title = "{XFUND}: A Benchmark Dataset for Multilingual Visually Rich Form Understanding",
author = "Xu, Yiheng and
Lv, Tengchao and
Cui, Lei and
Wang, Guoxin and
Lu, Yijuan and
Florencio, Dinei and
Zhang, Cha and
Wei, Furu",
booktitle = "Findings of the Association for Computational Linguistics: ACL 2022",
month = may,
year = "2022",
address = "Dublin, Ireland",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.findings-acl.253",
doi = "10.18653/v1/2022.findings-acl.253",
pages = "3214--3224",
abstract = "Multimodal pre-training with text, layout, and image has achieved SOTA performance for visually rich document understanding tasks recently, which demonstrates the great potential for joint learning across different modalities. However, the existed research work has focused only on the English domain while neglecting the importance of multilingual generalization. In this paper, we introduce a human-annotated multilingual form understanding benchmark dataset named XFUND, which includes form understanding samples in 7 languages (Chinese, Japanese, Spanish, French, Italian, German, Portuguese). Meanwhile, we present LayoutXLM, a multimodal pre-trained model for multilingual document understanding, which aims to bridge the language barriers for visually rich document understanding. Experimental results show that the LayoutXLM model has significantly outperformed the existing SOTA cross-lingual pre-trained models on the XFUND dataset. The XFUND dataset and the pre-trained LayoutXLM model have been publicly available at https://aka.ms/layoutxlm.",
}
Contact
Linh Do - do24l@mtholyoke.edu / dohalinh2303@gmail.com (personal)
Project Link: https://github.com/LynnHaDo/Checkbox-Detection
LinkedIn: https://linkedin.com/in/Linh Do
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: Checkbox Detection Baseline Model
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Linh
    family-names: Do
    email: do24l@mtholyoke.edu
identifiers:
  - type: url
    value: >-
      https://huggingface.co/spaces/linhdo/checkbox-detector/blob/main/models/detector-model.pt
    description: Checkbox Detection Model
repository-code: 'https://github.com/LynnHaDo/Checkbox-Detection'
url: 'https://huggingface.co/spaces/linhdo/checkbox-detector'
abstract: A baseline detection model of checkboxes
keywords:
  - object-detection
  - deep-learning
  - computer-vision
license: MIT
commit: 0f4c0d6
date-released: '2023-08-06'
GitHub Events
Total
- Issues event: 6
- Watch event: 34
- Issue comment event: 7
- Push event: 2
- Fork event: 2
Last Year
- Issues event: 6
- Watch event: 34
- Issue comment event: 7
- Push event: 2
- Fork event: 2
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 4
- Total pull requests: 0
- Average time to close issues: 1 day
- Average time to close pull requests: N/A
- Total issue authors: 3
- Total pull request authors: 0
- Average comments per issue: 0.25
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 4
- Pull requests: 0
- Average time to close issues: 1 day
- Average time to close pull requests: N/A
- Issue authors: 3
- Pull request authors: 0
- Average comments per issue: 0.25
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- pafi-code (2)
- Nomiluks (2)
- HassanBinAli (1)
- clogu (1)
- shobi2015 (1)
- Otterpatsch (1)
- danish-rnd (1)
- ryx2 (1)
- yuluzhong (1)