checkbox-detection

Checkbox Detection Model for Scanned Documents

https://github.com/lynnhado/checkbox-detection

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 4 DOI reference(s) in README
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (13.8%) to scientific vocabulary

Keywords

computer-vision copy-paste deep-learning document-understanding object-detection python yolov8

Last synced: 6 months ago

Repository

Checkbox Detection Model for Scanned Documents

Basic Info

Host: GitHub
Owner: LynnHaDo
License: agpl-3.0
Language: Jupyter Notebook
Default Branch: main
Homepage: https://huggingface.co/spaces/linhdo/checkbox-detector
Size: 3.09 MB

Statistics

Stars: 65
Watchers: 2
Forks: 3
Open Issues: 3
Releases: 0

Topics

computer-vision copy-paste deep-learning document-understanding object-detection python yolov8

Created over 2 years ago · Last pushed 12 months ago

Metadata Files

Readme License Citation

Checkbox Detection

Checkbox detector model built on YOLOv8

Table of Contents

Updates
About The Project
- Built With
Getting Started
- Prerequisites
- Installation
Works Cited
Contact

Updates

In this project, I provide two models (a classification model and a detection model) trained from the existing YOLOv8 weights. Both are uploaded to the project's Hugging Face Space. If you use or fine-tune the models in any part of your work, please cite this repository. Thank you, and don't forget to give this repo a 🌟!
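As a rough illustration of using the uploaded detector, the sketch below downloads the weights file whose URL appears in the repository's CITATION.cff and runs it on one image. It assumes the `huggingface_hub` package is installed (it is not in the prerequisites list) and that the file still sits at that path in the Space. The helper names `split_space_url` and `detect_checkboxes` are hypothetical and not part of the repository.

```python
MODEL_URL = ("https://huggingface.co/spaces/linhdo/checkbox-detector"
             "/blob/main/models/detector-model.pt")


def split_space_url(url):
    """Split a Space file URL into (repo_id, path inside the Space)."""
    prefix = "https://huggingface.co/spaces/"
    owner, space, _blob, _revision, *path = url[len(prefix):].split("/")
    return f"{owner}/{space}", "/".join(path)


def detect_checkboxes(image_path, url=MODEL_URL):
    """Return (class_id, confidence, [x1, y1, x2, y2]) for each detection."""
    # Imported lazily: both packages are assumptions about the local setup.
    from huggingface_hub import hf_hub_download
    from ultralytics import YOLO

    repo_id, filename = split_space_url(url)
    weights = hf_hub_download(repo_id=repo_id, filename=filename, repo_type="space")
    boxes = YOLO(weights)(image_path)[0].boxes
    return [
        (int(c), float(p), [float(v) for v in xyxy])
        for c, p, xyxy in zip(boxes.cls, boxes.conf, boxes.xyxy)
    ]
```

Calling `detect_checkboxes("scan.png")` with a local image path (the file name here is only a placeholder) would return one tuple per detected box, with coordinates in pixels.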

About The Project

The biggest challenge when I approached this problem was the lack of public datasets that contain documents with checkbox annotations. The available datasets contain either images of checkboxes alone or images of scanned documents. As a result, the solution came down to generating a sufficiently large annotated dataset of documents with checkboxes.

Although the idea of using the Copy-Paste technique to augment data is simple, making the augmented dataset work well with the existing YOLO architecture was the most difficult part and took a lot of trial and error. Throughout this process, I experimented with different ways to paste the checkboxes onto the documents: pasting boxes contiguously in horizontal and vertical directions, pasting distractors, adding "background" images, pasting while avoiding text blocks (using the Document Layout Analysis model I created), and so on. I ended up with over 10,000 images for the training dataset. To test the model's performance, an additional 150 human-annotated documents are used as the validation dataset. The annotations are in YOLO format: each box is a class ID followed by the normalized x-center, y-center, width, and height.

1 0.402831 0.965 0.048906 0.032
0 0.904762 0.856 0.018018 0.014
0 0.189189 0.7005 0.036036 0.029
0 0.388031 0.2395 0.037323 0.029
1 0.0199485 0.2185 0.037323 0.029
1 0.741313 0.96 0.046332 0.032
1 0.677606 0.1045 0.047619 0.041
0 0.956242 0.9045 0.041184 0.033
1 0.838481 0.6575 0.037323 0.029
1 0.837838 0.514 0.03861 0.03
0 0.0456885 0.8305 0.032175 0.027
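To make the placement and labeling steps concrete, here is a minimal sketch of choosing a paste position that avoids text blocks and of writing and reading a label line. It covers geometry only and is not the repository's generation code. The function names, the 777 x 1000 page size, and the text-block coordinates are assumptions chosen for illustration.

```python
import random


def overlaps(a, b):
    """Return True if two pixel boxes (x, y, w, h) intersect."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def place_checkbox(doc_w, doc_h, box_w, box_h, blocked, rng, max_tries=100):
    """Pick a top-left corner whose box avoids every blocked region.

    Returns (x, y), or None if no free spot is found within max_tries.
    """
    for _ in range(max_tries):
        x = rng.randint(0, doc_w - box_w)
        y = rng.randint(0, doc_h - box_h)
        if not any(overlaps((x, y, box_w, box_h), region) for region in blocked):
            return x, y
    return None


def to_yolo_line(class_id, x, y, w, h, doc_w, doc_h):
    """Format a pixel box as 'class x_center y_center width height', normalized."""
    xc = (x + w / 2) / doc_w
    yc = (y + h / 2) / doc_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w / doc_w:.6f} {h / doc_h:.6f}"


def from_yolo_line(line, doc_w, doc_h):
    """Parse one label line back into (class_id, x, y, w, h) in pixels."""
    class_id, xc, yc, w, h = line.split()
    w_px, h_px = float(w) * doc_w, float(h) * doc_h
    x_px = float(xc) * doc_w - w_px / 2
    y_px = float(yc) * doc_h - h_px / 2
    return int(class_id), round(x_px), round(y_px), round(w_px), round(h_px)


rng = random.Random(0)
text_blocks = [(50, 100, 600, 300)]  # hypothetical layout-model output, in pixels
spot = place_checkbox(777, 1000, 38, 32, text_blocks, rng)
if spot is not None:
    label = to_yolo_line(1, spot[0], spot[1], 38, 32, 777, 1000)
    print(label, from_yolo_line(label, 777, 1000))
```

Rejection sampling keeps the sketch short. A real generator would also need to composite the checkbox pixels onto the page and keep pasted boxes from overlapping each other.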

The model was trained on a P100 GPU for 200 epochs. In the end, under the supervision and mentorship of my advisor, I achieved notable inference results: the model reached relatively high precision and recall after about 100 epochs.
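A training run of this kind could look roughly like the sketch below, which uses the `ultralytics` Python API. Only the 200 epochs come from the text above. The starting weights (`yolov8n.pt`), the 640-pixel image size, and the dataset config name are assumptions, and the helper names are hypothetical.

```python
def build_train_args(data_yaml, epochs=200, image_size=640):
    """Collect keyword arguments for ultralytics' YOLO.train()."""
    return {"data": data_yaml, "epochs": epochs, "imgsz": image_size}


def train_detector(data_yaml, weights="yolov8n.pt"):
    """Train from pretrained YOLOv8 weights and return validation metrics."""
    # Imported lazily so build_train_args works without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(weights)  # assumed model size; the source does not state it
    model.train(**build_train_args(data_yaml))
    return model.val()


# Example call (not run here; "checkbox.yaml" is a hypothetical dataset config):
#   metrics = train_detector("checkbox.yaml")
#   print(metrics.box.mp, metrics.box.mr)  # mean precision and mean recall
```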


Built With


Prerequisites

For generating data

opencv-python: 4.7.0
matplotlib: 3.7.1
numpy: 1.25.2
albumentations: 1.3.1

For training

ultralytics: 8.0.153
gradio
torch
ruamel
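One way to compare a local environment against the versions listed above is the small check below, which relies only on the standard library's `importlib.metadata`. The helper names are hypothetical. `ruamel` is left out because the list does not say which distribution it refers to.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the prerequisites above; None means no version was listed.
PINNED = {
    "opencv-python": "4.7.0",
    "matplotlib": "3.7.1",
    "numpy": "1.25.2",
    "albumentations": "1.3.1",
    "ultralytics": "8.0.153",
    "gradio": None,
    "torch": None,
}


def matches(installed, wanted):
    """True if `installed` equals `wanted` or extends it (4.7.0.72 vs 4.7.0)."""
    return wanted is None or installed == wanted or installed.startswith(wanted + ".")


def check_environment(pins=PINNED):
    """Return {package: status}: 'ok', 'missing', or a mismatch description."""
    report = {}
    for name, wanted in pins.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if matches(found, wanted) else f"found {found}, listed {wanted}"
    return report


for package, status in check_environment().items():
    print(f"{package}: {status}")
```

The prefix match is deliberate: a listed version such as 4.7.0 is treated as satisfied by a longer installed version string that starts with it.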

Installation

1. Clone the repo

   git clone https://github.com/LynnHaDo/Checkbox-Detection.git

2. Install packages

   pip install opencv-python
   pip install matplotlib
   pip install numpy
   pip install albumentations
   pip install ultralytics
   pip install gradio
   pip install torch
   pip install ruamel

3. Dataset:

Source documents: RVL-CDIP
Checkbox images: currently not publicly available


Works Cited

Ultralytics YOLOv8

authors:
  - family-names: Jocher
    given-names: Glenn
    orcid: "https://orcid.org/0000-0001-5950-6979"
  - family-names: Chaurasia
    given-names: Ayush
    orcid: "https://orcid.org/0000-0002-7603-6750"
  - family-names: Qiu
    given-names: Jing
    orcid: "https://orcid.org/0000-0003-3783-7069"
title: "YOLO by Ultralytics"
version: 8.0.0
date-released: 2023-1-10
license: AGPL-3.0
url: "https://github.com/ultralytics/ultralytics"

RVL-CDIP dataset

@inproceedings{harley2015icdar,
  title = {Evaluation of Deep Convolutional Nets for Document Image Classification and Retrieval},
  author = {Adam W Harley and Alex Ufkes and Konstantinos G Derpanis},
  booktitle = {International Conference on Document Analysis and Recognition ({ICDAR})},
  year = {2015}
}

Doclaynet-base dataset

@article{doclaynet2022,
  title = {DocLayNet: A Large Human-Annotated Dataset for Document-Layout Segmentation},
  doi = {10.1145/3534678.3539043},
  url = {https://doi.org/10.1145/3534678.3539043},
  author = {Pfitzmann, Birgit and Auer, Christoph and Dolfi, Michele and Nassar, Ahmed S and Staar, Peter W J},
  year = {2022},
  isbn = {9781450393850},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  booktitle = {Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
  pages = {3743–3751},
  numpages = {9},
  location = {Washington DC, USA},
  series = {KDD '22}
}

XFUND dataset

@inproceedings{xu-etal-2022-xfund,
  title = "{XFUND}: A Benchmark Dataset for Multilingual Visually Rich Form Understanding",
  author = "Xu, Yiheng and Lv, Tengchao and Cui, Lei and Wang, Guoxin and Lu, Yijuan and Florencio, Dinei and Zhang, Cha and Wei, Furu",
  booktitle = "Findings of the Association for Computational Linguistics: ACL 2022",
  month = may,
  year = "2022",
  address = "Dublin, Ireland",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2022.findings-acl.253",
  doi = "10.18653/v1/2022.findings-acl.253",
  pages = "3214--3224",
  abstract = "Multimodal pre-training with text, layout, and image has achieved SOTA performance for visually rich document understanding tasks recently, which demonstrates the great potential for joint learning across different modalities. However, the existed research work has focused only on the English domain while neglecting the importance of multilingual generalization. In this paper, we introduce a human-annotated multilingual form understanding benchmark dataset named XFUND, which includes form understanding samples in 7 languages (Chinese, Japanese, Spanish, French, Italian, German, Portuguese). Meanwhile, we present LayoutXLM, a multimodal pre-trained model for multilingual document understanding, which aims to bridge the language barriers for visually rich document understanding. Experimental results show that the LayoutXLM model has significantly outperformed the existing SOTA cross-lingual pre-trained models on the XFUND dataset. The XFUND dataset and the pre-trained LayoutXLM model have been publicly available at https://aka.ms/layoutxlm.",
}

Contact

Linh Do - do24l@mtholyoke.edu/dohalinh2303@gmail.com (personal)

Project Link: https://github.com/LynnHaDo/Checkbox-Detection

LinkedIn: https://linkedin.com/in/Linh Do


Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: Checkbox Detection Baseline Model
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Linh
    family-names: Do
    email: do24l@mtholyoke.edu
identifiers:
  - type: url
    value: >-
      https://huggingface.co/spaces/linhdo/checkbox-detector/blob/main/models/detector-model.pt
    description: Checkbox Detection Model
repository-code: 'https://github.com/LynnHaDo/Checkbox-Detection'
url: 'https://huggingface.co/spaces/linhdo/checkbox-detector'
abstract: A baseline detection model of checkboxes
keywords:
  - object-detection
  - deep-learning
  - computer-vision
license: MIT
commit: 0f4c0d6
date-released: '2023-08-06'

GitHub Events

Total

Issues event: 6
Watch event: 34
Issue comment event: 7
Push event: 2
Fork event: 2

Last Year

Issues event: 6
Watch event: 34
Issue comment event: 7
Push event: 2
Fork event: 2

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 4
Total pull requests: 0
Average time to close issues: 1 day
Average time to close pull requests: N/A
Total issue authors: 3
Total pull request authors: 0
Average comments per issue: 0.25
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 4
Pull requests: 0
Average time to close issues: 1 day
Average time to close pull requests: N/A
Issue authors: 3
Pull request authors: 0
Average comments per issue: 0.25
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

pafi-code (2)
Nomiluks (2)
HassanBinAli (1)
clogu (1)
shobi2015 (1)
Otterpatsch (1)
danish-rnd (1)
ryx2 (1)
yuluzhong (1)
