https://github.com/chris10m/rfb-text-detection

A Dense Text Detection model using Receptive Field Blocks

Repository

Basic Info
  • Host: GitHub
  • Owner: Chris10M
  • License: mit
  • Language: C++
  • Default Branch: master
  • Size: 20.2 MB
Statistics
  • Stars: 31
  • Watchers: 5
  • Forks: 11
  • Open Issues: 9
  • Releases: 0
Topics
cnn-keras east east-detection keras keras-models rfb-text text text-detect text-detecting text-detection text-detector
Created almost 7 years ago · Last pushed about 3 years ago
README.md

Receptive Field Blocks Text Detection Module (RFBTD Module)

A Dense Text Detection model using Receptive Field Blocks

Introduction

This repo contains text detection based on Receptive Field Blocks. The text detection model provides a dense receptive field, for predicting text boxes in dense natural scene images like documents, articles etc.

The model is also inspired by EAST: An Efficient and Accurate Scene Text Detector, from which the RBOX (rotated box) geometry and the loss function are taken.
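
For readers unfamiliar with EAST, the following is a minimal sketch of a per-pixel rotated-box (RBOX) geometry loss in the style of the EAST paper: an IoU term on the four edge distances plus a cosine term on the angle. It is not taken from this repository. The function name, the `+ 1.0` smoothing, and the default `lambda_theta` are assumptions made for illustration, and the loss actually used by this model may differ.

```python
import math


def rbox_geometry_loss(pred, target, lambda_theta=10.0):
    """EAST-style geometry loss for one pixel (illustrative sketch).

    pred and target are tuples (top, right, bottom, left, theta):
    distances from the pixel to the four box edges, plus the box
    rotation angle in radians.
    """
    t_p, r_p, b_p, l_p, theta_p = pred
    t_g, r_g, b_g, l_g, theta_g = target

    area_p = (t_p + b_p) * (l_p + r_p)
    area_g = (t_g + b_g) * (l_g + r_g)

    # Both boxes are measured from the same pixel, so the overlap on
    # each side is simply the smaller of the two distances.
    inter_w = min(l_p, l_g) + min(r_p, r_g)
    inter_h = min(t_p, t_g) + min(b_p, b_g)
    inter = inter_w * inter_h
    union = area_p + area_g - inter

    # Assumption: +1 smoothing keeps the logarithm finite for empty boxes.
    loss_aabb = -math.log((inter + 1.0) / (union + 1.0))
    loss_theta = 1.0 - math.cos(theta_p - theta_g)
    return loss_aabb + lambda_theta * loss_theta
```

A perfect prediction gives a loss of zero, and both a size mismatch and an angle mismatch increase it.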

The features of the model are summarized below:

  • Keras implementation for lucid and clean code.
  • Backbone: ResNet50
  • Inference time for 720p images:

    GPU version
    + Graphics card: MX130
    + Inference time: 700 ms
    + Batch size: 1

    CPU version
    + CPU: Intel i7-8550U CPU @ 1.80GHz
    + Inference time: 1750 ms
    + Batch size: 1

  • The provided pre-trained model achieves an F1-score of 47.09 (single crop, resize only) on ICDAR 2015, although it was not trained on ICDAR 2015. To improve accuracy, fine-tune on ICDAR 2015 and predict with multiple crops.

  • The model is tuned for predicting text boxes in natural scene documents such as bank statements, forms, and receipts, with the eventual goal of running OCR on these text boxes.
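
As a reminder of what the F1-score above measures, here is a minimal sketch of the standard definition (the harmonic mean of precision and recall). The function name is hypothetical, and no precision or recall values for this model are implied.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall, both in [0, 1]."""
    if precision + recall == 0:
        return 0.0  # convention: avoid division by zero
    return 2.0 * precision * recall / (precision + recall)
```

Because it is a harmonic mean, the score is pulled toward the lower of the two values, so a detector needs both few false positives and few missed boxes to score well.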

Please cite these papers (paper, paper) if you find this useful.

Contents

  1. Live Demo
  2. Installation
  3. Download Pre-Trained Model
  4. Demo
  5. Eval
  6. Train
  7. Examples

Live Demo

The model is hosted on my server, so if you want to try it out live, click here. The hosted model resizes input images to 480p, because the full-resolution model does not fit on my single-core, 512 MB RAM server. Accuracy will therefore suffer due to the small input size.
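
To illustrate what downscaling to 480p means for an input image, here is a minimal sketch that computes the target size while preserving the aspect ratio. It assumes "480p" means a height of 480 pixels and that smaller images are left untouched. How the live demo actually resizes is not documented here, and the function name is hypothetical.

```python
def fit_to_height(width, height, target_height=480):
    """Return (new_width, new_height) with the height capped at target_height."""
    if height <= target_height:
        return width, height  # assumption: never upscale
    scale = target_height / height
    return max(1, round(width * scale)), target_height
```

For a 1280x720 input this yields 853x480, which is well under half the original pixel count and explains the loss in accuracy on small text.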

Installation

  • Requirements from requirements.txt

    tensorflow==1.13.1

    Keras==2.2.4

    numpy==1.16.2

    opencv_contrib_python==4.0.0.21

    plumbum==1.6.7

Download

The pre-trained model is available at this link: GoogleDrive

Demo

If you've downloaded the pre-trained model, run python3 run_demo.py. The input images are read from testimages/inputimages and the predictions are written to testimages/predictedimages.

Eval

If you want to benchmark the model on ICDAR 2015, run python eval.py. The images are read from evalimages/evaluationimages and the predicted boxes are written as text files to evalimages/predictedboxes. This output conforms to the ICDAR text detection challenge format.
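
As a rough illustration of that format: to my understanding, the ICDAR 2015 text localization task expects one text file per image, with one detection per line written as eight comma-separated integer coordinates (x1,y1,x2,y2,x3,y3,x4,y4) for the four corners in clockwise order. The sketch below is written under that assumption. The helper names are hypothetical and are not part of eval.py.

```python
def format_icdar_line(quad):
    """Format four (x, y) corner points as one ICDAR-style result line."""
    if len(quad) != 4:
        raise ValueError("expected exactly four corner points")
    coords = [str(int(round(v))) for point in quad for v in point]
    return ",".join(coords)


def write_icdar_result(path, quads):
    """Write one line per detected quadrilateral to a result text file."""
    with open(path, "w") as handle:
        for quad in quads:
            handle.write(format_icdar_line(quad) + "\n")
```

Check the submission rules of the challenge for the exact file naming and point ordering before relying on this.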

Train

The training code has not been uploaded yet, but I will definitely post it in the upcoming days.

Examples

Here are some examples. The images on the left are the source images, and the images on the right are overlaid with the predicted bounding boxes.

image_1 image_2 image_3 image_4

Issues

If you encounter any issues, please open an issue on the issue tracker.

Model Architecture

If you want more insight into the RF Block, take a look at the model architecture. image_1
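
Receptive Field Blocks, as introduced in the RFB Net paper, combine parallel branches with different kernel sizes and dilation rates, so that each branch sees a different amount of context. The sketch below shows the arithmetic behind that idea for stride-1 convolutions. The branch configurations in the usage note are illustrative only and are not read from this repository's architecture.

```python
def effective_kernel_size(kernel_size, dilation):
    """Spatial extent covered by a dilated convolution kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def stacked_receptive_field(layers):
    """Receptive field of stacked stride-1 convolutions.

    layers is a list of (kernel_size, dilation) pairs.
    """
    field = 1
    for kernel_size, dilation in layers:
        field += effective_kernel_size(kernel_size, dilation) - 1
    return field
```

For example, a 3x3 convolution with dilation 3 covers a 7x7 window, and stacking a plain 3x3 convolution before it gives a 9x9 receptive field at the cost of only two 3x3 kernels.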

And a huge shout-out to argman/EAST for providing the C++ NMS.
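
For context on what non-maximum suppression (NMS) does, here is a minimal Python sketch of plain greedy NMS on axis-aligned boxes. It is a deliberate simplification: the EAST implementation works on rotated quadrilaterals and uses a locality-aware variant in C++, which this sketch does not reproduce. The function names and the default threshold are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes kept, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap a stronger kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

Dense text detectors predict a box at almost every text pixel, so a step like this is what collapses thousands of near-duplicate predictions into one box per text line.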

Owner

  • Name: Christen Millerdurai
  • Login: Chris10M
  • Kind: user

PhD & Researcher @ AV DFKI-Kaiserslautern.

Dependencies

requirements.txt pypi
  • Keras ==2.2.4
  • numpy ==1.16.2
  • opencv_contrib_python ==4.0.0.21
  • plumbum ==1.6.7
  • tensorflow ==1.13.1