https://github.com/amir22010/ssd_detectors

SSD-based object and text detection with Keras, SSD, DSOD, TextBoxes, SegLink, TextBoxes++, CRNN

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
○ codemeta.json file
○ .zenodo.json file
○ DOI references
✓ Academic publication links (links to: arxiv.org)
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity (low similarity, 9.5%, to scientific vocabulary)

Last synced: 10 months ago

Repository


Basic Info

Host: GitHub
Owner: Amir22010
License: mit
Language: Jupyter Notebook
Default Branch: master
Homepage:
Size: 142 MB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Fork of mvoelk/ssd_detectors

Created almost 7 years ago · Last pushed almost 7 years ago

https://github.com/Amir22010/ssd_detectors/blob/master/

# SSD-based object and text detection with Keras

This repository contains the implementation of various approaches to object detection in general and text detection/recognition in particular.

Its code was initially used to carry out the experiments for the author's master thesis [End-to-End Scene Text Recognition based on Artificial Neural Networks](http://83.169.39.135/thesis/thesis.pdf) and later extended with the implementation of more recent approaches.

## Technical background

Most of the ideas used for this project go back to the following papers:

#### SSD: Single Shot MultiBox Detector [arXiv:1512.02325](https://arxiv.org/abs/1512.02325)

SSD is a generic object detector that does local regression and classification on multiple feature maps of a CNN to predict a dense population of bounding boxes, which are subsequently filtered by a confidence threshold and non-maximum suppression (NMS).

#### TextBoxes: A Fast Text Detector with a Single Deep Neural Network [arXiv:1611.06779](https://arxiv.org/abs/1611.06779)

TextBoxes is a modification of SSD that uses non-square convolution kernels and prior boxes with a large aspect ratio to better detect horizontal text.

#### DSOD: Learning Deeply Supervised Object Detectors from Scratch [arXiv:1708.01241](https://arxiv.org/abs/1708.01241)

DSOD is a modification of SSD that uses DenseNet as backbone architecture and thus can be trained from scratch instead of depending on a pretrained VGG-16 model.

#### Detecting Oriented Text in Natural Images by Linking Segments [arXiv:1703.06520](https://arxiv.org/abs/1703.06520)

SegLink builds on SSD and detects oriented text by locally predicting text segments (objects in SSD) and their links with each other. The segments (vertices) and links (edges) are considered as a graph and thresholded by confidence. The remaining groups are finally combined to form bounding boxes.
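The thresholding and grouping step described for SegLink can be sketched as follows. This is a minimal illustration and not the repository's code: the function name `group_segments` and the input layout (a list of segment confidences plus `(i, j, score)` link tuples) are assumptions made for this example. Segments are treated as graph nodes and links as edges, as in the SegLink paper. The default thresholds are the values listed for the pretrained SegLink model in this README. The final step of combining each group into a bounding box is not shown.

```python
def group_segments(segment_scores, links,
                   segment_threshold=0.60, link_threshold=0.25):
    """Group segments into connected components of the thresholded graph.

    segment_scores: list of confidences, one per segment (hypothetical layout)
    links: list of (i, j, score) tuples linking segment i and segment j
    Returns a sorted list of groups, each a list of segment indices.
    """
    # Keep only segments whose confidence passes the segment threshold.
    keep = [score >= segment_threshold for score in segment_scores]

    # Union-find structure over all segment indices.
    parent = list(range(len(segment_scores)))

    def find(i):
        # Follow parent pointers to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # A link joins two segments only if the link and both segments pass.
    for i, j, score in links:
        if score >= link_threshold and keep[i] and keep[j]:
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_j] = root_i

    # Collect the surviving segments by their root.
    groups = {}
    for index, kept in enumerate(keep):
        if kept:
            groups.setdefault(find(index), []).append(index)
    return sorted(groups.values())


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.3, 0.7, 0.95]
    links = [(0, 1, 0.9), (1, 2, 0.9), (2, 3, 0.9), (3, 4, 0.1)]
    print(group_segments(scores, links))
```

Union-find keeps the grouping close to linear in the number of links, which matters because a dense detector produces many candidate segments per image.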
#### TextBoxes++: A Single-Shot Oriented Scene Text Detector [arXiv:1801.02765](https://arxiv.org/abs/1801.02765)

TextBoxes++ extends TextBoxes for arbitrarily oriented text by predicting horizontal bounding boxes as well as quadrilaterals and oriented bounding boxes. It additionally uses the recognition score to eliminate false positives from the detection stage (currently not implemented).

#### Focal Loss for Dense Object Detection [arXiv:1708.02002](https://arxiv.org/abs/1708.02002)

The focal loss is a dynamically weighted version of the cross-entropy loss that can better handle a large imbalance between the classes and focus the training process on the difficult samples. It can be applied to the aforementioned detectors, instead of hard negative mining, to overcome the dominance of the background class.

#### An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition [arXiv:1507.05717](https://arxiv.org/abs/1507.05717)

CRNN is a relatively simple architecture with some convolutional-pooling blocks, followed by two bidirectional LSTM (GRU in this implementation) layers, which can be trained with the CTC loss for efficient text recognition. It can be used to read the text in the cropped bounding boxes generated by the text detectors mentioned above.

## Supported datasets

Currently supported datasets for object detection are

- PASCAL VOC
- MS COCO

and supported datasets related to text are

- ICDAR2015 FST
- ICDAR2015 IST
- SynthText
- MSRA TD500
- SVT
- COCO Text

For more information about the datasets, see [datasets.ipynb](datasets.ipynb).

## Dependencies

For suitable versions of the necessary dependencies, see [environment.ipynb](misc/environment.ipynb).

## Usage

The usage of the code is quite straightforward: clone the repository and run the related Jupyter notebooks. Some of the scripts (e.g. for video and model conversion) can also be executed from the command line.
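As a small illustration of the focal loss summarized under Technical background, the sketch below compares it with plain cross entropy for a single sample. It follows the formula from the paper, FL(p_t) = -α (1 - p_t)^γ log(p_t), where p_t is the predicted probability of the true class. The function names are hypothetical, and using one scalar `alpha` for all classes is a simplifying assumption; this is not the loss implementation used in the repository.

```python
import math


def cross_entropy(p_t, eps=1e-7):
    """Cross entropy for one sample, given the true-class probability p_t."""
    p_t = min(max(p_t, eps), 1.0 - eps)  # clip to avoid log(0)
    return -math.log(p_t)


def focal_loss(p_t, gamma=2.0, alpha=1.0, eps=1e-7):
    """Focal loss for one sample: cross entropy scaled by (1 - p_t) ** gamma.

    gamma=0 and alpha=1 reduce it to plain cross entropy.
    A single scalar alpha is an assumption made for this sketch.
    """
    p_t = min(max(p_t, eps), 1.0 - eps)  # clip to avoid log(0)
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    # An easy sample (p_t = 0.9) is down-weighted far more than a hard one.
    for p_t in (0.9, 0.5, 0.1):
        print(p_t, cross_entropy(p_t), focal_loss(p_t))
```

Because well-classified samples contribute almost nothing, the many easy background boxes no longer dominate the gradient, which is why the loss can stand in for hard negative mining.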
## Pretrained models

Pretrained SSD models can be converted from the [original Caffe implementation](https://github.com/weiliu89/caffe/tree/ssd).

#### [Converted SSD300 VOC](http://83.169.39.135/ssd_detectors/ssd300_voc_weights_fixed.zip)

PASCAL VOC 07+12+COCO SSD300* from Caffe implementation

#### [Converted SSD512 VOC](http://83.169.39.135/ssd_detectors/ssd512_voc_weights_fixed.zip)

PASCAL VOC 07+12+COCO SSD512* from Caffe implementation

#### [Converted SSD300 COCO](http://83.169.39.135/ssd_detectors/ssd300_coco_weights_fixed.zip)

COCO trainval35k SSD300* from Caffe implementation

#### [Converted SSD512 COCO](http://83.169.39.135/ssd_detectors/ssd512_coco_weights_fixed.zip)

COCO trainval35k SSD512* from Caffe implementation

#### [SegLink](http://83.169.39.135/ssd_detectors/201809231008_sl512_synthtext.zip)

- initialized with converted SSD512 weights
- trained and tested on subsets of SynthText
- segment_threshold 0.60
- link_threshold 0.25
- precision 0.884
- recall 0.854
- f-measure 0.869

#### [SegLink with DenseNet and Focal Loss](http://83.169.39.135/ssd_detectors/201806021007_dsodsl512_synthtext.zip)

- trained and tested on subsets of SynthText
- segment_threshold 0.60
- link_threshold 0.50
- precision 0.937
- recall 0.926
- f-measure 0.932

#### [TextBoxes++ with DenseNet and Focal Loss](http://83.169.39.135/ssd_detectors/201906190710_dsodtbpp512fl_synthtext.zip)

- trained and tested on subsets of SynthText
- threshold 0.35
- precision 0.984
- recall 0.890
- f-measure 0.934

#### [CRNN with LSTM](http://83.169.39.135/ssd_detectors/201806162129_crnn_lstm_synthtext.zip)

- trained and tested on cropped word-level bounding boxes from SynthText
- mean edit distance 0.332
- mean normalized edit distance 0.081
- character recognition rate 0.916
- word recognition rate 0.861

#### [CRNN with GRU](http://83.169.39.135/ssd_detectors/201806190711_crnn_gru_synthtext.zip)

- trained and tested on cropped word-level bounding boxes from SynthText
- mean edit distance 0.333
- mean normalized edit distance 0.081
- character recognition rate 0.916
- word recognition rate 0.858

## Demo images

#### SSD on PASCAL VOC 2007 test

#### SegLink with DenseNet on SynthText

#### TextBoxes++ with DenseNet on SynthText

#### SegLink with DenseNet, Focal Loss and CRNN end-to-end on SynthText

#### SegLink with DenseNet, Focal Loss and CRNN end-to-end real-time recognition

[Demo video](http://83.169.39.135/ssd_detectors/dsodslcrnn_end2end_record.mp4)

Owner

Name: Amir Khan
Login: Amir22010
Kind: user
Location: India

Repositories: 3
Profile: https://github.com/Amir22010

Working on developing state-of-the-art AI solutions, mainly in the computer vision, chatbot, and NLP domains. Building awesome AI as a professional developer 😍.
