https://github.com/bertsky/ocrd_detectron2
OCR-D wrapper for detectron2 based segmentation models
Science Score: 46.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found codemeta.json file)
- ✓ .zenodo.json file (found .zenodo.json file)
- ○ DOI references
- ✓ Academic publication links (links to: arxiv.org)
- ✓ Committers with academic emails (1 of 4 committers (25.0%) from academic institutions)
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity (low similarity (12.3%) to scientific vocabulary)
Repository
OCR-D wrapper for detectron2 based segmentation models
Statistics
- Stars: 17
- Watchers: 2
- Forks: 5
- Open Issues: 6
- Releases: 10
Metadata Files
README.md
ocrd_detectron2
OCR-D wrapper for detectron2 based segmentation models
Introduction
This offers OCR-D compliant workspace processors for document layout analysis with models trained on Detectron2, which implements Faster R-CNN, Mask R-CNN, Cascade R-CNN, Feature Pyramid Networks and Panoptic Segmentation, among others.
In trying to cover a broad range of third-party models, a few sacrifices have to be made: Deployment of models may be difficult, and needs configuration. Class labels (really PAGE-XML region types) must be provided. The code itself tries to cope with panoptic and instance segmentation models (with or without masks).
Only meant for (coarse) page segmentation into regions – no text lines, no reading order, no orientation.
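To make the division of labour concrete, here is a minimal sketch of what such a wrapper has to do per page: load a Detectron2 config and weights, run the predictor on the raw page image, and read off panoptic segments or instances depending on the model's capabilities. This is an illustration only, not the processor's actual code; the file names and threshold are placeholders.
```
# Sketch of driving a Detectron2 model the way this wrapper must
# (illustrative only, not the actual ocrd_detectron2 implementation).
import cv2
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file("NAME.yaml")              # placeholder: model_config
cfg.MODEL.WEIGHTS = "NAME.pth"                # placeholder: model_weights
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5   # cf. min_confidence
cfg.MODEL.DEVICE = "cpu"                      # cf. device parameter

predictor = DefaultPredictor(cfg)
image = cv2.imread("page.png")                # raw page image (BGR)
outputs = predictor(image)

if "panoptic_seg" in outputs:
    # panoptic models: a segment label map plus segment-to-class info
    seg_map, segments_info = outputs["panoptic_seg"]
elif "instances" in outputs:
    # instance models: boxes, scores, classes, and (optionally) masks
    instances = outputs["instances"].to("cpu")
    boxes = instances.pred_boxes
    classes = instances.pred_classes          # to be mapped via `categories`
    if instances.has("pred_masks"):
        masks = instances.pred_masks
```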
Installation
Create and activate a virtual environment as usual.
To install Python dependencies:
```
make deps
```
Which is the equivalent of:
```
pip install -r requirements.txt -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu113/torch1.10/index.html # for CUDA 11.3
pip install -r requirements.txt -f https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/torch1.10/index.html # for CPU only
```
To install this module, then do:
```
make install
```
Which is the equivalent of:
```
pip install .
```
Alternatively, you can use the provided Docker image (either from GitHub Container Registry or from Dockerhub):
```
docker pull bertsky/ocrd_detectron2
# or
docker pull ghcr.io/bertsky/ocrd_detectron2
```
Usage
OCR-D processor interface ocrd-detectron2-segment
To be used with PAGE-XML documents in an OCR-D annotation workflow.
```
Usage: ocrd-detectron2-segment [OPTIONS]
Detect regions with Detectron2 models
Use detectron2 to segment each page into regions.
Open and deserialize PAGE input files and their respective images. Fetch a raw and a binarized image for the page frame (possibly cropped and deskewed).
Feed the raw image into the detectron2 predictor that has been used to load the given model. Then, depending on the model capabilities (whether it can do panoptic segmentation or only instance segmentation, whether the latter can do masks or only bounding boxes), post-process the predictions:
- panoptic segmentation: take the provided segment label map, and apply the segment to class label map,
- instance segmentation: find an optimal non-overlapping set (flat map) of instances via non-maximum suppression,
- both: avoid overlapping pre-existing top-level regions (incremental segmentation).
Then extend / shrink the surviving masks to fully include / exclude connected components in the foreground that are on the boundary.
(This describes the steps when postprocessing is full. A value of only-nms
will omit the morphological extension/shrinking, while only-morph will omit
the non-maximum suppression, and none will skip all postprocessing.)

Finally, find the convex hull polygon for each region, and map its class id
to a new PAGE region type (and subtype).

(Does not annotate ReadingOrder or TextLines or @orientation.)

Produce a new output file by serialising the resulting hierarchy.
Options:
  -I, --input-file-grp USE        File group(s) used as input
  -O, --output-file-grp USE       File group(s) used as output
  -g, --page-id ID                Physical page ID(s) to process
  --overwrite                     Remove existing output pages/images
                                  (with --page-id, remove only those)
  --profile                       Enable profiling
  --profile-file                  Write cProfile stats to this file. Implies --profile
  -p, --parameter JSON-PATH       Parameters, either verbatim JSON string
                                  or JSON file path
  -P, --param-override KEY VAL    Override a single JSON object key-value pair,
                                  taking precedence over --parameter
  -m, --mets URL-PATH             URL or file path of METS to process
  -w, --working-dir PATH          Working directory of local workspace
  -l, --log-level [OFF|ERROR|WARN|INFO|DEBUG|TRACE]
                                  Log level
  -C, --show-resource RESNAME     Dump the content of processor resource RESNAME
  -L, --list-resources            List names of processor resources
  -J, --dump-json                 Dump tool description as JSON and exit
  -D, --dump-module-dir           Output the 'module' directory with resources
                                  for this processor
  -h, --help                      This help message
  -V, --version                   Show version
Parameters:
  "operation_level" [string - "page"]
      hierarchy level which to predict and assign regions for
      Possible values: ["page", "table"]
  "categories" [array - REQUIRED]
      maps each category (class index) of the model to a PAGE region type
      (and @type or @custom if separated by colon), e.g.
      ['TextRegion:paragraph', 'TextRegion:heading', 'TextRegion:floating',
      'TableRegion', 'ImageRegion'] for PubLayNet; categories with an empty
      string will be skipped during prediction
  "model_config" [string - REQUIRED]
      path name of model config
  "model_weights" [string - REQUIRED]
      path name of model weights
  "min_confidence" [number - 0.5]
      confidence threshold for detections
  "postprocessing" [string - "full"]
      which postprocessing steps to enable: by default, applies a custom
      non-maximum suppression (to avoid overlaps) and morphological operations
      (using connected component analysis on the binarized input image to
      shrink or expand regions)
      Possible values: ["full", "only-nms", "only-morph", "none"]
  "debug_img" [string - "none"]
      paint an AlternativeImage which blends the input image and all raw
      decoded region candidates
      Possible values: ["none", "instance_colors", "instance_colors_only", "category_colors"]
  "device" [string - "cuda"]
      select computing device for Torch (e.g. cpu or cuda:0);
      will fall back to CPU if no GPU is available
```
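To make the default postprocessing more tangible, the following is a compact sketch of the two stages under simplifying assumptions. The helper names are hypothetical, and this is not the processor's actual implementation; in particular, the real processor decides per connected component whether to include or exclude it, whereas this sketch only extends masks to fully cover every component they touch.
```
# Sketch of the two postprocessing stages (hypothetical helpers,
# not the actual ocrd_detectron2 code).
import numpy as np
from scipy.ndimage import label

def _iou(a, b):
    """Intersection-over-union of two boolean masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

def mask_nms(masks, scores, iou_threshold=0.5):
    """Greedily keep high-scoring instance masks, dropping any mask that
    overlaps an already-kept one beyond the IoU threshold (flat map)."""
    order = np.argsort(scores)[::-1]          # best-scoring first
    keep = []
    for i in order:
        if all(_iou(masks[i], masks[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

def snap_to_components(mask, binarized):
    """Extend a surviving mask to fully include every foreground connected
    component (of the binarized page image) that it touches."""
    components, _ = label(binarized)          # label foreground components
    touched = np.unique(components[mask])     # component ids under the mask
    touched = touched[touched != 0]           # drop the background label
    return np.isin(components, touched)
```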
Example:
```
# download one preconfigured model:
ocrd resmgr download ocrd-detectron2-segment TableBank_X152.yaml
ocrd resmgr download ocrd-detectron2-segment TableBank_X152.pth
# run it (setting model_config, model_weights and categories):
ocrd-detectron2-segment -I OCR-D-BIN -O OCR-D-SEG-TAB -P categories '["TableRegion"]' -P model_config TableBank_X152.yaml -P model_weights TableBank_X152.pth -P min_confidence 0.1
# run it (equivalent, with presets file)
ocrd-detectron2-segment -I OCR-D-BIN -O OCR-D-SEG-TAB -p presets_TableBank_X152.json -P min_confidence 0.1
# download all preconfigured models
ocrd resmgr download ocrd-detectron2-segment "*"
```
For installation via Docker, usage is basically the same as above – with some modifications:
```
# For data persistency, decide which host-side directories you want to mount in Docker:
DATADIR=/host-side/path/to/data
MODELDIR=/host-side/path/to/models
# Either you "log in" to a container first:
docker run -v $DATADIR:/data -v $MODELDIR:/usr/local/share/ocrd-resources -it bertsky/ocrd_detectron2 bash
# and then can use the above commands verbatim
...
# Or you spin up a new container each time,
# which means prefixing the above commands with
docker run -v $DATADIR:/data -v $MODELDIR:/usr/local/share/ocrd-resources bertsky/ocrd_detectron2 ...
```
Debugging
If you mistrust your model, and/or this tool's additional postprocessing, try playing with the runtime parameters:
- Set `debug_img` to some value other than `none`, e.g. `instance_colors_only`. This will generate an image which overlays the raw predictions with the raw image using Detectron2's internal visualiser (a minimal sketch of this follows below). The parameter settings correspond to its ColorMode. The AlternativeImages will have `@comments="debug"`, and will also be referenced in the METS, which allows convenient browsing with OCR-D Browser. (For example, open the Page View and Image View side by side, and navigate to your output fileGrp on each.)
- Selectively disable postprocessing steps: from the default `full` via `only-nms` (first stage) or `only-morph` (second stage) to `none`.
- Lower `min_confidence` to get more candidates, raise it to get fewer.
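For reference, such a debug overlay can be produced with Detectron2's own visualiser roughly as follows (assuming the `image` and `instances` from the sketch in the Introduction; which ColorMode corresponds to which `debug_img` value is an assumption here):
```
# Sketch: render predictions with Detectron2's visualiser
# (the exact ColorMode used per debug_img value is an assumption).
from detectron2.utils.visualizer import ColorMode, Visualizer

visualizer = Visualizer(image[:, :, ::-1],      # BGR -> RGB
                        instance_mode=ColorMode.IMAGE)
vis = visualizer.draw_instance_predictions(instances)  # instances on CPU
debug_image = vis.get_image()                   # RGB array to save or inspect
```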
Models
Some of the following models have already been registered as known file resources, along with parameter presets to use them conveniently.
To get a list of registered models available for download, do:
```
ocrd resmgr list-available -e ocrd-detectron2-segment
```
To get a list of already installed models and presets, do:
```
ocrd resmgr list-installed -e ocrd-detectron2-segment
```
To download a registered model (i.e. a config file and the respective weights file), do:
```
ocrd resmgr download ocrd-detectron2-segment NAME.yaml
ocrd resmgr download ocrd-detectron2-segment NAME.pth
```
To download more models (registered or other), see:
```
ocrd resmgr download --help
```
To use a model, do:
```
ocrd-detectron2-segment -P model_config NAME.yaml -P model_weights NAME.pth -P categories '[...]' ...
ocrd-detectron2-segment -p NAME.json ... # equivalent, with presets file
```
To add (i.e. register) a new model, you first have to find:
- the classes it is trained on, so you can then define a mapping to PAGE-XML region (and subregion) types,
- a download link to the model config and model weights file. Archives (zip/tar) are allowed, but then you must also specify the file paths to extract.
Assuming you have done so, then proceed as follows:
```
# from local file path
ocrd resmgr download -n path/to/model/config.yml ocrd-detectron2-segment NAME.yml
ocrd resmgr download -n path/to/model/weights.pth ocrd-detectron2-segment NAME.pth
# from single file URL
ocrd resmgr download -n https://path.to/model/config.yml ocrd-detectron2-segment NAME.yml
ocrd resmgr download -n https://path.to/model/weights.pth ocrd-detectron2-segment NAME.pth
# from zip file URL
ocrd resmgr download -n https://path.to/model/arch.zip -t archive -P zip-path/to/config.yml ocrd-detectron2-segment NAME.yml
ocrd resmgr download -n https://path.to/model/arch.zip -t archive -P zip-path/to/weights.pth ocrd-detectron2-segment NAME.pth
# create corresponding preset file
echo '{"model_weights": "NAME.pth", "model_config": "NAME.yml", "categories": [...]}' > NAME.json
# install preset file so it can be used everywhere (not just in CWD):
ocrd resmgr download -n NAME.json ocrd-detectron2-segment NAME.json
# now the new model can be used just like the preregistered models
ocrd-detectron2-segment -p NAME.json ...
```
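The preset file is plain JSON; if you prefer generating it programmatically, a sketch (the categories shown are placeholders for your model's actual classes):
```
# Sketch: generate a presets file for a newly registered model
# (the categories here are placeholders, not a real mapping).
import json

preset = {
    "model_config": "NAME.yml",
    "model_weights": "NAME.pth",
    "categories": ["TextRegion:paragraph", "TableRegion"],  # placeholder
}
with open("NAME.json", "w") as f:
    json.dump(preset, f, indent=2)
```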
What follows is an overview of the preregistered models (i.e. available via resmgr).
Note: These are just examples, no exhaustive search was done yet!
Note: The filename suffix (.pth vs .pkl) of the weight file does matter!
TableBank
X152-FPN config|weights|["TableRegion"]
TableBank
X152-FPN config|weights|["TableRegion"]
PubLayNet
R50-FPN config|weights|["TextRegion:paragraph", "TextRegion:heading", "TextRegion:floating", "TableRegion", "ImageRegion"]
R101-FPN config|weights|["TextRegion:paragraph", "TextRegion:heading", "TextRegion:floating", "TableRegion", "ImageRegion"]
X101-FPN config|weights|["TextRegion:paragraph", "TextRegion:heading", "TextRegion:floating", "TableRegion", "ImageRegion"]
PubLayNet
R50-FPN config|weights|["TextRegion:paragraph", "TextRegion:heading", "TextRegion:floating", "TableRegion", "ImageRegion"]
R101-FPN config|weights|["TextRegion:paragraph", "TextRegion:heading", "TextRegion:floating", "TableRegion", "ImageRegion"]
LayoutParser
LayoutParser provides different model variants of various depths for multiple datasets:
- PubLayNet (Medical Research Papers)
- TableBank (Tables Computer Typesetting)
- PRImALayout (Various Computer Typesetting)
R50-FPN config|weights|["Background","TextRegion","ImageRegion","TableRegion","MathsRegion","SeparatorRegion","LineDrawingRegion"]
- HJDataset (Historical Japanese Magazines)
- NewspaperNavigator (Historical Newspapers)
- Math Formula Detection
See here for an overview, and here for the model files. You will have to adapt the label map to conform to PAGE-XML region (sub)types accordingly.
PubLayNet finetuning
(pre-trained on PubLayNet, fine-tuned on a custom, non-public GT corpus of 500 pages of 20th-century magazines)
X101-FPN config|weights|["TextRegion:caption","ImageRegion","TextRegion:page-number","TableRegion","TextRegion:heading","TextRegion:paragraph"]
DocBank
X101-FPN archive
Proposed mappings:
- ["TextRegion:header", "TextRegion:credit", "TextRegion:caption", "TextRegion:other", "MathsRegion", "GraphicRegion", "TextRegion:footer", "TextRegion:floating", "TextRegion:paragraph", "TextRegion:endnote", "TextRegion:heading", "TableRegion", "TextRegion:heading"] (using only predefined @type)
- ["TextRegion:abstract", "TextRegion:author", "TextRegion:caption", "TextRegion:date", "MathsRegion", "GraphicRegion", "TextRegion:footer", "TextRegion:list", "TextRegion:paragraph", "TextRegion:reference", "TextRegion:heading", "TableRegion", "TextRegion:title"] (using @custom as well)
Testing
To install Python dependencies and download some models:
```
make deps-test
```
Which is the equivalent of:
```
pip install -r requirements-test.txt
make models-test
```
To run the tests, then do:
```
make test
```
You can inspect the results under `test/assets/*/data` under various new `OCR-D-SEG-*` fileGrps.
(Again, it is recommended to use OCR-D Browser.)
Finally, to remove the test data, do:
```
make clean
```
Test results
These tests are integrated as a GitHub Action. Its results can be viewed here.
Owner
- Name: Robert Sachunsky
- Login: bertsky
- Kind: user
- Repositories: 114
- Profile: https://github.com/bertsky
GitHub Events
Total
- Release event: 1
- Watch event: 1
- Push event: 14
- Pull request event: 2
- Create event: 2
Last Year
- Release event: 1
- Watch event: 1
- Push event: 14
- Pull request event: 2
- Create event: 2
Committers
Last synced: over 1 year ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Robert Sachunsky | s****y@i****e | 68 |
| Robert Sachunsky | 3****y | 25 |
| Stefan Weil | sw@w****e | 2 |
| Konstantin Baierer | k****a | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 19
- Total pull requests: 12
- Average time to close issues: 6 months
- Average time to close pull requests: about 1 month
- Total issue authors: 6
- Total pull request authors: 4
- Average comments per issue: 5.21
- Average comments per pull request: 1.08
- Merged pull requests: 6
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 1
- Pull requests: 1
- Average time to close issues: 14 days
- Average time to close pull requests: about 5 hours
- Issue authors: 1
- Pull request authors: 1
- Average comments per issue: 1.0
- Average comments per pull request: 0.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- bertsky (9)
- stefanCCS (4)
- stweil (3)
- masc-it (1)
- joschrew (1)
- witobejmak (1)
Pull Request Authors
- stweil (4)
- bertsky (4)
- kba (3)
- JBBalling (1)
Packages
- Total packages: 1
- Total downloads: 85 last-month (pypi)
- Total dependent packages: 0
- Total dependent repositories: 1
- Total versions: 9
- Total maintainers: 1
pypi.org: ocrd-detectron2
OCR-D wrapper for detectron2 based segmentation models
- Homepage: https://github.com/bertsky/ocrd_detectron2
- Documentation: https://ocrd-detectron2.readthedocs.io/
- License: MIT
- Latest release: 0.1.8 (published over 2 years ago)
Maintainers (1)
Dependencies
- click >=7.0
- detectron2 >=0.6
- numpy >=1.17.0
- ocrd >=2.40
- pillow >=7.1.2
- scikit-image >=0.17.2
- scipy *
- shapely *
- torch >=1.10.0
- torchvision >=0.11.2
- actions/checkout v3 composite
- docker/login-action v2 composite
- docker/setup-buildx-action v2 composite
- actions/cache v3 composite
- actions/checkout 24cb9080177205b6e8c946b17badbe402adc938f composite
- actions/checkout v3 composite
- actions/download-artifact 9bc31d5ccc31df68ecc42ccf4149144866c47d8a composite
- actions/setup-python v3 composite
- actions/upload-artifact v3 composite
- lhotari/action-upterm v1 composite
- stefanzweifel/git-auto-commit-action v4 composite
- ocrd/core-cuda latest build
- ocrd_wrap * test
- pytest * test