decimer-segmentation

Chemical structure detection and segmentation tool for Journal articles.

https://github.com/kohulan/decimer-image-segmentation

Science Score: 59.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 5 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
    1 of 9 committers (11.1%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.5%) to scientific vocabulary

Keywords

chemical-structure decimer-segmentation deep-learning segmented-images segmented-structure-depictions

Keywords from Contributors

standardization meshing pipeline-testing datacleaner pde pinn interpretability data-profilers bridges polygons
Last synced: 6 months ago

Repository

Chemical structure detection and segmentation tool for Journal articles.

Basic Info
  • Host: GitHub
  • Owner: Kohulan
  • License: mit
  • Language: Jupyter Notebook
  • Default Branch: master
  • Homepage: https://decimer.ai
  • Size: 120 MB
Statistics
  • Stars: 115
  • Watchers: 6
  • Forks: 34
  • Open Issues: 1
  • Releases: 14
Topics
chemical-structure decimer-segmentation deep-learning segmented-images segmented-structure-depictions
Created almost 6 years ago · Last pushed 7 months ago
Metadata Files
Readme Changelog License Citation

README.md

DECIMER-Image-Segmentation


Chemistry looks back at many decades of publications on chemical compounds, their structures and properties, in scientific articles. Liberating this knowledge (semi-)automatically and making it available to the world in open-access databases is a current challenge. Apart from mining textual information, Optical Chemical Structure Recognition (OCSR), the translation of an image of a chemical structure into a machine-readable representation, is part of this workflow. As the OCSR process requires an image containing a chemical structure, there is a need for a publicly available tool that automatically recognizes and segments chemical structure depictions from scientific publications. This is especially important for older documents which are only available as scanned pages. Here, we present DECIMER (Deep lEarning for Chemical IMagE Recognition) Segmentation, the first open-source, deep learning-based tool for automated recognition and segmentation of chemical structures from the scientific literature.

The workflow is divided into two main stages. In the detection stage, a deep learning model recognizes chemical structure depictions and creates masks that define their positions on the input page. In the subsequent post-processing stage, potentially incomplete masks are expanded. The performance of DECIMER Segmentation has been manually evaluated on three sets of publications from different publishers. The approach operates on bitmap images of journal pages, so it is also applicable to older articles published before vector images were introduced in PDFs.
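The mask expansion idea can be illustrated with a small, dependency-free sketch. It is not the DECIMER implementation: it assumes a binarized page given as nested lists (1 = ink, 0 = background) and grows an incomplete mask until it covers every connected group of ink pixels that the mask already touches. The function name `expand_mask` and the choice of 8-connectivity are assumptions made for illustration only.

```python
from collections import deque


def expand_mask(page, mask):
    """Grow `mask` over every ink component it already touches.

    page: rows of 0/1 ints, 1 marks an ink (dark) pixel.
    mask: rows of 0/1 ints of the same size, 1 marks a detected pixel.
    Returns a new mask; the inputs are left unchanged.
    """
    height, width = len(page), len(page[0])
    expanded = [row[:] for row in mask]
    # Seeds: ink pixels that the detector already covered.
    queue = deque(
        (y, x)
        for y in range(height)
        for x in range(width)
        if mask[y][x] and page[y][x]
    )
    while queue:
        y, x = queue.popleft()
        # Visit the 8 neighbours (an illustrative choice of connectivity).
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (dy or dx) and 0 <= ny < height and 0 <= nx < width:
                    if page[ny][nx] and not expanded[ny][nx]:
                        expanded[ny][nx] = 1
                        queue.append((ny, nx))
    return expanded


# Toy page: one stroke spanning columns 0-3, plus an unrelated ink pixel.
page = [
    [1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1],
]
# Hypothetical detector output that covers only the left half of the stroke.
mask = [
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
expanded = expand_mask(page, mask)
```

In this toy case the expanded mask covers the whole stroke in the first row, while the isolated pixel in the bottom-right corner stays outside because the original mask never touched it.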

By making the source code and the trained model publicly available, we hope to contribute to the development of comprehensive chemical data extraction workflows. To facilitate access to DECIMER Segmentation, we also developed a web application. The web application, available at https://decimer.ai, lets the user upload a PDF file and retrieve the segmented structure depictions.


Usage

  • To use DECIMER Segmentation, clone the repository to your local disk. Mask-RCNN runs on a GPU-enabled PC or simply on a CPU; if you use a GPU, make sure all the necessary drivers are installed.
    We recommend using DECIMER-Segmentation inside a Conda environment to simplify the installation of the dependencies.
  • Conda can be downloaded as part of the Anaconda or Miniconda platforms (Python 3). We recommend installing Miniconda3. On Linux, you can get it with:
    $ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
    $ bash Miniconda3-latest-Linux-x86_64.sh

How to install DECIMER-Segmentation

```
$ git clone https://github.com/Kohulan/DECIMER-Image-Segmentation
$ cd DECIMER-Image-Segmentation
$ conda create --name DECIMERIMGSEG python=3.10
$ conda activate DECIMERIMGSEG
$ conda install pip
$ python -m pip install -U pip  # Upgrade pip
$ pip install .
$ conda install -c conda-forge poppler

# From PyPI
$ pip install decimer-segmentation
```
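After installation, a quick check can confirm that the main modules are importable from the active environment. This is a minimal sketch, not part of the package: the helper `missing_modules` is hypothetical, and the module names are assumptions (`decimer_segmentation` follows the import used in the usage example, `cv2` is the module provided by opencv-python).

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed module names, see the note above.
required = ["decimer_segmentation", "cv2"]
missing = missing_modules(required)
if missing:
    print("Not importable:", ", ".join(missing))
else:
    print("All required modules can be imported.")
```

If a module is reported as not importable, the most likely cause is that the Conda environment created above is not the active one.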

The Mask-RCNN Model is available at: DOI

How to use DECIMER-Segmentation

  • The repository contains a script that segments chemical structures from an image of a scanned page or from a PDF document:
    $ python3 segment_structures_in_document.py file_name
    Here, file_name is the path to an image of a scanned page or to a PDF document.
  • Segmented images are saved in the output folder (which has the name of the PDF file).

  • Alternatively, you can integrate DECIMER Segmentation in your Python code:

```
from decimer_segmentation import segment_chemical_structures, segment_chemical_structures_from_file
import cv2

# Segment structures in scanned page image (np.array)
page = cv2.imread(scanned_page_file_path)
segments = segment_chemical_structures(page, expand=True)

# Segment structures from file (pdf or image)
# Windows users may need to specify the location of their poppler installation
# with the poppler_path argument if they want to process pdf files
segments = segment_chemical_structures_from_file(path, expand=True, poppler_path=None)
```
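To keep the results of the Python API, the returned segments can be written to disk. The sketch below assumes that `segments` is a list of image arrays that `cv2.imwrite` accepts. The helper names `segment_output_paths` and `save_segments`, and the `<stem>_<index>.png` naming scheme, are hypothetical and not part of the package; the folder layout follows the output-folder convention described for the command-line script.

```python
from pathlib import Path


def segment_output_paths(source, count, out_root="."):
    """Return `count` PNG paths inside a folder named after `source`."""
    stem = Path(source).stem
    folder = Path(out_root) / stem
    return [folder / f"{stem}_{index}.png" for index in range(count)]


def save_segments(segments, source, out_root="."):
    """Write each segment to its own PNG file and return the paths."""
    import cv2  # imported lazily so the path helper works without OpenCV

    paths = segment_output_paths(source, len(segments), out_root)
    for segment, path in zip(segments, paths):
        path.parent.mkdir(parents=True, exist_ok=True)
        # cv2.imwrite returns False when the image could not be written.
        if not cv2.imwrite(str(path), segment):
            raise OSError(f"could not write {path}")
    return paths


# Path planning only; no image data is needed for this part.
paths = segment_output_paths("article.pdf", 2, out_root="output")
print([p.as_posix() for p in paths])
# prints ['output/article/article_0.png', 'output/article/article_1.png']
```

With segments obtained from one of the two functions above, `save_segments(segments, path)` would then write one PNG per detected structure.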

Notes for Windows users:

  • Execute DECIMER_Segmentation.py in the Anaconda Powershell Prompt

  • If you run into an error with the pdf conversion on Windows, you need to download poppler and extract the file.

  • The function segment_chemical_structures_from_file() takes a 'poppler_path' argument where the user can specify the path of their poppler installation ('PATH/TO/POPPLER/bin').
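The poppler lookup can be wrapped in a small helper so that the same script works whether or not poppler is on the system PATH. This is a sketch under assumptions: `resolve_poppler_path` is a hypothetical helper, and it detects poppler by looking for `pdftoppm`, one of the poppler command-line tools.

```python
import shutil
from pathlib import Path


def resolve_poppler_path(candidate=None):
    """Return a value suitable for the `poppler_path` argument.

    candidate: optional folder expected to hold the poppler binaries,
    for example 'PATH/TO/POPPLER/bin'.
    Returns the folder as a string if it contains pdftoppm, or None if
    poppler is reachable through PATH and no explicit folder is needed.
    """
    if candidate is not None:
        folder = Path(candidate)
        # Accept both the Windows and the Unix executable name.
        if any((folder / name).is_file() for name in ("pdftoppm.exe", "pdftoppm")):
            return str(folder)
    if shutil.which("pdftoppm"):
        return None
    raise FileNotFoundError("poppler was not found in the given folder or on PATH")
```

The result can be passed straight through, for example `segment_chemical_structures_from_file(path, expand=True, poppler_path=resolve_poppler_path('PATH/TO/POPPLER/bin'))`, so a missing installation is reported before any PDF conversion starts.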

Authors

decimer.ai

Citation

Rajan, K., Brinkhaus, H.O., Sorokina, M. et al. DECIMER-Segmentation: Automated extraction of chemical structure depictions from scientific literature. J Cheminform 13, 20 (2021). https://doi.org/10.1186/s13321-021-00496-1


Owner

  • Name: Kohulan Rajan
  • Login: Kohulan
  • Kind: user
  • Location: Jena,Germany
  • Company: Friedrich-Schiller-University

PostDoc @Steinbeck-Lab Currently based at Friedrich-Schiller-University, Jena

GitHub Events

Total
  • Create event: 1
  • Release event: 1
  • Issues event: 11
  • Watch event: 24
  • Delete event: 1
  • Issue comment event: 29
  • Push event: 11
  • Pull request review comment event: 1
  • Pull request review event: 2
  • Pull request event: 6
  • Fork event: 8
Last Year
  • Create event: 1
  • Release event: 1
  • Issues event: 11
  • Watch event: 24
  • Delete event: 1
  • Issue comment event: 29
  • Push event: 11
  • Pull request review comment event: 1
  • Pull request review event: 2
  • Pull request event: 6
  • Fork event: 8

Committers

Last synced: 10 months ago

All Time
  • Total Commits: 243
  • Total Committers: 9
  • Avg Commits per committer: 27.0
  • Development Distribution Score (DDS): 0.461
Past Year
  • Commits: 1
  • Committers: 1
  • Avg Commits per committer: 1.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name | Email | Commits
Otto Brinkhaus | b****s@g****m | 131
Kohulan Rajan | k****n@u****e | 84
dependabot[bot] | 4****] | 14
Adam Hardy | a****y@d****o | 4
github-actions[bot] | 4****] | 3
Adam Hanzlík | h****8@g****m | 3
Mahnoor Zulfiqar | z****4@g****m | 2
Dr. Aleksei Krasnov | a****v@o****m | 1
Adam Hardy | 1****m | 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 33
  • Total pull requests: 89
  • Average time to close issues: 3 months
  • Average time to close pull requests: 21 days
  • Total issue authors: 24
  • Total pull request authors: 9
  • Average comments per issue: 2.7
  • Average comments per pull request: 0.72
  • Merged pull requests: 52
  • Bot issues: 0
  • Bot pull requests: 47
Past Year
  • Issues: 7
  • Pull requests: 7
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 14 days
  • Issue authors: 7
  • Pull request authors: 2
  • Average comments per issue: 2.86
  • Average comments per pull request: 0.43
  • Merged pull requests: 6
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • Iagea (4)
  • rookiexiao123 (3)
  • OBrink (3)
  • jelymewang (2)
  • ad-hardy (2)
  • stud2008 (1)
  • HiteSit (1)
  • HBioquant (1)
  • zhentg (1)
  • Mblakey (1)
  • anamaycerelabs (1)
  • shivanandakandagalla (1)
  • pythonnewbie3 (1)
  • rakykane-pf (1)
  • UniquerWong (1)
Pull Request Authors
  • dependabot[bot] (44)
  • OBrink (21)
  • Kohulan (17)
  • alexey-krasnov (5)
  • github-actions[bot] (3)
  • deimos1078 (2)
  • ad-hardy (1)
  • Iagea (1)
  • zmahnoor14 (1)
Top Labels
Issue Labels
question (9), bug (6), enhancement (2), dependencies (2), wontfix (2), help wanted (1), good first issue (1), documentation (1)
Pull Request Labels
dependencies (46), python (34), javascript (11), enhancement (9), autorelease: tagged (7), bug (5), autorelease: pending (3), documentation (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 681 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 11
  • Total maintainers: 2
pypi.org: decimer-segmentation

DECIMER Segmentation - Extraction of chemical structure depictions from scientific literature

  • Versions: 11
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 681 Last month
Rankings
Forks count: 8.3%
Stargazers count: 8.6%
Dependent packages count: 10.0%
Downloads: 10.5%
Average: 11.8%
Dependent repos count: 21.7%
Maintainers (2)
Last synced: 6 months ago

Dependencies

.github/workflows/check_errors_and_test.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v1 composite
  • google-github-actions/release-please-action v3 composite
.github/workflows/pypi_release.yml actions
  • actions/checkout master composite
  • actions/setup-python v3 composite
  • pypa/gh-action-pypi-publish release/v1 composite
setup.py pypi
  • IPython *
  • imantics *
  • matplotlib *
  • numpy >=1.2.0
  • opencv-python *
  • pdf2image *
  • pillow *
  • scikit-image >=0.2.0
  • tensorflow_os ,