pyonb

Python SDK for OnBase REST API

https://github.com/safehr-data/pyonb

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.4%) to scientific vocabulary

Repository


Basic Info
  • Host: GitHub
  • Owner: SAFEHR-data
  • License: mit
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 2.18 MB
Statistics
  • Stars: 1
  • Watchers: 4
  • Forks: 0
  • Open Issues: 8
  • Releases: 0
Created about 1 year ago · Last pushed 6 months ago
Metadata Files
Readme License Citation

README.md

pyonb

[!WARNING] This repo is under construction.

pyonb is two things:

  • a Python SDK for document extraction via the Hyland OnBase REST API (work in progress)
  • a suite of APIs wrapped around open-source Optical Character Recognition (OCR) tools, designed for local deployment, for converting PDFs to structured text. The tools referenced in this README are marker, docling and Sparrow.

Getting Started

Prerequisites

pyonb requires Docker and Docker Compose.

Installation & Usage

  1. Rename .env.sample to .env.

  2. Edit .env with the correct HOST_DATA_FOLDER location, e.g.:

```sh
HOST_DATA_FOLDER="/absolute/path/to/documents/folder"

# e.g. for unit tests on GAE:
HOST_DATA_FOLDER="/gae/pyonb/tests/data/singlesyntheticdoc"
```

  3. Set OCR service ports, e.g.:

```sh
OCR_FORWARDING_API_PORT=8110
MARKER_API_PORT=8112
SPARROW_API_PORT=8001
DOCLING_API_PORT=8115
```

[!IMPORTANT] For GAE usage, set OCR service ports and UCLH proxy details:

```sh
http_proxy=
https_proxy=
HTTPS_PROXY=
HTTP_PROXY=
```

  4. Start the OCR API Server (e.g. using marker and docling):

```sh
docker compose --profile marker --profile docling up -d
```

  5. Open FastAPI Swagger at http://127.0.0.1:8110/docs to view and execute endpoints.

Use the following POST endpoints to run the chosen OCR tool on a PDF:

  • marker - POST /marker/inference_single
  • docling - POST /docling/inference_single
  6. View the JSON response:

Figure: OCR Server JSON response
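As a scripted alternative to the Swagger page, the following is a minimal sketch of posting a PDF to one of the `inference_single` endpoints using only the Python standard library. The multipart field name `file`, the helper names `build_inference_request` and `run_inference`, and the assumption that the response body is JSON-decodable into a dict are illustrative and not confirmed by this README; check the Swagger page at /docs for the actual request schema.

```python
import json
import urllib.request
import uuid

# Forwarding API address from the steps above (OCR_FORWARDING_API_PORT=8110).
BASE_URL = "http://127.0.0.1:8110"


def build_inference_request(
    pdf_bytes: bytes,
    filename: str,
    tool: str = "marker",
    field_name: str = "file",  # ASSUMPTION: the upload field name is not stated in the README
) -> urllib.request.Request:
    """Prepare (but do not send) a multipart POST to /<tool>/inference_single."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        url=f"{BASE_URL}/{tool}/inference_single",
        data=head + pdf_bytes + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def run_inference(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON body (needs the Docker services running)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the services up, `run_inference(build_inference_request(pdf_bytes, "doc.pdf", tool="docling"))` would send the file to the docling endpoint; building the request separately makes it possible to inspect the URL and headers before anything is sent.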

Developer Tips

  • As an alternative to Swagger, use Postman to construct, save and send your API requests.
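If you script your requests rather than using a GUI client, it can help to derive service addresses from the same `.env` file that Docker Compose reads. The sketch below is one way to do that with the standard library; `parse_env` and `service_urls` are hypothetical helpers (not part of pyonb), the parser only handles simple `KEY=VALUE` lines, and it assumes each `*_API_PORT` is published on the host at `127.0.0.1`.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes, as in HOST_DATA_FOLDER="/path".
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def service_urls(env: dict[str, str], host: str = "127.0.0.1") -> dict[str, str]:
    """Map each *_API_PORT variable to a base URL on the given host."""
    # ASSUMPTION: every *_API_PORT is reachable from the host machine.
    return {
        key.removesuffix("_API_PORT").lower(): f"http://{host}:{port}"
        for key, port in env.items()
        if key.endswith("_API_PORT") and port.isdigit()
    }
```

For example, `service_urls(parse_env(open(".env").read()))` would yield one base URL per configured OCR service, keyed by a lower-cased service name such as `marker` or `docling`.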

Tests

  1. Clone the repo:

```sh
git clone https://github.com/SAFEHR-data/pyonb.git
```

  2. Create a virtual environment (we suggest using uv) and install dependencies:

```sh
uv venv --python 3.12
source .venv/bin/activate
uv sync
```

  3. Copy tests/.env.tests to the repository root as .env for use with tox:

```sh
cp tests/.env.tests .env
```

  4. Start the Docker services:

```sh
docker compose --profile marker --profile docling up -d
```

  5. Run tests using tox:

```sh
tox -e py312
```

NB: the inference tests may take a few minutes to run. Some tests may fail depending on which OCR tools you choose to start. For example, with --profile marker --profile docling the Sparrow API is not started, so its associated tests will fail.

To run unit tests individually, adapt the following:

```sh
tox -e py312 -- tests/api/test_routers.py::test_inference_single_file_upload_marker
```
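Because tests for OCR services that were not started are expected to fail, it can be useful to check which services are reachable before running tox. The following is a rough sketch of such a check; the port numbers are copied from the example `.env` values in this README, and `port_is_open` and `reachable_services` are hypothetical helpers, not part of the pyonb test suite. It also assumes each service port is published on the host, which may not hold for your Compose configuration.

```python
import socket

# Ports taken from the example .env in this README; adjust to match your own file.
SERVICE_PORTS = {
    "forwarding": 8110,
    "marker": 8112,
    "sparrow": 8001,
    "docling": 8115,
}


def port_is_open(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def reachable_services(ports: dict[str, int] = SERVICE_PORTS) -> dict[str, bool]:
    """Report, per service name, whether its port currently accepts connections."""
    return {name: port_is_open(port) for name, port in ports.items()}
```

Calling `reachable_services()` after `docker compose ... up -d` gives a quick indication of which inference tests are likely to be runnable. An open port only shows that something is listening, not that the OCR model has finished loading.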

About

Project Team

  • Arman Eshaghi
  • Tom Roberts (tom.roberts@ucl.ac.uk)
  • Kawsar Noor
  • Lawrence Lai
  • Stefan Piatek
  • Richard Dobson
  • Steve Harris
  • Sarah Keating

Acknowledgements

This work was funded by the National Institute for Health and Care Research (NIHR, award code NIHR302495).

This project is developed in collaboration with the Centre for Advanced Research Computing, University College London.

Owner

  • Name: SAFEHR
  • Login: SAFEHR-data
  • Kind: organization

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
type: software
authors:
  - family-names: "Roberts"
    given-names: "Tom"
    email: "tom.roberts@ucl.ac.uk"
repository-code: "https://github.com/SAFEHR-data/pyonb"
title: "pyonb: Python SDK for OnBase REST API"
license: "MIT"

GitHub Events

Total
  • Create event: 5
  • Issues event: 13
  • Delete event: 4
  • Member event: 1
  • Issue comment event: 24
  • Push event: 50
  • Public event: 1
  • Pull request event: 6
  • Pull request review comment event: 4
  • Pull request review event: 12
Last Year
  • Create event: 5
  • Issues event: 13
  • Delete event: 4
  • Member event: 1
  • Issue comment event: 24
  • Push event: 50
  • Public event: 1
  • Pull request event: 6
  • Pull request review comment event: 4
  • Pull request review event: 12

Dependencies

.github/workflows/docs.yml actions
  • actions/cache 6849a6489940f00c2f30c0fb92c6274307ccb58a composite
  • actions/checkout 11bd71901bbe5b1630ceea73d27597364c9af683 composite
  • actions/setup-python 0b93645e9fea7318ecaed2b359559ac225c90a2b composite
  • peaceiris/actions-gh-pages 4f9cc6602d3f66b9c108549d475ec49e8ef4d45e composite
.github/workflows/linting.yml actions
  • actions/cache 6849a6489940f00c2f30c0fb92c6274307ccb58a composite
  • actions/checkout 11bd71901bbe5b1630ceea73d27597364c9af683 composite
  • actions/setup-python 0b93645e9fea7318ecaed2b359559ac225c90a2b composite
.github/workflows/tests.yml actions
  • actions/cache 6849a6489940f00c2f30c0fb92c6274307ccb58a composite
  • actions/checkout 11bd71901bbe5b1630ceea73d27597364c9af683 composite
  • actions/setup-python 0b93645e9fea7318ecaed2b359559ac225c90a2b composite
docker-compose-postgres.yml docker
  • postgres 17.4
docker-compose.yml docker
src/api/Dockerfile docker
  • python 3.11.9-slim-bookworm build
src/ocr/docling/Dockerfile docker
  • python 3.12-slim-bookworm build
src/ocr/docling/docker-compose.yml docker
src/ocr/marker/Dockerfile docker
  • python 3.12-slim-bookworm build
src/ocr/marker/docker-compose.yml docker
src/ocr/sparrow/Dockerfile docker
  • python 3.10.4 build
pyproject.toml pypi
  • marker-pdf [full]==1.6.2
  • pytest >=8.3.4
src/api/requirements.txt pypi
  • fastapi *
  • requests *
  • uvicorn *
src/ocr/docling/requirements.txt pypi
  • docling *
  • python-dotenv *
src/ocr/marker/requirements.txt pypi
  • accelerate *
  • fastapi *
  • marker-pdf *
  • ollama *
  • python-dotenv *
  • requests *
  • uvicorn *
uv.lock pypi
  • 167 dependencies