Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.8%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: latchipatinanireesha
  • Default Branch: main
  • Size: 8.79 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 2 years ago · Last pushed over 2 years ago
Metadata Files
Readme License Citation

README.rst

Description
===========

.. image:: https://github.com/mittagessen/kraken/actions/workflows/test.yml/badge.svg
    :target: https://github.com/mittagessen/kraken/actions/workflows/test.yml

kraken is a turn-key OCR system optimized for historical and non-Latin script
material.

kraken's main features are:

  - Fully trainable layout analysis and character recognition
  - Right-to-Left, BiDi, and Top-to-Bottom script support
  - ALTO, PageXML, abbyyXML, and hOCR output
  - Word bounding boxes and character cuts
  - Multi-script recognition support
  - Public repository of model files
  - Variable recognition network architectures
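
To illustrate why bidirectional support matters, here is a small sketch, not kraken's own implementation, that guesses a line's base direction with Python's standard ``unicodedata`` module. It follows the first-strong-character rule of the Unicode Bidirectional Algorithm (UAX #9, rules P2 and P3). The helper name ``base_direction`` is hypothetical, and isolate and embedding controls are deliberately ignored.

```python
import unicodedata

# Bidi classes that count as "strong" when choosing a paragraph direction.
STRONG_RTL = {"R", "AL"}  # Hebrew-type and Arabic-type letters
STRONG_LTR = {"L"}        # e.g. Latin, Greek and Cyrillic letters


def base_direction(text: str) -> str:
    """Guess the base direction of a line from its first strong character.

    Simplified UAX #9 rules P2/P3: isolate and embedding controls are not
    handled, and text without any strong character defaults to "ltr".
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in STRONG_RTL:
            return "rtl"
        if bidi_class in STRONG_LTR:
            return "ltr"
    return "ltr"


if __name__ == "__main__":
    # Digits and spaces are weak/neutral, so the first letter decides.
    for sample in ["kraken 4.1", "123 שלום", "مرحبا kraken"]:
        print(repr(sample), base_direction(sample))
```

Determining the base direction is only the first step; recognized right-to-left lines still have to be reordered for display, which this sketch does not attempt.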

Installation
============

kraken only runs on **Linux or Mac OS X**. Windows is not supported.

The latest stable releases can be installed either from PyPI:

::

  $ pip install kraken

or through conda:

::

  $ conda install -c conda-forge -c mittagessen kraken

If you want direct PDF and multi-image TIFF/JPEG2000 support, install the
``pdf`` extra when installing from PyPI:

::

  $ pip install "kraken[pdf]"

or install ``pyvips`` manually with conda:

::

  $ conda install -c conda-forge pyvips
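
To check from Python whether the optional ``pyvips`` dependency is present, for example before passing a PDF to kraken, a minimal sketch could use the standard ``importlib.util.find_spec``. The helper ``has_pyvips`` is hypothetical; it only confirms that the module can be located, not that the underlying libvips library works.

```python
import importlib.util


def has_pyvips() -> bool:
    """Return True if the optional pyvips package can be located.

    find_spec does not import the module, so a broken libvips install
    can still fail later when pyvips is actually imported.
    """
    return importlib.util.find_spec("pyvips") is not None


if __name__ == "__main__":
    if has_pyvips():
        print("pyvips found: direct PDF and multi-image TIFF/JPEG2000 input should work")
    else:
        print('pyvips missing: run pip install "kraken[pdf]" or install pyvips via conda')
```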

Conda environment files are also provided for seamless installation of the
main branch:

::

  $ git clone https://github.com/mittagessen/kraken.git 
  $ cd kraken
  $ conda env create -f environment.yml

or:

::

  $ git clone https://github.com/mittagessen/kraken.git 
  $ cd kraken
  $ conda env create -f environment_cuda.yml

for CUDA acceleration with the appropriate hardware.

Finally, you'll have to scrounge up a model to do the actual recognition of
characters. To download the default model for printed English text and place it
in the kraken directory for the current user:

::

  $ kraken get 10.5281/zenodo.2577813 

A list of libre models available in the central repository can be retrieved by
running:

::

  $ kraken list

Quickstart
==========

Recognizing text on an image using the default parameters including the
prerequisite steps of binarization and page segmentation:

::

  $ kraken -i image.tif image.txt binarize segment ocr

To binarize a single image using the nlbin algorithm:

::

  $ kraken -i image.tif bw.png binarize
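
Binarization maps every grayscale pixel to either black (ink) or white (paper). The sketch below is not kraken's nlbin algorithm; it swaps in Otsu's global thresholding, a simpler well-known technique, purely to show what the step does. It works on a flat list of 8-bit values, and the helpers ``otsu_threshold`` and ``binarize`` are hypothetical.

```python
def otsu_threshold(pixels: list[int]) -> int:
    """Return Otsu's threshold for 8-bit grayscale values (0-255).

    The threshold t maximises the between-class variance when pixels are
    split into a dark class (<= t) and a light class (> t).
    """
    hist = [0] * 256
    for p in pixels:
        if not 0 <= p <= 255:
            raise ValueError(f"pixel value out of range: {p}")
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * count for i, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    weight_dark = 0  # number of pixels <= t
    sum_dark = 0     # sum of intensities <= t
    for t in range(256):
        weight_dark += hist[t]
        sum_dark += t * hist[t]
        weight_light = total - weight_dark
        if weight_dark == 0:
            continue  # no dark pixels yet
        if weight_light == 0:
            break     # every pixel is already in the dark class
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        # Between-class variance scaled by total**2; the scale does not
        # change which t wins.
        between = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(pixels: list[int], threshold: int) -> list[int]:
    """Map dark pixels (ink) to 0 and light pixels (paper) to 255."""
    return [0 if p <= threshold else 255 for p in pixels]


if __name__ == "__main__":
    row = [10, 12, 15, 200, 210, 220]  # toy scanline: three ink, three paper pixels
    t = otsu_threshold(row)
    print(t, binarize(row, t))  # 15 [0, 0, 0, 255, 255, 255]
```

A single global threshold copes poorly with uneven illumination or stained pages, which is why dedicated document binarizers usually compensate for local background variation.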

To segment an image (binarized or not) with the new baseline segmenter:

::

  $ kraken -i image.tif lines.json segment -bl
 

To segment and OCR an image using the default model(s):

::

  $ kraken -i image.tif image.txt segment -bl ocr

All subcommands and options are documented. Use the ``--help`` option to get more
information.
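
As the Quickstart commands show, several subcommands can be chained in a single kraken invocation. If you drive kraken from a Python script, a minimal sketch could build such command lines and run them with ``subprocess``. The helpers ``kraken_cmd`` and ``run_kraken`` are hypothetical, and only the options shown in this README are used.

```python
import shlex
import shutil
import subprocess


def kraken_cmd(input_path: str, output_path: str, *steps: str) -> list[str]:
    """Build one kraken invocation that chains several subcommands.

    Each step is a subcommand with its options, e.g. "segment -bl";
    shlex.split turns it into separate argv entries.
    """
    cmd = ["kraken", "-i", input_path, output_path]
    for step in steps:
        cmd.extend(shlex.split(step))
    return cmd


def run_kraken(cmd: list[str]) -> int:
    """Run a kraken command and return its exit status."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    # Same pipeline as "kraken -i image.tif image.txt segment -bl ocr" above.
    cmd = kraken_cmd("image.tif", "image.txt", "segment -bl", "ocr")
    print(shlex.join(cmd))
```

Passing an argument list rather than a single string to ``subprocess.run`` avoids shell quoting problems with file names that contain spaces.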

Documentation
=============

Have a look at the docs.

Related Software
================

These days kraken is closely linked to the eScriptorium project, developed in
the same eScripta research group. eScriptorium provides a user-friendly
interface for annotating data, training models, and running inference (and
much more). There is a gitter channel that is mostly intended for coordinating
technical development but is also a good place to find people with experience
applying kraken to a wide variety of material.

Funding
=======

kraken is developed at the École Pratique des Hautes Études, Université PSL.

.. container:: twocol

   .. container::

        .. image:: https://raw.githubusercontent.com/mittagessen/kraken/main/docs/_static/normal-reproduction-low-resolution.jpg
          :width: 100
          :alt: Co-financed by the European Union

   .. container::

        This project was partially funded through the RESILIENCE project, funded from
        the European Union’s Horizon 2020 Framework Programme for Research and
        Innovation.


.. container:: twocol

   .. container::

      .. image:: https://projet.biblissima.fr/sites/default/files/2021-11/biblissima-baseline-sombre-ia.png
         :width: 400
         :alt: Received funding from the Programme d’investissements d’Avenir

   .. container::

        This work received government funding managed by the Agence Nationale de
        la Recherche under the Programme d’Investissements d’Avenir, reference
        ANR-21-ESRE-0005 (Biblissima+).


Owner

  • Name: NIREESHA
  • Login: latchipatinanireesha
  • Kind: user
  • Location: India
  • Company: nil

CEO

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Kiessling"
  given-names: "Benjamin"
  orcid: "https://orcid.org/0000-0001-9543-7827"
title: "The Kraken OCR system"
version: 4.1.2
date-released: 2022-04-12
url: "https://kraken.re"

Dependencies

environment.yml pypi
  • coremltools *
  • file *
pyproject.toml pypi