https://github.com/caltechlibrary/htr-test-cases

Images of documents for testing HTR.

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.3%) to scientific vocabulary

Keywords

handwritten-text-recognition htr machine-learning ocr text-recognition

Repository

Basic Info
  • Host: GitHub
  • Owner: caltechlibrary
  • License: other
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 1.27 GB
Statistics
  • Stars: 1
  • Watchers: 4
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 6 years ago · Last pushed over 5 years ago
Metadata Files
Readme Contributing License Code of conduct

README.md

Test cases for HTR experiments

This repository contains test images for the Library's studies on handwritten text recognition.

Introduction

The Caltech Library is working on applications of OCR and HTR (handwritten text recognition) to documents stored in the Caltech Archives. The development of software such as Handprint requires test cases in the form of images of documents. This repository holds a collection of such images for the Library's work.

The images are stored in subdirectories that give some indication of their origins and natures; for example, the caltech subdirectory contains images from the Caltech Archives. The sources of individual images are described in associated XML files containing Dublin Core metadata in OAI 2.0 DC format (based on the specification document dated 2015-01-08). There is a separate .xml file for each image file. An XML schema is available elsewhere for the format used to store the Dublin Core metadata.
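As a rough illustration of how such a metadata file could be read, the sketch below parses an OAI 2.0 DC record with Python's standard library. The sample record and its field values are hypothetical and are not taken from this repository; only the `oai_dc` and `dc` namespace URIs are standard. The helper name `read_dublin_core` is likewise an illustrative choice.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Standard Dublin Core element namespace used inside oai_dc records.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical sample record; real files in the repository may differ.
SAMPLE = """
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Hypothetical letter</dc:title>
  <dc:creator>Unknown</dc:creator>
  <dc:source>https://example.org/item/1</dc:source>
</oai_dc:dc>
""".strip()


def read_dublin_core(xml_text):
    """Return a dict mapping each Dublin Core element name to a list of values."""
    root = ET.fromstring(xml_text)
    fields = defaultdict(list)
    prefix = "{" + DC_NS + "}"
    for elem in root.iter():
        # ElementTree expands namespaced tags to "{namespace}localname".
        if elem.tag.startswith(prefix):
            name = elem.tag[len(prefix):]
            fields[name].append((elem.text or "").strip())
    return dict(fields)


record = read_dublin_core(SAMPLE)
print(record["title"])
```

Values are collected into lists because Dublin Core allows any element to repeat within one record.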

Installation

There is no software in this repository; it contains only image files, XML files, and text files. You can download the entire set using various methods. One method is to use GitHub's "Download ZIP" link,

https://github.com/caltechlibrary/htr-test-cases/archive/master.zip

in combination with your preferred file download software tool (which could be your browser, or curl, or wget, or similar software). A second method is to use git to clone the repository to your local computer:

git clone https://github.com/caltechlibrary/htr-test-cases.git
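For scripted setups, the ZIP route can also be sketched in Python using only the standard library. This is a minimal sketch, not a method documented by the repository: the function names `download_archive` and `extract_archive` are illustrative, and the URL is the "Download ZIP" link given above. The repository is listed at over 1 GB, so the download is streamed to disk rather than held in memory.

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path

# The "Download ZIP" link quoted in the README text above.
ARCHIVE_URL = "https://github.com/caltechlibrary/htr-test-cases/archive/master.zip"


def download_archive(url, zip_path):
    """Stream the archive at `url` to `zip_path` and return the path."""
    zip_path = Path(zip_path)
    with urllib.request.urlopen(url) as response, open(zip_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return zip_path


def extract_archive(zip_path, dest_dir):
    """Unpack a ZIP file into `dest_dir` and return the destination path."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
    return Path(dest_dir)


# Example usage (left as comments because it performs a large download):
# zip_file = download_archive(ARCHIVE_URL, "htr-test-cases.zip")
# extract_archive(zip_file, "htr-test-cases")
```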

Usage

This is a collection of files. You can use them in whatever way you would use other image files.
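One possible starting point is to walk a local copy and pair each image with its metadata file. The sketch below assumes that each .xml file shares its base name with the image it describes and that images use the listed extensions; neither detail is confirmed by the text above, so adjust both to match the actual files. The function name `pair_images_with_metadata` is hypothetical.

```python
from pathlib import Path

# Assumed image extensions; the repository may use others.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def pair_images_with_metadata(root):
    """Return (image_path, xml_path_or_None) tuples for every image under `root`."""
    pairs = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            # Assumption: the metadata file has the same name with an .xml suffix.
            sidecar = path.with_suffix(".xml")
            pairs.append((path, sidecar if sidecar.exists() else None))
    return pairs


# Example usage, assuming the repository was cloned into ./htr-test-cases:
# for image, metadata in pair_images_with_metadata("htr-test-cases"):
#     print(image, "->", metadata)
```

Images without a matching .xml file are still returned, paired with `None`, so that missing metadata is easy to spot.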

Known issues and limitations

None at this time.

Getting help

If you find an issue, please submit it in the GitHub issue tracker for this repository.

Contributing

We would be happy to receive your help and participation with enhancing this collection of test images. Please visit the guidelines for contributing for some tips on getting started.

License

Please see the individual image files and subdirectories for applicable copyright and license information.

Authors and history

Mike Hucka started this collection in 2019, with the help of others at the Caltech Library's DLD group, including Tommy Keswick and Peter Collopy.

Acknowledgments

The vector artwork used as a logo for Handprint was created by Alice Design from the Noun Project. It is licensed under the Creative Commons CC-BY 3.0 license. Mike Hucka slightly modified the original icon graphic file to change the color and reformat it for use as this repository's icon.

This work was funded by the California Institute of Technology Library.


Owner

  • Name: Caltech Library
  • Login: caltechlibrary
  • Kind: organization
  • Email: helpdesk@library.caltech.edu
  • Location: Pasadena, CA 91125

We manage the physical and digital holdings of the California Institute of Technology, provide services and training, and develop open-source software.

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0