archives-handwriting-text-extract-project

Project files, scripts, configurations, and workflow publications for the Archives-Textract Test Project

https://github.com/prys0000/archives-handwriting-text-extract-project

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.7%) to scientific vocabulary

Keywords

archival-research handwriting-ocr handwritten-character-recognition ocr-python ocr-recognition python-script textract-application
Last synced: 6 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: prys0000
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 61.3 MB
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 3 years ago · Last pushed over 2 years ago
Metadata Files
Readme Citation

README.md

archives-handwriting-text-extraction

The objective of this project is to create versatile text extraction and cleaning tools that can run either as a local application or through Amazon Textract. This flexibility lets the tools fit the requirements of a specific repository or project, and it supports local file processing and customization.
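As a rough illustration of the Amazon Textract route, the sketch below calls the `DetectDocumentText` operation through `boto3` and keeps only the line-level text. It is not taken from this repository's scripts: the function names `extract_lines` and `detect_text`, the default region, and the choice of the synchronous single-document call are assumptions made for the example. Running `detect_text` requires `boto3` and configured AWS credentials.

```python
def extract_lines(response: dict) -> list[str]:
    """Return the text of every LINE block in a DetectDocumentText response."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]


def detect_text(image_path: str, region_name: str = "us-east-1") -> list[str]:
    """Send one scanned page to Amazon Textract and return its lines.

    Hypothetical helper: the region default is an assumption, and AWS
    credentials must already be configured in the environment.
    """
    import boto3  # imported here so extract_lines works without boto3 installed

    client = boto3.client("textract", region_name=region_name)
    with open(image_path, "rb") as handle:
        response = client.detect_document_text(Document={"Bytes": handle.read()})
    return extract_lines(response)
```

Keeping the response parsing in its own function means it can be checked against a saved response without contacting AWS.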

Both the local and AWS code extract text from handwritten documents, perform text cleaning operations, and save the extracted and cleaned text to the existing metadata templates used by the repository.
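The clean-and-save step might look something like the following minimal sketch, which uses only the Python standard library. The cleaning rules, the function names `clean_text` and `fill_worksheet`, and the column names `filename` and `extracted_text` are all assumptions for illustration; the repository's actual metadata templates and cleaning operations may differ.

```python
import csv
import io
import re


def clean_text(raw: str) -> str:
    """Tidy OCR output into a single line of text."""
    # Rejoin words split by a hyphen at a line break ("docu-\nment" -> "document").
    # Caveat: this also merges genuinely hyphenated words broken across lines.
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", raw)
    # Drop control characters other than tab, newline, and carriage return.
    text = re.sub(r"[\x00-\x08\x0b\x0c\x0e-\x1f]", "", text)
    # Collapse every run of whitespace into one space.
    return re.sub(r"\s+", " ", text).strip()


def fill_worksheet(template_csv: str, texts: dict[str, str],
                   key: str = "filename", column: str = "extracted_text") -> str:
    """Copy a CSV worksheet, adding cleaned text for each row whose key matches."""
    reader = csv.DictReader(io.StringIO(template_csv))
    fieldnames = list(reader.fieldnames or [])
    if column not in fieldnames:
        fieldnames.append(column)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if row.get(key) in texts:
            row[column] = clean_text(texts[row[key]])
        writer.writerow(row)  # rows without extracted text get an empty cell
    return out.getvalue()
```

Working on CSV strings keeps the example self-contained; in practice the same functions could be wrapped with file reads and writes for whatever worksheet format the repository uses.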

Extracting text from handwritten documents and exporting it to metadata worksheets can significantly enhance the efficiency of processing archival collections. Here's how:

1. Time Efficiency:

  • Automated text extraction eliminates the need for manual transcription, saving a significant amount of time.

2. Bulk Processing:

  • Automation enables bulk processing, allowing the extraction of text from multiple documents simultaneously.

3. Efficient Review:

  • Archivists can quickly scan the extracted text for keywords, names, or dates to determine the document's significance without reading every page.

4. Cross-Collection Analysis:

  • Extracted text can be used for cross-collection analysis.
  • Researchers can analyze trends, topics, and themes across different collections, leading to deeper insights.

By integrating text extraction and metadata creation, archival processing becomes more streamlined, accessible, and conducive to meaningful research. Automation empowers archivists to manage and leverage archival content more effectively, ultimately enhancing the value and impact of the collection.
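The review and cross-collection points above can be sketched in a few lines of Python. This is an illustrative example only, not code from the project: `find_keywords` and `term_counts` are hypothetical helper names, and the whole-word, case-insensitive matching rule is an assumption.

```python
import re
from collections import Counter


def find_keywords(text: str, keywords: list[str]) -> list[str]:
    """Return the keywords that appear in text as whole words, ignoring case."""
    return [
        kw for kw in keywords
        if re.search(rf"\b{re.escape(kw)}\b", text, flags=re.IGNORECASE)
    ]


def term_counts(collections: dict[str, list[str]],
                terms: list[str]) -> dict[str, Counter]:
    """Count, per collection, how many documents mention each term."""
    counts = {}
    for name, documents in collections.items():
        counter = Counter()
        for document in documents:
            counter.update(find_keywords(document, terms))
        counts[name] = counter
    return counts
```

For example, `term_counts({"A": ["water rights bill"], "B": ["highway bill"]}, ["water", "bill"])` reports one document mentioning "water" in collection A and one mentioning "bill" in each collection.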

student contributors (graduate and undergraduate)

See acknowledgements for more information

communication

license

See LICENSE for more information.

Owner

  • Name: JA Pryse
  • Login: prys0000
  • Kind: user
  • Location: 73019
  • Company: University of Oklahoma - Carl Albert Center Archives

JA Pryse is the Senior Archivist at the Carl Albert Center’s Congressional Archives.

GitHub Events

Total
  • Watch event: 1
Last Year
  • Watch event: 1