archives-handwriting-text-extract-project

Project files, scripts, configurations, and workflow publications for the Archives-Textract Test Project

https://github.com/prys0000/archives-handwriting-text-extract-project

Repository

Basic Info

Host: GitHub
Owner: prys0000
Language: Python
Default Branch: main
Homepage:
Size: 61.3 MB

Statistics

Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Topics

archival-research handwriting-ocr handwritten-character-recognition ocr-python ocr-recognition python-script textract-application

Created over 3 years ago · Last pushed over 2 years ago

archives-handwriting-text-extraction

The objective of this project is to create versatile text extraction and cleaning tools that can run either as a local application or through Amazon Textract. This flexibility lets the tools be adapted to the requirements of a specific repository or project, and supports local file processing and customization.
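As an illustration of the Amazon Textract route, the following is a minimal sketch, not the project's own script. It assumes `boto3` is installed and AWS credentials are configured. The function names, the `region_name` default, and the single-page image input are hypothetical choices made for this example. `detect_document_text` is the synchronous Textract operation, and its response lists detected `LINE` blocks alongside `PAGE` and `WORD` blocks.

```python
from typing import Any


def lines_from_textract_response(response: dict[str, Any]) -> list[str]:
    """Collect the text of every LINE block in a DetectDocumentText-style response."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE" and "Text" in block
    ]


def extract_lines_with_textract(image_path: str, region_name: str = "us-east-1") -> list[str]:
    """Send one scanned page image to Amazon Textract and return its detected lines.

    Assumption: the region and the one-image-per-call workflow are illustrative only.
    """
    # Imported lazily so the parsing helper above can be used without AWS access.
    import boto3

    client = boto3.client("textract", region_name=region_name)
    with open(image_path, "rb") as image_file:
        response = client.detect_document_text(Document={"Bytes": image_file.read()})
    return lines_from_textract_response(response)
```

Keeping the response parsing separate from the network call means the parsing step can be checked against a saved response without contacting AWS.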

Both the local and the AWS scripts extract text from handwritten documents, perform text cleaning operations, and save the extracted and cleaned text to the existing metadata templates used by the repository.
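The clean-and-save step might look roughly like the sketch below. It is an assumption-laden example rather than the project's actual code: the cleaning rules, the `filename` and `extracted_text` column names, and the use of CSV for the metadata worksheet are all hypothetical.

```python
import csv
import io
import re
import unicodedata


def clean_extracted_text(raw_text: str) -> str:
    """Apply a few generic cleaning rules to extracted text (illustrative rules only)."""
    text = unicodedata.normalize("NFKC", raw_text)
    # Rejoin words hyphenated across a line break, e.g. "corre-\nspondence".
    # Limitation: this also joins genuinely hyphenated words split at the hyphen.
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)
    # Drop control characters other than newline and tab.
    text = "".join(
        ch for ch in text if ch in "\n\t" or unicodedata.category(ch)[0] != "C"
    )
    # Collapse every run of whitespace into a single space.
    return re.sub(r"\s+", " ", text).strip()


def add_text_to_metadata(rows, texts_by_filename,
                         filename_column="filename", text_column="extracted_text"):
    """Return copies of the metadata rows with cleaned text filled in per file."""
    updated = []
    for row in rows:
        new_row = dict(row)  # leave the caller's rows untouched
        raw = texts_by_filename.get(row.get(filename_column, ""))
        new_row[text_column] = (
            clean_extracted_text(raw) if raw is not None else row.get(text_column, "")
        )
        updated.append(new_row)
    return updated


def write_metadata_csv(rows, fieldnames, out_file):
    """Write the metadata rows to an open text file as CSV."""
    writer = csv.DictWriter(out_file, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


# Usage with in-memory data; a real run would read and write worksheet files.
rows = [{"filename": "a.jpg", "title": "Letter"}, {"filename": "b.jpg", "title": "Memo"}]
updated = add_text_to_metadata(rows, {"a.jpg": " Hello\nworld "})
buffer = io.StringIO()
write_metadata_csv(updated, ["filename", "title", "extracted_text"], buffer)
```

Rows without extracted text keep an empty text cell, so the worksheet retains one row per item whether or not extraction succeeded.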

Extracting text from handwritten documents and exporting it to metadata worksheets can significantly enhance the efficiency of processing archival collections. Here's how:

1. Time Efficiency:

Automated text extraction eliminates the need for manual transcription, saving a significant amount of time.

2. Bulk Processing:

Automation enables bulk processing, allowing the extraction of text from multiple documents simultaneously.

3. Efficient Review:

Archivists can quickly scan the extracted text for keywords, names, or dates to determine the document's significance without reading every page.

4. Cross-Collection Analysis:

Extracted text can be used for cross-collection analysis. Researchers can analyze trends, topics, and themes across different collections, leading to deeper insights.
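The review and cross-collection points above can be sketched as follows. This is a hedged illustration only: the function names, the "Month D, YYYY" date format, and the sample terms are assumptions, and real handwritten material would need more tolerant matching.

```python
import re
from collections import Counter

# Matches dates written like "March 4, 1965" (one assumed format among many).
DATE_PATTERN = re.compile(
    r"\b(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December)\s+\d{1,2},\s+\d{4}\b"
)


def find_dates(text):
    """Return every 'Month D, YYYY' date found in the text, in order."""
    return DATE_PATTERN.findall(text)


def find_keywords(text, keywords):
    """Return the keywords that occur in the text as whole words, ignoring case."""
    lowered = text.lower()
    return [
        kw for kw in keywords
        if re.search(r"\b" + re.escape(kw.lower()) + r"\b", lowered)
    ]


def term_counts_by_collection(texts_by_collection, terms):
    """Count whole-word occurrences of each term within each collection."""
    counts = {}
    for collection, texts in texts_by_collection.items():
        counter = Counter()
        for text in texts:
            lowered = text.lower()
            for term in terms:
                pattern = r"\b" + re.escape(term.lower()) + r"\b"
                counter[term] += len(re.findall(pattern, lowered))
        counts[collection] = counter
    return counts


# Usage with made-up collection names and text.
counts = term_counts_by_collection(
    {"Collection A": ["water policy and water rights"], "Collection B": ["tax reform"]},
    ["water", "tax"],
)
```

Comparing the per-collection counters side by side gives a rough first view of which topics appear where, before any closer reading.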

By integrating text extraction and metadata creation, archival processing becomes more streamlined, accessible, and conducive to meaningful research. Automation empowers archivists to manage and leverage archival content more effectively, ultimately enhancing the value and impact of the collection.

student contributors (graduate and undergraduate)

See acknowledgements for more information

communication

email: japryse@ou.edu or cacarchives@ou.edu
homepage: Carl Albert Center Archives
twitter: @CarlAlbertCtr
finding aid: https://arc.ou.edu/

license

See LICENSE for more information.

Owner

Name: JA Pryse
Login: prys0000
Kind: user
Location: 73019
Company: University of Oklahoma - Carl Albert Center Archives

Repositories: 1
Profile: https://github.com/prys0000

JA Pryse is the Senior Archivist at the Carl Albert Center’s Congressional Archives.
