lepidemo

LECTAUREP Pipeline demonstration to TEI Publisher

https://github.com/lectaurep/lepidemo

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.5%) to scientific vocabulary

Keywords

escriptorium htr pagexml pipeline tei tei-publisher

Repository

Basic Info
  • Host: GitHub
  • Owner: lectaurep
  • License: cc-by-4.0
  • Language: Jupyter Notebook
  • Default Branch: master
  • Homepage:
  • Size: 3.43 MB
Statistics
  • Stars: 4
  • Watchers: 0
  • Forks: 2
  • Open Issues: 2
  • Releases: 1
Created over 4 years ago · Last pushed almost 4 years ago

README.md

LEPIDEMO: LECTAUREP PIPELINE DEMONSTRATOR

Going from eScriptorium to TEI-Publisher

This demonstration implements a pipeline going from PAGE XML to TEI Publisher, developed as part of the LECTAUREP project.

LECTAUREP is a project jointly led by Inria (ALMAnaCH) and the Archives nationales de France (DMC). Its purpose is to facilitate the exploration of thousands of pages of directories listing the minutes and deeds drawn up by Parisian notaries between the early 19th and mid-20th centuries. To do so, LECTAUREP relies on automatic transcription performed with Kraken via the eScriptorium web application.

Images are loaded onto the platform, then transcribed and annotated, and finally exported as PAGE XML files. The last stage of the pipeline aims to give users a platform to visualise, query and read the pages of the directories. An almost ready-to-use solution consists in using TEI-Publisher, which requires transforming the PAGE XML files into compliant TEI XML.

LEPIDEMO demonstrates how this transformation can be plugged into eScriptorium as a simple Python script.
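
To give a rough idea of what a PAGE XML to TEI XML transformation can look like, here is a minimal sketch. It is not the code from the notebook: the function name `page_to_tei`, the invented sample page, and the mapping (one `<ab>` per `TextRegion`, one `<lb/>` per `TextLine`) are assumptions chosen for illustration, and it uses only the standard library rather than the packages pinned in `requirements.txt`.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Invented sample, shaped like a PAGE XML export (not real LECTAUREP data).
SAMPLE_PAGE = """
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="page_001.jpg" imageWidth="2000" imageHeight="3000">
    <TextRegion id="r1">
      <TextLine id="l1"><TextEquiv><Unicode>12 janvier</Unicode></TextEquiv></TextLine>
      <TextLine id="l2"><TextEquiv><Unicode>Vente par Dupont</Unicode></TextEquiv></TextLine>
    </TextRegion>
  </Page>
</PcGts>
""".strip()


def page_to_tei(page_xml: str, title: str = "Untitled page") -> ET.Element:
    """Hypothetical helper: build a minimal TEI tree from one PAGE XML string."""
    root = ET.fromstring(page_xml)

    # Reuse whatever PAGE namespace the file declares instead of hard-coding it.
    ns_uri = root.tag[1:].split("}")[0] if root.tag.startswith("{") else ""

    def q(name: str) -> str:
        """Qualified PAGE element name."""
        return f"{{{ns_uri}}}{name}" if ns_uri else name

    def t(name: str) -> str:
        """Qualified TEI element name."""
        return f"{{{TEI_NS}}}{name}"

    page = root.find(q("Page"))
    image = page.get("imageFilename", "unknown image") if page is not None else "unknown image"

    # Minimal teiHeader: title, publication statement, source description.
    tei = ET.Element(t("TEI"))
    file_desc = ET.SubElement(ET.SubElement(tei, t("teiHeader")), t("fileDesc"))
    ET.SubElement(ET.SubElement(file_desc, t("titleStmt")), t("title")).text = title
    ET.SubElement(ET.SubElement(file_desc, t("publicationStmt")), t("p")).text = (
        "Unpublished demonstration file"
    )
    ET.SubElement(ET.SubElement(file_desc, t("sourceDesc")), t("p")).text = (
        f"Transcribed from {image}"
    )

    # One <ab> per text region; each transcribed line follows an <lb/>.
    body = ET.SubElement(ET.SubElement(tei, t("text")), t("body"))
    for region in root.iter(q("TextRegion")):
        ab = ET.SubElement(body, t("ab"))
        for line in region.findall(q("TextLine")):
            unicode_el = line.find(f"{q('TextEquiv')}/{q('Unicode')}")
            lb = ET.SubElement(ab, t("lb"))
            lb.tail = (unicode_el.text or "") if unicode_el is not None else ""
    return tei


# Serialise with TEI as the default namespace.
ET.register_namespace("", TEI_NS)
tei_tree = page_to_tei(SAMPLE_PAGE, title="Sample directory page")
print(ET.tostring(tei_tree, encoding="unicode"))
```

A real conversion would also need to carry over coordinates, region types and richer header metadata so that TEI-Publisher can link text to images; the notebook remains the reference for how LEPIDEMO handles those.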

A Jupyter notebook

The demonstration can be followed step by step using the lepidemo.ipynb Jupyter notebook.

Installation

  • Create a Python virtual environment: `virtualenv -p python3 [ENVIRONMENT NAME]`
  • Activate it: `source [ENVIRONMENT NAME]/bin/activate`
  • Install the dependencies listed in requirements.txt: `pip install -r requirements.txt`
  • Then launch Jupyter with `jupyter notebook`
  • Open `lepidemo.ipynb` in the Jupyter browser and follow the instructions in the cells.

Cite this work

Chagué, A., & Scheithauer, H. LEPIDEMO, a Pipeline Demonstrator for LECTAUREP to go from eScriptorium to TEI-Publisher [Computer software]

Owner

  • Name: LECTAUREP
  • Login: lectaurep
  • Kind: organization
  • Location: France

Citation (CITATION.CFF)

cff-version: 1.1.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Chagué
    given-names: Alix
    orcid: https://orcid.org/0000-0002-0136-4434
  - family-names: Scheithauer
    given-names: Hugo
    orcid: https://orcid.org/0000-0002-5659-4675
title: "LEPIDEMO, a Pipeline Demonstrator for LECTAUREP to go from eScriptorium to TEI-Publisher"
version: 1.0
doi: 10.5072/zenodo.977657
date-released: 2021-12-07
url: "https://github.com/lectaurep/lepidemo"

GitHub Events

Total
  • Watch event: 1
Last Year
  • Watch event: 1

Dependencies

requirements.txt pypi
  • bs4 ==0.0.1
  • fuzzywuzzy ==0.18.0
  • jupyter ==1.0.0
  • lxml ==4.2.6
  • requests ==2.26.0
  • tqdm ==4.62.3