faust-gen-html

Pipelines to generate HTML for the Faust edition's reading texts and prints.

https://github.com/faustedition/faust-gen-html

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.2%) to scientific vocabulary
Last synced: 6 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: faustedition
  • Language: XSLT
  • Default Branch: master
  • Size: 12.2 MB
Statistics
  • Stars: 1
  • Watchers: 7
  • Forks: 4
  • Open Issues: 75
  • Releases: 0
Created about 11 years ago · Last pushed 10 months ago
Metadata Files
  • Readme
  • Citation

README.md

Faust Edition Text Generation

This project contains the processing steps that generate the reading-text (as opposed to diplomatic) representations of the Faust edition, as well as most of the other generated or converted data, except for the diplomatic transcript representations.

This is work in progress.

Usage

This is mainly used as a submodule of https://github.com/faustedition/faust-gen – the easiest way to run it is to check out that repository including all submodules and run mvn -Pxproc there.

Alternatively, you need:

  • The XProc processor XML Calabash. The scripts have only been tested with version 1.0.25-96.
  • A local copy of the XML folder of the Faust edition.

You should then clone this repository and edit the configuration file, config.xml, as you see fit (e.g., enter the path to your copy of the Faust data). You can also leave the config file as it is and pass the relevant parameters to the XProc processor instead.
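For orientation, here is a sketch of the kind of parameter set such a configuration could contain. The parameter names `data`, `html`, and `variants` are assumptions (this README only mentions an html parameter and a variants directory); consult the config.xml shipped with the repository for the real keys.

```xml
<!-- Hypothetical sketch of a configuration parameter set; the names
     below are illustrative assumptions, not the repository's actual keys. -->
<c:param-set xmlns:c="http://www.w3.org/ns/xproc-step">
  <!-- path to your local copy of the Faust edition's XML folder -->
  <c:param name="data" value="file:/home/user/faust-xml/"/>
  <!-- output directory for the generated HTML -->
  <c:param name="html" value="target/html/"/>
  <!-- directory where the apparatus (variants) fragments are written -->
  <c:param name="variants" value="target/variants/"/>
</c:param-set>
```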

To generate all data, run the pipeline generate-all, e.g., using

calabash generate-all.xpl

This will run all processing steps and generate the HTML data in subdirectories of target by default.

Source Code Details

Basically, we need to perform three steps, in order:

  1. Generate a list of witnesses from the metadata in the Faust XML's documents folder
  2. Generate the HTML fragments that form the apparatus
  3. For each text to render, generate the HTML representation (in split and unsplit form).

All steps read config.xml, and all XSLT stylesheets have the parameters defined there available. All parameters from config.xml can also be passed by the usual means of passing parameters to pipelines (like calabash's -p option).

List Witnesses: collect-metadata.xpl

  • no input
  • output is a list of transcripts

The output is a list of <textTranscript> elements; here is an example:

```xml
<textTranscript xmlns="http://www.faustedition.net/ns"
                uri="faust://xml/transcript/gsa/391083/391083.xml"
                href="file:/home/vitt/Faust/transcript/gsa/391083/391083.xml"
                document="document/maximen_reflexionen/gsa_391083.xml"
                type="archivalDocument" f:sigil="H P160">
  <idno type="bohnenkamp" uri="faust://document/bohnenkamp/H_P160" rank="2">H P160</idno>
  <idno type="gsa_2" uri="faust://document/gsa_2/GSA_25/W_1783" rank="28">GSA 25/W 1783</idno>
  <idno type="gsa_1" uri="faust://document/gsa_1/GSA_25/XIX,2,9:2" rank="50">GSA 25/XIX,2,9:2</idno>
</textTranscript>
```

href is the local path to the actual transcript, document is the relative URL to the metadata document. type is either archivalDocument or print. The <idno> elements are ordered by an order of preference defined in the pipeline (depending on type) and recorded in the respective rank attribute.
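As a quick illustration of consuming this list outside XProc, here is a Python sketch that picks each witness's preferred sigil by rank. The wrapper element `<list>` is an assumption; only `<textTranscript>`, `<idno>`, and the rank attribute come from the example above.

```python
# Sketch: consuming the witness list from collect-metadata.xpl in Python.
# The wrapper element <list> is an assumption; <textTranscript>/<idno>
# and the rank attribute are taken from the example above.
import xml.etree.ElementTree as ET

F = "{http://www.faustedition.net/ns}"

def preferred_sigils(witness_list_xml: str) -> dict:
    """Map each transcript URI to its most preferred <idno> text,
    i.e. the one with the lowest rank value."""
    root = ET.fromstring(witness_list_xml)
    result = {}
    for transcript in root.iter(F + "textTranscript"):
        idnos = transcript.findall(F + "idno")
        if idnos:
            best = min(idnos, key=lambda idno: int(idno.get("rank")))
            result[transcript.get("uri")] = best.text
    return result

example = """<list xmlns="http://www.faustedition.net/ns">
  <textTranscript uri="faust://xml/transcript/gsa/391083/391083.xml">
    <idno type="gsa_2" rank="28">GSA 25/W 1783</idno>
    <idno type="bohnenkamp" rank="2">H P160</idno>
  </textTranscript>
</list>"""

print(preferred_sigils(example))
# {'faust://xml/transcript/gsa/391083/391083.xml': 'H P160'}
```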

Generate Apparatus: collate-variants.xpl

  • input is the list of witnesses from the collect-metadata.xpl step
  • output is a large XML document containing all variants of all lines (only useful for debugging purposes)
  • additionally, the variants HTML fragments are written to the variants directory configured in config.xml

This step performs three substeps that are controlled by additional files:

  1. apply-edits.xpl (for each transcript) – TEI preprocessing, see separate section
  2. extract-lines.xsl (for each transcript) – filters only those TEI elements that represent lines used for the apparatus (including descendant nodes), augmenting them with provenance attributes
  3. variant-fragments.xsl – sorts and groups the lines, and transforms them to HTML.

Preprocessing the TEI files: apply-edits.xpl

  • input is one TEI document (transcript)
  • output is one TEI document (transcript) that has been normalized.
  • additionally, an “emended” version of the XML is available at a secondary output port (called emended-version) that contains the result of steps 1–5.

This removes the genetic markup from the textual transcripts by applying the edits indicated by the markup. Thus, the result represents the last state of the text in the input document.

The document is passed through the following steps:

  1. textTranscrpretranspose.xsl normalizes references inside ge:transpose
  2. textTranscr_transpose.xsl applies transpositions
  3. emend-core.xsl (previous name: textTranscrfuerDrucke.xsl) applies genetic markup (del, corr etc.) and performs character normalizations along with a set of other normalizations. This also includes the rules from harmonize-antilabes.xsl, which transform the antilabe encoding from the join form to the part form, so we only have to deal with one form in the further processing.
  4. text-emend.xsl applies genetic markup that uses spanTo etc. Note that this step will remove text if the input contains delSpan elements that point to a non-existent anchor; the script prints a warning if it detects such a case.
  5. clean-up.xsl removes TEI containers that are empty after the steps above.
  6. prose-to-lines.xsl transforms the <p>-based markup in Trüber Tag. Feld. into <lg>/<l>-based markup, as in the verse parts, to ease collation.
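The delSpan pitfall mentioned in step 4 can be illustrated with a small standalone check (this is not part of the pipeline; text-emend.xsl does its own detection in XSLT):

```python
# Sketch: detecting <delSpan spanTo="#id"> elements whose target anchor
# is missing from the document. Illustrative only; text-emend.xsl's own
# warning logic is implemented in XSLT, not shown here.
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def dangling_delspans(tei_xml: str) -> list:
    """Return the spanTo values of all delSpan elements whose target
    xml:id does not exist anywhere in the document."""
    root = ET.fromstring(tei_xml)
    ids = {e.get(XML_ID) for e in root.iter() if e.get(XML_ID)}
    return [d.get("spanTo")
            for d in root.iter(TEI + "delSpan")
            if d.get("spanTo", "").lstrip("#") not in ids]

doc = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <delSpan spanTo="#a1"/>line one<anchor xml:id="a1"/>
    <delSpan spanTo="#gone"/>line two
  </body></text>
</TEI>"""

print(dangling_delspans(doc))   # ['#gone']
```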

Generate the master HTML files: print2html.xpl

  • input: a transcript. Additionally, the variants must already exist.
  • option: basename is the name used for the HTML files, relative to the output directory given by the html parameter
  • side effect: sectioned HTML files and a basename.all.html file for the all-in-one document are generated inside the folder specified using the html parameter
  • output: XML page map (see below)

Steps:

  1. apply-edits.xpl, TEI normalization, see above
  2. resolve-pbs.xsl, augments <pb> elements with a normalized page number
  3. print2html.xsl, the actual transformation to HTML

The page map

When generating HTML from longer documents, these are split into multiple HTML files along TEI <div> elements. This can be configured from the configuration file.

To find out which page is where, we generate an index that maps faust:// URIs and pages to HTML file names. This is a two-step process: the print2html.xpl pipeline generates an XML summary outlining the files and pages of a single document (see pagemap.xsl for details), and pagelist2json.xsl converts the information from all these documents into a single JSON file. You can then generate links of the form filename#dtpagenumber to link to the individual files.
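The idea behind the consolidation step can be sketched in a few lines of Python. The `<pages>`/`<page>` element and attribute names below are invented for illustration; see pagemap.xsl and pagelist2json.xsl for the actual format used by the pipeline.

```python
# Sketch of the page-map consolidation idea. The <pages>/<page> element
# and attribute names are invented for illustration; they are NOT the
# real format produced by pagemap.xsl.
import json
import xml.etree.ElementTree as ET

def pagemap_to_index(pagemap_xml: str) -> dict:
    """Collect {page number: html file name} from one document's page map."""
    root = ET.fromstring(pagemap_xml)
    return {page.get("n"): page.get("file") for page in root.iter("page")}

pagemap = """<pages document="faust://document/faust_1">
  <page n="1" file="faust_1.1.html"/>
  <page n="2" file="faust_1.2.html"/>
</pages>"""

# Consolidate maps from all documents into one JSON index keyed by faust:// URI:
index = {ET.fromstring(pagemap).get("document"): pagemap_to_index(pagemap)}
print(json.dumps(index, indent=2))
```

Links of the filename#dtpagenumber form described above could then be derived from such an index.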

Additional source files

  • lesetext.css is a stylesheet that is included in all generated HTML documents.
  • utils.xsl contains a number of functions used by the other stylesheets, e.g., to calculate the variant groups
  • config.xml contains the parameters for all steps
  • sigil-labels.xml contains labels for the sigil types

Experimental stuff

Einblendungsapparat

There is experimental code to generate an Einblendungsapparat as well. This kind of apparatus is based on the first state of the text, not the last, and it marks later edits to the text with special markup, using editorial notes in 〈angled brackets〉. The current implementation is still unfinished and renders only the most frequent kinds of edits.

  • apparatus.xsl contains the actual code for generating this kind of visualisation. It includes html-common.xsl, html-frame.xsl and utils.xsl, so most stuff works like in the other visualisations.
  • apparatus.xpl bundles this stylesheet with a few preprocessing steps to form the actual transformation for one document.
  • generate-app.xpl takes the output of collect-metadata.xpl as its input and runs the apparatus pipeline for all archival documents listed there. It also generates an index (for debugging purposes only). This is quite fast, since it neither runs the complex preprocessing required to reach the last state of the text nor collates variants.

The CSS rules required for the apparatus are currently at the end of lesetext.css. Please again note that this is a moving target.

Owner

  • Name: Digitale Faust-Edition
  • Login: faustedition
  • Kind: organization
  • Location: Frankfurt/Main, Germany

Citation (citations.rnc)

namespace f = "http://www.faustedition.net/ns"

## This schema describes the internal citation format that is the basis for 
## creating the bibliography
start = Citations
Citations =
  element f:citations {
    ( TestimonyIndex |
      Citations |
      StdCitation |
      AppCitation |
      PageCitation
    )*
  }
  
BibUri = xsd:anyURI { pattern="faust://bibliography/\S+" }
FaustUri = xsd:anyURI { pattern="faust://\S+" }
 
StdCitation =
  ## Citation from some witnesses metadata etc.
  element f:citation {
    ## faust:// URI of the witness that cites the given reference 
    attribute from { FaustUri }?,
    ## faust://bibliography/ URI from the reference
    BibUri
  }
  
PageCitation =
  ## Citation from some arbitrary page in the edition
  element f:citation {
    ## absolute path of referring page
    attribute page { xsd:anyURI },
    ## short title for the page
    attribute title { text },
    ## faust://bibliography/ URI from the reference
    BibUri
  }
  
TestimonyIndex = element f:testimony-index {
  TestimonyCitation*
}
  
TestimonyCitation =
  element f:citation {
    ## the testimony's id, e.g. bie3_5817
    attribute testimony { xsd:NCName },
    ## the testimony's number, e.g., 5857
    attribute n { xsd:NMTOKEN },
    ## the label of the testimony's taxonomy, e.g., 'Biedermann-Herwig Nr.'
    attribute taxonomy { text },
    ## the bibliography ID referred to
    BibUri 
  }

AppCitation = 
  element f:citation {
    ## the apparatus id
    attribute app { xsd:NMTOKEN },
    attribute section { xsd:int },
    attribute ref { text },
    BibUri
  }
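For illustration, here is a hand-written instance that should match this schema. The URIs and values are invented or taken from the comments above; they are not real bibliography entries.

```xml
<f:citations xmlns:f="http://www.faustedition.net/ns">
  <!-- StdCitation: a witness's metadata citing a bibliography entry -->
  <f:citation from="faust://document/bohnenkamp/H_P160">faust://bibliography/some-entry</f:citation>
  <!-- PageCitation: a citation from an arbitrary page of the edition -->
  <f:citation page="/intro/editorial-report" title="Editorial Report">faust://bibliography/some-entry</f:citation>
  <!-- TestimonyIndex containing one TestimonyCitation -->
  <f:testimony-index>
    <f:citation testimony="bie3_5817" n="5857"
                taxonomy="Biedermann-Herwig Nr.">faust://bibliography/some-entry</f:citation>
  </f:testimony-index>
</f:citations>
```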

GitHub Events

Total
  • Issues event: 3
  • Push event: 2
Last Year
  • Issues event: 3
  • Push event: 2

Dependencies

  • build.gradle (maven)