@digitallinguistics/dlx2html

A JavaScript library for converting linguistic data to HTML

https://github.com/digitallinguistics/dlx2html

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.9%) to scientific vocabulary

Keywords

digital-humanities digital-linguistics glossing language language-documentation linguistics morphology


Repository

Basic Info
  • Host: GitHub
  • Owner: digitallinguistics
  • License: mit
  • Language: JavaScript
  • Default Branch: main
  • Homepage:
  • Size: 392 KB
Statistics
  • Stars: 2
  • Watchers: 1
  • Forks: 0
  • Open Issues: 27
  • Releases: 8
Topics
digital-humanities digital-linguistics glossing language language-documentation linguistics morphology
Created about 3 years ago · Last pushed about 1 year ago
Metadata Files
Readme License Citation

README.md

dlx2html

A JavaScript library for converting linguistic data to HTML for presenting on the web.

Written using modern ES modules, useable in both Node and the browser.

When writing about linguistic data, linguists use a format called an interlinear glossed example which shows each of the parts of a word (morphemes) and their meanings. This allows people who are not familiar with the language under discussion to read the examples and understand their structure and meaning. Below is a very simple example from Swahili:

```txt
Swahili

ninaenda
ni-na-end-a
1SG-PRES-go-IND
I am going
```

These interlinear glossed examples follow a very specific format, originally specified in the Leipzig Glossing Rules. Another specification, called DaFoDiL, formalizes how such data should be structured when being stored as JSON or worked with as a plain old JavaScript object (POJO).

The dlx2html library takes one or more interlinear glosses in the DaFoDiL format and converts them to HTML for representing linguistic examples on the web.

The dlx2html library does not add any styling to the output HTML. Users should either add their own CSS styles, or use the compatible Digital Linguistics Style Library. The structure of the output HTML and CSS classes are described below.

If using this library for research, please cite it using the model below:

Hieber, Daniel W. {year}. @digitallinguistics/dlx2html. https://github.com/digitallinguistics/dlx2html/. DOI: 10.5281/zenodo.10720085.

Samples

The following pages demo the HTML output from the library. They are styled using the DLx styles library.

Usage

This library is written in JavaScript, and may be run as either a Node.js module or as a script in the browser. See the Node.js learning path for more information about Node.js, how to install it, and how to run programs with it.

Node.js

To use dlx2html in Node:

  1. Install the package.

    ```cmd
    npm install @digitallinguistics/dlx2html

    # OR

    yarn add @digitallinguistics/dlx2html
    ```

  2. Import the package and use it to convert the data to HTML.

    ```js
    // Import the dlx2html module.
    import convert from '@digitallinguistics/dlx2html'
    import { readFile } from 'node:fs/promises'

    // Load the data from a JSON-formatted DaFoDiL file.
    const json = await readFile('examples.json', 'utf-8')
    const data = JSON.parse(json)

    // Convert the text to HTML.
    const html = convert(data, { /* specify options here */ })

    console.log(html) // ...
    ```

Browser

To use dlx2html in the browser:

  1. Download the latest version of the library from the releases page. Copy the dlx2html.js file to your project.

  2. Import and use the script in your code:

    ```html ```

API

Calling the dlx2html function returns an HTML string.

Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `abbreviations` | Object | `{}` | An object hash providing the full descriptions of gloss abbreviations, e.g. `"sg" => "singular"`. If present, these will be used to populate the `title` attribute of `<abbr>` elements for glosses. Note that the abbreviations are case-sensitive. |
| `analysisLang` | String | `undefined` | An IETF language tag to use as the default value of the `lang` attribute for any data in the analysis language (metadata, literal translation, free translation, glosses, literal word translation). If `undefined`, no `lang` tag is added, which means that browsers will assume that the analysis language is the same as the HTML document. |
| `classes` | Array | `['igl']` | An array of classes to apply to the wrapper element. |
| `glosses` | Boolean | `false` | Options for wrapping glosses in `<abbr>` tags.<br><br>If set to `false` (default), no `<abbr>` tags are added to the glosses.<br><br>If set to `true`, an `<abbr>` tag is wrapped around any glosses in CAPS, any numbers, and any of `sg`, `du`, or `pl` (lowercased). Note that text within the `<abbr>` will be converted to lowercase, since by convention glosses are rendered in smallcaps (and uppercase letters display as normal uppercase letters even when in smallcaps). |
| `tag` | String | `'div'` | The HTML tag to wrap each interlinear gloss in. Can also be a custom tag (useful for HTML custom elements). |
| `targetLang` | String | `undefined` | An IETF language tag to use as the default value of the `lang` attribute for any data in the target language. |
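To make the `glosses` and `abbreviations` options more concrete, the following is a minimal sketch of the wrapping rule described above. It is not the library's implementation: `wrapGlosses` is a hypothetical helper, and it assumes that a run of capitals and digits such as `1SG` counts as a single gloss.

```js
// Hypothetical helper illustrating the documented rule; not part of the dlx2html API.
function wrapGlosses(glossLine, abbreviations = {}) {
  // Assumption: a run of capitals and/or digits is one gloss (e.g. "1SG", "PRES").
  // The lowercase number glosses sg, du, and pl are matched as whole words.
  const glossPattern = /[A-Z0-9]+|\b(?:sg|du|pl)\b/g

  return glossLine.replace(glossPattern, gloss => {
    // The lookup is case-sensitive: "PRES" and "pres" are different keys.
    const description = Object.hasOwn(abbreviations, gloss) ? abbreviations[gloss] : undefined
    const title = description ? ` title="${description}"` : ''
    // Text inside <abbr> is lowercased so that it can be styled as smallcaps.
    return `<abbr${title}>${gloss.toLowerCase()}</abbr>`
  })
}

console.log(wrapGlosses('1SG-PRES-go-IND', { PRES: 'present tense' }))
// <abbr>1sg</abbr>-<abbr title="present tense">pres</abbr>-go-<abbr>ind</abbr>
```

The sketch does not escape the description before placing it in the `title` attribute, so it is only suitable for trusted abbreviation lists.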

HTML Structure

This section describes the structure of the HTML output by this library, and the classes added to the HTML elements. You can see sample HTML output by the program in the samples/ folder, as well as the DLx Styles library.

Note: The output HTML does not contain much extraneous whitespace and therefore is not very human-readable. If you want more readable output, use a formatting library like Prettier to format the result.

Each utterance/example in the original data is wrapped in a <div class=igl> element by default. You can customize both the tag that is used for the wrapper and the classes applied to it with the tag and classes options. For example, to wrap each utterance in <li class=interlinear>, you would provide the following options:

```js
const options = { classes: [`interlinear`], tag: `li` }
```
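As a rough illustration of how the `tag` and `classes` options shape the wrapper element, here is a small sketch. `wrapUtterance` is a hypothetical function written for this example, not something exported by dlx2html, and the attribute quoting it produces may differ from the library's actual output.

```js
// Hypothetical sketch of the wrapper logic; the defaults mirror the documented ones.
function wrapUtterance(innerHTML, { classes = ['igl'], tag = 'div' } = {}) {
  // Multiple classes are joined with spaces, so the attribute value is quoted here.
  return `<${tag} class="${classes.join(' ')}">${innerHTML}</${tag}>`
}

console.log(wrapUtterance('ninaenda'))
// <div class="igl">ninaenda</div>

console.log(wrapUtterance('ninaenda', { classes: ['interlinear'], tag: 'li' }))
// <li class="interlinear">ninaenda</li>
```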

You can apply the following types of emphasis to the data:

| Scription | HTML Output | Renders As |
| --- | --- | --- |
| `***text***` | `<strong>text</strong>` | <strong>text</strong> |
| `**text**` | `<em>text</em>` | <em>text</em> |
| `*text*` | `<b>text</b>` | <b>text</b> |
| `_text_` | `<u>text</u>` | <u>text</u> |
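The mapping in the table above can be sketched with a few regular-expression replacements. This is only an illustration of the mapping, not the library's own code: `convertEmphasis` is a hypothetical name, and the sketch assumes the markers are not nested and that the text contains no other asterisks or underscores.

```js
// Hypothetical sketch of the emphasis mapping; not part of the dlx2html API.
function convertEmphasis(text) {
  return text
    // Longest markers first, so that *** is not consumed as ** followed by *.
    .replace(/\*\*\*(.+?)\*\*\*/g, '<strong>$1</strong>')
    .replace(/\*\*(.+?)\*\*/g, '<em>$1</em>')
    .replace(/\*(.+?)\*/g, '<b>$1</b>')
    .replace(/_(.+?)_/g, '<u>$1</u>')
}

console.log(convertEmphasis('***a*** **b** *c* _d_'))
// <strong>a</strong> <em>b</em> <b>c</b> <u>d</u>
```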

Additional Notes

  • The language code (\lg) is not displayed. It is merely used to set the lang attribute on elements where appropriate. To display the language of an utterance, use the metadata field (\#).
  • The speaker (\sp) and source (\s) data are combined into a single element structured as follows: <p class=ex-source>{speaker} ({source})</p>.
  • Notes fields (\n) are not added to the HTML by default.
  • Individual glosses receive the .gl class.

CSS

The CSS classes for each line type are as follows:

| Line | CSS Class |
| --- | --- |
| metadata | `ex-header` |
| source | `ex-source` |
| transcript | `trs` |
| phonemic transcription | `txn` |
| phonetic transcription | `phon` |
| word transcription | `w` |
| morphemic analysis | `m` |
| glosses | `glosses` |
| literal translation | `lit` |
| timespan | `timespan` |
| free translation | `tln` |
| word translation | `wlt` |

If the language of the text is specified, it is set as the value of the lang attribute for data in the target language wherever relevant. Whenever the language of analysis data (metadata, glosses, translations, etc.) is specified, it is passed through to the lang attribute of the relevant analysis language elements (<p class=tln lang=en>).

When the data occurs in multiple orthographies, the orthography of the data is specified in the data-ortho attribute. For example, the following data is transformed to the HTML that follows:

```txt
\trs-Modern  Wetkx hus naancaakamankx wetk hi hokmiqi.
\trs-Swadesh wetkšˊ husˊ na·nča·kamankšˊ wetkˊ hi hokmiʔiˊ.
\tln         He left his brothers.
```

```html
<p class=trs data-ortho=Modern>Wetkx hus naancaakamankx wetk hi hokmiqi.</p>
<p class=trs data-ortho=Swadesh>wetkšˊ husˊ na·nča·kamankšˊ wetkˊ hi hokmiʔiˊ.</p>
<p class=tln>He left his brothers.</p>
```
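The relationship between a line's orthography suffix and the `data-ortho` attribute can be sketched as follows. This is an illustration only: dlx2html itself takes DaFoDiL objects as input, and `lineToHTML` is a hypothetical helper that works on the plain-text lines shown in the example. It also assumes that the line code (`trs`, `tln`) can be used directly as the CSS class, which holds for these two line types but not necessarily for others.

```js
// Hypothetical sketch; not part of the dlx2html API.
function lineToHTML(line) {
  // Matches a backslash code, an optional "-Orthography" suffix, and the line text,
  // e.g. "\trs-Modern Wetkx hus ..." gives code "trs" and orthography "Modern".
  const match = /^\\([^\s-]+)(?:-(\S+))?\s+(.*)$/.exec(line)
  if (!match) return ''

  const [, code, ortho, text] = match
  // Only lines that name an orthography receive a data-ortho attribute.
  const orthoAttr = ortho ? ` data-ortho=${ortho}` : ''
  return `<p class=${code}${orthoAttr}>${text}</p>`
}

console.log(lineToHTML(String.raw`\trs-Modern Wetkx hus naancaakamankx wetk hi hokmiqi.`))
// <p class=trs data-ortho=Modern>Wetkx hus naancaakamankx wetk hi hokmiqi.</p>

console.log(lineToHTML(String.raw`\tln He left his brothers.`))
// <p class=tln>He left his brothers.</p>
```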

Owner

  • Name: Digital Linguistics
  • Login: digitallinguistics
  • Kind: organization

The science of managing linguistic data, digitally

Citation (CITATION.cff)

authors:
  - family-names: Hieber
    given-names:  Daniel W.
    orcid:        'https://orcid.org/0000-0002-1411-3773'
cff-version:     1.2.0
date-released:   2024-02-27
doi:             '10.5281/zenodo.10720085'
license:         MIT
repository-code: 'https://github.com/digitallinguistics/dlx2html'
title:           '@digitallinguistics/dlx2html'
type:            software
version:         0.4.0

GitHub Events

Total
  • Pull request event: 2
  • Create event: 1
Last Year
  • Pull request event: 2
  • Create event: 1

Committers

All Time
  • Total Commits: 101
  • Total Committers: 2
  • Avg Commits per committer: 50.5
  • Development Distribution Score (DDS): 0.01
Past Year
  • Commits: 6
  • Committers: 2
  • Avg Commits per committer: 3.0
  • Development Distribution Score (DDS): 0.167
Top Committers
Name Email Commits
Daniel W. Hieber d****b@g****m 100
dependabot[bot] 4****] 1

Issues and Pull Requests

All Time
  • Total issues: 71
  • Total pull requests: 6
  • Average time to close issues: 2 months
  • Average time to close pull requests: about 2 months
  • Total issue authors: 1
  • Total pull request authors: 1
  • Average comments per issue: 0.1
  • Average comments per pull request: 0.17
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 6
Past Year
  • Issues: 4
  • Pull requests: 4
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 1
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 4
Top Authors
Issue Authors
  • dwhieb (54)
Pull Request Authors
  • dependabot[bot] (7)
Top Labels
Issue Labels
🆕 enhancement (26) 🌠 wish list (9) 🧑🏼‍💻 development (7) enhancement (4) development (3) docs (2) bug (1) 🛑 blocked (1) blocked (1) dependencies (1)
Pull Request Labels
dependencies (5)

Packages

  • Total packages: 1
  • Total downloads:
    • npm 13 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 7
  • Total maintainers: 1
npmjs.org: @digitallinguistics/dlx2html

  • Versions: 7
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 13 Last month
Rankings
Dependent repos count: 33.3%
Average: 40.5%
Dependent packages count: 47.7%
Maintainers (1)