@digitallinguistics/transliterate

A small JavaScript library for transliterating strings between different orthographies

https://github.com/digitallinguistics/transliterate

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 5 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.6%) to scientific vocabulary

Keywords

digital-humanities digital-linguistics dlx linguistics transliteration
Last synced: 6 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: digitallinguistics
  • License: mit
  • Language: JavaScript
  • Default Branch: main
  • Homepage:
  • Size: 1010 KB
Statistics
  • Stars: 10
  • Watchers: 1
  • Forks: 0
  • Open Issues: 4
  • Releases: 17
Topics
digital-humanities digital-linguistics dlx linguistics transliteration
Created about 7 years ago · Last pushed over 1 year ago
Metadata Files
Readme License Citation

README.md

Transliterate

A small JavaScript library for transliterating and/or sanitizing strings. Tested against a variety of edge cases and unusual inputs.


Overview

This library is useful for linguists and data analysts working with language data. It can be used to convert a string from one writing system to another (a process known as transliteration), or to remove unwanted characters or sequences of characters from a string (a process known as sanitization). This library handles common problems that arise during transliteration and sanitization, including bleeding and feeding issues.

Citation & Attribution

This library is maintained by Daniel W. Hieber. You can cite this library with its DOI using the following model:

Hieber, Daniel W. 2019. digitallinguistics/transliterate. DOI: 10.5281/zenodo.2550468.

Each version of this library is archived on this project's Zenodo page.

Installation

Install with npm or yarn:

```sh
npm install @digitallinguistics/transliterate # npm
yarn add @digitallinguistics/transliterate    # yarn
```

Importing the Library

In the browser, include the library in your HTML (adjust the src to point to the location of the transliterate.js file in your project):

```html
<script src="transliterate.js" type="module"></script>
```

In Node, simply import the library:

```js
import { transliterate } from '@digitallinguistics/transliterate';
```

Basic Usage

The transliterate library has four named exports (two functions and two classes):

  • transliterate
  • Transliterator
  • sanitize
  • Sanitizer

The sanitize and Sanitizer exports are aliases for transliterate and Transliterator, respectively.
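Since sanitize is described as an alias for transliterate, one plausible way to think about sanitization is as a set of substitutions whose outputs are empty strings. The sketch below is self-contained and does not call the library. The name `sanitizeSketch` and the sample rules are hypothetical, and it is an assumption that the library treats empty-string outputs the same way.

```js
// Hypothetical helper for illustration only; not the library's implementation.
// Removes every sequence whose rule output is the empty string.
function sanitizeSketch(input, rules) {
  // Escape characters that have a special meaning in regular expressions
  const escape = s => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // Try longer sequences before shorter ones
  const inputs = Object.keys(rules).sort((x, y) => y.length - x.length);
  const pattern = new RegExp(inputs.map(escape).join('|'), 'g');
  return input.replace(pattern, match => rules[match]);
}

// Hypothetical annotation marks to strip from a transcription
const unwanted = { '*': '', '[?]': '' };

console.log(sanitizeSketch('*pa[?]tak', unwanted)); // "patak"
```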

To transliterate a string, use the transliterate method:

```js
// Import the "transliterate" method from the library
import { transliterate } from '@digitallinguistics/transliterate';

// The list of substitutions to make
const substitutions = {
  p: 'b',
  t: 'd',
  k: 'g',
};

// The string to transliterate
const input = 'patak';

// Transliterate the string
const output = transliterate(input, substitutions);

console.log(output); // --> "badag"
```

To save a set of transliteration rules for reuse on more than one string, use the Transliterator class:

```js
// Import the Transliterator class
import { Transliterator } from '@digitallinguistics/transliterate';

// The list of substitutions to use for transliteration
const substitutions = {
  p: 'b',
  t: 'd',
  k: 'g',
};

// Create a transliterate function that always
// applies the same substitutions
const transliterate = new Transliterator(substitutions);

// The string to transliterate
const input = 'patak';

// Transliterate the string
const output = transliterate(input);

console.log(output); // --> "badag"
```

View the entire API for this library here.

Working with Substitution Rules

The transliterate library already handles several tricky cases on your behalf. For example, say you have the following substitution rules, and want to use them on the string abc:

Input | Output
:----:|:-----:
a     | b
b     | c

In this case, you probably intend the output to be bcc. But if you apply the a → b rule before the b → c rule, you get the output ccc. This is called a feeding problem. The transliterate library automatically avoids feeding problems, so that you get the expected result bcc rather than ccc.
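The feeding problem can be reproduced without the library. The following is a minimal sketch, not the library's actual implementation: `applySequentially` and `applyInOnePass` are hypothetical helper names, and the single-pass regular expression is just one possible way to keep the output of one rule from becoming the input of another.

```js
// Hypothetical helpers for illustration only; not the library's implementation.

// Naive approach: apply each rule to the whole string, one rule after another.
function applySequentially(input, rules) {
  let output = input;
  for (const [from, to] of Object.entries(rules)) {
    output = output.split(from).join(to);
  }
  return output;
}

// Single-pass approach: scan the original string once, so text produced by
// one rule is never seen by another rule.
function applyInOnePass(input, rules) {
  // Escape characters that have a special meaning in regular expressions
  const escape = s => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const pattern = new RegExp(Object.keys(rules).map(escape).join('|'), 'g');
  return input.replace(pattern, match => rules[match]);
}

const rules = { a: 'b', b: 'c' };

console.log(applySequentially('abc', rules)); // "ccc" (feeding)
console.log(applyInOnePass('abc', rules));    // "bcc"
```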

Now say that you want to apply the following rules to the string abacad.

Input | Output
:----:|:-----:
a     | b
ac    | d

You probably intend the output to be bbdbd. But if you apply the a → b rule before the ac → d rule, every a is replaced before ac can match, and you get the output bbbcbd. This is called a bleeding problem. The transliterate library automatically avoids bleeding problems as well, so that you get the expected result bbdbd rather than bbbcbd.
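The bleeding problem can also be reproduced in a few lines. This is a sketch under stated assumptions rather than the library's own code: `applyLongestFirst` is a hypothetical helper that tries longer inputs before shorter ones in a single pass, which is one way to let ac match before a consumes its first character.

```js
// Hypothetical helper for illustration only; not the library's implementation.
function applyLongestFirst(input, rules) {
  // Escape characters that have a special meaning in regular expressions
  const escape = s => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // Put longer inputs first so "ac" is tried before "a"
  const inputs = Object.keys(rules).sort((x, y) => y.length - x.length);
  const pattern = new RegExp(inputs.map(escape).join('|'), 'g');
  return input.replace(pattern, match => rules[match]);
}

const rules = { a: 'b', ac: 'd' };

// Naive order: replace every "a" first, leaving nothing for "ac" to match
const afterShortRule = 'abacad'.split('a').join('b');
const naive = afterShortRule.split('ac').join('d');

console.log(naive);                              // "bbbcbd" (bleeding)
console.log(applyLongestFirst('abacad', rules)); // "bbdbd"
```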

Here are some things to remember about how the transliterate library applies substitutions:

  • Longer substitutions are always made first. If you have substitution rules for both ch and c, the library will first substitute all instances of ch with its replacement, followed by all instances of c.

  • If two substitution inputs are the same length, the substitutions will be applied in the order they were passed to the library. For example, if you have the rules ab → d and bc → e, in that order, the ab → d substitutions will be applied first.
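The two ordering rules above can be sketched as follows. This is an illustration only, not the library's implementation: `applyInOrder` is a hypothetical helper that replaces each input with a temporary placeholder before writing the final outputs, and it assumes the input text contains no characters from the Unicode private-use range starting at U+E000.

```js
// Hypothetical helper for illustration only; not the library's implementation.
function applyInOrder(input, rules) {
  // Longest inputs first; inputs of equal length keep the order in which
  // they were passed (Array.prototype.sort is stable since ES2019)
  const ordered = Object.entries(rules).sort(([x], [y]) => y.length - x.length);
  let text = input;

  // Step 1: swap each input for a private-use placeholder character, so
  // later rules cannot match text that an earlier rule already claimed
  ordered.forEach(([from], i) => {
    text = text.split(from).join(String.fromCharCode(0xE000 + i));
  });

  // Step 2: swap each placeholder for its final output
  ordered.forEach(([, to], i) => {
    text = text.split(String.fromCharCode(0xE000 + i)).join(to);
  });

  return text;
}

console.log(applyInOrder('chico', { c: 'k', ch: 'č' })); // "čiko"
console.log(applyInOrder('abc', { ab: 'd', bc: 'e' }));  // "dc"
console.log(applyInOrder('abc', { bc: 'e', ab: 'd' }));  // "ae"
```

The last two calls use the same rules in a different order, which shows why the order of equal-length inputs can change the result.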

Sometimes the way you want to transliterate a character or sequence of characters will depend on context. For example, you might want a to sometimes become b, and other times become c. In this case you have several options:

  • Update the original text to indicate the difference. For example, you might change all the as that you want to become cs to ɑ or maybe ac or aa or \a, or whatever makes sense for your project.

  • Update the substitution rules to take more context into account. For example, if a becomes b before c and becomes d elsewhere, you could write your rules like this:

Input | Output
:----:|:-----:
ac    | bc
a     | d

  • Update both the original text and the substitution rules. For example, you could update the original text to indicate syllable boundaries, and then update your substitution rules to use those boundaries. For instance, the sequence abc could be syllabified as a.bc or ab.c. After updating the original text with syllable boundaries, you could change your rules to target syllable-initial vs. syllable-final b; for example: .b → d (syllable-initial) and b. → e (syllable-final).
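The syllable-boundary option in the last bullet can be sketched as follows. The helper `applyRules` is hypothetical and stands in for the library, and the period is assumed to be the boundary marker. Note that each rule here consumes the boundary marker along with the b.

```js
// Hypothetical helper for illustration only; not the library's implementation.
function applyRules(input, rules) {
  // Escape regex metacharacters; this matters here because "." is a wildcard
  const escape = s => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const inputs = Object.keys(rules).sort((x, y) => y.length - x.length);
  const pattern = new RegExp(inputs.map(escape).join('|'), 'g');
  return input.replace(pattern, match => rules[match]);
}

// "." marks a syllable boundary in the source text
const rules = {
  '.b': 'd', // syllable-initial b
  'b.': 'e', // syllable-final b
};

console.log(applyRules('a.bc', rules)); // "adc"
console.log(applyRules('ab.c', rules)); // "aec"
```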

Owner

  • Name: Digital Linguistics
  • Login: digitallinguistics
  • Kind: organization

The science of managing linguistic data, digitally

Citation (CITATION.cff)

authors:
  - family-names: Hieber
    given-names:  Daniel W.
    orcid:        https://orcid.org/0000-0002-1411-3773
    website:      https://github.com/dwhieb
cff-version: 1.2.0
doi:         10.5281/zenodo.2550468
keywords:
  - digital linguistics
  - digital humanities
  - linguistics
  - orthography
  - transliteration
  - writing
license:         MIT
message:         If you use this software, please cite it using these metadata.
repository-code: https://github.com/digitallinguistics/transliterate
title:           DLx Transliterator
url:             https://github.com/digitallinguistics/transliterate

GitHub Events

Total
  • Watch event: 1
Last Year
  • Watch event: 1

Committers

Last synced: almost 3 years ago

All Time
  • Total Commits: 64
  • Total Committers: 2
  • Avg Commits per committer: 32.0
  • Development Distribution Score (DDS): 0.047
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
dwhieb d****b@g****m 61
dependabot[bot] 4****]@u****m 3

Issues and Pull Requests

Last synced: 7 months ago

All Time
  • Total issues: 54
  • Total pull requests: 23
  • Average time to close issues: 7 months
  • Average time to close pull requests: 9 months
  • Total issue authors: 4
  • Total pull request authors: 2
  • Average comments per issue: 0.22
  • Average comments per pull request: 0.7
  • Merged pull requests: 7
  • Bot issues: 0
  • Bot pull requests: 19
Past Year
  • Issues: 2
  • Pull requests: 0
  • Average time to close issues: 5 minutes
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 0.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • dwhieb (52)
  • xrotwang (1)
  • eddieantonio (1)
  • RichardLitt (1)
Pull Request Authors
  • dependabot[bot] (27)
  • dwhieb (4)
Top Labels
Issue Labels
dev (18) docs (9) enhancement (8) 📄 documentation (4) ➕ enhancement (3) bug (2) 🧑‍💻 development (2) duplicate (1) 🐞 bug (1)
Pull Request Labels
dependencies (15)

Packages

  • Total packages: 1
  • Total downloads:
    • npm 14 last-month
  • Total dependent packages: 2
  • Total dependent repositories: 1
  • Total versions: 8
  • Total maintainers: 1
npmjs.org: @digitallinguistics/transliterate

A small JavaScript library for transliterating and/or sanitizing strings

  • Versions: 8
  • Dependent Packages: 2
  • Dependent Repositories: 1
  • Downloads: 14 Last month
Rankings
Dependent packages count: 8.9%
Dependent repos count: 10.4%
Stargazers count: 12.3%
Forks count: 15.5%
Average: 17.5%
Downloads: 40.5%
Maintainers (1)
Last synced: 7 months ago