Graph Transliterator

Graph Transliterator: A graph-based transliteration tool - Published in JOSS (2019)

https://github.com/seanpue/graphtransliterator

Science Score: 95.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 9 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org, zenodo.org
  • Committers with academic emails
    1 of 3 committers (33.3%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Scientific Fields

Biology Life Sciences - 40% confidence
Last synced: 4 months ago

Repository

A graph-based transliteration tool

Basic Info
  • Host: GitHub
  • Owner: seanpue
  • License: MIT
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 58.4 MB
Statistics
  • Stars: 8
  • Watchers: 1
  • Forks: 3
  • Open Issues: 4
  • Releases: 3
Created over 6 years ago · Last pushed almost 2 years ago
Metadata Files
  • Readme
  • Changelog
  • Contributing
  • License
  • Code of conduct
  • Authors

README.rst

====================
Graph Transliterator
====================

.. image:: https://img.shields.io/pypi/v/graphtransliterator.svg
      :target: https://pypi.python.org/pypi/graphtransliterator
      :alt: PyPi Version

.. image:: https://readthedocs.org/projects/graphtransliterator/badge/?version=latest
      :target: https://graphtransliterator.readthedocs.io/en/latest/?badge=latest
      :alt: Documentation Status

.. image:: https://pyup.io/repos/github/seanpue/graphtransliterator/shield.svg
     :target: https://pyup.io/repos/github/seanpue/graphtransliterator/
     :alt: PyUp Updates

.. image:: https://img.shields.io/badge/code%20style-black-000000.svg
     :target: https://github.com/ambv/black
     :alt: Code Style: Black

.. image:: https://img.shields.io/pypi/pyversions/graphtransliterator
     :alt: PyPI - Python Version

.. image:: https://zenodo.org/badge/DOI/10.5281/zenodo.3558365.svg
     :target: https://doi.org/10.5281/zenodo.3558365
     :alt: Software repository DOI

.. image:: https://joss.theoj.org/papers/10.21105/joss.01717/status.svg
     :target: https://doi.org/10.21105/joss.01717
     :alt: Paper DOI

A graph-based transliteration tool that lets you convert the symbols of one
language or script to those of another using rules that you define.

* Free software: MIT license
* Documentation: https://graphtransliterator.readthedocs.io
* Repository: https://github.com/seanpue/graphtransliterator

Transliteration... What? Why?
-----------------------------

Moving text or data from one script or encoding to another is a common problem:

- Many languages are written in multiple scripts, and many people can only read one of
  them. Moving between them can be a complex but necessary task in order to make
  texts accessible.

- Both machine translation and the identification of names and locations
  benefit from transliteration.

- Library systems often require that metadata be in particular forms of
  romanization in addition to the original script.

- Linguists need to move between different methods of phonetic transcription.

- Documents in legacy fonts must now be converted to contemporary Unicode ones.

- Complex-script languages are frequently approached in natural language processing and
  in digital humanities research through transliteration, as it provides disambiguating
  information about pronunciation, morphological boundaries, and unwritten elements not
  present in the original script.

Graph Transliterator abstracts transliteration, offering an "easy reading" method for
developing transliterators that does not require writing a complex program. It also
contains bundled transliterators that are rigorously tested. These can be expanded to
handle many transliteration tasks.

Contributions are very welcome!


Features
--------

* Provides a transliteration tool that can be configured to convert the tokens
  of an input string into an output string using:

  * user-defined types of input **tokens** and **token classes**
  * **transliteration rules** based on:

    * a sequence of input tokens
    * specific input tokens that precede or follow the token sequence
    * classes of input tokens preceding or following specified tokens

  * **"on match" rules** for output to be inserted between transliteration
    rules involving particular token classes
  * defined rules for **whitespace**, including its optional consolidation

* Can be set up using:

  * an **"easy reading"** YAML format that lets you
    quickly craft settings for the transliteration tool
  * a JSON dump of a transliterator (faster to load)
  * **"direct"** settings, perhaps passed programmatically, using a dictionary

* **Automatically orders rules** by the number of tokens in a
  transliteration rule
* **Checks for ambiguity** in transliteration rules
* Can provide **details** about each transliteration rule match
* Allows **optional matching of all possible rules** in a particular location
* Permits **pruning of rules** with certain productions
* **Validates**, as well as **serializes** to and **deserializes** from JSON
  and Python data types, using accessible marshmallow schemas
* Provides **full support for Unicode**, including Unicode **character names**
  in the "easy reading" YAML format
* Constructs and uses a **directed tree** and performs a **best-first search**
  to find the most specific transliteration rule in a given context
* Includes **bundled transliterators**, to which *you* can add, with tests
  that check for full coverage of the nodes and edges of the internal graph and any
  "on match" rules
* Includes a command-line interface to perform transliteration and other tasks
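
The rule-ordering and ambiguity features above can be pictured with a small,
self-contained sketch. It is not Graph Transliterator's implementation: the
``Rule`` class, the ``specificity`` measure, and the helper functions are
hypothetical names chosen for illustration, and only preceding tokens are
modeled as context (no token classes, following tokens, or "on match" rules).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule type, not the library's own class."""
    tokens: tuple            # tokens the rule consumes
    production: str          # output emitted on a match
    prev_tokens: tuple = ()  # tokens required immediately before the match

    @property
    def specificity(self):
        # Assumption: more tokens (matched plus context) means more specific.
        return len(self.tokens) + len(self.prev_tokens)


def order_rules(rules):
    """Sort rules so that the most specific ones are tried first."""
    return sorted(rules, key=lambda r: r.specificity, reverse=True)


def find_ambiguous(rules):
    """Return pairs of rules that match exactly the same tokens and context."""
    seen, clashes = {}, []
    for rule in rules:
        key = (rule.prev_tokens, rule.tokens)
        if key in seen:
            clashes.append((seen[key], rule))
        else:
            seen[key] = rule
    return clashes


def transliterate(tokens, rules):
    """Match left to right, always taking the first (most specific) rule."""
    ordered = order_rules(rules)
    out, i = [], 0
    while i < len(tokens):
        for rule in ordered:
            n, p = len(rule.tokens), len(rule.prev_tokens)
            if tuple(tokens[i:i + n]) != rule.tokens:
                continue
            if p and (i < p or tuple(tokens[i - p:i]) != rule.prev_tokens):
                continue
            out.append(rule.production)
            i += n
            break
        else:
            raise ValueError(f"no rule matches {tokens[i]!r} at index {i}")
    return "".join(out)


RULES = [
    Rule(("h",), "ᴉ"),
    Rule(("i",), "ɥ"),
    Rule(("i",), "ɥ!", prev_tokens=("h",)),  # only applies after "h"
]
result = transliterate(list("hi"), RULES)  # 'ᴉɥ!'
```

Because the context-sensitive rule for ``i`` counts two tokens, it is tried
before the plain ``i`` rule, so the more specific production wins whenever its
context is present.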

Sample Code and Graph
---------------------

.. code-block:: python

  from graphtransliterator import GraphTransliterator
  GraphTransliterator.from_yaml("""
      tokens:
        h: [consonant]
        i: [vowel]
        " ": [whitespace]
      rules:
        h: \N{LATIN SMALL LETTER TURNED I}
        i: \N{LATIN SMALL LETTER TURNED H}
        <whitespace> i: \N{LATIN CAPITAL LETTER TURNED H}
        (<whitespace> h) i: \N{LATIN SMALL LETTER TURNED H}!
      onmatch_rules:
        - <whitespace> + <consonant>: ¡
      whitespace:
        default: " "
        consolidate: true
        token_class: whitespace
      metadata:
        title: "Upside Down Greeting Transliterator"
        version: "1.0.0"
  """).transliterate("hi")

.. code-block:: python

    '¡ᴉɥ!'
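
The leading ``¡`` in the output depends on whitespace being present at the edge
of the input. The following is a rough sketch of a tokenizer with whitespace
padding and consolidation. It assumes that the input is padded with the default
whitespace token before rules are matched; the ``tokenize`` function and its
parameters are illustrative and are not the library's API.

```python
import re


def tokenize(text, tokens, whitespace_tokens=(" ",), default_ws=" ",
             consolidate=True):
    """Split text into known tokens, padding both ends with whitespace."""
    # Longest tokens first, so multi-character tokens win over their prefixes.
    ordered = sorted(tokens, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in ordered))

    found, pos = [], 0
    while pos < len(text):
        m = pattern.match(text, pos)
        if m is None:
            raise ValueError(f"unrecognizable input at index {pos}: {text[pos]!r}")
        found.append(m.group())
        pos = m.end()

    # Assumed behavior: pad so that rules can refer to the edges of the input.
    padded = [default_ws] + found + [default_ws]
    if not consolidate:
        return padded

    result = []
    for tok in padded:
        is_ws = tok in whitespace_tokens
        if is_ws and result and result[-1] in whitespace_tokens:
            continue  # drop repeated whitespace
        result.append(default_ws if is_ws else tok)
    return result


TOKENS = ["h", "i", " "]
simple = tokenize("hi", TOKENS)      # [' ', 'h', 'i', ' ']
spaced = tokenize(" h  i", TOKENS)   # [' ', 'h', ' ', 'i', ' ']
```

With this padding, a rule or "on match" rule that mentions the whitespace class
can apply to the first token of the input even though the user typed no space.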

.. figure:: https://raw.githubusercontent.com/seanpue/graphtransliterator/master/docs/_static/sample_graph.png
   :alt: sample graph

   Sample directed tree created by Graph Transliterator. The `rule` nodes are in double
   circles, and the `token` nodes are in single circles. The numbers are the costs of
   the edges, and less costly edges are searched first. Previous token classes
   and previous tokens that must be present appear as constraints on the edges
   incident to the terminal leaf `rule` nodes.
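
A cost-ordered search of this kind can be sketched with a priority queue. The
code below is a simplified illustration, not the library's implementation: the
node layout, the ``rule_cost`` function, and the function names are
assumptions, and the constraints on edges leading to rule nodes are left out.

```python
import heapq
from itertools import count


def new_node():
    # "tokens": token -> (cost, child node); "rules": list of (cost, production)
    return {"tokens": {}, "rules": []}


def rule_cost(num_tokens):
    # Assumed cost function: rules with more tokens are cheaper.
    return 1.0 / (1 + num_tokens)


def add_rule(root, tokens, production):
    """Insert a rule; each token edge keeps the lowest cost found beneath it."""
    cost = rule_cost(len(tokens))
    node = root
    for token in tokens:
        old_cost, child = node["tokens"].get(token, (float("inf"), None))
        if child is None:
            child = new_node()
        node["tokens"][token] = (min(old_cost, cost), child)
        node = child
    node["rules"].append((cost, production))


def best_match(root, tokens, start=0):
    """Return (production, next_index) for the cheapest matching rule, or None."""
    tiebreak = count()  # keeps heap entries comparable when costs are equal
    heap = [(0.0, next(tiebreak), root, start, None)]
    while heap:
        _, _, node, pos, production = heapq.heappop(heap)
        if production is not None:  # reached a rule node
            return production, pos
        for cost, prod in node["rules"]:
            heapq.heappush(heap, (cost, next(tiebreak), None, pos, prod))
        if pos < len(tokens) and tokens[pos] in node["tokens"]:
            cost, child = node["tokens"][tokens[pos]]
            heapq.heappush(heap, (cost, next(tiebreak), child, pos + 1, None))
    return None


root = new_node()
add_rule(root, ["h"], "H")
add_rule(root, ["h", "i"], "HI")
add_rule(root, ["i"], "I")

longest = best_match(root, list("hi"))   # ('HI', 2)
fallback = best_match(root, list("ho"))  # ('H', 1)
```

Because the edge toward the two-token rule is cheaper, it is explored first.
The one-token rule is returned only when the longer path cannot be completed.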


Get It Now
----------

.. code-block:: bash

   $ pip install -U graphtransliterator

Citation
--------

To cite Graph Transliterator, please use:

    Pue, A. Sean (2019). Graph Transliterator: A graph-based transliteration tool.
    Journal of Open Source Software, 4(44), 1717, https://doi.org/10.21105/joss.01717

Owner

  • Name: A. Sean Pue
  • Login: seanpue
  • Kind: user
  • Location: East Lansing, MI USA
  • Company: Michigan State University

JOSS Publication

Graph Transliterator: A graph-based transliteration tool
Published
December 01, 2019
Volume 4, Issue 44, Page 1717
Authors
A. Sean Pue ORCID
Linguistics and Germanic, Slavic, Asian, and African Languages, Michigan State University
Editor
George K. Thiruvathukal ORCID
Tags
transliteration language graph

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 360
  • Total Committers: 3
  • Avg Commits per committer: 120.0
  • Development Distribution Score (DDS): 0.133
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name      Email            Commits
seanpue   p****e@m****u    312
pyup-bot  g****t@p****o    46
vc1492a   v****a@g****m    2

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 1
  • Total pull requests: 144
  • Average time to close issues: N/A
  • Average time to close pull requests: about 2 months
  • Total issue authors: 1
  • Total pull request authors: 2
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.89
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 3
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • klinga (1)
Pull Request Authors
  • pyup-bot (130)
  • dependabot[bot] (4)
Top Labels
Issue Labels
Pull Request Labels
  • dependencies (4)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 65 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 39
  • Total maintainers: 1
pypi.org: graphtransliterator

A graph-based transliteration tool

  • Versions: 39
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 65 Last month
Rankings
Dependent packages count: 10.0%
Forks count: 16.9%
Average: 17.5%
Stargazers count: 19.3%
Downloads: 19.5%
Dependent repos count: 21.7%
Maintainers (1)
Last synced: 4 months ago

Dependencies

.github/workflows/main.yaml actions
  • ./.github/actions/setup-poetry-env * composite
  • actions/cache v3 composite
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
  • codecov/codecov-action v3 composite
  • snok/install-poetry v1 composite
pyproject.toml pypi
  • PyYAML ^6.0.1
  • click ^8.1.7
  • marshmallow ^3.20.1
  • python ^3.9
.github/actions/setup-poetry-env/action.yml actions
  • actions/cache v3 composite
  • actions/setup-python v4 composite
  • snok/install-poetry v1 composite