Lerche

Lerche: Generating data file processors in Julia from EBNF grammars - Published in JOSS (2021)

https://github.com/jamesrhester/lerche.jl

Science Score: 93.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 6 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org, zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Scientific Fields

Biology Life Sciences - 40% confidence
Last synced: 4 months ago · JSON representation

Repository

A Julia port of the Lark parser

Basic Info
  • Host: GitHub
  • Owner: jamesrhester
  • License: mit
  • Language: Julia
  • Default Branch: master
  • Size: 393 KB
Statistics
  • Stars: 49
  • Watchers: 4
  • Forks: 4
  • Open Issues: 9
  • Releases: 15
Created about 6 years ago · Last pushed 4 months ago
Metadata Files
Readme License

README.md


Introduction

Lerche (German for Lark) is a partial port of the Lark grammar processor from Python to Julia. Lark grammars should work unchanged in Lerche.

Installation: at the Julia REPL, using Pkg; Pkg.add("Lerche")

Documentation:

Quick start

See also 'Notes for Lark users' below.

Lerche reads Lark EBNF grammars to produce a parser. This parser, when provided with text conforming to the grammar, produces a parse tree. This tree can be visited and transformed using "rules". A rule is a function with the same name as the grammar production whose arguments it processes; its first argument is an instance of a subtype of Visitor or Transformer.

Given an EBNF grammar, Lerche can be used to parse text into your data structure as follows:

  1. Define one or more subtypes of Transformer or Visitor, instances of which will be passed as the first argument to the appropriate rule. The instance can also be used to hold information during transformation if you wish, in which case it must have a concrete type.
  2. Define visit_tokens(t::MyNewType) = false if you will not be processing token values. This is about 25% faster than leaving the default of true.
  3. For every production in your grammar that you wish to process, write a rule with the same name as the production.
  4. The rule should be prefixed with the macro @rule if the second argument is an array containing all of the arguments to the grammar production.
  5. The rule should be prefixed with the macro @inline_rule if the second and following arguments each refer to one argument in the grammar production.
  6. For every token that you wish to process, define an identically named method as for rules, but precede it with the @terminal macro instead of @rule.

If your grammar is in String variable mygrammar, your text to be parsed and transformed is in String variable mytext, and your Transformer subtype is MyTransformer, the following commands will produce a data structure from the text:

```julia
using Lerche
p = Lark(mygrammar, parser="lalr", lexer="contextual")  # create parser
t = Lerche.parse(p, mytext)                             # create parse tree
x = Lerche.transform(MyTransformer(), t)                # transform parse tree
```
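
As a concrete illustration of the steps above, here is a minimal sketch for a toy "key = value" format. The grammar, the ToDict type and the sample text are invented for this example and are not part of Lerche. The sketch also assumes that the CNAME, INT and WS terminals can be imported from Lark's common grammar in Lerche as they can in Lark, and that tokens behave as strings.

```julia
using Lerche

# Hypothetical toy grammar: one or more "name = integer" pairs.
grammar = raw"""
start: pair+
pair: CNAME "=" INT

%import common.CNAME
%import common.INT
%import common.WS
%ignore WS
"""

# Hypothetical transformer type; it holds no state, so an empty struct suffices.
struct ToDict <: Transformer end

# @inline_rule: each child of the `pair` production arrives as its own argument.
@inline_rule pair(t::ToDict, name, value) = String(name) => Base.parse(Int, value)

# @rule: all children of the `start` production arrive together in one array.
@rule start(t::ToDict, pairs) = Dict(pairs)

p = Lark(grammar, parser="lalr", lexer="contextual")  # create parser
tree = Lerche.parse(p, "a = 1\nb = 22")               # create parse tree
result = Lerche.transform(ToDict(), tree)             # transform parse tree
```

Base.parse and Lerche.parse are both written with their module prefix here so that it is always clear which function is meant.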

For a real-world example of usage, see this file.

Citation

If you are publishing work where Lerche has been useful, please consider citing the Lerche paper.

Issues

Please raise any issues or problems with using Lerche in the GitHub issue tracker.

Contributions

Contributions of all types are welcome. Examples include:

  • Improvements to processing speed
  • Improved documentation
  • Links to projects using Lerche
  • Commenting and triaging issues

The most straightforward way to make a contribution is to fork the repository, make your changes, and create a pull request.

Notes for Lark users

Please read the Lark documentation. When converting from Lark programs written in Python to Lerche programs written in Julia, the changes outlined below are necessary.

  1. All Transformer and Visitor classes become subtypes of Transformer/Visitor
  2. All class method calls become Julia method calls with an instance of the type as the first argument (i.e. replacing self)
  3. Transformation or visitor rules should be preceded by the @rule macro. Inline rules use the @inline_rule macro and token processing methods use @terminal.
  4. The first argument of transformer and visitor rules is a variable of the desired transformer/visitor type.
  5. Any grammars containing backslash-double quote sequences need to be fixed (see below).
  6. Any grammars containing backslash-x to denote a byte value need to be fixed (see below).

Inconsistencies with Lark

  1. Earley and CYK grammars are not implemented.
  2. Dynamic lexer is not implemented.
  3. All errors with messages attached must be at the bottom of the exception type hierarchy, as these are the only types that can have contents. Thus an UnexpectedInput exception must become, for example, an UnexpectedCharacter exception if a message is included.
  4. The PuppetParser invoked when there is a parse error is not yet functional.
  5. There may be issues with correctly interpreting import paths to find imported grammars: please raise an issue if this happens.
  6. No choice of regex engine, Tree structure or byte/string choices are available as they make no sense for Julia.

Implementation notes and hints

Lerche is currently based on Lark 0.11.1. The priority has been on maintaining fidelity with Lark. For example, global regex flags, which are integers in Lark, are still integers in Lerche, which means you will need to look up their values. This may be changed to a more Julian approach in future.

The @rule and @inline_rule macros define methods of Lerche function transformer_func. Julia multiple dispatch is used to select the appropriate method at runtime. @terminal similarly defines methods of token_func.

Parsing a large (500K) file suggests that Lerche is about 3 times faster than Lark with CPython for parsing. Parser generation is much slower, as no optimisation techniques have been applied (yet). Calculating and storing your parser in a Julia const variable at the top level of your package will allow it to be precompiled, and thus avoids grammar re-analysis each time your package is loaded.
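
The const-variable advice might look like the following sketch. The module name KeyValueFormat, the toy grammar and the parse_text helper are all hypothetical. Precompilation only takes effect when the module is part of an installed package, so running this as a plain script shows only the layout.

```julia
module KeyValueFormat  # hypothetical package module

using Lerche

# Hypothetical toy grammar: one or more "name = integer" pairs.
const GRAMMAR = raw"""
start: pair+
pair: CNAME "=" INT

%import common.CNAME
%import common.INT
%import common.WS
%ignore WS
"""

# Built once at module top level rather than inside every call that parses.
const PARSER = Lark(GRAMMAR, parser="lalr", lexer="contextual")

# Reuse the stored parser for each input text.
parse_text(text::AbstractString) = Lerche.parse(PARSER, text)

end # module

tree = KeyValueFormat.parse_text("a = 1")
```

The point of the design is that the expensive call to Lark happens once, at top level, and every later call to parse_text reuses the stored parser.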

Owner

  • Name: James Hester
  • Login: jamesrhester
  • Kind: user

JOSS Publication

Lerche: Generating data file processors in Julia from EBNF grammars
Published
August 24, 2021
Volume 6, Issue 64, Page 3497
Authors
James R. Hester
Australian Nuclear Science and Technology Organisation, Sydney, Australia
Erez Shinan
Independent researcher
Editor
Sebastian Benthall
Tags
Julia data processing data formats EBNF

GitHub Events

Total
  • Release event: 2
  • Watch event: 2
  • Issue comment event: 6
  • Push event: 3
  • Create event: 2
Last Year
  • Release event: 2
  • Watch event: 2
  • Issue comment event: 6
  • Push event: 3
  • Create event: 2

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 119
  • Total Committers: 4
  • Avg Commits per committer: 29.75
  • Development Distribution Score (DDS): 0.286
Past Year
  • Commits: 3
  • Committers: 2
  • Avg Commits per committer: 1.5
  • Development Distribution Score (DDS): 0.333
Top Committers
Name Email Commits
James.Hester j****h@a****u 85
jamesrhester j****r@g****m 30
Venkatesh Dayananda v****h@j****m 3
GiggleLiu c****9@g****m 1

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 28
  • Total pull requests: 5
  • Average time to close issues: 2 months
  • Average time to close pull requests: 1 day
  • Total issue authors: 13
  • Total pull request authors: 4
  • Average comments per issue: 4.36
  • Average comments per pull request: 1.4
  • Merged pull requests: 3
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 0.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • jamesrhester (10)
  • guyvdbroeck (5)
  • ziotom78 (2)
  • kskyten (2)
  • ArtHarg (1)
  • Amval (1)
  • willow-ahrens (1)
  • bilderbuchi (1)
  • stensmo (1)
  • GiggleLiu (1)
  • robertfeldt (1)
  • JuliaTagBot (1)
  • ddyok (1)
Pull Request Authors
  • GiggleLiu (2)
  • vdayanand (1)
  • erezsh (1)
  • danielskatz (1)
Top Labels
Issue Labels
enhancement (3) bug (2)
Pull Request Labels

Packages

  • Total packages: 3
  • Total downloads:
    • julia 157 total
  • Total dependent packages: 8
    (may contain duplicates)
  • Total dependent repositories: 0
    (may contain duplicates)
  • Total versions: 47
proxy.golang.org: github.com/jamesrhester/lerche.jl
  • Versions: 15
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 6.4%
Average: 6.7%
Dependent repos count: 6.9%
Last synced: 4 months ago
proxy.golang.org: github.com/jamesrhester/Lerche.jl
  • Versions: 15
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 6.4%
Average: 6.7%
Dependent repos count: 6.9%
Last synced: 4 months ago
juliahub.com: Lerche

A Julia port of the Lark parser

  • Versions: 17
  • Dependent Packages: 8
  • Dependent Repositories: 0
  • Downloads: 157 Total
Rankings
Dependent packages count: 7.0%
Dependent repos count: 9.9%
Average: 16.0%
Stargazers count: 18.9%
Forks count: 28.1%
Last synced: 4 months ago

Dependencies

.github/workflows/TagBot.yml actions
  • JuliaRegistries/TagBot v1 composite
.github/workflows/ci.yml actions
  • actions/checkout v2 composite
  • coverallsapp/github-action master composite
  • julia-actions/julia-buildpkg latest composite
  • julia-actions/julia-runtest latest composite
  • julia-actions/setup-julia latest composite
  • julia-actions/setup-julia v1 composite