magiclex

A magical Lexer maker for C#.

https://github.com/dadarkwizard/magiclex

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (7.8%) to scientific vocabulary
Last synced: 6 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: DaDarkWizard
  • Language: C#
  • Default Branch: master
  • Size: 39.1 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 4 years ago · Last pushed about 2 years ago
Metadata Files
Readme Citation

readme.md

About

This is a scanner generator that works along the lines of flex, a popular lexer generator for C and C++.

An example project is included to demonstrate the current capabilities. Some of the files need to be rearranged for better organization, but the basics are there.

Regular Expressions

Regexes currently have a rudimentary implementation. The supported operators are:

*: Kleene star. Matches zero or more repetitions of the preceding letters. It is best to wrap the repeated part in parentheses, because the star applies to all of the preceding letters: an expression like a(ba)* will allow '' and force an 'aba' pattern.

(): Parentheses. An expression in parentheses will be evaluated on its own - it's a recursive algorithm. When in doubt, use more.

[*-*]: A macro for adding a range of characters. A character matches if its character value lies between the first and second characters; for example, [a-z] allows all lowercase letters.

|: Or. Allows either the entire left half of the expression or the entire right half, so it should be used with parentheses: a(a|b) allows aa or ab.
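The range and alternation rules above can be sketched in plain C#. This is only an illustration, not MagicLex's code: `RangeSketch`, `InRange`, and `MatchesAaOrAb` are hypothetical names, and treating both ends of the range as inclusive is an assumption based on [a-z] covering every lowercase letter.

```csharp
using System;

// Hypothetical helpers, not part of MagicLex.
public static class RangeSketch
{
    // Models [first-second]: true when c's character value lies
    // between first and second (assumed inclusive on both ends).
    public static bool InRange(char c, char first, char second)
    {
        return c >= first && c <= second;
    }

    // Models a(a|b): an 'a' followed by either 'a' or 'b', and nothing else.
    public static bool MatchesAaOrAb(string s)
    {
        return s.Length == 2 && s[0] == 'a' && (s[1] == 'a' || s[1] == 'b');
    }

    public static void Main()
    {
        Console.WriteLine(InRange('m', 'a', 'z'));  // lowercase letter, inside [a-z]
        Console.WriteLine(InRange('M', 'a', 'z'));  // uppercase letter, outside [a-z]
        Console.WriteLine(MatchesAaOrAb("ab"));     // one of the two allowed strings
        Console.WriteLine(MatchesAaOrAb("ac"));     // not allowed by a(a|b)
    }
}
```

Comparing raw character values keeps the range test to two comparisons, which is why a range such as [#-Z] in a specification can cover digits, punctuation, and uppercase letters at once.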

Definitions

The project includes an example definitions file. All text within the %usings block is placed at the top of the generated code. %return gives the return type of the expressions (default: int). %errortype gives the error value (default: -1). %endtype gives the end-of-file value (default: 0).

Between %begin and %end are the expressions. Each regex is placed between square brackets, followed by a colon and curly brackets containing your code. This code is placed verbatim into the generated C# as a lambda expression. In the future I will include an example that shows how to get the value of the parsed string.

```
%usings
%{
using TestProject;
%}

%return {Types}
%errortype {Types.Error}
%endtype {Types.End}

%begin

[int] : { return Types.INT; }
[([1-9])([0-9])] : { return Types.Num; }
["((([#-Z^-~])|(\"|\r|\n|\t|[|])))"] : { return Types.STRING; }
[([a-zA-Z])(([a-zA-Z0-9]))] : { return Types.IDENTIFIER; }
[\w(\w)] : { return Lex(); }

%end
```
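The specification above refers to a `Types` type from the `TestProject` namespace without showing it. A minimal sketch of what that supporting code might look like follows. The enum layout and the `ActionSketch` class are assumptions for illustration only; the member names are taken from the specification. It also shows how an action body such as `{ return Types.INT; }` could be wrapped as a lambda expression.

```csharp
using System;

namespace TestProject
{
    // Hypothetical token type; member names come from the example specification.
    public enum Types { End, Error, INT, Num, STRING, IDENTIFIER }

    // Hypothetical holder class, not generated by MagicLex.
    public static class ActionSketch
    {
        // The code written after "[int] :" wrapped verbatim as a lambda body.
        public static readonly Func<Types> IntAction = () => { return Types.INT; };

        public static void Main()
        {
            // Invoking the lambda yields the token value chosen for this rule.
            Types token = IntAction();
            Console.WriteLine(token);
        }
    }
}
```

Because the action is ordinary C# returning the type named by %return, the same mechanism would work for a class or tuple instead of an enum.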

Owner

  • Name: DaDarkWizard
  • Login: DaDarkWizard
  • Kind: user
  • Company: Maclean Fogg

Programming aficionado, technophile, and keyboard-wielding mage.

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: MagicLex
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Daniel
    family-names: Masker
    email: dtmasker@mtu.edu
    affiliation: Michigan Technological University
    orcid: 'https://orcid.org/0009-0009-3662-8354'
repository-code: 'https://github.com/DaDarkWizard/MagicLex'
abstract: >-
  MagicLex is a self-contained lexer generator for C#. In
  any computer-based consumption of human-readable text, a
  lexer plays a key role. A lexer is a program that uses
  regular expressions to tokenize text. A lexer generator is
  a program that, given a domain-defined specification for a
  lexer, generates that lexer.

  Flex is one such example of a lexer generator, built for
  C++. What sets Flex apart from other lexer generators is
  how it generates a self-contained lexer. In other words,
  once the lexer has been generated flex is no longer
  needed.

  While there are many lexer generators available for C#,
  there are none that generate a self-contained lexer. They
  all require the library used to generate the lexer to be
  available in the final project. Using a library may be
  difficult in scenarios where the project is offline.
  MagicLex seeks to fill this gap.

  MagicLex takes a specification similar to that of Flex.
  After parsing through any configuration options, it
  converts the provided RegEx into a minimized DFA, then
  embeds the DFA and matching code into a source file. Using
  language features available in C#, it's able to create a
  single source file without having a confusing or polluted
  namespace. The resulting source file is also readable
  enough that modifications can be made to token matching
  code without regenerating the file.

  While MagicLex is intended for use as a tokenizer, the
  freedom provided in the specification allows any type of
  complex object or tuple to be returned on a match.
keywords:
  - lexer
  - lexer generator
  - computer science
  - programming languages
  - finite state automata
license: MIT
date-released: '2021-12-15'
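
The abstract describes embedding a DFA and its matching code in a single source file. The sketch below shows one way such a file could be shaped. It is not MagicLex output: the DFA is written by hand rather than generated and minimized, and every name in it (`DfaSketch`, `Token`, `ClassOf`, `Next`, `Accept`, `Lex`) is hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: a hand-built DFA and its match actions in one file.
public static class DfaSketch
{
    public enum Token { End, Error, Num, Identifier }

    // Character classes: 0 = letter, 1 = digit, 2 = anything else.
    static int ClassOf(char c)
    {
        if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')) return 0;
        if (c >= '0' && c <= '9') return 1;
        return 2;
    }

    // Transition table: rows are states, columns are character classes,
    // -1 means no transition. State 0 = start, 1 = identifier, 2 = number.
    static readonly int[,] Next =
    {
        { 1, 2, -1 },
        { 1, 1, -1 },
        { -1, 2, -1 }
    };

    // Accepting states mapped to the code that runs on a match.
    static readonly Dictionary<int, Func<Token>> Accept = new Dictionary<int, Func<Token>>
    {
        { 1, () => Token.Identifier },
        { 2, () => Token.Num }
    };

    // Returns the token for the longest match at pos and advances pos past it.
    public static Token Lex(string input, ref int pos)
    {
        if (pos >= input.Length) return Token.End;
        int state = 0, lastAccept = -1, lastEnd = pos;
        for (int i = pos; i < input.Length; i++)
        {
            state = Next[state, ClassOf(input[i])];
            if (state < 0) break;
            if (Accept.ContainsKey(state)) { lastAccept = state; lastEnd = i + 1; }
        }
        if (lastAccept < 0) { pos++; return Token.Error; } // skip one bad character
        pos = lastEnd;
        return Accept[lastAccept]();
    }

    public static void Main()
    {
        int pos = 0;
        Console.WriteLine(Lex("abc123", ref pos)); // identifier spanning the whole input
        Console.WriteLine(Lex("abc123", ref pos)); // input exhausted
    }
}
```

Keeping the table and the actions as static members of one class means the file needs nothing beyond the base class library, which is the self-contained property the abstract emphasizes.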


Dependencies

LanguageProcessing/LanguageProcessing.csproj nuget
  • Newtonsoft.Json 13.0.1
TestProject/TestProject.csproj nuget