hlnn-replicationpackage

The repository for the replication package of the paper "On-the-Fly Syntax Highlighting Using Neural Networks."

https://github.com/mepalma/hlnn-replicationpackage

Keywords

deep-learning neural-networks regular-expressions syntax-highlighting

Repository

Basic Info
  • Host: GitHub
  • Owner: MEPalma
  • License: other
  • Language: ANTLR
  • Default Branch: main
  • Homepage: https://hlnn.netlify.app
  • Size: 11.8 MB
Statistics
  • Stars: 2
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 1
Created over 3 years ago · Last pushed over 3 years ago
Metadata Files
Readme License Citation

README.md

On-the-Fly Syntax Highlighting Using Neural Networks

This repository represents the replication package for the paper:

On-the-Fly Syntax Highlighting Using Neural Networks

With the presence of online collaborative tools for software developers, source code is shared and consulted frequently, from code viewers to merge requests and code snippets. Typically, code highlighting quality in such scenarios is sacrificed in favor of system responsiveness. In these on-the-fly settings, performing a formal grammatical analysis of the source code is expensive, and it is intractable in the many cases where the input is an invalid derivation of the language. Indeed, current popular highlighters rely heavily on a system of regular expressions, typically far from the specification of the language's lexer. Due to their complexity, regular expressions need to be periodically updated as more feedback is collected from users, and their design discourages the detection of more complex language formations. This paper delivers a deep learning-based approach suitable for on-the-fly grammatical code highlighting of correct and incorrect language derivations, in settings such as code viewers and snippets. It focuses on alleviating the burden on developers, who can reuse the language's parsing strategy to produce the desired highlighting specification. Moreover, this approach is compared to current online syntax highlighting tools and formal methods in terms of accuracy and execution time, across different levels of grammatical coverage, for three mainstream programming languages. The results show that the proposed approach consistently achieves near-perfect accuracy in its predictions, thereby outperforming regular expression-based strategies.

The paper is published in the Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).

In this replication package, we provide all the data and scripts we used in our study.

:open_file_folder: Organization

The repository is organized as follows:

  • src/ contains the scripts to build and execute the models
  • docs/ contains the source code for the website where we documented our scripts

Additional resources can be found at: https://doi.org/10.5281/zenodo.6949491. In particular:

  • The input data that was used for the study
  • The obtained detailed results
  • The resources to replicate this study by using this repository's code. The input data is formatted to be compatible with the provided code. The content of the file HLNN-Resources.zip must be extracted into the folder src/main/resources.
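
The extraction step for HLNN-Resources.zip can be sketched in Python as follows. This is a minimal sketch, not part of the replication package: the function name `extract_resources` and the assumption that the archive has already been downloaded from Zenodo are ours, and the internal layout of the archive is not checked.

```python
import zipfile
from pathlib import Path


def extract_resources(archive: Path, repo_root: Path) -> Path:
    """Extract a resources archive into <repo_root>/src/main/resources.

    `archive` is assumed to be a local copy of HLNN-Resources.zip;
    `repo_root` is assumed to be the root of the cloned repository.
    """
    target = repo_root / "src" / "main" / "resources"
    # Create the target folder if the clone does not contain it yet.
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        # Unpack every archive member below the target folder.
        zf.extractall(target)
    return target


# Hypothetical usage, run from the repository root:
# extract_resources(Path("HLNN-Resources.zip"), Path("."))
```

If the archive turns out to wrap its content in a top-level folder, the extracted files may need to be moved up one level so that they sit directly in src/main/resources.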

:books: How to cite this dataset

If you would like to cite the dataset, please use the following BibTeX snippet:

@article{palma_onthefly_2022,
  author  = {Palma, Marco Edoardo and Salza, Pasquale and Gall, Harald C.},
  title   = {{On-the-Fly Syntax Highlighting Using Neural Networks}},
  journal = {ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE)},
  year    = {2022},
  doi     = {10.1145/3540250.3549109}
}

:balance_scale: License

This replication package is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License. Please see the LICENSE file for full details.

:pray: Credits

Owner

  • Name: MEPalma
  • Login: MEPalma
  • Kind: user
  • Location: Zurich, CH

Dependencies

environment.yml conda
  • _ipyw_jlab_nb_ext_conf 0.1.0.*
  • _libgcc_mutex 0.1.*
  • _openmp_mutex 4.5.*
  • anaconda-client 1.9.0.*
  • anaconda-navigator 2.1.1.*
  • anyio 2.2.0.*
  • argon2-cffi 20.1.0.*
  • async_generator 1.10.*
  • attrs 21.2.0.*
  • babel 2.9.1.*
  • backcall 0.2.0.*
  • backports 1.0.*
  • backports.functools_lru_cache 1.6.4.*
  • backports.tempfile 1.0.*
  • backports.weakref 1.0.post1.*
  • beautifulsoup4 4.10.0.*
  • blas 1.0.*
  • bleach 4.0.0.*
  • brotlipy 0.7.0.*
  • bzip2 1.0.8.*
  • ca-certificates 2021.10.26.*
  • certifi 2021.10.8.*
  • cffi 1.14.6.*
  • chardet 4.0.0.*
  • charset-normalizer 2.0.4.*
  • click 8.0.3.*
  • clyent 1.2.2.*
  • conda 4.10.3.*
  • conda-build 3.21.5.*
  • conda-content-trust 0.1.1.*
  • conda-env 2.6.0.*
  • conda-package-handling 1.7.3.*
  • conda-repo-cli 1.0.4.*
  • conda-token 0.3.0.*
  • conda-verify 3.4.2.*
  • cryptography 3.4.8.*
  • cudatoolkit 10.2.89.*
  • dbus 1.13.18.*
  • debugpy 1.5.1.*
  • decorator 5.1.0.*
  • defusedxml 0.7.1.*
  • entrypoints 0.3.*
  • expat 2.4.1.*
  • ffmpeg 4.3.*
  • filelock 3.3.1.*
  • fontconfig 2.13.1.*
  • freetype 2.11.0.*
  • future 0.18.2.*
  • giflib 5.2.1.*
  • glib 2.69.1.*
  • glob2 0.7.*
  • gmp 6.2.1.*
  • gnutls 3.6.15.*
  • gst-plugins-base 1.14.0.*
  • gstreamer 1.14.0.*
  • icu 58.2.*
  • idna 3.3.*
  • importlib-metadata 4.8.1.*
  • importlib_metadata 4.8.1.*
  • intel-openmp 2021.4.0.*
  • ipykernel 6.4.1.*
  • ipython 7.29.0.*
  • ipython_genutils 0.2.0.*
  • ipywidgets 7.6.5.*
  • jedi 0.18.0.*
  • jinja2 2.11.3.*
  • joblib 1.1.0.*
  • jpeg 9d.*
  • json5 0.9.6.*
  • jsonschema 3.2.0.*
  • jupyter_client 7.0.6.*
  • jupyter_core 4.9.1.*
  • jupyter_server 1.4.1.*
  • jupyterlab 3.2.1.*
  • jupyterlab_pygments 0.1.2.*
  • jupyterlab_server 2.8.2.*
  • jupyterlab_widgets 1.0.0.*
  • lame 3.100.*
  • lcms2 2.12.*
  • ld_impl_linux-64 2.35.1.*
  • libarchive 3.4.2.*
  • libffi 3.3.*
  • libgcc-ng 9.3.0.*
  • libgfortran-ng 7.5.0.*
  • libgfortran4 7.5.0.*
  • libgomp 9.3.0.*
  • libiconv 1.15.*
  • libidn2 2.3.2.*
  • liblief 0.10.1.*
  • libpng 1.6.37.*
  • libsodium 1.0.18.*
  • libstdcxx-ng 9.3.0.*
  • libtasn1 4.16.0.*
  • libtiff 4.2.0.*
  • libunistring 0.9.10.*
  • libuuid 1.0.3.*
  • libuv 1.40.0.*
  • libwebp 1.2.0.*
  • libwebp-base 1.2.0.*
  • libxcb 1.14.*
  • libxml2 2.9.12.*
  • lz4-c 1.9.3.*
  • markupsafe 2.0.1.*
  • matplotlib-inline 0.1.2.*
  • mistune 0.8.4.*
  • mkl 2021.4.0.*
  • mkl-service 2.4.0.*
  • mkl_fft 1.3.1.*
  • mkl_random 1.2.2.*
  • navigator-updater 0.2.1.*
  • nbclassic 0.2.6.*
  • nbclient 0.5.3.*
  • nbconvert 6.1.0.*
  • nbformat 5.1.3.*
  • ncurses 6.3.*
  • nest-asyncio 1.5.1.*
  • nettle 3.7.3.*
  • notebook 6.4.6.*
  • numpy 1.21.2.*
  • numpy-base 1.21.2.*
  • olefile 0.46.*
  • openh264 2.1.0.*
  • openssl 1.1.1l.*
  • packaging 21.3.*
  • pandocfilters 1.4.3.*
  • parso 0.8.2.*
  • patchelf 0.13.*
  • pcre 8.45.*
  • pexpect 4.8.0.*
  • pickleshare 0.7.5.*
  • pillow 8.4.0.*
  • pip 21.2.4.*
  • pkginfo 1.7.1.*
  • prometheus_client 0.12.0.*
  • prompt-toolkit 3.0.20.*
  • psutil 5.8.0.*
  • ptyprocess 0.7.0.*
  • py-lief 0.10.1.*
  • pycosat 0.6.3.*
  • pycparser 2.21.*
  • pygments 2.10.0.*
  • pyjwt 2.1.0.*
  • pyopenssl 21.0.0.*
  • pyparsing 3.0.4.*
  • pyqt 5.9.2.*
  • pyrsistent 0.18.0.*
  • pysocks 1.7.1.*
  • python 3.9.7.*
  • python-dateutil 2.8.2.*
  • python-libarchive-c 2.9.*
  • pytorch 1.10.0.*
  • pytorch-mutex 1.0.*
  • pytz 2021.3.*
  • pyyaml 6.0.*
  • pyzmq 22.3.0.*
  • qt 5.9.7.*
  • qtpy 1.10.0.*
  • readline 8.1.*
  • requests 2.26.0.*
  • ripgrep 12.1.1.*
  • ruamel_yaml 0.15.100.*
  • scikit-learn 1.0.1.*
  • scipy 1.7.1.*
  • send2trash 1.8.0.*
  • setuptools 58.0.4.*
  • sip 4.19.13.*
  • six 1.16.0.*
  • sniffio 1.2.0.*
  • soupsieve 2.3.1.*
  • sqlite 3.36.0.*
  • terminado 0.9.4.*
  • testpath 0.5.0.*
  • threadpoolctl 2.2.0.*
  • tk 8.6.11.*
  • torchaudio 0.10.0.*
  • torchvision 0.11.1.*
  • tornado 6.1.*
  • tqdm 4.62.3.*
  • traitlets 5.1.1.*
  • typing_extensions 3.10.0.2.*
  • tzdata 2021e.*
  • ujson 4.0.2.*
  • urllib3 1.26.7.*
  • wcwidth 0.2.5.*
  • webencodings 0.5.1.*
  • wheel 0.37.0.*
  • widgetsnbextension 3.5.1.*
  • xmltodict 0.12.0.*
  • xz 5.2.5.*
  • yaml 0.2.5.*
  • zeromq 4.3.4.*
  • zipp 3.6.0.*
  • zlib 1.2.11.*
  • zstd 1.4.9.*
build.gradle maven
  • org.jetbrains.kotlinx:kotlinx-coroutines-core 1.5.1-native-mt implementation
  • org.junit.jupiter:junit-jupiter-api 5.7.2 testImplementation
  • org.junit.jupiter:junit-jupiter-engine 5.7.2 testRuntimeOnly