Science Score: 64.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
    1 of 17 committers (5.9%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.9%) to scientific vocabulary

Keywords from Contributors

large-language-model ai4code code-analysis code-completion code-generation codegen huggingface-transformers language-server-client language-server-protocol lsp
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: jaredhancock31
  • License: mit
  • Language: Python
  • Default Branch: main
  • Size: 352 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Created 10 months ago · Last pushed 8 months ago
Metadata Files
  • Readme
  • License
  • Code of conduct
  • Citation
  • Security
  • Support

README.md

Multilspy: LSP client library in Python to build applications around language servers

Introduction

This repository hosts multilspy, a library developed as part of research for the NeurIPS 2023 paper "Monitor-Guided Decoding of Code LMs with Static Analysis of Repository Context" (titled "Guiding Language Models of Code with Global Context using Monitors" on arXiv). The paper introduces Monitor-Guided Decoding (MGD) for code generation with Language Models, where a monitor uses static analysis to guide the decoding, ensuring that the generated code satisfies various correctness properties, such as the absence of hallucinated symbol names and a valid order of method calls. For further details about Monitor-Guided Decoding, please refer to the paper and the GitHub repository microsoft/monitors4codegen.

multilspy is a cross-platform library designed to simplify the process of creating language-server clients to query and obtain the results of various static analyses from a wide variety of language servers that communicate over the Language Server Protocol. It is easily extensible to support any language that has a language server, and we aim to continuously add support for more language servers and languages.

Language servers are tools that perform a variety of static analyses on code repositories and provide useful information such as type-directed code completion suggestions, symbol definition locations, symbol references, etc., over the Language Server Protocol (LSP). Since LSP is language-agnostic, multilspy can provide the results for static analyses of code in different languages over a common interface.

multilspy intends to ease the process of using language servers by handling the various steps involved:
  • Automatically downloading platform-specific server binaries and handling setup/teardown of language servers
  • Handling JSON-RPC based communication between the client and the server
  • Maintaining and passing hand-tuned, server- and language-specific configuration parameters
  • Providing a simple API to the user while executing all steps of the server-specific protocol for each query/request

Some of the analysis results that multilspy can provide (a sketch of how these map to the API follows this list):
  • Finding the definition of a function or a class (textDocument/definition)
  • Finding the callers of a function or the instantiations of a class (textDocument/references)
  • Providing type-based dereference completions (textDocument/completion)
  • Getting the information displayed when hovering over symbols, such as a method signature (textDocument/hover)
  • Getting a list/tree of all symbols defined in a given file, along with symbol types like class, method, etc. (textDocument/documentSymbol)
  • Please create an issue/PR to request any other LSP request not listed above
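As a hedged sketch, here is how the raw LSP method names above map onto multilspy's request API; the method names come from the Usage section below, and the parameter lists are assumed to mirror request_definition:

```python
from multilspy import SyncLanguageServer
from multilspy.multilspy_config import MultilspyConfig
from multilspy.multilspy_logger import MultilspyLogger

config = MultilspyConfig.from_dict({"code_language": "python"})
lsp = SyncLanguageServer.create(config, MultilspyLogger(), "/abs/path/to/project/root/")

with lsp.start_server():
    # Placeholder location: line 10, column 4 of the given file.
    path, line, column = "relative/path/to/code_file.py", 10, 4
    definitions = lsp.request_definition(path, line, column)    # textDocument/definition
    references = lsp.request_references(path, line, column)     # textDocument/references
    completions = lsp.request_completions(path, line, column)   # textDocument/completion
    hover = lsp.request_hover(path, line, column)                # textDocument/hover
    symbols = lsp.request_document_symbols(path)                 # textDocument/documentSymbol
```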

Installation

It is recommended to create a new virtual environment with python>=3.10. To create a virtual environment using conda and activate it:

```bash
conda create -n multilspy_env python=3.10
conda activate multilspy_env
```

Further details and instructions on the creation of Python virtual environments can be found in the official documentation. We also refer users to Miniconda as an alternative to the above steps for creating the virtual environment.

To install multilspy using pip, execute the following command:

```bash
pip install multilspy
```

Supported Languages

multilspy currently supports the following languages:

| Code Language | Language Server |
| --- | --- |
| java | Eclipse JDTLS |
| python | jedi-language-server |
| rust | Rust Analyzer |
| csharp | OmniSharp / RazorSharp |
| typescript | TypeScriptLanguageServer |
| javascript | TypeScriptLanguageServer |
| go | gopls |
| dart | Dart |
| ruby | Solargraph |
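The entries in the Code Language column double as the code_language values accepted by MultilspyConfig (as the Usage example below shows). A minimal sketch that builds one configuration per supported language:

```python
from multilspy.multilspy_config import MultilspyConfig

# Strings mirror the "Code Language" column of the table above.
SUPPORTED_LANGUAGES = [
    "java", "python", "rust", "csharp",
    "typescript", "javascript", "go", "dart", "ruby",
]
configs = {
    lang: MultilspyConfig.from_dict({"code_language": lang})
    for lang in SUPPORTED_LANGUAGES
}
```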

Usage

Example usage:

```python
from multilspy import SyncLanguageServer
from multilspy.multilspy_config import MultilspyConfig
from multilspy.multilspy_logger import MultilspyLogger
...
config = MultilspyConfig.from_dict({"code_language": "java"})  # Also supports "python", "rust", "csharp", "typescript", "javascript", "go", "dart", "ruby"
logger = MultilspyLogger()
lsp = SyncLanguageServer.create(config, logger, "/abs/path/to/project/root/")
with lsp.start_server():
    result = lsp.request_definition(
        "relative/path/to/code_file.java",  # Filename of location where request is being made
        163,  # line number of symbol for which request is being made
        4,    # column number of symbol for which request is being made
    )
    result2 = lsp.request_completions(...)
    result3 = lsp.request_references(...)
    result4 = lsp.request_document_symbols(...)
    result5 = lsp.request_hover(...)
    ...
```
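The README does not spell out the return types; as an illustration only, assuming the results are LSP-style Location dictionaries (the field names below are assumptions; see src/multilspy/language_server.py for the actual types):

```python
# Continuing the example above: `result` holds definition locations.
for location in result:
    # Assumed LSP-style fields: a file path plus a start/end range.
    print(location.get("relativePath"), location.get("range"))
```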

multilspy also provides an asyncio-based API which can be used in async contexts. Example usage (asyncio):

```python
from multilspy import LanguageServer
...
lsp = LanguageServer.create(...)
async with lsp.start_server():
    result = await lsp.request_definition(...)
    ...
```
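For completeness, a self-contained runnable sketch of the async API, assuming the same configuration flow as the synchronous example (the project path and file location are placeholders):

```python
import asyncio

from multilspy import LanguageServer
from multilspy.multilspy_config import MultilspyConfig
from multilspy.multilspy_logger import MultilspyLogger

async def main() -> None:
    config = MultilspyConfig.from_dict({"code_language": "python"})
    logger = MultilspyLogger()
    lsp = LanguageServer.create(config, logger, "/abs/path/to/project/root/")
    async with lsp.start_server():
        # Definition of the symbol at line 10, column 4 of the given file.
        result = await lsp.request_definition("relative/path/to/code_file.py", 10, 4)
        print(result)

asyncio.run(main())
```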

The file src/multilspy/language_server.py provides the multilspy API. The tests under tests/multilspy/ provide detailed usage examples and can be executed by running:

```bash
pytest tests/multilspy
```

Use of multilspy in AI4Code Scenarios like Monitor-Guided Decoding

multilspy provides all the features that the Language Server Protocol provides to IDEs like VSCode. It is useful for developing toolsets that can interface with AI systems like Large Language Models (LLMs).

Monitor-Guided Decoding

One such use case is Monitor-Guided Decoding, where multilspy is used to obtain the results of static analyses, such as type-directed completions, to guide the token-by-token generation of code by an LLM, ensuring that all generated identifier/method names are valid in the context of the repository and significantly boosting the compilability of the generated code. MGD also demonstrates the use of multilspy to create monitors that ensure all function calls in LLM-generated code receive the correct number of arguments, and that the functions of an object are called in the right order, following a protocol (such as not calling "read" before "open" on a file object).
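As a conceptual sketch only, not the actual MGD implementation (the helper below is hypothetical), a monitor can steer decoding by masking the LM's next-token distribution to tokens consistent with statically valid continuations, e.g., identifiers obtained via request_completions:

```python
import torch

def monitor_guided_step(logits: torch.Tensor, valid_token_ids: list[int]) -> int:
    """Pick the next token, restricted to a statically derived allowed set.

    logits: (vocab_size,) next-token scores from the code LM.
    valid_token_ids: token ids the monitor permits at this step
    (e.g., tokenized type-correct dereference completions).
    """
    mask = torch.full_like(logits, float("-inf"))
    mask[valid_token_ids] = 0.0  # leave allowed tokens' scores untouched
    return int(torch.argmax(logits + mask))

# Hypothetical usage: allow only tokens 5, 42, and 101 at this step.
next_token = monitor_guided_step(torch.randn(32000), [5, 42, 101])
```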

Multilspy in other use cases

Frequently Asked Questions (FAQ)

asyncio-related RuntimeError when executing the tests for MGD

If you get the following error:

```
RuntimeError: Task <Task pending name='Task-2' coro=<_AsyncGeneratorContextManager.__aenter__() running at python3.8/contextlib.py:171> cb=[_chain_future.<locals>._call_set_state() at python3.8/asyncio/futures.py:367]> got Future <Future pending> attached to a different loop
python3.8/asyncio/locks.py:309: RuntimeError
```

Please ensure that you create a new environment with Python >=3.10. For further details, please have a look at the StackOverflow Discussion.
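An illustrative guard (not part of multilspy) that you can add to your own scripts to fail fast on older interpreters:

```python
import sys

# multilspy expects Python >= 3.10; raise a clear error early.
if sys.version_info < (3, 10):
    raise RuntimeError(f"Python >= 3.10 required, found {sys.version.split()[0]}")
```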

Citing Multilspy

If you're using Multilspy in your research or applications, please cite it using this BibTeX:

```bibtex
@inproceedings{NEURIPS2023_662b1774,
  author = {Agrawal, Lakshya A and Kanade, Aditya and Goyal, Navin and Lahiri, Shuvendu and Rajamani, Sriram},
  booktitle = {Advances in Neural Information Processing Systems},
  editor = {A. Oh and T. Naumann and A. Globerson and K. Saenko and M. Hardt and S. Levine},
  pages = {32270--32298},
  publisher = {Curran Associates, Inc.},
  title = {Monitor-Guided Decoding of Code LMs with Static Analysis of Repository Context},
  url = {https://proceedings.neurips.cc/paper_files/paper/2023/file/662b1774ba8845fc1fa3d1fc0177ceeb-Paper-Conference.pdf},
  volume = {36},
  year = {2023}
}
```

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.

Owner

  • Name: Jared
  • Login: jaredhancock31
  • Kind: user
  • Location: Austin, TX
  • Company: Cisco

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: >-
  Monitor-Guided Decoding of Code LMs with Static Analysis
  of Repository Context
message: >-
  If you use this repository, please cite it using the metadata
  from this file.
type: software
authors:
  - given-names: Lakshya A
    family-names: Agrawal
    email: t-lakagrawal@microsoft.com
    affiliation: Microsoft Research
    orcid: 'https://orcid.org/0000-0003-0409-8212'
  - given-names: Aditya
    family-names: Kanade
    email: kanadeaditya@microsoft.com
    affiliation: Microsoft Research
  - given-names: Navin
    family-names: Goyal
    email: navingo@microsoft.com
    affiliation: Microsoft Research
  - given-names: Shuvendu K.
    family-names: Lahiri
    email: shuvendu.lahiri@microsoft.com
    affiliation: Microsoft Research
  - given-names: Sriram K.
    family-names: Rajamani
    email: sriram@microsoft.com
    affiliation: Microsoft Research
identifiers:
  - type: doi
    value: 10.48550/arXiv.2306.10763
  - type: url
    value: >-
      https://openreview.net/forum?id=qPUbKxKvXq&noteId=98Ukj82fSP
abstract: >-
  Language models of code (LMs) work well when the
  surrounding code provides sufficient context. This is not
  true when it becomes necessary to use types, functionality
  or APIs defined elsewhere in the repository or a linked
  library, especially those not seen during training. LMs
  suffer from limited awareness of such global context and
  end up hallucinating.


  Integrated development environments (IDEs) assist
  developers in understanding repository context using
  static analysis. We extend this assistance, enjoyed by
  developers, to LMs. We propose monitor-guided decoding
  (MGD) where a monitor uses static analysis to guide the
  decoding. We construct a repository-level dataset
  PragmaticCode for method-completion in Java and evaluate
  MGD on it. On models of varying parameter scale, by
  monitoring for type-consistent object dereferences, MGD
  consistently improves compilation rates and agreement with
  ground truth. Further, LMs with fewer parameters, when
  augmented with MGD, can outperform larger LMs. With MGD,
  SantaCoder-1.1B achieves better compilation rate and
  next-identifier match than the much larger
  text-davinci-003 model.


  We also conduct a generalizability study to evaluate the
  ability of MGD to generalize to multiple programming
  languages (Java, C# and Rust), coding scenarios (e.g.,
  correct number of arguments to method calls), and to
  enforce richer semantic constraints (e.g., stateful API
  protocols). Our data and implementation are available at
  https://github.com/microsoft/monitors4codegen.
keywords:
  - program analysis
  - correctness
  - code generation
  - Language models

GitHub Events

Total
  • Push event: 1
  • Public event: 1
  • Pull request review comment event: 1
  • Pull request review event: 4
  • Pull request event: 1
  • Fork event: 1
  • Create event: 1
Last Year
  • Push event: 1
  • Public event: 1
  • Pull request review comment event: 1
  • Pull request review event: 4
  • Pull request event: 1
  • Fork event: 1
  • Create event: 1

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 121
  • Total Committers: 17
  • Avg Commits per committer: 7.118
  • Development Distribution Score (DDS): 0.719
Past Year
  • Commits: 111
  • Committers: 15
  • Avg Commits per committer: 7.4
  • Development Distribution Score (DDS): 0.73
Top Committers
| Name | Email | Commits |
| --- | --- | --- |
| Lakshya A Agrawal | l****l@b****u | 34 |
| Steve Brudz | s****z@d****o | 21 |
| themichaelusa | m****o@i****m | 17 |
| v4rgas | 6****s | 14 |
| Avi Avni | a****i@g****m | 9 |
| David Sounthiraraj | d****i@c****m | 5 |
| Microsoft Open Source | m****e | 5 |
| pratham1002 | p****a@d****i | 3 |
| Jet Zhou | j****t@j****m | 2 |
| Nasser Mohamed | n****t@g****m | 2 |
| nj.jo | j****v@g****m | 2 |
| Brian Rittermann | b****m@c****m | 2 |
| Aditya Kanade | a****e@g****m | 1 |
| Jason | j****3@g****m | 1 |
| Petro Ivaniuk | 1****a | 1 |
| microsoft-github-operations[bot] | 5****] | 1 |
| mrT23 | t****r@c****i | 1 |
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 7 months ago

All Time
  • Total issues: 0
  • Total pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: about 4 hours
  • Total issue authors: 0
  • Total pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: about 4 hours
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
  • pivanua (2)
Top Labels
Issue Labels
Pull Request Labels

Dependencies

.github/workflows/codeql.yml actions
  • actions/checkout v3 composite
  • github/codeql-action/analyze v2 composite
  • github/codeql-action/autobuild v2 composite
  • github/codeql-action/init v2 composite
.github/workflows/publish-to-pypi.yaml actions
  • actions/checkout v4 composite
  • actions/download-artifact v4 composite
  • actions/setup-go v5 composite
  • actions/setup-python v5 composite
  • actions/upload-artifact v4 composite
  • pypa/gh-action-pypi-publish release/v1 composite
  • ruby/setup-ruby v1 composite
  • sigstore/gh-action-sigstore-python v3.0.0 composite
.github/workflows/test-workflow.yaml actions
  • actions/checkout v4 composite
  • actions/setup-go v5 composite
  • actions/setup-python v5 composite
  • ruby/setup-ruby v1 composite
pyproject.toml pypi
  • jedi-language-server ==0.41.1
  • psutil (>=7.0.0,<8.0.0)
  • requests ==2.32.3
  • typing-extensions >=4.2.0
requirements.txt pypi
  • jedi-language-server ==0.41.1
  • psutil >=7.0.0,<8.0.0
  • pytest ==7.3.1
  • pytest-asyncio ==0.21.1
  • requests ==2.32.3
  • typing-extensions >=4.2.0