Science Score: 64.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
    1 of 17 committers (5.9%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.9%) to scientific vocabulary

Keywords from Contributors

large-language-model ai4code code-analysis code-completion code-generation codegen huggingface-transformers language-server-client language-server-protocol lsp
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: jaredhancock31
  • License: mit
  • Language: Python
  • Default Branch: main
  • Size: 352 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Created 10 months ago · Last pushed 8 months ago
Metadata Files
  • Readme
  • License
  • Code of conduct
  • Citation
  • Security
  • Support

README.md

Multilspy: LSP client library in Python to build applications around language servers

Introduction

This repository hosts multilspy, a library developed as part of research for the NeurIPS 2023 paper "Monitor-Guided Decoding of Code LMs with Static Analysis of Repository Context" (titled "Guiding Language Models of Code with Global Context using Monitors" on arXiv). The paper introduces Monitor-Guided Decoding (MGD) for code generation with Language Models, where a monitor uses static analysis to guide the decoding, ensuring that the generated code satisfies various correctness properties, such as the absence of hallucinated symbol names and a valid order of method calls. For further details about Monitor-Guided Decoding, please refer to the paper and the GitHub repository microsoft/monitors4codegen.

multilspy is a cross-platform library designed to simplify the process of creating language-server clients to query and obtain the results of various static analyses from a wide variety of language servers that communicate over the Language Server Protocol. It is easily extensible to support any language that has a language server, and we aim to continuously add support for more language servers and languages.

Language servers are tools that perform a variety of static analyses on code repositories and provide useful information such as type-directed code completion suggestions, symbol definition locations, symbol references, etc., over the Language Server Protocol (LSP). Since LSP is language-agnostic, multilspy can provide the results for static analyses of code in different languages over a common interface.

multilspy intends to ease the process of using language servers by handling the various steps involved:
  • Automatically downloading platform-specific server binaries and handling setup/teardown of language servers
  • Handling JSON-RPC based communication between the client and the server
  • Maintaining and passing hand-tuned, server- and language-specific configuration parameters
  • Providing a simple API to the user while executing all steps of the server-specific protocol for each query/request

Some of the analysis results that multilspy can provide (a sketch of how these map to the API follows this list):
  • Finding the definition of a function or a class (textDocument/definition)
  • Finding the callers of a function or the instantiations of a class (textDocument/references)
  • Providing type-based dereference completions (textDocument/completion)
  • Getting the information displayed when hovering over symbols, such as a method signature (textDocument/hover)
  • Getting a list/tree of all symbols defined in a given file, along with symbol types like class, method, etc. (textDocument/documentSymbol)
  • Please create an issue/PR to request any other LSP request not listed above
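As a hedged sketch, here is how the raw LSP method names above map onto multilspy's request API; the method names come from the Usage section below, and the parameter lists are assumed to mirror request_definition:

```python
from multilspy import SyncLanguageServer
from multilspy.multilspy_config import MultilspyConfig
from multilspy.multilspy_logger import MultilspyLogger

config = MultilspyConfig.from_dict({"code_language": "python"})
lsp = SyncLanguageServer.create(config, MultilspyLogger(), "/abs/path/to/project/root/")

with lsp.start_server():
    # Placeholder location: line 10, column 4 of the given file.
    path, line, column = "relative/path/to/code_file.py", 10, 4
    definitions = lsp.request_definition(path, line, column)    # textDocument/definition
    references = lsp.request_references(path, line, column)     # textDocument/references
    completions = lsp.request_completions(path, line, column)   # textDocument/completion
    hover = lsp.request_hover(path, line, column)                # textDocument/hover
    symbols = lsp.request_document_symbols(path)                 # textDocument/documentSymbol
```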

Installation

It is recommended to create a new virtual environment with python>=3.10. To create a virtual environment using conda and activate it:

```bash
conda create -n multilspy_env python=3.10
conda activate multilspy_env
```

Further details and instructions on the creation of Python virtual environments can be found in the official documentation. We also refer users to Miniconda as an alternative to the above steps for creating the virtual environment.

To install multilspy using pip, execute the following command:

```bash
pip install multilspy
```

Supported Languages

multilspy currently supports the following languages:

| Code Language | Language Server |
| --- | --- |
| java | Eclipse JDTLS |
| python | jedi-language-server |
| rust | Rust Analyzer |
| csharp | OmniSharp / RazorSharp |
| typescript | TypeScriptLanguageServer |
| javascript | TypeScriptLanguageServer |
| go | gopls |
| dart | Dart |
| ruby | Solargraph |
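The entries in the Code Language column double as the code_language values accepted by MultilspyConfig (as the Usage example below shows). A minimal sketch that builds one configuration per supported language:

```python
from multilspy.multilspy_config import MultilspyConfig

# Strings mirror the "Code Language" column of the table above.
SUPPORTED_LANGUAGES = [
    "java", "python", "rust", "csharp",
    "typescript", "javascript", "go", "dart", "ruby",
]
configs = {
    lang: MultilspyConfig.from_dict({"code_language": lang})
    for lang in SUPPORTED_LANGUAGES
}
```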

Usage

Example usage:

```python
from multilspy import SyncLanguageServer
from multilspy.multilspy_config import MultilspyConfig
from multilspy.multilspy_logger import MultilspyLogger
...
config = MultilspyConfig.from_dict({"code_language": "java"})  # Also supports "python", "rust", "csharp", "typescript", "javascript", "go", "dart", "ruby"
logger = MultilspyLogger()
lsp = SyncLanguageServer.create(config, logger, "/abs/path/to/project/root/")
with lsp.start_server():
    result = lsp.request_definition(
        "relative/path/to/code_file.java",  # Filename of location where request is being made
        163,  # line number of symbol for which request is being made
        4,    # column number of symbol for which request is being made
    )
    result2 = lsp.request_completions(...)
    result3 = lsp.request_references(...)
    result4 = lsp.request_document_symbols(...)
    result5 = lsp.request_hover(...)
    ...
```
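The README does not spell out the return types; as an illustration only, assuming the results are LSP-style Location dictionaries (the field names below are assumptions; see src/multilspy/language_server.py for the actual types):

```python
# Continuing the example above: `result` holds definition locations.
for location in result:
    # Assumed LSP-style fields: a file path plus a start/end range.
    print(location.get("relativePath"), location.get("range"))
```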

multilspy also provides an asyncio-based API which can be used in async contexts. Example usage (asyncio):

```python
from multilspy import LanguageServer
...
lsp = LanguageServer.create(...)
async with lsp.start_server():
    result = await lsp.request_definition(...)
    ...
```
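For completeness, a self-contained runnable sketch of the async API, assuming the same configuration flow as the synchronous example (the project path and file location are placeholders):

```python
import asyncio

from multilspy import LanguageServer
from multilspy.multilspy_config import MultilspyConfig
from multilspy.multilspy_logger import MultilspyLogger

async def main() -> None:
    config = MultilspyConfig.from_dict({"code_language": "python"})
    logger = MultilspyLogger()
    lsp = LanguageServer.create(config, logger, "/abs/path/to/project/root/")
    async with lsp.start_server():
        # Definition of the symbol at line 10, column 4 of the given file.
        result = await lsp.request_definition("relative/path/to/code_file.py", 10, 4)
        print(result)

asyncio.run(main())
```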

The file src/multilspy/language_server.py provides the multilspy API. The tests under tests/multilspy/ provide detailed usage examples and can be executed by running:

```bash
pytest tests/multilspy
```

Use of multilspy in AI4Code Scenarios like Monitor-Guided Decoding

multilspy provides all the features that the Language Server Protocol provides to IDEs like VSCode. It is useful for developing toolsets that can interface with AI systems like Large Language Models (LLMs).

Monitor-Guided Decoding

One such use case is Monitor-Guided Decoding, where multilspy is used to obtain the results of static analyses, such as type-directed completions, to guide the token-by-token generation of code by an LLM, ensuring that all generated identifier/method names are valid in the context of the repository and significantly boosting the compilability of the generated code. MGD also demonstrates the use of multilspy to create monitors that ensure all function calls in LLM-generated code receive the correct number of arguments, and that the functions of an object are called in the right order, following a protocol (such as not calling "read" before "open" on a file object).
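As a conceptual sketch only, not the actual MGD implementation (the helper below is hypothetical), a monitor can steer decoding by masking the LM's next-token distribution to tokens consistent with statically valid continuations, e.g., identifiers obtained via request_completions:

```python
import torch

def monitor_guided_step(logits: torch.Tensor, valid_token_ids: list[int]) -> int:
    """Pick the next token, restricted to a statically derived allowed set.

    logits: (vocab_size,) next-token scores from the code LM.
    valid_token_ids: token ids the monitor permits at this step
    (e.g., tokenized type-correct dereference completions).
    """
    mask = torch.full_like(logits, float("-inf"))
    mask[valid_token_ids] = 0.0  # leave allowed tokens' scores untouched
    return int(torch.argmax(logits + mask))

# Hypothetical usage: allow only tokens 5, 42, and 101 at this step.
next_token = monitor_guided_step(torch.randn(32000), [5, 42, 101])
```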

Multilspy in other use cases

Frequently Asked Questions (FAQ)

asyncio-related RuntimeError when executing the tests for MGD

If you get the following error:

```
RuntimeError: Task <Task pending name='Task-2' coro=<_AsyncGeneratorContextManager.__aenter__() running at python3.8/contextlib.py:171> cb=[_chain_future.<locals>._call_set_state() at python3.8/asyncio/futures.py:367]> got Future <Future pending> attached to a different loop
python3.8/asyncio/locks.py:309: RuntimeError
```

Please ensure that you create a new environment with Python >=3.10. For further details, please have a look at the StackOverflow Discussion.
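An illustrative guard (not part of multilspy) that you can add to your own scripts to fail fast on older interpreters:

```python
import sys

# multilspy expects Python >= 3.10; raise a clear error early.
if sys.version_info < (3, 10):
    raise RuntimeError(f"Python >= 3.10 required, found {sys.version.split()[0]}")
```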

Citing Multilspy

If you're using Multilspy in your research or applications, please cite it using this BibTeX:

```bibtex
@inproceedings{NEURIPS2023_662b1774,
  author = {Agrawal, Lakshya A and Kanade, Aditya and Goyal, Navin and Lahiri, Shuvendu and Rajamani, Sriram},
  booktitle = {Advances in Neural Information Processing Systems},
  editor = {A. Oh and T. Naumann and A. Globerson and K. Saenko and M. Hardt and S. Levine},
  pages = {32270--32298},
  publisher = {Curran Associates, Inc.},
  title = {Monitor-Guided Decoding of Code LMs with Static Analysis of Repository Context},
  url = {https://proceedings.neurips.cc/paper_files/paper/2023/file/662b1774ba8845fc1fa3d1fc0177ceeb-Paper-Conference.pdf},
  volume = {36},
  year = {2023}
}
```

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.

Owner

  • Name: Jared
  • Login: jaredhancock31
  • Kind: user
  • Location: Austin, TX
  • Company: Cisco

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: >-
  Monitor-Guided Decoding of Code LMs with Static Analysis
  of Repository Context
message: >-
  If you use this repository, please cite it using the metadata
  from this file.
type: software
authors:
  - given-names: Lakshya A
    family-names: Agrawal
    email: t-lakagrawal@microsoft.com
    affiliation: Microsoft Research
    orcid: 'https://orcid.org/0000-0003-0409-8212'
  - given-names: Aditya
    family-names: Kanade
    email: kanadeaditya@microsoft.com
    affiliation: Microsoft Research
  - given-names: Navin
    family-names: Goyal
    email: navingo@microsoft.com
    affiliation: Microsoft Research
  - given-names: Shuvendu K.
    family-names: Lahiri
    email: shuvendu.lahiri@microsoft.com
    affiliation: Microsoft Research
  - given-names: Sriram K.
    family-names: Rajamani
    email: sriram@microsoft.com
    affiliation: Microsoft Research
identifiers:
  - type: doi
    value: 10.48550/arXiv.2306.10763
  - type: url
    value: >-
      https://openreview.net/forum?id=qPUbKxKvXq&noteId=98Ukj82fSP
abstract: >-
  Language models of code (LMs) work well when the
  surrounding code provides sufficient context. This is not
  true when it becomes necessary to use types, functionality
  or APIs defined elsewhere in the repository or a linked
  library, especially those not seen during training. LMs
  suffer from limited awareness of such global context and
  end up hallucinating.


  Integrated development environments (IDEs) assist
  developers in understanding repository context using
  static analysis. We extend this assistance, enjoyed by
  developers, to LMs. We propose monitor-guided decoding
  (MGD) where a monitor uses static analysis to guide the
  decoding. We construct a repository-level dataset
  PragmaticCode for method-completion in Java and evaluate
  MGD on it. On models of varying parameter scale, by
  monitoring for type-consistent object dereferences, MGD
  consistently improves compilation rates and agreement with
  ground truth. Further, LMs with fewer parameters, when
  augmented with MGD, can outperform larger LMs. With MGD,
  SantaCoder-1.1B achieves better compilation rate and
  next-identifier match than the much larger
  text-davinci-003 model.


  We also conduct a generalizability study to evaluate the
  ability of MGD to generalize to multiple programming
  languages (Java, C# and Rust), coding scenarios (e.g.,
  correct number of arguments to method calls), and to
  enforce richer semantic constraints (e.g., stateful API
  protocols). Our data and implementation are available at
  https://github.com/microsoft/monitors4codegen.
keywords:
  - program analysis
  - correctness
  - code generation
  - Language models

GitHub Events

Total
  • Push event: 1
  • Public event: 1
  • Pull request review comment event: 1
  • Pull request review event: 4
  • Pull request event: 1
  • Fork event: 1
  • Create event: 1
Last Year
  • Push event: 1
  • Public event: 1
  • Pull request review comment event: 1
  • Pull request review event: 4
  • Pull request event: 1
  • Fork event: 1
  • Create event: 1

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 121
  • Total Committers: 17
  • Avg Commits per committer: 7.118
  • Development Distribution Score (DDS): 0.719
Past Year
  • Commits: 111
  • Committers: 15
  • Avg Commits per committer: 7.4
  • Development Distribution Score (DDS): 0.73
Top Committers
| Name | Email | Commits |
| --- | --- | --- |
| Lakshya A Agrawal | l****l@b****u | 34 |
| Steve Brudz | s****z@d****o | 21 |
| themichaelusa | m****o@i****m | 17 |
| v4rgas | 6****s | 14 |
| Avi Avni | a****i@g****m | 9 |
| David Sounthiraraj | d****i@c****m | 5 |
| Microsoft Open Source | m****e | 5 |
| pratham1002 | p****a@d****i | 3 |
| Jet Zhou | j****t@j****m | 2 |
| Nasser Mohamed | n****t@g****m | 2 |
| nj.jo | j****v@g****m | 2 |
| Brian Rittermann | b****m@c****m | 2 |
| Aditya Kanade | a****e@g****m | 1 |
| Jason | j****3@g****m | 1 |
| Petro Ivaniuk | 1****a | 1 |
| microsoft-github-operations[bot] | 5****] | 1 |
| mrT23 | t****r@c****i | 1 |
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 7 months ago

All Time
  • Total issues: 0
  • Total pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: about 4 hours
  • Total issue authors: 0
  • Total pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: about 4 hours
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
  • pivanua (2)
Top Labels
Issue Labels
Pull Request Labels

Dependencies

.github/workflows/codeql.yml actions
  • actions/checkout v3 composite
  • github/codeql-action/analyze v2 composite
  • github/codeql-action/autobuild v2 composite
  • github/codeql-action/init v2 composite
.github/workflows/publish-to-pypi.yaml actions
  • actions/checkout v4 composite
  • actions/download-artifact v4 composite
  • actions/setup-go v5 composite
  • actions/setup-python v5 composite
  • actions/upload-artifact v4 composite
  • pypa/gh-action-pypi-publish release/v1 composite
  • ruby/setup-ruby v1 composite
  • sigstore/gh-action-sigstore-python v3.0.0 composite
.github/workflows/test-workflow.yaml actions
  • actions/checkout v4 composite
  • actions/setup-go v5 composite
  • actions/setup-python v5 composite
  • ruby/setup-ruby v1 composite
pyproject.toml pypi
  • jedi-language-server ==0.41.1
  • psutil (>=7.0.0,<8.0.0)
  • requests ==2.32.3
  • typing-extensions >=4.2.0
requirements.txt pypi
  • jedi-language-server ==0.41.1
  • psutil >=7.0.0,<8.0.0
  • pytest ==7.3.1
  • pytest-asyncio ==0.21.1
  • requests ==2.32.3
  • typing-extensions >=4.2.0