Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.0%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: stuartpearce-hg
  • License: mit
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 28.3 MB
Statistics
  • Stars: 2
  • Watchers: 2
  • Forks: 2
  • Open Issues: 1
  • Releases: 0
Created over 2 years ago · Last pushed over 1 year ago
Metadata Files
Readme License Citation

README.md

🤔 What is this, how does it work?

This tooling aims to enhance the quality and accuracy of the responses Gen AI LLMs provide when asked to support software engineers refactoring enterprise codebases. It expands the context made available to the LLM, using retrieval augmented generation to select the code most relevant to an engineer's prompt.

These tools utilise LangChain (https://python.langchain.com/docs/get_started/introduction) so that source code can be indexed for use with a range of LLMs. You are encouraged to adapt and enhance the tools for specific use cases and, where possible, to contribute enhancements back to the community. Please refer to the LICENSE file for details of the MIT license in use.
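To illustrate the retrieval step, here is a toy Python sketch. It is a simplification under stated assumptions, not the project's code: it scores code chunks by word-count cosine similarity instead of the learned embeddings and vector store (Chroma or FAISS, both listed in requirements.txt) that the real tooling reaches through LangChain, and the helper names `tokenize`, `retrieve` and `build_prompt` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Split camelCase/PascalCase identifiers and words into lowercase tokens.
    return [t.lower() for t in re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", text)]


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words count vectors.
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: dict[str, str], k: int = 2) -> list[str]:
    # Rank every chunk by similarity to the question and keep the top k names.
    query = Counter(tokenize(question))
    return sorted(
        chunks,
        key=lambda name: cosine(query, Counter(tokenize(chunks[name]))),
        reverse=True,
    )[:k]


def build_prompt(question: str, chunks: dict[str, str], k: int = 2) -> str:
    # Prepend the retrieved code to the engineer's question as extra context.
    context = "\n\n".join(f"# {name}\n{chunks[name]}" for name in retrieve(question, chunks, k))
    return f"Relevant code:\n{context}\n\nQuestion: {question}"


# Hypothetical example chunks from a C# codebase.
chunks = {
    "Billing.cs": "public decimal CalculateInvoiceTotal(Invoice invoice) { return invoice.Lines.Sum(l => l.Amount); }",
    "Auth.cs": "public bool ValidatePassword(User user, string password) { return user.Hash == Hash(password); }",
    "Logging.cs": "public void WriteLog(string message) { Console.WriteLine(message); }",
}
print(build_prompt("How is the invoice total calculated?", chunks, k=1))
```

Only the highest-scoring chunks reach the prompt, so the LLM sees billing code rather than unrelated authentication or logging code.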

Usage

Set configuration values
Rename .env.example to .env and populate it with configuration values for your LLM provider. Hg currently recommends using Azure OpenAI Service.
OPENAI_API_BASE = URL of your LLM account
OPENAI_API_KEY = Your API key
OPENAI_API_TYPE = "azure"
OPENAI_API_VERSION = Service API version to be used
OPENAI_API_DEPLOYMENT_NAME = Name of your deployed LLM
OPENAI_API_EMBEDDINGS_NAME = Name of your deployed embedding engine
REPOSITORY_DIRECTORY = Path to source code for analysis
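As a sketch of how these settings might be sanity-checked before running the tools, the stdlib-only parser below reads .env-style text and reports any missing keys. It assumes the usual underscore-separated spellings of the key names (OPENAI_API_BASE and so on); `parse_env` and `missing_keys` are hypothetical helpers, not part of the repository. In practice python-dotenv, listed in requirements.txt, can load the file into the environment with `load_dotenv()`, after which `missing_keys(os.environ)` would perform the same check.

```python
# Keys from the README, assuming the usual underscore-separated spellings.
REQUIRED_KEYS = [
    "OPENAI_API_BASE",
    "OPENAI_API_KEY",
    "OPENAI_API_TYPE",
    "OPENAI_API_VERSION",
    "OPENAI_API_DEPLOYMENT_NAME",
    "OPENAI_API_EMBEDDINGS_NAME",
    "REPOSITORY_DIRECTORY",
]


def parse_env(text: str) -> dict[str, str]:
    # Minimal KEY=VALUE parser: skips blank lines and comments, trims quotes.
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")
    return values


def missing_keys(values, required=REQUIRED_KEYS) -> list[str]:
    # Report required settings that are absent or empty.
    return [key for key in required if not values.get(key)]


example = 'OPENAI_API_TYPE = "azure"\nOPENAI_API_KEY = abc123\n# comment\n'
print(missing_keys(parse_env(example)))
```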

Dependency setup
With pip: pip install openai -r requirements.txt

Build a vector database of the code
Edit line 37 of build.py to reference the file extensions relevant to your codebase, e.g. for C# set suffixes=['.cs', '.csproj', '.sln']. Filtering greatly improves performance by excluding content that is not relevant to queries from the model context. Then run build.py.
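The effect of the suffix filter can be sketched with `pathlib`. This is a hypothetical stand-in for whatever loader build.py actually passes `suffixes` to: `collect_source_files` walks a directory such as REPOSITORY_DIRECTORY recursively and keeps only files whose extension is in the list.

```python
from pathlib import Path


def collect_source_files(root: str, suffixes: list[str]) -> list[Path]:
    # Recursively keep only files whose extension is in `suffixes` (case-insensitive).
    wanted = {s.lower() for s in suffixes}
    return sorted(
        path
        for path in Path(root).rglob("*")
        if path.is_file() and path.suffix.lower() in wanted
    )


# Hypothetical usage with the C# suffixes from the README:
# collect_source_files("/path/to/repo", [".cs", ".csproj", ".sln"])
```

In this sketch, files outside the list, such as documentation or compiled binaries, are never passed on for embedding.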

Query the code
Uncomment lines 51-55 of query.py and adjust the prompt to provide base context on the nature of the code and application you are working with. This can help focus the range of suggestions made by the LLM. Run query.py to pair with the LLM, interrogating the codebase and exploring options to enhance your prompt; enter "quit" when finished.
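The interactive loop can be pictured as follows. This is a minimal sketch, not query.py itself: `ask` stands in for the LangChain chain that retrieves code and calls the LLM, and `BASE_CONTEXT` is a hypothetical example of the base context you might uncomment and adapt.

```python
from typing import Callable, Iterable

# Hypothetical base context, standing in for the lines you uncomment in query.py.
BASE_CONTEXT = "You are helping engineers refactor a C# ASP.NET web application."


def chat_loop(
    ask: Callable[[str], str],
    inputs: Iterable[str],
    base_context: str = BASE_CONTEXT,
) -> list[str]:
    # Send each question, prefixed with the base context, until "quit" is entered.
    answers = []
    for question in inputs:
        if question.strip().lower() == "quit":
            break
        answer = ask(f"{base_context}\n\n{question}")
        print(answer)
        answers.append(answer)
    return answers
```

Interactively, `inputs` could be a generator that repeatedly yields `input("Prompt: ")`; passing a plain list instead makes the loop easy to exercise without an LLM.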

With large codebases, LLM responses can degrade when too much or too little context is supplied alongside the prompt. Adjusting the code retrieval parameters on line 41, search_kwargs={"k": 20, "fetch_k": 50}, can improve LLM performance by ensuring that enough context is provided without including unrelated code.
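In LangChain, `fetch_k` is the size of the candidate pool used by the maximal marginal relevance (MMR) search type: `fetch_k` chunks are fetched by similarity, then `k` of them are chosen to balance relevance against redundancy. Whether query.py uses MMR is an assumption here. The pure-Python sketch below implements the MMR selection rule on small vectors; `mmr_search` is a hypothetical name, and `lambda_mult=0.5` mirrors LangChain's default weighting.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Cosine similarity between two dense vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_search(
    query: Sequence[float],
    vectors: Sequence[Sequence[float]],
    k: int = 20,
    fetch_k: int = 50,
    lambda_mult: float = 0.5,
) -> list[int]:
    # 1. Candidate pool: indices of the fetch_k vectors most similar to the query.
    ranked = sorted(range(len(vectors)), key=lambda i: cosine(query, vectors[i]), reverse=True)
    candidates = ranked[:fetch_k]
    selected: list[int] = []

    def mmr_score(i: int) -> float:
        # Reward relevance to the query, penalise similarity to chunks already chosen.
        relevance = cosine(query, vectors[i])
        redundancy = max((cosine(vectors[i], vectors[j]) for j in selected), default=0.0)
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    # 2. Greedily pick up to k candidates with the best MMR score.
    while candidates and len(selected) < k:
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 1.0]
embeddings = [[1.0, 0.9], [1.0, 0.9], [0.7, 1.0], [1.0, -1.0]]  # chunk 1 duplicates chunk 0
print(mmr_search(query, embeddings, k=2, fetch_k=3))
```

Lowering `lambda_mult` favours diversity; at 1.0 the rule reduces to ranking by similarity alone, which would return the duplicate chunk and waste part of the context budget.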

Agent tooling is under development. This tooling uses CrewAI to orchestrate agent-based resolution of tasks within a defined process. Agents can utilise tools that require external setup for GitHub and Jira.

See https://python.langchain.com/docs/integrations/toolkits/github and https://python.langchain.com/docs/integrations/toolkits/jira for setup instructions.

Owner

  • Login: stuartpearce-hg
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Chase"
  given-names: "Harrison"
title: "LangChain"
date-released: 2022-10-17
url: "https://github.com/langchain-ai/langchain"

GitHub Events

Total
  • Issue comment event: 4
  • Push event: 15
  • Pull request review event: 1
  • Pull request review comment event: 2
  • Pull request event: 3
  • Create event: 3
Last Year
  • Issue comment event: 4
  • Push event: 15
  • Pull request review event: 1
  • Pull request review comment event: 2
  • Pull request event: 3
  • Create event: 3

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 0
  • Total pull requests: 15
  • Average time to close issues: N/A
  • Average time to close pull requests: 6 days
  • Total issue authors: 0
  • Total pull request authors: 3
  • Average comments per issue: 0
  • Average comments per pull request: 0.2
  • Merged pull requests: 10
  • Bot issues: 0
  • Bot pull requests: 8
Past Year
  • Issues: 0
  • Pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 1
Top Authors
Issue Authors
Pull Request Authors
  • stuartpearce-hg (13)
  • dependabot[bot] (11)
  • devin-ai-integration[bot] (1)
Top Labels
Issue Labels
Pull Request Labels
dependencies (11)

Dependencies

requirements.txt pypi
  • PyPika ==0.48.9
  • PyYAML ==6.0.1
  • SQLAlchemy ==2.0.21
  • aiohttp ==3.9.0
  • aiosignal ==1.3.1
  • annotated-types ==0.5.0
  • anyio ==3.7.1
  • async-timeout ==4.0.3
  • attrs ==23.1.0
  • backoff ==2.2.1
  • bcrypt ==4.0.1
  • beautifulsoup4 ==4.12.2
  • certifi ==2023.7.22
  • chardet ==5.2.0
  • charset-normalizer ==3.3.0
  • chroma-hnswlib ==0.7.3
  • chromadb ==0.4.13
  • click ==8.1.7
  • coloredlogs ==15.0.1
  • dataclasses-json ==0.6.1
  • emoji ==2.8.0
  • esprima ==4.0.1
  • exceptiongroup ==1.1.3
  • faiss-cpu ==1.7.4
  • fastapi ==0.103.2
  • filelock ==3.12.4
  • filetype ==1.2.0
  • flatbuffers ==23.5.26
  • frozenlist ==1.4.0
  • fsspec ==2023.9.2
  • greenlet ==3.0.0
  • h11 ==0.14.0
  • httptools ==0.6.0
  • huggingface-hub ==0.16.4
  • humanfriendly ==10.0
  • idna ==3.4
  • importlib-resources ==6.1.0
  • iniconfig ==2.0.0
  • joblib ==1.3.2
  • jsonpatch ==1.33
  • jsonpointer ==2.4
  • langchain ==0.0.329
  • langdetect ==1.0.9
  • langsmith ==0.0.42
  • lxml ==4.9.3
  • marshmallow ==3.20.1
  • monotonic ==1.6
  • mpmath ==1.3.0
  • multidict ==6.0.4
  • mypy-extensions ==1.0.0
  • nltk ==3.8.1
  • numpy ==1.26.0
  • onnxruntime ==1.16.0
  • openai ==0.28.1
  • overrides ==7.4.0
  • packaging ==23.2
  • phply *
  • playwright ==1.38.0
  • pluggy ==1.3.0
  • posthog ==3.0.2
  • protobuf ==4.24.4
  • pulsar-client ==3.3.0
  • pydantic ==2.4.2
  • pydantic_core ==2.10.1
  • pyee ==9.0.4
  • pytest ==7.4.2
  • pytest-asyncio ==0.21.1
  • pytest-base-url ==2.0.0
  • pytest-playwright ==0.4.3
  • python-dateutil ==2.8.2
  • python-dotenv ==1.0.0
  • python-iso639 ==2023.6.15
  • python-magic ==0.4.27
  • python-slugify ==8.0.1
  • regex ==2023.10.3
  • requests ==2.31.0
  • rich ==13.6.0
  • six ==1.16.0
  • sniffio ==1.3.0
  • soupsieve ==2.5
  • starlette ==0.27.0
  • sympy ==1.12
  • tabulate ==0.9.0
  • tenacity ==8.2.3
  • text-unidecode ==1.3
  • tiktoken ==0.5.1
  • tokenizers ==0.14.0
  • tomli ==2.0.1
  • tqdm ==4.66.1
  • typer ==0.9.0
  • typing-inspect ==0.9.0
  • typing_extensions ==4.8.0
  • unstructured ==0.10.19
  • urllib3 ==2.0.7
  • uvicorn ==0.23.2
  • watchfiles ==0.20.0
  • websockets ==11.0.3
  • yarl ==1.9.2