hg-genai-workshop

https://github.com/stuartpearce-hg/hg-genai-workshop

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (14.0%) to scientific vocabulary

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: stuartpearce-hg
License: mit
Language: Jupyter Notebook
Default Branch: main
Size: 28.3 MB

Statistics

Stars: 2
Watchers: 2
Forks: 2
Open Issues: 1
Releases: 0

Created over 2 years ago · Last pushed over 1 year ago

Metadata Files

Readme, License, Citation

🤔 What is this, how does it work?

This tooling aims to improve the quality and accuracy of the responses that generative AI LLMs provide when supporting software engineers who are refactoring enterprise codebases. It does this by using retrieval-augmented generation (RAG) to expand the context available to the LLM, selecting the code most relevant to an engineer's prompt.

These tools use LangChain (https://python.langchain.com/docs/get_started/introduction) so that source code can be indexed for use with a range of LLMs. You are encouraged to adapt and enhance the tools for specific use cases and, where possible, to contribute enhancements back to the community. Please refer to the LICENSE file for details of the MIT license in use.

Usage

Set configuration values
Rename .env.example to .env and populate it with configuration values for your LLM provider. Hg currently recommend the use of Azure OpenAI Services.
OPENAI_API_BASE = URL to your LLM account
OPENAI_API_KEY = Your API Key
OPENAI_API_TYPE = "azure"
OPENAI_API_VERSION = Service API version to be used
OPENAI_API_DEPLOYMENT_NAME = Name of your deployed LLM
OPENAI_API_EMBEDDINGS_NAME = Name of your deployed embedding engine
REPOSITORY_DIRECTORY = Path to source code for analysis
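
Before running the scripts, it can help to check that every setting is present. The following is a minimal sketch, not code from this repository: the helper `missing_settings` is hypothetical, and the underscore-separated names (for example `OPENAI_API_BASE`) are an assumption based on the usual OpenAI SDK convention.

```python
import os

# Setting names assumed from the list above, in underscore-separated form.
REQUIRED_SETTINGS = [
    "OPENAI_API_BASE",
    "OPENAI_API_KEY",
    "OPENAI_API_TYPE",
    "OPENAI_API_VERSION",
    "OPENAI_API_DEPLOYMENT_NAME",
    "OPENAI_API_EMBEDDINGS_NAME",
    "REPOSITORY_DIRECTORY",
]


def missing_settings(env=None):
    """Return the required setting names that are absent or empty in env."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]
```

With python-dotenv installed (it is pinned in requirements.txt), calling `load_dotenv()` first copies the values from .env into `os.environ`, after which `missing_settings()` should return an empty list.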

Dependency setup
With pip: pip install openai -r requirements.txt

Build vector database of code
Edit line 37 of build.py to reference the file extensions relevant to the codebase, e.g. for C# set suffixes=['.cs', '.csproj', '.sln']. Filtering greatly improves performance by eliminating content that is not relevant to queries from the model context. Then run build.py.
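
To illustrate the effect of suffix filtering, here is a small standard-library sketch. It is not the code in build.py; the function name `select_source_files` is hypothetical and the matching is assumed to be case-insensitive.

```python
from pathlib import Path


def select_source_files(root, suffixes):
    """Return files under root whose extension is in suffixes, sorted by path."""
    wanted = {suffix.lower() for suffix in suffixes}
    return sorted(
        path
        for path in Path(root).rglob("*")
        if path.is_file() and path.suffix.lower() in wanted
    )
```

For a C# codebase, `select_source_files(repo_dir, ['.cs', '.csproj', '.sln'])` would skip binaries, images and documentation, so only relevant source files reach the index.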

Query the code
Uncomment lines 51-55 of query.py and adjust the prompt to provide base context about the nature of the code and application you are working with. This can help focus the range of suggestions made by the LLM. Run query.py to pair with the LLM, interrogating the codebase and exploring options to enhance it. Enter "quit" at the prompt when finished.
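
The interactive session described above can be pictured as a simple read-answer loop. This is a sketch under assumptions, not the contents of query.py: `answer_fn` is a hypothetical stand-in for whatever retrieval chain produces the answer, and treating "quit" case-insensitively is also an assumption.

```python
def run_session(answer_fn, read_line=input, write_line=print):
    """Answer questions until the user enters "quit"; return how many were asked."""
    asked = 0
    while True:
        question = read_line("> ").strip()
        if question.lower() == "quit":
            break
        if not question:
            continue  # ignore empty input rather than querying the LLM
        write_line(answer_fn(question))
        asked += 1
    return asked
```

Passing `read_line` and `write_line` as parameters keeps the loop easy to exercise without a terminal or a live LLM.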

Large codebases can suffer degraded performance when too much or too little context is made available to the LLM alongside the prompt inputs. Adjusting the code retrieval parameters on line 41, search_kwargs={"k": 20, "fetch_k": 50}, can improve LLM performance by ensuring that sufficient context is provided, but not so much that unrelated code is included.
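
To make the two retrieval parameters concrete, the sketch below shows one common way they interact. It assumes the retriever uses maximal marginal relevance (MMR) search, which is where a fetch size typically applies in LangChain; the README does not confirm this. The function is a simplified illustration in plain Python, not LangChain's implementation, and `lambda_mult` is an assumed relevance-versus-diversity weight.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(query, candidates, k=20, fetch_k=50, lambda_mult=0.5):
    """Return indices of up to k candidates chosen from the fetch_k most similar."""
    # Stage 1: keep only the fetch_k candidates closest to the query.
    pool = sorted(
        range(len(candidates)),
        key=lambda i: cosine(query, candidates[i]),
        reverse=True,
    )[:fetch_k]

    # Stage 2: greedily pick k items, trading relevance against redundancy.
    selected = []
    while pool and len(selected) < k:
        def score(i):
            relevance = cosine(query, candidates[i])
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(pool, key=score)
        selected.append(best)
        pool.remove(best)
    return selected
```

In this picture, the fetch size bounds how many chunks are considered at all, while k bounds how many reach the LLM. Raising k adds context at the risk of including unrelated code; raising the fetch size widens the pool from which diverse chunks can be drawn.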

Agent tooling is under development. This tooling uses CrewAI to orchestrate agent-based resolution of tasks within a defined process. Agents can use tools that require external setup for GitHub and Jira.

See https://python.langchain.com/docs/integrations/toolkits/github and https://python.langchain.com/docs/integrations/toolkits/jira for setup instructions

Owner

Login: stuartpearce-hg
Kind: user

Repositories: 3
Profile: https://github.com/stuartpearce-hg

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Chase"
  given-names: "Harrison"
title: "LangChain"
date-released: 2022-10-17
url: "https://github.com/langchain-ai/langchain"

GitHub Events

Total

Issue comment event: 4
Push event: 15
Pull request review event: 1
Pull request review comment event: 2
Pull request event: 3
Create event: 3

Last Year

Issue comment event: 4
Push event: 15
Pull request review event: 1
Pull request review comment event: 2
Pull request event: 3
Create event: 3

Issues and Pull Requests

Last synced: 10 months ago

All Time

Total issues: 0
Total pull requests: 15
Average time to close issues: N/A
Average time to close pull requests: 6 days
Total issue authors: 0
Total pull request authors: 3
Average comments per issue: 0
Average comments per pull request: 0.2
Merged pull requests: 10
Bot issues: 0
Bot pull requests: 8

Past Year

Issues: 0
Pull requests: 1
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 1
Average comments per issue: 0
Average comments per pull request: 0.0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 1

Top Authors

Issue Authors

Pull Request Authors

stuartpearce-hg (13)
dependabot[bot] (11)
devin-ai-integration[bot] (1)

Top Labels

Issue Labels

Pull Request Labels

dependencies (11)

Dependencies

requirements.txt pypi

PyPika ==0.48.9
PyYAML ==6.0.1
SQLAlchemy ==2.0.21
aiohttp ==3.9.0
aiosignal ==1.3.1
annotated-types ==0.5.0
anyio ==3.7.1
async-timeout ==4.0.3
attrs ==23.1.0
backoff ==2.2.1
bcrypt ==4.0.1
beautifulsoup4 ==4.12.2
certifi ==2023.7.22
chardet ==5.2.0
charset-normalizer ==3.3.0
chroma-hnswlib ==0.7.3
chromadb ==0.4.13
click ==8.1.7
coloredlogs ==15.0.1
dataclasses-json ==0.6.1
emoji ==2.8.0
esprima ==4.0.1
exceptiongroup ==1.1.3
faiss-cpu ==1.7.4
fastapi ==0.103.2
filelock ==3.12.4
filetype ==1.2.0
flatbuffers ==23.5.26
frozenlist ==1.4.0
fsspec ==2023.9.2
greenlet ==3.0.0
h11 ==0.14.0
httptools ==0.6.0
huggingface-hub ==0.16.4
humanfriendly ==10.0
idna ==3.4
importlib-resources ==6.1.0
iniconfig ==2.0.0
joblib ==1.3.2
jsonpatch ==1.33
jsonpointer ==2.4
langchain ==0.0.329
langdetect ==1.0.9
langsmith ==0.0.42
lxml ==4.9.3
marshmallow ==3.20.1
monotonic ==1.6
mpmath ==1.3.0
multidict ==6.0.4
mypy-extensions ==1.0.0
nltk ==3.8.1
numpy ==1.26.0
onnxruntime ==1.16.0
openai ==0.28.1
overrides ==7.4.0
packaging ==23.2
phply *
playwright ==1.38.0
pluggy ==1.3.0
posthog ==3.0.2
protobuf ==4.24.4
pulsar-client ==3.3.0
pydantic ==2.4.2
pydantic_core ==2.10.1
pyee ==9.0.4
pytest ==7.4.2
pytest-asyncio ==0.21.1
pytest-base-url ==2.0.0
pytest-playwright ==0.4.3
python-dateutil ==2.8.2
python-dotenv ==1.0.0
python-iso639 ==2023.6.15
python-magic ==0.4.27
python-slugify ==8.0.1
regex ==2023.10.3
requests ==2.31.0
rich ==13.6.0
six ==1.16.0
sniffio ==1.3.0
soupsieve ==2.5
starlette ==0.27.0
sympy ==1.12
tabulate ==0.9.0
tenacity ==8.2.3
text-unidecode ==1.3
tiktoken ==0.5.1
tokenizers ==0.14.0
tomli ==2.0.1
tqdm ==4.66.1
typer ==0.9.0
typing-inspect ==0.9.0
typing_extensions ==4.8.0
unstructured ==0.10.19
urllib3 ==2.0.7
uvicorn ==0.23.2
watchfiles ==0.20.0
websockets ==11.0.3
yarl ==1.9.2
