hg-genai-workshop
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (14.0%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: stuartpearce-hg
- License: mit
- Language: Jupyter Notebook
- Default Branch: main
- Size: 28.3 MB
Statistics
- Stars: 2
- Watchers: 2
- Forks: 2
- Open Issues: 1
- Releases: 0
Metadata Files
README.md
🤔 What is this, how does it work?
This tooling aims to enhance the quality and accuracy of the responses Gen AI LLMs provide when asked to support software engineers refactoring enterprise codebases. It does this by expanding the context made available to the LLM, using retrieval-augmented generation to select the code most relevant to an engineer's prompt.
These tools use LangChain (https://python.langchain.com/docs/get_started/introduction) so that source code can be indexed for use with a range of LLMs. You are encouraged to adapt and enhance the tools for specific use cases and, where possible, contribute enhancements back to the community. Please refer to the LICENSE file for details of the MIT license in use.
Usage
Set configuration values
Rename .env.example to .env and populate it with configuration values for your LLM provider. Hg currently recommends using Azure OpenAI Services.
OPENAI_API_BASE = URL to your LLM account
OPENAI_API_KEY = Your API Key
OPENAI_API_TYPE = "azure"
OPENAI_API_VERSION = Service API version to be used
OPENAI_API_DEPLOYMENT_NAME = Name of your deployed LLM
OPENAI_API_EMBEDDINGS_NAME = Name of your deployed embedding engine
REPOSITORY_DIRECTORY = Path to source code for analysis
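A quick check that every setting is present can catch a half-filled .env before any API call is made. The sketch below is an illustration only: the helper `missing_settings` is hypothetical and not part of this repository, and the underscore-separated spellings of the variable names are an assumption about how they appear in .env.example.

```python
import os

# Setting names taken from the list above. The underscore-separated
# spellings are an assumption about how they appear in .env.example.
REQUIRED_VARS = [
    "OPENAI_API_BASE",
    "OPENAI_API_KEY",
    "OPENAI_API_TYPE",
    "OPENAI_API_VERSION",
    "OPENAI_API_DEPLOYMENT_NAME",
    "OPENAI_API_EMBEDDINGS_NAME",
    "REPOSITORY_DIRECTORY",
]


def missing_settings(env=None):
    """Return the required setting names that are unset or blank.

    Hypothetical helper: `env` defaults to the process environment, but any
    mapping can be passed in, which keeps the check easy to test.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not str(env.get(name, "")).strip()]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Missing configuration values: " + ", ".join(missing))
    else:
        print("All configuration values are set.")
```

This only inspects the process environment, so it assumes the values from .env have already been loaded, for example with python-dotenv, which is listed among the dependencies.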
Dependency setup
With pip:
pip install openai -r requirements.txt
Build vector database of code
Edit line 37 of build.py to reference the file extensions relevant to the codebase, e.g. for C# set suffixes=['.cs', '.csproj', '.sln']
Filtering greatly improves performance by eliminating content that is not relevant to queries from the model context
Run build.py
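How build.py applies the suffix list is not shown here, so the following is a minimal standard-library sketch of the filtering idea rather than the repository's own code. The helper names `matches_suffix` and `find_source_files` are hypothetical, and the case-insensitive comparison is a design choice made for this example.

```python
from pathlib import Path


def matches_suffix(path, suffixes):
    """Return True if the file extension of `path` is in `suffixes`.

    The comparison ignores case, which is an assumption of this sketch.
    """
    wanted = {s.lower() for s in suffixes}
    return Path(path).suffix.lower() in wanted


def find_source_files(root, suffixes):
    """Walk `root` recursively and return the matching files, sorted."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and matches_suffix(p, suffixes)
    )


if __name__ == "__main__":
    # Example: list the C# sources under the current directory.
    for path in find_source_files(".", [".cs", ".csproj", ".sln"]):
        print(path)
```

Only the files that pass this kind of filter would need to be split and embedded, which is why a narrower suffix list keeps unrelated content out of the index.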
Query the code
Uncomment lines 51-55 of query.py and adjust the prompt to provide base context on the nature of the code and application you are working with. This can help focus the range of suggestions made by the LLM.
Run query.py to pair with the LLM, interrogate the codebase, and explore options to enhance it
Enter "quit" at the prompt when finished
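The interactive session described above can be pictured as a simple read-answer loop that ends on "quit". This is a hedged sketch, not the contents of query.py: `chat_loop` and its `answer` callback are hypothetical names, and the callback stands in for whatever retrieval and LLM call the real script performs.

```python
def chat_loop(answer, read=input, write=print):
    """Read prompts until the user enters "quit", passing each to `answer`.

    `answer(prompt, history)` is a stand-in for the real retrieval and LLM
    call. `history` is a list of (prompt, reply) pairs from earlier turns.
    """
    history = []
    while True:
        prompt = read("Prompt: ").strip()
        if prompt.lower() == "quit":
            break
        if not prompt:
            continue  # ignore empty input rather than querying the model
        reply = answer(prompt, history)
        history.append((prompt, reply))
        write(reply)
    return history


if __name__ == "__main__":
    # Placeholder answer function so the loop can be tried without an LLM.
    chat_loop(lambda prompt, history: "You asked: " + prompt)
```

Passing `read` and `write` as parameters keeps the loop testable without a terminal.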
Large codebases can suffer degraded performance when too much or too little context is made available to the LLM alongside prompt inputs. Adjusting the code retrieval parameters on line 41, search_kwargs={"k": 20, "fetch_k": 50}, can improve LLM performance by ensuring that enough context is provided, but not so much that unrelated code is included.
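To build intuition for the two retrieval parameters, the sketch below implements maximal marginal relevance (MMR) in plain Python: the retriever first shortlists the `fetch_k` chunks most similar to the query, then picks `k` of them while penalising near-duplicates. In LangChain, `fetch_k` is the parameter used by MMR search, but whether query.py configures MMR is an assumption here. The function names, the `lambda_mult` weighting and the toy vectors are illustrative only, not the library's implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(query, docs, k, fetch_k, lambda_mult=0.5):
    """Return the indices of up to `k` documents chosen by MMR.

    Step 1 shortlists the `fetch_k` documents most similar to the query.
    Step 2 repeatedly picks the shortlisted document with the best balance
    of relevance to the query and dissimilarity to documents already picked.
    """
    ranked = sorted(range(len(docs)),
                    key=lambda i: cosine(query, docs[i]), reverse=True)
    candidates = ranked[:fetch_k]
    selected = []
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query, docs[i])
            redundancy = max((cosine(docs[i], docs[j]) for j in selected),
                             default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    query = [1.0, 0.0]
    docs = [[0.9, 0.1], [0.9, 0.12], [0.7, -0.7], [-1.0, 0.0]]
    # Documents 0 and 1 are near-duplicates; a wider shortlist lets MMR
    # swap the duplicate for a different, still relevant, document.
    print(mmr_select(query, docs, k=2, fetch_k=2))
    print(mmr_select(query, docs, k=2, fetch_k=3))
```

In this toy setup a larger `fetch_k` gives the selection step more variety to choose from, while `k` caps how much text finally reaches the LLM, which matches the trade-off described above.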
Agent tooling is under development. This tooling uses CrewAI to orchestrate agent-based resolution of tasks within a defined process. Agents can use tools that require external setup for GitHub and Jira.
See https://python.langchain.com/docs/integrations/toolkits/github and https://python.langchain.com/docs/integrations/toolkits/jira for setup instructions.
Owner
- Login: stuartpearce-hg
- Kind: user
- Repositories: 3
- Profile: https://github.com/stuartpearce-hg
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Chase"
    given-names: "Harrison"
title: "LangChain"
date-released: 2022-10-17
url: "https://github.com/langchain-ai/langchain"
GitHub Events
Total
- Issue comment event: 4
- Push event: 15
- Pull request review event: 1
- Pull request review comment event: 2
- Pull request event: 3
- Create event: 3
Last Year
- Issue comment event: 4
- Push event: 15
- Pull request review event: 1
- Pull request review comment event: 2
- Pull request event: 3
- Create event: 3
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 0
- Total pull requests: 15
- Average time to close issues: N/A
- Average time to close pull requests: 6 days
- Total issue authors: 0
- Total pull request authors: 3
- Average comments per issue: 0
- Average comments per pull request: 0.2
- Merged pull requests: 10
- Bot issues: 0
- Bot pull requests: 8
Past Year
- Issues: 0
- Pull requests: 1
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 1
Top Authors
Issue Authors
Pull Request Authors
- stuartpearce-hg (13)
- dependabot[bot] (11)
- devin-ai-integration[bot] (1)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- PyPika ==0.48.9
- PyYAML ==6.0.1
- SQLAlchemy ==2.0.21
- aiohttp ==3.9.0
- aiosignal ==1.3.1
- annotated-types ==0.5.0
- anyio ==3.7.1
- async-timeout ==4.0.3
- attrs ==23.1.0
- backoff ==2.2.1
- bcrypt ==4.0.1
- beautifulsoup4 ==4.12.2
- certifi ==2023.7.22
- chardet ==5.2.0
- charset-normalizer ==3.3.0
- chroma-hnswlib ==0.7.3
- chromadb ==0.4.13
- click ==8.1.7
- coloredlogs ==15.0.1
- dataclasses-json ==0.6.1
- emoji ==2.8.0
- esprima ==4.0.1
- exceptiongroup ==1.1.3
- faiss-cpu ==1.7.4
- fastapi ==0.103.2
- filelock ==3.12.4
- filetype ==1.2.0
- flatbuffers ==23.5.26
- frozenlist ==1.4.0
- fsspec ==2023.9.2
- greenlet ==3.0.0
- h11 ==0.14.0
- httptools ==0.6.0
- huggingface-hub ==0.16.4
- humanfriendly ==10.0
- idna ==3.4
- importlib-resources ==6.1.0
- iniconfig ==2.0.0
- joblib ==1.3.2
- jsonpatch ==1.33
- jsonpointer ==2.4
- langchain ==0.0.329
- langdetect ==1.0.9
- langsmith ==0.0.42
- lxml ==4.9.3
- marshmallow ==3.20.1
- monotonic ==1.6
- mpmath ==1.3.0
- multidict ==6.0.4
- mypy-extensions ==1.0.0
- nltk ==3.8.1
- numpy ==1.26.0
- onnxruntime ==1.16.0
- openai ==0.28.1
- overrides ==7.4.0
- packaging ==23.2
- phply *
- playwright ==1.38.0
- pluggy ==1.3.0
- posthog ==3.0.2
- protobuf ==4.24.4
- pulsar-client ==3.3.0
- pydantic ==2.4.2
- pydantic_core ==2.10.1
- pyee ==9.0.4
- pytest ==7.4.2
- pytest-asyncio ==0.21.1
- pytest-base-url ==2.0.0
- pytest-playwright ==0.4.3
- python-dateutil ==2.8.2
- python-dotenv ==1.0.0
- python-iso639 ==2023.6.15
- python-magic ==0.4.27
- python-slugify ==8.0.1
- regex ==2023.10.3
- requests ==2.31.0
- rich ==13.6.0
- six ==1.16.0
- sniffio ==1.3.0
- soupsieve ==2.5
- starlette ==0.27.0
- sympy ==1.12
- tabulate ==0.9.0
- tenacity ==8.2.3
- text-unidecode ==1.3
- tiktoken ==0.5.1
- tokenizers ==0.14.0
- tomli ==2.0.1
- tqdm ==4.66.1
- typer ==0.9.0
- typing-inspect ==0.9.0
- typing_extensions ==4.8.0
- unstructured ==0.10.19
- urllib3 ==2.0.7
- uvicorn ==0.23.2
- watchfiles ==0.20.0
- websockets ==11.0.3
- yarl ==1.9.2