graphrag-dialogue-insights
This repository contains the source code of GraphRAG Dialogue Insights (GDI).
Science Score: 57.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ✓ DOI references: found 6 DOI reference(s) in README
- ○ Academic publication links: not detected
- ○ Academic email domains: not detected
- ○ Institutional organization owner: not detected
- ○ JOSS paper metadata: not detected
- ○ Scientific vocabulary similarity: low similarity (12.5%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: DelinaLy
- License: gpl-3.0
- Language: Python
- Default Branch: main
- Size: 135 KB
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
GraphRAG Dialogue Insights (GDI)
This repository contains the source code of GraphRAG Dialogue Insights (GDI).
Issue Tracking Systems (ITSs) are a valuable and ever-evolving source of knowledge, as team members can read, update, expand, and act on the captured information. We use the term work items to refer to any trackable unit of work in the software development process, including epics, features, user stories, bugs, and tasks. Detailed work items that are adequately linked to one another help support software development teams. However, as systems and projects increase in complexity, navigating work items using traditional ITSs becomes challenging and time-consuming: these systems have limited functionality for searching and acquiring knowledge, making it difficult to answer the complex questions that development tasks require.
We introduce GDI, which enables users to query work items from issue-tracking systems (represented as a knowledge graph) using natural language. In addition to a natural language response, GDI returns supporting information to help users understand how the LLM queries the knowledge graph.
Since GDI is based on GraphRAG, an LLM and a knowledge graph are essential components of a GDI instance, together with the GDI Core component that facilitates interaction among the components in GDI (see the sequence diagram below).

The user selects a stakeholder-based persona prompt and the preferred LLM from the dropdown buttons in the left sidebar of GDI.
- (M1) The user initiates the interaction by asking GDI a question.
- (M2) The GDI Core component constructs a prompt from the prompt template, which includes the graph schema and the user's question, and sends it to the LLM.
- (M3) Based on this prompt, the LLM generates a query in the query language used for the knowledge graph.
- (M4) The GDI Core component queries the knowledge graph with this query.
- (M5) The knowledge graph returns the structured data corresponding to the query as the retrieved context.
- (M6) GDI Core constructs a prompt using this context, the persona selected by the user, and the question, and sends it to the LLM.
- (M7) GDI Core receives the natural language response.
- (M8) Finally, the output is presented to the user.
The output consists of a) a natural language response to the question, b) the generated query used to query the knowledge graph, and c) the retrieved context as a result of the query.
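The M1-M8 exchange can be condensed into a short Python sketch. The names below (`ask_gdi`, `fake_llm`, `fake_graph`) and the prompt wording are illustrative placeholders, not GDI's actual API; in GDI itself, the LLM call and the knowledge-graph query go through an LLM client and a Neo4j driver.

```python
# Minimal sketch of the GDI Core message flow (M1-M8).
# The LLM and knowledge-graph calls are stubbed out; in GDI itself these
# would be an LLM client and a Neo4j driver session.

def ask_gdi(question, schema, persona, llm, graph):
    # (M2) Build the query-generation prompt from the graph schema and question.
    cypher_prompt = (
        f"Schema:\n{schema}\n"
        f"Write a Cypher query answering: {question}"
    )
    # (M3) The LLM turns the prompt into a Cypher query.
    cypher_query = llm(cypher_prompt)
    # (M4/M5) Run the query; the result is the retrieved context.
    context = graph(cypher_query)
    # (M6) Build the answer prompt from persona, context, and question.
    answer_prompt = f"{persona}\nContext: {context}\nQuestion: {question}"
    # (M7/M8) Return all three parts of the output shown to the user.
    return {
        "answer": llm(answer_prompt),
        "query": cypher_query,
        "context": context,
    }

# Stubs standing in for the real LLM and Neo4j instance.
fake_llm = lambda prompt: ("MATCH (s:Service) RETURN s" if "Cypher" in prompt
                           else "Two services depend on it.")
fake_graph = lambda query: [{"s": "payment-service"}]

result = ask_gdi("What depends on payment-service?",
                 "(:Service)-[:DEPENDS_ON]->(:Service)",
                 "You are a software engineer.", fake_llm, fake_graph)
```

The returned dictionary mirrors the three-part output described above: the natural language answer, the generated query, and the retrieved context.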
You can use this repository to:
1. Interact with GDI to determine its suitability for your context.
2. Use your own knowledge graph and query work items using GDI.
3. Reproduce the results presented in the Related Publication section by following the steps outlined in the Steps to Reproduce section.
[!NOTE]
We provide the open-source knowledge graph. Due to confidentiality agreements, we cannot share the proprietary knowledge graph used in our study. To use GDI with your data, you have to create a knowledge graph. Our paper offers an overview of the knowledge graph construction process. For more instructions, refer to Fensel, D. et al. (2020). How to Build a Knowledge Graph. In: Knowledge Graphs. Springer, Cham. https://doi.org/10.1007/978-3-030-37439-6_2 .
Table of Contents
1. Repository Structure
In this section, we first present a graphical overview of the repository's folder structure and files. Subsequently, we provide an explanation of each folder and file.
Graphical Overview
```
GDI/
├── src/
│   ├── knowledge-graph/
│   │   ├── improved_version_knowledge_graph.txt
│   │   └── original_version_knowledge_graph.txt
│   ├── prompts/
│   │   ├── cypher_prompt.py
│   │   └── stakeholder_prompt.py
│   ├── GDI.py
│   ├── process_logs.py
│   └── requirements.txt
├── supplementary-materials/
│   ├── saved_logs/
│   │   └── graphrag_dialogue_insights_{timestamp}.json
│   ├── onboarding/
│   │   ├── software_engineer/
│   │   │   ├── extracted_questions/
│   │   │   │   ├── P1_extracted_questions.json
│   │   │   │   ├── P2_extracted_questions.json
│   │   │   │   ├── P3_extracted_questions.json
│   │   │   │   ├── P4_extracted_questions.json
│   │   │   │   └── P5_extracted_questions.json
│   │   │   ├── P1_graphrag_dialogue_insights.json
│   │   │   ├── P2_graphrag_dialogue_insights.json
│   │   │   ├── P3_graphrag_dialogue_insights.json
│   │   │   ├── P4_graphrag_dialogue_insights.json
│   │   │   └── P5_graphrag_dialogue_insights.json
│   │   └── software_engineer_few_shot/
│   │       ├── P1_graphrag_dialogue_insights.json
│   │       ├── P2_graphrag_dialogue_insights.json
│   │       ├── P3_graphrag_dialogue_insights.json
│   │       ├── P4_graphrag_dialogue_insights.json
│   │       └── P5_graphrag_dialogue_insights.json
│   ├── trace-link-recovery/
│   │   ├── Informed-consent.docx
│   │   ├── Protocol.docx
│   │   └── Questions.docx
│   └── sequence-diagram.png
├── .gitignore
├── CITATION.cff
├── LICENSE
└── README.md
```
Description of Folders and Files
| **Folder/file** | **Description** |
|---------------|-----------------|
| [GDI/](/) | The root folder of GDI, including the source code and supplementary materials. |
| [GDI/src/](src/) | The folder contains the core source code of the GDI project. |
| [GDI/src/knowledge-graph](src/knowledge-graph) | The folder contains the Cypher scripts used to build the knowledge graphs in [Neo4j](https://neo4j.com/). |
| [GDI/src/knowledge-graph/improved_version_knowledge_graph.txt](src/knowledge-graph/improved_version_knowledge_graph.txt) | Cypher script for the improved knowledge graph, based on participant feedback. |
| [GDI/src/knowledge-graph/original_version_knowledge_graph.txt](src/knowledge-graph/original_version_knowledge_graph.txt) | Cypher script for the original version of the knowledge graph. |
| [GDI/src/prompts](src/prompts) | The folder contains the prompt templates for generating prompts. |
| [GDI/src/prompts/cypher_prompt.py](src/prompts/cypher_prompt.py) | Prompt template for generating Cypher statements. |
| [GDI/src/prompts/stakeholder_prompt.py](src/prompts/stakeholder_prompt.py) | Prompt template for generating natural language answers. |
| [GDI/src/GDI.py](src/GDI.py) | The file contains the source code of GDI, including the GDI Core that facilitates interaction between the selected LLM and the knowledge graph in [Neo4j](https://neo4j.com/). |
| [GDI/src/process_logs.py](src/process_logs.py) | The file contains the source code for extracting unique questions from the participants' logs. |
| [GDI/src/requirements.txt](src/requirements.txt) | The file contains the list of packages and libraries required to execute the source code files. |
| [GDI/supplementary-materials](supplementary-materials) | The folder contains the non-executable materials used in the research of the related publication. |
| [GDI/supplementary-materials/saved-logs](supplementary-materials/saved-logs) | The folder contains the saved log files of the sessions. |
| [GDI/supplementary-materials/saved-logs/graphrag_dialogue_insights_[timestamp]_.json](supplementary-materials/saved-logs/graphrag_dialogue_insights_[timestamp]_.json) | A saved log file of a session. |
| [GDI/supplementary-materials/onboarding](supplementary-materials/onboarding) | The folder contains the supplementary materials for the onboarding use case. |
| [GDI/supplementary-materials/onboarding/software_engineer](supplementary-materials/onboarding/software_engineer) | The folder contains the log files from the user-centered validation sessions. |
| [GDI/supplementary-materials/onboarding/software_engineer/extracted_questions](supplementary-materials/onboarding/software_engineer/extracted_questions) | The folder contains the questions extracted from the log files of the user-centered validation sessions. |
| [GDI/supplementary-materials/onboarding/software_engineer/extracted_questions/P[Number]_extracted_questions.json](supplementary-materials/onboarding/software_engineer/extracted_questions/P[Number]_extracted_questions.json) | The user questions extracted from the log files of participants P1-P5 of the user-centered validation sessions. |
| [GDI/supplementary-materials/onboarding/software_engineer/P[Number]_graphrag_dialogue_insights.json](supplementary-materials/onboarding/software_engineer/P[Number]_graphrag_dialogue_insights.json) | The log files of participants P1-P5 of the user-centered validation sessions. |
| [GDI/supplementary-materials/onboarding/software_engineer/P[Number]_few_shot_graphrag_dialogue_insights.json](supplementary-materials/onboarding/software_engineer/P[Number]_few_shot_graphrag_dialogue_insights.json) | The log files of participants P1-P5, obtained when we reran the user questions after applying few-shot prompting. |
| [GDI/supplementary-materials/trace-link-recovery](supplementary-materials/trace-link-recovery) | The folder contains the supplementary materials for the trace link recovery use case. |
| [GDI/supplementary-materials/trace-link-recovery/Informed-consent.docx](supplementary-materials/trace-link-recovery/Informed-consent.docx) | Informed consent form for the user-centered validation sessions. |
| [GDI/supplementary-materials/trace-link-recovery/Protocol.docx](supplementary-materials/trace-link-recovery/Protocol.docx) | Protocol to conduct the user-centered validation sessions. |
| [GDI/supplementary-materials/Questions.docx](supplementary-materials/Questions.docx) | Questions regarding the validation of GDI, used in the survey (onboarding use case) and the interviews (trace link recovery use case). |
| [GDI/supplementary-materials/sequence-diagram.png](supplementary-materials/sequence-diagram.png) | The sequence diagram presenting the interactions between GDI Core, the LLM, and the knowledge graph. |
( Back to Top)
2. System Requirements
The following requirements must be met to use the artifacts in this repository:
- To run the source code, Python 3.9 has to be installed; an Integrated Development Environment (IDE) such as Visual Studio Code is recommended.
- [Optional] To clone the repository, you have to install Git.
- [Optional] To use the Docker setup, Docker Desktop is required.
- [Optional] To use OpenAI's GPT-4, an OpenAI API key is needed.
- To open and edit the supplementary material, specifically the .docx files, Microsoft Word is required.
The technologies used in this repository, such as Streamlit, Neo4j, and Cypher, are modular and can be interchanged.
All the required packages and libraries can be installed by running the following command: `pip install -r src/requirements.txt`.
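The requirements file pins exact versions in the `package ==version` format, which can also be checked programmatically. The sketch below is an illustration (not part of GDI) that parses such lines and reports packages that are missing or installed at a different version:

```python
# Parse pinned "package ==version" lines (as in src/requirements.txt) and
# compare them against the installed environment. Illustrative sketch only.
from importlib import metadata

def parse_requirements(text):
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        name, _, version = line.partition("==")
        pins[name.strip()] = version.strip()
    return pins

def missing_or_mismatched(pins):
    problems = []
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        if installed != wanted:
            problems.append(f"{name}: {installed} != {wanted}")
    return problems

pins = parse_requirements("neo4j ==5.27.0\nlangchain ==0.3.9\n")
```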
[!NOTE]
We have tested GDI on two devices: 1) a Mac M1 Pro with 32 GB unified memory (200 GB/s bandwidth), a 14-core GPU, and a 16-core Neural Engine, and 2) a desktop with an Intel Core i7 CPU, an Nvidia GeForce RTX 4080 Super (16 GB VRAM), and 16 GB DDR5 memory. If you run GDI on a device with lower specifications, it may take several minutes before you receive a response.
( Back to Top)
3. Installation Instructions
There are two ways to run GDI:
1. Via a Docker setup (the faster option).
2. Through a local installation.
1. Docker setup
A. Setting up Neo4j
Open Docker Desktop, click on the >_ Terminal button on the bottom-right, and run the following command to start a Neo4j instance:

```shell
docker run --name neo4j-apoc --publish=7474:7474 --publish=7687:7687 --env='NEO4J_PLUGINS=["apoc"]' --env=NEO4J_AUTH=neo4j/password neo4j:latest
```

Open http://localhost:7474/ in your browser and use the credentials below to log in to Neo4j:
- Username: neo4j
- Password: password (for your own instance, please use a secure password)

Copy the script of the original or the improved version of the knowledge graph, paste it into the query window in Neo4j, and run it. A success message (e.g., "Created X nodes, created X relationships, set X properties, added X labels") confirms that the knowledge graph has been created.
B. Setting up Ollama
- For CPU only, run the following command to start the Ollama container:

```shell
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
```

- Then run the Llama 3.1 model inside the container:

```shell
ollama run llama3.1
```

- For GPU support, additional steps are required, which can be followed in the Ollama GPU Docker Guide; for more detail, follow the steps in the detailed guide to configure GPU support.
2. Local setup
A. Setting up Neo4j
- Install Neo4j desktop.
- Click on the Create instance button, fill in the following details, and click on the Create button:
  - Instance name: a name representing the knowledge graph (e.g., Microservices Example)
  - Neo4j version: the newest version (e.g., 2025.05.0)
  - Username: neo4j
  - Password: password (for your own instance, please use a secure password)
- In the right sidebar, click on Local instances and navigate to the created instance. Click on the three dots at the top right of the instance; a drop-down menu appears. Click on the Plugins option.
- Search for the APOC plugin and click the Install button.
- Find neo4j.conf by using the copy path button in Neo4j and opening the conf folder. Open neo4j.conf and add the line `dbms.security.procedures.unrestricted=apoc.*` under `dbms.security.procedures.allowlist=apoc.*`.
- Restart the instance by clicking on the restart button to finish the plugin installation.
- Copy the script of the original or the improved version of the knowledge graph, paste it into the query window in Neo4j, and run it. A success message (e.g., "Created X nodes, created X relationships, set X properties, added X labels") confirms that the knowledge graph has been created.
B. Setting up Ollama
- Install Ollama.
- Install and run Llama3.1 by running the following command:
ollama run llama3.1
3. Running GDI
To run GDI, please follow the steps below:
1. Download the repository by clicking on the <> Code button and then Download ZIP, or clone the repository via your preferred method. We suggest using the git command: `git clone https://github.com/DelinaLy/GDI.git`
2. Open the repository in your preferred IDE.
3. To avoid conflicts with your local Python environment, we suggest creating a virtual environment.
- 3.1 Install virtualenv: `pip install virtualenv`
- 3.2 Create a virtual environment: `python -m venv env`
- 3.3 Activate the virtual environment:
  - Windows: `env\Scripts\activate.bat` (Command Prompt) or `env\Scripts\Activate.ps1` (PowerShell)
  - macOS/Linux: `source env/bin/activate`
- 3.4 Navigate to the src folder and install the requirements: `pip install -r requirements.txt`
4. To start GDI, run the following command in the src folder: `streamlit run GDI.py`
You can open GDI in your browser by navigating to http://localhost:8501.
( Back to Top)
4. Usage Instructions
Please follow the installation instructions to run GDI.
We have provided a description of GDI's user interface in Section III of our paper. A collapsible Cheat Sheet with example user questions, such as "I want to fix a bug in [REPLACE WITH SERVICE NAME]. What are its dependencies?", along with guidelines, is provided in GDI's browser interface. The user can explore the knowledge graph by running `MATCH (n)-[r]->(m) RETURN n, r, m` in the query window of Neo4j; this returns an overview of the knowledge graph. To use GDI, please follow the steps below:
1. The user selects a stakeholder-based persona prompt and the preferred LLM from the dropdown buttons in the left sidebar of GDI.
2. The user types a question in the input field and clicks on the Send button to receive a response. The response consists of a) a natural language response, b) the corresponding Neo4j query (which can be copied and run in Neo4j), and c) the retrieved context (which can be copied, collapsed, and expanded).
3. Based on the answer given by GDI, the user can ask further questions by repeating steps 1 and 2.
4. The user can save the chat history by clicking on the Save Session button. The saved files can be found in the folder GDI/supplementary-materials/saved-logs.
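The Save Session behaviour can be approximated in a few lines of Python. The sketch below is a hypothetical illustration of writing a timestamped JSON log following the `graphrag_dialogue_insights_{timestamp}.json` naming pattern; it is not GDI's actual save routine, and the structure of `history` is an assumption.

```python
# Sketch: persist a chat session as a timestamped JSON log, matching the
# graphrag_dialogue_insights_{timestamp}.json naming pattern. Illustrative
# only; not GDI's actual save routine.
import json
import tempfile
import time
from pathlib import Path

def save_session(history, folder):
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    timestamp = time.strftime("%Y%m%d_%H%M%S")
    path = folder / f"graphrag_dialogue_insights_{timestamp}.json"
    path.write_text(json.dumps(history, indent=2))
    return path

# Hypothetical session history with one question/answer pair.
history = [{"question": "What are the dependencies of the payment service?",
            "answer": "It depends on the order and user services."}]
saved = save_session(history, tempfile.mkdtemp())
```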
The video below provides a visual representation of steps 1 and 2:
https://github.com/user-attachments/assets/9eee107f-a8e9-470d-8899-1e52b4d1d6a1
For the onboarding use case, we conducted a group cognitive walkthrough with participants to systematically validate GDI by reasoning through user actions and identifying potential usability issues. The steps and protocol for this session are described in Section IV in our paper.
For the trace link recovery use case, we have provided the following materials:
- The informed consent form, which other researchers can use to obtain informed consent from participants for user-centered validation sessions.
- The protocol for user-centered validation sessions, enabling researchers to replicate them.
For both use cases, we used the same set of validation questions in the survey (onboarding use case) and the interviews (trace link recovery use case).
( Back to Top)
5. Steps to Reproduce
In this section, we describe how to reproduce the results from the individual user-centered validation sessions of the onboarding use case.
1. In the GDI/supplementary-materials/onboarding/software_engineer/ folder, we have provided the logs of participants P1-P5 from the user-centered validation sessions. To extract the unique questions from these logs, run `python process_logs.py` in the src folder. A file is generated for each participant in supplementary-materials/onboarding/software_engineer/extracted_questions.
2. In GDI/src/GDI.py, we have provided comments on lines 79-88 and 94-95 to guide you in looping through the user questions and reproducing the results. Note that since we are using a large language model, the answers can vary slightly from the original results.
3. Follow the installation instructions.
4. Click on the Send button. The user questions are automatically asked to GDI. Depending on your desktop, this can take a few minutes.
5. You can save the log by clicking on the Save Session button. The saved files can be found in the folder GDI/supplementary-materials/saved-logs.
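Step 1's question extraction can be sketched as follows. The `question` field name and the de-duplication strategy are assumptions for illustration; they are not necessarily what `process_logs.py` does.

```python
# Sketch: extract unique user questions from a session log, in the spirit of
# src/process_logs.py. The "question" field name is an assumption about the
# log JSON structure.
import json

def extract_unique_questions(log_text):
    entries = json.loads(log_text)
    seen, unique = set(), []
    for entry in entries:
        q = entry.get("question", "").strip()
        if q and q.lower() not in seen:  # case-insensitive de-duplication
            seen.add(q.lower())
            unique.append(q)
    return unique

# Hypothetical log with a duplicate question in different casing.
log = json.dumps([
    {"question": "What services exist?"},
    {"question": "what services exist?"},
    {"question": "Which bugs are open?"},
])
questions = extract_unique_questions(log)
```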
( Back to Top)
6. Related Publication
GDI is introduced in our paper, "Navigating through Work Items in Issue Tracking Systems via Natural Language Queries," which has been accepted for presentation in the Industrial Innovation Track at the Requirements Engineering Conference.
Ly, D., Radhakrishnan, S., Aydemir, F. B., & Dalpiaz, F. (2025). "Navigating through Work Items in Issue Tracking Systems via Natural Language Queries," 2025 33rd IEEE International Requirements Engineering Conference (RE), Valencia, Spain, 2025
( Back to Top)
7. Authors Information
The artifact was created by the following authors:

| Name | Affiliation | ORCID | Contact |
|---|---|---|---|
| Delina Ly (corresponding author) | VX Company, Utrecht University | https://orcid.org/0000-0002-7972-7530 | dly@vxcompany.com |
| Sruthi Radhakrishnan | itemis, Germany | N/A | radhakrishnan@itemis.com |
| Dr. Fatma Başak Aydemir | Utrecht University, The Netherlands | https://orcid.org/0000-0003-3833-3997 | f.b.aydemir@uu.nl |
| Prof. Dr. Fabiano Dalpiaz | Utrecht University, The Netherlands | https://orcid.org/0000-0003-4480-3887 | f.dalpiaz@uu.nl |
( Back to Top)
8. Suggested citation
If you would like to cite this artifact, click on the "Cite this repository" button in the top-right menu of this repository to copy the APA or BibTeX version of the citation.
This repository is licensed under the GNU GPLv3 License.
( Back to Top)
Owner
- Name: Delina Ly
- Login: DelinaLy
- Kind: user
- Repositories: 1
- Profile: https://github.com/DelinaLy
Citation (CITATION.cff)
```yaml
cff-version: 1.2.0
title: GraphRAG Dialogue Insights (v1.0-updated-docs)
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Delina
    family-names: Ly
    email: dly@vxcompany.com
    affiliation: 'VX Company, Utrecht University'
    orcid: 'https://orcid.org/0000-0002-7972-7530'
  - given-names: Sruthi
    family-names: Radhakrishnan
    email: radhakrishnan@itemis.com
    affiliation: itemis
  - given-names: Fatma Başak
    family-names: Aydemir
    email: f.b.aydemir@uu.nl
    affiliation: Utrecht University
    orcid: 'https://orcid.org/0000-0003-3833-3997'
  - given-names: Fabiano
    family-names: Dalpiaz
    email: f.dalpiaz@uu.nl
    affiliation: Utrecht University
    orcid: 'https://orcid.org/0000-0003-4480-3887'
identifiers:
  - type: doi
    value: 10.5281/zenodo.15804044
    description: 'The DOI of the corresponding release of GraphRAG Dialogue Insights on Zenodo.'
repository-code: 'https://github.com/DelinaLy/GDI/'
abstract: >+
  This repository contains the source code of GraphRAG
  Dialogue Insights (GDI).

  GDI enables users to query work items from issue-tracking
  systems (represented as a knowledge graph) using natural
  language. In addition to a natural language response, GDI
  returns supporting information to help users understand
  how the LLM queries the knowledge graph. The technologies
  used in this repository, such as Streamlit, Neo4j, and
  Cypher, are modular and can be interchanged.

  You can use this repository to:

  1. Interact with GDI to determine its suitability for your context.

  2. Connect your own knowledge graph and query work items using natural language.

  3. Reproduce the results presented in the "Related Publication" section by following the steps outlined in the "Steps to Reproduce" section.

  Note: We provide the open-source knowledge graph. Due to confidentiality agreements,
  we are unable to share the proprietary knowledge graph used in our study.
  To use GDI with your own data, you have to create a custom knowledge graph. Our paper offers
  an overview of the knowledge graph construction process. For more detailed instructions, refer to
  the book "How to Build a Knowledge Graph" by Fensel et al. https://link.springer.com/chapter/10.1007/978-3-030-37439-6_2.
keywords:
  - Issue Tracking Systems
  - Large Language Models
  - Prompt Engineering
  - GraphRAG
license: GPL-3.0-or-later
```
GitHub Events
Total
- Release event: 2
- Watch event: 2
- Push event: 13
- Create event: 4
Last Year
- Release event: 2
- Watch event: 2
- Push event: 13
- Create event: 4
Dependencies
- PyYAML ==6.0.2
- SQLAlchemy ==2.0.35
- aiohappyeyeballs ==2.4.3
- aiohttp ==3.11.8
- aiosignal ==1.3.1
- annotated-types ==0.7.0
- anyio ==4.6.2.post1
- async-timeout ==4.0.3
- attrs ==24.2.0
- certifi ==2024.8.30
- charset-normalizer ==3.4.0
- dataclasses-json ==0.6.7
- distro ==1.9.0
- exceptiongroup ==1.2.2
- frozenlist ==1.5.0
- h11 ==0.14.0
- httpcore ==1.0.7
- httpx ==0.28.0
- httpx-sse ==0.4.0
- idna ==3.10
- jiter ==0.8.0
- jsonpatch ==1.33
- jsonpointer ==3.0.0
- langchain ==0.3.9
- langchain-community ==0.3.8
- langchain-core ==0.3.33
- langchain-neo4j ==0.1.1
- langchain-openai ==0.2.10
- langchain-text-splitters ==0.3.2
- langsmith ==0.1.147
- marshmallow ==3.23.1
- multidict ==6.1.0
- mypy-extensions ==1.0.0
- neo4j ==5.27.0
- numpy ==1.26.4
- openai ==1.55.3
- orjson ==3.10.12
- packaging ==24.2
- propcache ==0.2.0
- pydantic ==2.10.2
- pydantic-settings ==2.6.1
- pydantic_core ==2.27.1
- python-dotenv ==1.0.1
- pytz ==2024.2
- regex ==2024.11.6
- requests ==2.32.3
- requests-toolbelt ==1.0.0
- sniffio ==1.3.1
- tenacity ==9.0.0
- tiktoken ==0.8.0
- tqdm ==4.67.1
- typing-inspect ==0.9.0
- typing_extensions ==4.12.2
- urllib3 ==2.2.3
- yarl ==1.18.0