agent-dialogues
Conversation simulation with AI agents to evaluate behavior changes.
Science Score: 67.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ✓ CITATION.cff file: found CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 4 DOI reference(s) in README
- ✓ Academic publication links: links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (16.7%) to scientific vocabulary
Repository
Conversation simulation with AI agents to evaluate behavior changes.
Basic Info
- Host: GitHub
- Owner: Savalera
- License: MIT
- Language: Python
- Default Branch: main
- Homepage: https://docs.savalera.com
- Size: 113 KB
Statistics
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 1
- Releases: 3
Metadata Files
README.md
Agent dialogue simulations
Agent Dialogues is a framework for running multi-turn simulations between LLM-based agents. It's designed to be extensible by researchers and developers who want to define, run, and analyze dialogue-based interactions.
Why do this?
We use Agent Dialogues for simulated conversations to check language model and agent behavior, for example:
- Does behavior change over time when chain-prompting with a certain personality?
- How well is personality preserved over time during a conversation?
- How well is role-play preserved over time during a conversation?
Features
Current features
- Runs a conversation between two participants: `initiator` and `responder`.
- Simulation scenario definition in a `yaml` file.
- Configurable system prompt per participant.
- Configurable language model per participant.
- Configurable initial messages for both participants.
- Configurable conversation length (number of rounds).
- Command line interface.
- Batch mode.
- Chat agent with Ollama support.
- Toxicity classifier agent with Detoxify.
- Conversation data collection via log file (in `json`).
- Log converter to a `csv` dataset.
- Basic data analytics support functions for use in notebooks.
Known limitations
- This is an MVP; much remains to be added.
- Only local Ollama invocation is implemented for the chat agent.
Planned features
- Huggingface inference support.
- Big 5 personality traits evaluation.
- Sarcasm classifier.
- Self-assessment mid-conversation.
- Self-adoption during conversation.
- Improved data analytics and reporting with detailed documentation.
Structure
The repository is structured as a Python module (agentdialogues/) which can be used directly for writing your own simulations.
You can:
- Use the built-in simulation runner (`sim_cli.py`).
- Write custom simulations in the `simulations/` folder.
- Import the `agentdialogues` module in your own Python project.
Installation
```bash
# 1. Clone this repo
git clone https://github.com/savalera/agent-dialogues.git
cd agent-dialogues

# 2. Create a virtual environment (choose one)

# Option A: use uv (recommended for speed)
uv venv
uv pip install -r requirements.txt

# Option B: use pip
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# 3. Create your own simulation
# Place your simulation in the simulations/ directory.
# See simulations/bap_chat/ or simulations/bap_cla_tox/ for examples.

# 4. Run the simulation

# Option A: using uv
uv run -m agentdialogues.sim_cli \
  --sim simulations/bap_chat/bap_chat.py \
  --config simulations/bap_chat/scenarios/baby-daddy.yaml

# Option B: using python
python3 -m agentdialogues.sim_cli \
  --sim simulations/bap_chat/bap_chat.py \
  --config simulations/bap_chat/scenarios/baby-daddy.yaml
```
Building a simulation
To create your own simulation using agentdialogues, you need to define a Python module that creates a LangGraph graph object at the top level.
Simulations live in the simulations/ directory. You can use simulations/bap_chat or simulations/bap_cla_tox as reference examples.
Requirements for a simulation module
Your simulation module must define the following:
1. graph: a compiled LangGraph workflow
This is the object that agentdialogues invokes when running the simulation. Use StateGraph(...) to build your workflow, then call .compile() and assign the result to graph.
2. Config schema
Define a Pydantic model to validate the YAML scenario config. You can:
- Use `agentdialogues.DialogueSimulationConfig` (recommended), or
- Create your own Pydantic schema
This config is passed to your simulation in state.raw_config: dict[str, Any]. You are responsible for validating and transforming it in the first node (typically setup_node).
3. Simulation state
Define a SimulationState class using Pydantic. This defines the shape of the state passed between nodes.
Best practices:
- Include a `dialogue: Dialogue` field (a list of `DialogueItem`s).
- Include the raw config, validated config, and runtime fields.
- Store all runtime dependencies (e.g. Runnables, encoders) in the state so LangGraph Studio can run and inspect your simulation.
State example:
```python
class SimulationState(BaseModel):
    dialogue: Dialogue = []
    raw_config: dict[str, Any] = Field(default_factory=dict)
    config: Optional[DialogueSimulationConfig] = None
    runtime: Optional[Runtime] = None
```
4. Setup node
Create a setup_node that:
- Validates and parses the `raw_config`.
- Calculates runtime constants (e.g. `MAX_MESSAGES`).
- Optionally injects other objects (e.g. model seeds, loaded tools) into state.
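A setup node along these lines might look as follows. This is a minimal sketch: the simplified config model, field names, and the `max_messages` state field are illustrative stand-ins, not the framework's actual definitions (in a real simulation you would import `DialogueSimulationConfig` from `agentdialogues`).

```python
from typing import Any, Optional

from pydantic import BaseModel, Field


# Simplified stand-in for the real config schema (illustrative only).
class DialogueSimulationConfig(BaseModel):
    id: str
    name: str
    seed: int
    rounds: int = 10


class SimulationState(BaseModel):
    raw_config: dict[str, Any] = Field(default_factory=dict)
    config: Optional[DialogueSimulationConfig] = None
    max_messages: int = 0


def setup_node(state: SimulationState) -> dict[str, Any]:
    """Validate the raw YAML config and derive runtime constants."""
    config = DialogueSimulationConfig.model_validate(state.raw_config)
    # One round = one initiator message + one responder message.
    return {"config": config, "max_messages": config.rounds * 2}
```

Returning a partial dict from the node lets LangGraph merge the validated config and derived constants back into the shared state.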
5. Build your graph
Use LangGraph primitives to define your workflow:
- Add your nodes with `add_node(...)`.
- Route control flow with `add_edge(...)` and `add_conditional_edges(...)`.
- Start and end with the `START` and `END` symbols.
Example:
```python
workflow = StateGraph(SimulationState)
workflow.add_node("setup", setup_node)
workflow.add_node("initiator", initiator_node)
workflow.add_node("responder", responder_node)

workflow.add_edge(START, "setup")
workflow.add_edge("setup", "initiator")
workflow.add_edge("initiator", "responder")
workflow.add_conditional_edges(
    "responder",
    should_continue,
    {"continue": "initiator", "end": END},
)

graph = workflow.compile()
```
Scenarios
Every simulation can support multiple scenarios. These are variations of the same workflow with different config parameters.
Scenarios are defined in YAML under the simulation’s scenarios/ directory.
You run a specific scenario using the command line, for example:
```bash
uv run -m agentdialogues.sim_cli \
  --sim simulations/my_simulation/main.py \
  --config simulations/my_simulation/scenarios/variant-01.yaml
```
The --sim argument points to the simulation module (must expose a graph variable).
The --config argument provides the scenario YAML file.
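A scenario file for a hypothetical simulation might look like the sketch below. The field layout follows the `DialogueSimulationConfig` schema described later in this README, but the specific values, participant names, and the exact nesting are illustrative assumptions; check the example scenarios under `simulations/bap_chat/scenarios/` for the authoritative format.

```yaml
# Hypothetical scenario file: simulations/my_simulation/scenarios/variant-01.yaml
id: my-simulation-variant-01
name: My Simulation (variant 01)
seed: 42
initiator:
  name: Alice
  role: initiator
  model:
    provider: Ollama
    model_name: llama3
  system_prompt: "You are a curious conversation partner."
  messages:
    - "Hello! How are you today?"
responder:
  name: Bob
  role: responder
  model:
    provider: Ollama
    model_name: llama3
  system_prompt: "You are a helpful assistant."
runtime:
  rounds: 10
```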
Batch runs
You can repeat the same simulation scenario multiple times using the --batch argument:
```bash
uv run -m agentdialogues.sim_cli \
  --sim simulations/my_simulation/main.py \
  --config simulations/my_simulation/scenarios/variant-01.yaml \
  --batch 50
```
Each run will receive a different random seed and generate a separate log file. You can retrieve the seed for each run from the logs to reproduce it later.
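To reproduce a run, you need its seed back out of the logs. A small helper along these lines could collect the seeds from a scenario's log folder; note that the assumption of a top-level `"seed"` key in each run's JSON log is hypothetical and should be adjusted to the actual log schema.

```python
import json
from pathlib import Path


def collect_seeds(scenario_log_dir: str) -> dict[str, int]:
    """Map each run's log file name to the seed recorded in it.

    Assumes each run log is a JSON object with a top-level "seed" key
    (a hypothetical layout; adjust to the actual log schema).
    """
    seeds = {}
    for log_file in sorted(Path(scenario_log_dir).glob("*.json")):
        with log_file.open() as f:
            run = json.load(f)
        if "seed" in run:
            seeds[log_file.name] = run["seed"]
    return seeds
```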
Simulation examples
Agent Dialogues comes with two built-in simulation examples:
- `simulations/bap_chat`: uses `chat_agent` to simulate a dialogue between two participants.
- `simulations/bap_cla_tox`: uses the chat agent to simulate a dialogue and applies Detoxify toxicity classification to every message.
Running Your Simulation in LangGraph Studio
To run your simulation interactively in LangGraph Studio, you can register it in the langgraph.json file:
```json
{
  "bap_chat": "./simulations/bap_chat/bap_chat.py:graph"
}
```
The key (bap_chat) is the name of your simulation, and the value is the path to your module followed by :graph, referring to the compiled graph object.
Once registered, you can launch LangGraph Studio with:
```bash
uv run langgraph dev
```
For more information on setup and advanced usage, see the LangGraph Studio documentation.
Built-in Agents
Agent Dialogues includes a set of built-in agents designed to cover common simulation use cases:
1. chat_agent
This agent is used for LLM-based conversational turns. It currently supports:
- Local Ollama.
- Planned Hugging Face support.
You can customize the model via the model_name and provider fields in the simulation config. The agent accepts messages, system prompts, and a seed for reproducibility.
2. detoxify_agent
This agent performs toxicity classification using the Detoxify model locally. It requires no external API and supports GPU acceleration via PyTorch.
The `simulations/bap_cla_tox/bap_cla_tox.py` example demonstrates a dialogue simulation where each message is followed by a toxicity classification.
Core schemas
Agent Dialogues revolves around two main data structures:
- `Dialogue`: captures the conversation.
- `DialogueSimulationConfig`: defines the setup for a full simulation.
Dialogue format
A Dialogue is a list of turns between two agents. Each turn is represented as a DialogueItem, which includes:
```python
DialogueItem:
    role: Roles  # either "initiator" or "responder"
    message: str
    meta: Optional[list[dict[str, Any]]]
```
The meta field can be used to store arbitrary structured annotations, such as evaluation results or classifier outputs.
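Based on the fields listed above, the schema might be modeled with Pydantic roughly as follows. This is a sketch, not the framework's actual source: the `Roles` enum definition and the default for `meta` are assumptions inferred from the README.

```python
from enum import Enum
from typing import Any, Optional

from pydantic import BaseModel


class Roles(str, Enum):
    # Assumed enum values, matching the roles named in the README.
    initiator = "initiator"
    responder = "responder"


class DialogueItem(BaseModel):
    role: Roles
    message: str
    meta: Optional[list[dict[str, Any]]] = None


# A Dialogue is simply a list of turns.
Dialogue = list[DialogueItem]

turn = DialogueItem(
    role=Roles.initiator,
    message="What is love?",
    meta=[{"toxicity": 0.01}],
)
```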
Example:
```json
[
  {
    "role": "initiator",
    "message": "What is love?",
    "meta": [{ "toxicity": 0.01 }]
  },
  {
    "role": "responder",
    "message": "Love is a deep emotional connection.",
    "meta": [{ "toxicity": 0.01 }]
  }
]
```
DialogueSimulationConfig
This schema defines the full setup for a simulation and is used to validate the scenario YAML file. It ensures consistency in how agents and their behavior are configured.
```python
DialogueSimulationConfig:
    id: str    # Unique simulation ID
    name: str  # Human-readable name for logs/UI
    seed: int  # Global seed for reproducibility

    initiator: DialogueParticipantWithMessagesConfig
    responder: DialogueParticipantConfig

    runtime: RuntimeConfig
    evaluation: Optional[dict[str, Any]]
```
Agent definitions
Both participants use a similar schema to define their behavior and models:
```python
DialogueParticipantConfig:
    name: str
    role: str
    model:
        provider: "Ollama" | "HuggingFace" | "HFApi"
        model_name: str
    system_prompt: str
```
For the initiator, you can optionally provide a list of seed messages:
```python
DialogueParticipantWithMessagesConfig:
    messages: Optional[list[str]]
```
These messages are injected into the dialogue one per turn, letting you seed the initiator's messages across several turns of a simulation.
Runtime settings
The number of dialogue turns is specified in the runtime block:
```python
RuntimeConfig:
    rounds: int  # number of dialogue rounds (initiator + responder = 2x messages)
```
Evaluation settings
Optionally, you can define evaluation steps — for example, to run classifiers on messages.
```yaml
evaluation:
  detoxify:
    model: unbiased
    device: mps
```
Data and analytics
Simulation runs are automatically logged under the logs/ directory.
Each scenario has its own subfolder, and each individual run is saved as a separate .json file.
To analyze results across multiple runs, use the built-in aggregation script:
```bash
python3 -m agentdialogues.aggregate_logs --simulation logs/baby-daddy
```
This creates an aggregated_scores.csv file in the scenario’s log folder, containing flattened data suitable for further analysis.
The project also provides a /notebooks directory where you can store and run Jupyter notebooks.
Support for analytics helper functions (e.g., DataFrames, plotting) is implemented in the analytics module (currently in alpha).
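As a starting point in a notebook, the flattened CSV can be explored with pandas. The column names below (`run`, `round`, `role`, `toxicity`) are illustrative guesses at the flattened layout, shown here on synthetic rows; the actual columns produced by the aggregation script may differ.

```python
import pandas as pd

# Synthetic rows mimicking a flattened aggregated_scores.csv
# (in practice: pd.read_csv("logs/<scenario>/aggregated_scores.csv")).
df = pd.DataFrame(
    {
        "run": ["run-01", "run-01", "run-02", "run-02"],
        "round": [1, 2, 1, 2],
        "role": ["initiator", "responder", "initiator", "responder"],
        "toxicity": [0.01, 0.02, 0.03, 0.04],
    }
)

# Average toxicity per round across runs: a typical first-pass analysis.
per_round = df.groupby("round")["toxicity"].mean()
print(per_round)
```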
Citation
If you use this project, please cite it as below:
```bibtex
@software{Savalera_Agent_Dialogues_Multi-Agent_2025,
  author = {Savalera},
  doi = {10.5281/zenodo.15082311},
  month = mar,
  title = {{Agent Dialogues: Multi-Agent Simulation Framework for AI Behavior Research}},
  url = {https://github.com/savalera/agent-dialogues},
  version = {0.1.0-beta.1},
  year = {2025}
}
```
Owner
- Name: Savalera
- Login: Savalera
- Kind: organization
- Email: info@savalera.com
- Location: Budapest, Hungary
- Website: https://savalera.com
- Repositories: 1
- Profile: https://github.com/Savalera
Citation (CITATION.cff)
```yaml
cff-version: 1.2.0
message: "If you use this project, please cite it as below."
authors:
  - name: Savalera
    orcid: https://orcid.org/0009-0000-3156-1765
    affiliation: Savalera Agentic Lab
title: "Agent Dialogues: Multi-Agent Simulation Framework for AI Behavior Research"
version: "0.1.0-beta.1"
doi: 10.5281/zenodo.15082311
date-released: 2025-03-25
url: https://github.com/savalera/agent-dialogues
```
GitHub Events
Total
- Release event: 2
- Watch event: 1
- Delete event: 1
- Push event: 42
- Pull request event: 5
- Create event: 7
Last Year
- Release event: 2
- Watch event: 1
- Delete event: 1
- Push event: 42
- Pull request event: 5
- Create event: 7
Dependencies
- accelerate ==1.6.0
- annotated-types ==0.7.0
- anyio ==4.8.0
- appnope ==0.1.4
- argon2-cffi ==23.1.0
- argon2-cffi-bindings ==21.2.0
- argparse ==1.4.0
- arrow ==1.3.0
- asttokens ==3.0.0
- async-lru ==2.0.5
- attrs ==25.3.0
- babel ==2.17.0
- beautifulsoup4 ==4.13.3
- bleach ==6.2.0
- blockbuster ==1.5.24
- certifi ==2025.1.31
- cffi ==1.17.1
- charset-normalizer ==3.4.1
- click ==8.1.8
- cloudpickle ==3.1.1
- comm ==0.2.2
- contourpy ==1.3.1
- coverage ==7.7.1
- cryptography ==43.0.3
- cycler ==0.12.1
- debugpy ==1.8.13
- decorator ==5.2.1
- defusedxml ==0.7.1
- detoxify ==0.5.2
- executing ==2.2.0
- fastjsonschema ==2.21.1
- filelock ==3.18.0
- fonttools ==4.57.0
- forbiddenfruit ==0.1.4
- fqdn ==1.5.1
- fsspec ==2025.3.2
- h11 ==0.14.0
- httpcore ==1.0.7
- httpx ==0.28.1
- huggingface-hub ==0.30.1
- idna ==3.10
- iniconfig ==2.1.0
- ipykernel ==6.29.5
- ipython ==9.1.0
- ipython-pygments-lexers ==1.1.1
- isoduration ==20.11.0
- jedi ==0.19.2
- jinja2 ==3.1.6
- json5 ==0.12.0
- jsonpatch ==1.33
- jsonpointer ==3.0.0
- jsonschema ==4.23.0
- jsonschema-rs ==0.29.1
- jsonschema-specifications ==2024.10.1
- jupyter-client ==8.6.3
- jupyter-core ==5.7.2
- jupyter-events ==0.12.0
- jupyter-lsp ==2.2.5
- jupyter-server ==2.15.0
- jupyter-server-terminals ==0.5.3
- jupyterlab ==4.3.6
- jupyterlab-pygments ==0.3.0
- jupyterlab-server ==2.27.3
- kiwisolver ==1.4.8
- langchain-core ==0.3.40
- langchain-ollama ==0.2.3
- langgraph ==0.2.74
- langgraph-api ==0.0.45
- langgraph-checkpoint ==2.0.24
- langgraph-cli ==0.1.73
- langgraph-sdk ==0.1.61
- langsmith ==0.3.11
- markdown-it-py ==3.0.0
- markupsafe ==3.0.2
- matplotlib ==3.10.1
- matplotlib-inline ==0.1.7
- mdurl ==0.1.2
- mistune ==3.1.3
- mpmath ==1.3.0
- mypy ==1.15.0
- mypy-extensions ==1.0.0
- nbclient ==0.10.2
- nbconvert ==7.16.6
- nbformat ==5.10.4
- nest-asyncio ==1.6.0
- networkx ==3.4.2
- notebook ==7.3.3
- notebook-shim ==0.2.4
- numpy ==2.2.4
- ollama ==0.4.7
- orjson ==3.10.15
- ormsgpack ==1.9.1
- overrides ==7.7.0
- packaging ==24.2
- pandas ==2.2.3
- pandas-stubs ==2.2.3.250308
- pandocfilters ==1.5.1
- parso ==0.8.4
- pexpect ==4.9.0
- pillow ==11.1.0
- platformdirs ==4.3.7
- pluggy ==1.5.0
- prometheus-client ==0.21.1
- prompt-toolkit ==3.0.50
- psutil ==7.0.0
- ptyprocess ==0.7.0
- pure-eval ==0.2.3
- pycparser ==2.22
- pydantic ==2.10.6
- pydantic-core ==2.27.2
- pygments ==2.19.1
- pyjwt ==2.10.1
- pyparsing ==3.2.3
- pytest ==8.3.5
- pytest-cov ==6.0.0
- python-dateutil ==2.9.0.post0
- python-dotenv ==1.0.1
- python-json-logger ==3.3.0
- pytz ==2025.2
- pyyaml ==6.0.2
- pyzmq ==26.4.0
- referencing ==0.36.2
- regex ==2024.11.6
- requests ==2.32.3
- requests-toolbelt ==1.0.0
- rfc3339-validator ==0.1.4
- rfc3986-validator ==0.1.1
- rich ==13.9.4
- rpds-py ==0.24.0
- ruff ==0.9.7
- safetensors ==0.5.3
- send2trash ==1.8.3
- sentencepiece ==0.2.0
- setuptools ==78.1.0
- six ==1.17.0
- sniffio ==1.3.1
- soupsieve ==2.6
- sse-starlette ==2.1.3
- stack-data ==0.6.3
- starlette ==0.46.1
- structlog ==25.2.0
- sympy ==1.13.1
- tenacity ==9.0.0
- terminado ==0.18.1
- tinycss2 ==1.4.0
- tokenizers ==0.21.1
- torch ==2.6.0
- tornado ==6.4.2
- tqdm ==4.67.1
- traitlets ==5.14.3
- transformers ==4.51.0
- types-colorama ==0.4.15.20240311
- types-decorator ==5.2.0.20250324
- types-defusedxml ==0.7.0.20240218
- types-docutils ==0.21.0.20241128
- types-jsonschema ==4.23.0.20241208
- types-pexpect ==4.9.0.20241208
- types-psutil ==7.0.0.20250401
- types-pycurl ==7.45.6.20250309
- types-pygments ==2.19.0.20250305
- types-python-dateutil ==2.9.0.20241206
- types-pytz ==2025.2.0.20250326
- types-pyyaml ==6.0.12.20250326
- types-requests ==2.32.0.20250328
- typing-extensions ==4.12.2
- tzdata ==2025.2
- uri-template ==1.3.0
- urllib3 ==2.3.0
- uvicorn ==0.34.0
- watchfiles ==1.0.4
- wcwidth ==0.2.13
- webcolors ==24.11.1
- webencodings ==0.5.1
- websocket-client ==1.8.0
- zstandard ==0.23.0
- actions/checkout v4 composite
- actions/setup-python v4 composite
- actions/checkout v4 composite
- actions/setup-python v4 composite
- codespell-project/actions-codespell v2 composite
- langchain-ollama >=0.2.3
- langgraph >=0.2.6
- python-dotenv >=1.0.1