agent-dialogues

Conversation simulation with AI agents to evaluate behavior changes.

https://github.com/savalera/agent-dialogues

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (16.7%) to scientific vocabulary
Last synced: 6 months ago

Repository

Conversation simulation with AI agents to evaluate behavior changes.

Basic Info
  • Host: GitHub
  • Owner: Savalera
  • License: mit
  • Language: Python
  • Default Branch: main
  • Homepage: https://docs.savalera.com
  • Size: 113 KB
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 0
  • Open Issues: 1
  • Releases: 3
Created 12 months ago · Last pushed 9 months ago
Metadata Files
Readme License Citation

README.md

Agent dialogue simulations


Agent Dialogues is a framework for running multi-turn simulations between LLM-based agents. It's designed to be extensible by researchers and developers who want to define, run, and analyze dialogue-based interactions.

Why do this?

We use Agent Dialogues for simulated conversations to check language model and agent behavior, for example:

  • Does behavior change over time when chain-prompting with a certain personality?
  • How well is personality preserved over time during a conversation?
  • How well is role-play preserved over time during a conversation?

Features

Current features

  • Runs a conversation between two participants: an initiator and a responder.
  • Simulation scenario definition in a YAML file.
    • Configurable system prompt per participant.
    • Configurable language model per participant.
    • Configurable initial messages for both participants.
    • Configurable conversation length (number of rounds).
  • Command line interface.
  • Batch mode.
  • Chat agent with Ollama support.
  • Toxicity classifier agent with Detoxify.
  • Conversation data collection via log file (in JSON).
  • Log converter to CSV dataset.
  • Basic data analytics support functions to be used in Notebooks.

Known limitations

  • This is an MVP; there is a lot still to be added.
  • Only local Ollama invocation is implemented for the chat agent.

Planned features

  • Huggingface inference support.
  • Big 5 personality traits evaluation.
  • Sarcasm classifier.
  • Self-assessment mid-conversation.
  • Self-adoption during conversation.
  • Improved data analytics and reporting with detailed documentation.

Structure

The repository is structured as a Python module (agentdialogues/) which can be used directly for writing your own simulations.

You can:

  • Use the built-in simulation runner (sim_cli.py).
  • Write custom simulations in the simulations/ folder.
  • Import the agentdialogues module in your own Python project.
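
Building on the third option, here is a hedged sketch of driving a simulation from your own Python code. It assumes, as described later in this README, that the simulation module exposes a compiled LangGraph `graph` (which provides `.invoke()`) and that the scenario YAML is handed over as `raw_config`; the `run_simulation` helper itself is illustrative, not part of the package:

```python
# Hypothetical helper, not part of agentdialogues: load a simulation
# module from its file path and invoke its compiled graph with the
# scenario YAML as raw_config.
import importlib.util

import yaml  # PyYAML


def run_simulation(sim_path: str, config_path: str) -> dict:
    # Load the simulation module; it must expose a top-level `graph`.
    spec = importlib.util.spec_from_file_location("user_sim", sim_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

    # Read the scenario YAML and pass it in as the initial state.
    with open(config_path) as f:
        raw_config = yaml.safe_load(f)
    return module.graph.invoke({"raw_config": raw_config})
```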

Installation

1. Clone this repo

```bash
git clone https://github.com/savalera/agent-dialogues.git
cd agent-dialogues
```

2. Create a virtual environment (choose one)

Option A: Use uv (recommended for speed)

```bash
uv venv
uv pip install -r requirements.txt
```

Option B: Use pip

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

3. Create your own simulation

Place your simulation in the simulations/ directory.

See simulations/bap_chat/ or simulations/bap_cla_tox/ for examples.

4. Run the simulation

Option A: Using uv

```bash
uv run -m agentdialogues.sim_cli \
  --sim simulations/bap_chat/bap_chat.py \
  --config simulations/bap_chat/scenarios/baby-daddy.yaml
```

Option B: Using python

```bash
python3 -m agentdialogues.sim_cli \
  --sim simulations/bap_chat/bap_chat.py \
  --config simulations/bap_chat/scenarios/baby-daddy.yaml
```

Building a simulation

To create your own simulation using agentdialogues, you need to define a Python module that creates a LangGraph graph object at the top level.

Simulations live in the simulations/ directory. You can use simulations/bap_chat or simulations/bap_cla_tox as reference examples.

Requirements for a simulation module

Your simulation module must define the following:

1. graph: a compiled LangGraph workflow

This is the object that agentdialogues invokes when running the simulation. Use StateGraph(...) to build your workflow, then call .compile() and assign the result to graph.

2. Config schema

Define a Pydantic model to validate the YAML scenario config. You can:

  • Use agentdialogues.DialogueSimulationConfig (recommended), or
  • Create your own Pydantic schema

This config is passed to your simulation in state.raw_config: dict[str, Any]. You are responsible for validating and transforming it in the first node (typically setup_node).

3. Simulation state

Define a SimulationState class using Pydantic. This defines the shape of the state passed between nodes.

Best practices:

  • Include a dialogue: Dialogue field (a list of DialogueItems).
  • Include the raw config, validated config, and runtime fields.
  • Store all runtime dependencies (e.g. Runnables, encoders) in the state so LangGraph Studio can run and inspect your simulation.

State example:

```python
class SimulationState(BaseModel):
    dialogue: Dialogue = []
    raw_config: dict[str, Any] = Field(default_factory=dict)
    config: Optional[DialogueSimulationConfig] = None
    runtime: Optional[Runtime] = None
```

4. Setup node

Create a setup_node that:

  • Validates and parses the raw_config.
  • Calculates runtime constants (e.g. MAX_MESSAGES).
  • Optionally injects other objects (e.g. model seeds, loaded tools) into state.
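
As a minimal sketch of the setup node's job, the version below uses plain dicts instead of the Pydantic validation described above; the config key names it checks and the `max_messages` field are illustrative assumptions, not the package's actual schema:

```python
from typing import Any


def setup_node(state: dict[str, Any]) -> dict[str, Any]:
    """Sketch of a setup node. A real simulation would validate
    raw_config with a Pydantic model (e.g. DialogueSimulationConfig);
    the key names checked here are illustrative assumptions."""
    raw = state["raw_config"]
    for key in ("id", "seed", "runtime"):
        if key not in raw:
            raise ValueError(f"missing config key: {key}")

    # Derive runtime constants: one round is one initiator message
    # plus one responder message.
    max_messages = raw["runtime"]["rounds"] * 2
    return {"config": raw, "max_messages": max_messages}
```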

5. Build your graph

Use LangGraph primitives to define your workflow:

  • Add your nodes with add_node(...).
  • Route control flow with add_edge(...) and add_conditional_edges(...).
  • Start and end with the START and END symbols.

Example:

```python
workflow = StateGraph(SimulationState)
workflow.add_node("setup", setup_node)
workflow.add_node("initiator", initiator_node)
workflow.add_node("responder", responder_node)

workflow.add_edge(START, "setup")
workflow.add_edge("setup", "initiator")
workflow.add_edge("initiator", "responder")
workflow.add_conditional_edges(
    "responder",
    should_continue,
    {"continue": "initiator", "end": END},
)

graph = workflow.compile()
```

Scenarios

Every simulation can support multiple scenarios. These are variations of the same workflow with different config parameters.

Scenarios are defined in YAML under the simulation’s scenarios/ directory.

You run a specific scenario using the command line, for example:

```bash
uv run -m agentdialogues.sim_cli \
  --sim simulations/my_simulation/main.py \
  --config simulations/my_simulation/scenarios/variant-01.yaml
```

The --sim argument points to the simulation module (must expose a graph variable). The --config argument provides the scenario YAML file.

Batch runs

You can repeat the same simulation scenario multiple times using the --batch argument:

```bash
uv run -m agentdialogues.sim_cli \
  --sim simulations/my_simulation/main.py \
  --config simulations/my_simulation/scenarios/variant-01.yaml \
  --batch 50
```

Each run will receive a different random seed and generate a separate log file. You can retrieve the seed for each run from the logs to reproduce it later.
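
To make that reproduction step concrete, here is a small sketch for collecting the seed of each run from a scenario's log folder. It assumes each log stores its seed under a top-level "seed" key, which may not match the actual log layout:

```python
import json
from pathlib import Path


def seeds_from_logs(log_dir: str) -> dict[str, int]:
    """Collect the seed recorded in each run's JSON log so a run can
    be reproduced later. Sketch only: assumes a top-level "seed" key."""
    seeds = {}
    for path in sorted(Path(log_dir).glob("*.json")):
        with open(path) as f:
            run = json.load(f)
        seeds[path.name] = run["seed"]
    return seeds
```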

Simulation examples

Agent Dialogues comes with two built-in simulation examples:

  • simulations/bap_chat - uses chat_agent to simulate a dialogue between two participants.
  • simulations/bap_cla_tox - uses chat agent to simulate a dialogue and applies Detoxify toxicity classification on every message.

Running Your Simulation in LangGraph Studio

To run your simulation interactively in LangGraph Studio, you can register it in the langgraph.json file:

```json
{
  "bap_chat": "./simulations/bap_chat/bap_chat.py:graph"
}
```

The key (bap_chat) is the name of your simulation, and the value is the path to your module followed by :graph, referring to the compiled graph object.

Once registered, you can launch LangGraph Studio with:

```bash
uv run langgraph dev
```

For more information on setup and advanced usage, see the LangGraph Studio documentation.

Built-in Agents

Agent Dialogues includes a set of built-in agents designed to cover common simulation use cases:

1. chat_agent

This agent is used for LLM-based conversational turns. It currently supports:

  • Local Ollama.
  • Planned Hugging Face support.

You can customize the model via the model_name and provider fields in the simulation config. The agent accepts messages, system prompts, and a seed for reproducibility.

2. detoxify_agent

This agent performs toxicity classification using the Detoxify model locally. It requires no external API and supports GPU acceleration via PyTorch.

The simulations/bap_cla_tox/bap_cla_tox.py example demonstrates a dialogue simulation where each message is followed by a toxicity classification.

Core schemas

Agent Dialogues revolves around two main data structures:

  • Dialogue — captures the conversation.
  • DialogueSimulationConfig — defines the setup for a full simulation.

Dialogue format

A Dialogue is a list of turns between two agents. Each turn is represented as a DialogueItem, which includes:

```python
DialogueItem:
    role: Roles  # either "initiator" or "responder"
    message: str
    meta: Optional[list[dict[str, Any]]]
```

The meta field can be used to store arbitrary structured annotations, such as evaluation results or classifier outputs.

Example:

```json
[
  {
    "role": "initiator",
    "message": "What is love?",
    "meta": [{ "toxicity": 0.01 }]
  },
  {
    "role": "responder",
    "message": "Love is a deep emotional connection.",
    "meta": [{ "toxicity": 0.01 }]
  }
]
```

DialogueSimulationConfig

This schema defines the full setup for a simulation and is used to validate the scenario YAML file. It ensures consistency in how agents and their behavior are configured.

```python
DialogueSimulationConfig:
    id: str    # Unique simulation ID
    name: str  # Human-readable name for logs/UI
    seed: int  # Global seed for reproducibility

    initiator: DialogueParticipantWithMessagesConfig
    responder: DialogueParticipantConfig

    runtime: RuntimeConfig
    evaluation: Optional[dict[str, Any]]
```

Agent definitions

Both participants use a similar schema to define their behavior and models:

```python
DialogueParticipantConfig:
    name: str
    role: str
    model:
        provider: "Ollama" | "HuggingFace" | "HFApi"
        model_name: str
    system_prompt: str
```

For the initiator, you can optionally provide a list of seed messages:

```python
DialogueParticipantWithMessagesConfig:
    messages: Optional[list[str]]
```

These messages are injected into the dialogue, one message per turn, so you can seed scripted messages across several turns of a simulation.
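
The injection behavior can be sketched as a simple per-turn lookup; the function below is a hypothetical illustration, not the package's actual code, and assumes the fallback to the model only happens once the scripted messages run out:

```python
from typing import Callable, Optional


def initiator_message(
    seed_messages: Optional[list[str]],
    turn: int,
    generate: Callable[[], str],
) -> str:
    """Sketch of per-turn seed-message injection: use the scripted
    message for this turn if one was configured, otherwise fall back
    to the model. The actual logic in agentdialogues may differ."""
    if seed_messages and turn < len(seed_messages):
        return seed_messages[turn]
    return generate()
```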

Runtime settings

The number of dialogue turns is specified in the runtime block:

```python
RuntimeConfig:
    rounds: int  # number of dialogue rounds (initiator + responder = 2x messages)
```

Evaluation settings

Optionally, you can define evaluation steps — for example, to run classifiers on messages.

```yaml
evaluation:
  detoxify:
    model: unbiased
    device: mps
```

Data and analytics

Simulation runs are automatically logged under the logs/ directory.

Each scenario has its own subfolder, and each individual run is saved as a separate .json file.

To analyze results across multiple runs, use the built-in aggregation script:

```bash
python3 -m agentdialogues.aggregate_logs --simulation logs/baby-daddy
```

This creates an aggregated_scores.csv file in the scenario’s log folder, containing flattened data suitable for further analysis.
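
As one example of working with that flattened data, the sketch below averages a toxicity score per dialogue round using only the standard library; the "round" and "toxicity" column names are assumptions about the CSV layout, not documented fields:

```python
import csv
from collections import defaultdict


def mean_toxicity_by_round(csv_path: str) -> dict[int, float]:
    """Average a toxicity score per dialogue round from the aggregated
    CSV. Sketch only: the "round" and "toxicity" column names are
    assumptions about the flattened layout."""
    sums: defaultdict[int, float] = defaultdict(float)
    counts: defaultdict[int, int] = defaultdict(int)
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            r = int(row["round"])
            sums[r] += float(row["toxicity"])
            counts[r] += 1
    return {r: sums[r] / counts[r] for r in sorted(sums)}
```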

The project also provides a /notebooks directory where you can store and run Jupyter notebooks.

Support for analytics helper functions (e.g., DataFrames, plotting) is implemented in the analytics module (currently in alpha).

Citation

If you use this project, please cite it as below:

```bibtex
@software{Savalera_Agent_Dialogues_Multi-Agent_2025,
  author = {Savalera},
  doi = {10.5281/zenodo.15082311},
  month = mar,
  title = {{Agent Dialogues: Multi-Agent Simulation Framework for AI Behavior Research}},
  url = {https://github.com/savalera/agent-dialogues},
  version = {0.1.0-beta.1},
  year = {2025}
}
```

Owner

  • Name: Savalera
  • Login: Savalera
  • Kind: organization
  • Email: info@savalera.com
  • Location: Budapest, Hungary


Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this project, please cite it as below."
authors:
  - name: Savalera
    orcid: https://orcid.org/0009-0000-3156-1765
    affiliation: Savalera Agentic Lab
title: "Agent Dialogues: Multi-Agent Simulation Framework for AI Behavior Research"
version: "0.1.0-beta.1"
doi: 10.5281/zenodo.15082311
date-released: 2025-03-25
url: https://github.com/savalera/agent-dialogues

GitHub Events

Total
  • Release event: 2
  • Watch event: 1
  • Delete event: 1
  • Push event: 42
  • Pull request event: 5
  • Create event: 7
Last Year
  • Release event: 2
  • Watch event: 1
  • Delete event: 1
  • Push event: 42
  • Pull request event: 5
  • Create event: 7

Dependencies

requirements.txt pypi
  • accelerate ==1.6.0
  • annotated-types ==0.7.0
  • anyio ==4.8.0
  • appnope ==0.1.4
  • argon2-cffi ==23.1.0
  • argon2-cffi-bindings ==21.2.0
  • argparse ==1.4.0
  • arrow ==1.3.0
  • asttokens ==3.0.0
  • async-lru ==2.0.5
  • attrs ==25.3.0
  • babel ==2.17.0
  • beautifulsoup4 ==4.13.3
  • bleach ==6.2.0
  • blockbuster ==1.5.24
  • certifi ==2025.1.31
  • cffi ==1.17.1
  • charset-normalizer ==3.4.1
  • click ==8.1.8
  • cloudpickle ==3.1.1
  • comm ==0.2.2
  • contourpy ==1.3.1
  • coverage ==7.7.1
  • cryptography ==43.0.3
  • cycler ==0.12.1
  • debugpy ==1.8.13
  • decorator ==5.2.1
  • defusedxml ==0.7.1
  • detoxify ==0.5.2
  • executing ==2.2.0
  • fastjsonschema ==2.21.1
  • filelock ==3.18.0
  • fonttools ==4.57.0
  • forbiddenfruit ==0.1.4
  • fqdn ==1.5.1
  • fsspec ==2025.3.2
  • h11 ==0.14.0
  • httpcore ==1.0.7
  • httpx ==0.28.1
  • huggingface-hub ==0.30.1
  • idna ==3.10
  • iniconfig ==2.1.0
  • ipykernel ==6.29.5
  • ipython ==9.1.0
  • ipython-pygments-lexers ==1.1.1
  • isoduration ==20.11.0
  • jedi ==0.19.2
  • jinja2 ==3.1.6
  • json5 ==0.12.0
  • jsonpatch ==1.33
  • jsonpointer ==3.0.0
  • jsonschema ==4.23.0
  • jsonschema-rs ==0.29.1
  • jsonschema-specifications ==2024.10.1
  • jupyter-client ==8.6.3
  • jupyter-core ==5.7.2
  • jupyter-events ==0.12.0
  • jupyter-lsp ==2.2.5
  • jupyter-server ==2.15.0
  • jupyter-server-terminals ==0.5.3
  • jupyterlab ==4.3.6
  • jupyterlab-pygments ==0.3.0
  • jupyterlab-server ==2.27.3
  • kiwisolver ==1.4.8
  • langchain-core ==0.3.40
  • langchain-ollama ==0.2.3
  • langgraph ==0.2.74
  • langgraph-api ==0.0.45
  • langgraph-checkpoint ==2.0.24
  • langgraph-cli ==0.1.73
  • langgraph-sdk ==0.1.61
  • langsmith ==0.3.11
  • markdown-it-py ==3.0.0
  • markupsafe ==3.0.2
  • matplotlib ==3.10.1
  • matplotlib-inline ==0.1.7
  • mdurl ==0.1.2
  • mistune ==3.1.3
  • mpmath ==1.3.0
  • mypy ==1.15.0
  • mypy-extensions ==1.0.0
  • nbclient ==0.10.2
  • nbconvert ==7.16.6
  • nbformat ==5.10.4
  • nest-asyncio ==1.6.0
  • networkx ==3.4.2
  • notebook ==7.3.3
  • notebook-shim ==0.2.4
  • numpy ==2.2.4
  • ollama ==0.4.7
  • orjson ==3.10.15
  • ormsgpack ==1.9.1
  • overrides ==7.7.0
  • packaging ==24.2
  • pandas ==2.2.3
  • pandas-stubs ==2.2.3.250308
  • pandocfilters ==1.5.1
  • parso ==0.8.4
  • pexpect ==4.9.0
  • pillow ==11.1.0
  • platformdirs ==4.3.7
  • pluggy ==1.5.0
  • prometheus-client ==0.21.1
  • prompt-toolkit ==3.0.50
  • psutil ==7.0.0
  • ptyprocess ==0.7.0
  • pure-eval ==0.2.3
  • pycparser ==2.22
  • pydantic ==2.10.6
  • pydantic-core ==2.27.2
  • pygments ==2.19.1
  • pyjwt ==2.10.1
  • pyparsing ==3.2.3
  • pytest ==8.3.5
  • pytest-cov ==6.0.0
  • python-dateutil ==2.9.0.post0
  • python-dotenv ==1.0.1
  • python-json-logger ==3.3.0
  • pytz ==2025.2
  • pyyaml ==6.0.2
  • pyzmq ==26.4.0
  • referencing ==0.36.2
  • regex ==2024.11.6
  • requests ==2.32.3
  • requests-toolbelt ==1.0.0
  • rfc3339-validator ==0.1.4
  • rfc3986-validator ==0.1.1
  • rich ==13.9.4
  • rpds-py ==0.24.0
  • ruff ==0.9.7
  • safetensors ==0.5.3
  • send2trash ==1.8.3
  • sentencepiece ==0.2.0
  • setuptools ==78.1.0
  • six ==1.17.0
  • sniffio ==1.3.1
  • soupsieve ==2.6
  • sse-starlette ==2.1.3
  • stack-data ==0.6.3
  • starlette ==0.46.1
  • structlog ==25.2.0
  • sympy ==1.13.1
  • tenacity ==9.0.0
  • terminado ==0.18.1
  • tinycss2 ==1.4.0
  • tokenizers ==0.21.1
  • torch ==2.6.0
  • tornado ==6.4.2
  • tqdm ==4.67.1
  • traitlets ==5.14.3
  • transformers ==4.51.0
  • types-colorama ==0.4.15.20240311
  • types-decorator ==5.2.0.20250324
  • types-defusedxml ==0.7.0.20240218
  • types-docutils ==0.21.0.20241128
  • types-jsonschema ==4.23.0.20241208
  • types-pexpect ==4.9.0.20241208
  • types-psutil ==7.0.0.20250401
  • types-pycurl ==7.45.6.20250309
  • types-pygments ==2.19.0.20250305
  • types-python-dateutil ==2.9.0.20241206
  • types-pytz ==2025.2.0.20250326
  • types-pyyaml ==6.0.12.20250326
  • types-requests ==2.32.0.20250328
  • typing-extensions ==4.12.2
  • tzdata ==2025.2
  • uri-template ==1.3.0
  • urllib3 ==2.3.0
  • uvicorn ==0.34.0
  • watchfiles ==1.0.4
  • wcwidth ==0.2.13
  • webcolors ==24.11.1
  • webencodings ==0.5.1
  • websocket-client ==1.8.0
  • zstandard ==0.23.0
.github/workflows/integration-tests.yml actions
  • actions/checkout v4 composite
  • actions/setup-python v4 composite
.github/workflows/unit-tests.yml actions
  • actions/checkout v4 composite
  • actions/setup-python v4 composite
  • codespell-project/actions-codespell v2 composite
pyproject.toml pypi
  • langchain-ollama >=0.2.3
  • langgraph >=0.2.6
  • python-dotenv >=1.0.1