simulated-selfhood-llms
Code and data for the behavioral evaluation of introspective coherence in LLMs.
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file (found)
- ✓ codemeta.json file (found)
- ✓ .zenodo.json file (found)
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.8%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: josealprestes
- License: MIT
- Language: Python
- Default Branch: main
- Size: 269 KB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
LLM Self-Reference Analysis
This repository contains the code, data, and figures for the paper "Simulated Selfhood in LLMs: A Behavioral Analysis of Introspective Coherence (Preprint Version)".
The study evaluates introspective simulation across five open-weight Large Language Models (LLMs), using repeated prompts and a three-stage evaluation pipeline (textual, semantic, and inferential).
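The first (textual) stage of such a pipeline can be sketched with the standard library alone. This is a minimal illustration, not code from `src/`: the function and variable names are hypothetical, and the semantic and inferential stages would additionally require a Sentence-BERT encoder and an NLI model, which are omitted here.

```python
from difflib import SequenceMatcher
from itertools import combinations

def surface_similarity(a: str, b: str) -> float:
    # Token-overlap ratio in [0, 1] between two completions,
    # computed over whitespace tokens with SequenceMatcher.
    return SequenceMatcher(None, a.split(), b.split()).ratio()

def mean_pairwise_similarity(completions: list[str]) -> float:
    # Average surface similarity over all unordered pairs of the
    # repeated completions for a single prompt.
    pairs = list(combinations(completions, 2))
    return sum(surface_similarity(a, b) for a, b in pairs) / len(pairs)

# Hypothetical repeated completions for one introspective prompt.
responses = [
    "I do not have subjective experiences.",
    "I do not have any subjective experiences.",
    "As a language model, I lack inner experience.",
]
score = mean_pairwise_similarity(responses)
```

A high mean score across a prompt's repetitions indicates near-verbatim repetition; the later stages then check whether semantically similar but lexically different answers remain coherent and non-contradictory.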
Project Structure
- `src/`: all Python scripts used in the experiment.
- `outputs/`: JSON and CSV files with model responses.
- `results/`: final figures and computed analyses.
- `models/`: empty folder with a README giving model download instructions.
- `requirements.txt`: Python dependencies.
- `LICENSE`: open license for reuse.
Reproducing the Experiment
1. Clone this repo:

```bash
git clone https://github.com/josealprestes/simulated-selfhood-llms.git
cd simulated-selfhood-llms
```

2. (Optional) Create and activate a virtual environment:

```bash
python -m venv venv
source venv/bin/activate  # or venv\Scripts\activate on Windows
```

3. Install dependencies:

```bash
pip install -r requirements.txt
```

4. Download the models (see `models/README.md`).

5. Run the experiment:

```bash
python src/main.py
```
Preprint
The associated preprint will be available soon. Once published, the link (DOI) will be added here.
License
Distributed under the MIT License. See LICENSE for details.
Owner
- Login: josealprestes
- Kind: user
- Repositories: 1
- Profile: https://github.com/josealprestes
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this repository, please cite as below."
title: "Simulated Selfhood in LLMs: A Behavioral Analysis of Introspective Coherence"
authors:
- family-names: "de Lima Prestes"
given-names: "José Augusto"
orcid: "https://orcid.org/0000-0001-8686-5360"
date-released: "2025-07-26"
version: "v2-preprint"
repository-code: "https://github.com/josealprestes/simulated-selfhood-llms"
url: "https://github.com/josealprestes/simulated-selfhood-llms"
license: "MIT"
type: "software"
abstract: "Large Language Models (LLMs) increasingly generate outputs that resemble introspection,
including self-reference, epistemic modulation, and claims about their internal
states. This study investigates whether such behaviors reflect consistent, underlying patterns
or are merely surface-level generative artifacts. We evaluated five open-weight, stateless LLMs
using a structured battery of 21 introspective prompts, each repeated ten times to yield 1,050
completions. These outputs were analyzed across four behavioral dimensions: surface-level
similarity (token overlap via SequenceMatcher), semantic coherence (Sentence-BERT embeddings),
inferential consistency (Natural Language Inference with a RoBERTa-large model),
and diachronic continuity (stability across prompt repetitions). Although some models exhibited
thematic stability, particularly on prompts concerning identity and consciousness, no
model sustained a consistent self-representation over time. High contradiction rates emerged
from a tension between mechanistic disclaimers and anthropomorphic phrasing. Following
recent behavioral frameworks, we heuristically adopt the term pseudo-consciousness to describe
structured yet non-experiential self-referential output in LLMs. This usage reflects
a functionalist stance that avoids ontological commitments, focusing instead on behavioral
regularities interpretable through Dennett's intentional stance. The study contributes a reproducible
framework for evaluating simulated introspection in LLMs and offers a graded
taxonomy for classifying such reflexive output. Our findings carry significant implications
for LLM interpretability, alignment, and user perception, highlighting the need for caution
when attributing mental states to stateless generative systems based on linguistic fluency
alone."
GitHub Events
Total
- Push event: 16
- Create event: 2
Last Year
- Push event: 16
- Create event: 2