https://github.com/atomashevic/voat-simulation
Code repository for Operational Validation of Large-Language-Model Agent Social Simulation paper
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (8.0%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: atomashevic
- Language: Python
- Default Branch: main
- Homepage: https://arxiv.org/abs/2508.21740
- Size: 552 MB
Statistics
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
ysocial-simulations
Scripts and supporting modules to (a) sample and analyze real Reddit / Voat datasets (MADOC exports), (b) analyze and visualize simulation outputs, and (c) compare structure, content, reputation, toxicity, and topic dynamics across communities and between real and simulated corpora.
The repository is script-oriented: each file is a self-contained step or analytical module.
1. High-Level Conceptual Workflow
Data acquisition
- Voat data from the MADOC dataset: voat_technology_madoc.parquet lives in the MADOC/voat-technology/ directory.
- The simulation YSocial database file lives under simulation/.
Real community temporal sampling & descriptive analytics
- Get Voat samples: scripts/voat_samples.py (creates MADOC/voat-technology/sample_X/)
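The following is a minimal sketch of temporal sampling. It assumes that a sample is one contiguous time window drawn at random from the full Voat corpus. The (timestamp, payload) record layout and the function name sample_time_window are hypothetical. scripts/voat_samples.py may define its windows and sample sizes differently.

```python
import random
from datetime import datetime, timedelta


def sample_time_window(records, window_days, seed=None):
    """Return the records whose timestamp falls inside one random time window.

    `records` is a list of (timestamp, payload) tuples, a hypothetical
    stand-in for rows read from the MADOC parquet export.
    """
    if not records:
        return []
    rng = random.Random(seed)
    times = sorted(ts for ts, _ in records)
    first, last = times[0], times[-1]
    window = timedelta(days=window_days)
    if last - first <= window:
        return list(records)  # the whole corpus already fits in one window
    # Choose a start so that the full window lies inside the observed range.
    slack = (last - first - window).total_seconds()
    start = first + timedelta(seconds=rng.uniform(0, slack))
    end = start + window
    return [(ts, p) for ts, p in records if start <= ts < end]


if __name__ == "__main__":
    base = datetime(2020, 1, 1)
    posts = [(base + timedelta(days=i), f"post-{i}") for i in range(100)]
    sample = sample_time_window(posts, window_days=30, seed=1)
    print(len(sample), sample[0][0].date(), sample[-1][0].date())
```

Fixing seed makes a window reproducible, which helps when several sample_X directories need to be regenerated identically.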
Basic simulation database extraction
- scripts/sim_stats.py exports results/<sim_name>/posts.csv, users.csv, news.csv and figures.
- scripts/generate_csvs.py produces per-user CSVs and hierarchical thread text exports.
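The extraction step reads a SQLite database. The sketch below shows a generic table-to-CSV export built on Python's standard sqlite3 and csv modules. The table name post is an assumption about the YSocial schema. The tweet column mirrors the --text-column tweet flag in usage example 4.4. Check the actual database schema before relying on either name.

```python
import csv
import io
import sqlite3


def export_table_to_csv(conn, table, out):
    """Write all rows of `table` (with a header row) to the file-like `out`."""
    # Table names cannot be bound as SQL parameters, so check against the schema.
    known = {row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")}
    if table not in known:
        raise ValueError(f"unknown table: {table!r}")
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    writer = csv.writer(out)
    writer.writerow([col[0] for col in cursor.description])
    writer.writerows(cursor)
    return out


if __name__ == "__main__":
    # Toy in-memory database; the `post` table and its columns are hypothetical.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE post (id INTEGER, user_id INTEGER, tweet TEXT)")
    conn.executemany("INSERT INTO post VALUES (?, ?, ?)",
                     [(1, 10, "first post"), (2, 11, "a reply")])
    print(export_table_to_csv(conn, "post", io.StringIO()).getvalue())
```

When writing to a file on disk, open it with newline="", as the csv module documentation recommends.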
Text preprocessing & corpus preparation
- scripts/preprocess_simulation_data.py cleans simulated posts/comments into a modeling-ready CSV.
- URL extraction: scripts/extract_voat_urls.py
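Here is a simplified sketch of the cleaning and URL-extraction ideas. It assumes that "cleaning" means stripping URLs, collapsing whitespace, and dropping short texts. The min_length default mirrors the --min-length 25 flag in usage example 4.4. The regular expression and the helper names are hypothetical and will not match the repository's exact rules.

```python
import re

# Deliberately simple URL pattern: stops at whitespace, quotes, brackets, and ")".
URL_RE = re.compile(r"https?://[^\s<>\"')\]]+")
WS_RE = re.compile(r"\s+")


def extract_urls(text):
    """Return every http(s) URL found in `text`, in order of appearance."""
    return URL_RE.findall(text)


def clean_text(text):
    """Remove URLs and collapse runs of whitespace into single spaces."""
    return WS_RE.sub(" ", URL_RE.sub(" ", text)).strip()


def clean_corpus(texts, min_length=25):
    """Clean each non-empty text and keep those at least `min_length` characters long."""
    cleaned = (clean_text(t) for t in texts if t)
    return [t for t in cleaned if len(t) >= min_length]
```

For example, extract_urls("see https://example.com/a?b=1 now") returns ["https://example.com/a?b=1"].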
Network construction & core-periphery inference
- Baseline: scripts/core_periphery_network.py / scripts/core-periphery-network-real.py
- Enhanced stochastic block model (SBM) ensemble: scripts/core_periphery_enhanced*.py (+ scripts/core_periphery_sbm/)
- User attribute association tests: scripts/core_periphery_user_attribute_tests.py
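The scripts infer core-periphery structure with stochastic block models. As a much simpler stand-in, the sketch below builds an undirected reply network and labels the innermost k-core as the core. This is k-core decomposition, a different heuristic technique, not the SBM ensemble. The (replier, replied_to) input format and all function names are assumptions for illustration.

```python
from collections import defaultdict


def reply_graph(pairs):
    """Undirected adjacency map from (replier, replied_to) author pairs."""
    adj = defaultdict(set)
    for a, b in pairs:
        if a != b:  # ignore users replying to themselves
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)


def core_numbers(adj):
    """k-core number of every node, by repeatedly peeling a minimum-degree node."""
    degree = {n: len(nbrs) for n, nbrs in adj.items()}
    remaining = set(adj)
    core, k = {}, 0
    while remaining:
        node = min(remaining, key=degree.__getitem__)
        k = max(k, degree[node])  # core numbers never decrease along the peel
        core[node] = k
        remaining.remove(node)
        for nbr in adj[node]:
            if nbr in remaining:
                degree[nbr] -= 1
    return core


def split_core_periphery(core):
    """Nodes in the innermost (maximum) k-core versus everyone else."""
    if not core:
        return set(), set()
    top = max(core.values())
    inner = {n for n, c in core.items() if c == top}
    return inner, set(core) - inner
```

In a triangle of users a, b, c with a pendant user d replying to a, the triangle forms the 2-core and d ends up in the periphery. Unlike an SBM, this split is purely degree-based and provides no uncertainty estimate.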
Comparative structural analysis
- Simulation vs Voat: scripts/comparative_network_analysis_voat.py
- Generic network mapping: scripts/map_network.py
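To illustrate what a simulation-vs-Voat structural comparison can involve, here is a sketch with two parts. It summarizes each network as an adjacency map and compares degree sequences with a two-sample Kolmogorov-Smirnov statistic. The chosen metrics are illustrative assumptions; comparative_network_analysis_voat.py may report different ones.

```python
from bisect import bisect_right


def network_summary(adj):
    """Size, density, and degree descriptors of an undirected graph {node: set(neighbors)}."""
    n = len(adj)
    degrees = [len(nbrs) for nbrs in adj.values()]
    m = sum(degrees) // 2  # every undirected edge is counted from both ends
    return {
        "nodes": n,
        "edges": m,
        "density": 2 * m / (n * (n - 1)) if n > 1 else 0.0,
        "mean_degree": 2 * m / n if n else 0.0,
        "max_degree": max(degrees, default=0),
    }


def ks_statistic(xs, ys):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between the two ECDFs.

    Both samples must be non-empty.
    """
    xs, ys = sorted(xs), sorted(ys)
    gap = 0.0
    for v in set(xs) | set(ys):
        fx = bisect_right(xs, v) / len(xs)
        fy = bisect_right(ys, v) / len(ys)
        gap = max(gap, abs(fx - fy))
    return gap
```

If SciPy is available, scipy.stats.ks_2samp computes the same statistic along with a p-value.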
Convergence / conversational entropy & alternating chains
- Chain extraction & metrics: scripts/convergence_entropy_chains.py
- Legacy entropy kernels: scripts/entropy.py
- A-B alternating chain export: scripts/export_chain_texts.py
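The sketch below covers two ingredients the chain analysis names. The first is the Shannon entropy of a token distribution. The second is detection of A-B alternating chains, read here as stretches of a reply chain in which two users strictly alternate. The details (entropy in bits, tokens given as a list, a minimum run of four messages) are assumptions, not the definitions used in convergence_entropy_chains.py.

```python
import math
from collections import Counter


def shannon_entropy(tokens):
    """Shannon entropy, in bits, of the token frequency distribution."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def alternating_runs(authors, min_len=4):
    """Half-open (start, end) spans where exactly two users strictly alternate."""
    runs, start = [], 0
    for i in range(1, len(authors) + 1):
        extends = (
            i < len(authors)
            and authors[i] != authors[i - 1]
            and (i - start < 2 or authors[i] == authors[i - 2])
        )
        if extends:
            continue
        if i - start >= min_len:
            runs.append((start, i))
        # If the last two authors still differ, they seed the next candidate run.
        if i < len(authors) and authors[i] != authors[i - 1]:
            start = i - 1
        else:
            start = i
    return runs
```

For the author sequence A, B, A, B, C, B, C, alternating_runs returns [(0, 4), (3, 7)]: the A-B exchange followed by the B-C exchange.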
Topic modeling & semantic similarity
- Voat vs Simulation advanced comparison: scripts/voat_topic_compare.py
- Basic BERTopic pipeline: scripts/topic_compare_basic.py
- Simplified lightweight comparison: scripts/simplified_topic_compare.py
- Embedding similarity distributions: scripts/voat_sim_embedding_similarity.py
- Text cleaning helper (title/body split): scripts/preprocess_simulation_data.py
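Embedding similarity distributions can be built from pairwise cosine similarities. The sketch assumes each document has already been embedded as a vector. The vectors here are toy lists; this README does not say which embedding model voat_sim_embedding_similarity.py uses. For each simulated document, the sketch keeps the similarity of its nearest Voat neighbor. This nearest-neighbor summary is an illustrative choice, not necessarily the script's.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def nearest_neighbor_similarities(queries, references):
    """For each query vector, its cosine similarity to the closest reference vector.

    `references` must be non-empty.
    """
    return [max(cosine(q, r) for r in references) for q in queries]
```

Collecting these values for every simulated document gives one similarity distribution. Swapping the two arguments gives the reverse (Voat-to-simulation) direction.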
Toxicity analysis & plots
- KDE & ECDF panel: scripts/toxicity_kde_ecdf.py
- Single-panel posts vs comments: scripts/toxicity_kde_posts_vs_comments.py
- Targeted toxic pair matching: scripts/voat_toxic_match.py, scripts/sim_comments_to_voat_match.py
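The sketch below illustrates two toxicity operations named above. It assumes each post already carries a numeric toxicity score in [0, 1]. The toxigen.csv file name suggests ToxiGen-based scores, but that is an inference. ecdf returns an empirical cumulative distribution function. match_by_score pairs each simulated item with the real item whose score is closest. This greedy nearest-score pairing is a hypothetical simplification, not the method in voat_toxic_match.py.

```python
from bisect import bisect_left, bisect_right


def ecdf(values):
    """Return F with F(x) = fraction of `values` that are <= x."""
    xs = sorted(values)
    if not xs:
        raise ValueError("ecdf needs at least one value")
    n = len(xs)
    return lambda x: bisect_right(xs, x) / n


def match_by_score(sim_items, real_items):
    """Pair each simulated (id, score) with the real (id, score) of closest score.

    Returns (sim_id, real_id, absolute_score_gap) triples.
    """
    real_sorted = sorted(real_items, key=lambda item: item[1])
    keys = [score for _, score in real_sorted]
    if not keys:
        return []
    pairs = []
    for sim_id, score in sim_items:
        i = bisect_left(keys, score)
        # The nearest real score sits at index i or i - 1 in the sorted list.
        j = min((k for k in (i - 1, i) if 0 <= k < len(keys)),
                key=lambda k: abs(keys[k] - score))
        pairs.append((sim_id, real_sorted[j][0], abs(keys[j] - score)))
    return pairs
```

In this sketch a real item can be reused across pairs. Enforcing one-to-one matching would require an additional assignment step.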
Named entity / structural Voat vs simulation diagnostics
- scripts/voat_ner_structure.py
Integrated simulation pipeline runner
- scripts/run_sim_pipeline.sh orchestrates many of the above steps (visuals, network, topics, embeddings, matching).
Panel / figure utilities & miscellany
- scripts/panel-figure.py, scripts/posts_per_user_kde_voat_vs_sim.py, scripts/additional_plots.py, scripts/visualize_simulation_additional.py, etc.
Archival / experimental
- scripts/archive/ (legacy exploratory scripts). Avoid relying on these for production runs.
2. Detailed Usage Examples
Below are more explicit runs, assuming you are in the repository root.
4.1 Voat Sampling
```bash
python scripts/voat_samples.py
```
4.2 Simulation Stats from SQLite
```bash
python scripts/sim_stats.py path/to/simulation.sqlite
# Produces: results/<db_basename>/posts.csv, users.csv, news.csv + figures
```
4.3 Generate User & Thread CSV/TXT
```bash
python scripts/generate_csvs.py results/<db_basename>/posts.csv --output simulation_extracted
```
4.4 Clean Simulation Text
```bash
python scripts/preprocess_simulation_data.py \
  --input simulation/posts.csv \
  --output simulation/clean_text.csv \
  --text-column tweet --min-length 25
```
4.5 Core-Periphery Network (Simulation CSV)
```bash
python scripts/core_periphery_network.py simulation/posts.csv simulation/core_periphery
```
(Exact CLI arguments may depend on the script's internal argument parsing; open the script if unsure.)
4.6 Enhanced SBM Ensemble (Voat)
```bash
python scripts/core_periphery_enhanced_voat.py MADOC/voat-technology/sample_1/voat_sample_1.parquet voat_cp_enhanced
```
4.7 Comparative Network (Voat vs Simulation)
```bash
python scripts/comparative_network_analysis_voat.py
# Assumes these data directories exist: simulation/ and MADOC/voat-technology/
```
4.10 Convergence / Entropy
```bash
python scripts/convergence_entropy_chains.py --posts-csv simulation/posts.csv \
  --out-dir simulation/convergence
```
4.11 Topic Comparison (Voat vs Simulation)
Simplified version (when you only want a quick match):
```bash
python scripts/simplified_topic_compare.py \
  --corpus1 MADOC/voat-technology/sample_1/voat_sample_1.parquet \
  --corpus2 simulation/clean_text.csv \
  --corpus2-column full_text \
  --min-topic-size 20 \
  --nr-topics auto \
  --output-dir topic_compare_simple
```
Advanced Voat vs Simulation:
```bash
python scripts/voat_topic_compare.py \
  --sim2-posts-csv simulation/posts.csv \
  --outdir simulation/topic_compare \
  --min-topic-size 5 \
  --drop-header-rows \
  --min-doc-chars 25
```
4.12 Toxicity KDE
```bash
python scripts/toxicity_kde_ecdf.py --sim-dir simulation
python scripts/toxicity_kde_posts_vs_comments.py --sim-dir simulation
```
4.13 Toxic Matching
```bash
python scripts/voat_toxic_match.py \
  --sim2-posts simulation/posts.csv \
  --sim2-tox simulation/toxigen.csv \
  --madoc-parquet MADOC/voat-technology/sample_1/voat_sample_1.parquet \
  --topn 20
```
4.14 Embedding Similarity (Voat vs Simulation)
```bash
python scripts/voat_sim_embedding_similarity.py \
  --sim2-posts simulation/posts.csv \
  --sim2-tox simulation/toxigen.csv \
  --mode both \
  --plot-tsne --plot-umap
```
Owner
- Name: Aleksandar Tomašević
- Login: atomashevic
- Kind: user
- Location: Novi Sad, Serbia
- Company: University of Novi Sad
- Website: www.atomasevic.com
- Twitter: atomasevic
- Repositories: 2
- Profile: https://github.com/atomashevic
GitHub Events
Total
- Member event: 1
- Public event: 1
- Push event: 2
Last Year
- Member event: 1
- Public event: 1
- Push event: 2