scientific-data-simulator

The Scientific Data Simulator is a Python framework designed for creating and managing reproducible scientific simulations and generating synthetic data.

https://github.com/sandner-art/scientific-data-simulator

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 3 DOI reference(s) in README
✓ Academic publication links: Links to zenodo.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (17.9%) to scientific vocabulary

Last synced: 6 months ago

Repository


Basic Info

Host: GitHub
Owner: sandner-art
Language: Python
Default Branch: main
Size: 260 KB

Statistics

Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created 11 months ago · Last pushed 11 months ago

Metadata Files

Readme Citation

Scientific Data Simulator

Author: Daniel Sandner | Scientific Data Simulator: A Modular Concept for Reproducible Experiment Design and Synthetic Data Generation, part of the 100 Scientific Visions project (2025-2027)

Overview

The Scientific Data Simulator is a Python framework designed for creating and managing reproducible scientific simulations and generating synthetic data. It aims to address the challenges of reproducibility, extensibility, and ease of use in computational science. The framework emphasizes modularity, allowing users to easily define and extend simulation models, incorporate various data sources, and integrate with external tools. A key feature is the optional integration of Large Language Models (LLMs) to assist in experiment design, lowering the barrier to entry for users and accelerating the research process.


Key Features

Modular Design: Built on a modular architecture with clear separation of concerns, making it easy to extend and customize.
Extensibility: Abstract base classes define standard interfaces for experiment logic, LLM interaction, and data handling.
Reproducibility: Comprehensive experiment records capture all relevant information about each simulation run (parameters, versions, data provenance, system information).
LLM-Assisted Experiment Design: Optionally use LLMs to help generate experiment code, reducing development time and promoting best practices.
Design of Experiments (DOE): Built-in support for DOE principles, enabling systematic exploration of parameter spaces.
Data Management: Promotes best practices in data management, including metadata management, data provenance tracking, and recommendations for data version control.
Experiment Chaining: Create pipelines of experiments, where the output of one experiment becomes the input of the next.
Parameter Sweeps: Easily define and execute parameter sweeps to explore the behavior of simulations.
Data Preview: Interactive data visualization and summary statistics for quick inspection of results.
Jupyter Notebook Integration: Seamlessly integrate with Jupyter Notebooks for interactive exploration and analysis.
Open Source: Released under the MIT License, encouraging collaboration and community contributions.
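
To make the Parameter Sweeps and Design of Experiments features above more concrete, here is a minimal sketch of a full-factorial sweep built only on the standard library. It is not the framework's own API: the function name `full_factorial` and the parameter names `amplitude` and `frequency` are hypothetical and chosen only for illustration.

```python
from itertools import product


def full_factorial(parameter_grid):
    """Yield one parameter dict per combination of the listed levels."""
    names = list(parameter_grid)
    for values in product(*(parameter_grid[name] for name in names)):
        yield dict(zip(names, values))


# Hypothetical sweep: three amplitudes crossed with two frequencies.
grid = {"amplitude": [0.5, 1.0, 2.0], "frequency": [1.0, 5.0]}
runs = list(full_factorial(grid))

print(len(runs))  # 6 combinations (3 x 2)
print(runs[0])    # {'amplitude': 0.5, 'frequency': 1.0}
```

Each dict in `runs` could then serve as the parameter set for one simulation run, which keeps the sweep definition separate from the experiment logic.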

Installation

It's highly recommended to use a virtual environment to manage the project's dependencies. This prevents conflicts with other Python projects and ensures reproducibility. Choose one of the following methods:

A. Using venv (Recommended for most users):

Create a virtual environment:

    python3 -m venv .venv

This creates a new virtual environment in a directory named .venv (you can choose a different name if you prefer). It's a good practice to put the environment inside your project directory. The leading . makes it a hidden directory on most systems.
Activate the environment:

*   **Linux/macOS:**
    ```bash
    source .venv/bin/activate
    ```

*   **Windows (cmd.exe):**
    ```
    .venv\Scripts\activate.bat
    ```

*   **Windows (PowerShell):**
    ```
    .venv\Scripts\Activate.ps1
    ```

You should see `(.venv)` (or your environment name) at the beginning of your terminal prompt, indicating that the environment is active.

Install the dependencies:

    pip install -r requirements.txt

This installs all required packages.
Install in editable mode:

    pip install -e .

B. Using conda (If you use Anaconda or Miniconda):

Create a conda environment:

    conda create -n scientific-data-simulator python=3.9  # Or another Python version

This creates a new conda environment named scientific-data-simulator (you can choose a different name).
Activate the environment:

    conda activate scientific-data-simulator

Install the dependencies:

    pip install -r requirements.txt

Even within a conda environment, it is good practice to install project-specific dependencies using pip and requirements.txt.
Install in editable mode:

    pip install -e .
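
If you are unsure whether an environment from either method is active, a short standard-library check like the following can help. This is only a sketch: the helper name `active_environment` is hypothetical, the venv test relies on `sys.prefix` differing from `sys.base_prefix`, and the conda test assumes that `conda activate` has set the `CONDA_DEFAULT_ENV` environment variable.

```python
import os
import sys


def active_environment():
    """Return a short description of the active environment, or None."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base interpreter.
    if sys.prefix != sys.base_prefix:
        return f"venv at {sys.prefix}"
    # Assumption: conda sets CONDA_DEFAULT_ENV when an environment is activated.
    conda_env = os.environ.get("CONDA_DEFAULT_ENV")
    if conda_env:
        return f"conda environment {conda_env}"
    return None


print(active_environment() or "no virtual environment detected")
```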

Usage

The core concept of the Scientific Data Simulator is the separation of the simulation engine from the experiment logic.

Define your experiment logic: Create a Python class that inherits from the ExperimentLogic abstract base class (in simulator/base.py). Implement the required methods (initialize, run_step, get_results, and optionally visualize). Place this class in a file within the experiments directory (e.g., experiments/my_experiment/logic.py).
Create a configuration file: Create a YAML file (e.g., config.yaml) to define the experiment parameters, input data sources, and other settings.
Run the simulation: Use the provided example scripts in the examples/ folder.
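
The steps above can be sketched as follows. This is an illustration under stated assumptions, not the framework's actual code: the `ExperimentLogic` class below is a stand-in for the one in simulator/base.py, and its method signatures, the `run` driver function, and the configuration keys are all hypothetical. A plain dict takes the place of the parsed YAML configuration.

```python
import math
from abc import ABC, abstractmethod


class ExperimentLogic(ABC):
    """Stand-in for the base class in simulator/base.py (signatures assumed)."""

    @abstractmethod
    def initialize(self, config): ...

    @abstractmethod
    def run_step(self, step): ...

    @abstractmethod
    def get_results(self): ...

    def visualize(self):
        """Optional hook; does nothing by default."""
        return None


class SineWaveExperiment(ExperimentLogic):
    """Generates samples of a sine wave, one sample per step."""

    def initialize(self, config):
        self.amplitude = config["amplitude"]
        self.frequency = config["frequency"]
        self.dt = config["dt"]
        self.times = []
        self.values = []

    def run_step(self, step):
        t = step * self.dt
        self.times.append(t)
        self.values.append(
            self.amplitude * math.sin(2 * math.pi * self.frequency * t)
        )

    def get_results(self):
        return {"time": self.times, "value": self.values}


def run(logic, config):
    """Toy driver loop standing in for the simulation engine."""
    logic.initialize(config)
    for step in range(config["n_steps"]):
        logic.run_step(step)
    return logic.get_results()


# Hypothetical configuration; in the framework this would come from config.yaml.
config = {"amplitude": 2.0, "frequency": 1.0, "dt": 0.25, "n_steps": 5}
results = run(SineWaveExperiment(), config)
print(results["time"])  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

The point of the split is that `run` knows nothing about sine waves: any class that implements the same methods can be driven by the same loop.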

Running the Example

To run the included example simulation (a simple sine wave):

Make sure you have activated your virtual environment (see the Installation instructions above).
Navigate to the project root directory in your terminal:

    cd /path/to/scientific-data-simulator  # Replace with the actual path

Run the example script:

    python -m examples.example_1.run_experiment

This executes the example and generates an experiment_record.json file and plot files in an experiments_output subdirectory.
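
The actual layout of experiment_record.json is not documented here. As a rough idea of what a reproducibility record of this kind could contain (parameters, versions, system information), here is a minimal standard-library sketch. The function name `build_record` and every field name are assumptions made for illustration.

```python
import json
import platform
from datetime import datetime, timezone


def build_record(parameters, results_summary):
    """Collect run metadata into a JSON-serializable dict (hypothetical fields)."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "parameters": parameters,
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "results_summary": results_summary,
    }


record = build_record(
    parameters={"amplitude": 1.0, "frequency": 5.0},
    results_summary={"n_points": 100},
)

# Serialize with sorted keys so that records are easy to diff between runs.
text = json.dumps(record, indent=2, sort_keys=True)
print(text)
```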

Project Structure

    scientific_data_simulator/
    ├── simulator/                      # The core engine
    │   ├── __init__.py
    │   ├── base.py                     # Abstract base classes (ExperimentLogic, LLMClient)
    │   ├── engine.py                   # Core engine logic (execution, logging, etc.)
    │   ├── config.py                   # Configuration management
    │   ├── data_handler.py             # Data loading and saving
    │   ├── visualization.py            # Visualization adapters/wrappers
    │   ├── utils.py                    # Utility functions
    │   ├── doe.py                      # Design of Experiments functions
    │   ├── llm_client.py               # LLM client abstraction
    │   └── experiment_record.py        # ExperimentRecord class
    │
    ├── experiments/                    # Specific experiment implementations (ExperimentLogic)
    │   ├── __init__.py
    │   ├── example_experiment/         # For the reusable ExperimentLogic
    │   │   ├── __init__.py
    │   │   └── logic.py
    │   └── ...
    │
    ├── tests/
    │   ├── __init__.py
    │   ├── test_engine.py
    │   ├── test_example_experiment.py  # Tests for the ExperimentLogic
    │   └── ...
    │
    ├── examples/                       # Example usage scripts/notebooks
    │   ├── example_1/                  # NEW: Renamed for clarity
    │   │   ├── run_experiment.py
    │   │   └── config.yaml
    │   └── example_notebook.ipynb      # Notebooks can stay at the top level
    │
    ├── docs/
    │   ├── conf.py
    │   ├── index.rst
    │   └── ...
    │
    ├── .gitignore
    ├── requirements.txt
    ├── README.md
    └── CITATION.cff

Documentation

[TODO: Link to Sphinx documentation once it's built.]

From the scientific-data-simulator folder (the project root), run:

```bash
python -m examples.example_1.run_experiment

# Or, explicitly specifying the config file:
python -m examples.example_1.run_experiment --config examples/example_1/config.yaml
```
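
For readers writing their own run script, a `--config` option like the one above can be handled with `argparse`. The sketch below is not the code of the shipped run_experiment.py: the function name `parse_args` and the default path are assumptions made for illustration.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse command-line options for a hypothetical run script."""
    parser = argparse.ArgumentParser(description="Run an example experiment.")
    parser.add_argument(
        "--config",
        type=Path,
        # Assumed default; the real script may resolve its config differently.
        default=Path("examples/example_1/config.yaml"),
        help="Path to the YAML configuration file.",
    )
    return parser.parse_args(argv)


# Passing an explicit list keeps the example independent of the real sys.argv.
args = parse_args(["--config", "my_config.yaml"])
print(args.config)  # my_config.yaml
```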

Citation

If you use Scientific Data Simulator in your research, please cite it as follows:

Daniel Sandner. (2025). Scientific Data Simulator (Version 0.1.0) [Computer software]. https://github.com/sandner-art/scientific-data-simulator

You can find more detailed citation information in the CITATION.cff file.

Owner

Name: sandner.art
Login: sandner-art
Kind: user

Website: https://www.sandner.art/
Twitter: SandnerDaniel
Repositories: 1
Profile: https://github.com/sandner-art

Research in creative opensource, 3D, AI/ML, VR/AR

Citation (CITATION.cff)

# CITATION.cff (Example for Scientific Data Simulator)
cff-version: 1.2.0
message: "If you use Scientific Data Simulator, please cite it as below."
authors:
  - family-names: "Sandner"  
    given-names: "Daniel"  
    orcid: "https://orcid.org/0000-0002-1041-814X"  # Replace with your ORCID iD (if you have one)
title: "Scientific Data Simulator: A Modular Concept for Reproducible Experiment Design and Synthetic Data Generation"
version: 0.1.0
date-released: 2025-03-16 # Replace with the actual release date
url: "https://github.com/sandner-art/scientific-data-simulator"  # Replace with your repository URL
preferred-citation:
  type: software
  authors:
    - family-names: "Sandner"
      given-names: "Daniel"
  title: "Scientific Data Simulator"
  version: "0.1.0"
  # doi: "10.5281/zenodo.XXXXXXX"  # Add this *after* you get a DOI from Zenodo
  date-released: 2025-03-16
