putior

Register In- and Outputs for Workflow Visualization.

https://github.com/pjt222/putior

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: pjt222
License: other
Language: R
Default Branch: main
Size: 3.93 MB

Statistics

Stars: 4
Watchers: 1
Forks: 0
Open Issues: 3
Releases: 0

Created about 2 years ago · Last pushed about 1 year ago

Metadata Files

Readme Changelog License

putior

Extract beautiful workflow diagrams from your code annotations

putior (PUT + Input + Output + R) is an R package that extracts structured annotations from source code files and creates beautiful Mermaid flowchart diagrams. It is well suited to documenting data pipelines and workflows and to understanding complex codebases.

🌟 Key Features

Simple annotations - Add structured comments to your existing code
Beautiful diagrams - Generate professional Mermaid flowcharts
File flow tracking - Automatically connects scripts based on input/output files
Multiple themes - 5 built-in themes including GitHub-optimized
Cross-language support - Works with R, Python, SQL, shell scripts, and Julia
Flexible output - Console, file, or clipboard export
Customizable styling - Control colors, direction, and node shapes

📦 Installation

```r
# Install from CRAN (recommended)
install.packages("putior")

# Or install from GitHub (development version)
remotes::install_github("pjt222/putior")

# Or with renv
renv::install("putior")         # CRAN version
renv::install("pjt222/putior")  # GitHub version

# Or with pak (faster)
pak::pkg_install("putior")         # CRAN version
pak::pkg_install("pjt222/putior")  # GitHub version
```

🚀 Quick Start

Step 1: Annotate Your Code

Add structured annotations to your R or Python scripts using #put comments:

01_fetch_data.R
```r
#put label:"Fetch Sales Data", node_type:"input", output:"sales_data.csv"

# Your actual code
library(readr)
sales_data <- fetch_sales_from_api()
write_csv(sales_data, "sales_data.csv")
```

02_clean_data.py
```python
#put label:"Clean and Process", node_type:"process", input:"sales_data.csv", output:"clean_sales.csv"

import pandas as pd
df = pd.read_csv("sales_data.csv")
# ... data cleaning code ...
df.to_csv("clean_sales.csv")
```

Step 2: Extract and Visualize

```r
library(putior)

# Extract workflow from your scripts
workflow <- put("./scripts/")

# Generate diagram
put_diagram(workflow)
```

Result:

```mermaid
flowchart TD
fetch_sales([Fetch Sales Data])
clean_data[Clean and Process]

%% Connections
fetch_sales --> clean_data

%% Styling
classDef inputStyle fill:#dbeafe,stroke:#2563eb,stroke-width:2px,color:#1e40af
class fetch_sales inputStyle
classDef processStyle fill:#ede9fe,stroke:#7c3aed,stroke-width:2px,color:#5b21b6
class clean_data processStyle

```
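The arrow in this diagram follows from matching file names: one script's output is another script's input. The sketch below illustrates that idea in base R under stated assumptions. The `nodes` data frame layout and the helpers `split_files()` and `connect_by_files()` are hypothetical names chosen for this example and are not putior's internal code.

```r
# Hypothetical node table mirroring the two Quick Start annotations.
nodes <- data.frame(
  id     = c("fetch_sales", "clean_data"),
  input  = c(NA, "sales_data.csv"),
  output = c("sales_data.csv", "clean_sales.csv"),
  stringsAsFactors = FALSE
)

# Split a comma-separated file list into trimmed names; NA yields character(0).
split_files <- function(x) {
  if (is.na(x)) return(character(0))
  trimws(strsplit(x, ",", fixed = TRUE)[[1]])
}

# Illustrative helper (not part of putior): emit "from --> to" for every
# pair where an output file of `from` appears among the inputs of `to`.
connect_by_files <- function(nodes) {
  edges <- character(0)
  for (i in seq_len(nrow(nodes))) {
    for (j in seq_len(nrow(nodes))) {
      if (i == j) next
      shared <- intersect(split_files(nodes$output[i]),
                          split_files(nodes$input[j]))
      if (length(shared) > 0) {
        edges <- c(edges, paste(nodes$id[i], "-->", nodes$id[j]))
      }
    }
  }
  edges
}

edges <- connect_by_files(nodes)
print(edges)
```

The nested loop is quadratic in the number of nodes, which is fine for a sketch; a real implementation might index files first.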

📈 Common Data Science Pattern

Modular Workflow with source()

The most common data science pattern: modularize functions into separate scripts and orchestrate them in a main workflow:

utils.R - Utility functions
```r
#put label:"Data Utilities", node_type:"input"

load_and_clean <- function(file) {
  data <- read.csv(file)
  data[complete.cases(data), ]
}

validate_data <- function(data) {
  stopifnot(nrow(data) > 0)
  return(data)
}
```

analysis.R - Analysis functions
```r
#put label:"Statistical Analysis", input:"utils.R"

perform_analysis <- function(data) {
  # Uses utility functions from utils.R
  cleaned <- validate_data(data)
  summary(cleaned)
}
```

main.R - Workflow orchestrator
```r
#put label:"Main Analysis Pipeline", input:"utils.R,analysis.R", output:"results.csv"

source("utils.R")     # Load utility functions
source("analysis.R")  # Load analysis functions

# Execute the pipeline
data <- load_and_clean("raw_data.csv")
results <- perform_analysis(data)
write.csv(results, "results.csv")
```

Generated Workflow (Simple):

```mermaid
flowchart TD
utils([Data Utilities])
analysis[Statistical Analysis]
main[Main Analysis Pipeline]

%% Connections
utils --> analysis
utils --> main
analysis --> main

%% Styling
classDef inputStyle fill:#dbeafe,stroke:#2563eb,stroke-width:2px,color:#1e40af
class utils inputStyle
classDef processStyle fill:#ede9fe,stroke:#7c3aed,stroke-width:2px,color:#5b21b6
class analysis processStyle
class main processStyle

```

Generated Workflow (With Data Artifacts):
```r
# Show complete data flow including all files
put_diagram(workflow, show_artifacts = TRUE)
```

```mermaid
flowchart TD
utils([Data Utilities])
analysis[Statistical Analysis]
main[Main Analysis Pipeline]
artifact_results_csv[(results.csv)]

%% Connections
utils --> analysis
utils --> main
analysis --> main
main --> artifact_results_csv

%% Styling
classDef inputStyle fill:#dbeafe,stroke:#2563eb,stroke-width:2px,color:#1e40af
class utils inputStyle
classDef processStyle fill:#ede9fe,stroke:#7c3aed,stroke-width:2px,color:#5b21b6
class analysis processStyle
class main processStyle
classDef artifactStyle fill:#f3f4f6,stroke:#6b7280,stroke-width:1px,color:#374151
class artifact_results_csv artifactStyle

```

This pattern clearly shows:
- Function modules (utils.R, analysis.R) are sourced into the main script
- Dependencies between modules (analysis depends on utils)
- Complete data flow with artifacts showing terminal outputs like results.csv
- Two visualization modes: simple (script connections only) vs. complete (with data artifacts)

📊 Visualization Examples

Basic Workflow

```r
# Simple three-step process
workflow <- put("./data_pipeline/")
put_diagram(workflow)
```

Advanced Data Science Pipeline

Here's how putior handles a complete data science workflow:

File Structure:

data_pipeline/
├── 01_fetch_sales.R      # Fetch sales data
├── 02_fetch_customers.R  # Fetch customer data
├── 03_clean_sales.py     # Clean sales data
├── 04_merge_data.R       # Merge datasets
├── 05_analyze.py         # Statistical analysis
└── 06_report.R           # Generate final report

Generated Workflow:

```mermaid
flowchart TD
fetch_sales([Fetch Sales Data])
fetch_customers([Fetch Customer Data])
clean_sales[Clean Sales Data]
merge_data[Merge Datasets]
analyze[Statistical Analysis]
report[[Generate Final Report]]

%% Connections
fetch_sales --> clean_sales
fetch_customers --> merge_data
clean_sales --> merge_data
merge_data --> analyze
analyze --> report

%% Styling
classDef inputStyle fill:#dbeafe,stroke:#2563eb,stroke-width:2px,color:#1e40af
class fetch_sales inputStyle
class fetch_customers inputStyle
classDef processStyle fill:#ede9fe,stroke:#7c3aed,stroke-width:2px,color:#5b21b6
class clean_sales processStyle
class merge_data processStyle
class analyze processStyle
classDef outputStyle fill:#dcfce7,stroke:#16a34a,stroke-width:2px,color:#15803d
class report outputStyle

```

📋 Using the Diagrams

Embedding in Documentation

The generated Mermaid code works perfectly in:

GitHub README files (native Mermaid support)
GitLab documentation
Notion pages
Obsidian notes
Jupyter notebooks (with extensions)
Sphinx documentation (with plugins)
Any Markdown renderer with Mermaid support

Saving and Sharing

```r
# Save to markdown file
put_diagram(workflow, output = "file", file = "workflow.md")

# Copy to clipboard for pasting
put_diagram(workflow, output = "clipboard")

# Include title for documentation
put_diagram(workflow,
            output = "file",
            file = "docs/pipeline.md",
            title = "Data Processing Pipeline")
```

🔧 Visualization Modes

putior offers two visualization modes to suit different needs:

Workflow Boundaries Demo

First, let's see how workflow boundaries enhance pipeline visualization:

Pipeline with Boundaries (Default):
```r
# Complete ETL pipeline with clear start/end boundaries
put_diagram(workflow, show_workflow_boundaries = TRUE)
```

```mermaid
flowchart TD
pipeline_start([Data Pipeline Start])
extract_data[Extract Raw Data]
transform_data[Transform Data]
pipeline_end([Pipeline Complete])

%% Connections
pipeline_start --> extract_data
extract_data --> transform_data
transform_data --> pipeline_end

%% Styling
classDef processStyle fill:#ede9fe,stroke:#7c3aed,stroke-width:2px,color:#5b21b6
class extract_data processStyle
class transform_data processStyle
classDef startStyle fill:#fef3c7,stroke:#d97706,stroke-width:3px,color:#92400e
class pipeline_start startStyle
classDef endStyle fill:#dcfce7,stroke:#16a34a,stroke-width:3px,color:#15803d
class pipeline_end endStyle

```

Same Pipeline without Boundaries:
```r
# Clean diagram without workflow control styling
put_diagram(workflow, show_workflow_boundaries = FALSE)
```

```mermaid
flowchart TD
pipeline_start([Data Pipeline Start])
extract_data[Extract Raw Data]
transform_data[Transform Data]
pipeline_end([Pipeline Complete])

%% Connections
pipeline_start --> extract_data
extract_data --> transform_data
transform_data --> pipeline_end

%% Styling
classDef processStyle fill:#ede9fe,stroke:#7c3aed,stroke-width:2px,color:#5b21b6
class extract_data processStyle
class transform_data processStyle

```

Simple Mode (Default)

Shows only script-to-script connections, which suits understanding code dependencies:

put_diagram(workflow)  # Default: simple mode

Use when:
- Documenting code architecture
- Showing function dependencies
- Clean, simple workflow diagrams

Artifact Mode (Complete Data Flow)

Shows all data files as nodes, giving a complete picture of data flow including terminal outputs:

put_diagram(workflow, show_artifacts = TRUE)

Use when:
- Documenting data pipelines
- Tracking data lineage
- Showing complete input/output flow
- Understanding data dependencies

Comparison Example

Simple Mode (Mermaid):

flowchart TD
load[Load Data] --> process[Process Data]
process --> analyze[Analyze]

Artifact Mode:

```mermaid
flowchart TD
load[Load Data]
raw_data[(raw_data.csv)]
process[Process Data]
clean_data[(clean_data.csv)]
analyze[Analyze]
results[(results.json)]

load --> raw_data
raw_data --> process
process --> clean_data
clean_data --> analyze
analyze --> results

```

Key Differences

| Mode | Shows | Best For |
|------|-------|----------|
| Simple | Script connections only | Code architecture, dependencies |
| Artifact | Scripts + data files | Data pipelines, complete data flow |
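To make the difference concrete, the following base R sketch derives artifact-style edges, where every data file becomes its own node between scripts. It assumes one input and one output file per script for brevity. The `nodes` layout and the `artifact_edges()` helper are hypothetical illustrations, not putior's implementation.

```r
# Hypothetical node table for the three-step comparison example.
nodes <- data.frame(
  id     = c("load", "process", "analyze"),
  input  = c(NA, "raw_data.csv", "clean_data.csv"),
  output = c("raw_data.csv", "clean_data.csv", "results.json"),
  stringsAsFactors = FALSE
)

# Illustrative helper (not putior's code): each file is treated as a node,
# so a script gets one edge from its input and one edge to its output.
artifact_edges <- function(nodes) {
  edges <- character(0)
  for (i in seq_len(nrow(nodes))) {
    if (!is.na(nodes$input[i])) {
      edges <- c(edges, paste(nodes$input[i], "-->", nodes$id[i]))
    }
    if (!is.na(nodes$output[i])) {
      edges <- c(edges, paste(nodes$id[i], "-->", nodes$output[i]))
    }
  }
  edges
}

art_edges <- artifact_edges(nodes)
print(art_edges)
```

Terminal outputs such as results.json appear here even though no script consumes them, which is the main thing artifact mode adds over simple mode.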

File Labeling

Add file names to connections for extra clarity:
```r
# Show file names on arrows
put_diagram(workflow, show_artifacts = TRUE, show_files = TRUE)
```

🎨 Theme System

putior provides 5 carefully designed themes optimized for different environments:

```r
# Get list of available themes
get_diagram_themes()
```

Theme Overview

| Theme | Best For | Description |
|-------|----------|-------------|
| light | Documentation sites, tutorials | Default light theme with bright colors |
| dark | Dark mode apps, terminals | Dark theme with muted colors |
| auto | GitHub README files | GitHub-adaptive theme that works in both modes |
| minimal | Business reports, presentations | Grayscale professional theme |
| github | GitHub README (recommended) | Optimized for maximum GitHub compatibility |

Theme Examples

Light Theme

put_diagram(workflow, theme = "light")

```mermaid
flowchart TD
fetch_data([Fetch API Data])
clean_data[Clean and Validate]
generate_report[[Generate Final Report]]

%% Connections
fetch_data --> clean_data
clean_data --> generate_report

%% Styling
classDef inputStyle fill:#e1f5fe,stroke:#01579b,stroke-width:2px,color:#000000
class fetch_data inputStyle
classDef processStyle fill:#f3e5f5,stroke:#4a148c,stroke-width:2px,color:#000000
class clean_data processStyle
classDef outputStyle fill:#e8f5e8,stroke:#1b5e20,stroke-width:2px,color:#000000
class generate_report outputStyle

```

Dark Theme

put_diagram(workflow, theme = "dark")

```mermaid
flowchart TD
fetch_data([Fetch API Data])
clean_data[Clean and Validate]
generate_report[[Generate Final Report]]

%% Connections
fetch_data --> clean_data
clean_data --> generate_report

%% Styling
classDef inputStyle fill:#1a237e,stroke:#3f51b5,stroke-width:2px,color:#ffffff
class fetch_data inputStyle
classDef processStyle fill:#4a148c,stroke:#9c27b0,stroke-width:2px,color:#ffffff
class clean_data processStyle
classDef outputStyle fill:#1b5e20,stroke:#4caf50,stroke-width:2px,color:#ffffff
class generate_report outputStyle

```

Auto Theme (GitHub Adaptive)

put_diagram(workflow, theme = "auto")  # Recommended for GitHub!

```mermaid
flowchart TD
fetch_data([Fetch API Data])
clean_data[Clean and Validate]
generate_report[[Generate Final Report]]

%% Connections
fetch_data --> clean_data
clean_data --> generate_report

%% Styling
classDef inputStyle fill:#3b82f6,stroke:#1d4ed8,stroke-width:2px,color:#ffffff
class fetch_data inputStyle
classDef processStyle fill:#8b5cf6,stroke:#6d28d9,stroke-width:2px,color:#ffffff
class clean_data processStyle
classDef outputStyle fill:#10b981,stroke:#047857,stroke-width:2px,color:#ffffff
class generate_report outputStyle

```

GitHub Theme (Maximum Compatibility)

put_diagram(workflow, theme = "github")  # Best for GitHub README

```mermaid
flowchart TD
fetch_data([Fetch API Data])
clean_data[Clean and Validate]
generate_report[[Generate Final Report]]

%% Connections
fetch_data --> clean_data
clean_data --> generate_report

%% Styling
classDef inputStyle fill:#dbeafe,stroke:#2563eb,stroke-width:2px,color:#1e40af
class fetch_data inputStyle
classDef processStyle fill:#ede9fe,stroke:#7c3aed,stroke-width:2px,color:#5b21b6
class clean_data processStyle
classDef outputStyle fill:#dcfce7,stroke:#16a34a,stroke-width:2px,color:#15803d
class generate_report outputStyle

```

Minimal Theme

put_diagram(workflow, theme = "minimal")  # Professional documents

```mermaid
flowchart TD
fetch_data([Fetch API Data])
clean_data[Clean and Validate]
generate_report[[Generate Final Report]]

%% Connections
fetch_data --> clean_data
clean_data --> generate_report

%% Styling
classDef inputStyle fill:#f8fafc,stroke:#64748b,stroke-width:1px,color:#1e293b
class fetch_data inputStyle
classDef processStyle fill:#f1f5f9,stroke:#64748b,stroke-width:1px,color:#1e293b
class clean_data processStyle
classDef outputStyle fill:#f8fafc,stroke:#64748b,stroke-width:1px,color:#1e293b
class generate_report outputStyle

```

When to Use Each Theme

| Theme | Use Case | Environment |
|-------|----------|-------------|
| light | Documentation sites, tutorials | Light backgrounds |
| dark | Dark mode apps, terminals | Dark backgrounds |
| auto | GitHub README files | Adapts automatically |
| github | GitHub README (recommended) | Maximum compatibility |
| minimal | Business reports, presentations | Print-friendly |

Pro Tips

For GitHub: Use theme = "github" for maximum compatibility, or theme = "auto" for adaptive colors
For Documentation: Use theme = "light" or theme = "dark" to match your site
For Reports: Use theme = "minimal" for professional, print-friendly diagrams
For Demos: Light theme usually shows colors best in presentations

Theme Usage Examples

```r
# For GitHub README (recommended)
put_diagram(workflow, theme = "github")

# For GitHub README (adaptive)
put_diagram(workflow, theme = "auto")

# For dark documentation sites
put_diagram(workflow, theme = "dark", direction = "LR")

# For professional reports
put_diagram(workflow, theme = "minimal", output = "file", file = "report.md")

# Save all themes for comparison
themes <- c("light", "dark", "auto", "github", "minimal")
for (theme in themes) {
  put_diagram(workflow,
              theme = theme,
              output = "file",
              file = paste0("workflow_", theme, ".md"),
              title = paste("Workflow -", stringr::str_to_title(theme), "Theme"))
}
```

🔧 Customization Options

Flow Direction

put_diagram(workflow, direction = "TD")  # Top to bottom (default)
put_diagram(workflow, direction = "LR")  # Left to right
put_diagram(workflow, direction = "BT")  # Bottom to top
put_diagram(workflow, direction = "RL")  # Right to left

Node Labels

put_diagram(workflow, node_labels = "name")   # Show node IDs
put_diagram(workflow, node_labels = "label")  # Show descriptions (default)
put_diagram(workflow, node_labels = "both")   # Show name: description

File Connections

```r
# Show file names on arrows
put_diagram(workflow, show_files = TRUE)

# Clean arrows without file names
put_diagram(workflow, show_files = FALSE)
```

Styling Control

```r
# Include colored styling (default)
put_diagram(workflow, style_nodes = TRUE)

# Plain diagram without colors
put_diagram(workflow, style_nodes = FALSE)

# Control workflow boundary styling
put_diagram(workflow, show_workflow_boundaries = TRUE)   # Special start/end styling (default)
put_diagram(workflow, show_workflow_boundaries = FALSE)  # Regular node styling
```

Workflow Boundaries

```r
# Enable workflow boundaries (default) - start/end get special styling
put_diagram(workflow, show_workflow_boundaries = TRUE)

# Disable workflow boundaries - start/end render as regular nodes
put_diagram(workflow, show_workflow_boundaries = FALSE)
```

Output Options

```r
# Console output (default)
put_diagram(workflow)

# Save to markdown file
put_diagram(workflow, output = "file", file = "my_workflow.md")

# Copy to clipboard for pasting
put_diagram(workflow, output = "clipboard")
```

📝 Annotation Reference

Basic Syntax

All PUT annotations follow this format:
```r
#put property1:"value1", property2:"value2", property3:"value3"
```

Alternative Formats (All Valid)

```r
#put id:"node_id", label:"Description"    # Standard
# put id:"node_id", label:"Description"   # Space after #
#put| id:"node_id", label:"Description"   # Pipe separator
#put: id:"node_id", label:"Description"   # Colon separator
```
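One way to picture how the accepted prefixes could be recognised is a single regular expression. The sketch below is only an illustration in base R: the `put_prefix` pattern is an assumption made for this example, not the expression putior actually uses, and it covers just the four spellings of a comment marker followed by `put` with an optional `|` or `:` separator.

```r
# Illustrative pattern (an assumption, not putior's actual regex):
# '#', optional spaces, 'put', an optional '|' or ':' separator, then spaces.
put_prefix <- "^\\s*#\\s*put(\\||:)?\\s+"

lines <- c(
  '#put id:"node_id", label:"Description"',
  '# put id:"node_id", label:"Description"',
  '#put| id:"node_id", label:"Description"',
  '#put: id:"node_id", label:"Description"',
  'x <- 1  # ordinary code, no annotation'
)

# perl = TRUE makes the \s shorthand unambiguous across regex engines.
is_put <- grepl(put_prefix, lines, perl = TRUE)
print(is_put)
```

For checking real annotations, the package's own validation function is the authoritative test; this pattern only shows the shape of the prefix.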

Annotations

| Annotation | Description | Example | Required |
|------------|-------------|---------|----------|
| id | Unique identifier for the node (auto-generated if omitted) | "fetch_data", "clean_sales" | Optional* |
| label | Human-readable description | "Fetch Sales Data", "Clean and Process" | Recommended |

*Note: If id is omitted, a UUID will be automatically generated. If you provide an empty id (e.g., id:""), you'll get a validation warning.

Optional Annotations

| Annotation | Description | Example | Default |
|------------|-------------|---------|---------|
| node_type | Visual shape of the node | "input", "process", "output", "decision", "start", "end" | "process" |
| input | Input files (comma-separated) | "raw_data.csv, config.json" | None |
| output | Output files (comma-separated) | "processed_data.csv, summary.txt" | Current file name* |

*Note: If output is omitted, it defaults to the name of the file containing the annotation. This ensures nodes can be connected in workflows.
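The defaults described here can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions: `parse_pairs()` and `with_defaults()` are hypothetical helpers written for this example, and the simple `key:"value"` pattern ignores edge cases (such as escaped quotes) that a real parser would need to handle.

```r
# Illustrative parser (not putior's implementation): pull key:"value" pairs
# out of one annotation line and return them as a named list.
parse_pairs <- function(line) {
  m <- regmatches(line, gregexpr('[A-Za-z_]+:"[^"]*"', line))[[1]]
  keys <- sub(':".*$', "", m)
  vals <- sub('^[A-Za-z_]+:"', "", m)
  vals <- sub('"$', "", vals)
  as.list(setNames(vals, keys))
}

# Apply the documented defaults: node_type "process", output = file name.
with_defaults <- function(props, file_name) {
  if (is.null(props[["node_type"]])) props[["node_type"]] <- "process"
  if (is.null(props[["output"]])) props[["output"]] <- file_name
  props
}

props <- with_defaults(
  parse_pairs('#put label:"Utility Functions", input:"config.json"'),
  file_name = "utils.R"
)
str(props)
```

Because the output falls back to the file name, a script annotated without an `output` can still be referenced as an `input` by other scripts.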

Node Types and Shapes

putior uses a data-centric approach with workflow boundaries as special control elements:

Data Processing Nodes:
- "input" - Data sources, APIs, file readers → Stadium shape ([text])
- "process" - Data transformation, analysis → Rectangle [text]
- "output" - Final results, reports, exports → Subroutine [[text]]
- "decision" - Conditional logic, branching → Diamond {text}

Workflow Control Nodes:
- "start" - Workflow entry point → Stadium shape with orange styling
- "end" - Workflow termination → Stadium shape with green styling
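The shape rules above amount to choosing a pair of Mermaid delimiters per node type. The base R sketch below shows one way to express that mapping. The `shape_delims` table and `node_definition()` helper are hypothetical names for this example, and falling back to a rectangle for unknown types is an assumption rather than documented putior behaviour.

```r
# Illustrative mapping from node_type to Mermaid shape delimiters,
# following the shapes listed above (not putior's internal code).
shape_delims <- list(
  input    = c("([", "])"),   # stadium
  process  = c("[", "]"),     # rectangle
  output   = c("[[", "]]"),   # subroutine
  decision = c("{", "}"),     # diamond
  start    = c("([", "])"),   # stadium
  end      = c("([", "])")    # stadium
)

# Build one Mermaid node definition line such as id([Label]).
node_definition <- function(id, label, node_type = "process") {
  delims <- shape_delims[[node_type]]
  if (is.null(delims)) delims <- shape_delims[["process"]]  # assumed fallback
  paste0(id, delims[1], label, delims[2])
}

input_line  <- node_definition("fetch_data", "Fetch API Data", "input")
output_line <- node_definition("generate_report", "Generate Final Report", "output")
print(c(input_line, output_line))
```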

Workflow Boundaries

Control the visualization of workflow start/end points with show_workflow_boundaries:

```r
# Special workflow boundary styling (default)
put_diagram(workflow, show_workflow_boundaries = TRUE)

# Regular nodes without special workflow styling
put_diagram(workflow, show_workflow_boundaries = FALSE)
```

With boundaries enabled (default):
- node_type:"start" gets distinctive orange styling with thicker borders
- node_type:"end" gets distinctive green styling with thicker borders

With boundaries disabled:
- Start/end nodes render as regular stadium shapes without special colors

Example Annotations

R Scripts:
```r
#put id:"load_sales_data", label:"Load Sales Data from API", node_type:"input", output:"raw_sales.csv, metadata.json"

#put id:"validate_data", label:"Validate and Clean Data", node_type:"process", input:"raw_sales.csv", output:"clean_sales.csv"

#put id:"generate_report", label:"Generate Executive Summary", node_type:"output", input:"clean_sales.csv, metadata.json", output:"executive_summary.pdf"
```

Python Scripts:
```python
#put id:"collect_data", label:"Collect Raw Data", node_type:"input", output:"raw_data.csv"

#put id:"train_model", label:"Train ML Model", node_type:"process", input:"features.csv", output:"model.pkl"

#put id:"predict", label:"Generate Predictions", node_type:"output", input:"model.pkl, test_data.csv", output:"predictions.csv"
```

Multiple Annotations Per File:
```r
# analysis.R
#put id:"create_summary", label:"Calculate Summary Stats", node_type:"process", input:"processed_data.csv", output:"summary_stats.json"

#put id:"create_report", label:"Generate Sales Report", node_type:"output", input:"processed_data.csv", output:"sales_report.html"

# Your R code here...
```

Workflow Entry and Exit Points:
```r
# main_workflow.R
#put id:"workflow_start", label:"Start Analysis Pipeline", node_type:"start", output:"config.json"

#put id:"workflow_end", label:"Pipeline Complete", node_type:"end", input:"final_report.pdf"
```

Workflow Boundary Examples:
```r
# Complete pipeline with boundaries
#put id:"pipeline_start", label:"Data Pipeline Start", node_type:"start", output:"raw_config.json"

#put id:"extract_data", label:"Extract Raw Data", node_type:"process", input:"raw_config.json", output:"raw_data.csv"

#put id:"transform_data", label:"Transform Data", node_type:"process", input:"raw_data.csv", output:"clean_data.csv"

#put id:"pipeline_end", label:"Pipeline Complete", node_type:"end", input:"clean_data.csv"
```

Generated Workflow with Boundaries:

```mermaid
flowchart TD
pipeline_start([Data Pipeline Start])
extract_data[Extract Raw Data]
transform_data[Transform Data]
pipeline_end([Pipeline Complete])

pipeline_start --> extract_data
extract_data --> transform_data
transform_data --> pipeline_end

classDef startStyle fill:#fef3c7,stroke:#d97706,stroke-width:3px,color:#92400e
classDef processStyle fill:#ede9fe,stroke:#7c3aed,stroke-width:2px,color:#5b21b6
classDef endStyle fill:#dcfce7,stroke:#16a34a,stroke-width:3px,color:#15803d
class pipeline_start startStyle
class extract_data,transform_data processStyle
class pipeline_end endStyle

```

Supported File Types

putior automatically detects and processes these file types:
- R: .R, .r
- Python: .py
- SQL: .sql
- Shell: .sh
- Julia: .jl

🛠️ Advanced Usage

Directory Scanning

```r
# Scan current directory
workflow <- put(".")

# Scan specific directory
workflow <- put("./src/")

# Recursive scanning (include subdirectories)
workflow <- put("./project/", recursive = TRUE)

# Custom file patterns
workflow <- put("./analysis/", pattern = "\\.(R|py)$")

# Single file
workflow <- put("./script.R")
```
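To preview which files a custom pattern would select before scanning, the expression can be tried with base R's `list.files()`. This sketch assumes that the `pattern` argument is a regular expression matched against file names in the same way as in `list.files()`, which the source does not state explicitly. The temporary directory and file names are made up for the demonstration.

```r
# Build a throwaway directory with mixed file types.
demo_dir <- file.path(tempdir(), "putior_pattern_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c("a.R", "b.py", "c.sql", "notes.txt"))))

# The regular expression from the custom-pattern example. R string literals
# need a doubled backslash to produce the regex token '\.' (a literal dot).
matched <- list.files(demo_dir, pattern = "\\.(R|py)$")
print(matched)

# Clean up the demonstration directory.
unlink(demo_dir, recursive = TRUE)
```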

Debugging and Validation

```r
# Include line numbers for debugging
workflow <- put("./src/", include_line_numbers = TRUE)

# Disable validation warnings
workflow <- put("./src/", validate = FALSE)

# Test annotation syntax
is_valid_put_annotation('#put id:"test", label:"Test Node"')  # TRUE
is_valid_put_annotation("#put invalid syntax")                # FALSE
```

UUID Auto-Generation

When you omit the id field, putior automatically generates a unique UUID:

```r
# Annotations without explicit IDs
#put label:"Load Data", node_type:"input", output:"data.csv"
#put label:"Process Data", node_type:"process", input:"data.csv"

# Extract workflow - IDs will be auto-generated
workflow <- put("./")
print(workflow$id)
# [1] "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
# [2] "b2c3d4e5-f6a7-8901-bcde-f23456789012"
```

This feature is perfect for:
- Quick prototyping without worrying about unique IDs
- Temporary workflows where IDs don't matter
- Ensuring uniqueness across large codebases

Note: If you provide an empty id (e.g., id:""), you'll get a validation warning.

Tracking Source Relationships

When you have a main script that sources other scripts, annotate them to show the sourcing relationships:

```r
# main.R - sources other scripts
#put label:"Main Workflow", input:"utils.R,analysis.R", output:"results.csv"
source("utils.R")     # Reading utils.R into main.R
source("analysis.R")  # Reading analysis.R into main.R

# utils.R - sourced by main.R
#put label:"Utility Functions", node_type:"input"
# output defaults to "utils.R"

# analysis.R - sourced by main.R, depends on utils.R
#put label:"Analysis Functions", input:"utils.R"
# output defaults to "analysis.R"
```

This creates a diagram showing:
- utils.R → main.R (sourced into)
- analysis.R → main.R (sourced into)
- utils.R → analysis.R (dependency)

🔄 Self-Documentation: putior Documents Itself!

As a demonstration of putior's capabilities, we've added PUT annotations to putior's own source code. This creates a beautiful visualization of how the package works internally:

```r
# Extract putior's own workflow
workflow <- put("./R/")
put_diagram(workflow, theme = "github", title = "putior Package Internals")
```

Result:

```mermaid
---
title: putior Package Internals
---
flowchart TD
put_entry([Entry Point - Scan Files])
process_file[Process Single File]
parser[Parse Annotation Syntax]
convert_df[Convert to Data Frame]
diagram_gen[Generate Mermaid Diagram]
node_defs[Create Node Definitions]
connections[Generate Node Connections]
output_handler([Output Final Diagram])

%% Connections
put_entry --> process_file
process_file --> parser
parser --> convert_df
convert_df --> diagram_gen
diagram_gen --> node_defs
node_defs --> connections
connections --> output_handler

%% Styling
classDef processStyle fill:#ede9fe,stroke:#7c3aed,stroke-width:2px,color:#5b21b6
class process_file processStyle
class parser processStyle
class convert_df processStyle
class diagram_gen processStyle
class node_defs processStyle
class connections processStyle
classDef startStyle fill:#fef3c7,stroke:#d97706,stroke-width:3px,color:#92400e
class put_entry startStyle
classDef endStyle fill:#dcfce7,stroke:#16a34a,stroke-width:3px,color:#15803d
class output_handler endStyle

```

This self-documentation shows the two main phases of putior:
1. Parsing Phase: Scanning files → extracting annotations → converting to workflow data
2. Diagram Generation Phase: Taking workflow data → creating nodes/connections → outputting diagram

To see the complete data flow with intermediate files, run:

put_diagram(workflow, show_artifacts = TRUE, theme = "github")

🤝 Contributing

Contributions welcome! Please open an issue or pull request on GitHub.

Development Setup:
```bash
git clone https://github.com/pjt222/putior.git
cd putior

# Install dev dependencies
Rscript -e "devtools::install_dev_deps()"

# Run tests
Rscript -e "devtools::test()"

# Check package
Rscript -e "devtools::check()"
```

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

📊 How putior Compares to Other R Packages

putior fills a unique niche in the R ecosystem by combining annotation-based workflow extraction with beautiful diagram generation:

| Package | Focus | Approach | Output | Best For |
|---------|-------|----------|--------|----------|
| putior | Data workflow visualization | Code annotations | Mermaid diagrams | Pipeline documentation |
| CodeDepends | Code dependency analysis | Static analysis | Variable graphs | Understanding code structure |
| DiagrammeR | General diagramming | Manual diagram code | Interactive graphs | Custom diagrams |
| visNetwork | Interactive networks | Manual network definition | Interactive vis.js | Complex network exploration |
| dm | Database relationships | Schema analysis | ER diagrams | Database documentation |
| flowchart | Study flow diagrams | Dataframe input | ggplot2 charts | Clinical trials |

Key Advantages of putior

📝 Annotation-Based: Workflow documentation lives in your code comments
🔄 Multi-Language: Works across R, Python, SQL, Shell, and Julia
📁 File Flow Tracking: Automatically connects scripts based on input/output files
🎨 Beautiful Output: GitHub-ready Mermaid diagrams with multiple themes
📦 Lightweight: Minimal dependencies (only requires tools package)
🔍 Two Views: Simple script connections + complete data artifact flow

🙏 Acknowledgments

Built with Mermaid for beautiful diagram generation
Inspired by the need for better code documentation and workflow visualization
Thanks to the R community for excellent development tooling

👥 Contributors

Philipp Thoss (@pjt222) - Primary author and maintainer
Claude (Anthropic) - Co-author on 38 commits, contributing to package development, documentation, and testing

Note: While GitHub's contributor graph only displays primary commit authors, Claude's contributions are properly attributed through Co-Authored-By tags in the commit messages. To see all contributions, use: git log --grep="Co-Authored-By: Claude"

🌟 Shoutout to Related R Packages

putior stands on the shoulders of giants in the R visualization and workflow ecosystem:

CodeDepends by Duncan Temple Lang - pioneering work in R code dependency analysis
targets by William Michael Landau - powerful pipeline toolkit for reproducible computation
DiagrammeR by Richard Iannone - bringing beautiful graph visualization to R
ggraph by Thomas Lin Pedersen - grammar of graphics for networks and trees
visNetwork by Almende B.V. - interactive network visualization excellence
networkD3 by Christopher Gandrud - D3.js network graphs in R
dm by energie360° AG - relational data model visualization
flowchart by Pau Satorra et al. - participant flow diagrams
igraph by Gábor Csárdi & Tamás Nepusz - the foundation of network analysis in R

Each of these packages excels in their domain, and putior complements them by focusing specifically on code workflow documentation through annotations.

Made with ❤️ for polyglot data science workflows across R, Python, Julia, SQL, Shell, and beyond

Owner

Name: Philipp Thoss
Login: pjt222
Kind: user

Repositories: 6
Profile: https://github.com/pjt222

Data Scientist, Chemist, Maille Artisan

GitHub Events

Total

Issues event: 3
Watch event: 4
Delete event: 2
Issue comment event: 7
Push event: 44
Create event: 2

Last Year

Issues event: 3
Watch event: 4
Delete event: 2
Issue comment event: 7
Push event: 44
Create event: 2

Packages

Total packages: 1
Total downloads:
- cran 163 last-month

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 1
Total maintainers: 1

cran.r-project.org: putior

"Register In- and Outputs for Workflow Visualization"

Homepage: https://pjt222.github.io/putior/
Documentation: http://cran.r-project.org/web/packages/putior/putior.pdf
License: MIT + file LICENSE
Latest release: 0.1.0
published about 1 year ago

Versions: 1
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 163 Last month

Rankings

Dependent packages count: 26.2%

Dependent repos count: 32.3%

Average: 48.3%

Downloads: 86.4%

Maintainers (1)

ph.thoss@gmx.de

Last synced: 11 months ago

putior

Science Score: 26.0%
