https://github.com/aida-ugent/llm_visualization_results

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.3%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: aida-ugent
  • Language: Python
  • Default Branch: main
  • Size: 63.8 MB
Statistics
  • Stars: 0
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 1 year ago · Last pushed 10 months ago
Metadata Files
Readme

README.md

Are LLMs Ready to Help Non-Expert Users Make Charts of Official Statistics Data?

This repository contains the complete experimental data and results supporting our research on LLM-based automated chart generation from official statistics data.

Research Overview

This study evaluates whether Large Language Models (LLMs) can democratize access to official statistics by automatically generating visualizations from complex tabular data in response to natural language queries. Using seven datasets from Statistics Netherlands, we compare 10 LLM approaches to assess whether current generative AI models can bridge the gap between expert data analysis and non-expert user needs.

Repository Contents

Model Comparison Results

We evaluated 10 different LLM approaches across the same Statistics Netherlands datasets:

Single-Shot Generation Models

  • CLAUDE 3.5 (claude-3-5-sonnet-20241022) - Anthropic's Claude 3.5 Sonnet
  • DEEPSEEK-CHAT - DeepSeek's conversational model
  • GEMINI 2.0 FLASH THINKING (gemini-2.0-flash-thinking-exp-01-21) - Google's Gemini with reasoning
  • GEMMA 2 - Google's open-weight Gemma model
  • GPT-4o (gpt-4o-2024-11-20) - OpenAI's GPT-4o
  • LLAMA3.1 - Meta's Llama 3.1 model
  • O1-HIGH (o1-2024-12-17-high) - OpenAI's o1 model with high reasoning effort
  • O1-HIGH + ADDITIONAL CONTEXT - O1 with enhanced prompting context
  • QWEN 2.5 - Alibaba's Qwen model

Iterative Agentic Approach

  • CLAUDE 3.7 (CLAUDE 3.7_25_iterations_more_context_data_visualization) - Our iterative self-evaluation system with up to 25 refinement iterations

Datasets from Statistics Netherlands

All models were tested on 7 official Statistics Netherlands datasets:

  • Industry Production - Industrial output and manufacturing statistics
  • Milk Supply - Agricultural supply chain data
  • Caribbean Netherlands - Birth statistics for the Caribbean Netherlands
  • Consumer Prices - Consumer price index and inflation data
  • Producer Price Index (PPI) - Manufacturing and wholesale price indices
  • Municipal Accounts - Local government financial statistics
  • Population - Demographic and population census data

Task Complexity Levels

  • Easy - Direct visualization of single data series
  • Medium - Multi-step data processing with moderate complexity
  • Hard - Complex analytical tasks requiring sophisticated data manipulation

Understanding the Results Structure

Single-Shot Model Results

Each single-shot model directory contains:

  • Dataset subdirectories organized by topic (e.g., CARIBBEAN NETHERLANDS/, CONSUMER PRICES/)
  • Visualization outputs (.png files) named with model identifier and task difficulty
  • Multiple attempts for some tasks (indicated by _2.py.png suffixes)
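When browsing these outputs programmatically, the attempt suffix can be read from the filename. The following is a minimal sketch, not part of the repository: the helper name `attempt_number` is hypothetical, and only the `_2.py.png` suffix is documented here, so treating any `_N.py.png` ending as attempt N (and everything else as the first attempt) is an assumption.

```python
import re

# Assumption: repeated attempts end in "_<N>.py.png".
# Only the "_2.py.png" suffix is documented in this README.
_ATTEMPT_SUFFIX = re.compile(r"_(\d+)\.py\.png$")


def attempt_number(filename: str) -> int:
    """Return the attempt number encoded in a chart filename (1 if no suffix)."""
    match = _ATTEMPT_SUFFIX.search(filename)
    return int(match.group(1)) if match else 1
```

A filename whose last token before `.py.png` happens to be numeric for another reason would be misread by this pattern, so it is worth checking against the actual file listing before relying on it.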

Iterative Agentic Results (Claude 3.7)

The CLAUDE 3.7_25_iterations_more_context_data_visualization/ directory contains detailed traces of the iterative process:

Directory naming: YYYYMMDD_HHMMSS_[dataset]_[difficulty]
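A directory name in this pattern can be split back into its parts with a short parser. The sketch below is illustrative only: the names `ExperimentName` and `parse_experiment_dir` are hypothetical, and it assumes that the difficulty is the last underscore-separated token while the dataset part may itself contain underscores or spaces.

```python
import re
from datetime import datetime
from typing import NamedTuple


class ExperimentName(NamedTuple):
    started: datetime
    dataset: str
    difficulty: str


# Assumption: difficulty is the final underscore-separated token;
# everything between the timestamp and that token is the dataset name.
_PATTERN = re.compile(r"^(\d{8})_(\d{6})_(.+)_([^_]+)$")


def parse_experiment_dir(name: str) -> ExperimentName:
    """Split a YYYYMMDD_HHMMSS_[dataset]_[difficulty] directory name."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"unexpected directory name: {name!r}")
    date_part, time_part, dataset, difficulty = match.groups()
    started = datetime.strptime(date_part + time_part, "%Y%m%d%H%M%S")
    return ExperimentName(started, dataset, difficulty)
```

For a made-up name such as `20250301_143000_milk_supply_easy`, this would yield the dataset `milk_supply` and the difficulty `easy`.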

Contents per experiment:

  • .log - Complete reasoning process and decision-making trace
  • code_iteration_X.py - Evolution of generated Python code across iterations
  • visualization.png - Final chart output
  • Auxiliary files - Dataset-specific helper files when needed
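To follow how the generated code evolved within one experiment, the iteration files can be collected in numeric order. This is a hedged sketch rather than a tool shipped with the repository: `iteration_files` is a hypothetical helper, and it assumes the `X` in `code_iteration_X.py` is a plain integer.

```python
import re
from pathlib import Path

# Assumption: X in code_iteration_X.py is a non-negative integer.
_ITERATION_FILE = re.compile(r"^code_iteration_(\d+)\.py$")


def iteration_files(experiment_dir: Path) -> list[tuple[int, Path]]:
    """Return (iteration number, path) pairs sorted by iteration number."""
    found = []
    for path in experiment_dir.iterdir():
        match = _ITERATION_FILE.match(path.name)
        if match:
            found.append((int(match.group(1)), path))
    # Sort numerically so that iteration 10 comes after iteration 2.
    return sorted(found)
```

Sorting on the parsed integer avoids the lexicographic ordering a plain filename sort would give, where `code_iteration_10.py` would precede `code_iteration_2.py`.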

This approach demonstrates our iterative self-evaluation system:

  1. Initial Attempt - LLM generates first solution
  2. Self-Assessment - Model evaluates its own output
  3. Iterative Refinement - Up to 25 iterations of improvement
  4. Convergence - Process stops when satisfactory or maximum iterations reached
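The control flow of these four steps can be sketched as a bounded loop. This is an illustration under stated assumptions, not the implementation used in the study: `refine`, `generate`, and `evaluate` are hypothetical names, and the two callables stand in for the LLM calls that produce code and assess the resulting chart.

```python
from typing import Callable, Optional

MAX_ITERATIONS = 25  # upper bound stated in this README


def refine(
    generate: Callable[[Optional[str], Optional[str]], str],
    evaluate: Callable[[str], tuple[bool, str]],
    max_iterations: int = MAX_ITERATIONS,
) -> tuple[str, int]:
    """Run a generate / self-assess loop; return the last code and iterations used."""
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    code: Optional[str] = None
    feedback: Optional[str] = None
    for iteration in range(1, max_iterations + 1):
        # Initial attempt on the first pass, refinement afterwards.
        code = generate(code, feedback)
        # Self-assessment: hypothetical evaluator returns (satisfactory, feedback).
        satisfactory, feedback = evaluate(code)
        if satisfactory:
            break  # convergence before the iteration cap
    return code, iteration
```

Passing the previous code and feedback back into `generate` is one way to model refinement; the actual prompts and stopping criteria are documented in the evaluation PDFs rather than here.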

Core Evaluation Documents

  • final_evaluation_results.pdf - Complete quantitative and qualitative analysis across all models
  • summary_results_by_difficulty.pdf - Performance trends across complexity levels for all approaches
  • dataset_retrieval_results.pdf - Analysis of data retrieval capabilities across models
  • full_evaluation_questions.pdf - All evaluation prompts and assessment criteria
  • full_list_of_prompts.pdf - Complete prompt engineering details

Owner

  • Name: Ghent University Artificial Intelligence & Data Analytics Group
  • Login: aida-ugent
  • Kind: organization
  • Email: tijl.debie@ugent.be
  • Location: Ghent

GitHub Events

Total
  • Push event: 1
Last Year
  • Push event: 1
