https://github.com/aida-ugent/llm_visualization_results
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found)
- ✓ .zenodo.json file (found)
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (10.3%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: aida-ugent
- Language: Python
- Default Branch: main
- Size: 63.8 MB
Statistics
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Are LLMs Ready to Help Non-Expert Users Make Charts of Official Statistics Data?
This repository contains the complete experimental data and results supporting our research on LLM-based automated chart generation from official statistics data.
Research Overview
This study evaluates the capability of Large Language Models (LLMs) to democratize access to official statistics by automatically generating visualizations from complex tabular data in response to natural language queries. Working with diverse datasets from Statistics Netherlands, we conducted a comprehensive comparison across 10 different LLM approaches to assess whether current generative AI models can bridge the gap between expert data analysis and non-expert user needs.
Repository Contents
Model Comparison Results
We evaluated 10 different LLM approaches across the same Statistics Netherlands datasets:
Single-Shot Generation Models
- CLAUDE 3.5 (claude-3-5-sonnet-20241022) - Anthropic's Claude 3.5 Sonnet
- DEEPSEEK-CHAT - DeepSeek's conversational model
- GEMINI 2.0 FLASH THINKING (gemini-2.0-flash-thinking-exp-01-21) - Google's Gemini with reasoning
- GEMMA 2 - Google's open-source Gemma model
- GPT-4o (gpt-4o-2024-11-20) - OpenAI's GPT-4o
- LLAMA3.1 - Meta's Llama 3.1 model
- O1-HIGH (o1-2024-12-17-high) - OpenAI's O1 model with high reasoning
- O1-HIGH + ADDITIONAL CONTEXT - O1 with enhanced prompting context
- QWEN 2.5 - Alibaba's Qwen model
Iterative Agentic Approach
- CLAUDE 3.7 (CLAUDE 3.7_25_iterations_more_context_data_visualization) - Our iterative self-evaluation system with up to 25 refinement iterations
Datasets from Statistics Netherlands
All models were tested on 7 official Statistics Netherlands datasets:
- Industry Production - Industrial output and manufacturing statistics
- Milk Supply - Agricultural supply chain data
- Caribbean Netherlands - Demographic birth statistics for Caribbean Netherlands
- Consumer Prices - Consumer price index and inflation data
- Producer Price Index (PPI) - Manufacturing and wholesale price indices
- Municipal Accounts - Local government financial statistics
- Population - Demographic and population census data
Task Complexity Levels
- Easy - Direct visualization of single data series
- Medium - Multi-step data processing with moderate complexity
- Hard - Complex analytical tasks requiring sophisticated data manipulation
Understanding the Results Structure
Single-Shot Model Results
Each single-shot model directory contains:
- Dataset subdirectories organized by topic (e.g., CARIBBEAN NETHERLANDS/, CONSUMER PRICES/)
- Visualization outputs (.png files) named with model identifier and task difficulty
- Multiple attempts for some tasks (indicated by _2.py.png suffixes)
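To make the layout above concrete, here is a minimal sketch for indexing one single-shot model directory. It assumes that a retry is marked by a trailing `_<n>.py.png` (generalizing the `_2.py.png` suffix mentioned above) and that every other `.png` is a first attempt. The helper names `attempt_number` and `index_outputs` are hypothetical and are not part of the repository.

```python
import re
from pathlib import Path

# Assumption: retries end in "_<n>.py.png", e.g. "_2.py.png" for a second attempt.
ATTEMPT_SUFFIX = re.compile(r"_(\d+)\.py\.png$")


def attempt_number(filename: str) -> int:
    """Return the attempt number encoded in a file name (1 if no marker is present)."""
    match = ATTEMPT_SUFFIX.search(filename)
    return int(match.group(1)) if match else 1


def index_outputs(model_dir: Path) -> dict[str, list[tuple[str, int]]]:
    """Map each dataset subdirectory name to its (file name, attempt) pairs."""
    index: dict[str, list[tuple[str, int]]] = {}
    for dataset_dir in sorted(p for p in model_dir.iterdir() if p.is_dir()):
        pngs = sorted(dataset_dir.glob("*.png"))
        index[dataset_dir.name] = [(p.name, attempt_number(p.name)) for p in pngs]
    return index
```

Calling `index_outputs(Path("GPT-4o"))` from a local clone would list the charts per dataset; the directory name here is only an example, so adjust it to whatever the checkout actually contains.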
Iterative Agentic Results (Claude 3.7)
The CLAUDE 3.7_25_iterations_more_context_data_visualization/ directory contains detailed traces of the iterative process:
Directory naming: YYYYMMDD_HHMMSS_[dataset]_[difficulty]
Contents per experiment:
- .log - Complete reasoning process and decision-making trace
- code_iteration_X.py - Evolution of generated Python code across iterations
- visualization.png - Final chart output
- Auxiliary files - Dataset-specific helper files when needed
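The directory naming scheme can be parsed with a few lines of Python. The sketch below assumes that the fields are separated by underscores, that the difficulty is always the last token, and that it is one of easy, medium, or hard in any casing. The dataset part may itself contain underscores. `parse_experiment_dir` and the example name in the usage note are hypothetical.

```python
from datetime import datetime
from typing import NamedTuple

# Assumption: these are the only difficulty labels used in directory names.
DIFFICULTIES = {"easy", "medium", "hard"}


class Experiment(NamedTuple):
    started: datetime
    dataset: str
    difficulty: str


def parse_experiment_dir(name: str) -> Experiment:
    """Parse a name of the form YYYYMMDD_HHMMSS_[dataset]_[difficulty]."""
    # The first two tokens are date and time; the rest is dataset plus difficulty.
    date_part, time_part, rest = name.split("_", 2)
    # The difficulty is the last token, so split from the right.
    dataset, _, difficulty = rest.rpartition("_")
    if not dataset or difficulty.lower() not in DIFFICULTIES:
        raise ValueError(f"unexpected experiment directory name: {name!r}")
    started = datetime.strptime(date_part + time_part, "%Y%m%d%H%M%S")
    return Experiment(started, dataset, difficulty.lower())
```

For a made-up name such as `20250301_142530_milk_supply_easy`, this yields a start time of 2025-03-01 14:25:30, the dataset `milk_supply`, and the difficulty `easy`. Names that do not follow the pattern raise `ValueError`.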
This approach demonstrates our iterative self-evaluation system:
1. Initial Attempt - LLM generates first solution
2. Self-Assessment - Model evaluates its own output
3. Iterative Refinement - Up to 25 iterations of improvement
4. Convergence - Process stops when satisfactory or maximum iterations reached
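The control flow of such a generate, assess, and refine process can be sketched as a bounded loop. This is an illustration only, not the code used in the study: the callables `generate`, `assess`, and `revise` stand in for LLM calls and are hypothetical, and counting each self-assessment as one iteration is an assumption.

```python
from typing import Callable, NamedTuple


class Result(NamedTuple):
    code: str
    iterations: int
    converged: bool


def refine(
    generate: Callable[[str], str],
    assess: Callable[[str], tuple[bool, str]],
    revise: Callable[[str, str], str],
    query: str,
    max_iterations: int = 25,
) -> Result:
    """Generate a solution, then assess and revise it until accepted or out of budget."""
    code = generate(query)  # 1. initial attempt
    for iteration in range(1, max_iterations + 1):
        satisfied, feedback = assess(code)  # 2. self-assessment
        if satisfied:
            return Result(code, iteration, True)  # 4. stop once satisfactory
        if iteration < max_iterations:
            code = revise(code, feedback)  # 3. refinement using the feedback
    # 4. stop at the iteration cap and return the last assessed candidate
    return Result(code, max_iterations, False)
```

Passing the three steps in as callables keeps the loop independent of any particular model API, so it can be exercised with simple stub functions.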
Core Evaluation Documents
- final_evaluation_results.pdf - Complete quantitative and qualitative analysis across all models
- summary_results_by_difficulty.pdf - Performance trends across complexity levels for all approaches
- dataset_retrieval_results.pdf - Analysis of data retrieval capabilities across models
- full_evaluation_questions.pdf - All evaluation prompts and assessment criteria
- full_list_of_prompts.pdf - Complete prompt engineering details
Owner
- Name: Ghent University Artificial Intelligence & Data Analytics Group
- Login: aida-ugent
- Kind: organization
- Email: tijl.debie@ugent.be
- Location: Ghent
- Website: aida.ugent.be
- Repositories: 36
- Profile: https://github.com/aida-ugent
GitHub Events
Total
- Push event: 1
Last Year
- Push event: 1
Issues and Pull Requests
Last synced: 10 months ago