Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.8%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: allainerain
  • License: mit
  • Language: JavaScript
  • Default Branch: main
  • Size: 229 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 1 year ago · Last pushed 9 months ago
Metadata Files
Readme Changelog License Code of conduct Citation Security Support

README.md

📊 InsightLab: Facilitating Insight Discovery using Large Language Models

This project is built on LIDA, a library for generating data visualizations and data-faithful infographics. LIDA is grammar agnostic (it works with any programming language and visualization library, e.g. matplotlib, seaborn, altair, d3) and works with multiple large language model providers (OpenAI, Azure OpenAI, PaLM, Cohere, Huggingface).

InsightLab aims to improve on the capabilities of LIDA by introducing new modules for insight discovery.

Original research on LIDA: the original components of LIDA are described in the LIDA paper and in its tutorial notebook. See the LIDA project page for updates.

Note on Code Execution: To create visualizations, LIDA generates and executes code. Ensure that you run LIDA in a secure environment.
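As a rough illustration of the precaution above, the sketch below runs a code string in a separate Python process with a time limit and a throwaway working directory. The helper name `run_untrusted` is hypothetical and is not part of LIDA or InsightLab. A child process is not a real sandbox, so a container or virtual machine remains the safer choice.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_untrusted(code: str, timeout_s: float = 10.0) -> subprocess.CompletedProcess:
    """Run `code` in a separate Python process with a time limit.

    Hypothetical helper: this limits runtime and keeps scratch files in a
    temporary directory, but it does NOT restrict file or network access.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "snippet.py"
        script.write_text(code, encoding="utf-8")
        # -I runs the interpreter in isolated mode (ignores PYTHON*
        # environment variables and the user site-packages directory).
        return subprocess.run(
            [sys.executable, "-I", str(script)],
            cwd=workdir,
            capture_output=True,
            text=True,
            timeout=timeout_s,  # raises subprocess.TimeoutExpired if exceeded
        )


result = run_untrusted("print(1 + 1)")
print(result.returncode, result.stdout.strip())
```

Because of `-I`, packages installed only in the user site directory are not visible to the child process. Drop that flag if the generated code needs them.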

Table of Contents

  1. Features
  2. Getting Started
  3. Documentation and Citation

Features

This work has two parts: a library and a web app. The library can be accessed in the lida folder while the web app can be accessed in the lida-streamlit folder.

Updated LIDA library features

LIDA treats visualizations as code and provides a clean API for generating, executing, editing, explaining, evaluating and repairing visualization code.

  • [x] Data Summarization
  • [x] Data Transformation
  • [x] Goal Generation
  • [x] Visualization Generation
  • [x] Visualization Editing
  • [x] Visualization Explanation
  • [x] Visualization Evaluation and Repair
  • [x] Chart Question and Answering
  • [x] Insight Generation
  • [x] Insight Discovery Research

InsightLab Workflows

[Figure: InsightLab]

Getting Started with the Web App

Set up and verify that your Python environment is Python 3.10 or higher (preferably with Conda).

Clone the repository

```bash
git clone https://github.com/allainerain/lida.git
```

Install the requirements

```bash
pip install -U -r requirements.txt
```

LIDA depends on llmx and openai. If you had these libraries installed previously, consider updating them.

```bash
pip install -U llmx openai
```

Set environment variables

Create a .env file with the following

```python
OPENAI_APIKEY = "sk-xxxxxxx"
```
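Before launching the app, it can help to confirm that the key is actually readable. The following is a minimal, standard-library sketch of such a preflight check. The function names `load_env_file` and `require_key` are hypothetical, and the app itself presumably loads the file through python-dotenv (listed in requirements.txt) rather than through code like this.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse simple KEY = "value" lines; skip blank lines and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def require_key(name: str = "OPENAI_APIKEY", path: str = ".env") -> str:
    """Return the key from the process environment or the .env file, else raise."""
    value = os.environ.get(name) or load_env_file(path).get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to {path}")
    return value
```

Calling `require_key()` from the directory that contains the .env file either returns the key or fails early with a readable message.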

Run the web app

```bash
cd lida-streamlit
streamlit run main.py
```

Getting Started with the Library

The fastest and recommended way to learn about InsightLab's capabilities is through the InsightLab handbook notebook.

Library Methods

Data Summarization

Given a dataset, generate a compact summary of the data.

```python
from lida import Manager

lida = Manager()
summary = lida.summarize("data/cars.json")  # generate data summary
```
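To give a feel for what a "compact summary" might contain, here is a small standard-library sketch that records the type, unique count, sample values and numeric range of each column. It is not LIDA's implementation and does not claim to match the structure returned by `lida.summarize`. The function name `sketch_summary` and the sample rows are made up for illustration.

```python
import json
from collections import Counter


def sketch_summary(rows: list[dict], n_samples: int = 3) -> dict:
    """Build a small per-column summary from a list of records (illustrative only)."""
    fields = []
    columns = sorted({key for row in rows for key in row})
    for name in columns:
        values = [row[name] for row in rows if row.get(name) is not None]
        types = Counter(type(v).__name__ for v in values)
        field = {
            "column": name,
            "dtype": types.most_common(1)[0][0] if types else "unknown",
            "num_unique": len(set(map(str, values))),
            "samples": values[:n_samples],
        }
        # Only report a range for purely numeric columns (bool is excluded).
        numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
        if values and numeric:
            field["min"], field["max"] = min(values), max(values)
        fields.append(field)
    return {"num_rows": len(rows), "fields": fields}


# Made-up rows, not taken from data/cars.json.
rows = [
    {"Name": "car a", "Miles_per_Gallon": 18.0, "Cylinders": 8},
    {"Name": "car b", "Miles_per_Gallon": 15.0, "Cylinders": 8},
    {"Name": "car c", "Miles_per_Gallon": 24.0, "Cylinders": 4},
]
print(json.dumps(sketch_summary(rows), indent=2))
```

A summary of this shape is much smaller than the dataset itself, which is what makes it practical to pass to a language model as context.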

Data Transformation

Given natural language, transform the dataset.

```python
new_dataset = lida.autotransform(data, summary, instructions="add a new column for profit", textgen_config=textgen_config)
```

Given code, transform the dataset.

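```python
# Python does not allow positional arguments after a keyword argument,
# so every argument is passed by keyword. The parameter names `data`
# and `summary` are assumed here.
new_dataset = lida.transform(code_specs="code", data=data, summary=summary)
```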

Goal Generation

Generate a set of visualization goals given a data summary.

```python
goals = lida.goals(summary, n=5, persona="ceo with aerodynamics background")  # generate goals
```

Add a persona parameter to generate goals based on that persona.

Visualization Generation

Generate, refine, execute and filter visualization code given a data summary and visualization goal. Note that LIDA represents visualizations as code.

```python
# generate charts (generate and execute visualization code)
charts = lida.visualize(summary=summary, goal=goals[0], library="matplotlib")  # seaborn, ggplot ..
```

Visualization Editing

Given a visualization, edit the visualization using natural language.

```python
# modify chart using natural language
instructions = ["convert this to a bar chart", "change the color to red", "change y axes label to Fuel Efficiency", "translate the title to french"]
edited_charts = lida.edit(code=code, summary=summary, instructions=instructions, library=library, textgen_config=textgen_config)
```

Visualization Explanation

Given a visualization, generate a natural language explanation of the visualization code (accessibility, data transformations applied, visualization code)

```python
# generate explanation for chart
explanation = lida.explain(code=charts[0].code, summary=summary)
```

Visualization Evaluation and Repair

Given a visualization, evaluate it to find repair instructions (which may be human-authored or generated), then repair the visualization.

```python
evaluations = lida.evaluate(code=code, goal=goals[i], library=library)
```

Prompting

Given a goal, generate prompting-probing questions to allow the user to critically analyze the visualization.

```python
prompts = lida.prompt(goal=goal, textgen_config=textgen_config)
```

Insight Generation

Given answers to prompts, search the web for relevant references and generate suggested insights.

```python
insights = lida.insights(goal=goal, answers=answers, prompts=prompts, textgen_config=textgen_config)
```

Probing

Given answers to prompts, search the web for relevant references and suggest more probing questions.

```python
probing_questions = lida.research(goal=goal, answers=answers, prompts=prompts, textgen_config=textgen_config)
```
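The methods above can be chained into a single pass from dataset to suggested insights. The sketch below is one possible arrangement, not the project's own pipeline. The function `run_insight_workflow` and the `answer_fn` callback (which stands in for the user answering the prompts) are hypothetical. The manager is passed in as an argument, and the calls use the keyword names shown in this README, which are assumed to match the installed version.

```python
def run_insight_workflow(manager, data_path, answer_fn,
                         textgen_config=None, library="matplotlib"):
    """Chain summarize -> goals -> visualize -> prompt -> insights.

    `manager` is expected to expose the methods documented above;
    `answer_fn` is a hypothetical callback that maps the generated
    prompts to the user's answers.
    """
    summary = manager.summarize(data_path)
    goal = manager.goals(summary, n=1)[0]  # take the first suggested goal
    charts = manager.visualize(summary=summary, goal=goal, library=library)
    prompts = manager.prompt(goal=goal, textgen_config=textgen_config)
    answers = answer_fn(prompts)  # collect the user's answers to the prompts
    insights = manager.insights(
        goal=goal, answers=answers, prompts=prompts, textgen_config=textgen_config
    )
    return {"goal": goal, "charts": charts, "prompts": prompts, "insights": insights}
```

With the real library, `manager` would be the `Manager()` instance created earlier, and `textgen_config` would be whatever configuration object is already passed to the other calls. Swapping `manager.insights` for `manager.research` in the last step would return further probing questions instead.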

Documentation and Citation

This work is built on LIDA. A short paper describing LIDA (accepted at the ACL 2023 conference) is available at https://aclanthology.org/2023.acl-demo.11.

```bibtex
@inproceedings{dibia2023lida,
    title = "{LIDA}: A Tool for Automatic Generation of Grammar-Agnostic Visualizations and Infographics using Large Language Models",
    author = "Dibia, Victor",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.acl-demo.11",
    doi = "10.18653/v1/2023.acl-demo.11",
    pages = "113--126",
}
```

LIDA builds on insights in automatic generation of visualization from an earlier paper - Data2Vis: Automatic Generation of Data Visualizations Using Sequence to Sequence Recurrent Neural Networks.

Owner

  • Name: Allaine Tan
  • Login: allainerain
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Dibia
    given-names: Victor
title: "LIDA: A Tool for Automatic Generation of Grammar-Agnostic Visualizations and Infographics using Large Language Models"
version: 1.0.0
date-released: 2023-07-01
url: "https://aclanthology.org/2023.acl-demo.11"
doi: "10.18653/v1/2023.acl-demo.11"
conference:
  name: "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)"
  month: jul
  year: 2023
  address: "Toronto, Canada"
  publisher: "Association for Computational Linguistics"

GitHub Events

Total
  • Push event: 9
  • Pull request event: 2
  • Create event: 1
Last Year
  • Push event: 9
  • Pull request event: 2
  • Create event: 1

Dependencies

docker-compose.yml docker
  • web-ui latest
pyproject.toml pypi
  • altair *
  • fastapi *
  • geopandas *
  • kaleido >=0.2.1, !=0.2.1.post1
  • llmx >=0.0.21a
  • matplotlib *
  • matplotlib-venn *
  • networkx *
  • numpy *
  • pandas *
  • plotly *
  • plotnine *
  • pydantic *
  • python-multipart *
  • scipy *
  • seaborn *
  • statsmodels *
  • typer *
  • uvicorn *
  • wordcloud *
requirements.txt pypi
  • beautifulsoup4 *
  • faiss-cpu *
  • langchain *
  • langchain-community *
  • langchain-openai *
  • llmx *
  • lxml *
  • matplotlib *
  • pandas *
  • plotly *
  • python-dotenv *
  • seaborn *
  • streamlit *