insight-lab

https://github.com/allainerain/insight-lab

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: found
✓ codemeta.json file: found
✓ .zenodo.json file: found
✓ DOI references: found 1 DOI reference(s) in README
✓ Academic publication links: links to arxiv.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (15.8%) to scientific vocabulary

Last synced: 9 months ago

Repository

Basic Info

Host: GitHub
Owner: allainerain
License: mit
Language: JavaScript
Default Branch: main
Size: 229 MB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created almost 2 years ago · Last pushed about 1 year ago

Metadata Files

Readme Changelog License Code of conduct Citation Security Support

📊 InsightLab: Facilitating Insight Discovery using Large Language Models

This project is built on LIDA, a library for generating data visualizations and data-faithful infographics. LIDA is grammar agnostic (it works with any programming language and visualization library, e.g. matplotlib, seaborn, altair, d3) and supports multiple large language model providers (OpenAI, Azure OpenAI, PaLM, Cohere, Huggingface).

InsightLab aims to improve on the capabilities of LIDA by introducing new modules for insight discovery.

Original research on LIDA: Details on the original components of LIDA are described in the paper here and in this tutorial notebook. See the project page here for updates.

Note on Code Execution: To create visualizations, LIDA generates and executes code. Ensure that you run LIDA in a secure environment.
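One low-effort precaution, sketched here, is to run generated code in a separate interpreter process with a timeout, so that a hang or crash does not take down the caller. This is an illustration only, not how LIDA executes code: the helper name `run_untrusted` is hypothetical, and a child process is not a sandbox, so a container or VM remains the safer choice.

```python
import subprocess
import sys


def run_untrusted(code: str, timeout_s: float = 10.0) -> subprocess.CompletedProcess:
    """Run a code string in a fresh Python process (hypothetical helper).

    This only limits the damage of hangs and crashes; the child can still
    access files and the network with the caller's permissions.
    """
    return subprocess.run(
        [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignores user site and PYTHON* env vars
        capture_output=True,
        text=True,
        timeout=timeout_s,  # raises subprocess.TimeoutExpired if exceeded
    )


if __name__ == "__main__":
    result = run_untrusted("print(sum(range(5)))")
    print(result.returncode, result.stdout.strip())
```

The return code and captured stderr give the caller a simple way to detect failed runs without sharing an interpreter with the generated code.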

Table of Contents

1. Features
1.1. Updated LIDA Library Features
1.2. InsightLab Workflows

2. Getting Started
2.1. With the Web App
2.2. With the Library

3. Documentation and Citation

Features

This work has two parts: a library and a web app. The library can be accessed in the lida folder while the web app can be accessed in the lida-streamlit folder.

Updated LIDA library features

LIDA treats visualizations as code and provides a clean API for generating, executing, editing, explaining, evaluating and repairing visualization code.

[x] Data Summarization
[x] Data Transformation
[x] Goal Generation
[x] Visualization Generation
[x] Visualization Editing
[x] Visualization Explanation
[x] Visualization Evaluation and Repair
[x] Chart Question and Answering
[x] Insight Generation
[x] Insight Discovery Research

InsightLab Workflows

[Image: InsightLab]

Getting Started with the Web App

Set up a Python environment and verify that it runs Python 3.10 or higher (preferably using Conda).
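A quick way to verify the interpreter version from inside the environment is sketched below. The helper name `meets_minimum` is hypothetical; the only fact taken from this README is the 3.10 minimum.

```python
import sys

MIN_PYTHON = (3, 10)  # minimum stated in this README


def meets_minimum(version: tuple, minimum: tuple = MIN_PYTHON) -> bool:
    """Return True if (major, minor, ...) is at least the required version."""
    return tuple(version[:2]) >= minimum


if __name__ == "__main__":
    current = sys.version_info
    status = "OK" if meets_minimum(current) else "too old"
    print(f"Python {current.major}.{current.minor}: {status}")
```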

Clone the repository

```bash
git clone https://github.com/allainerain/lida.git
```

Install the requirements

```bash
pip install -r requirements.txt
```

LIDA depends on llmx and openai. If you had these libraries installed previously, consider updating them.

```bash
pip install -U llmx openai
```

Set environment variables

Create a .env file with the following content:

```python
OPENAI_APIKEY = "sk-xxxxxxx"
```
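Before launching the app, it can help to confirm that the file actually defines the key. The sketch below parses `KEY = "value"` lines with the standard library only. `read_env_text` is a hypothetical helper, not part of LIDA or InsightLab, and it handles only this simple form; the project's requirements list python-dotenv, which is presumably what the app itself uses to load the file.

```python
def read_env_text(text: str) -> dict:
    """Parse simple KEY = "value" lines; skip blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace, then one pair of matching quotes.
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


if __name__ == "__main__":
    env = read_env_text('OPENAI_APIKEY = "sk-xxxxxxx"\n')
    print("OPENAI_APIKEY" in env and bool(env["OPENAI_APIKEY"]))
```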

Run the web app

```bash
cd lida-streamlit
streamlit run main.py
```

Getting Started with the Library

The fastest and recommended way to learn about InsightLab's capabilities is through the InsightLab handbook notebook.

Library Methods

Data Summarization

Given a dataset, generate a compact summary of the data.

```python
from lida import Manager

lida = Manager()
summary = lida.summarize("data/cars.json") # generate data summary
```
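To give a feel for what a compact summary can contain, the sketch below computes per-column facts (inferred type, distinct count, numeric range, a few samples) from a list of records using only the standard library. It is an illustration under stated assumptions: the field names and the `summarize_records` helper are hypothetical and do not reflect the structure that `lida.summarize` returns, and the sample rows are made up.

```python
def summarize_records(records: list, n_samples: int = 3) -> dict:
    """Build a small per-column summary from a list of dict records."""
    columns = {}
    for row in records:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)

    summary = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        # A column counts as numeric only if every present value is int/float.
        numeric = bool(present) and all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
        )
        info = {
            "dtype": "number" if numeric else "string",
            "num_unique": len(set(present)),
            "samples": present[:n_samples],
        }
        if numeric:
            info["min"], info["max"] = min(present), max(present)
        summary[name] = info
    return summary


if __name__ == "__main__":
    # Made-up rows, for illustration only.
    rows = [
        {"name": "a", "score": 18.0},
        {"name": "b", "score": 15.0},
        {"name": "c", "score": None},
    ]
    for column, info in summarize_records(rows).items():
        print(column, info)
```

A summary of this kind is much smaller than the dataset itself, which is what makes it practical to pass to a language model as context.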

Data Transformation

Given natural language, transform the dataset.

```python
new_dataset = lida.autotransform(data, summary, instructions="add a new column for profit", textgen_config=textgen_config)
```

Given code, transform the dataset.

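```python
# Positional arguments cannot follow a keyword argument, so all three are
# passed by keyword here (the parameter names data and summary are assumed).
new_dataset = lida.transform(code_specs="code", data=data, summary=summary)
```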

Goal Generation

Generate a set of visualization goals given a data summary.

```python
goals = lida.goals(summary, n=5, persona="ceo with aerodynamics background") # generate goals
```

The persona parameter tailors the generated goals to the given persona.

Visualization Generation

Generate, refine, execute and filter visualization code given a data summary and visualization goal. Note that LIDA represents visualizations as code.

```python
# generate charts (generate and execute visualization code)
charts = lida.visualize(summary=summary, goal=goals[0], library="matplotlib") # seaborn, ggplot ..
```
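Because visualizations are plain code strings, a cheap first filter is a syntax check that rejects malformed candidates before anything is executed. The sketch below shows that idea with `ast.parse`. It is an assumption about one possible filter step, not LIDA's actual implementation, and `filter_valid_python` is a hypothetical name.

```python
import ast


def filter_valid_python(candidates: list) -> list:
    """Keep only the code strings that parse as valid Python (nothing is run)."""
    valid = []
    for code in candidates:
        try:
            ast.parse(code)
        except SyntaxError:
            continue  # discard candidates that would fail before running
        valid.append(code)
    return valid


if __name__ == "__main__":
    candidates = [
        "def plot(data):\n    return len(data)\n",
        "def plot(data:\n    return data\n",  # missing closing parenthesis
    ]
    print(len(filter_valid_python(candidates)))
```

A syntax check says nothing about runtime errors or chart quality, so it complements rather than replaces executing and evaluating the code.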

Visualization Editing

Given a visualization, edit the visualization using natural language.

```python
# modify chart using natural language
instructions = ["convert this to a bar chart", "change the color to red", "change y axes label to Fuel Efficiency", "translate the title to french"]
edited_charts = lida.edit(code=code, summary=summary, instructions=instructions, library=library, textgen_config=textgen_config)
```

Visualization Explanation

Given a visualization, generate a natural language explanation of the visualization code (accessibility, data transformations applied, visualization code)

```python
# generate explanation for chart
explanation = lida.explain(code=charts[0].code, summary=summary)
```

Visualization Evaluation and Repair

Given a visualization, evaluate it to obtain repair instructions (which may be human authored or generated), then repair the visualization.

```python
evaluations = lida.evaluate(code=code, goal=goals[i], library=library)
```

Prompting

Given a goal, generate probing questions that prompt the user to critically analyze the visualization.

```python
prompts = lida.prompt(goal=goal, textgen_config=textgen_config)
```

Insight Generation

Given answers to prompts, search the web for relevant references and generate suggested insights.

```python
insights = lida.insights(goal=goal, answers=answers, prompts=prompts, textgen_config=textgen_config)
```

Probing

Given answers to prompts, search the web for relevant references and suggest more probing questions.

```python
probing_questions = lida.research(goal=goal, answers=answers, prompts=prompts, textgen_config=textgen_config)
```

Documentation and Citation

This work is built on LIDA. A short paper describing LIDA (accepted at the ACL 2023 conference) is available here.

```bibtex
@inproceedings{dibia2023lida,
    title = "{LIDA}: A Tool for Automatic Generation of Grammar-Agnostic Visualizations and Infographics using Large Language Models",
    author = "Dibia, Victor",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.acl-demo.11",
    doi = "10.18653/v1/2023.acl-demo.11",
    pages = "113--126",
}
```

LIDA builds on insights in automatic generation of visualization from an earlier paper - Data2Vis: Automatic Generation of Data Visualizations Using Sequence to Sequence Recurrent Neural Networks.

Owner

Name: Allaine Tan
Login: allainerain
Kind: user

Website: https://allainetan.webflow.io/
Repositories: 1
Profile: https://github.com/allainerain

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Dibia
    given-names: Victor
title: "LIDA: A Tool for Automatic Generation of Grammar-Agnostic Visualizations and Infographics using Large Language Models"
version: 1.0.0
date-released: 2023-07-01
url: "https://aclanthology.org/2023.acl-demo.11"
doi: "10.18653/v1/2023.acl-demo.11"
conference:
  name: "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)"
  month: jul
  year: 2023
  address: "Toronto, Canada"
  publisher: "Association for Computational Linguistics"

GitHub Events

Total

Push event: 9
Pull request event: 2
Create event: 1

Last Year

Push event: 9
Pull request event: 2
Create event: 1

Dependencies

docker-compose.yml docker

web-ui latest

pyproject.toml pypi

altair *
fastapi *
geopandas *
kaleido >=0.2.1, !=0.2.1.post1
llmx >=0.0.21a
matplotlib *
matplotlib-venn *
networkx *
numpy *
pandas *
plotly *
plotnine *
pydantic *
python-multipart *
scipy *
seaborn *
statsmodels *
typer *
uvicorn *
wordcloud *

requirements.txt pypi

beautifulsoup4 *
faiss-cpu *
langchain *
langchain-community *
langchain-openai *
llmx *
lxml *
matplotlib *
pandas *
plotly *
python-dotenv *
seaborn *
streamlit *
