insight-lab
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
✓CITATION.cff file
Found CITATION.cff file -
✓codemeta.json file
Found codemeta.json file -
✓.zenodo.json file
Found .zenodo.json file -
✓DOI references
Found 1 DOI reference(s) in README -
✓Academic publication links
Links to: arxiv.org -
○Academic email domains
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (15.8%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: allainerain
- License: mit
- Language: JavaScript
- Default Branch: main
- Size: 229 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
📊 InsightLab: Facilitating Insight Discovery using Large Language Models
This project is built on LIDA, a library for generating data visualizations and data-faithful infographics. LIDA is grammar agnostic (it works with any programming language and visualization library, e.g. matplotlib, seaborn, altair, d3) and works with multiple large language model providers (OpenAI, Azure OpenAI, PaLM, Cohere, Huggingface).
InsightLab aims to improve on the capabilities of LIDA by introducing new modules for insight discovery.
Original research on LIDA: details on the original components of LIDA are described in the paper here and in this tutorial notebook. See the project page here for updates.
Note on Code Execution: To create visualizations, LIDA generates and executes code. Ensure that you run LIDA in a secure environment.
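One way to reduce the risk noted above is to run generated code in a separate interpreter process with a timeout. The sketch below is not part of LIDA or InsightLab: `run_untrusted` is a hypothetical helper, and process isolation alone is not a full sandbox, so a container or virtual machine is still advisable.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_untrusted(code: str, timeout_s: float = 10.0) -> subprocess.CompletedProcess:
    """Run `code` in a child Python process and capture its output.

    Hypothetical helper for illustration; not part of LIDA or InsightLab.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "snippet.py"
        script.write_text(code, encoding="utf-8")
        # -I starts Python in isolated mode: user site-packages are not
        # loaded and PYTHON* environment variables are ignored.
        return subprocess.run(
            [sys.executable, "-I", str(script)],
            cwd=workdir,          # the child works inside the temporary directory
            capture_output=True,  # collect stdout and stderr instead of printing
            text=True,
            timeout=timeout_s,    # raises subprocess.TimeoutExpired if exceeded
        )


if __name__ == "__main__":
    result = run_untrusted("print(1 + 1)")
    print(result.returncode, result.stdout.strip())
```

The child process still has the same file system and network access as the parent, which is why this is a first line of defense rather than a replacement for a secure environment.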
Table of Contents
- Features
- Getting Started
- Documentation and Citation
Features
This work has two parts: a library and a web app. The library can be accessed in the lida folder while the web app can be accessed in the lida-streamlit folder.
Updated LIDA library features
LIDA treats visualizations as code and provides a clean API for generating, executing, editing, explaining, evaluating and repairing visualization code.
- [x] Data Summarization
- [x] Data Transformation
- [x] Goal Generation
- [x] Visualization Generation
- [x] Visualization Editing
- [x] Visualization Explanation
- [x] Visualization Evaluation and Repair
- [x] Chart Question and Answering
- [x] Insight Generation
- [x] Insight Discovery Research
InsightLab Workflows
Getting Started with the Web App
Set up your Python environment and verify that it runs Python 3.10 or higher (preferably, use Conda).
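The version requirement above can be checked from a script. This is a minimal sketch using only the standard library; `meets_minimum` and `MIN_PYTHON` are illustrative names, not part of InsightLab.

```python
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in the setup step


def meets_minimum(version: tuple, minimum: tuple = MIN_PYTHON) -> bool:
    """Return True if `version` (major, minor, ...) is at least `minimum`."""
    return tuple(version[:2]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if meets_minimum(sys.version_info):
        print(f"Python {current} meets the 3.10+ requirement")
    else:
        print(f"Python {current} is older than the required 3.10")
```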
Clone the repository
```bash
git clone https://github.com/allainerain/lida.git
```
Install the requirements
```bash
pip install -U -r requirements.txt
```
LIDA depends on llmx and openai. If you had these libraries installed previously, consider updating them.
```bash
pip install -U llmx openai
```
Set environment variables
Create a .env file with the following
```python
OPENAI_APIKEY = "sk-xxxxxxx"
```
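A missing key is easier to diagnose if it is checked before the app starts. The sketch below assumes the variables from the .env file have already been loaded into the process environment (for example by python-dotenv, which appears in the dependency list). `require_env` is a hypothetical helper, not part of InsightLab.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or blank.

    Hypothetical helper for illustration; not part of LIDA or InsightLab.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


if __name__ == "__main__":
    try:
        require_env("OPENAI_APIKEY")
        print("OPENAI_APIKEY is set")
    except RuntimeError as err:
        print(err)
```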
Run the web app
```bash
cd lida-streamlit
streamlit run main.py
```
Getting Started with the Library
The fastest and recommended way to learn about InsightLab's capabilities is through the InsightLab handbook notebook.
Library Methods
Data Summarization
Given a dataset, generate a compact summary of the data.
```python
from lida import Manager

lida = Manager()
summary = lida.summarize("data/cars.json")  # generate data summary
```
Data Transformation
Given natural language, transform the dataset.
```python
new_dataset = lida.autotransform(data, summary, instructions="add a new column for profit", textgen_config=textgen_config)
```
Given code, transform the dataset.
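```python
# keyword arguments are used throughout: a positional argument cannot follow a keyword argument
new_dataset = lida.transform(code_specs="code", data=data, summary=summary)
```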
Goal Generation
Generate a set of visualization goals given a data summary.
```python
goals = lida.goals(summary, n=5, persona="ceo with aerodynamics background")  # generate goals
```
Add a persona parameter to generate goals based on that persona.
Visualization Generation
Generate, refine, execute and filter visualization code given a data summary and visualization goal. Note that LIDA represents visualizations as code.
```python
# generate charts (generate and execute visualization code)
charts = lida.visualize(summary=summary, goal=goals[0], library="matplotlib")  # seaborn, ggplot ..
```
Visualization Editing
Given a visualization, edit the visualization using natural language.
```python
# modify chart using natural language
instructions = ["convert this to a bar chart", "change the color to red", "change y axes label to Fuel Efficiency", "translate the title to french"]
edited_charts = lida.edit(code=code, summary=summary, instructions=instructions, library=library, textgen_config=textgen_config)
```
Visualization Explanation
Given a visualization, generate a natural language explanation of the visualization code (accessibility, data transformations applied, visualization code)
```python
# generate explanation for chart
explanation = lida.explain(code=charts[0].code, summary=summary)
```
Visualization Evaluation and Repair
Given a visualization, evaluate it to obtain repair instructions (which may be human-authored or generated), then repair the visualization.
```python
evaluations = lida.evaluate(code=code, goal=goals[i], library=library)
```
Prompting
Given a goal, generate probing questions that prompt the user to critically analyze the visualization.
```python
prompts = lida.prompt(goal=goal, textgen_config=textgen_config)
```
Insight Generation
Given answers to prompts, search the web for relevant references and generate suggested insights.
```python
insights = lida.insights(goal=goal, answers=answers, prompts=prompts, textgen_config=textgen_config)
```
Probing
Given answers to prompts, search the web for relevant references and suggest more probing questions.
```python
probing_questions = lida.research(goal=goal, answers=answers, prompts=prompts, textgen_config=textgen_config)
```
Documentation and Citation
This work is built on LIDA. A short paper describing LIDA (accepted at the ACL 2023 conference) is available here.
```bibtex
@inproceedings{dibia2023lida,
    title = "{LIDA}: A Tool for Automatic Generation of Grammar-Agnostic Visualizations and Infographics using Large Language Models",
    author = "Dibia, Victor",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.acl-demo.11",
    doi = "10.18653/v1/2023.acl-demo.11",
    pages = "113--126",
}
```
LIDA builds on insights into the automatic generation of visualizations from an earlier paper, Data2Vis: Automatic Generation of Data Visualizations Using Sequence to Sequence Recurrent Neural Networks.
Owner
- Name: Allaine Tan
- Login: allainerain
- Kind: user
- Website: https://allainetan.webflow.io/
- Repositories: 1
- Profile: https://github.com/allainerain
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: Dibia
given-names: Victor
title: "LIDA: A Tool for Automatic Generation of Grammar-Agnostic Visualizations and Infographics using Large Language Models"
version: 1.0.0
date-released: 2023-07-01
url: "https://aclanthology.org/2023.acl-demo.11"
doi: "10.18653/v1/2023.acl-demo.11"
conference:
name: "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)"
month: jul
year: 2023
address: "Toronto, Canada"
publisher: "Association for Computational Linguistics"
GitHub Events
Total
- Push event: 9
- Pull request event: 2
- Create event: 1
Last Year
- Push event: 9
- Pull request event: 2
- Create event: 1
Dependencies
- web-ui latest
- altair *
- fastapi *
- geopandas *
- kaleido >=0.2.1, !=0.2.1.post1
- llmx >=0.0.21a
- matplotlib *
- matplotlib-venn *
- networkx *
- numpy *
- pandas *
- plotly *
- plotnine *
- pydantic *
- python-multipart *
- scipy *
- seaborn *
- statsmodels *
- typer *
- uvicorn *
- wordcloud *
- beautifulsoup4 *
- faiss-cpu *
- langchain *
- langchain-community *
- langchain-openai *
- llmx *
- lxml *
- matplotlib *
- pandas *
- plotly *
- python-dotenv *
- seaborn *
- streamlit *