gpt_index
Science Score: 57.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 1 DOI reference(s) in README
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (15.2%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: Shingii
- License: mit
- Language: Python
- Default Branch: main
- Size: 8.38 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 23
- Releases: 0
Metadata Files
README.md
🗂️ LlamaIndex 🦙 (GPT Index)
⚠️ NOTE: We are rebranding GPT Index as LlamaIndex! We will carry out this transition gradually.
2/25/2023: By default, our docs/notebooks/instructions now reference "LlamaIndex" instead of "GPT Index".
2/19/2023: By default, our docs/notebooks/instructions now use the `llama-index` package. However, the `gpt-index` package still exists as a duplicate!
2/16/2023: We have a duplicate `llama-index` pip package. Simply replace all imports of `gpt_index` with `llama_index` if you choose to `pip install llama-index`.
LlamaIndex (GPT Index) is a project that provides a central interface to connect your LLMs with external data.
PyPI:
- LlamaIndex: https://pypi.org/project/llama-index/
- GPT Index (duplicate): https://pypi.org/project/gpt-index/
Documentation: https://gpt-index.readthedocs.io/en/latest/.
Twitter: https://twitter.com/gpt_index.
Discord: https://discord.gg/dGcwcsnxhU.
LlamaHub (community library of data loaders): https://llamahub.ai
🚀 Overview
NOTE: This README is not updated as frequently as the documentation. Please check out the documentation above for the latest updates!
Context
- LLMs are a phenomenal piece of technology for knowledge generation and reasoning. They are pre-trained on large amounts of publicly available data.
- How do we best augment LLMs with our own private data?
- One paradigm that has emerged is in-context learning (the other is finetuning), where we insert context into the input prompt. That way, we take advantage of the LLM's reasoning capabilities to generate a response.
To augment LLMs with data in a performant, efficient, and cheap manner, we need to solve two components:
- Data Ingestion
- Data Indexing
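The in-context learning paradigm can be sketched in plain Python. The retrieval helper and prompt template below are illustrative assumptions for the sake of the sketch, not LlamaIndex APIs:

```python
# Sketch of in-context learning: retrieve relevant context from your own
# data and insert it into the prompt, so the LLM's reasoning is grounded
# in that context. Retrieval here is a toy keyword-overlap ranking standing
# in for a real index lookup.

def retrieve(documents, question, top_k=2):
    """Rank documents by word overlap with the question (toy retrieval)."""
    words = set(question.lower().split())
    scored = sorted(documents, key=lambda d: -len(words & set(d.lower().split())))
    return scored[:top_k]

def build_prompt(context_chunks, question):
    """Assemble a knowledge-augmented prompt from retrieved context."""
    context = "\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )

documents = [
    "LlamaIndex connects LLMs with external data.",
    "Bananas are rich in potassium.",
]
question = "What does LlamaIndex connect LLMs with?"
prompt = build_prompt(retrieve(documents, question), question)
```

The resulting `prompt` string is what would be sent to the LLM; the indices described below automate the storage, retrieval, and prompt-assembly steps.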
Proposed Solution
That's where LlamaIndex comes in. LlamaIndex is a simple, flexible interface between your external data and LLMs. It provides the following tools in an easy-to-use fashion:
- Offers data connectors to your existing data sources and data formats (APIs, PDFs, docs, SQL, etc.)
- Provides indices over your unstructured and structured data for use with LLMs.
These indices help to abstract away common boilerplate and pain points for in-context learning:
- Storing context in an easy-to-access format for prompt insertion.
- Dealing with prompt limitations (e.g. 4096 tokens for Davinci) when context is too big.
- Dealing with text splitting.
- Provides users an interface to query the index (feed in an input prompt) and obtain a knowledge-augmented output.
- Offers you a comprehensive toolset trading off cost and performance.
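The text-splitting pain point above can be illustrated with a minimal sketch. Chunk sizes are measured in characters here for simplicity; a real index would count tokens (e.g. with tiktoken), and the function below is hypothetical, not a LlamaIndex API:

```python
# Sketch of text splitting: break a long document into overlapping
# fixed-size chunks so each one fits within the model's prompt budget
# (e.g. 4096 tokens for Davinci). The overlap preserves continuity
# across chunk boundaries.

def split_text(text, chunk_size=200, overlap=20):
    """Split text into overlapping chunks of at most chunk_size characters."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

chunks = split_text("x" * 500, chunk_size=200, overlap=20)
# Every chunk fits the budget; consecutive chunks share a 20-character overlap.
```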
💡 Contributing
Interested in contributing? See our Contribution Guide for more details.
📄 Documentation
Full documentation can be found here: https://gpt-index.readthedocs.io/en/latest/.
Please check it out for the most up-to-date tutorials, how-to guides, references, and other resources!
💻 Example Usage
pip install llama-index
Examples are in the examples folder. Indices are in the indices folder (see list of indices below).
To build a simple vector store index:
```python
import os
os.environ["OPENAI_API_KEY"] = 'YOUR_OPENAI_API_KEY'

from llama_index import GPTSimpleVectorIndex, SimpleDirectoryReader
documents = SimpleDirectoryReader('data').load_data()
index = GPTSimpleVectorIndex(documents)
```
To save to and load from disk:
```python
# save to disk
index.save_to_disk('index.json')
# load from disk
index = GPTSimpleVectorIndex.load_from_disk('index.json')
```
To query:
```python
index.query("<question_text>?")
```
🔧 Dependencies
The main third-party package requirements are tiktoken, openai, and langchain.
All requirements should be contained within the `setup.py` file. To run the package locally without building the wheel, simply run `pip install -r requirements.txt`.
📖 Citation
Reference to cite if you use LlamaIndex in a paper:
@software{Liu_LlamaIndex_2022,
author = {Liu, Jerry},
doi = {10.5281/zenodo.1234},
month = {11},
title = {{LlamaIndex}},
url = {https://github.com/jerryjliu/gpt_index},
year = {2022}
}
Owner
- Name: Shingaii
- Login: Shingii
- Kind: user
- Repositories: 2
- Profile: https://github.com/Shingii
Musician - Sound Designer - Curious about python and how to use it
Citation (CITATION.cff)
```
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Liu"
  given-names: "Jerry"
  orcid: "https://orcid.org/0000-0002-6694-3517"
title: "LlamaIndex"
doi: 10.5281/zenodo.1234
date-released: 2022-11-1
url: "https://github.com/jerryjliu/gpt_index"
```
GitHub Events
Total
- Push event: 4
- Pull request event: 6
- Create event: 4
Last Year
- Push event: 4
- Pull request event: 6
- Create event: 4