Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (16.2%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: alexisneuhaus
  • License: mit
  • Language: Python
  • Default Branch: main
  • Size: 4.76 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 3 years ago · Last pushed over 3 years ago
Metadata Files
Readme Contributing License Citation

README.md

🗂️ GPT Index

GPT Index is a project consisting of a set of data structures designed to make it easier to use large external knowledge bases with LLMs.

PyPi: https://pypi.org/project/gpt-index/.

Documentation: https://gpt-index.readthedocs.io/en/latest/.

Twitter: https://twitter.com/gpt_index.

Discord: https://discord.gg/dGcwcsnxhU.

LlamaHub (community library of data loaders): https://llamahub.ai

🚀 Overview

NOTE: This README is not updated as frequently as the documentation. Please check out the documentation above for the latest updates!

Context

  • LLMs are a phenomenal piece of technology for knowledge generation and reasoning.
  • A big limitation of LLMs is context size (e.g. Davinci's limit is 4096 tokens. Large, but not infinite).
  • The ability to feed "knowledge" to LLMs is restricted to this limited prompt size and model weights.
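To make the context-size limit concrete, here is a minimal sketch of splitting a long text into pieces that each fit a fixed budget. The function name `chunk_text` is hypothetical, and whitespace-separated words stand in for tokens as a rough assumption; a real tokenizer such as tiktoken counts tokens differently.

```python
def chunk_text(text, max_tokens=4096):
    """Split text into chunks of at most max_tokens whitespace-separated words.

    Assumption: one word approximates one token. Real tokenizers differ.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    words = text.split()
    return [
        " ".join(words[i:i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]


# A 10,000-word text does not fit in one 4096-word budget, so it is split.
long_text = "word " * 10000
chunks = chunk_text(long_text, max_tokens=4096)
print(len(chunks))  # 3
```

Each chunk can then be sent to the model separately, which is the basic problem an index structure has to organize.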

Proposed Solution

At its core, GPT Index contains a toolkit of index data structures designed to easily connect LLMs with your external data. GPT Index provides the following advantages:

  • Remove concerns over prompt size limitations.
  • Abstract common usage patterns to reduce boilerplate code in your LLM app.
  • Provide data connectors to your common data sources (Google Docs, Slack, etc.).
  • Provide cost transparency + tools that reduce cost while increasing performance.

Each data structure offers distinct use cases and a variety of customizable parameters. These indices can then be queried in a general-purpose manner to achieve any task that you would typically achieve with an LLM:

  • Question-Answering
  • Summarization
  • Text Generation (stories, TODOs, emails, etc.)
  • and more!
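As a conceptual illustration only, the sketch below shows the general idea of an index over external text: store chunks, pick the ones most relevant to a question, and place only those in the prompt. The class `ToyKeywordIndex` and its word-overlap scoring are hypothetical and are not how GPT Index is implemented; its actual indices (for example the vector store index) use different techniques.

```python
from collections import Counter


class ToyKeywordIndex:
    """Hypothetical toy index: ranks text chunks by word overlap with a question."""

    def __init__(self, chunks):
        self.chunks = list(chunks)
        # Pre-compute lowercase word counts for each chunk.
        self.word_counts = [Counter(c.lower().split()) for c in self.chunks]

    def retrieve(self, question, top_k=1):
        """Return the top_k chunks sharing the most words with the question."""
        q_words = set(question.lower().split())
        scored = [
            (sum(counts[w] for w in q_words), i)
            for i, counts in enumerate(self.word_counts)
        ]
        # Highest score first; ties broken by original chunk order.
        scored.sort(key=lambda s: (-s[0], s[1]))
        return [self.chunks[i] for _, i in scored[:top_k]]

    def build_prompt(self, question, top_k=1):
        """Assemble a prompt containing only the retrieved context."""
        context = "\n".join(self.retrieve(question, top_k))
        return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


index = ToyKeywordIndex([
    "Paris is the capital of France.",
    "The Nile is a river in Africa.",
])
print(index.build_prompt("What is the capital of France"))
```

Because only the selected chunks reach the prompt, the total amount of stored text can exceed the model's context limit.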

💡 Contributing

Interested in contributing? See our Contribution Guide for more details.

📄 Documentation

Full documentation can be found here: https://gpt-index.readthedocs.io/en/latest/.

Please check it out for the most up-to-date tutorials, how-to guides, references, and other resources!

💻 Example Usage

pip install gpt-index

Examples are in the examples folder. Indices are in the indices folder (see list of indices below).

To build a simple vector store index:

```python
import os
os.environ["OPENAI_API_KEY"] = 'YOUR_OPENAI_API_KEY'

from gpt_index import GPTSimpleVectorIndex, SimpleDirectoryReader
documents = SimpleDirectoryReader('data').load_data()
index = GPTSimpleVectorIndex(documents)
```

To save to and load from disk:

```python
# save to disk
index.save_to_disk('index.json')
# load from disk
index = GPTSimpleVectorIndex.load_from_disk('index.json')
```

To query: `index.query("<question_text>?")`
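Building an index calls the LLM provider, so it can be worth reusing a saved index instead of rebuilding it on every run. The helper below is a sketch of that pattern; `load_or_build_index` and its parameters are hypothetical names, not part of the package. It assumes only that the reader class exposes `load_data()` and the index class exposes `save_to_disk()` and `load_from_disk()`, as in the gpt_index example usage. The classes are passed in as arguments so the helper does not import the package itself.

```python
import os


def load_or_build_index(index_cls, reader_cls, data_dir="data", index_path="index.json"):
    """Load a saved index if one exists; otherwise build it and save it.

    index_cls:  class with __init__(documents), save_to_disk(path),
                and a load_from_disk(path) classmethod (assumed interface).
    reader_cls: class with __init__(data_dir) and load_data() (assumed interface).
    """
    if os.path.exists(index_path):
        # Reuse the previously saved index and skip the build step.
        return index_cls.load_from_disk(index_path)

    documents = reader_cls(data_dir).load_data()
    index = index_cls(documents)
    index.save_to_disk(index_path)
    return index
```

With the package installed, this would be called as `load_or_build_index(GPTSimpleVectorIndex, SimpleDirectoryReader)`, after which `index.query(...)` works as usual.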

🔧 Dependencies

The main third-party package requirements are tiktoken, openai, and langchain.

All requirements should be contained within the setup.py file. To run the package locally without building the wheel, simply run pip install -r requirements.txt.

📖 Citation

Reference to cite if you use GPT Index in a paper:

@software{Liu_GPT_Index_2022,
  author = {Liu, Jerry},
  doi = {10.5281/zenodo.1234},
  month = {11},
  title = {{GPT Index}},
  url = {https://github.com/jerryjliu/gpt_index},
  year = {2022}
}

Owner

  • Login: alexisneuhaus
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Liu"
  given-names: "Jerry"
  orcid: "https://orcid.org/0000-0002-6694-3517"
title: "GPT Index"
doi: 10.5281/zenodo.1234
date-released: 2022-11-01
url: "https://github.com/jerryjliu/gpt_index"

Dependencies

data_requirements.txt pypi
  • discord.py *
  • google-api-python-client *
  • google-auth-httplib2 *
  • google-auth-oauthlib *
  • pymongo *
  • slack_sdk *
  • wikipedia *
docs/requirements.txt pypi
  • docutils <0.17
  • myst-parser *
  • sphinx >=4.3.0
  • sphinx_rtd_theme >=0.5.1
examples/paul_graham_essay/data/.modules/file-pandas_csv_requirements.txt pypi
  • pandas *
gpt_index.egg-info/requires.txt pypi
  • dataclasses_json *
  • langchain *
  • nltk *
  • numpy *
  • openai >=0.26.4
  • pandas *
  • tenacity <8.2.0
  • tiktoken *
  • transformers *
pyproject.toml pypi
requirements.txt pypi
  • black ==22.12.0
  • flake8 ==6.0.0
  • flake8-docstrings ==1.6.0
  • ipython ==8.10.0
  • isort ==5.11.4
  • mypy ==0.991
  • pylint ==2.15.10
  • pytest ==7.2.1
  • pytest-dotenv ==0.5.2
  • rake_nltk ==1.0.6
  • types-requests ==2.28.11.8
  • types-setuptools ==67.1.0.0
setup.py pypi
  • dataclasses_json *
  • langchain *
  • nltk *
  • numpy *
  • openai >=0.26.4
  • pandas *
  • tenacity <8.2.0
  • transformers *