llama_index

LlamaIndex is the leading framework for building LLM-powered agents over your data.

https://github.com/run-llama/llama_index

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
    23 of 1498 committers (1.5%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (4.1%) to scientific vocabulary

Keywords

agents application data fine-tuning framework llamaindex llm multi-agents rag vector-database

Keywords from Contributors

jax autopep8 codeformatter pre-commit-hook yapf cryptocurrencies formatter python39 python313 python312

Scientific Fields

Engineering Computer Science - 40% confidence
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: run-llama
  • License: mit
  • Language: Python
  • Default Branch: main
  • Homepage: https://docs.llamaindex.ai
  • Size: 315 MB
Statistics
  • Stars: 43,943
  • Watchers: 263
  • Forks: 6,323
  • Open Issues: 277
  • Releases: 469
Topics
agents application data fine-tuning framework llamaindex llm multi-agents rag vector-database
Created over 3 years ago · Last pushed 6 months ago
Metadata Files
Readme Changelog Contributing License Code of conduct Citation Security

README.md

LlamaIndex

LlamaIndex (GPT Index) is a data framework for your LLM application. Building with LlamaIndex typically involves working with LlamaIndex core and a chosen set of integrations (or plugins). There are two ways to start building with LlamaIndex in Python:

  1. Starter: llama-index. A starter Python package that includes core LlamaIndex as well as a selection of integrations.

  2. Customized: llama-index-core. Install core LlamaIndex and add your chosen LlamaIndex integration packages on LlamaHub that are required for your application. There are over 300 LlamaIndex integration packages that work seamlessly with core, allowing you to build with your preferred LLM, embedding, and vector store providers.

The LlamaIndex Python library is namespaced such that import statements which include core imply that the core package is being used. In contrast, those statements without core imply that an integration package is being used.

```python
# typical pattern
from llama_index.core.xxx import ClassABC  # core submodule xxx
from llama_index.xxx.yyy import (
    SubclassABC,
)  # integration yyy for submodule xxx

# concrete example
from llama_index.core.llms import LLM
from llama_index.llms.openai import OpenAI
```
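To make the naming rule concrete, here is a small stdlib-only sketch that classifies a module path as core or integration and guesses which PyPI distribution provides it. `classify_import` and `guess_distribution` are hypothetical helpers, not part of LlamaIndex, and the package-name rule is an assumption inferred from names such as `llama-index-llms-openai` that appear in this README; check each integration's own install instructions in practice.

```python
def classify_import(module_path: str) -> str:
    """Return "core" for llama_index.core.* modules and "integration" for other llama_index.* modules."""
    parts = module_path.split(".")
    if parts[0] != "llama_index":
        raise ValueError(f"not a llama_index module: {module_path!r}")
    if len(parts) > 1 and parts[1] == "core":
        return "core"
    return "integration"


def guess_distribution(module_path: str) -> str:
    """Guess the PyPI distribution for a module (hypothetical helper).

    Assumes the llama-index-<category>-<provider> naming seen in packages such as
    llama-index-llms-openai; real packages define their own names.
    """
    if classify_import(module_path) == "core":
        return "llama-index-core"
    parts = module_path.split(".")
    # e.g. llama_index.llms.openai -> llama-index-llms-openai
    return "-".join(["llama-index", *(p.replace("_", "-") for p in parts[1:3])])


for path in ["llama_index.core.llms", "llama_index.llms.openai", "llama_index.embeddings.huggingface"]:
    print(path, "->", classify_import(path), guess_distribution(path))
```

For example, `guess_distribution("llama_index.vector_stores.weaviate")` yields `llama-index-vector-stores-weaviate`, the same pattern as the package labels listed further down this page.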

Important Links

  • LlamaIndex.TS (TypeScript/JavaScript)
  • Documentation
  • X (formerly Twitter)
  • LinkedIn
  • Reddit
  • Discord

Ecosystem

Overview

NOTE: This README is not updated as frequently as the documentation. Please check out the documentation above for the latest updates!

Context

  • LLMs are a phenomenal piece of technology for knowledge generation and reasoning. They are pre-trained on large amounts of publicly available data.
  • How do we best augment LLMs with our own private data?

We need a comprehensive toolkit to help perform this data augmentation for LLMs.

Proposed Solution

That's where LlamaIndex comes in. LlamaIndex is a "data framework" to help you build LLM apps. It provides the following tools:

  • Offers data connectors to ingest your existing data sources and data formats (APIs, PDFs, docs, SQL, etc.).
  • Provides ways to structure your data (indices, graphs) so that this data can be easily used with LLMs.
  • Provides an advanced retrieval/query interface over your data: Feed in any LLM input prompt, get back retrieved context and knowledge-augmented output.
  • Allows easy integration with your outer application framework (e.g. LangChain, Flask, Docker, ChatGPT, or anything else).

LlamaIndex provides tools for both beginner users and advanced users. Our high-level API allows beginner users to use LlamaIndex to ingest and query their data in 5 lines of code. Our lower-level APIs allow advanced users to customize and extend any module (data connectors, indices, retrievers, query engines, reranking modules), to fit their needs.
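Structuring data usually begins by splitting documents into smaller chunks (nodes, in LlamaIndex terms) that can be embedded and retrieved individually. The sketch below splits on whitespace into fixed-size word windows that overlap, so text near a boundary appears in two chunks. It is a simplified stand-in that assumes word counts are an acceptable proxy for tokens; LlamaIndex's own node parsers (for example `SentenceSplitter`) work with tokens and sentence boundaries. `Chunk` and `chunk_words` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Chunk:
    """A piece of a source document plus where it came from (hypothetical type)."""
    doc_id: str
    start: int  # index of the chunk's first word in the source document
    text: str


def chunk_words(doc_id: str, text: str, chunk_size: int = 50, overlap: int = 10) -> list[Chunk]:
    """Split text into windows of chunk_size words; neighbouring windows share overlap words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(Chunk(doc_id, start, " ".join(words[start:start + chunk_size])))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the document
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_words("doc-1", sample, chunk_size=4, overlap=1):
    print(chunk.start, chunk.text)
```

With `chunk_size=4, overlap=1`, the ten-word sample yields chunks starting at words 0, 3 and 6, each sharing one word with its neighbour.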

Contributing

Interested in contributing? Contributions to LlamaIndex core, as well as integrations that build on core, are accepted and highly encouraged! See our Contribution Guide for more details.

New integrations should meaningfully integrate with existing LlamaIndex framework components. At the discretion of LlamaIndex maintainers, some integrations may be declined.

Documentation

Full documentation can be found at https://docs.llamaindex.ai.

Please check it out for the most up-to-date tutorials, how-to guides, references, and other resources!

Example Usage

```sh
# custom selection of integrations to work with core
pip install llama-index-core
pip install llama-index-llms-openai
pip install llama-index-llms-replicate
pip install llama-index-embeddings-huggingface
```

Examples are in the docs/examples folder. Indices are in the indices folder.

To build a simple vector store index using OpenAI:

```python
import os

os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("YOUR_DATA_DIRECTORY").load_data()
index = VectorStoreIndex.from_documents(documents)
```
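Roughly speaking, a vector store index embeds each chunk and, at query time, returns the chunks whose vectors are most similar to the query's vector. The sketch below shows that idea with no external dependencies: the "embedding" is just a word-count vector and similarity is cosine similarity, assumptions chosen so the example runs offline. `embed`, `cosine` and `ToyVectorIndex` are hypothetical and unrelated to the real `VectorStoreIndex` implementation, which calls an embedding model.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lower-cased word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorIndex:
    """Hypothetical in-memory index: stores (text, vector) pairs, ranks them by cosine similarity."""

    def __init__(self, texts):
        self.entries = [(t, embed(t)) for t in texts]

    def retrieve(self, query: str, top_k: int = 2):
        q = embed(query)
        scored = [(cosine(q, vec), text) for text, vec in self.entries]
        scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep insertion order
        return [text for _, text in scored[:top_k]]


toy_index = ToyVectorIndex([
    "Llamas live in the Andes.",
    "Python is a programming language.",
    "Alpacas are related to llamas.",
])
print(toy_index.retrieve("Where do llamas live?", top_k=1))
```

The query shares two words with the first text and one with the third, so the first text ranks highest.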

To build a simple vector store index using non-OpenAI LLMs, e.g. Llama 2 hosted on Replicate, where you can easily create a free trial API token:

```python
import os

os.environ["REPLICATE_API_TOKEN"] = "YOUR_REPLICATE_API_TOKEN"

from llama_index.core import Settings, VectorStoreIndex, SimpleDirectoryReader
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.replicate import Replicate
from transformers import AutoTokenizer

# set the LLM
llama2_7b_chat = "meta/llama-2-7b-chat:8e6975e5ed6174911a6ff3d60540dfd4844201974602551e10e9e87ab143d81e"
Settings.llm = Replicate(
    model=llama2_7b_chat,
    temperature=0.01,
    additional_kwargs={"top_p": 1, "max_new_tokens": 300},
)

# set tokenizer to match LLM
Settings.tokenizer = AutoTokenizer.from_pretrained(
    "NousResearch/Llama-2-7b-chat-hf"
)

# set the embed model
Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5"
)

documents = SimpleDirectoryReader("YOUR_DATA_DIRECTORY").load_data()
index = VectorStoreIndex.from_documents(
    documents,
)
```
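The `Settings` assignments above configure global defaults; components that are not handed an explicit LLM, tokenizer or embedding model fall back to them. The sketch below shows that fallback pattern in plain Python. `_GlobalSettings`, `SETTINGS`, `build_index` and the two toy embedding functions are hypothetical names, not LlamaIndex APIs.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

EmbedFn = Callable[[str], List[float]]


@dataclass
class _GlobalSettings:
    """Hypothetical stand-in for a global defaults object such as LlamaIndex's Settings."""
    embed_model: Optional[EmbedFn] = None


SETTINGS = _GlobalSettings()


def build_index(texts: List[str], embed_model: Optional[EmbedFn] = None):
    """Embed texts with the explicit model if one is given, else with the global default."""
    model = embed_model or SETTINGS.embed_model
    if model is None:
        raise ValueError("no embed model: pass one or set SETTINGS.embed_model")
    return [(t, model(t)) for t in texts]


def length_embedding(text: str) -> List[float]:
    # Placeholder "model": a one-dimensional vector holding the text length.
    return [float(len(text))]


def zero_embedding(text: str) -> List[float]:
    # Another placeholder, used to show a per-call override.
    return [0.0]


SETTINGS.embed_model = length_embedding          # configure once, globally
print(build_index(["hi", "hello"]))              # falls back to the global default
print(build_index(["hi"], embed_model=zero_embedding))  # explicit argument wins
```

The design keeps call sites short while still allowing a one-off override, which is why the README sets the LLM, tokenizer and embedding model once before building the index.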

To query:

```python
query_engine = index.as_query_engine()
query_engine.query("YOUR_QUESTION")
```
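Conceptually, the query engine returned by `as_query_engine()` pairs a retriever with a response step: it fetches relevant chunks, places them in a prompt next to the question, and sends that prompt to the LLM. A minimal sketch of this retrieve-then-generate flow follows. The template, `answer`, and the two stand-in callables are hypothetical, chosen so the code runs without a model or API key; LlamaIndex's own response synthesizers use their own prompt templates and may spread a large context over several LLM calls, which this sketch ignores.

```python
from typing import Callable, List

# Hypothetical prompt template written for this sketch.
QA_TEMPLATE = (
    "Context information is below.\n"
    "---------------------\n"
    "{context}\n"
    "---------------------\n"
    "Using only the context above, answer the question.\n"
    "Question: {question}\n"
    "Answer: "
)


def answer(question: str,
           retrieve: Callable[[str], List[str]],
           llm: Callable[[str], str]) -> str:
    """Retrieve context for the question, insert it into the template, and ask the LLM."""
    context = "\n\n".join(retrieve(question))
    prompt = QA_TEMPLATE.format(context=context, question=question)
    return llm(prompt)


def fake_retrieve(question: str) -> List[str]:
    # Stand-in retriever: always returns the same chunk.
    return ["Llamas live in the Andes."]


def echo_llm(prompt: str) -> str:
    # Stand-in "LLM" that returns its prompt, so the assembled prompt can be inspected.
    return prompt


result = answer("Where do llamas live?", fake_retrieve, echo_llm)
print(result)
```

Swapping `echo_llm` for a real model client and `fake_retrieve` for a similarity search turns this skeleton into a basic retrieval-augmented pipeline.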

By default, data is stored in-memory. To persist to disk (under ./storage):

```python
index.storage_context.persist()
```

To reload from disk:

```python
from llama_index.core import StorageContext, load_index_from_storage

# rebuild storage context
storage_context = StorageContext.from_defaults(persist_dir="./storage")

# load index
index = load_index_from_storage(storage_context)
```
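Persisting and reloading amounts to serializing the index's stores to files in a directory and reading them back. As a minimal sketch, assuming a single JSON file is enough for a toy index, the code below writes `{text: vector}` entries into a directory and restores them. `persist`, `load` and the file name `toy_store.json` are hypothetical; LlamaIndex's `persist()` writes its own set of files, and its loader rebuilds full index objects rather than a plain dictionary.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def persist(entries: Dict[str, List[float]], persist_dir: str = "./storage") -> None:
    """Write {text: vector} entries to <persist_dir>/toy_store.json, creating the directory."""
    path = Path(persist_dir)
    path.mkdir(parents=True, exist_ok=True)
    (path / "toy_store.json").write_text(json.dumps(entries), encoding="utf-8")


def load(persist_dir: str = "./storage") -> Dict[str, List[float]]:
    """Read back the entries written by persist()."""
    return json.loads((Path(persist_dir) / "toy_store.json").read_text(encoding="utf-8"))


# Use a temporary directory so the example does not touch ./storage.
with tempfile.TemporaryDirectory() as tmp:
    persist({"Llamas live in the Andes.": [0.1, 0.9]}, tmp)
    restored = load(tmp)
print(restored)
```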

Dependencies

We use Poetry as the package manager for all Python packages, so the dependencies of each package are listed in the pyproject.toml file in that package's folder.

```bash
cd <desired-package-folder>
pip install poetry
poetry install --with dev
```

Citation

Reference to cite if you use LlamaIndex in a paper:

@software{Liu_LlamaIndex_2022,
author = {Liu, Jerry},
doi = {10.5281/zenodo.1234},
month = {11},
title = {{LlamaIndex}},
url = {https://github.com/jerryjliu/llama_index},
year = {2022}
}

Owner

  • Name: LlamaIndex
  • Login: run-llama
  • Kind: organization

Committers

Last synced: 10 months ago

All Time
  • Total Commits: 6,390
  • Total Committers: 1,498
  • Avg Commits per committer: 4.266
  • Development Distribution Score (DDS): 0.843
Past Year
  • Commits: 2,270
  • Committers: 719
  • Avg Commits per committer: 3.157
  • Development Distribution Score (DDS): 0.789
Top Committers
Name Email Commits
Logan l****h@l****m 1,001
Jerry Liu j****8@g****m 919
dependabot[bot] 4****] 404
Simon Suo s****o@g****m 210
Andrei Fajardo 9****i 169
Haotian Zhang s****g@g****m 135
Massimiliano Pippi m****i@g****m 125
Ravi Theja r****1@g****m 104
Laurie Voss g****b@s****m 72
James Braza j****a@g****m 69
Sourabh Desai s****i@g****m 51
Emanuel Ferreira c****s@g****m 39
Javier Torres j****s@g****m 30
Matthew Farrellee m****t@c****u 29
Tomaz Bratanic b****z@g****m 25
Nick Fiacco n****o@g****m 24
Ethan Yang e****g@i****m 22
Ofer Mendelevitch o****d@g****m 22
Huu Le (Lee) 3****j 20
Guodong s****g@1****m 20
yisding y****g@g****m 19
hongyishi s****8@g****m 19
Roger Yang 8****g 19
Jael Gu m****u@z****m 18
jon-chuang 9****g 17
Shorthills AI 1****I 16
Rendy Febry r****y@e****m 16
Anoop Sharma a****7@g****m 16
Aaron Jimenez a****v@g****m 16
Jordan Parker j****6@g****m 15
and 1,468 more...
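The committer averages above can be reproduced directly from the listed counts. The Development Distribution Score is assumed here to be one minus the top committer's share of all commits; that definition matches the all-time figure (1 - 1001/6390 ≈ 0.843). The past-year DDS cannot be checked the same way, because the past-year commit count of the top committer is not listed. `avg_commits` and `dds` are hypothetical helper names.

```python
def avg_commits(total_commits: int, committers: int) -> float:
    """Average number of commits per committer."""
    return total_commits / committers


def dds(top_committer_commits: int, total_commits: int) -> float:
    """Development Distribution Score, assumed to be 1 - (top committer's share of commits)."""
    return 1 - top_committer_commits / total_commits


print(round(avg_commits(6390, 1498), 3))  # all time: 4.266
print(round(avg_commits(2270, 719), 3))   # past year: 3.157
print(round(dds(1001, 6390), 3))          # all time, top committer Logan: 0.843
```

A DDS near 0 would mean one person makes almost every commit; values around 0.8 indicate that work is spread across many contributors.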

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 4,387
  • Total pull requests: 6,265
  • Average time to close issues: about 2 months
  • Average time to close pull requests: 6 days
  • Total issue authors: 2,470
  • Total pull request authors: 1,353
  • Average comments per issue: 2.54
  • Average comments per pull request: 0.73
  • Merged pull requests: 4,924
  • Bot issues: 4
  • Bot pull requests: 410
Past Year
  • Issues: 1,618
  • Pull requests: 3,251
  • Average time to close issues: 22 days
  • Average time to close pull requests: 2 days
  • Issue authors: 998
  • Pull request authors: 634
  • Average comments per issue: 1.72
  • Average comments per pull request: 0.71
  • Merged pull requests: 2,581
  • Bot issues: 0
  • Bot pull requests: 38
Top Authors
Issue Authors
  • nerdai (43)
  • logan-markewich (43)
  • justinzyw (34)
  • brycecf (34)
  • mirallm (30)
  • gich2009 (29)
  • Prem-Nitin (28)
  • strawgate (28)
  • JINO-ROHIT (27)
  • DataNoob0723 (22)
  • 912100012 (21)
  • mw19930312 (20)
  • LikhithRishi (19)
  • tituslhy (19)
  • RakeshReddyKondeti (17)
Pull Request Authors
  • logan-markewich (1,103)
  • dependabot[bot] (410)
  • masci (237)
  • jerryjliu (222)
  • nerdai (175)
  • AstraBert (97)
  • seldo (76)
  • ravi03071991 (72)
  • hatianzhang (58)
  • sourabhdesai (53)
  • EmanuelCampos (53)
  • nightosong (36)
  • mattf (32)
  • Javtor (32)
  • tomasonjo (31)
Top Labels
Issue Labels
triage (2,427) bug (1,832) question (1,615) enhancement (633) stale (427) documentation (70) P1 (54) P0 (39) v0.10.X (33) docs (24) P2 (22) lgtm (21) p2 (18) size:XS (14) size:S (10) request contribution board (9) topic:workflows (8) size:M (4) size:L (3) vector store (3) dependencies (3) good first issue (3) size:XXL (3) azure (2) discord (2) LlamaParse (2) contributions wanted (1) topic:ollama (1) topic:vector stores (1) index (1)
Pull Request Labels
lgtm (3,374) size:XS (1,892) size:L (1,011) size:S (932) size:M (880) dependencies (410) size:XL (378) size:XXL (238) bug (39) triage (37) python (35) question (33) topic:workflows (17) docs (14) enhancement (13) documentation (9) P1 (8) github_actions (7) P0 (6) stale (3) P2 (2) topic:vector stores (1) toipic:storage (1) package:llama-index-readers-confluence (1) package:llama-index-embeddings-nvidia (1) package:llama-index-vector-stores-weaviate (1) package:llama-index-embeddings-vertex (1) package:llama-index-llms-nvidia (1) package:llama-index-postprocessor-dashscope-rerank (1) topic:CI (1)

Packages

  • Total packages: 12
  • Total downloads:
    • pypi 7,084,371 last-month
  • Total docker downloads: 637,410
  • Total dependent packages: 742
    (may contain duplicates)
  • Total dependent repositories: 1,464
    (may contain duplicates)
  • Total versions: 1,087
  • Total maintainers: 4
  • Total advisories: 15
pypi.org: llama-index

Interface between LLMs and your data

  • Versions: 468
  • Dependent Packages: 153
  • Dependent Repositories: 1,464
  • Downloads: 2,538,631 Last month
  • Docker Downloads: 637,410
Rankings
Stargazers count: 0.1%
Forks count: 0.2%
Dependent packages count: 0.2%
Dependent repos count: 0.3%
Downloads: 0.3%
Average: 0.7%
Docker downloads count: 2.9%
Maintainers (2)
Last synced: about 1 year ago
proxy.golang.org: github.com/run-llama/llama_index
  • Versions: 350
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 9.4%
Average: 10.0%
Dependent repos count: 10.6%
Last synced: 6 months ago
pypi.org: llama-index-retrievers-superlinked

LlamaIndex retriever integration for Superlinked

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 141 Last month
Rankings
Stargazers count: 0.2%
Forks count: 0.2%
Dependent packages count: 8.7%
Average: 14.5%
Dependent repos count: 48.9%
Maintainers (1)
Last synced: 6 months ago
pypi.org: lindex-patch

Interface between LLMs and your data

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 42 Last month
Rankings
Dependent packages count: 9.1%
Average: 30.2%
Dependent repos count: 51.2%
Maintainers (1)
Last synced: 6 months ago
pypi.org: llama-index-retrievers-vectorize

llama-index retrievers Vectorize.io integration

  • Versions: 2
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 75 Last month
Rankings
Dependent packages count: 9.1%
Average: 30.3%
Dependent repos count: 51.5%
Maintainers (1)
Last synced: 6 months ago
pypi.org: llama-index-test-starter

Interface between LLMs and your data

  • Versions: 14
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 28 Last month
Rankings
Dependent packages count: 9.9%
Average: 37.6%
Dependent repos count: 65.3%
Maintainers (1)
Last synced: 6 months ago
pypi.org: llama-index-bundle

Interface between LLMs and your data

  • Versions: 2
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 17 Last month
Rankings
Dependent packages count: 9.9%
Average: 37.7%
Dependent repos count: 65.4%
Maintainers (1)
Last synced: 6 months ago
pypi.org: flying-delta

Interface between LLMs and your data

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 18 Last month
Rankings
Dependent packages count: 9.9%
Average: 37.7%
Dependent repos count: 65.4%
Maintainers (1)
Last synced: 6 months ago
pypi.org: llama-index-legacy

Interface between LLMs and your data

  • Versions: 9
  • Dependent Packages: 10
  • Dependent Repositories: 0
  • Downloads: 339,812 Last month
Rankings
Dependent packages count: 9.9%
Average: 37.7%
Dependent repos count: 65.4%
Maintainers (1)
Last synced: 6 months ago
pypi.org: flying-delta-legacy

Interface between LLMs and your data

  • Versions: 2
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 19 Last month
Rankings
Dependent packages count: 9.9%
Average: 37.7%
Dependent repos count: 65.5%
Maintainers (1)
Last synced: 6 months ago
pypi.org: flying-delta-core

Interface between LLMs and your data

  • Versions: 6
  • Dependent Packages: 61
  • Dependent Repositories: 0
  • Downloads: 20 Last month
Rankings
Dependent packages count: 10.0%
Average: 38.0%
Dependent repos count: 66.0%
Maintainers (1)
Last synced: 6 months ago

Dependencies

.github/workflows/build_package.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
.github/workflows/codeql.yml actions
  • actions/checkout v3 composite
  • github/codeql-action/analyze v2 composite
  • github/codeql-action/autobuild v2 composite
  • github/codeql-action/init v2 composite
.github/workflows/dev_docs.yml actions
  • actions/checkout v2 composite
  • cpina/github-action-push-to-another-repository main composite
.github/workflows/lint.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
.github/workflows/publish_release.yml actions
  • actions/checkout v2 composite
  • actions/create-release v1 composite
  • actions/setup-python v2 composite
  • actions/upload-release-asset v1 composite
  • pypa/gh-action-pypi-publish master composite
.github/workflows/publish_release_gpt_index.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
  • pypa/gh-action-pypi-publish master composite
.github/workflows/unit_test.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
data_requirements.txt pypi
  • boto3 *
  • discord.py *
  • google-api-python-client *
  • google-auth-httplib2 *
  • google-auth-oauthlib *
  • jsonpath-ng *
  • moto *
  • pymongo *
  • slack_sdk *
  • vellum-ai ==0.0.15
  • wikipedia *
docs/requirements.txt pypi
  • autodoc_pydantic *
  • docutils <0.17
  • furo >=2023.3.27
  • m2r2 *
  • myst-nb *
  • myst-parser *
  • pydantic <2.0.0
  • sphinx >=4.3.0
  • sphinx-autobuild *
  • sphinx_rtd_theme *
pyproject.toml pypi
requirements.txt pypi
  • black ==23.7.0
  • ipython ==8.10.0
  • mypy ==0.991
  • pre-commit ==3.2.0
  • pylint ==2.15.10
  • pytest ==7.2.1
  • pytest-asyncio ==0.21.0
  • pytest-dotenv ==0.5.2
  • rake_nltk ==1.0.6
  • ruff ==0.0.285
  • types-redis ==4.5.5.0
  • types-requests ==2.28.11.8
  • types-setuptools ==67.1.0.0
setup.py pypi
  • beautifulsoup4 *
  • dataclasses_json *
  • fsspec >=2023.5.0
  • langchain >=0.0.293
  • nest_asyncio *
  • nltk *
  • numpy *
  • openai >=0.26.4
  • pandas *
  • sqlalchemy >=2.0.15
  • tenacity >=8.2.0,<9.0.0
  • tiktoken *
  • typing-inspect >=0.8.0
  • typing_extensions >=4.5.0
  • urllib3 <2
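The pins above follow pip's version-specifier syntax: comma-separated clauses must all hold, so `tenacity >=8.2.0,<9.0.0` accepts 8.2.0 and any later 8.x release, while in this listing `*` appears to mean that no constraint is given. The sketch below checks plain numeric versions against such specifiers. `satisfies` and its helpers are hypothetical and deliberately simplified: they handle only `>=`, `<=`, `==`, `>` and `<` and ignore pre-, post- and dev-releases; real tooling should rely on the `packaging` library's `SpecifierSet`.

```python
import operator
from typing import Tuple

_OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq, ">": operator.gt, "<": operator.lt}


def parse_release(v: str) -> Tuple[int, ...]:
    """Parse a plain numeric release such as '8.2.0' into (8, 2, 0)."""
    return tuple(int(part) for part in v.split("."))


def _pad(a: Tuple[int, ...], b: Tuple[int, ...]) -> Tuple[Tuple[int, ...], Tuple[int, ...]]:
    # Pad with zeros so that 2, 2.0 and 2.0.0 compare as equal.
    n = max(len(a), len(b))
    return a + (0,) * (n - len(a)), b + (0,) * (n - len(b))


def satisfies(version: str, spec: str) -> bool:
    """Return True if version meets every clause of spec, e.g. '>=8.2.0,<9.0.0'; '*' matches anything."""
    if spec.strip() == "*":
        return True
    for clause in spec.split(","):
        clause = clause.strip()
        # Two-character operators are tried before their one-character prefixes.
        op = next((o for o in (">=", "<=", "==", ">", "<") if clause.startswith(o)), None)
        if op is None:
            raise ValueError(f"unsupported specifier: {clause!r}")
        have, want = _pad(parse_release(version), parse_release(clause[len(op):].strip()))
        if not _OPS[op](have, want):
            return False
    return True


print(satisfies("8.2.3", ">=8.2.0,<9.0.0"))  # tenacity pin
print(satisfies("1.26.18", "<2"))            # urllib3 pin
```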