https://github.com/epsilla-cloud/vectordb

Epsilla is a high performance Vector Database Management System


Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.2%) to scientific vocabulary

Keywords

ai chatgpt data data-science database embeddings embeddings-similarity infrastructure llms machine-learning neural-network neural-search rag retrieval search-engine vector-database vector-search

Keywords from Contributors

anthropic gemini langchain
Last synced: 6 months ago

Repository

Epsilla is a high performance Vector Database Management System

Basic Info
  • Host: GitHub
  • Owner: epsilla-cloud
  • License: gpl-3.0
  • Language: C++
  • Default Branch: main
  • Homepage: https://www.epsilla.com
  • Size: 1.09 MB
Statistics
  • Stars: 860
  • Watchers: 6
  • Forks: 40
  • Open Issues: 13
  • Releases: 15
Topics
ai chatgpt data data-science database embeddings embeddings-similarity infrastructure llms machine-learning neural-network neural-search rag retrieval search-engine vector-database vector-search
Created over 2 years ago · Last pushed 6 months ago
Metadata Files
Readme License

README.md

**A 10x faster, cheaper, and better vector database**


Epsilla is an open-source vector database. Our focus is on ensuring scalability, high performance, and cost-effectiveness of vector search. EpsillaDB bridges the gap between information retrieval and memory retention in Large Language Models.

Quick Start using Docker

1. Run Backend in Docker

```shell
docker pull epsilla/vectordb
docker run --pull=always -d -p 8888:8888 -v /data:/data epsilla/vectordb
```

2. Interact with Python Client

```shell
pip install pyepsilla
```

```python
from pyepsilla import vectordb

# Connect to the backend started in step 1
client = vectordb.Client(host='localhost', port='8888')
client.load_db(db_name="MyDB", db_path="/data/epsilla")
client.use_db(db_name="MyDB")

# Create a table with an index on "Doc" so it can be searched with natural-language text
client.create_table(
    table_name="MyTable",
    table_fields=[
        {"name": "ID", "dataType": "INT", "primaryKey": True},
        {"name": "Doc", "dataType": "STRING"},
    ],
    indices=[
        {"name": "Index", "field": "Doc"},
    ]
)

client.insert(
    table_name="MyTable",
    records=[
        {"ID": 1, "Doc": "Jupiter is the largest planet in our solar system."},
        {"ID": 2, "Doc": "Cheetahs are the fastest land animals, reaching speeds over 60 mph."},
        {"ID": 3, "Doc": "Vincent van Gogh painted the famous work \"Starry Night.\""},
        {"ID": 4, "Doc": "The Amazon River is the longest river in the world."},
        {"ID": 5, "Doc": "The Moon completes one orbit around Earth every 27 days."},
    ],
)

client.query(
    table_name="MyTable",
    query_text="Celestial bodies and their characteristics",
    limit=2
)

# Result
# {
#     'message': 'Query search successfully.',
#     'result': [
#         {'Doc': 'Jupiter is the largest planet in our solar system.', 'ID': 1},
#         {'Doc': 'The Moon completes one orbit around Earth every 27 days.', 'ID': 5}
#     ],
#     'statusCode': 200
# }
```
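If you hold the response shown above as a Python dict, you can read the matched documents out of it in a few lines. The sketch below copies that dict verbatim and assumes the `statusCode` and `result` keys keep this shape. The client may hand the response back directly or inside a tuple, depending on the pyepsilla version. `matched_docs` is a hypothetical helper, not part of pyepsilla.

```python
# Response copied from the "Result" comment above; assumed shape, not a pyepsilla type.
response = {
    'message': 'Query search successfully.',
    'result': [
        {'Doc': 'Jupiter is the largest planet in our solar system.', 'ID': 1},
        {'Doc': 'The Moon completes one orbit around Earth every 27 days.', 'ID': 5},
    ],
    'statusCode': 200,
}


def matched_docs(resp):
    """Hypothetical helper: return (ID, Doc) pairs, raising if the query did not succeed."""
    if resp.get('statusCode') != 200:
        raise RuntimeError(f"query failed: {resp.get('message')}")
    return [(row['ID'], row['Doc']) for row in resp.get('result', [])]


for doc_id, doc in matched_docs(response):
    print(doc_id, doc)
```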

Features:

  • High performance and production-scale similarity search for embedding vectors.

  • Full-fledged database management system with familiar database, table, and field concepts; a vector is just another field type.

  • Metadata filtering.

  • Hybrid search with a fusion of dense and sparse vectors.

  • Built-in embedding support, with a natural-language-in, natural-language-out search experience.

  • Cloud-native architecture with compute-storage separation, serverless deployment, and multi-tenancy.

  • Rich ecosystem integrations including LangChain and LlamaIndex.

  • Python, JavaScript, and Ruby clients, plus a REST API.
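The feature list mentions hybrid search that fuses dense and sparse vectors but does not say how the two ranked result lists are combined. Reciprocal rank fusion (RRF) is one widely used method, and the sketch below shows it purely as an illustration. It is not necessarily what Epsilla does internally. The ID lists are made-up inputs, and `k=60` is the damping constant commonly used with RRF.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document IDs with reciprocal rank fusion.

    Each list is ordered best-first; a document at rank r (1-based) in a list
    contributes 1 / (k + r) to its fused score. Returns IDs, best first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


dense_hits = [1, 5, 3]   # hypothetical IDs from a dense-vector search
sparse_hits = [5, 2, 1]  # hypothetical IDs from a sparse (keyword) search
print(reciprocal_rank_fusion([dense_hits, sparse_hits]))  # [5, 1, 2, 3]
```

RRF only uses rank positions, so it needs no calibration between dense similarity scores and sparse keyword scores, which live on different scales.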

Epsilla's core is written in C++ and uses advanced parallel graph traversal techniques from academic research for vector indexing, achieving vector search 10 times faster than HNSW while maintaining precision above 99.9%.
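Readers unfamiliar with graph-based vector indexes may find the basic idea behind HNSW and many newer methods useful: a best-first (beam) search over a proximity graph, checked against exact brute-force search. The sketch below is a single-threaded, single-layer illustration built on our own assumptions (an exact k-NN graph, Euclidean distance, beam width `ef`). It is not Epsilla's parallel traversal algorithm. Reading "precision" in the sentence above as recall@k against exact results is also an assumption.

```python
import heapq
import math
import random


def build_knn_graph(points, k):
    """Exact k-NN graph by brute force, made undirected so search can move both ways."""
    graph = {i: set() for i in range(len(points))}
    for i, p in enumerate(points):
        others = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for _, j in others[:k]:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def greedy_search(points, graph, query, k, ef, entry=0):
    """Best-first beam search over a proximity graph; returns up to k IDs, nearest first."""
    d0 = math.dist(query, points[entry])
    visited = {entry}
    candidates = [(d0, entry)]  # min-heap: closest unexpanded node first
    results = [(-d0, entry)]    # max-heap via negation: farthest kept result on top
    while candidates:
        d_c, c = heapq.heappop(candidates)
        # Stop once the best remaining candidate cannot improve a full result list.
        if len(results) >= ef and d_c > -results[0][0]:
            break
        for nb in graph[c]:
            if nb in visited:
                continue
            visited.add(nb)
            d_nb = math.dist(query, points[nb])
            if len(results) < ef or d_nb < -results[0][0]:
                heapq.heappush(candidates, (d_nb, nb))
                heapq.heappush(results, (-d_nb, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # drop the farthest kept result
    ranked = sorted((-neg_d, i) for neg_d, i in results)
    return [i for _, i in ranked[:k]]


def brute_force(points, query, k):
    """Exact k nearest neighbors, used as ground truth."""
    return [i for _, i in sorted((math.dist(query, p), i) for i, p in enumerate(points))][:k]


def recall_at_k(found, truth):
    return len(set(found) & set(truth)) / len(truth)


if __name__ == "__main__":
    random.seed(0)
    pts = [(random.random(), random.random()) for _ in range(500)]
    g = build_knn_graph(pts, k=8)
    queries = [(random.random(), random.random()) for _ in range(50)]
    recalls = [recall_at_k(greedy_search(pts, g, q, k=10, ef=32), brute_force(pts, q, 10))
               for q in queries]
    print(f"mean recall@10: {sum(recalls) / len(recalls):.3f}")
```

Raising `ef` trades speed for recall. Production systems add hierarchy, better graph construction, and, in Epsilla's case per the text above, parallel traversal.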

Epsilla Cloud

Try our fully managed vector DBaaS at Epsilla Cloud.

(Experimental) Use Epsilla as a Python library without starting a Docker image

1. Build Epsilla Python Bindings lib package

```shell
cd engine/scripts
# If on Ubuntu, run this first: bash setup-dev.sh
bash install_oatpp_modules.sh
cd ..
bash build.sh
ls -lh build/*.so
```

2. Run the test with the Python bindings libraries "epsilla.so" and "libvectordb_dylib.so" in the "build" folder produced by the previous step

```shell
cd engine
export PYTHONPATH=./build/
export DB_PATH=/tmp/db33
python3 test/bindings/python/test.py
```
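Before running the test script, you can confirm that the build produced the two libraries named above and that Python can resolve the `epsilla` module from that folder. The sketch below is a hypothetical helper that uses only the standard library. The file names come from step 2; `missing_libs` and `can_import_epsilla` are illustrative names. `find_spec` only locates the module without loading it, so a broken build can still pass this check.

```python
import importlib.util
import os
import sys

# File names taken from step 2 above; adjust if your build emits different names.
EXPECTED_LIBS = ("epsilla.so", "libvectordb_dylib.so")


def missing_libs(build_dir="build"):
    """List the expected shared libraries that are not present in build_dir."""
    return [name for name in EXPECTED_LIBS
            if not os.path.isfile(os.path.join(build_dir, name))]


def can_import_epsilla(build_dir="build"):
    """Put build_dir first on sys.path (like PYTHONPATH=./build/) and check `epsilla` resolves."""
    if missing_libs(build_dir):
        return False
    path = os.path.abspath(build_dir)
    if path not in sys.path:
        sys.path.insert(0, path)
    return importlib.util.find_spec("epsilla") is not None


if __name__ == "__main__":
    print("missing:", missing_libs() or "none")
    print("importable:", can_import_epsilla())
```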

Here is some sample code:

```python
import epsilla

epsilla.load_db(db_name="db", db_path="/data/epsilla")
epsilla.use_db(db_name="db")
epsilla.create_table(
    table_name="MyTable",
    table_fields=[
        {"name": "ID", "dataType": "INT", "primaryKey": True},
        {"name": "Doc", "dataType": "STRING"},
        {"name": "EmbeddingEuclidean", "dataType": "VECTOR_FLOAT", "dimensions": 4, "metricType": "EUCLIDEAN"}
    ]
)
epsilla.insert(
    table_name="MyTable",
    records=[
        {"ID": 1, "Doc": "Berlin", "EmbeddingEuclidean": [0.05, 0.61, 0.76, 0.74]},
        {"ID": 2, "Doc": "London", "EmbeddingEuclidean": [0.19, 0.81, 0.75, 0.11]},
        {"ID": 3, "Doc": "Moscow", "EmbeddingEuclidean": [0.36, 0.55, 0.47, 0.94]}
    ]
)
(code, response) = epsilla.query(
    table_name="MyTable",
    query_field="EmbeddingEuclidean",
    response_fields=["ID", "Doc", "EmbeddingEuclidean"],
    query_vector=[0.35, 0.55, 0.47, 0.94],
    filter="ID < 6",
    limit=10,
    with_distance=True
)
print(code, response)
```
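You can check what the query in the sample should return with an exact brute-force search over the same three records in plain Python. This is a reference computation, not the engine's code path. `brute_force_query` is a hypothetical helper, and `filter="ID < 6"` is expressed as a Python predicate. Epsilla may also report a different distance convention (for example squared distance) when `with_distance=True`.

```python
import math

# Records and query vector copied from the sample above.
records = [
    {"ID": 1, "Doc": "Berlin", "EmbeddingEuclidean": [0.05, 0.61, 0.76, 0.74]},
    {"ID": 2, "Doc": "London", "EmbeddingEuclidean": [0.19, 0.81, 0.75, 0.11]},
    {"ID": 3, "Doc": "Moscow", "EmbeddingEuclidean": [0.36, 0.55, 0.47, 0.94]},
]
query_vector = [0.35, 0.55, 0.47, 0.94]


def brute_force_query(rows, field, query, limit, predicate=lambda r: True):
    """Hypothetical reference: filter rows, rank by Euclidean distance, keep `limit` rows."""
    hits = [(math.dist(r[field], query), r) for r in rows if predicate(r)]
    hits.sort(key=lambda h: h[0])
    return [(r["ID"], r["Doc"], round(d, 4)) for d, r in hits[:limit]]


# filter="ID < 6" from the sample, written as a Python predicate
for row in brute_force_query(records, "EmbeddingEuclidean", query_vector, limit=10,
                             predicate=lambda r: r["ID"] < 6):
    print(row)
```

Moscow comes out nearest, because its vector differs from the query only by 0.01 in the first component. Berlin follows, then London.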

Owner

  • Name: epsilla-cloud
  • Login: epsilla-cloud
  • Kind: organization

GitHub Events

Total
  • Create event: 5
  • Release event: 2
  • Issues event: 1
  • Watch event: 47
  • Issue comment event: 2
  • Push event: 16
  • Pull request review event: 2
  • Pull request event: 3
  • Fork event: 4
Last Year
  • Create event: 5
  • Release event: 2
  • Issues event: 1
  • Watch event: 47
  • Issue comment event: 2
  • Push event: 16
  • Pull request review event: 2
  • Pull request event: 3
  • Fork event: 4

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 308
  • Total Committers: 9
  • Avg Commits per committer: 34.222
  • Development Distribution Score (DDS): 0.558
Past Year
  • Commits: 9
  • Committers: 2
  • Avg Commits per committer: 4.5
  • Development Distribution Score (DDS): 0.333
Top Committers
Name Email Commits
richard-epsilla 1****a 136
Eric 1****a 135
rickiEpsilla r****i@e****m 21
TopKeyboard 1****d 9
Tony Yang t****a@g****m 3
rickiEpsilla 1****a 1
jonherke 3****e 1
Julius Lipp 4****p 1
Andriy Mulyar a****r@g****m 1
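You can reproduce the all-time figures in this section from the Top Committers list, assuming the Development Distribution Score is one minus the top committer's share of all commits. That definition reproduces the 0.558 shown here. The past-year DDS of 0.333 is consistent with the same formula if one committer made 6 of the 9 commits (1 - 6/9), but the per-person split for that period is not listed.

```python
# All-time commit counts per listed identity, from the Top Committers list above.
commits = [136, 135, 21, 9, 3, 1, 1, 1, 1]


def dds(commit_counts):
    """Development Distribution Score: 1 - (top committer's commits / total commits)."""
    return 1 - max(commit_counts) / sum(commit_counts)


total = sum(commits)
print(total, len(commits))             # 308 9
print(round(total / len(commits), 3))  # 34.222
print(round(dds(commits), 3))          # 0.558
```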

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 27
  • Total pull requests: 125
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 2 days
  • Total issue authors: 9
  • Total pull request authors: 8
  • Average comments per issue: 0.96
  • Average comments per pull request: 0.08
  • Merged pull requests: 119
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 2
  • Pull requests: 3
  • Average time to close issues: 4 months
  • Average time to close pull requests: 21 minutes
  • Issue authors: 1
  • Pull request authors: 2
  • Average comments per issue: 0.5
  • Average comments per pull request: 0.0
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • richard-epsilla (14)
  • jmikedupont2 (3)
  • negativenagesh (2)
  • pkpro (2)
  • KaineRecycler (2)
  • lonngxiang (1)
  • SimonBerens (1)
  • ThatOneShortGuy (1)
  • tonyyanga (1)
Pull Request Authors
  • richard-epsilla (79)
  • eric-epsilla (28)
  • ricki-epsilla (15)
  • TopKeyboard (12)
  • juliuslipp (2)
  • AndriyMulyar (2)
  • jonherke (1)
  • tonyyanga (1)
Top Labels
Issue Labels
feature (18) code merged (8) 0.3.2 (5) Added to roadmap (4) bug (3) 0.4 (3) working in progress (1) enhancement (1) good first issue (1) question (1) 0.3.4 (1)

Packages

  • Total packages: 3
  • Total downloads:
    • cargo 2,637 total
  • Total dependent packages: 0
    (may contain duplicates)
  • Total dependent repositories: 0
    (may contain duplicates)
  • Total versions: 17
  • Total maintainers: 1
proxy.golang.org: github.com/epsilla-cloud/vectordb
  • Versions: 15
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Stargazers count: 2.5%
Forks count: 4.2%
Average: 4.4%
Dependent packages count: 5.4%
Dependent repos count: 5.7%
Last synced: 6 months ago
crates.io: epsilla

Epsilla Rust SDK

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 1,300 Total
Rankings
Dependent repos count: 28.2%
Dependent packages count: 33.2%
Average: 52.6%
Downloads: 96.5%
Maintainers (1)
Last synced: 6 months ago
crates.io: epsilla-client

Epsilla Rust SDK

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 1,337 Total
Rankings
Dependent repos count: 28.2%
Dependent packages count: 33.2%
Average: 52.7%
Downloads: 96.6%
Maintainers (1)
Last synced: 6 months ago

Dependencies

.github/workflows/build-embedding.yml actions
  • actions/checkout v3 composite
  • docker/login-action f4ef78c080cd8ba55a85445d5b36e214a81df20a composite
.github/workflows/build.yml actions
  • actions/checkout v3 composite
  • docker/login-action f4ef78c080cd8ba55a85445d5b36e214a81df20a composite
engine/Dockerfile docker
  • ubuntu 22.04 build
.github/workflows/build-dev.yml actions
  • actions/checkout v3 composite
  • docker/login-action f4ef78c080cd8ba55a85445d5b36e214a81df20a composite
.github/workflows/build-base.yml actions
  • actions/checkout v3 composite
  • docker/login-action f4ef78c080cd8ba55a85445d5b36e214a81df20a composite