hakes

HAKES: Efficient Data Search with Embedding Vectors at Scale

https://github.com/nusdbsystem/hakes

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.8%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: nusdbsystem
  • License: apache-2.0
  • Language: C++
  • Default Branch: main
  • Size: 1.04 MB
Statistics
  • Stars: 38
  • Watchers: 8
  • Forks: 22
  • Open Issues: 1
  • Releases: 0
Created almost 2 years ago · Last pushed 10 months ago
Metadata Files
Readme, License, Citation, Notice

README.md

HAKES

HAKES is an embedding vector data search system. It features a modular, disaggregated architecture across three data management modules: data storage, vector search, and embedding model hosting. It aims for resource efficiency and fine-grained scaling in cloud and clustered deployments. HAKES also provides a proof-of-concept (PoC) implementation of a security-protection mode that leverages Intel Software Guard Extensions (SGX) to operate in untrusted environments.

VLDB 2025

To reproduce the experiments in our VLDB 2025 paper, please use the HAKES-Search repo, a cleaned codebase that we used for the paper submission. We will also release the instructions for experiment data preparation and the trained index parameters there.

Key modules

  • hakes-worker: exposes key-value and approximate k-nearest-neighbor (AKNN) search interfaces.
  • embed-worker: hosts embedding models. It supports the TFLM and TVM C runtimes for running model inference on CPU.
  • embed-endpoint: allows connections to external embedding services. We provide plugins for the OpenAI embedding service and HuggingFace inference endpoints.
  • fnpacker: middleware for when embed-worker instances are deployed as functions on a serverless platform (the current implementation demonstrates usage with Apache OpenWhisk). It can expose an HTTP endpoint backed by one or more function endpoints.
  • search-worker: serves a two-phase vector search: a fast filter phase over a quantized index, followed by an accurate refine phase over full vectors. It allows fine-tuned index parameters to be injected online, which enables adaptation to specific query workloads.
  • hakes-store: an efficient, fault-tolerant storage layer designed for a shared-storage architecture. It uses an LSM-tree to organize data and boosts resource efficiency for cloud deployments with cloud shared storage and serverless computing.
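
The filter-then-refine idea behind search-worker can be illustrated with a minimal C++ sketch. It is not taken from the HAKES codebase: the names Vec, Code, Quantize, CodeDist, and TwoPhaseSearch are hypothetical, the quantizer is a toy 8-bit scalar quantizer chosen for brevity, and both phases use brute-force scans rather than whatever index structure search-worker actually uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical types for illustration only.
using Vec = std::vector<float>;
using Code = std::vector<std::uint8_t>;

// Toy scalar quantizer: maps each component from [lo, hi] to an 8-bit code.
Code Quantize(const Vec& v, float lo, float hi) {
  assert(hi > lo);
  Code code(v.size());
  for (std::size_t i = 0; i < v.size(); ++i) {
    float clamped = std::min(std::max(v[i], lo), hi);
    code[i] = static_cast<std::uint8_t>((clamped - lo) / (hi - lo) * 255.0f + 0.5f);
  }
  return code;
}

// Approximate squared distance computed on the compact codes.
std::uint32_t CodeDist(const Code& a, const Code& b) {
  assert(a.size() == b.size());
  std::uint32_t d = 0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    int diff = static_cast<int>(a[i]) - static_cast<int>(b[i]);
    d += static_cast<std::uint32_t>(diff * diff);
  }
  return d;
}

// Exact squared L2 distance on full-precision vectors.
float L2Sqr(const Vec& a, const Vec& b) {
  assert(a.size() == b.size());
  float d = 0.0f;
  for (std::size_t i = 0; i < a.size(); ++i) {
    float diff = a[i] - b[i];
    d += diff * diff;
  }
  return d;
}

// Returns the ids of the k nearest vectors among the n_candidates
// that survive the filter phase.
std::vector<std::size_t> TwoPhaseSearch(const std::vector<Vec>& data,
                                        const std::vector<Code>& codes,
                                        const Vec& query, float lo, float hi,
                                        std::size_t n_candidates, std::size_t k) {
  assert(data.size() == codes.size());
  // Filter phase: rank every vector by its cheap, approximate code distance.
  Code qcode = Quantize(query, lo, hi);
  std::vector<std::pair<std::uint32_t, std::size_t> > approx;
  approx.reserve(codes.size());
  for (std::size_t i = 0; i < codes.size(); ++i) {
    approx.push_back(std::make_pair(CodeDist(qcode, codes[i]), i));
  }
  n_candidates = std::min(n_candidates, approx.size());
  std::partial_sort(approx.begin(),
                    approx.begin() + static_cast<std::ptrdiff_t>(n_candidates),
                    approx.end());
  // Refine phase: re-rank only the surviving candidates with exact distances.
  std::vector<std::pair<float, std::size_t> > exact;
  exact.reserve(n_candidates);
  for (std::size_t i = 0; i < n_candidates; ++i) {
    std::size_t id = approx[i].second;
    exact.push_back(std::make_pair(L2Sqr(query, data[id]), id));
  }
  std::sort(exact.begin(), exact.end());
  k = std::min(k, exact.size());
  std::vector<std::size_t> ids;
  for (std::size_t i = 0; i < k; ++i) ids.push_back(exact[i].second);
  return ids;
}
```

In this sketch, n_candidates controls the trade-off between the two phases: a larger value lets the refine phase correct more quantization errors, at the cost of more full-vector distance computations.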

For the Intel SGX security-protection mode:

  • It requires SGX-enabled Linux servers and an attestation service set up over those servers according to the Intel SGX Data Center Attestation Primitives documentation.
  • hakes-worker, embed-worker, and search-worker can be compiled with SGX support so that plaintext data is processed only inside the trusted execution environment (enclave) set up by SGX.
  • key-service: stores secret keys for data encryption and manages access control for the enclaves.
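
As a rough illustration of the key-service role, the following C++ sketch keeps secret keys alongside a per-key allow list of enclave identities. Everything in it is an assumption made for illustration: ToyKeyService, PutKey, Grant, and GetKey are hypothetical names rather than the HAKES key-service API, and the enclave identity is a plain string that a real deployment would presumably derive from a verified attestation report.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical in-memory key store; not the HAKES key-service API.
class ToyKeyService {
 public:
  // Stores a secret key under key_id, overwriting any existing key.
  void PutKey(const std::string& key_id, const std::string& secret) {
    keys_[key_id] = secret;
  }

  // Allows the enclave with the given identity to fetch key_id.
  void Grant(const std::string& key_id, const std::string& enclave_id) {
    acl_[key_id].insert(enclave_id);
  }

  // Writes the key to *out and returns true only if enclave_id was granted
  // access to key_id; otherwise leaves *out untouched and returns false.
  bool GetKey(const std::string& key_id, const std::string& enclave_id,
              std::string* out) const {
    assert(out != NULL);
    std::map<std::string, std::set<std::string> >::const_iterator acl_it =
        acl_.find(key_id);
    if (acl_it == acl_.end() || acl_it->second.count(enclave_id) == 0) {
      return false;
    }
    std::map<std::string, std::string>::const_iterator key_it = keys_.find(key_id);
    if (key_it == keys_.end()) return false;
    *out = key_it->second;
    return true;
  }

 private:
  std::map<std::string, std::string> keys_;            // key_id -> secret
  std::map<std::string, std::set<std::string> > acl_;  // key_id -> allowed enclaves
};
```

The sketch denies by default: an enclave receives a key only if an explicit grant exists for that key and that identity.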

Deployment

All HAKES components are containerised; instructions for building the images are under docker.

Ongoing development

  • A CLI tool to facilitate management of HAKES deployments for multiple datasets
  • Additional documentation and guides
  • Examples

Reference

Please cite our publication when you use HAKES in your research or development.

  • Guoyu Hu, Shaofeng Cai, Tien Tuan Anh Dinh, Zhongle Xie, Cong Yue, Gang Chen, and Beng Chin Ooi. HAKES: Scalable Vector Database for Embedding Search Service. PVLDB, 18(9): 3049 - 3062, 2025. doi:10.14778/3746405.3746427

Contact

Feel free to email me with any questions:

Guoyu Hu (guoyu.hu@u.nus.edu or hugy718@gmail.com)

Owner

  • Name: nusdbsystem
  • Login: nusdbsystem
  • Kind: organization

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: ""
  given-names: "HAKES Authors"
title: "HAKES: Efficient Data Search with Embedding Vectors"
version: 0.0.1
date-released: 2024-08-22
url: "https://github.com/nusdbsystem/HAKES"

GitHub Events

Total
  • Watch event: 31
  • Delete event: 1
  • Issue comment event: 5
  • Push event: 19
  • Pull request review comment event: 1
  • Pull request review event: 4
  • Pull request event: 33
  • Fork event: 19
  • Create event: 2
Last Year
  • Watch event: 31
  • Delete event: 1
  • Issue comment event: 5
  • Push event: 19
  • Pull request review comment event: 1
  • Pull request review event: 4
  • Pull request event: 33
  • Fork event: 19
  • Create event: 2

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 1
  • Total pull requests: 14
  • Average time to close issues: less than a minute
  • Average time to close pull requests: 1 day
  • Total issue authors: 1
  • Total pull request authors: 5
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.21
  • Merged pull requests: 11
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 14
  • Average time to close issues: less than a minute
  • Average time to close pull requests: 1 day
  • Issue authors: 1
  • Pull request authors: 5
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.21
  • Merged pull requests: 11
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
  • hugy718 (6)
  • tinyAdapter (5)
  • yc1111 (3)
  • allvphx (1)
  • solopku (1)

Dependencies

docker/build-base/no-sgx/Dockerfile docker
  • ubuntu 20.04 build
docker/build-base/sgx/Dockerfile docker
  • sgx_dcap_ssl_2.14_1.11 v1 build
docker/embed-worker/no-sgx/Dockerfile docker
  • hakes_es_base_nosgx v1 build
  • ubuntu 20.04 build
docker/embed-worker/sgx/Dockerfile docker
  • hakes_es_base_sgx v1 build
  • sgx_dcap_ssl_2.14_1.11 v1 build
docker/fnpacker/Dockerfile docker
  • golang 1.21-bullseye build
docker/hakes-store/hakes-store/Dockerfile docker
  • golang 1.21-bullseye build
docker/hakes-store/store-daemon/Dockerfile docker
  • golang 1.21-bullseye build
docker/hakes-worker/no-sgx/Dockerfile docker
  • hakes_es_base_nosgx v1 build
  • ubuntu 20.04 build
docker/hakes-worker/sgx/Dockerfile docker
  • hakes_es_base_sgx v1 build
  • sgx_dcap_ssl_2.14_1.11 v1 build
docker/key-service/Dockerfile docker
  • hakes_es_base_sgx v1 build
  • sgx_dcap_ssl_2.14_1.11 v1 build
docker/key-service-client/Dockerfile docker
  • hakes_es_base_sgx v1 build
  • sgx_dcap_ssl_2.14_1.11 v1 build
docker/search-worker/no-sgx/Dockerfile docker
  • hakes_es_base_nosgx v1 build
  • ubuntu 20.04 build
docker/search-worker/sgx/Dockerfile docker
  • hakes_es_base_sgx v1 build
  • sgx_dcap_ssl_2.14_1.11 v1 build
docker/sgx-dcap/Dockerfile docker
  • ubuntu 20.04 build
hakes-store/lambda/aws/Dockerfile docker
  • public.ecr.aws/lambda/go 1 build
fnpacker/go.mod go
  • github.com/apache/openwhisk-client-go v0.0.0-20220811044404-a6921af2f086
  • github.com/cloudfoundry/jibber_jabber v0.0.0-20151120183258-bcc4c8345a21
  • github.com/fatih/color v1.10.0
  • github.com/google/go-querystring v1.0.0
  • github.com/hokaccha/go-prettyjson v0.0.0-20210113012101-fb4e108d2519
  • github.com/mattn/go-colorable v0.1.8
  • github.com/mattn/go-isatty v0.0.12
  • github.com/nicksnyder/go-i18n v1.10.1
  • github.com/pelletier/go-toml v1.2.0
  • golang.org/x/sys v0.0.0-20210112080510-489259a85091
  • gopkg.in/yaml.v2 v2.3.0
fnpacker/go.sum go
  • github.com/BurntSushi/toml v0.3.1
  • github.com/apache/openwhisk-client-go v0.0.0-20220811044404-a6921af2f086
  • github.com/cloudfoundry/jibber_jabber v0.0.0-20151120183258-bcc4c8345a21
  • github.com/davecgh/go-spew v1.1.0
  • github.com/davecgh/go-spew v1.1.1
  • github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc
  • github.com/fatih/color v1.10.0
  • github.com/fsnotify/fsnotify v1.4.7
  • github.com/fsnotify/fsnotify v1.4.9
  • github.com/golang/protobuf v1.2.0
  • github.com/golang/protobuf v1.4.0-rc.1
  • github.com/golang/protobuf v1.4.0-rc.1.0.20200221234624-67d41d38c208
  • github.com/golang/protobuf v1.4.0-rc.2
  • github.com/golang/protobuf v1.4.0-rc.4.0.20200313231945-b860323f09d0
  • github.com/golang/protobuf v1.4.0
  • github.com/golang/protobuf v1.4.2
  • github.com/google/go-cmp v0.3.0
  • github.com/google/go-cmp v0.3.1
  • github.com/google/go-cmp v0.4.0
  • github.com/google/go-querystring v1.0.0
  • github.com/hokaccha/go-prettyjson v0.0.0-20210113012101-fb4e108d2519
  • github.com/hpcloud/tail v1.0.0
  • github.com/mattn/go-colorable v0.1.8
  • github.com/mattn/go-isatty v0.0.12
  • github.com/nicksnyder/go-i18n v1.10.1
  • github.com/nxadm/tail v1.4.4
  • github.com/onsi/ginkgo v1.6.0
  • github.com/onsi/ginkgo v1.12.1
  • github.com/onsi/ginkgo v1.15.0
  • github.com/onsi/gomega v1.7.1
  • github.com/onsi/gomega v1.10.1
  • github.com/onsi/gomega v1.10.5
  • github.com/pelletier/go-toml v1.2.0
  • github.com/pmezard/go-difflib v1.0.0
  • github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2
  • github.com/stretchr/objx v0.1.0
  • github.com/stretchr/objx v0.3.0
  • github.com/stretchr/testify v1.3.0
  • github.com/stretchr/testify v1.6.1
  • github.com/yuin/goldmark v1.2.1
  • golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2
  • golang.org/x/crypto v0.0.0-20191011191535-87dc89f01550
  • golang.org/x/crypto v0.0.0-20200622213623-75b288015ac9
  • golang.org/x/mod v0.3.0
  • golang.org/x/net v0.0.0-20180906233101-161cd47e91fd
  • golang.org/x/net v0.0.0-20190404232315-eb5bcb51f2a3
  • golang.org/x/net v0.0.0-20190620200207-3b0461eec859
  • golang.org/x/net v0.0.0-20200520004742-59133d7f0dd7
  • golang.org/x/net v0.0.0-20201021035429-f5854403a974
  • golang.org/x/net v0.0.0-20201202161906-c7110b5ffcbb
  • golang.org/x/sync v0.0.0-20180314180146-1d60e4601c6f
  • golang.org/x/sync v0.0.0-20190423024810-112230192c58
  • golang.org/x/sync v0.0.0-20201020160332-67f06af15bc9
  • golang.org/x/sys v0.0.0-20180909124046-d0be0721c37e
  • golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a
  • golang.org/x/sys v0.0.0-20190412213103-97732733099d
  • golang.org/x/sys v0.0.0-20190904154756-749cb33beabd
  • golang.org/x/sys v0.0.0-20191005200804-aed5e4c7ecf9
  • golang.org/x/sys v0.0.0-20191120155948-bd437916bb0e
  • golang.org/x/sys v0.0.0-20200116001909-b77594299b42
  • golang.org/x/sys v0.0.0-20200223170610-d5e6a3e2c0ae
  • golang.org/x/sys v0.0.0-20200323222414-85ca7c5b95cd
  • golang.org/x/sys v0.0.0-20200930185726-fdedc70b468f
  • golang.org/x/sys v0.0.0-20210112080510-489259a85091
  • golang.org/x/text v0.3.0
  • golang.org/x/text v0.3.3
  • golang.org/x/tools v0.0.0-20180917221912-90fa682c2a6e
  • golang.org/x/tools v0.0.0-20191119224855-298f0cb1881e
  • golang.org/x/tools v0.0.0-20201224043029-2b0845dc783e
  • golang.org/x/xerrors v0.0.0-20190717185122-a985d3407aa7
  • golang.org/x/xerrors v0.0.0-20191011141410-1b5146add898
  • golang.org/x/xerrors v0.0.0-20191204190536-9bdfabe68543
  • golang.org/x/xerrors v0.0.0-20200804184101-5ec99f83aff1
  • google.golang.org/protobuf v0.0.0-20200109180630-ec00e32a8dfd
  • google.golang.org/protobuf v0.0.0-20200221191635-4d8936d0db64
  • google.golang.org/protobuf v0.0.0-20200228230310-ab0ca4ff8a60
  • google.golang.org/protobuf v1.20.1-0.20200309200217-e05f789c0967
  • google.golang.org/protobuf v1.21.0
  • google.golang.org/protobuf v1.23.0
  • gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405
  • gopkg.in/fsnotify.v1 v1.4.7
  • gopkg.in/tomb.v1 v1.0.0-20141024135613-dd632973f1e7
  • gopkg.in/yaml.v2 v2.2.1
  • gopkg.in/yaml.v2 v2.2.4
  • gopkg.in/yaml.v2 v2.3.0
  • gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c
  • gopkg.in/yaml.v3 v3.0.0-20210107192922-496545a6307b
hakes-store/go.mod go
  • github.com/aws/aws-lambda-go v1.41.0
  • github.com/aws/aws-sdk-go-v2 v1.18.1
  • github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.4.10
  • github.com/aws/aws-sdk-go-v2/config v1.17.8
  • github.com/aws/aws-sdk-go-v2/credentials v1.12.21
  • github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.12.17
  • github.com/aws/aws-sdk-go-v2/internal/configsources v1.1.34
  • github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.4.28
  • github.com/aws/aws-sdk-go-v2/internal/ini v1.3.24
  • github.com/aws/aws-sdk-go-v2/internal/v4a v1.0.25
  • github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.9.11
  • github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.1.28
  • github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.9.27
  • github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.14.2
  • github.com/aws/aws-sdk-go-v2/service/kinesis v1.17.12
  • github.com/aws/aws-sdk-go-v2/service/lambda v1.37.0
  • github.com/aws/aws-sdk-go-v2/service/s3 v1.33.1
  • github.com/aws/aws-sdk-go-v2/service/sso v1.11.23
  • github.com/aws/aws-sdk-go-v2/service/ssooidc v1.13.6
  • github.com/aws/aws-sdk-go-v2/service/sts v1.16.19
  • github.com/aws/smithy-go v1.13.5
  • github.com/bytedance/sonic v1.11.6
  • github.com/bytedance/sonic/loader v0.1.1
  • github.com/cespare/xxhash v1.1.0
  • github.com/cespare/xxhash/v2 v2.2.0
  • github.com/cloudwego/base64x v0.1.4
  • github.com/cloudwego/iasm v0.2.0
  • github.com/dgraph-io/badger/v3 v3.2103.5
  • github.com/dgraph-io/ristretto v0.1.1
  • github.com/dustin/go-humanize v1.0.0
  • github.com/gabriel-vasile/mimetype v1.4.3
  • github.com/gin-contrib/sse v0.1.0
  • github.com/gin-gonic/gin v1.10.0
  • github.com/go-playground/locales v0.14.1
  • github.com/go-playground/universal-translator v0.18.1
  • github.com/go-playground/validator/v10 v10.20.0
  • github.com/go-zookeeper/zk v1.0.3
  • github.com/goccy/go-json v0.10.2
  • github.com/gogo/protobuf v1.3.2
  • github.com/golang/glog v1.0.0
  • github.com/golang/groupcache v0.0.0-20210331224755-41bb18bfe9da
  • github.com/golang/protobuf v1.5.2
  • github.com/golang/snappy v0.0.3
  • github.com/google/flatbuffers v1.12.1
  • github.com/jmespath/go-jmespath v0.4.0
  • github.com/json-iterator/go v1.1.12
  • github.com/klauspost/compress v1.13.6
  • github.com/klauspost/cpuid/v2 v2.2.7
  • github.com/kr/text v0.2.0
  • github.com/leodido/go-urn v1.4.0
  • github.com/mackerelio/go-osstat v0.2.4
  • github.com/mattn/go-isatty v0.0.20
  • github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd
  • github.com/modern-go/reflect2 v1.0.2
  • github.com/niemeyer/pretty v0.0.0-20200227124842-a10e7caefd8e
  • github.com/pelletier/go-toml/v2 v2.2.2
  • github.com/pkg/errors v0.9.1
  • github.com/twitchyliquid64/golang-asm v0.15.1
  • github.com/ugorji/go/codec v1.2.12
  • go.opencensus.io v0.23.0
  • golang.org/x/arch v0.8.0
  • golang.org/x/crypto v0.23.0
  • golang.org/x/net v0.25.0
  • golang.org/x/sys v0.20.0
  • golang.org/x/text v0.15.0
  • google.golang.org/genproto v0.0.0-20221118155620-16455021b5e6
  • google.golang.org/grpc v1.52.0
  • google.golang.org/protobuf v1.34.1
  • gopkg.in/check.v1 v1.0.0-20200227125254-8fa46927fb4f
  • gopkg.in/yaml.v2 v2.4.0
  • gopkg.in/yaml.v3 v3.0.1
hakes-store/go.sum go
  • 256 dependencies