hakes
HAKES: Efficient Data Search with Embedding Vectors at Scale
Science Score: 57.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 1 DOI reference(s) in README
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (15.8%) to scientific vocabulary
Repository
HAKES: Efficient Data Search with Embedding Vectors at Scale
Basic Info
- Host: GitHub
- Owner: nusdbsystem
- License: apache-2.0
- Language: C++
- Default Branch: main
- Size: 1.04 MB
Statistics
- Stars: 38
- Watchers: 8
- Forks: 22
- Open Issues: 1
- Releases: 0
Metadata Files
README.md
HAKES
HAKES is an embedding vector data search system. It features a modular, disaggregated architecture across three data management modules: data storage, vector search, and embedding model hosting. It aims for resource efficiency and fine-grained scaling in cloud and clustered deployments. HAKES also provides a proof-of-concept (PoC) implementation of a security-protection mode that leverages Intel Software Guard Extensions (SGX) to operate in untrusted environments.
VLDB 2025
To reproduce the experiments in our VLDB 2025 paper, please use the HAKES-Search Repo, the cleaned codebase we used for the paper submission. We will also release the instructions for experiment data preparation and the trained index parameters there.
Key modules
- hakes-worker: exposes key-value and AKNN search interfaces.
- embed-worker: hosts embedding models. It supports the tflm and tvm c runtimes to run model inference on CPU.
- embed-endpoint: allows connection to external embedding services. We provide plugins for the OpenAI embedding service and HuggingFace inference endpoints.
- fnpacker: middleware for when embed-worker instances are deployed as functions on a serverless platform (the current implementation demonstrates usage with Apache OpenWhisk). It can expose an HTTP endpoint with one or more function endpoint backends.
- search-worker: serves a two-phase vector search: a fast filter phase with a quantized index, followed by an accurate refine phase with full vectors. It allows injecting fine-tuned index parameters online, which enables adaptation to specific query workloads.
- hakes-store: an efficient, fault-tolerant storage layer designed for a shared storage architecture. It uses an LSM-tree to organize data and boosts resource efficiency for cloud deployment with cloud shared storage and serverless computing.
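The two-phase search served by search-worker can be illustrated with a minimal sketch. This is not the HAKES implementation: the 8-bit scalar quantizer, the brute-force scan, and every name below (ScalarQuantizer, CodeDistance, ExactDistance, TwoPhaseSearch, num_candidates) are assumptions chosen for illustration. The filter phase ranks all vectors by a cheap distance over compact codes, and the refine phase re-ranks only the surviving candidates with the full vectors.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Vec = std::vector<float>;
using Code = std::vector<std::uint8_t>;

// Hypothetical stand-in for a quantized index: maps each value in
// [lo, hi] to one byte. A real index would use a stronger quantizer.
struct ScalarQuantizer {
  float lo;
  float hi;

  Code Encode(const Vec& v) const {
    assert(hi > lo);
    const float scale = 255.0f / (hi - lo);
    Code code(v.size());
    for (std::size_t i = 0; i < v.size(); ++i) {
      const float clamped = std::min(std::max(v[i], lo), hi);
      code[i] = static_cast<std::uint8_t>((clamped - lo) * scale + 0.5f);
    }
    return code;
  }
};

// Approximate squared L2 distance computed on the one-byte codes.
inline std::uint32_t CodeDistance(const Code& a, const Code& b) {
  std::uint32_t sum = 0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    const int diff = static_cast<int>(a[i]) - static_cast<int>(b[i]);
    sum += static_cast<std::uint32_t>(diff * diff);
  }
  return sum;
}

// Exact squared L2 distance computed on the full-precision vectors.
inline float ExactDistance(const Vec& a, const Vec& b) {
  float sum = 0.0f;
  for (std::size_t i = 0; i < a.size(); ++i) {
    const float diff = a[i] - b[i];
    sum += diff * diff;
  }
  return sum;
}

// Returns the ids of the k nearest vectors, closest first.
std::vector<std::size_t> TwoPhaseSearch(const std::vector<Vec>& full,
                                        const std::vector<Code>& codes,
                                        const ScalarQuantizer& sq,
                                        const Vec& query,
                                        std::size_t num_candidates,
                                        std::size_t k) {
  assert(full.size() == codes.size());

  // Filter phase: rank every vector by its approximate code distance
  // and keep only the best num_candidates.
  const Code qcode = sq.Encode(query);
  std::vector<std::pair<std::uint32_t, std::size_t>> approx;
  approx.reserve(codes.size());
  for (std::size_t i = 0; i < codes.size(); ++i) {
    approx.emplace_back(CodeDistance(qcode, codes[i]), i);
  }
  num_candidates = std::min(num_candidates, approx.size());
  std::partial_sort(approx.begin(), approx.begin() + num_candidates,
                    approx.end());

  // Refine phase: re-rank the candidates with exact distances.
  std::vector<std::pair<float, std::size_t>> exact;
  exact.reserve(num_candidates);
  for (std::size_t i = 0; i < num_candidates; ++i) {
    const std::size_t id = approx[i].second;
    exact.emplace_back(ExactDistance(query, full[id]), id);
  }
  k = std::min(k, exact.size());
  std::partial_sort(exact.begin(), exact.begin() + k, exact.end());

  std::vector<std::size_t> ids(k);
  for (std::size_t i = 0; i < k; ++i) ids[i] = exact[i].second;
  return ids;
}
```

In this sketch, num_candidates trades accuracy for cost: a larger value lets the refine phase recover from more quantization error, but requires more full-vector distance computations.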
For Intel SGX security-protection mode:
- Requires SGX-enabled Linux servers and an attestation service set up over the servers according to the documentation on Intel SGX Data Center Attestation Primitives.
- hakes-worker, embed-worker, search-worker: can be compiled with SGX support so that plaintext data is processed only inside the trusted execution environment (enclave) set up by SGX.
- key-service: stores secret keys for data encryption and manages access control for the enclaves.
Deployment
All components of HAKES are containerised and instructions to build the images can be found under docker.
Ongoing development
- A CLI tool to facilitate management of HAKES deployments for multiple datasets
- Additional documentation and guides
- Examples
Reference
Please cite our publication when you use HAKES in your research or development.
- Guoyu Hu, Shaofeng Cai, Tien Tuan Anh Dinh, Zhongle Xie, Cong Yue, Gang Chen, and Beng Chin Ooi. HAKES: Scalable Vector Database for Embedding Search Service. PVLDB, 18(9): 3049 - 3062, 2025. doi:10.14778/3746405.3746427
Contact
Feel free to email me with any questions:
Guoyu Hu (guoyu.hu@u.nus.edu or hugy718@gmail.com)
Owner
- Name: nusdbsystem
- Login: nusdbsystem
- Kind: organization
- Repositories: 14
- Profile: https://github.com/nusdbsystem
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: ""
    given-names: "HAKES Authors"
title: "HAKES: Efficient Data Search with Embedding Vectors"
version: 0.0.1
date-released: 2024-08-22
url: "https://github.com/nusdbsystem/HAKES"
GitHub Events
Total
- Watch event: 31
- Delete event: 1
- Issue comment event: 5
- Push event: 19
- Pull request review comment event: 1
- Pull request review event: 4
- Pull request event: 33
- Fork event: 19
- Create event: 2
Last Year
- Watch event: 31
- Delete event: 1
- Issue comment event: 5
- Push event: 19
- Pull request review comment event: 1
- Pull request review event: 4
- Pull request event: 33
- Fork event: 19
- Create event: 2
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 1
- Total pull requests: 14
- Average time to close issues: less than a minute
- Average time to close pull requests: 1 day
- Total issue authors: 1
- Total pull request authors: 5
- Average comments per issue: 0.0
- Average comments per pull request: 0.21
- Merged pull requests: 11
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 1
- Pull requests: 14
- Average time to close issues: less than a minute
- Average time to close pull requests: 1 day
- Issue authors: 1
- Pull request authors: 5
- Average comments per issue: 0.0
- Average comments per pull request: 0.21
- Merged pull requests: 11
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
- hugy718 (6)
- tinyAdapter (5)
- yc1111 (3)
- allvphx (1)
- solopku (1)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- ubuntu 20.04 build
- sgx_dcap_ssl_2.14_1.11 v1 build
- hakes_es_base_nosgx v1 build
- ubuntu 20.04 build
- hakes_es_base_sgx v1 build
- sgx_dcap_ssl_2.14_1.11 v1 build
- golang 1.21-bullseye build
- golang 1.21-bullseye build
- golang 1.21-bullseye build
- hakes_es_base_nosgx v1 build
- ubuntu 20.04 build
- hakes_es_base_sgx v1 build
- sgx_dcap_ssl_2.14_1.11 v1 build
- hakes_es_base_sgx v1 build
- sgx_dcap_ssl_2.14_1.11 v1 build
- hakes_es_base_sgx v1 build
- sgx_dcap_ssl_2.14_1.11 v1 build
- hakes_es_base_nosgx v1 build
- ubuntu 20.04 build
- hakes_es_base_sgx v1 build
- sgx_dcap_ssl_2.14_1.11 v1 build
- ubuntu 20.04 build
- public.ecr.aws/lambda/go 1 build
- github.com/apache/openwhisk-client-go v0.0.0-20220811044404-a6921af2f086
- github.com/cloudfoundry/jibber_jabber v0.0.0-20151120183258-bcc4c8345a21
- github.com/fatih/color v1.10.0
- github.com/google/go-querystring v1.0.0
- github.com/hokaccha/go-prettyjson v0.0.0-20210113012101-fb4e108d2519
- github.com/mattn/go-colorable v0.1.8
- github.com/mattn/go-isatty v0.0.12
- github.com/nicksnyder/go-i18n v1.10.1
- github.com/pelletier/go-toml v1.2.0
- golang.org/x/sys v0.0.0-20210112080510-489259a85091
- gopkg.in/yaml.v2 v2.3.0
- github.com/BurntSushi/toml v0.3.1
- github.com/apache/openwhisk-client-go v0.0.0-20220811044404-a6921af2f086
- github.com/cloudfoundry/jibber_jabber v0.0.0-20151120183258-bcc4c8345a21
- github.com/davecgh/go-spew v1.1.0
- github.com/davecgh/go-spew v1.1.1
- github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc
- github.com/fatih/color v1.10.0
- github.com/fsnotify/fsnotify v1.4.7
- github.com/fsnotify/fsnotify v1.4.9
- github.com/golang/protobuf v1.2.0
- github.com/golang/protobuf v1.4.0-rc.1
- github.com/golang/protobuf v1.4.0-rc.1.0.20200221234624-67d41d38c208
- github.com/golang/protobuf v1.4.0-rc.2
- github.com/golang/protobuf v1.4.0-rc.4.0.20200313231945-b860323f09d0
- github.com/golang/protobuf v1.4.0
- github.com/golang/protobuf v1.4.2
- github.com/google/go-cmp v0.3.0
- github.com/google/go-cmp v0.3.1
- github.com/google/go-cmp v0.4.0
- github.com/google/go-querystring v1.0.0
- github.com/hokaccha/go-prettyjson v0.0.0-20210113012101-fb4e108d2519
- github.com/hpcloud/tail v1.0.0
- github.com/mattn/go-colorable v0.1.8
- github.com/mattn/go-isatty v0.0.12
- github.com/nicksnyder/go-i18n v1.10.1
- github.com/nxadm/tail v1.4.4
- github.com/onsi/ginkgo v1.6.0
- github.com/onsi/ginkgo v1.12.1
- github.com/onsi/ginkgo v1.15.0
- github.com/onsi/gomega v1.7.1
- github.com/onsi/gomega v1.10.1
- github.com/onsi/gomega v1.10.5
- github.com/pelletier/go-toml v1.2.0
- github.com/pmezard/go-difflib v1.0.0
- github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2
- github.com/stretchr/objx v0.1.0
- github.com/stretchr/objx v0.3.0
- github.com/stretchr/testify v1.3.0
- github.com/stretchr/testify v1.6.1
- github.com/yuin/goldmark v1.2.1
- golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2
- golang.org/x/crypto v0.0.0-20191011191535-87dc89f01550
- golang.org/x/crypto v0.0.0-20200622213623-75b288015ac9
- golang.org/x/mod v0.3.0
- golang.org/x/net v0.0.0-20180906233101-161cd47e91fd
- golang.org/x/net v0.0.0-20190404232315-eb5bcb51f2a3
- golang.org/x/net v0.0.0-20190620200207-3b0461eec859
- golang.org/x/net v0.0.0-20200520004742-59133d7f0dd7
- golang.org/x/net v0.0.0-20201021035429-f5854403a974
- golang.org/x/net v0.0.0-20201202161906-c7110b5ffcbb
- golang.org/x/sync v0.0.0-20180314180146-1d60e4601c6f
- golang.org/x/sync v0.0.0-20190423024810-112230192c58
- golang.org/x/sync v0.0.0-20201020160332-67f06af15bc9
- golang.org/x/sys v0.0.0-20180909124046-d0be0721c37e
- golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a
- golang.org/x/sys v0.0.0-20190412213103-97732733099d
- golang.org/x/sys v0.0.0-20190904154756-749cb33beabd
- golang.org/x/sys v0.0.0-20191005200804-aed5e4c7ecf9
- golang.org/x/sys v0.0.0-20191120155948-bd437916bb0e
- golang.org/x/sys v0.0.0-20200116001909-b77594299b42
- golang.org/x/sys v0.0.0-20200223170610-d5e6a3e2c0ae
- golang.org/x/sys v0.0.0-20200323222414-85ca7c5b95cd
- golang.org/x/sys v0.0.0-20200930185726-fdedc70b468f
- golang.org/x/sys v0.0.0-20210112080510-489259a85091
- golang.org/x/text v0.3.0
- golang.org/x/text v0.3.3
- golang.org/x/tools v0.0.0-20180917221912-90fa682c2a6e
- golang.org/x/tools v0.0.0-20191119224855-298f0cb1881e
- golang.org/x/tools v0.0.0-20201224043029-2b0845dc783e
- golang.org/x/xerrors v0.0.0-20190717185122-a985d3407aa7
- golang.org/x/xerrors v0.0.0-20191011141410-1b5146add898
- golang.org/x/xerrors v0.0.0-20191204190536-9bdfabe68543
- golang.org/x/xerrors v0.0.0-20200804184101-5ec99f83aff1
- google.golang.org/protobuf v0.0.0-20200109180630-ec00e32a8dfd
- google.golang.org/protobuf v0.0.0-20200221191635-4d8936d0db64
- google.golang.org/protobuf v0.0.0-20200228230310-ab0ca4ff8a60
- google.golang.org/protobuf v1.20.1-0.20200309200217-e05f789c0967
- google.golang.org/protobuf v1.21.0
- google.golang.org/protobuf v1.23.0
- gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405
- gopkg.in/fsnotify.v1 v1.4.7
- gopkg.in/tomb.v1 v1.0.0-20141024135613-dd632973f1e7
- gopkg.in/yaml.v2 v2.2.1
- gopkg.in/yaml.v2 v2.2.4
- gopkg.in/yaml.v2 v2.3.0
- gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c
- gopkg.in/yaml.v3 v3.0.0-20210107192922-496545a6307b
- github.com/aws/aws-lambda-go v1.41.0
- github.com/aws/aws-sdk-go-v2 v1.18.1
- github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.4.10
- github.com/aws/aws-sdk-go-v2/config v1.17.8
- github.com/aws/aws-sdk-go-v2/credentials v1.12.21
- github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.12.17
- github.com/aws/aws-sdk-go-v2/internal/configsources v1.1.34
- github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.4.28
- github.com/aws/aws-sdk-go-v2/internal/ini v1.3.24
- github.com/aws/aws-sdk-go-v2/internal/v4a v1.0.25
- github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.9.11
- github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.1.28
- github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.9.27
- github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.14.2
- github.com/aws/aws-sdk-go-v2/service/kinesis v1.17.12
- github.com/aws/aws-sdk-go-v2/service/lambda v1.37.0
- github.com/aws/aws-sdk-go-v2/service/s3 v1.33.1
- github.com/aws/aws-sdk-go-v2/service/sso v1.11.23
- github.com/aws/aws-sdk-go-v2/service/ssooidc v1.13.6
- github.com/aws/aws-sdk-go-v2/service/sts v1.16.19
- github.com/aws/smithy-go v1.13.5
- github.com/bytedance/sonic v1.11.6
- github.com/bytedance/sonic/loader v0.1.1
- github.com/cespare/xxhash v1.1.0
- github.com/cespare/xxhash/v2 v2.2.0
- github.com/cloudwego/base64x v0.1.4
- github.com/cloudwego/iasm v0.2.0
- github.com/dgraph-io/badger/v3 v3.2103.5
- github.com/dgraph-io/ristretto v0.1.1
- github.com/dustin/go-humanize v1.0.0
- github.com/gabriel-vasile/mimetype v1.4.3
- github.com/gin-contrib/sse v0.1.0
- github.com/gin-gonic/gin v1.10.0
- github.com/go-playground/locales v0.14.1
- github.com/go-playground/universal-translator v0.18.1
- github.com/go-playground/validator/v10 v10.20.0
- github.com/go-zookeeper/zk v1.0.3
- github.com/goccy/go-json v0.10.2
- github.com/gogo/protobuf v1.3.2
- github.com/golang/glog v1.0.0
- github.com/golang/groupcache v0.0.0-20210331224755-41bb18bfe9da
- github.com/golang/protobuf v1.5.2
- github.com/golang/snappy v0.0.3
- github.com/google/flatbuffers v1.12.1
- github.com/jmespath/go-jmespath v0.4.0
- github.com/json-iterator/go v1.1.12
- github.com/klauspost/compress v1.13.6
- github.com/klauspost/cpuid/v2 v2.2.7
- github.com/kr/text v0.2.0
- github.com/leodido/go-urn v1.4.0
- github.com/mackerelio/go-osstat v0.2.4
- github.com/mattn/go-isatty v0.0.20
- github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd
- github.com/modern-go/reflect2 v1.0.2
- github.com/niemeyer/pretty v0.0.0-20200227124842-a10e7caefd8e
- github.com/pelletier/go-toml/v2 v2.2.2
- github.com/pkg/errors v0.9.1
- github.com/twitchyliquid64/golang-asm v0.15.1
- github.com/ugorji/go/codec v1.2.12
- go.opencensus.io v0.23.0
- golang.org/x/arch v0.8.0
- golang.org/x/crypto v0.23.0
- golang.org/x/net v0.25.0
- golang.org/x/sys v0.20.0
- golang.org/x/text v0.15.0
- google.golang.org/genproto v0.0.0-20221118155620-16455021b5e6
- google.golang.org/grpc v1.52.0
- google.golang.org/protobuf v1.34.1
- gopkg.in/check.v1 v1.0.0-20200227125254-8fa46927fb4f
- gopkg.in/yaml.v2 v2.4.0
- gopkg.in/yaml.v3 v3.0.1
- 256 dependencies