https://github.com/cwida/pdx

⚡ Faster similarity search with PDX: A vertical data layout for vectors

https://github.com/cwida/pdx

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.1%) to scientific vocabulary

Keywords

faiss information-retrieval nearest-neighbor-search simd similarity-search vector-indexing vector-search vector-similarity-search
Last synced: 5 months ago · JSON representation

Repository

⚡ Faster similarity search with PDX: A vertical data layout for vectors

Basic Info
Statistics
  • Stars: 54
  • Watchers: 5
  • Forks: 5
  • Open Issues: 3
  • Releases: 2
Topics
faiss information-retrieval nearest-neighbor-search simd similarity-search vector-indexing vector-search vector-similarity-search
Created 12 months ago · Last pushed 6 months ago
Metadata Files
Readme License

README.md

PDX: A Transposed Data Layout for Similarity Search
Paper License GitHub stars

PDX is a data layout that transposes vectors in a column-major order. This layout unleashes the true potential of dimension pruning, accelerating vanilla IVF indexes by factors:

PDX Layout

PDX benefits:


Contents

Pruning in a nutshell

Pruning means avoiding checking all the dimensions of a vector to determine if it is a neighbour of a query. The PDX layout unleashes the true potential of these algorithms (e.g., ADSampling), accelerating vanilla IVF indexes by factors.

Pruning is especially effective for large embeddings (d > 512) and when targeting high recalls (> 0.90) or nearly exact results.

Down below, you will find benchmarks against FAISS.

Try PDX

Try PDX with your data using our Python bindings and examples. We have implemented PDX on Flat (float32) and Quantized (8-bit) IVF indexes and exhaustive search settings.

Prerequisites

  • PDX is available for x86_64 (with AVX512), ARM, and Apple silicon
  • Python 3.11 or higher
  • FAISS with Python Bindings
  • Clang++17 or higher
  • CMake 3.26 or higher

Installation Steps

  1. Clone and init submodules sh git clone https://github.com/cwida/PDX git submodule init git submodule update

  2. [Optional] Install FFTW (for higher throughput) sh wget https://www.fftw.org/fftw-3.3.10.tar.gz tar -xvzf fftw-3.3.10.tar.gz cd fftw-3.3.10 ./configure --enable-float --enable-shared sudo make sudo make install ldconfig

  3. Install Python dependencies and the bindings. sh export CXX="/usr/bin/clang++-18" # Set proper CXX first pip install -r requirements.txt python setup.py clean --all python -m pip install .

  4. Run the examples under /examples ```sh

    Creates an IVF index with FAISS on random data

    Then, it compares the search performance of PDXearch and FAISS

    python ./examples/pdx_simple.py ``` For more details on the available examples and how to use your own data, refer to /examples/README.md.

[!NOTE] We heavily rely on FAISS to create the underlying IVF indexes.

Use Cases and Benchmarks

We present single-threaded benchmarks against FAISS+AVX512 on an r7iz.xlarge (Intel Sapphire Rapids) instance.

Two-Level IVF (IVF2)

IVF2 tackles a bottleneck of IVF indexes: finding the nearest centroids. By clustering the original IVF centroids, we can use PDX to quickly scan them (thanks to pruning) without sacrificing recall. This achieves significant throughput improvements when paired with 8-bit quantization.

PDX Layout

Vanilla IVF

Here, PDX, paired with the pruning algorithm ADSampling on float32, achieves significant speedups.

PDX Layout

Exhaustive search + IVF

An exhaustive search scans all the vectors in the collection. Having an IVF index with PDX can EXTREMELY accelerate this without sacrificing recall, thanks to the reliable pruning of ADSampling.

PDX Layout

The key observation here is that thanks to the underlying IVF index, the exhaustive search starts with the most promising clusters. A tight threshold is found early on, which enables the quick pruning of most candidates.

Exhaustive search without an index

By creating random clusters with the PDX layout, you can still accelerate exhaustive search without an index. Unlike ADSampling, with BOND (our pruning algorithm), you can use the raw vectors. Gains vary depending on the distribution of the data.

PDX Layout

No pruning and no index

Even without pruning, PDX distance kernels can be faster than SIMD ones in most CPU microarchitectures. For detailed information, check Figure 3 of our publication. You can also try it yourself in our playground here.

The Data Layout

PDX is a transposed layout (a.k.a. columnar, or decomposed layout), which means that the same dimensions of different vectors are stored sequentially. This decomposition occurs within a block (e.g., a cluster in an IVF index).

We have evolved our layout from the one presented in our publication to reduce random access, and adapted it to work with 8-bit and (in the future) 1-bit vectors.

float32

For float32, the first 25% of the dimensions are fully decomposed. We refer to this as the "vertical block." The rest (75%) are decomposed into subvectors of 64 dimensions. We refer to this as the "horizontal block." The vertical block is used for efficient pruning, and the horizontal block is accessed on the candidates that were not pruned. This horizontal block is still decomposed every 64 dimensions. The idea behind this is that we still have a chance to prune the few remaining candidates every 64 dimensions.

The following image shows this layout. Storage is sequential from left to right, and from top to bottom.

PDX Layout F32

8 bits

Smaller data types are not friendly to PDX, as we must accumulate distances on wider types, resulting in asymmetry. We can work around this by changing the PDX layout. For 8 bits, the vertical block is decomposed every 4 dimensions. This allows us to use dot product instructions (VPDPBUSD in x86 and UDOT/SDOT in NEON) to calculate L2 or IP kernels while still benefiting from PDX. The horizontal block remains decomposed every 64 dimensions.

PDX Layout F32

binary

For Hamming/Jaccard kernels, we use a layout decomposed every 8 dimensions (naturally grouped into bytes). The population count accumulation can be done in bytes. If d > 256, we flush the popcounts into a wider type every 32 words (corresponding to 256 dimensions). This has not been implemented in this repository yet, but you can find some promising benchmarks here.

Roadmap

  • Out-of-core execution (disk-based setting).
  • Add unit tests.
  • Implement multi-threading capabilities.
  • Add PDX to the VIBE benchmark.
  • Adaptive quantization on 8-bit and 4-bit.
  • Create a documentation.

[!IMPORTANT]
PDX is an ongoing research project. In its current state, it is not production-ready code.

Benchmarking

To run our benchmark suite in C++, refer to BENCHMARKING.md.

Citation

If you use PDX for your research, consider citing us:

@article{kuffo2025pdx, title={PDX: A Data Layout for Vector Similarity Search}, author={Kuffo, Leonardo and Krippner, Elena and Boncz, Peter}, journal={Proceedings of the ACM on Management of Data}, volume={3}, number={3}, pages={1--26}, year={2025}, publisher={ACM New York, NY, USA} }

SIGMOD

The code used for the experiments presented at SIGMOD'25 can be found in the sigmod branch.

Owner

  • Name: CWI Database Architectures Group
  • Login: cwida
  • Kind: organization
  • Location: Amsterdam, The Netherlands

GitHub Events

Total
  • Create event: 5
  • Issues event: 3
  • Release event: 1
  • Watch event: 40
  • Delete event: 2
  • Issue comment event: 11
  • Member event: 1
  • Push event: 70
  • Pull request event: 7
  • Fork event: 3
Last Year
  • Create event: 5
  • Issues event: 3
  • Release event: 1
  • Watch event: 40
  • Delete event: 2
  • Issue comment event: 11
  • Member event: 1
  • Push event: 70
  • Pull request event: 7
  • Fork event: 3

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 1
  • Total pull requests: 2
  • Average time to close issues: N/A
  • Average time to close pull requests: 4 minutes
  • Total issue authors: 1
  • Total pull request authors: 1
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 2
  • Average time to close issues: N/A
  • Average time to close pull requests: 4 minutes
  • Issue authors: 1
  • Pull request authors: 1
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • slhuang (2)
  • fangshil (1)
Pull Request Authors
  • lkuffo (4)
  • arjenpdevries (1)
Top Labels
Issue Labels
Pull Request Labels