ArrowSpace: introducing Spectral Indexing for vector search

ArrowSpace: introducing Spectral Indexing for vector search - Published in JOSS (2025)

https://github.com/mec-is/arrowspace-rs

Science Score: 100.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org
  • Committers with academic emails
    1 of 4 committers (25.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords from Contributors

bioinformatics labels graph-algorithms pypi
Last synced: 3 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: Mec-iS
  • License: apache-2.0
  • Language: Rust
  • Default Branch: main
  • Size: 4.23 MB
Statistics
  • Stars: 6
  • Watchers: 0
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Created 4 months ago · Last pushed 3 months ago
Metadata Files
Readme, Contributing, License, Code of conduct, Citation, Codeowners

README.md

ArrowSpace

Fast spectral vector search that finds similarity beyond traditional distance metrics

ArrowSpace is a high-performance Rust library for vector similarity search that goes beyond geometric distance (cosine, L2) by incorporating spectral graph properties, enabling more nuanced similarity matching for scientific and structured data.

ArrowSpace is a data structure library built around λτ indexing, a novel scoring method that mixes Rayleigh and Laplacian scoring (see RESEARCH.md) to build vector-search-friendly lookup tables with built-in spectral awareness. It pairs dense, row‑major arrays with per‑row spectral scores (λτ) derived from a Rayleigh-Laplacian score built over items, enabling lambda‑aware similarity, range queries, and composable operations such as superposition and element‑wise multiplication over rows. It is designed for datasets where spectral characteristics can be leveraged to surface matches that commonly used distance metrics would rank lower.
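To make the idea of a per‑row spectral score more concrete, here is a minimal sketch of a Rayleigh quotient, R(x) = (xᵀ L x) / (xᵀ x), evaluated against a dense graph Laplacian. The names `rayleigh_quotient` and `path_laplacian` and the dense `Vec<Vec<f64>>` layout are assumptions made for illustration only; they are not part of the arrowspace API, and the λτ score described in the paper adds a τ policy on top of this kind of energy.

```rust
/// Rayleigh quotient R(x) = (x^T L x) / (x^T x) for a dense, square Laplacian `l`.
/// Returns 0.0 for an all-zero vector to avoid dividing by zero.
pub fn rayleigh_quotient(l: &[Vec<f64>], x: &[f64]) -> f64 {
    assert_eq!(l.len(), x.len(), "Laplacian and vector sizes must match");
    let denom: f64 = x.iter().map(|v| v * v).sum();
    if denom == 0.0 {
        return 0.0;
    }
    // x^T (L x): for each row i, multiply x_i by the dot product of row i with x.
    let numer: f64 = l
        .iter()
        .zip(x)
        .map(|(row, xi)| xi * row.iter().zip(x).map(|(lij, xj)| lij * xj).sum::<f64>())
        .sum();
    numer / denom
}

/// Unnormalized Laplacian L = D - A of a path graph 0 - 1 - ... - (n-1).
/// Illustrative helper, used only to have a small Laplacian to test against.
pub fn path_laplacian(n: usize) -> Vec<Vec<f64>> {
    let mut l = vec![vec![0.0; n]; n];
    for i in 0..n.saturating_sub(1) {
        l[i][i] += 1.0;
        l[i + 1][i + 1] += 1.0;
        l[i][i + 1] -= 1.0;
        l[i + 1][i] -= 1.0;
    }
    l
}
```

On a three-node path graph, a constant vector such as `[1.0, 1.0, 1.0]` scores 0 (it does not vary across edges), while `[1.0, -2.0, 1.0]` scores 3, the largest eigenvalue of that Laplacian. Smooth signals get low scores and oscillating ones get high scores, which is the property a spectral index can exploit.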

Run `cargo run --example proteins_lookup` for an example of how it compares with cosine similarity.

Usage

```rust
use arrowspace::builder::ArrowSpaceBuilder;

// Simple example that works immediately
let vectors = vec![
    vec![1.0, 2.0, 3.0],
    vec![2.0, 3.0, 1.0],
    vec![3.0, 1.0, 2.0],
];

let (aspace, _) = ArrowSpaceBuilder::new()
    .build(vectors);
```

Requirements

  • Rust 1.78+ (edition 2024)

Installation

As a Library Dependency

Add to your Cargo.toml:

    [dependencies]
    arrowspace = "*"

From Source

    git clone https://github.com/Mec-iS/arrowspace-rs
    cd arrowspace-rs
    cargo build --release

Running Examples

    cargo run --example compare_cosine
    cargo run --example proteins_lookup

Running Tests

    cargo test

Running Benchmarks

    cargo bench

Minimal usage

Construct an ArrowSpace from rows and compute the synthetic index λτ used in similarity search (spectral search):

  • Build λτ‑graph from data (preferred path):
    • Use `ArrowSpaceBuilder::new().build(items)` to get an ArrowSpace and its Laplacian+Tau mode; the builder computes per‑row synthetic indices immediately.
    • Use `ArrowSpaceBuilder::new().with_lambda_graph(...).build(items)` to get an ArrowSpace and its Laplacian+Tau mode while specifying the parameters of the graph on which the Laplacian is computed.
    • Use `ArrowSpaceBuilder::new().with_lambdas(...).with_synthesis(...).build(items)` to get an ArrowSpace and its Laplacian+Tau indices while specifying which lambda values to use.
  • Search the space:

```rust
use arrowspace::builder::ArrowSpaceBuilder;
use arrowspace::core::ArrowItem;

// Define the search parameters: alpha = 1.0 is equivalent to cosine similarity
let alpha = 0.7;
let beta = 0.3;

// Build ArrowSpace from item vectors
let items = vec![
    vec![1.0, 2.0, 3.0], // Item 1
    vec![2.0, 3.0, 1.0], // Item 2
    vec![3.0, 1.0, 2.0], // Item 3
];

// The last argument is sigma; see the crate docs for the exact parameter types
let (aspace, graph) = ArrowSpaceBuilder::new()
    .with_lambda_graph(0.5, 3, 2.0, /* sigma: */ 0.25)
    .build(items);

// Prepare the query vector
let query = aspace.prepare_query_item(vec![1.5, 2.5, 2.0], &graph);
// Search the space
let results = aspace.search_lambda_aware(&query, 1, alpha);
println!("{:?}", results);
```
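The snippet defines `alpha` and `beta` and notes that `alpha = 1.0` reduces to cosine similarity. One plausible way such weights could combine a geometric term with a spectral term is sketched below. This is an assumption for illustration, not the crate's confirmed formula: `cosine` and `lambda_aware_score` are hypothetical names, and the proximity term `1 / (1 + |λa - λb|)` is just one simple choice that equals 1 when the two spectral scores match and decays as they diverge.

```rust
/// Cosine similarity between two equal-length vectors (0.0 if either has zero norm).
pub fn cosine(a: &[f64], b: &[f64]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na * nb)
    }
}

/// Hypothetical lambda-aware score (an assumption, not the arrowspace formula):
/// `alpha` weighs the cosine term, `beta` weighs how close the two spectral
/// scores are. With alpha = 1.0 and beta = 0.0 this is plain cosine similarity.
pub fn lambda_aware_score(
    a: &[f64],
    lambda_a: f64,
    b: &[f64],
    lambda_b: f64,
    alpha: f64,
    beta: f64,
) -> f64 {
    let lambda_proximity = 1.0 / (1.0 + (lambda_a - lambda_b).abs());
    alpha * cosine(a, b) + beta * lambda_proximity
}
```

Under this sketch, two orthogonal vectors with a cosine of 0 can still receive a non-zero score when their spectral scores are close, which illustrates how a spectral term can surface matches that a purely geometric metric ranks low.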

Main Features (spectral graph construction and search)

  • Data structure for vector search:
    • Lambda+Tau graph from data (default): builds a Laplacian over items from the row matrix, then computes per‑row synthetic λτ using the Laplacian + `TauMode` (see paper), with the Median policy by default; override via `with_synthesis(alpha, mode)` to change α or the τ policy.
    • Direct lambda ε‑graph (lower‑level): constructs a Laplacian from a vector of λ values with ε thresholding and k‑capping, stored as a union‑symmetrized CSR matrix; use it when supplying external λ values instead of synthetic ones.
    • (optional) Hypergraph overlays: build Laplacians from hyperedges (clique expansion, normalized variant) and overlay “boosts” to strengthen pairs; for prebuilt/hypergraph paths, synthetic λ is opt‑in via `with_synthesis`.
    • (optional) Ensembles: parameterized variants (k adjustment, radius/ε expansion, hypergraph transforms) for graph experimentation while reusing the same data matrix; synthetic λ is computed per chosen base when enabled.
  • Examples:
    • End‑to‑end examples: protein‑like lookup with a λ‑band range query using a ZSET‑style index; showcases for hypergraph, λ‑graph, and synthetic Laplacian + `TauMode` flows.
    • Extensive tests spanning ArrowSpace algebra, Rayleigh properties, lambda scale‑invariance, superposition bounds, λ‑graph symmetry and k‑capping semantics, hypergraph correctness, diffusion/random‑walk simulations, fractal integrations, and synthetic λ via Median/Mean/Percentile τ policies.
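The direct lambda ε‑graph described in the feature list (ε thresholding, k‑capping, union symmetrization) can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the crate's implementation: `epsilon_graph` and `laplacian` are hypothetical names, edges carry unit weights, and adjacency is kept in `BTreeSet`s and a dense matrix where the library is described as using CSR.

```rust
use std::collections::BTreeSet;

/// Builds an undirected neighbour list from scalar lambda values:
/// 1. epsilon thresholding: j is a candidate for i when |l_i - l_j| <= eps;
/// 2. k-capping: each node keeps only its k closest candidates;
/// 3. union symmetrization: an edge survives if either endpoint kept it.
pub fn epsilon_graph(lambdas: &[f64], eps: f64, k: usize) -> Vec<BTreeSet<usize>> {
    let n = lambdas.len();
    let mut adj = vec![BTreeSet::new(); n];
    for i in 0..n {
        let mut candidates: Vec<(f64, usize)> = (0..n)
            .filter(|&j| j != i)
            .map(|j| ((lambdas[i] - lambdas[j]).abs(), j))
            .filter(|&(d, _)| d <= eps)
            .collect();
        // Closest first; ties are broken by node index for determinism.
        candidates.sort_by(|a, b| a.0.total_cmp(&b.0).then(a.1.cmp(&b.1)));
        for &(_, j) in candidates.iter().take(k) {
            adj[i].insert(j);
            adj[j].insert(i); // union: keep the edge in both directions
        }
    }
    adj
}

/// Dense unnormalized Laplacian L = D - A with unit edge weights.
pub fn laplacian(adj: &[BTreeSet<usize>]) -> Vec<Vec<f64>> {
    let n = adj.len();
    let mut l = vec![vec![0.0; n]; n];
    for (i, neighbours) in adj.iter().enumerate() {
        l[i][i] = neighbours.len() as f64;
        for &j in neighbours {
            l[i][j] = -1.0;
        }
    }
    l
}
```

Because symmetrization takes the union of each node's choices, a node can end up with more than `k` neighbours: with λ values `[0.0, 0.1, 0.25, 5.0]`, `eps = 0.3` and `k = 1`, node 1 is picked by both node 0 and node 2 and ends with degree 2, while the outlier at 5.0 stays isolated.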

Key concepts

See paper

Owner

  • Name: Lorenzo
  • Login: Mec-iS
  • Kind: user
  • Location: UK
  • Company: Free as in Free Software

Python. Rust. Geospatial. Satellite Imaging. RegTech. HealthTech. FinTech. Graph Thinking. ML. @DerwenAI @smartcorelib @HTTP-APIs

JOSS Publication

ArrowSpace: introducing Spectral Indexing for vector search
Published
September 27, 2025
Volume 10, Issue 113, Page 9002
Authors
Lorenzo Moriondo
Independent Researcher (London, UK / Tokyo, Japan) - tuned.org.uk
Editor
Daniel S. Katz
Tags
embeddings vector database RAG numerical scientific computing

Citation (CITATION.cff)

cff-version: "1.2.0"
authors:
- family-names: Moriondo
  given-names: Lorenzo
  orcid: "https://orcid.org/0000-0002-8804-2963"
doi: 10.5281/zenodo.17213264
message: If you use this software, please cite our article in the
  Journal of Open Source Software.
preferred-citation:
  authors:
  - family-names: Moriondo
    given-names: Lorenzo
    orcid: "https://orcid.org/0000-0002-8804-2963"
  date-published: 2025-09-27
  doi: 10.21105/joss.09002
  issn: 2475-9066
  issue: 113
  journal: Journal of Open Source Software
  publisher:
    name: Open Journals
  start: 9002
  title: "ArrowSpace: introducing Spectral Indexing for vector search"
  type: article
  url: "https://joss.theoj.org/papers/10.21105/joss.09002"
  volume: 10
title: "ArrowSpace: introducing Spectral Indexing for vector search"

GitHub Events

Total
  • Create event: 3
  • Release event: 3
  • Issues event: 23
  • Watch event: 3
  • Issue comment event: 9
  • Push event: 28
  • Pull request review event: 1
  • Pull request event: 7
  • Fork event: 3
Last Year
  • Create event: 3
  • Release event: 3
  • Issues event: 23
  • Watch event: 3
  • Issue comment event: 9
  • Push event: 28
  • Pull request review event: 1
  • Pull request event: 7
  • Fork event: 3

Committers

Last synced: 3 months ago

All Time
  • Total Commits: 51
  • Total Committers: 4
  • Avg Commits per committer: 12.75
  • Development Distribution Score (DDS): 0.078
Past Year
  • Commits: 51
  • Committers: 4
  • Avg Commits per committer: 12.75
  • Development Distribution Score (DDS): 0.078
Top Committers
| Name | Email | Commits |
| --- | --- | --- |
| Lorenzo Mec-iS | t****g@g****m | 47 |
| Seth | s****k@g****m | 2 |
| Daniel S. Katz | d****z@i****g | 1 |
| dependabot[bot] | 4****]@u****m | 1 |

Issues and Pull Requests

Last synced: 3 months ago

All Time
  • Total issues: 10
  • Total pull requests: 4
  • Average time to close issues: 3 days
  • Average time to close pull requests: 8 minutes
  • Total issue authors: 2
  • Total pull request authors: 4
  • Average comments per issue: 0.8
  • Average comments per pull request: 0.0
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 1
Past Year
  • Issues: 10
  • Pull requests: 4
  • Average time to close issues: 3 days
  • Average time to close pull requests: 8 minutes
  • Issue authors: 2
  • Pull request authors: 4
  • Average comments per issue: 0.8
  • Average comments per pull request: 0.0
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 1
Top Authors
Issue Authors
  • DiogoRibeiro7 (8)
  • sstadick (2)
Pull Request Authors
  • sstadick (1)
  • dependabot[bot] (1)
  • Mec-iS (1)
  • danielskatz (1)
Top Labels
Issue Labels
Pull Request Labels
dependencies (1) rust (1)

Packages

  • Total packages: 1
  • Total downloads:
    • cargo 412 total
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 3
  • Total maintainers: 1
crates.io: arrowspace

Spectral vector search with taumode (λτ) indexing

  • Versions: 3
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 412 Total
Rankings
Dependent repos count: 20.3%
Dependent packages count: 26.8%
Average: 47.2%
Downloads: 94.6%
Maintainers (1)
Last synced: 3 months ago

Dependencies

Cargo.lock cargo
  • aho-corasick 1.1.3
  • anes 0.1.6
  • anstyle 1.0.11
  • approx 0.5.1
  • autocfg 1.5.0
  • bitflags 2.9.3
  • bumpalo 3.19.0
  • cast 0.3.0
  • cfg-if 1.0.3
  • ciborium 0.2.2
  • ciborium-io 0.2.2
  • ciborium-ll 0.2.2
  • clap 4.5.46
  • clap_builder 4.5.46
  • clap_lex 0.7.5
  • criterion 0.7.0
  • criterion-plot 0.6.0
  • crossbeam-deque 0.8.6
  • crossbeam-epoch 0.9.18
  • crossbeam-utils 0.8.21
  • crunchy 0.2.4
  • either 1.15.0
  • getrandom 0.3.3
  • half 2.6.0
  • itertools 0.13.0
  • itoa 1.0.15
  • js-sys 0.3.77
  • libc 0.2.175
  • log 0.4.27
  • memchr 2.7.5
  • num 0.4.3
  • num-bigint 0.4.6
  • num-complex 0.4.6
  • num-integer 0.1.46
  • num-iter 0.1.45
  • num-rational 0.4.2
  • num-traits 0.2.19
  • once_cell 1.21.3
  • oorandom 11.1.5
  • ordered-float 5.0.0
  • plotters 0.3.7
  • plotters-backend 0.3.7
  • plotters-svg 0.3.7
  • ppv-lite86 0.2.21
  • proc-macro2 1.0.101
  • quote 1.0.40
  • r-efi 5.3.0
  • rand 0.8.5
  • rand 0.9.2
  • rand_chacha 0.9.0
  • rand_core 0.6.4
  • rand_core 0.9.3
  • rayon 1.11.0
  • rayon-core 1.13.0
  • regex 1.11.2
  • regex-automata 0.4.10
  • regex-syntax 0.8.6
  • rustversion 1.0.22
  • ryu 1.0.20
  • same-file 1.0.6
  • serde 1.0.219
  • serde_derive 1.0.219
  • serde_json 1.0.143
  • smartcore 0.4.2
  • syn 2.0.106
  • tinytemplate 1.2.1
  • unicode-ident 1.0.18
  • walkdir 2.5.0
  • wasi 0.14.2+wasi-0.2.4
  • wasm-bindgen 0.2.100
  • wasm-bindgen-backend 0.2.100
  • wasm-bindgen-macro 0.2.100
  • wasm-bindgen-macro-support 0.2.100
  • wasm-bindgen-shared 0.2.100
  • web-sys 0.3.77
  • winapi-util 0.1.10
  • windows-link 0.1.3
  • windows-sys 0.60.2
  • windows-targets 0.53.3
  • windows_aarch64_gnullvm 0.53.0
  • windows_aarch64_msvc 0.53.0
  • windows_i686_gnu 0.53.0
  • windows_i686_gnullvm 0.53.0
  • windows_i686_msvc 0.53.0
  • windows_x86_64_gnu 0.53.0
  • windows_x86_64_gnullvm 0.53.0
  • windows_x86_64_msvc 0.53.0
  • wit-bindgen-rt 0.39.0
  • zerocopy 0.8.26
  • zerocopy-derive 0.8.26
Cargo.toml cargo
  • criterion 0.7.0 development
  • ordered-float 5.0.0
  • rand 0.9.2
  • smartcore ^0.4.2