Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.3%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: kolafish
  • License: mit
  • Language: Rust
  • Default Branch: main
  • Size: 15.2 MB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 6
  • Releases: 0
Created 7 months ago · Last pushed 7 months ago
Metadata Files
Readme, Changelog, Funding, License, Citation, Authors

README.md

Join the chat on Discord: https://discord.gg/MT27AG5EVE

Tantivy is a fast full-text search engine library written in Rust.

If you are looking for an alternative to Elasticsearch or Apache Solr, check out Quickwit, our distributed search engine built on top of Tantivy.

Tantivy is closer to Apache Lucene than to Elasticsearch or Apache Solr, in the sense that it is not an off-the-shelf search engine server but rather a crate that can be used to build such a search engine.

Tantivy is, in fact, strongly inspired by Lucene's design.

Benchmark

The following benchmark breaks down the performance for different types of queries/collections.

Your mileage WILL vary depending on the nature of queries and their load.

Details about the benchmark can be found at this repository.

Features

  • Full-text search
  • Configurable tokenizer (stemming available for 17 Latin languages) with third-party support for Chinese (tantivy-jieba and cang-jie), Japanese (lindera, Vaporetto, and tantivy-tokenizer-tiny-segmenter), and Korean (lindera + lindera-ko-dic-builder)
  • Fast (check out the benchmark)
  • Tiny startup time (<10ms), perfect for command-line tools
  • BM25 scoring (the same as Lucene)
  • Natural query language (e.g. (michael AND jackson) OR "king of pop")
  • Phrase query search (e.g. "michael jackson")
  • Incremental indexing
  • Multithreaded indexing (indexing English Wikipedia takes < 3 minutes on my desktop)
  • Mmap directory
  • SIMD integer compression when the platform/CPU includes the SSE2 instruction set
  • Single-valued and multivalued u64, i64, and f64 fast fields (equivalent of doc values in Lucene)
  • &[u8] fast fields
  • Text, i64, u64, f64, dates, ip, bool, and hierarchical facet fields
  • Compressed document store (LZ4, Zstd, None)
  • Range queries
  • Faceted search
  • Configurable indexing (optional term frequency and position indexing)
  • JSON Field
  • Aggregation Collector: histogram, range buckets, average, and stats metrics
  • LogMergePolicy with deletes
  • Searcher Warmer API
  • Cheesy logo with a horse
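
The BM25 scoring listed among the features can be illustrated with a minimal sketch of the textbook formula, using only the Rust standard library. This is not Tantivy's or Lucene's actual code: the names `Bm25Params`, `idf`, and `bm25_term_score` are hypothetical, and the defaults k1 = 1.2 and b = 0.75 are commonly cited values assumed here, not taken from this README. Real implementations may differ in constants and length normalization.

```rust
/// Tuning parameters of the BM25 ranking function.
/// The defaults below are commonly cited values (an assumption of this sketch).
#[derive(Clone, Copy)]
pub struct Bm25Params {
    /// Controls how quickly repeated occurrences of a term saturate.
    pub k1: f64,
    /// Controls how strongly long documents are penalized (0 = not at all).
    pub b: f64,
}

impl Default for Bm25Params {
    fn default() -> Self {
        Bm25Params { k1: 1.2, b: 0.75 }
    }
}

/// Inverse document frequency: rarer terms receive larger weights.
/// `doc_count` is the number of documents in the collection and
/// `doc_freq` the number of documents that contain the term.
pub fn idf(doc_count: u64, doc_freq: u64) -> f64 {
    let n = doc_count as f64;
    let df = doc_freq as f64;
    (1.0 + (n - df + 0.5) / (df + 0.5)).ln()
}

/// Contribution of a single query term to the score of a single document.
/// A document's score for a multi-term query is the sum of these values.
pub fn bm25_term_score(
    params: Bm25Params,
    term_freq: u32,
    doc_len: u32,
    avg_doc_len: f64,
    doc_count: u64,
    doc_freq: u64,
) -> f64 {
    let tf = term_freq as f64;
    // Length normalization: documents longer than average get a larger
    // denominator, which lowers their score.
    let norm = params.k1 * (1.0 - params.b + params.b * doc_len as f64 / avg_doc_len);
    idf(doc_count, doc_freq) * tf * (params.k1 + 1.0) / (tf + norm)
}
```

In this sketch, a term that is absent from a document contributes nothing, more occurrences raise the score with diminishing returns, and a shorter document outranks a longer one with the same term frequency.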

Non-features

Distributed search is out of the scope of Tantivy, but if you are looking for this feature, check out Quickwit.

Getting started

Tantivy works on stable Rust and supports Linux, macOS, and Windows.

How can I support this project?

There are many ways to support this project.

  • Use Tantivy and tell us about your experience on Discord or by email (paul.masurel@gmail.com)
  • Report bugs
  • Write a blog post
  • Help with documentation by asking questions or submitting PRs
  • Contribute code (you can join our Discord server)
  • Talk about Tantivy around you

Contributing code

We use the GitHub Pull Request workflow: reference a GitHub ticket and/or include a comprehensive commit message when opening a PR. Feel free to update CHANGELOG.md with your contribution.

Tokenizer

When implementing a tokenizer for Tantivy, depend on the tantivy-tokenizer-api crate.

Clone and build locally

Tantivy compiles on stable Rust. To check out and run tests, you can simply run:

    git clone https://github.com/quickwit-oss/tantivy.git
    cd tantivy
    cargo test

Companies Using Tantivy

  • Etsy
  • ParadeDB
  • Nuclia
  • Humanfirst.ai
  • Element.io

FAQ

Can I use Tantivy in other languages?

You can also find other bindings on GitHub, but they may be less maintained.

What are some examples of Tantivy use?

  • seshat: A matrix message database/indexer
  • tantiny: Tiny full-text search for Ruby
  • lnx: adaptable, typo tolerant search engine with a REST API
  • and more!

On average, how much faster is Tantivy compared to Lucene?

Does tantivy support incremental indexing?

  • Yes.

How can I edit documents?

  • Data in tantivy is immutable. To edit a document, the document needs to be deleted and reindexed.

When will my documents be searchable during indexing?

  • Documents become searchable after commit is called on an IndexWriter. Existing IndexReaders also need to be reloaded to reflect the changes. Finally, changes are only visible to newly acquired Searchers.
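
The editing and visibility rules in the answers above can be modeled with a toy sketch in standard-library Rust. It does not use Tantivy's API: `ToyIndex`, `ToyWriter`, `ToyReader`, and `ToySearcher` are hypothetical stand-ins that only mimic the described behavior (immutable committed data, edit as delete plus re-add, visibility after commit, reload, and acquiring a new searcher).

```rust
use std::sync::{Arc, RwLock};

/// An immutable set of committed documents (stand-in for committed index data).
type Snapshot = Arc<Vec<String>>;

/// Toy index that only stores the most recently committed snapshot.
#[derive(Clone, Default)]
pub struct ToyIndex {
    committed: Arc<RwLock<Snapshot>>,
}

impl ToyIndex {
    pub fn writer(&self) -> ToyWriter {
        ToyWriter { index: self.clone(), pending_adds: Vec::new(), pending_deletes: Vec::new() }
    }

    pub fn reader(&self) -> ToyReader {
        let current: Snapshot = self.committed.read().unwrap().clone();
        ToyReader { index: self.clone(), current }
    }
}

/// Buffers changes; nothing is visible to readers until `commit`.
pub struct ToyWriter {
    index: ToyIndex,
    pending_adds: Vec<String>,
    pending_deletes: Vec<String>,
}

impl ToyWriter {
    pub fn add_document(&mut self, doc: &str) {
        self.pending_adds.push(doc.to_string());
    }

    pub fn delete_document(&mut self, doc: &str) {
        self.pending_deletes.push(doc.to_string());
    }

    /// Publishes a brand-new snapshot; older snapshots are never mutated.
    pub fn commit(&mut self) {
        let mut guard = self.index.committed.write().unwrap();
        let mut next: Vec<String> = guard
            .iter()
            .filter(|d| !self.pending_deletes.contains(d))
            .cloned()
            .collect();
        next.append(&mut self.pending_adds);
        self.pending_deletes.clear();
        *guard = Arc::new(next);
    }
}

/// Holds the snapshot seen at the last reload.
pub struct ToyReader {
    index: ToyIndex,
    current: Snapshot,
}

impl ToyReader {
    /// Picks up the latest committed snapshot.
    pub fn reload(&mut self) {
        self.current = self.index.committed.read().unwrap().clone();
    }

    /// A searcher keeps the snapshot it was acquired with.
    pub fn searcher(&self) -> ToySearcher {
        ToySearcher { snapshot: self.current.clone() }
    }
}

pub struct ToySearcher {
    snapshot: Snapshot,
}

impl ToySearcher {
    pub fn contains(&self, doc: &str) -> bool {
        self.snapshot.iter().any(|d| d == doc)
    }
}
```

In this model, a searcher acquired before a reload keeps seeing the old snapshot, which is why changes only reach newly acquired searchers. An edit is expressed as `delete_document` for the old version followed by `add_document` for the new one, then a `commit`.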

Owner

  • Name: Yu Jin
  • Login: kolafish
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - alias: Quickwit Inc.
    website: "https://quickwit.io"
title: "tantivy"
version: 0.22.0
doi: 10.5281/zenodo.13942948
date-released: 2024-10-17
url: "https://github.com/quickwit-oss/tantivy"

GitHub Events

Total
  • Push event: 23
  • Pull request event: 5
  • Create event: 4
Last Year
  • Push event: 23
  • Pull request event: 5
  • Create event: 4

Dependencies

.github/workflows/coverage.yml actions
  • Swatinem/rust-cache v2 composite
  • actions/checkout v4 composite
  • codecov/codecov-action v3 composite
  • taiki-e/install-action cargo-llvm-cov composite
.github/workflows/long_running.yml actions
  • actions-rs/toolchain v1 composite
  • actions/checkout v4 composite
.github/workflows/test.yml actions
  • Swatinem/rust-cache v2 composite
  • actions-rs/clippy-check v1 composite
  • actions-rs/toolchain v1 composite
  • actions/checkout v4 composite
  • taiki-e/install-action nextest composite
Cargo.toml cargo
  • binggan 0.14.0 development
  • fail 0.5.0 development
  • futures 0.3.21 development
  • maplit 1.0.2 development
  • matches 0.1.9 development
  • more-asserts 0.3.1 development
  • paste 1.0.11 development
  • postcard 1.0.4 development
  • pretty_assertions 1.2.1 development
  • proptest 1.0.0 development
  • rand 0.8.5 development
  • rand_distr 0.4.3 development
  • test-log 0.2.10 development
  • time 0.3.10 development
  • aho-corasick 1.0
  • arc-swap 1.5.0
  • base64 0.22.0
  • bitpacking 0.9.2
  • bon 3.3.1
  • byteorder 1.4.3
  • census 0.4.2
  • columnar 0.5
  • common 0.9
  • crc32fast 1.3.2
  • crossbeam-channel 0.5.4
  • downcast-rs 2.0.1
  • fail 0.5.0
  • fastdivide 0.4.0
  • fnv 1.0.7
  • fs4 0.13.1
  • futures-channel 0.3.28
  • futures-util 0.3.28
  • htmlescape 0.3.1
  • hyperloglogplus 0.4.1
  • itertools 0.14.0
  • levenshtein_automata 0.2.1
  • log 0.4.16
  • lru 0.12.0
  • lz4_flex 0.11
  • measure_time 0.9.0
  • memmap2 0.9.0
  • once_cell 1.10.0
  • oneshot 0.1.7
  • query-grammar 0.24.0
  • rayon 1.5.2
  • regex 1.5.5
  • rust-stemmers 1.2.0
  • rustc-hash 2.0.0
  • serde 1.0.219
  • serde_json 1.0.140
  • sketches-ddsketch 0.3.0
  • smallvec 1.8.0
  • sstable 0.5
  • stacker 0.5
  • tantivy-bitpacker 0.8
  • tantivy-fst 0.5
  • tempfile 3.12.0
  • thiserror 2.0.1
  • time 0.3.35
  • tokenizer-api 0.5
  • uuid 1.0.0
  • zstd 0.13
bitpacker/Cargo.toml cargo
  • proptest 1 development
  • rand 0.8 development
  • bitpacking 0.9.2
columnar/Cargo.toml cargo
  • binggan 0.14.0 development
  • more-asserts 0.3.1 development
  • proptest 1 development
  • rand 0.8 development
  • common 0.9
  • downcast-rs 2.0.1
  • fastdivide 0.4.0
  • itertools 0.14.0
  • serde 1.0.152
  • sstable 0.5
  • stacker 0.5
  • tantivy-bitpacker 0.8
columnar/columnar-cli/Cargo.toml cargo
columnar/columnar-cli-inspect/Cargo.toml cargo
common/Cargo.toml cargo
  • binggan 0.14.0 development
  • proptest 1.0.0 development
  • rand 0.8.4 development
  • async-trait 0.1
  • byteorder 1.4.3
  • ownedbytes 0.9
  • serde 1.0.136
  • time 0.3.10
ownedbytes/Cargo.toml cargo
query-grammar/Cargo.toml cargo
sstable/Cargo.toml cargo
  • criterion 0.5 development
  • names 0.14 development
  • proptest 1 development
  • rand 0.8 development
  • common 0.9
  • futures-util 0.3.30
  • itertools 0.14.0
  • tantivy-bitpacker 0.8
  • tantivy-fst 0.5
  • zstd 0.13
stacker/Cargo.toml cargo
  • binggan 0.14.0 development
  • proptest 1.2.0 development
  • rand 0.8.5 development
  • rustc-hash 2.1.0 development
  • zipf 7.0.0 development
  • ahash 0.8.11
  • common 0.9
  • murmurhash32 0.3
  • rand_distr 0.4.3
stacker/fuzz_test/Cargo.toml cargo
tokenizer-api/Cargo.toml cargo
tests/compat_tests_data/index_v6/meta.json cpan
tests/compat_tests_data/index_v7/meta.json cpan