probly-search

A full-text search library, written in Rust, optimized for insertion speed, that provides full control over the scoring calculations.

This started as a port of the Node library NDX.

Demo

Recipe (title) search with 50k documents.

https://quantleaf.github.io/probly-search-demo/

Features

- Three ways to do scoring:
  - BM25 ranking function to rank matching documents. This is the same ranking function that Lucene >= 6.0.0 uses by default.
  - zero-to-one, a scoring function unique to this library that provides a normalized score bounded by 0 and 1. Well suited for matching titles/labels with queries.
  - Ability to fully customize your own scoring function by implementing the ScoreCalculator trait.
- Trie-based dynamic inverted index.
- Multi-field full-text indexing and searching.
- Per-field score boosting.
- Configurable tokenizer.
- Free-text queries with query expansion.
- Fast allocation, but latent deletion.
- WASM compatible.
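
To make the BM25 option more concrete, here is a minimal, standalone sketch of the textbook BM25 term score with the non-negative IDF variant. It is not taken from this library's source; the function names (`idf`, `bm25_term_score`) and the parameter values `k1 = 1.2` and `b = 0.75` are assumptions chosen for illustration.

```rust
/// Inverse document frequency, non-negative variant:
/// ln(1 + (N - n + 0.5) / (n + 0.5))
fn idf(total_docs: usize, docs_with_term: usize) -> f64 {
    let n = total_docs as f64;
    let df = docs_with_term as f64;
    (1.0 + (n - df + 0.5) / (df + 0.5)).ln()
}

/// BM25 contribution of one query term in one field (illustrative only).
fn bm25_term_score(
    term_freq: f64,
    field_len: f64,
    avg_field_len: f64,
    total_docs: usize,
    docs_with_term: usize,
    k1: f64,
    b: f64,
) -> f64 {
    // Length normalization: longer-than-average fields are penalized.
    let norm = 1.0 - b + b * (field_len / avg_field_len);
    idf(total_docs, docs_with_term) * (term_freq * (k1 + 1.0)) / (term_freq + k1 * norm)
}

fn main() {
    // One occurrence, in a field of average length, in 1 of 2 documents.
    let score = bm25_term_score(1.0, 1.0, 1.0, 2, 1, 1.2, 0.75);
    println!("{score}");
}
```

Rarer terms get a higher IDF, and repeated occurrences of a term raise the score with diminishing returns controlled by `k1`.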

Documentation

Adding, Removing and Searching documents

See Integration tests.

Use this library with WASM

See the recipe search demo project.

A basic example

Create an index for documents that have 2 fields, query it, then remove a document.

```rust
use std::borrow::Cow;

use probly_search::{
    index::Index,
    query::{score::default::bm25, QueryResult},
};

// A custom document type with two indexed fields
struct Doc {
    id: usize,
    title: String,
    description: String,
}

// A white space tokenizer
fn tokenizer(s: &str) -> Vec<Cow<str>> {
    s.split(' ').map(Cow::from).collect::<Vec<_>>()
}

// We have to provide extraction functions for the fields we want to index

// Title
fn title_extract(d: &Doc) -> Vec<&str> {
    vec![d.title.as_str()]
}

// Description
fn description_extract(d: &Doc) -> Vec<&str> {
    vec![d.description.as_str()]
}

fn main() {
    // Create index with 2 fields
    let mut index = Index::<usize>::new(2);

    // Create docs from the custom Doc struct
    let doc_1 = Doc {
        id: 0,
        title: "abc".to_string(),
        description: "dfg".to_string(),
    };

    let doc_2 = Doc {
        id: 1,
        title: "dfgh".to_string(),
        description: "abcd".to_string(),
    };

    // Add documents to index
    index.add_document(
        &[title_extract, description_extract],
        tokenizer,
        doc_1.id,
        &doc_1,
    );

    index.add_document(
        &[title_extract, description_extract],
        tokenizer,
        doc_2.id,
        &doc_2,
    );

    // Search, expect 2 results
    let mut result = index.query(
        &"abc",
        &mut bm25::new(),
        tokenizer,
        &[1., 1.],
    );
    assert_eq!(result.len(), 2);
    assert_eq!(
        result[0],
        QueryResult {
            key: 0,
            score: 0.6931471805599453
        }
    );
    assert_eq!(
        result[1],
        QueryResult {
            key: 1,
            score: 0.28104699650060755
        }
    );

    // Remove a document from the index
    index.remove_document(doc_1.id);

    // Vacuum to remove it completely
    index.vacuum();

    // Search, expect 1 result
    result = index.query(
        &"abc",
        &mut bm25::new(),
        tokenizer,
        &[1., 1.],
    );
    assert_eq!(result.len(), 1);
    assert_eq!(
        result[0],
        QueryResult {
            key: 1,
            score: 0.1166450426074421
        }
    );
}
```

For more query examples, see the source tests for the BM25 implementation and the zero-to-one implementation.
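
Because the tokenizer is configurable, the whitespace tokenizer in the basic example could be replaced by something stricter. The sketch below assumes the same signature as that example (`fn(&str) -> Vec<Cow<str>>`); the name `lowercase_tokenizer` and its splitting rules are hypothetical and not part of the library.

```rust
use std::borrow::Cow;

/// Hypothetical tokenizer: splits on any non-alphanumeric character,
/// drops empty tokens, and lowercases tokens that contain uppercase letters.
fn lowercase_tokenizer(s: &str) -> Vec<Cow<'_, str>> {
    s.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| {
            if t.chars().any(char::is_uppercase) {
                // Allocate only when the token actually changes.
                Cow::Owned(t.to_lowercase())
            } else {
                Cow::Borrowed(t)
            }
        })
        .collect()
}

fn main() {
    let tokens = lowercase_tokenizer("Hello, World! rust-lang");
    println!("{:?}", tokens);
}
```

Returning `Cow` lets tokens that are already lowercase borrow from the input instead of allocating. The same tokenizer would need to be used for both indexing and querying so that terms match.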

Testing

Run all tests with

cargo test

Benchmark

Run all benchmarks with

cargo bench

License

MIT
