Sunspot

Sunspot is a Ruby library for expressive, powerful interaction with the Solr search engine. Sunspot is built on top of the RSolr library, which provides a low-level interface for Solr interaction; Sunspot provides a simple, intuitive, expressive DSL backed by powerful features for indexing objects and searching for them.

Sunspot is designed to be easily plugged in to any ORM, or even non-database-backed objects such as the filesystem.

This README provides a high-level overview; class-by-class and method-by-method documentation is available in the API reference.

For questions about how to use Sunspot in your app, please use the Sunspot Mailing List or search Stack Overflow.

Quickstart with Rails

Add to Gemfile:

gem 'sunspot_rails'
gem 'sunspot_solr' # optional pre-packaged Solr distribution for use in development. Not for use in production.

Bundle it!

bundle install

Generate a default configuration file:

rails generate sunspot_rails:install

If sunspot_solr was installed, start the packaged Solr distribution with:

bundle exec rake sunspot:solr:start # or sunspot:solr:run to start in foreground

This will generate a /solr folder with default configuration files and indexes.

If you're using source control, it's recommended that the files generated for indexing and running (PIDs) are not checked in. You can do this by adding the following lines to .gitignore:

solr/data
solr/test/data
solr/development/data
solr/default/data
solr/pids

Setting Up Objects

Add a searchable block to the objects you wish to index.

class Post < ActiveRecord::Base
  searchable do
    text :title, :body
    text :comments do
      comments.map { |comment| comment.body }
    end

    boolean :featured
    integer :blog_id
    integer :author_id
    integer :category_ids, :multiple => true
    double  :average_rating
    time    :published_at
    time    :expired_at

    string  :sort_title do
      title.downcase.gsub(/^(an?|the)/, '')
    end
  end
end

text fields will be full-text searchable. Other fields (e.g., integer and string) can be used to scope queries.
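The block passed to `string :sort_title` is ordinary Ruby, so its effect can be tried outside Sunspot. The following is a minimal sketch in plain Ruby; `sort_title_original` and `sort_title_strict` are hypothetical helper names, and the stricter variant is only an assumption about what you might prefer, not something Sunspot provides.

```ruby
# Plain-Ruby sketch of the normalization used by the `sort_title` block.
# Both helper names are hypothetical; neither is part of Sunspot.

# Same expression as in the `searchable` block above.
def sort_title_original(title)
  title.downcase.gsub(/^(an?|the)/, '')
end

# Assumed stricter variant: strips the article only when it is a whole
# word followed by whitespace, and removes that whitespace too.
def sort_title_strict(title)
  title.downcase.sub(/\A(an?|the)\s+/, '')
end

puts sort_title_original('The Best Pizza').inspect # " best pizza"
puts sort_title_original('Another Slice').inspect  # "other slice"
puts sort_title_strict('The Best Pizza').inspect   # "best pizza"
puts sort_title_strict('Another Slice').inspect    # "another slice"
```

The original expression also trims a leading "a", "an", or "the" when it is merely the start of a longer word, which may or may not matter for your sort order.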

Nested Documents support

You can use the child_documents feature to add nested documents to other models, as such:

class Comment < ActiveRecord::Base
  searchable do
    # NOTE: :post_id is not necessary; Solr uses the '_root_' field to
    # refer to the parent document
    integer :post_id

    integer :author_id
    text    :body
    time    :published_at
  end
end

class Post < ActiveRecord::Base
  searchable do
    text :title, :body
    child_documents :comments # Must be of type Comment

    boolean :featured
    integer :blog_id
    integer :author_id
    integer :category_ids, :multiple => true
    double  :average_rating
    time    :published_at
    time    :expired_at

    string  :sort_title do
      title.downcase.gsub(/^(an?|the)/, '')
    end
  end
end

Please note, you should always use an Array of searchable documents in the child_documents field.

Searching Objects

Post.search do
  fulltext 'best pizza'

  with :blog_id, 1
  with(:published_at).less_than Time.now
  field_list :blog_id, :title
  order_by :published_at, :desc
  paginate :page => 2, :per_page => 15
  facet :category_ids, :author_id
end

Search In Depth

Given the Post class set up in the earlier steps:

Full Text

# All posts with a `text` field (:title, :body, or :comments) containing 'pizza'
Post.search { fulltext 'pizza' }

# Posts with pizza, scored higher if pizza appears in the title
Post.search do
  fulltext 'pizza' do
    boost_fields :title => 2.0
  end
end

# Posts with pizza, scored higher if featured
Post.search do
  fulltext 'pizza' do
    boost(2.0) { with(:featured, true) }
  end
end

# Posts with pizza *only* in the title
Post.search do
  fulltext 'pizza' do
    fields(:title)
  end
end

# Posts with pizza in the title (boosted) or in the body (not boosted)
Post.search do
  fulltext 'pizza' do
    fields(:body, :title => 2.0)
  end
end

Phrases

Solr allows searching for phrases: search terms that are close together.

In the default query parser used by Sunspot (edismax), phrase searches are represented as a double quoted group of words.

# Posts with the exact phrase "great pizza"
Post.search do
  fulltext '"great pizza"'
end

If specified, query_phrase_slop sets the number of words that may appear between the words in a phrase.

# One word can appear between the words in the phrase, so "great big pizza"
# also matches, in addition to "great pizza"
Post.search do
  fulltext '"great pizza"' do
    query_phrase_slop 1
  end
end

Phrase Boosts

Phrase boosts add boost to terms that appear in close proximity; the terms do not have to appear in a phrase, but if they do, the document will score more highly.

# Matches documents with great and pizza, and scores documents more
# highly if the terms appear in a phrase in the title field
Post.search do
  fulltext 'great pizza' do
    phrase_fields :title => 2.0
  end
end

# Matches documents with great and pizza, and scores documents more
# highly if the terms appear in a phrase (or with one word between them)
# in the title field
Post.search do
  fulltext 'great pizza' do
    phrase_fields :title => 2.0
    phrase_slop 1
  end
end

Scoping (Scalar Fields)

Fields not defined as text (e.g., integer, boolean, time, etc.) can be used to scope (restrict) queries before full-text matching is performed.

Positive Restrictions

# Posts with a blog_id of 1
Post.search do
  with(:blog_id, 1)
end

# Posts with an average rating between 3.0 and 5.0
Post.search do
  with(:average_rating, 3.0..5.0)
end

# Posts with a category of 1, 3, or 5
Post.search do
  with(:category_ids, [1, 3, 5])
end

# Posts published since a week ago
Post.search do
  with(:published_at).greater_than(1.week.ago)
end

Negative Restrictions

# Posts not in category 1 or 3
Post.search do
  without(:category_ids, [1, 3])
end

# All examples in "positive" also work negated using `without`

Empty Restrictions

# Passing an empty array is equivalent to a no-op, allowing you to replace this...
Post.search do
  with(:category_ids, id_list) if id_list.present?
end

# ...with this
Post.search do
  with(:category_ids, id_list)
end

Restrictions and Field List

# Posts with a blog_id of 1
Post.search do
  with(:blog_id, 1)
  field_list [:title]
end

Post.search do
  without(:category_ids, [1, 3])
  field_list [:title, :author_id]
end

Disjunctions and Conjunctions

# Posts that do not have an expired time or have not yet expired
Post.search do
  any_of do
    with(:expired_at).greater_than(Time.now)
    with(:expired_at, nil)
  end
end

# Posts with blog_id 1 and author_id 2
Post.search do
  all_of do
    with(:blog_id, 1)
    with(:author_id, 2)
  end
end

# Posts scoring with any of the two fields.
Post.search do
  any do
    fulltext "keyword1", :fields => :title
    fulltext "keyword2", :fields => :body
  end
end

Disjunctions and conjunctions may be nested:

Post.search do
  any_of do
    with(:blog_id, 1)
    all_of do
      with(:blog_id, 2)
      with(:category_ids, 3)
    end
  end

  any do
    all do
      fulltext "keyword", :fields => :title
      fulltext "keyword", :fields => :body
    end
    all do
      fulltext "keyword", :fields => :first_name
      fulltext "keyword", :fields => :last_name
    end
    fulltext "keyword", :fields => :description
  end
end

Combined with Full-Text

Scopes/restrictions can be combined with full-text searching. The scope/restriction pares down the objects that are searched for the full-text term.

# Posts with blog_id 1 and 'pizza' in the title
Post.search do
  with(:blog_id, 1)
  fulltext("pizza")
end

Nested Documents search with `ChildOf` and `ParentWhich`

You can search for child (or parent) documents based on filters applied on parents (or children).

`ChildOf` operator

Using this operator, you can search for child documents using a filter on the parents.

# Search all children that have a parent with the name specified below.
Sunspot.search(Child) do
  child_of(Parent) do
    with(:name, 'FirstName LastName')
  end
end

`ParentWhich` operator

Using this operator, you can search for parent documents using a filter on their children.

# Search all parents which have children
# that are between 12 and 17 years old.
Sunspot.search(Parent) do
  parent_which(Child) do
    with :age, 12..17
  end
end

Pagination

All results from Solr are paginated.

The results array that is returned has methods mixed in that allow it to operate seamlessly with common pagination libraries like will_paginate and kaminari.

By default, Sunspot requests the first 30 results from Solr.

search = Post.search do
  fulltext "pizza"
end

# Imagine there are 60 *total* results (at 30 results/page, that is two pages)
results = search.results # => Array with 30 Post elements

search.total           # => 60
results.total_pages    # => 2
results.first_page?    # => true
results.last_page?     # => false
results.previous_page  # => nil
results.next_page      # => 2
results.out_of_bounds? # => false
results.offset         # => 0

To retrieve the next page of results, recreate the search and use the paginate method.

search = Post.search do
  fulltext "pizza"
  paginate :page => 2
end

# Again, imagine there are 60 total results; this is the second page
results = search.results # => Array with 30 Post elements

search.total           # => 60
results.total_pages    # => 2
results.first_page?    # => false
results.last_page?     # => true
results.previous_page  # => 1
results.next_page      # => nil
results.out_of_bounds? # => false
results.offset         # => 30

A custom number of results per page can be specified with the :per_page option to paginate:

search = Post.search do
  fulltext "pizza"
  paginate :page => 1, :per_page => 50
end
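The values in the pagination examples above follow from simple arithmetic on the total, the page number, and the page size. As a rough illustration, the sketch below reproduces that arithmetic in plain Ruby; `PageInfo` is a hypothetical name used only here, and Sunspot's own result collection already provides these methods.

```ruby
# Plain-Ruby sketch of the pagination arithmetic shown in the examples.
# `PageInfo` is a hypothetical name, not a Sunspot class.
PageInfo = Struct.new(:total, :page, :per_page) do
  # Number of pages needed for all results (at least one page).
  def total_pages
    [(total + per_page - 1) / per_page, 1].max
  end

  # Index of the first result on this page.
  def offset
    (page - 1) * per_page
  end

  def first_page?
    page == 1
  end

  def last_page?
    page >= total_pages
  end

  def previous_page
    first_page? ? nil : page - 1
  end

  def next_page
    last_page? ? nil : page + 1
  end

  def out_of_bounds?
    page > total_pages
  end
end

second = PageInfo.new(60, 2, 30)
puts second.total_pages       # 2
puts second.offset            # 30
puts second.previous_page     # 1
puts second.next_page.inspect # nil
```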

Cursor-based pagination

Solr 4.7 and above

With default Solr pagination, the same records may appear on different pages (e.g., if many records have the same search score). Cursor-based pagination avoids this.

It is useful for exports, infinite scroll, and similar cases.

The cursor for the first page is "*".

search = Post.search do
  fulltext "pizza"
  paginate :cursor => "*"
end

results = search.results

# Results will contain cursor for the next page
results.next_page_cursor # => "AoIIP4AAACxQcm9maWxlIDEwMTk="

# Imagine there are 60 *total* results (at 30 results/page, that is two pages)
results.current_cursor # => "*"
results.total_pages    # => 2
results.first_page?    # => true
results.last_page?     # => false

To retrieve the next page of results, recreate the search and use the paginate method with cursor from previous results.

search = Post.search do
  fulltext "pizza"
  paginate :cursor => "AoIIP4AAACxQcm9maWxlIDEwMTk="
end

results = search.results

# Again, imagine there are 60 total results; this is the second page
results.next_page_cursor # => "AoEsUHJvZmlsZSAxNzY5"
results.current_cursor   # => "AoIIP4AAACxQcm9maWxlIDEwMTk="
results.total_pages      # => 2
results.first_page?      # => false

# The last page is detected only when the current page contains fewer
# than per_page elements or contains nothing
results.last_page?       # => false

:per_page option is also supported.
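For exports, the cursor is typically followed in a loop until the last page is detected. The sketch below shows only the shape of that loop in plain Ruby. `fetch_page` is a hypothetical stand-in for re-running the search with `paginate :cursor => cursor`; it slices an in-memory array and uses numeric offsets as fake cursors, whereas real Solr cursors are opaque strings.

```ruby
# Sketch of a cursor-following loop. `fetch_page`, `Page`, `RECORDS` and
# `PER_PAGE` are hypothetical; no Solr or Sunspot call is made here.
RECORDS = (1..70).to_a
PER_PAGE = 30

Page = Struct.new(:items, :next_page_cursor) do
  # Mirrors the rule above: a short or empty page is the last one.
  def last_page?(per_page)
    items.size < per_page
  end
end

# Stand-in for one search request; the fake cursor is just an offset.
def fetch_page(cursor, per_page)
  start = cursor == '*' ? 0 : Integer(cursor)
  items = RECORDS[start, per_page] || []
  Page.new(items, (start + items.size).to_s)
end

collected = []
cursor = '*' # cursor for the first page
loop do
  page = fetch_page(cursor, PER_PAGE)
  collected.concat(page.items)
  break if page.last_page?(PER_PAGE)

  cursor = page.next_page_cursor
end

puts collected.size # 70
```

Because the last page is only recognized by its size, a result set whose total is an exact multiple of the page size costs one extra, empty request.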

Faceting

Faceting is a feature of Solr that determines the number of documents that match a given search and an additional criterion. This allows you to build powerful drill-down interfaces for search.

Each facet returns zero or more rows, each of which represents a particular criterion conjoined with the actual query being performed. For field facets, each row represents a particular value for a given field. For query facets, each row represents an arbitrary scope; the facet itself is just a means of logically grouping the scopes.

By default Sunspot will only return the first 100 facet values. You can increase this limit, or force it to return all facets by setting limit to -1.
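To make the idea of facet rows concrete, here is a small plain-Ruby model of a field facet: it counts matching documents per distinct value and applies a limit, where -1 means no limit. `field_facet` and the sample documents are hypothetical illustrations; Solr does the real counting on the index.

```ruby
# Plain-Ruby sketch of what a field facet reports: one row per distinct
# value with the number of matching documents, capped by a limit.
# `field_facet` is a hypothetical helper, not a Sunspot method.
def field_facet(docs, field, limit: 100)
  counts = docs.each_with_object(Hash.new(0)) do |doc, acc|
    acc[doc[field]] += 1
  end
  # Highest counts first; ties are ordered by value.
  rows = counts.sort_by { |value, count| [-count, value] }
  limit.negative? ? rows : rows.first(limit)
end

matching_posts = [
  { author_id: 7, title: 'Pizza at home' },
  { author_id: 3, title: 'Best pizza in town' },
  { author_id: 7, title: 'Pizza dough basics' }
]

field_facet(matching_posts, :author_id).each do |value, count|
  puts "Author #{value} has #{count} pizza posts!"
end
```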

Field Facets

# Posts that match 'pizza' returning counts for each :author_id
search = Post.search do
  fulltext "pizza"
  facet :author_id
end

search.facet(:author_id).rows.each do |facet|
  puts "Author #{facet.value} has #{facet.count} pizza posts!"
end

If you are searching by a specific field and you still want to see all the options available in that field you can exclude it in the faceting.

# Posts that match 'pizza' and author with id 42
# Returning counts for each :author_id (even those not in the search result)
search = Post.search do
  fulltext "pizza"
  author_filter = with(:author_id, 42)
  facet :author_id, exclude: [author_filter]
end

search.facet(:author_id).rows.each do |facet|
  puts "Author #{facet.value} has #{facet.count} pizza posts!"
end

Query Facets

# Posts faceted by ranges of average ratings
search = Post.search do
  facet(:average_rating) do
    row(1.0..2.0) do
      with(:average_rating, 1.0..2.0)
    end
    row(2.0..3.0) do
      with(:average_rating, 2.0..3.0)
    end
    row(3.0..4.0) do
      with(:average_rating, 3.0..4.0)
    end
    row(4.0..5.0) do
      with(:average_rating, 4.0..5.0)
    end
  end
end

# e.g.,
# Number of posts with rating within 1.0..2.0: 2
# Number of posts with rating within 2.0..3.0: 1
search.facet(:average_rating).rows.each do |facet|
  puts "Number of posts with rating within #{facet.value}: #{facet.count}"
end

Range Facets

# Posts faceted by range of average ratings
Sunspot.search(Post) do
  facet :average_rating, :range => 1..5, :range_interval => 1
end

Json Facets

The json facet can be used with the following syntax:

Sunspot.search(Post) do
  json_facet(:title)
end

There are some options you can pass to the json facet:

:limit
:minimum_count
:sort
:prefix

Some examples

# limit the results to 10
Sunspot.search(Post) do
  json_facet(:title, limit: 10)
end

# returns only the results with a minimum count of 10
Sunspot.search(Post) do
  json_facet(:title, minimum_count: 10)
end

# sort by count
Sunspot.search(Post) do
  json_facet(:title, sort: :count)
end

# filter titles by prefix 't'
Sunspot.search(Post) do
  json_facet(:title, prefix: 't')
end

Json Facet Distinct

The json facet count distinct can be used with the following syntax:

# Get posts with distinct title
# available strategies: :unique, :hll
Sunspot.search(Post) do
  json_facet(:blog_id, distinct: { group_by: :title, strategy: :unique })
end

Json Facet nested

Nested facets can be used with the following syntax:

Sunspot.search(Post) do
  json_facet(:title, nested: { field: :author_name })
end

Nested facets can themselves be nested recursively:

Sunspot.search(Post) do
  json_facet(:title, nested: { field: :author_name, nested: { field: :title } })
end

Nested facets accept the same options as json facets.

BlockJoin Json Facet

You can use json_facet on children or parents, based on the data you are querying on.

`on_child` operator

Use this operator when searching on parent documents. Faceting is performed on child documents related to parents found in the query.

# Search for all books with specified title and facets
# on the timestamp of reviews.
# An additional filter on children can be specified inside the on_child operator.
Sunspot.search(Book) do
  fulltext 'awesome book title', fields: [:title]

  json_facet :review_date, block_join: (on_child(Review) do
    with(:review_date).greater_than(DateTime.parse('2015-01-01T00:00:00Z'))
  end)
end

`on_parent` operator

Use this operator when searching on child documents. Faceting is performed on parents of the children found in the query.

# Search for all reviews of a particular author.
# Perform faceting on the book category.
Sunspot.search(Review) do
  with :author, 'yonik'

  # An empty block means no additional filters: takes all parents
  # of the selected children.
  json_facet :category, block_join: on_parent(Book) {}
end

Ordering

By default, Sunspot orders results by "score": the Solr-determined relevancy metric. Sorting can be customized with the order_by method:

# Order by average rating, descending
Post.search do
  fulltext("pizza")
  order_by(:average_rating, :desc)
end

# Order by relevancy score and in the case of a tie, average rating
Post.search do
  fulltext("pizza")

  order_by(:score, :desc)
  order_by(:average_rating, :desc)
end

# Randomized ordering
Post.search do
  fulltext("pizza")
  order_by(:random)
end

Solr 3.1 and above

Solr supports sorting on multiple fields using custom functions. Supported operators and more details are available on the Solr Wiki.

To sort results by a custom function use the order_by_function method. Functions are defined with prefix notation:

# Order by sum of two example fields: rating1 + rating2
Post.search do
  fulltext("pizza")
  order_by_function(:sum, :rating1, :rating2, :desc)
end

# Order by nested functions: rating1 + (rating2*rating3)
Post.search do
  fulltext("pizza")
  order_by_function(:sum, :rating1, [:product, :rating2, :rating3], :desc)
end

# Order by fields and constants: rating1 + (rating2 * 5)
Post.search do
  fulltext("pizza")
  order_by_function(:sum, :rating1, [:product, :rating2, '5'], :desc)
end

# Order by average of three fields: (rating1 + rating2 + rating3) / 3
Post.search do
  fulltext("pizza")
  order_by_function(:div, [:sum, :rating1, :rating2, :rating3], '3', :desc)
end
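To check what a prefix-notation expression computes before using it for sorting, it can help to evaluate it by hand. The sketch below does that in plain Ruby for a single document, treating symbols as field references and strings as constants, with the trailing sort direction left out. `evaluate_function` and `OPERATORS` are hypothetical names covering only the three operators used above; Solr evaluates the real functions at query time.

```ruby
# Plain-Ruby sketch that evaluates the prefix-notation arrays used in the
# examples against one document. Hypothetical helper, not Sunspot API.
OPERATORS = {
  sum:     ->(args) { args.sum },
  product: ->(args) { args.reduce(1.0) { |acc, x| acc * x } },
  div:     ->(args) { args[0] / args[1] }
}.freeze

def evaluate_function(expr, doc)
  case expr
  when Array
    # [:operator, operand, operand, ...], operands may be nested arrays
    name, *operands = expr
    values = operands.map { |operand| evaluate_function(operand, doc) }
    OPERATORS.fetch(name).call(values)
  when Symbol
    Float(doc.fetch(expr)) # a field reference
  else
    Float(expr)            # a constant such as '5'
  end
end

doc = { rating1: 4, rating2: 3, rating3: 2 }

# rating1 + (rating2 * 5)
puts evaluate_function([:sum, :rating1, [:product, :rating2, '5']], doc)       # 19.0
# (rating1 + rating2 + rating3) / 3
puts evaluate_function([:div, [:sum, :rating1, :rating2, :rating3], '3'], doc) # 3.0
```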

Grouping

Solr 3.3 and above

Solr supports grouping documents, similar to an SQL GROUP BY. More information about result grouping/field collapsing is available on the Solr Wiki.

Grouping is only supported on string fields that are not multivalued. To group on a field of a different type (e.g., integer), add a denormalized string field:

class Post < ActiveRecord::Base
  searchable do
    # Denormalized `string` field because grouping can only be performed
    # on string fields
    string(:blog_id_str) { |p| p.blog_id.to_s }
  end
end

# Returns only the top scoring document per blog_id
search = Post.search do
  group :blog_id_str
end

search.group(:blog_id_str).matches # Total number of matches to the query

search.group(:blog_id_str).groups.each do |group|
  puts group.value # blog_id of each document in the group

  # By default, there is only one document per group (the highest
  # scoring one); if `limit` is specified (see below), multiple
  # documents can be returned per group
  group.results.each do |result|
    # ...
  end
end

Additional options are supported by the DSL:

# Returns the top 3 scoring documents per blog_id
Post.search do
  group :blog_id_str do
    limit 3
    ngroups false # If you don't need the total groups counter
  end
end

# Returns documents ordered within each group by average_rating (by
# default, the ordering is score)
Post.search do
  group :blog_id_str do
    order_by(:average_rating, :desc)
  end
end

# Facet count is based on the most relevant document of each group
# matching the query (>= Solr 3.4)
Post.search do
  group :blog_id_str do
    truncate
  end

  facet :blog_id_str, :extra => :any
end
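As a mental model for `group` and its `limit` option, the following plain-Ruby sketch keeps the top-scoring documents per group value. `group_top` and the sample data are hypothetical; Solr performs the real grouping, and this sketch ignores options such as `ngroups` and `truncate`.

```ruby
# Plain-Ruby model of field grouping: keep the top `limit` documents per
# group value, ranked by score. Hypothetical helper, not Sunspot API.
def group_top(docs, field, limit: 1)
  docs.group_by { |doc| doc[field] }
      .transform_values { |group| group.max_by(limit) { |doc| doc[:score] } }
end

posts = [
  { id: 1, blog_id_str: '1', score: 0.9 },
  { id: 2, blog_id_str: '1', score: 0.4 },
  { id: 3, blog_id_str: '2', score: 0.7 }
]

# One document per blog by default, like `group :blog_id_str`
group_top(posts, :blog_id_str).each do |value, results|
  puts "#{value}: #{results.map { |doc| doc[:id] }.inspect}"
end
```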

Grouping by Queries

It is also possible to group by arbitrary queries instead of on a specific field, much like using query facets instead of field facets. For example, we can group by average rating.

# Returns the top post for each range of average ratings
search = Post.search do
  group do
    query("1.0 to 2.0") do
      with(:average_rating, 1.0..2.0)
    end
    query("2.0 to 3.0") do
      with(:average_rating, 2.0..3.0)
    end
    query("3.0 to 4.0") do
      with(:average_rating, 3.0..4.0)
    end
    query("4.0 to 5.0") do
      with(:average_rating, 4.0..5.0)
    end
  end
end

search.group(:queries).matches # Total number of matches to the queries

search.group(:queries).groups.each do |group|
  puts group.value # The argument to query - "1.0 to 2.0", for example

  group.results.each do |result|
    # ...
  end
end

This can also be used to query multivalued fields, allowing a single item to be in multiple groups.

# This finds the top 10 posts for each category in category_ids.
search = Post.search do
  group do
    limit 10

    category_ids.each do |category_id|
      query category_id do
        with(:category_ids, category_id)
      end
    end
  end
end

Geospatial

Sunspot 2.0 only

Sunspot 2.0 supports geospatial features of Solr 3.1 and above.

Geospatial features require a field defined with latlon:

class Post < ActiveRecord::Base
  searchable do
    # ...
    latlon(:location) { Sunspot::Util::Coordinates.new(lat, lon) }
  end
end

Filter By Radius

# Searches posts within 100 kilometers of (32, -68)
Post.search do
  with(:location).in_radius(32, -68, 100)
end

Filter By Radius (inexact with bbox)

# Searches posts within 100 kilometers of (32, -68) with `bbox`. This is
# an approximation so searches run quicker, but it may include other
# points that are slightly outside of the required distance
Post.search do
  with(:location).in_radius(32, -68, 100, :bbox => true)
end
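To see why the bounding-box variant can return points slightly outside the requested distance, compare an exact great-circle test with a box that encloses the circle. The sketch below is a simplified plain-Ruby model and not Solr's implementation; all helper names and the spherical Earth radius of 6371 km are assumptions made for illustration.

```ruby
# Plain-Ruby sketch contrasting an exact radius test with a bounding-box
# approximation. Simplified model; every name here is hypothetical.
EARTH_RADIUS_KM = 6371.0

def to_rad(degrees)
  degrees * Math::PI / 180.0
end

# Great-circle distance between two points (haversine formula).
def distance_km(lat1, lon1, lat2, lon2)
  dlat = to_rad(lat2 - lat1)
  dlon = to_rad(lon2 - lon1)
  a = Math.sin(dlat / 2)**2 +
      Math.cos(to_rad(lat1)) * Math.cos(to_rad(lat2)) * Math.sin(dlon / 2)**2
  2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a))
end

def within_radius?(center, point, radius_km)
  distance_km(*center, *point) <= radius_km
end

# Box that encloses the circle: cheaper to test, but its corners lie
# outside the circle.
def within_bbox?(center, point, radius_km)
  lat_delta = radius_km / (EARTH_RADIUS_KM * Math::PI / 180.0)
  lon_delta = lat_delta / Math.cos(to_rad(center[0]))
  (point[0] - center[0]).abs <= lat_delta &&
    (point[1] - center[1]).abs <= lon_delta
end

center = [32.0, -68.0]
corner = [32.8, -67.0] # near a corner of the 100 km box

puts within_bbox?(center, corner, 100)   # true
puts within_radius?(center, corner, 100) # false
```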

Filter By Bounding Box

# Searches posts within the bounding box defined by the corners (45,
# -94) to (46, -93)
Post.search do
  with(:location).in_bounding_box([45, -94], [46, -93])
end

Sort By Distance

# Orders documents by closeness to (32, -68)
Post.search do
  order_by_geodist(:location, 32, -68)
end

Joins

Solr 4 and above

Solr joins allow you to filter objects by joining on additional documents. More information can be found on the Solr Wiki.

class Photo < ActiveRecord::Base
  searchable do
    text :description
    string :caption, :default_boost => 1.5
    time :created_at
    integer :photo_container_id
  end
end

class PhotoContainer < ActiveRecord::Base
  searchable do
    text :name
    join(:description, :target => Photo, :type => :text, :join => { :from => :photo_container_id, :to => :id })
    join(:caption, :target => Photo, :type => :string, :join => { :from => :photo_container_id, :to => :id })
    join(:photos_created, :target => Photo, :type => :time, :join => { :from => :photo_container_id, :to => :id }, :as => 'created_at_d')
  end
end

PhotoContainer.search do
  with(:caption, 'blah')
  with(:photos_created).between(Date.new(2011,3,1)..Date.new(2011,4,1))

  fulltext("keywords", :fields => [:name, :description])
end

# ...or

PhotoContainer.search do
  with(:caption, 'blah')
  with(:photos_created).between(Date.new(2011,3,1)..Date.new(2011,4,1))

  any do
    fulltext("keyword1", :fields => :name)
    fulltext("keyword2", :fields => :description) # will be joined from the Photo model
  end
end

If your models have fields with the same name, use the :prefix option to tell them apart:

class Tweet < ActiveRecord::Base
  searchable do
    text :keywords
    integer :profile_id
  end
end

class Rss < ActiveRecord::Base
  searchable do
    text :keywords
    integer :profile_id
  end
end

class Profile < ActiveRecord::Base
  searchable do
    text :name
    join(:keywords, :prefix => "tweet", :target => Tweet, :type => :text, :join => { :from => :profile_id, :to => :id })
    join(:keywords, :prefix => "rss", :target => Rss, :type => :text, :join => { :from => :profile_id, :to => :id })
  end
end

Profile.search do
  any do
    fulltext("keyword1 keyword2", :fields => [:tweet_keywords]) do
      minimum_match 1
    end

    fulltext("keyword3", :fields => [:rss_keywords])
  end
end

# ...produces:
# sort: "score desc", fl: "* score", start: 0, rows: 20,
# fq: ["type:Profile"],
# q: (_query_:"{!join from=profile_ids_i to=id_i v=$qTweet91755700}" OR _query_:"{!join from=profile_ids_i to=id_i v=$qRss91753840}"),
# qTweet91755700: _query_:"{!field f=type}Tweet"+_query_:"{!edismax qf='keywords_text' mm='1'}keyword1 keyword2",
# qRss91753840: _query_:"{!field f=type}Rss"+_query_:"{!edismax qf='keywords_text'}keyword3"

Composite ID

SolrCloud only

If you use the compositeId router (the default), you can send documents with a prefix in the document ID which will be used to calculate the hash Solr uses to determine the shard a document is sent to for indexing. The prefix can be anything you’d like it to be (it doesn’t have to be the shard name, for example), but it must be consistent so Solr behaves consistently.

For example, if you want to co-locate documents for a customer, you could use the customer name or ID as the prefix. If your customer is IBM, for example, with a document with the ID 12345, you would insert the prefix into the document id field: IBM!12345. The exclamation mark (!) is critical here, as it distinguishes the prefix used to determine which shard to direct the document to.

class Post < ActiveRecord::Base
  searchable do
    id_prefix "IBM!"
    # ...
  end
end

The compositeId router supports prefixes containing up to 2 levels of routing. For example: a prefix routing first by region, then by customer: USA!IBM!12345

class Post < ActiveRecord::Base
  searchable do
    id_prefix "USA!IBM!"
    # ...
  end
end
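The shape of a compositeId-style document ID can be illustrated without Solr. The sketch below joins up to two routing levels and a document ID with exclamation marks, matching the IBM!12345 and USA!IBM!12345 examples above. `composite_id` is a hypothetical helper; in Sunspot, the `id_prefix` option supplies the prefix for you, and the exact IDs Sunspot generates may differ from this simplified form.

```ruby
# Plain-Ruby sketch of assembling a compositeId-style document ID from
# routing levels. `composite_id` is hypothetical, not part of Sunspot.
def composite_id(routing_levels, document_id)
  if routing_levels.size > 2
    raise ArgumentError, 'compositeId routing supports at most 2 prefix levels'
  end
  if routing_levels.any? { |level| level.to_s.include?('!') }
    raise ArgumentError, "routing levels must not contain '!'"
  end

  # '!' separates each routing level from what follows it.
  (routing_levels + [document_id]).join('!')
end

puts composite_id(['IBM'], 12345)     # IBM!12345
puts composite_id(%w[USA IBM], 12345) # USA!IBM!12345
```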

Usage with Joins

This feature is also useful with joins, which require joined collections to be single-sharded. For example, if you have Blog and Post models and want to join fields from Posts when searching Blogs, you need these two collections to stay on the same shard. In this case the configuration would be:

class Blog < ActiveRecord::Base
  has_many :posts

  searchable do
    id_prefix "BLOGDATA!"
    # ...
  end
end

class Post < ActiveRecord::Base
  belongs_to :blog

  searchable do
    id_prefix "BLOGDATA!"
    # ...
  end
end

As a result, all Blogs and Posts will be stored on a single shard. But since other Blogs will generate other prefixes Solr will distribute them evenly across the available shards.

If you have large collections that you want to use joins with, and you still want to use sharding instead of storing everything on a single shard, you can ensure that only a single Blog and its associated Posts are stored on a single shard, while the collections as a whole remain distributed across multiple shards. Solr can do distributed joins across multiple shards, but the records that have to be joined must be stored on a single shard. To achieve this, your configuration would look like this:

class Blog < ActiveRecord::Base
  has_many :posts

  searchable do
    id_prefix do
      "BLOGDATA#{self.id}!"
    end
    # ...
  end
end

class Post < ActiveRecord::Base
  belongs_to :blog

  searchable do
    id_prefix do
      "BLOGDATA#{self.blog_id}!"
    end
    # ...
  end
end

This way a single Blog and its Posts have the same ID prefix and will go to a single shard.

NOTE: Solr developers also recommend adjusting replication factor so every shard node contains replicas of all shards in the cluster. If you have 4 shards on separate nodes each of these nodes should have 4 replicas (one replica of each shard).

More information and usage examples can be found here: https://lucene.apache.org/solr/guide/6_6/shards-and-indexing-data-in-solrcloud.html

Highlighting

Highlighting allows you to display snippets of the part of the document that matched the query.

The fields you wish to highlight must be stored.

class Post < ActiveRecord::Base
  searchable do
    # ...
    text :body, :stored => true
  end
end

Highlighting matches on the body field, for instance, can be achieved like:

search = Post.search do
  fulltext "pizza" do
    highlight :body
  end
end

# Will output something similar to:
# Post #1
#   I really love *pizza*
#   *Pizza* is my favorite thing
# Post #2
#   Pepperoni *pizza* is delicious
search.hits.each do |hit|
  puts "Post ##{hit.primary_key}"

  hit.highlights(:body).each do |highlight|
    puts "  " + highlight.format { |word| "*#{word}*" }
  end
end

Stats

Solr can return some statistics on indexed numeric fields. Fetching statistics for average_rating:

search = Post.search do
  stats :average_rating
end

puts "Minimum average rating: #{search.stats(:average_rating).min}"
puts "Maximum average rating: #{search.stats(:average_rating).max}"

Stats on multiple fields

search = Post.search do
  stats :average_rating, :blog_id
end

Faceting on stats

It's possible to facet field stats on another field:

search = Post.search do
  stats :average_rating do
    facet :featured
  end
end

search.stats(:average_rating).facet(:featured).rows.each do |row|
  puts "Minimum average rating for featured=#{row.value}: #{row.min}"
end

Take care when requesting facets on a stats field, since all facet results are returned by Solr!

Json facets stats

search = Post.search do
  stats :average_rating do
    json_facet :featured
  end
end

search.json_facet_stats(:featured).rows.each do |row|
  puts "Minimum average rating for featured=#{row.value}: #{row.min}"
end

BlockJoin Json Facet stats

You can perform statistics on block join facets using the json_facet feature.

For example, let's say we have Books as parent documents, and Reviews on those books as child documents.

We want to know the average star rating given by a particular user across all books published from 1984 onward.

search = Sunspot.search(Book) do
  with(:pub_year).greater_than(1983)

  # The :on parameter is needed here!
  # It must match the type specified in :block_join
  stats :stars, sort: :avg, on: Review do
    json_facet :author_name, block_join: (on_child(Review) do
      with :author_name, 'serious_reviewer1967'
    end)
  end
end

Solr will execute the query, selecting all Books with a pub_year of 1984 or later.

It then facets on the author_name values present in the Review documents that are children of the Books found. In this case, there will be just one facet.

Finally, it computes statistics on the generated facet.

Multiple stats and selective faceting

search = Post.search do
  stats :average_rating do
    facet :featured
  end
  stats :blog_id do
    facet :average_rating
  end
end

Functions

Functions in Solr make it possible to dynamically compute values for each document. This gives you more flexibility, since you are not limited to static values. For more details, please read the Function Query documentation.

Sunspot supports functions in two ways:

You can use functions to compute the boost dynamically:

# Posts with pizza, scored higher (by the square root of the promotion
# field) if is_promoted
Post.search do
  fulltext 'pizza' do
    boost(function { sqrt(:promotion) }) { with(:is_promoted, true) }
  end
end

You can also use functions for ordering (see the examples for order_by_function).

Atomic updates

Atomic Updates is a feature in Solr 4.0 that allows you to update at the field level rather than at the document level. This means that you can update individual fields without having to send the entire document, including the unchanged field values, to Solr. For more details, please read the Atomic Update documentation.

All fields of the model must be stored, otherwise non-stored values will be lost after an update.

class Post < ActiveRecord::Base
  searchable do
    # all fields stored
    text :body, :stored => true
    string :title, :stored => true
  end
end

post1 = Post.create #...
post2 = Post.create #...

# atomic update on class level
Post.atomic_update post1.id => {title: 'A New Title'}, post2.id => {body: 'A New Body'}

# atomic update on instance level
post1.atomic_update body: 'A New Body', title: 'Another New Title'
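The requirement that all fields be stored can be pictured with a simplified model: an atomic update rebuilds the document from whatever was stored, applies the changes, and re-indexes the result. The plain-Ruby sketch below follows that model. The `atomic_update` function here is a hypothetical stand-in that only shares its name with the Sunspot method, and the model glosses over how Solr actually performs the update.

```ruby
# Simplified plain-Ruby model of an atomic update: rebuild the document
# from its stored fields, then apply the changes. Hypothetical helper,
# not the Sunspot method of the same name.
def atomic_update(document, stored_fields, changes)
  document.slice(*stored_fields).merge(changes)
end

post = { title: 'Old Title', body: 'Old Body' }

# Both fields stored: the untouched body survives the update.
all_stored = atomic_update(post, %i[title body], { title: 'A New Title' })
puts all_stored.key?(:body) # true

# Body not stored: it cannot be rebuilt, so it is lost.
title_only = atomic_update(post, %i[title], { title: 'A New Title' })
puts title_only.key?(:body) # false
```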

More Like This

Sunspot can extract related items using more_like_this. When searching for similar items, you can pass a block with the following options:

fields :field_1[, :field_2, ...]
minimum_term_frequency ##
minimum_document_frequency ##
minimum_word_length ##
maximum_word_length ##
maximum_query_terms ##
boost_by_relevance true/false

class Post < ActiveRecord::Base
  searchable do
    # The :more_like_this option must be set to true
    text :body, :more_like_this => true
  end
end

post = Post.first

results = Sunspot.more_like_this(post) do
  fields :body
  minimum_term_frequency 5
end

To use more_like_this you need to have the MoreLikeThis handler enabled in solrconfig.xml.

An example handler looks like this:

<requestHandler class="solr.MoreLikeThisHandler" name="/mlt">
  <lst name="defaults">
    <str name="mlt.mintf">1</str>
    <str name="mlt.mindf">2</str>
  </lst>
</requestHandler>

Spellcheck

Solr supports spellchecking of search results against a dictionary. Sunspot supports turning on the spellchecker via the query DSL and parsing the response. Read the Solr docs for more information on how this all works inside Solr.

Solr's default spellchecking engine expects to use a dictionary comprised of values from an indexed field. This tends to work better than a static dictionary file, since it includes proper nouns in your index. The default in Sunspot's solrconfig.xml is textSpell (note that buildOnCommit isn't recommended in production):

<lst name="spellchecker">
  <str name="name">default</str>
  <!-- change field to textSpell and use copyField in schema.xml
       to spellcheck multiple fields -->
  <str name="field">textSpell</str>
  <str name="buildOnCommit">true</str>
</lst>

Define the textSpell field in your schema.xml.

<field name="textSpell" stored="false" type="textSpell" multiValued="true" indexed="true"/>

To get some data into your spellchecking field, you can use copyField in schema.xml:

<copyField source="*_text" dest="textSpell" />
<copyField source="*_s" dest="textSpell" />

copyField works before any analyzers you have set up on the source fields. You can add your own analyzer by customizing the textSpell field type in schema.xml:

    <fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

It's dangerous to add too much to this analyzer chain. It runs before words are inserted into the spellcheck dictionary, which means the suggestions that come back from Solr are post-analyzer. With the default above, all spelling suggestions will be lower-case.
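Because suggestions come back lower-cased, an application that wants to echo the user's capitalization has to re-apply it itself. The following plain-Ruby sketch shows one possible approach. `match_case` is a hypothetical helper, not part of Sunspot, and it only distinguishes all-caps, capitalized, and lower-case input.

```ruby
# Hypothetical helper (not part of Sunspot): re-apply the capitalization
# pattern of the user's original term to a lower-cased suggestion.
def match_case(original, suggestion)
  if original.length > 1 && original == original.upcase
    suggestion.upcase        # all-caps input, e.g. "COFEE"
  elsif original.match?(/\A[[:upper:]]/)
    suggestion.capitalize    # capitalized input, e.g. "Cofee"
  else
    suggestion               # anything else is returned unchanged
  end
end

puts match_case('Cofee', 'coffee')
puts match_case('COFEE', 'coffee')
```

Mixed-case input such as "iPhone" is not handled; a real application would need to decide what to do with it.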

Once Solr is configured, you can turn on spellchecking for a given query using the query DSL (see spellcheck_spec.rb for more examples):

    search = Sunspot.search(Post) do
      keywords 'Cofee'
      spellcheck :count => 3
    end

Access the suggestions via the spellcheck_suggestions or spellcheck_suggestion_for (for just the top one) methods:

    search.spellcheck_suggestion_for('cofee') # => 'coffee'

    search.spellcheck_suggestions # => [{word: 'coffee', freq: 10}, {word: 'toffee', freq: 1}]

If you've turned on collation, you can also get that result:

    search = Sunspot.search(Post) do
      keywords 'Cofee market'
      spellcheck :count => 3
    end

    search.spellcheck_collation # => 'coffee market'

Indexes In Depth

TODO

Index-Time Boosts

To specify that a field should be boosted in relation to other fields for all queries, you can specify the boost at index time:

    class Post < ActiveRecord::Base
      searchable do
        text :title, :boost => 5.0
        text :body
      end
    end

Stored Fields

With schema.xml version 1.6, useDocValuesAsStored is true by default. This means that, with little effort, you can keep an original (untokenized, unanalyzed) version of a field's contents in Solr.

Stored fields allow data to be retrieved without also hitting the underlying database (usually an SQL server). Note that a value returned through DocValues is not the same as a value that is actually stored in the index: if you want to use highlighting, more_like_this queries, or atomic updates, remember to adjust schema.xml accordingly.

Stored fields (stored="true" in the schema) come at some performance cost in the Solr index, so use them wisely.

    class Post < ActiveRecord::Base
      searchable do
        text :body, :stored => true
      end
    end

    # Retrieving stored contents without hitting the database
    Post.search.hits.each do |hit|
      puts hit.stored(:body)
    end

Note that when stored fields are declared, all of them are retrieved from Solr on every search, even if you don't need them. You can reduce the set of stored fields returned by using field lists, or you can skip them entirely:

    Post.search do
      without_stored_fields
    end

Hits vs. Results

Sunspot simply stores the type and primary key of objects in Solr. When results are retrieved, those primary keys are used to load the actual object (usually from an SQL database).

    # Using #results pulls in the records from the object-relational
    # mapper (e.g., ActiveRecord + a SQL server)
    Post.search.results.each do |result|
      puts result.body
    end

To access information about the results without querying the underlying database, use hits:

    # Using #hits gives back all information requested from Solr, but does
    # not load the object from the object-relational mapper
    Post.search.hits.each do |hit|
      puts hit.stored(:body)
    end

If you need both the result (the ORM-loaded object) and the hit (e.g., for faceting or highlighting), you can use the convenience method each_hit_with_result:

    Post.search.each_hit_with_result do |hit, result|
      # ...
    end
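The relationship between hits and results can be pictured with a small plain-Ruby model. This is only an illustration of the idea described above (identifiers come back from the search engine, records are loaded afterwards and paired with them). `Hit`, `Post`, and `DATABASE` here are hypothetical stand-ins, not Sunspot or ActiveRecord classes.

```ruby
# Plain-Ruby illustration only; Hit, Post, and DATABASE are hypothetical
# stand-ins, not Sunspot or ActiveRecord classes.
Hit  = Struct.new(:class_name, :primary_key, :score)
Post = Struct.new(:id, :body)

# Stand-in for the SQL database, keyed by primary key.
DATABASE = {
  1 => Post.new(1, 'First post'),
  2 => Post.new(2, 'Second post'),
  3 => Post.new(3, 'Third post')
}.freeze

# What comes back from the search engine: type, primary key, and score,
# already in ranking order.
hits = [Hit.new('Post', 3, 1.8), Hit.new('Post', 1, 0.9)]

# Load every record in one batch lookup, then pair each hit with its
# record while preserving the ranking order.
records = DATABASE.values_at(*hits.map(&:primary_key))
pairs = hits.zip(records)

pairs.each do |hit, result|
  puts "#{hit.score} #{result.body}"
end
```

Working with hits alone skips the batch lookup entirely, which is why it avoids the database.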

Reindexing Objects

If you are using Rails, objects are automatically indexed to Solr as a part of the save callbacks.

There are a number of ways to index manually within Ruby:

    # On a class itself
    Person.reindex
    Sunspot.commit # or commit(true) for a soft commit (Solr4)

    # On mixed objects
    Sunspot.index [post1, item2]
    Sunspot.index person3
    Sunspot.commit # or commit(true) for a soft commit (Solr4)

    # With autocommit
    Sunspot.index! [post1, item2, person3]

If you make a change to the object's "schema" (code in the searchable block), you must reindex all objects so the changes are reflected in Solr:

    bundle exec rake sunspot:reindex

    # or, to be specific to a certain model with a certain batch size:
    bundle exec rake sunspot:reindex[500,Post] # some shells will require escaping [ with \[ and ] with \]

    # to skip the prompt asking you if you want to proceed with the reindexing:
    bundle exec rake sunspot:reindex[,,true] # some shells will require escaping [ with \[ and ] with \]
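When the reindex command is assembled from a script rather than typed by hand, the bracket escaping mentioned above can be left to Ruby's standard `Shellwords` library. The sketch below only builds and prints the command string; it does not run rake.

```ruby
require 'shellwords'

# Build the rake task string with its bracketed arguments, then join the
# words into a shell-safe command line. Shellwords backslash-escapes the
# [ and ] characters, which some shells would otherwise treat as a glob.
task = 'sunspot:reindex[500,Post]'
command = ['bundle', 'exec', 'rake', task].shelljoin

puts command
```

This prints the same `\[` and `\]` form that the comments above describe.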

Use Without Rails

TODO

Threading

The default Sunspot Session is not thread-safe. If used in a multi-threaded environment (such as Sidekiq), you should configure Sunspot to use the ThreadLocalSessionProxy:

Sunspot.session = Sunspot::SessionProxy::ThreadLocalSessionProxy.new

Within a Rails app, to ensure your config/sunspot.yml settings are properly applied to this session, you can use Sunspot::Rails.build_session to mirror the normal Sunspot setup process:

    session = Sunspot::Rails.build_session Sunspot::Rails::Configuration.new
    Sunspot.session = session
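The general idea behind a thread-local proxy can be sketched in plain Ruby: each thread lazily gets its own session object, so no session is ever shared between threads. This is an illustration of the concept only, not Sunspot's implementation; `FakeSession` and `ThreadLocalSketch` are hypothetical names.

```ruby
# Plain-Ruby illustration of the thread-local idea; this is NOT Sunspot's
# implementation. FakeSession and ThreadLocalSketch are hypothetical names.
FakeSession = Struct.new(:owner)

class ThreadLocalSketch
  def initialize
    # A per-proxy key so that separate proxies do not share sessions.
    @key = :"thread_local_sketch_#{object_id}"
  end

  # Lazily build one session per thread and reuse it on later calls.
  # Note: Thread#[] storage is local to the current fiber of the thread.
  def session
    Thread.current[@key] ||= FakeSession.new(Thread.current)
  end
end

proxy = ThreadLocalSketch.new
main_session  = proxy.session
other_session = Thread.new { proxy.session }.value

puts main_session.equal?(proxy.session)   # same thread, same session object
puts main_session.equal?(other_session)   # a different thread got its own
```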

Manually Adjusting Solr Parameters

To add or modify parameters sent to Solr, use adjust_solr_params:

    Post.search do
      adjust_solr_params do |params|
        params[:q] += " AND something_s:more"
      end
    end
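The block above treats the parameters as a plain Hash, and `+=` on a `nil` value raises a `NoMethodError`. As a defensive sketch, assuming `:q` could be absent or empty in some query, a small helper can append the clause safely. `append_clause` is a hypothetical helper, not part of Sunspot, and the sample hash stands in for the real parameters.

```ruby
# Hypothetical helper (not part of Sunspot): append a raw clause to the
# :q parameter, tolerating a missing or empty query string.
def append_clause(params, clause)
  existing = params[:q].to_s
  params[:q] = existing.empty? ? clause : "#{existing} AND #{clause}"
  params
end

# Sample hash standing in for the parameters passed to the block.
params = { q: 'title_text:coffee', rows: 30 }
append_clause(params, 'something_s:more')
puts params[:q]
```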

Session Proxies

TODO

Type Reference

The following FieldTypes are used in Sunspot. sunspot_solr creates a schema.xml file inside the project, which can be used as a FieldType reference.

Configuration

Configure Sunspot by creating a config/sunspot.yml file or by setting a SOLR_URL or a WEBSOLR_URL environment variable. The defaults are as follows.

    development:
      solr:
        hostname: localhost
        port: 8982
        log_level: INFO

    test:
      solr:
        hostname: localhost
        port: 8981
        log_level: WARNING
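To check how these defaults nest, the same structure can be loaded with Ruby's standard `YAML` library. This sketch is independent of Sunspot and embeds the YAML inline rather than reading config/sunspot.yml; `solr_host_and_port` is a hypothetical helper.

```ruby
require 'yaml'

# The defaults shown above, embedded inline for the sake of the example.
defaults = <<~YAML
  development:
    solr:
      hostname: localhost
      port: 8982
      log_level: INFO
  test:
    solr:
      hostname: localhost
      port: 8981
      log_level: WARNING
YAML

config = YAML.safe_load(defaults)

# Hypothetical helper: look up host and port for one environment.
# Hash#fetch raises a KeyError if the environment or a key is missing.
def solr_host_and_port(config, env)
  solr = config.fetch(env).fetch('solr')
  "#{solr.fetch('hostname')}:#{solr.fetch('port')}"
end

puts solr_host_and_port(config, 'development')
puts solr_host_and_port(config, 'test')
```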

You may want to use SSL for production environments with a username and password. For example, set SOLR_URL to https://username:password@production.solr.example.com/solr.
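A URL in this form packs the scheme, credentials, host, and path into one string. As a sanity check, the pieces can be inspected with Ruby's standard `URI` library. This sketch is independent of Sunspot and uses the example URL above; in a real app the value would come from `ENV['SOLR_URL']`.

```ruby
require 'uri'

# Example value from the text above; a real app would read ENV['SOLR_URL'].
solr_url = 'https://username:password@production.solr.example.com/solr'

uri = URI.parse(solr_url)

puts uri.scheme   # https
puts uri.host     # production.solr.example.com
puts uri.port     # 443, the default HTTPS port, since none was given
puts uri.path     # /solr
puts uri.user     # username
```

Characters such as `@` or `/` in a username or password need to be percent-encoded for the URL to parse as intended.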

You can examine the value of Sunspot::Rails.configuration at runtime.

Running Solr in production environment

The sunspot_solr gem is a convenient way to start working with Solr in development. However, it is not suitable for production use. Below are some options for deploying Solr:

Standalone or
Docker Solr setup (also a good alternative for development)
Chef (can be used with Solr 7 as well)
Ansible
Kubernetes: this deploys a Zookeeper cluster, so you will need to convert cores to collections in order to use it.

You can also use Docker Solr for development which, regardless of how you deploy in production, will let you match the version you have deployed in production with the version you develop against. This can simplify maintenance of your cores. See the examples directory for a suitable starting point for a core you can use.

You can run Solr in a Docker container with the following commands:

    docker pull solr:7.7.2
    docker run -p 8983:8983 solr:7.7.2 # Add -d to run it in the background

Or in a docker-compose environment:

    solr:
      image: solr:7.7.2
      ports:
        - "8983:8983"
      volumes:
        - ./solr/init:/docker-entrypoint-initdb.d/
        - data:/opt/solr/server/solr/mycores
      restart: unless-stopped

Here the ./solr/init directory contains a shell script that does any initial setup, such as downloading and unzipping your cores. In both cases, the Solr image expects cores to be placed in /opt/solr/server/solr/mycores by default.

Development

Running Tests

To run all the specs just call rake from the library root folder. To run specs related to individual gems, consider using one of the following commands:

    GEM=sunspot ci/travis.sh
    GEM=sunspot_rails ci/travis.sh
    GEM=sunspot_solr ci/travis.sh

To run the tests using Solr Cloud:

    SOLR_MODE=cloud GEM=sunspot ci/travis.sh
    SOLR_MODE=cloud GEM=sunspot_rails ci/travis.sh
    SOLR_MODE=cloud GEM=sunspot_solr ci/travis.sh

Generating Documentation

Install the yard and redcarpet gems:

$ gem install yard redcarpet

Uninstall the rdiscount gem, if installed:

$ gem uninstall rdiscount

Generate the documentation from the topmost directory:

$ yardoc -o docs */lib/**/*.rb - README.md

Tutorials and Articles

Using Sunspot, Websolr, and Solr on Heroku (mrdanadams)
Full Text Searching with Solr and Sunspot (Collective Idea)
Full-text search in Rails with Sunspot (Tropical Software Observations)
Sunspot: A Solr-Powered Search Engine for Ruby (Linux Magazine)
Sunspot Showed Me the Light (ben koonse)
RubyGems.org — A case study in upgrading to full-text search (Websolr)
How to Implement Spatial Search with Sunspot and Solr (Code Quest)
Sunspot 1.2 with Spatial Solr Plugin 2.0 (joelmats)
rails3 + heroku + sunspot : madness (anhaminha)
heroku + websolr + sunspot (Onemorecloud)
How to get full text search working with Sunspot (Hobo Cookbook)
Full text search with Sunspot in Rails (hemju)
Using Sunspot for Free-Text Search with Redis (While I Pondered...)
Default scope with Sunspot (Cloudspace)
Index External Models with Sunspot/Solr (Medihack)
Testing with Sunspot and Cucumber (Collective Idea)
The Saga of the Switch (mrb -- includes comparison of Sunspot and Ultrasphinx)
Conditional Indexing with Sunspot (mikepack)
Introduction to Full Text Search for Rails Developers (Valve's)

