SOLVING TEXT SEARCH PROBLEMS WITH RUBY ON RAILS by Andrii Gladkyi
INTRODUCTION
ME • 15 years of dev experience • Desktop (Delphi/C#/C++) • Web (Ruby/JavaScript) • 5 Ruby projects having text search solutions
AGENDA • Full text search • Phrase match • Filters/Facets
FULL TEXT SEARCH • Search for a phrase within a document • Partial matches • Order by relevance • Phrase highlights • Similar matches • Typo correction
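To make "partial matches" and "order by relevance" concrete, here is a minimal sketch of an in-memory inverted index. The `TinyIndex` class and its scoring (a plain count of matching term occurrences) are assumptions made for illustration; real engines use stemming, stop words and more elaborate ranking.

```ruby
# Toy full-text index: maps each term to the documents containing it and
# ranks matches by the number of query-term occurrences they contain.
class TinyIndex
  def initialize
    # term => { doc_id => occurrences }
    @postings = Hash.new { |hash, term| hash[term] = Hash.new(0) }
  end

  def add(doc_id, text)
    tokenize(text).each { |term| @postings[term][doc_id] += 1 }
  end

  # Returns ids of documents matching at least one query term,
  # highest score first (ties broken by id).
  def search(query)
    scores = Hash.new(0)
    tokenize(query).each do |term|
      next unless @postings.key?(term)
      @postings[term].each { |doc_id, count| scores[doc_id] += count }
    end
    scores.sort_by { |doc_id, score| [-score, doc_id] }.map(&:first)
  end

  private

  def tokenize(text)
    text.downcase.scan(/[a-z0-9]+/)
  end
end

index = TinyIndex.new
index.add(1, 'Ruby on Rails text search')
index.add(2, 'Text search with text indexes')
index.add(3, 'Cooking recipes')

ranked = index.search('text search') # document 2 mentions "text" twice, so it ranks first
```

A document that contains only some of the query words still appears in the result, just with a lower score.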
SOLUTIONS • RDBMS search • Sphinx • ElasticSearch
RDBMS SEARCH • No external dependencies • Relatively slow • Provides only basic FTS features (PostgreSQL)
EXAMPLE
Requirement: Search within document's title and content
Implementation:
ALTER TABLE documents ADD COLUMN fts_col tsvector;
CREATE INDEX fts_idx ON documents USING GIN (fts_col);
UPDATE documents SET fts_col = to_tsvector(title || ' ' || content);
SELECT * FROM documents WHERE fts_col @@ plainto_tsquery('text to find');
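PostgreSQL's to_tsquery expects the search terms to be joined by operators such as &, whereas plainto_tsquery accepts free-form text and inserts the operators itself. A minimal sketch of preparing user input on the Ruby side follows; the helper name `tsquery_expression` is hypothetical.

```ruby
# Turns free-form user input into a to_tsquery expression by AND-ing the words.
# Everything except letters and digits is dropped, so tsquery operators typed
# by the user cannot break the query syntax.
def tsquery_expression(input)
  input.downcase.scan(/[[:alnum:]]+/).join(' & ')
end

expression = tsquery_expression('text to find')

# Possible use with the Document model (assumes the fts_col column exists):
#   Document.where('fts_col @@ to_tsquery(?)', expression)
```

Passing the expression as a bind parameter keeps the user input out of the SQL string itself.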
PG_SEARCH GEM github.com/casecommons/pg_search
PG_SEARCH HIGHLIGHTS • Actively maintained • 99% test coverage • Dependencies: ActiveRecord 4.2+, ActiveSupport • Single/multimodel search
QUICK START Single model search
class Document < ActiveRecord::Base
  include PgSearch
  pg_search_scope :search_full_text, against: { title: 'A', content: 'B' }
end

Document.search_full_text('text to find')
REVIEW Pros: • Simple setup • No external dependencies • AR-compatible output • PostgreSQL extensions • Order by relevance Cons: • PostgreSQL only :) • Multimodel indexes need to be rebuilt • Only basic FTS features
SPHINX SEARCH ENGINE • RDBMS connections • MySQL storage engine option • SQL-like queries • Facets
THINKING_SPHINX GEM github.com/pat/thinking-sphinx
THINKING_SPHINX HIGHLIGHTS • Very mature (~10 years) project • Supports ActiveRecord 3.1+ • Well documented • Requires mysql gem to be installed
QUICK START
# Index definition
ThinkingSphinx::Index.define :document, with: :real_time do
  indexes title
  indexes content
end

# In the Document model
after_save ThinkingSphinx::RealTime.callback_for(:document)

# Shell
rake ts:regenerate

# Search
Document.search('text to find')
REVIEW Pros: • Field weights • Facets • Advanced filters • Different indexing strategies (realtime and SQL) • Deltas for SQL-backed indexes Cons: • Delta indexes may cause data inconsistency
ELASTICSEARCH ENGINE • REST HTTP interface • Scalable • Aggregations • Powerful mappings
SEARCHKICK GEM github.com/ankane/searchkick
SEARCHKICK HIGHLIGHTS • Developed for own needs • AR-like query language • Supports ActiveModel 4.1+ • Zero downtime reindex
QUICK START
class Document < ActiveRecord::Base
  searchkick
end

Document.reindex
Document.search('text to find')
REVIEW Pros: • Tons of features • Bulk document updates • Autocomplete • Facets Cons: • Very opinionated development • Documentation issues • Default setup doesn't match practical requirements, so reconfiguration is a must
SEARCHKICK ALTERNATIVES github.com/elastic/elasticsearch-rails github.com/toptal/chewy
CONCLUSION • Use RDBMS search for simple search within a small, well-defined set of documents • Need to scale or want advanced features? Try ElasticSearch or Sphinx
PHRASE MATCH • Search by exact name • No irrelevant matches
RANSACK GEM github.com/activerecord-hackery/ransack
RANSACK HIGHLIGHTS • Works on top of RDBMS search/filtering • Case insensitive match by default • Able to build search forms • Rails 3-5.1 compatible
QUICK START
def index
  @q = Document.ransack(params[:q])
  @documents = @q.result(distinct: true)
end

<%= search_form_for @q do |f| %>
  <%= f.label :title_cont %>
  <%= f.search_field :title_cont %>
  <%= f.submit %>
<% end %>

Executes:
SELECT * FROM documents WHERE title ILIKE '%title%';
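The "contains" match boils down to wrapping the search term in % wildcards. The sketch below shows how such a pattern could be built by hand with the wildcard characters escaped; `contains_pattern` is a hypothetical helper for illustration, not Ransack's own code.

```ruby
# Builds the pattern for a "contains" match. LIKE wildcards (% and _) and the
# escape character itself are escaped so that they are matched literally.
def contains_pattern(term)
  escaped = term.gsub(/[\\%_]/) { |char| "\\#{char}" }
  "%#{escaped}%"
end

pattern = contains_pattern('title')

# Possible use with the Document model:
#   Document.where('title ILIKE ?', pattern)
```

Without the escaping step, a user searching for "50%" would match any title containing "50" followed by anything. In a Rails application, `sanitize_sql_like` covers the same escaping.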
MATCHERS
HOW TO SPEED UP THE SEARCH? (in PostgreSQL)
HOW TO SPEED UP THE SEARCH? (in PostgreSQL + Ruby on Rails)
class IndexDocuments < ActiveRecord::Migration[5.1]
  def change
    enable_extension 'pg_trgm'
    add_index(:documents, 'title gin_trgm_ops', using: :gin)
  end
end
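A B-tree index cannot serve a pattern with a leading wildcard such as '%title%', while a trigram index can, because it indexes every three-character slice of the text. The following pure-Ruby sketch approximates what pg_trgm extracts; the helper names are hypothetical, and the padding rule (two leading spaces, one trailing space per word) is stated here as an approximation of the extension's behavior.

```ruby
require 'set'

# Approximates pg_trgm trigram extraction: lowercase the text, split it into
# words, pad each word with two leading spaces and one trailing space, then
# collect every three-character slice.
def trigrams(text)
  text.downcase.scan(/[[:alnum:]]+/).each_with_object(Set.new) do |word, set|
    padded = "  #{word} "
    (0..padded.length - 3).each { |i| set << padded[i, 3] }
  end
end

# Share of trigrams that two strings have in common (shared / total distinct).
def similarity(a, b)
  ta = trigrams(a)
  tb = trigrams(b)
  union = (ta | tb).size
  union.zero? ? 0.0 : (ta & tb).size.to_f / union
end

cat_trigrams = trigrams('cat') # "  c", " ca", "cat", "at "
```

The same trigram sets are what make "similar matches" possible: two titles that differ by a typo still share most of their trigrams.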
REVIEW Pros: • Search/filter forms • Simple AR-compatible interface • Sort helpers • Powerful matches • Good documentation Cons: • Memory consumption issues • No out-of-box ranking (may be implemented manually)
FACETS • Hide irrelevant facets • Count documents • Filter by facet value
FACETS IMPLEMENTATION • Sphinx, ElasticSearch - built in • Must be implemented manually with RDBMS search
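A manual facet implementation comes down to two operations: counting documents per facet value and filtering by the selected values. Below is a minimal sketch over in-memory data; the sample documents, attribute names and helper names are all hypothetical.

```ruby
# Hypothetical documents standing in for database rows.
DOCUMENTS = [
  { title: 'Rails guide',   category: 'ruby',   year: 2017 },
  { title: 'Sinatra intro', category: 'ruby',   year: 2016 },
  { title: 'Sphinx setup',  category: 'search', year: 2017 },
  { title: 'Trigram tips',  category: 'search', year: 2017 }
].freeze

# Counts documents per value of the given attribute.
def facet_counts(documents, attribute)
  documents.each_with_object(Hash.new(0)) { |doc, counts| counts[doc[attribute]] += 1 }
end

# Keeps only the documents that match every selected facet value.
def filter_by_facets(documents, selected)
  documents.select { |doc| selected.all? { |attribute, value| doc[attribute] == value } }
end

from_2017 = filter_by_facets(DOCUMENTS, year: 2017)
category_counts = facet_counts(from_2017, :category)
```

Because the counts are computed over the already filtered set, facet values with no remaining documents never show up, which takes care of hiding irrelevant facets. With ActiveRecord, a grouped count such as `Document.group(:category).count` returns a similar hash straight from the database.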
CONCLUSIONS
QUESTIONS
