Elasticsearch And Ruby
Karel Minařík
http://karmi.cz Elasticsearch and Ruby
{elasticsearch in a nutshell}
‣ Built on top of Apache Lucene
‣ Searching and analyzing big data
‣ Scalability
‣ REST API, JSON DSL
‣ Great fit for dynamic languages and web-oriented workflows / architectures
http://www.elasticsearch.org
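Because the API is plain HTTP with JSON bodies, a search request can be assembled with Ruby's standard library alone. A minimal sketch, assuming a node at http://localhost:9200 and an `articles` index (both illustrative); the helper `build_search_request` is hypothetical, and the request is only built, not sent.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Hypothetical helper: build (but do not send) a POST to <index>/_search.
def build_search_request(index, query_hash, base_url: 'http://localhost:9200')
  uri = URI("#{base_url}/#{index}/_search")
  request = Net::HTTP::Post.new(uri)
  request['Content-Type'] = 'application/json'
  request.body = JSON.generate(query_hash)   # the JSON DSL is just a Hash
  [uri, request]
end

uri, request = build_search_request('articles', { query: { match: { title: 'conference' } } })
puts request.method   # POST
puts uri.path         # /articles/_search
puts request.body     # {"query":{"match":{"title":"conference"}}}

# Sending it requires a running node:
# response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
```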
It all started in this gist… (< 200 LOC)
Example
class Results
  include Enumerable

  attr_reader :query, :curl, :time, :total, :results, :facets

  def initialize(search)
    response = JSON.parse( Slingshot.http.post("http://localhost:9200/#{search.indices}/_search", search.to_json) )
    @query   = search.to_json
    @curl    = %Q|curl -X POST "http://localhost:9200/#{search.indices}/_search?pretty" -d '#{@query}'|
    @time    = response['took']
    @total   = response['hits']['total']
    @results = response['hits']['hits']
    @facets  = response['facets']
  end

  def each(&block)
    @results.each(&block)
  end
end
Elasticsearch plays nicely with Ruby…
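To see why `include Enumerable` plus a single `each` goes a long way, here is a stripped-down variant of the class above that takes an already parsed response Hash instead of performing the HTTP call. A sketch only; the sample response is made up, shaped like an elasticsearch `_search` reply.

```ruby
require 'json'

# Variant of Results that receives a parsed response (no HTTP involved).
class Results
  include Enumerable

  attr_reader :time, :total, :results, :facets

  def initialize(response)
    @time    = response['took']
    @total   = response['hits']['total']
    @results = response['hits']['hits']
    @facets  = response['facets']   # nil when the response has no facets
  end

  # Enumerable derives map, select, first, count, ... from #each.
  def each(&block)
    @results.each(&block)
  end
end

# Hypothetical response body.
sample = JSON.parse(<<~JSON)
  {
    "took": 3,
    "hits": {
      "total": 2,
      "hits": [
        { "_id": "1", "_source": { "title": "Ruby conference" } },
        { "_id": "2", "_source": { "title": "Python conference" } }
      ]
    }
  }
JSON

results = Results.new(sample)
titles = results.map { |hit| hit['_source']['title'] }
puts titles.inspect
puts results.total
```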
elasticsearch’s Query DSL
curl -X POST "http://localhost:9200/articles/_search?pretty=true" -d '
{
  "query" : {
    "filtered" : {
      "filter" : {
        "range" : {
          "date" : {
            "from" : "2012-01-01",
            "to"   : "2012-12-31"
          }
        }
      },
      "query" : {
        "bool" : {
          "must" : [
            {
              "terms" : {
                "tags" : [ "ruby", "python" ]
              }
            },
            {
              "match" : {
                "title" : {
                  "query" : "conference",
                  "boost" : 10.0
                }
              }
            }
          ]
        }
      }
    }
  }
}'
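The same request body can be written as a Ruby Hash and serialized with the `json` library. A Hash, like most JSON parsers, keeps only one value per key, which is why several `must` clauses belong in an array rather than under a repeated key. A sketch reusing the slide's index and field names; note that the `filtered` query reflects elasticsearch versions of that era and was later replaced by a `bool` query with a `filter` clause.

```ruby
require 'json'

query = {
  query: {
    filtered: {
      filter: {
        range: { date: { from: '2012-01-01', to: '2012-12-31' } }
      },
      query: {
        bool: {
          # Several clauses of one kind go into an array; repeating the
          # "must" key would keep only the last clause.
          must: [
            { terms: { tags: %w[ruby python] } },
            { match: { title: { query: 'conference', boost: 10.0 } } }
          ]
        }
      }
    }
  }
}

body = JSON.pretty_generate(query)
puts body
```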
Example
Tire.search('articles') do
  query do
    boolean do
      must { terms :tags, ['ruby', 'python'] }
      must { string 'published_on:[2011-01-01 TO 2011-01-02]' }
    end
  end
end
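One way such a block DSL can work is to evaluate each block with `instance_eval` on a small builder that accumulates a Hash. The sketch below is not Tire's implementation: `SearchBuilder`, `QueryBuilder` and `BooleanBuilder` are hypothetical, and mapping `string` to a `query_string` query is an assumption.

```ruby
require 'json'

# Hypothetical builder for the clauses of a bool query.
class BooleanBuilder
  def initialize
    @must = []
  end

  # Each `must` block is evaluated against a fresh QueryBuilder.
  def must(&block)
    @must << QueryBuilder.evaluate(&block)
  end

  def to_h
    { bool: { must: @must } }
  end
end

# Hypothetical builder for a single query clause.
class QueryBuilder
  def self.evaluate(&block)
    builder = new
    builder.instance_eval(&block)
    builder.to_h
  end

  def terms(field, values)
    @clause = { terms: { field => values } }
  end

  # Assumption: `string` produces a query_string query.
  def string(text)
    @clause = { query_string: { query: text } }
  end

  def boolean(&block)
    builder = BooleanBuilder.new
    builder.instance_eval(&block)
    @clause = builder.to_h
  end

  def to_h
    @clause
  end
end

# Hypothetical entry point mirroring the slide's search('articles') { query { ... } }.
class SearchBuilder
  attr_reader :index

  def initialize(index, &block)
    @index = index
    instance_eval(&block)
  end

  def query(&block)
    @query = QueryBuilder.evaluate(&block)
  end

  def to_h
    { query: @query }
  end
end

search = SearchBuilder.new('articles') do
  query do
    boolean do
      must { terms :tags, ['ruby', 'python'] }
      must { string 'published_on:[2011-01-01 TO 2011-01-02]' }
    end
  end
end

puts JSON.generate(search.to_h)
```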
Example
tags_query = lambda do |boolean|
  boolean.must { terms :tags, ['ruby', 'python'] }
end

published_on_query = lambda do |boolean|
  boolean.must { string 'published_on:[2011-01-01 TO 2011-01-02]' }
end

Tire.search('articles') do
  query { boolean &tags_query }
end

Tire.search('articles') do
  query do
    boolean &tags_query
    boolean &published_on_query
  end
end
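Since the DSL accepts ordinary blocks, query fragments can live in lambdas and be combined freely. The sketch below shows the same idea with plain Hashes instead of Tire objects: each lambda receives the list of `must` clauses and appends to it, and the `boolean_query` helper is hypothetical.

```ruby
require 'json'

# Reusable query fragments; each receives the array of "must" clauses.
tags_query = lambda do |must|
  must << { terms: { tags: %w[ruby python] } }
end

published_on_query = lambda do |must|
  must << { query_string: { query: 'published_on:[2011-01-01 TO 2011-01-02]' } }
end

# Hypothetical helper: build a bool query out of any number of fragments.
def boolean_query(*fragments)
  must = []
  fragments.each { |fragment| fragment.call(must) }
  { query: { bool: { must: must } } }
end

only_tags = boolean_query(tags_query)
combined  = boolean_query(tags_query, published_on_query)

puts JSON.generate(combined)
```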
Example
search = Tire.search 'articles' do
  query do
    string 'title:T*'
  end
  filter :terms, tags: ['ruby']
  facet('tags') { terms :tags }
  sort { by :title, 'desc' }
end

search = Tire::Search::Search.new('articles')
search.query  { string('title:T*') }
search.filter :terms, :tags => ['ruby']
search.facet('tags') { terms :tags }
search.sort   { by :title, 'desc' }
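Both styles can be served by one object if the block form is just sugar over ordinary method calls. A sketch with a hypothetical `Search` class whose simplified signatures differ from Tire's (`sort` takes the field and direction directly, and facets are left out); it shows that the block and the explicit calls yield the same request body.

```ruby
require 'json'

# Hypothetical Search class: the optional block is instance_eval'd,
# so the DSL and the explicit method calls share one implementation.
class Search
  def initialize(index, &block)
    @index = index
    @body = {}
    instance_eval(&block) if block
  end

  def query(string)
    @body[:query] = { query_string: { query: string } }
    self
  end

  def filter(type, options)
    @body[:filter] = { type => options }
    self
  end

  def sort(field, direction)
    (@body[:sort] ||= []) << { field => direction }
    self
  end

  def to_h
    @body
  end
end

dsl = Search.new('articles') do
  query 'title:T*'
  filter :terms, tags: ['ruby']
  sort :title, 'desc'
end

api = Search.new('articles')
api.query('title:T*')
api.filter(:terms, tags: ['ruby'])
api.sort(:title, 'desc')

puts JSON.generate(dsl.to_h)
```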
TEH PROBLEM
Designing the Tire library as a domain-specific language, from the higher level down, and consequently making a lot of mistakes in the lower levels.
‣ Class-level settings (Tire.configure): cannot connect to two elasticsearch clusters in one codebase
‣ Inconsistent access (methods vs. Hashes)
‣ Not enough abstraction and separation of concerns
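Keeping settings on client instances rather than on a class is what makes several clusters in one codebase possible. A minimal sketch with a hypothetical `Client` class and made-up host names; a class-level `configure` would instead allow only one URL per process.

```ruby
require 'uri'

# Hypothetical client: the URL is instance state, not class-level state,
# so each instance can point at a different cluster.
class Client
  attr_reader :url

  def initialize(url: 'http://localhost:9200')
    @url = URI(url)
  end

  def search_uri(index)
    URI("#{@url.to_s.chomp('/')}/#{index}/_search")
  end
end

production = Client.new(url: 'http://es-production.example.com:9200')
analytics  = Client.new(url: 'http://es-analytics.example.com:9200')

puts production.search_uri('articles')
puts analytics.search_uri('articles')
```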
“Blocks with arguments” (alternative DSL syntax)
Tire.search do
  query do
    text :name, params[:q]
  end
end

Tire.search do |search|
  search.query do |query|
    query.text :name, params[:q]
  end
end
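A library can support both syntaxes by checking the block's arity: blocks without parameters are `instance_eval`'d, blocks with one parameter receive the builder and keep the caller's `self`. One common reason to offer both is that `instance_eval` hides methods of the calling object, such as a Rails controller's `params`. A sketch; `Query` and `SearchController` are hypothetical stand-ins.

```ruby
# Hypothetical builder supporting both block styles.
class Query
  attr_reader :clauses

  def initialize
    @clauses = []
  end

  def text(field, value)
    @clauses << { text: { field => value } }
    self
  end

  def self.build(&block)
    query = new
    if block.arity < 1
      query.instance_eval(&block)   # self becomes the Query
    else
      block.call(query)             # self stays the caller
    end
    query
  end
end

# Hypothetical stand-in for a Rails controller exposing #params.
class SearchController
  def params
    { q: 'tire' }
  end

  def with_arguments
    Query.build { |query| query.text :name, params[:q] }
  end

  def without_arguments
    Query.build { text :name, params[:q] }   # params is looked up on Query
  end
end

controller = SearchController.new
p controller.with_arguments.clauses

begin
  controller.without_arguments
rescue NameError => e
  puts "instance_eval hides the controller's methods (#{e.class})"
end
```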
The Git(Hub) (r)evolution
‣ Lots of contributions... but less feedback
‣ Many contributions focus on a specific use case
‣ Many contributions don’t take the bigger picture and codebase conventions into account
‣ Almost every patch needs to be processed, polished, amended
‣ Maintainer: lots of curation, less development, even on this small scale (2K LOC, 7K LOT)
‣ Contributors very eager to code, but a bit afraid to talk
Tire’s Ruby on Rails integration
$ rails new myapp -m "https://raw.github.com/karmi/tire/master/examples/rails-application-template.rb"
‣ Generate a fully working Rails application with a single command
‣ Downloads elasticsearch if not running, creates the application, commits every step, seeds the example data, launches the application on a free port, …
‣ Tire::Results::Item fully compatible with Rails view / URL helpers
‣ Any ActiveModel-compatible OxM supported
‣ Rake task for importing data (using pagination libraries)
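An import task of this kind typically walks the records page by page and sends each page to the bulk API as newline-delimited JSON: one action line plus one source line per document, with a trailing newline. A sketch with a hypothetical `bulk_bodies` helper and in-memory records standing in for a paginated model; the action metadata omits `_type`, which elasticsearch versions of that era required.

```ruby
require 'json'

# Hypothetical helper: split records into pages and build one
# newline-delimited _bulk request body per page.
def bulk_bodies(records, index:, per_page: 2)
  records.each_slice(per_page).map do |page|
    lines = page.flat_map do |record|
      [JSON.generate({ index: { _index: index, _id: record[:id] } }),
       JSON.generate(record)]
    end
    lines.join("\n") + "\n"   # the bulk API expects a final newline
  end
end

articles = [
  { id: 1, title: 'One' },
  { id: 2, title: 'Two' },
  { id: 3, title: 'Three' }
]

bodies = bulk_bodies(articles, index: 'articles')
puts bodies.size   # 2
print bodies.first
```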
Rails integration baked in
‣ No proper separation of concerns / layers
‣ People expect everything to be as easy as that
‣ Tire::Results::Item baked in, not opt-in, masquerades as models
‣ People consider ActiveRecord the only OxM in the world
…
‣ Persistence extension
‣ Rails extensions
‣ ActiveRecord extensions
‣ ActiveModel integration
‣ The Ruby DSL
‣ Base library (HTTP, JSON, API)
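The point of such a stack is that each layer depends only on the one below it. A sketch of the bottom three layers with hypothetical names: a transport that alone talks HTTP (replaced here by a fake that records requests, so the code runs without a cluster), an API client working with plain Hashes, and a DSL-level helper that only builds a Hash.

```ruby
require 'json'

# Layer 1 (hypothetical): transport; a fake that records requests.
class FakeTransport
  attr_reader :requests

  def initialize
    @requests = []
  end

  def perform_request(verb, path, body = nil)
    @requests << [verb, path, body]
    { 'took' => 1, 'hits' => { 'total' => 0, 'hits' => [] } }
  end
end

# Layer 2 (hypothetical): API methods, Hashes in and out.
class Client
  def initialize(transport)
    @transport = transport
  end

  def search(index:, body:)
    @transport.perform_request('POST', "/#{index}/_search", JSON.generate(body))
  end
end

# Layer 3 (hypothetical): DSL or model code builds only the query Hash.
def title_query(text)
  { query: { match: { title: text } } }
end

transport = FakeTransport.new
client = Client.new(transport)
response = client.search(index: 'articles', body: title_query('conference'))
```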
https://rubygems.org https://github.com/rubygems/rubygems.org/pull/455
“Search”
class Rubygem < ActiveRecord::Base
  # ...
  def self.search(query)
    conditions = <<-SQL
      versions.indexed and
        (upper(name) like upper(:query) or
         upper(translate(name, '#{SPECIAL_CHARACTERS}', '#{' ' * SPECIAL_CHARACTERS.length}')) like upper(:query))
    SQL

    where(conditions, {:query => "%#{query.strip}%"}).
      includes(:versions).
      by_downloads
  end
end
https://github.com/rubygems/rubygems.org/blob/master/app/models/rubygem.rb
Adding search to an existing application (steps 1–6)
https://github.com/karmi/rubygems.org/compare/search-steps
“Hello Cloud” with Chef Server
http://git.io/chef-hello-cloud
‣ Deploy Rubygems.org on EC2 (or locally with Vagrant) from a “zero state”
‣ 1 load balancer (HAProxy), 3 application servers (Thin + Nginx)
‣ 1 database node (PostgreSQL, Redis)
‣ 2 elasticsearch nodes
‣ Install Ruby 1.9.3 via RVM
‣ Clone the application from its GitHub repository
‣ init.d scripts and full configuration for every component
‣ Restore data from backup (database dump) and import it into the search index
‣ Monitor every part of the stack
Thanks!

Elasticsearch And Ruby [RuPy2012]
