Introduction to Basics of Search and Relevancy with Apache Solr
Featuring: Mark Bennett, CTO

Agenda
• Prerequisites: Browser Tricks
• Web “Command Line”
• The DisMax Parser
• Boosting Formula
• Explaining “Explain”
• Check Your Index!
• Q&A
• Resources / About NIE
12/2/2009 Lucid Imagination, Inc.

Prerequisite: Some Browser Tricks

Browsers Matter: install them all!
Firefox:
• Default XML rendering (also some versions of IE)
• Lots of plugins
• Better table copy and paste
IE and Safari:
• Better “Explain” copy and paste: maintains line breaks

Larger Firefox “Command Line”
Customize the Firefox URL box as a command line in 3 easy steps:
1. Toolbar: right click
2. Customize…, then Add New Toolbar
3. URL bar: click and drag it onto the new toolbar

Turn off Solr HTTP Caching
• Change in solrconfig.xml
• Disable HTTP 304 (Not Modified) responses in the httpCaching section (the never304 attribute)
• Turn it back on before you deploy!

Understanding Solr’s “Web Command Line”

The “Web Command Line”
CLI CONCEPT                 SOLR EQUIVALENT
Command prompt              URL bar
-o or --foo bar             ? or & and =
(spaces)                    +
some punctuation            %nn
output                      XML or HTML
Command line “adapter”: curl
• Script files can call URLs
• Not built into Windows; try cygwin

Solr “Command Line”
• Typical base URL
    • http://localhost:8983/solr/select?...
• Basic input (not counting dismax)
    • q = query, fq = filter query
    • df = default field
    • qt = query type (standard / dismax)
• Controlling output (lots more!)
    • debugQuery = true
    • wt = “what type” (actually “writer type”)
        • standard/XML, xslt (with tr=), javabin, json…
    • fl = *,score (which fields)

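The URL rules above (spaces become +, some punctuation becomes %nn, parameters joined with ? and &) can be sketched in Python with the standard library. This is only an illustration: the helper name solr_url and the sample query text are assumptions, while the base URL and the parameter names q, debugQuery and fl come from the slides.

```python
from urllib.parse import urlencode

# Base URL as shown on the slide (a local example Solr instance).
BASE_URL = "http://localhost:8983/solr/select"


def solr_url(**params):
    """Return a select URL with the given parameters URL-encoded.

    Hypothetical helper for illustration; it only builds the string
    and does not contact Solr.
    """
    return BASE_URL + "?" + urlencode(params)


# The space in the query becomes "+", and "*" and "," become %nn escapes.
url = solr_url(q="apache solr", debugQuery="true", fl="*,score")
print(url)
```

The printed string can be pasted into the browser URL bar or passed to curl from a script.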
Example: search for “solr”
http://localhost:8983/solr/select?q=solr&debugQuery=true
• With Firefox you get XML output you can expand and collapse
• With MSIE* and Safari, not so much
* Some versions

Detailed Debug & Explain Output
http://localhost:8983/solr/select?q=solr&debugQuery=true
<str name="parsedquery">text:solr</str>
…
<lst name="explain">
  <str name="SOLR1000">
    0.6368716 = (MATCH) fieldWeight(text:solr in 13), product of:
      1.4142135 = tf(termFreq(text:solr)=2)
      3.6026897 = idf(docFreq=1, numDocs=26)
      0.125 = fieldNorm(field=text, doc=13)
  </str>
</lst>

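The “product of” in the explain output above can be checked by hand. The sketch below multiplies the three factors shown on the slide; it assumes the classic Lucene default similarity, where tf is the square root of the term frequency (consistent with tf = 1.4142135 for termFreq = 2). The function name is illustrative only.

```python
import math


def field_weight(term_freq, idf, field_norm):
    """fieldWeight = tf * idf * fieldNorm, with tf = sqrt(termFreq).

    tf = sqrt(termFreq) is an assumption based on Lucene's classic
    default similarity; idf and fieldNorm are copied from the explain.
    """
    tf = math.sqrt(term_freq)
    return tf * idf * field_norm


# Values from the slide: termFreq=2, idf=3.6026897, fieldNorm=0.125
score = field_weight(term_freq=2, idf=3.6026897, field_norm=0.125)
print(score)  # close to the 0.6368716 reported by Solr
```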
A look at the DisMax query parser

Solr DisMax: Defined
• What is it?
    • Dis-junction: the query text is tried against multiple fields
    • Max-imum: the best-matching field's score is used
• How do you get it?
    • Configured in:
        • solrconfig.xml and schema.xml
    • Called with:
        • qt=dismax
    • Adjusted with:
        • mm, bf, qf, pf, qs, ps, tie

Solr DisMax: Pros and Cons
General benefits
• Multiple fields
• Multiple relevancy rules
• Great for freshness / popularity
Issues to be aware of
• Tie-in between schema.xml & solrconfig.xml
• Trouble with some CJK (Chinese, Japanese, Korean)
• Limited wildcard / field / range support
• Difficult to customize and debug
• Trouble with shingles
• Understand mm!

About the “dis” and the “max”
Distributed across multiple fields (the name comes from Lucene's DisjunctionMaxQuery)
• Break up the query into words
• Each word becomes a clause against each field
• Like an OR but with extra credit
Takes the maximum of each set
• Word 1 had its highest score in the title
• Word 2 was very dense in the doc body
• Adds in a tie breaker if the word matches in multiple fields

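The “max plus tie breaker” rule can be written out in a few lines. This is a sketch, not Solr's code: the function name dismax_score is made up for illustration, the per-field scores are copied from the explain output later in this deck, and 0.01 is the tie value visible there (“max plus 0.01 times others”).

```python
def dismax_score(field_scores, tie=0.01):
    """Best field score plus tie times the sum of the other field scores."""
    best = max(field_scores)
    return best + tie * (sum(field_scores) - best)


# Per-field scores for the word "solr" in document SOLR1000:
# text^0.5, name^1.2, features, sku^1.5
qf_scores = [0.026233677, 0.17808011, 0.03710002, 0.44520026]
combined = dismax_score(qf_scores)
print(combined)  # close to the 0.4476144 shown in the explain output
```

With tie=0 only the best field counts; with tie=1 the clause behaves like a plain sum over fields.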
Coming soon: Extended DisMax
Improvements
• Case-insensitive Boolean operators: AND/and, OR/or
• Auto-escape punctuation (such as &)
• Improved proximity boosting (via word bigrams)
• Other changes in stop words, relevancy calculation, URL arguments
How to get it
• Post-1.4 patch, planned for 1.5
• Details and patch in JIRA: SOLR-1553 http://issues.apache.org/jira/browse/SOLR-1553
• TBD: URL option may change to qt=edismax (or stay qt=dismax)

Boosting Formulas

Boost Functions in DisMax
High-level feature
• Numeric functions for scoring
• sum(), product(), sqrt(), log(), etc.
• Boost on recent dates, user popularity
Good combination: reverse-ordinal & reciprocal
• Position in index: ord(); reverse is: rord()
• Larger y for smaller x: recip()
How to get it
• URL parameter bf = “boost function”
• Configured in solrconfig.xml
• See http://wiki.apache.org/solr/FunctionQuery

“Freshness”: Boosting Recent Dates
Wiki example: recip( rord(creationDate), 1, 1000, 1000 )
recip(x,m,a,c) = a / (m*x + c), with x = rord(creationDate)
• slope m = 1
• numerator a = 1000
• intercept c = 1000 (aka "b")

Date         Position ord()   N-Position rord()   Linear m*x+c   recip(x,m,a,c)
1/1/2000     1                120                 1120           0.89286
2/1/2000     2                119                 1119           0.89366
3/1/2000     3                118                 1118           0.89445
…            …                …                   …              …
1/1/2005     61               60                  1060           0.94340
…            …                …                   …              …
1/1/2009     109              12                  1012           0.98814
2/1/2009     110              11                  1011           0.98912
3/1/2009     111              10                  1010           0.99010
4/1/2009     112              9                   1009           0.99108
5/1/2009     113              8                   1008           0.99206
6/1/2009     114              7                   1007           0.99305
7/1/2009     115              6                   1006           0.99404
8/1/2009     116              5                   1005           0.99502
9/1/2009     117              4                   1004           0.99602
10/1/2009    118              3                   1003           0.99701
11/1/2009    119              2                   1002           0.99800
12/1/2009    120              1                   1001           0.99900

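The freshness table can be reproduced with a one-line function. This is a plain-Python sketch of the arithmetic only, using the m, a and c values from the wiki example; it does not call Solr, and the rord values are simply the ones listed in the table.

```python
def recip(x, m, a, c):
    """Reciprocal function as described on the slide: a / (m*x + c)."""
    return a / (m * x + c)


# rord(creationDate) for the oldest, a middle, and the newest documents
for rord in (120, 60, 12, 1):
    boost = recip(rord, m=1, a=1000, c=1000)
    print(rord, round(boost, 5))
```

Because a and c are both 1000, the newest document (rord = 1) scores just under 1.0 and the curve falls off gently; smaller values of a and c would make the drop with age steeper.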
Sifting through Solr’s “Explain” output

DisMax Example for “solr”
INPUT:
http://localhost:8983/solr/select?q=solr&debugQuery=true&qt=dismax
DEBUG OUTPUT (1 of 2):
<str name="parsedquery">
  +DisjunctionMaxQuery((id:solr^10.0 | text:solr^0.5 | cat:solr^1.4 | manu:solr^1.1 | name:solr^1.2 | features:solr | sku:solr^1.5)~0.01)
  DisjunctionMaxQuery((manu_exact:solr^1.9 | features:solr^1.1 | text:solr^0.2 | manu:solr^1.4 | name:solr^1.5)~0.01)
  FunctionQuery((top(ord(popularity)))^0.5)
  FunctionQuery((1000.0/(1.0*float(top(rord(price)))+1000.0))^0.3)
</str>

DisMax explain output for a single word query
<lst name="explain">
  <str name="SOLR1000">
    0.74609417 = (MATCH) sum of:
      0.4476144 = (MATCH) max plus 0.01 times others of:
        0.026233677 = (MATCH) weight(text:solr^0.5 in 13), product of:
          0.04119147 = queryWeight(text:solr^0.5), product of:
            0.5 = boost
            3.6026897 = idf(docFreq=1, numDocs=26)
            0.022867065 = queryNorm
          0.6368716 = (MATCH) fieldWeight(text:solr in 13), product of:
            1.4142135 = tf(termFreq(text:solr)=2)
            3.6026897 = idf(docFreq=1, numDocs=26)
            0.125 = fieldNorm(field=text, doc=13)
        0.17808011 = (MATCH) weight(name:solr^1.2 in 13), product of:
          0.09885953 = queryWeight(name:solr^1.2), product of:
            1.2 = boost
            3.6026897 = idf(docFreq=1, numDocs=26)
            0.022867065 = queryNorm
          1.8013449 = (MATCH) fieldWeight(name:solr in 13), product of:
            1.0 = tf(termFreq(name:solr)=1)
            3.6026897 = idf(docFreq=1, numDocs=26)
            0.5 = fieldNorm(field=name, doc=13)
        0.03710002 = (MATCH) weight(features:solr in 13), product of:
          0.08238294 = queryWeight(features:solr), product of:
            3.6026897 = idf(docFreq=1, numDocs=26)
            0.022867065 = queryNorm
          0.45033622 = (MATCH) fieldWeight(features:solr in 13), product of:
            1.0 = tf(termFreq(features:solr)=1)
            3.6026897 = idf(docFreq=1, numDocs=26)
            0.125 = fieldNorm(field=features, doc=13)
        0.44520026 = (MATCH) weight(sku:solr^1.5 in 13), product of:
          0.12357441 = queryWeight(sku:solr^1.5), product of:
            1.5 = boost
            3.6026897 = idf(docFreq=1, numDocs=26)
            0.022867065 = queryNorm
          3.6026897 = (MATCH) fieldWeight(sku:solr in 13), product of:
            1.0 = tf(termFreq(sku:solr)=1)
            3.6026897 = idf(docFreq=1, numDocs=26)
            1.0 = fieldNorm(field=sku, doc=13)
      0.22311316 = (MATCH) max plus 0.01 times others of:
        0.040810023 = (MATCH) weight(features:solr^1.1 in 13), product of:
          0.09062123 = queryWeight(features:solr^1.1), product of:
            1.1 = boost
            3.6026897 = idf(docFreq=1, numDocs=26)
            0.022867065 = queryNorm
          0.45033622 = (MATCH) fieldWeight(features:solr in 13), product of:
            1.0 = tf(termFreq(features:solr)=1)
            3.6026897 = idf(docFreq=1, numDocs=26)
            0.125 = fieldNorm(field=features, doc=13)
        0.01049347 = (MATCH) weight(text:solr^0.2 in 13), product of:
          0.016476588 = queryWeight(text:solr^0.2), product of:
            0.2 = boost
            3.6026897 = idf(docFreq=1, numDocs=26)
            0.022867065 = queryNorm
          0.6368716 = (MATCH) fieldWeight(text:solr in 13), product of:
            1.4142135 = tf(termFreq(text:solr)=2)
            3.6026897 = idf(docFreq=1, numDocs=26)
            0.125 = fieldNorm(field=text, doc=13)
        0.22260013 = (MATCH) weight(name:solr^1.5 in 13), product of:
          0.12357441 = queryWeight(name:solr^1.5), product of:
            1.5 = boost
            3.6026897 = idf(docFreq=1, numDocs=26)
            0.022867065 = queryNorm
          1.8013449 = (MATCH) fieldWeight(name:solr in 13), product of:
            1.0 = tf(termFreq(name:solr)=1)
            3.6026897 = idf(docFreq=1, numDocs=26)
            0.5 = fieldNorm(field=name, doc=13)
      0.06860119 = (MATCH) FunctionQuery(top(ord(popularity))), product of:
        6.0 = ord(popularity)=6
        0.5 = boost
        0.022867065 = queryNorm
      0.0067654043 = (MATCH) FunctionQuery(1000.0/(1.0*float(top(rord(price)))+1000.0)), product of:
        0.9861933 = 1000.0/(1.0*float(rord(price)=14)+1000.0)
        0.3 = boost
        0.022867065 = queryNorm
  </str>
</lst>

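The top-level “sum of” in this explain output adds four parts: the two DisjunctionMaxQuery scores and the two boost-function clauses. The sketch below recomputes the function clauses from the values in the explain text and adds everything up. The helper name function_clause is an assumption for illustration; all numbers are copied from the slide.

```python
QUERY_NORM = 0.022867065  # queryNorm from the explain output


def function_clause(value, boost, query_norm=QUERY_NORM):
    """A boost-function clause scores as value * boost * queryNorm."""
    return value * boost * query_norm


# ord(popularity) = 6 with boost 0.5
popularity = function_clause(6.0, 0.5)
# recip-style price function with rord(price) = 14 and boost 0.3
price = function_clause(1000.0 / (1.0 * 14 + 1000.0), 0.3)

# The two "max plus 0.01 times others" results, copied from the explain
total = 0.4476144 + 0.22311316 + popularity + price
print(popularity, price, total)  # total is close to 0.74609417
```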
“Explain” example:
...
0.026233677 = (MATCH) weight(text:solr^0.5 in 13), product of:
  0.04119147 = queryWeight(text:solr^0.5), product of:
    0.5 = boost
    3.6026897 = idf(docFreq=1, numDocs=26)
    0.022867065 = queryNorm
  0.6368716 = (MATCH) fieldWeight(text:solr in 13), product of:
    1.4142135 = tf(termFreq(text:solr)=2)
    3.6026897 = idf(docFreq=1, numDocs=26)
    0.125 = fieldNorm(field=text, doc=13)
0.17808011 = (MATCH) weight(name:solr^1.2 in 13), product of:
  0.09885953 = queryWeight(name:solr^1.2), product of:
    1.2 = boost
    3.6026897 = idf(docFreq=1, numDocs=26)
    0.022867065 = queryNorm
  1.8013449 = (MATCH) fieldWeight(name:solr in 13), product of:
    1.0 = tf(termFreq(name:solr)=1)
    3.6026897 = idf(docFreq=1, numDocs=26)
    0.5 = fieldNorm(field=name, doc=13)
0.03710002 = (MATCH) weight(features:solr in 13), product of:
  0.08238294 = queryWeight(features:solr), product of:
    3.6026897 = idf(docFreq=1, numDocs=26)
    0.022867065 = queryNorm
  0.45033622 = (MATCH) fieldWeight(features:solr in 13), product of:
    1.0 = tf(termFreq(features:solr)=1)
    3.6026897 = idf(docFreq=1, numDocs=26)
    0.125 = fieldNorm(field=features, doc=13)
...
Highlighted terms: tf (termFreq(text:solr)=2) and idf (docFreq=1, numDocs=26)

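Each per-field weight in this excerpt is the product of a query-side part and a document-side part. The sketch below recomputes the first clause (text:solr^0.5) from its leaf values; the function names are illustrative, and every number is taken from the explain text above.

```python
def query_weight(boost, idf, query_norm):
    """Query-side part: boost * idf * queryNorm."""
    return boost * idf * query_norm


def clause_weight(q_weight, f_weight):
    """weight(field:term) = queryWeight * fieldWeight."""
    return q_weight * f_weight


qw = query_weight(boost=0.5, idf=3.6026897, query_norm=0.022867065)
weight = clause_weight(qw, 0.6368716)  # fieldWeight(text:solr in 13)
print(qw, weight)  # close to 0.04119147 and 0.026233677
```

Note that idf appears on both sides (once in queryWeight, once in fieldWeight), so rare terms are rewarded twice in this scoring model.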
Solr’s XSLT “debugger”
http://localhost:8983/solr/select?q=solr&debugQuery=true&wt=xslt&tr=example.xsl&fl=*,score&qt=dismax

Another way to view Explain data
• Solr 1.4 has Solritas
• Various features, including toggling the explain display
• “Some assembly required…”
http://www.lucidimagination.com/blog/2009/11/04/solritas-solr-1-4s-hidden-gem/

Checking your Index and IDF

Checking what got Indexed
Bad Index = Bad Search
• Check upper / lower case and punctuation
• Bad fields / metadata = bad facets, filters, sorting
Use the built-in Schema Browser:
• Check each field
• Common words =
• IDF: “Inverse Document Frequency”

Check IDF w/ the Schema Browser
Start at the Admin Screen: http://localhost:8983/solr/admin
Schema Browser
• Select a field
• Change # to see more

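To see why common words matter when checking the index, the IDF value can be computed directly. The sketch below assumes the classic Lucene default similarity formula, idf = 1 + ln(numDocs / (docFreq + 1)); that formula is not stated on the slides. Under that assumption, the 3.6026897 shown in the explain output corresponds to a document count of 27 rather than the printed 26, possibly because a deleted document is still counted, though that is a guess.

```python
import math


def idf(doc_freq, num_docs):
    """Inverse document frequency, assuming Lucene's classic default formula."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


rare = idf(doc_freq=1, num_docs=27)     # term found in a single document
common = idf(doc_freq=20, num_docs=27)  # hypothetical term found in most documents
print(rare, common)
```

A word that appears in most documents gets an IDF close to 1 and contributes little to ranking, which is why the term counts shown in the Schema Browser are worth a look.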
About NIE: New Idea Engineering

NIE Resources
• Newsletter & Whitepapers: www.ideaeng.com/current
• Search Dev Newsgroup: www.SearchDev.org
• Blogs: EnterpriseSearchBlog.com, SearchComponentsOnline.com

Finish Line / Q & A
Review & Questions
Mark Bennett, mbennett@ideaeng.com
main 408-446-3460, cell 408-829-6513

Q&A
These slides and a recorded presentation are available at bit.ly/SolrRelevancy
