16

I am implementing a Solr search. When I type in, e.g., Richard Chase, I get all the Richards in the index and all the Chases, like Johnny Chase, when actually I only want to return the names that match BOTH Richard AND Chase.

My config settings are:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

and my query searches the text field:

text:Richard Chase

Any ideas what I'm doing wrong?

2
  • Just one more thing: the search also needs to match cases such as Richard John Chase or Mr Richard Chase. Commented Aug 14, 2013 at 6:25
  • It is a bit misleading to say on the one hand that you want exact matches, but then accept "Richard John Chase". "Richard Chase" != "Richard John Chase". For exact matching in Solr please see stackoverflow.com/a/29105025/1389219 Commented Feb 6, 2020 at 16:05

4 Answers

14

You are using StandardTokenizerFactory, which splits text according to Unicode word-boundary rules.

This means that your words get split on spaces.

If you want a real exact match, i.e. Richard Chase should return only documents containing exactly Richard Chase, then you should use KeywordTokenizerFactory.

But since you mention that you want Richard John Chase but not Johnny Chase, it sounds like you want documents that contain both Richard and Chase.

You could either search for Richard AND Chase or change your default operator in schema.xml to be AND instead of OR. Beware that this setting is global.
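As a rough sketch of the first option, the query string can be built on the client side by splitting the input on whitespace and joining the terms with AND. The helper name `build_and_query` and the choice to return `*:*` for empty input are assumptions made for illustration; the escaping covers the special characters of the standard Lucene/Solr query syntax.

```python
import re

# Characters (and the two-character operators && and ||) that have special
# meaning in the Lucene/Solr query syntax and should be backslash-escaped.
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/]|&&|\|\|)')


def build_and_query(field, user_input):
    """Return a query requiring every whitespace-separated term in `field`.

    Hypothetical helper, not part of any Solr client library.
    """
    terms = [_SPECIAL.sub(r"\\\1", term) for term in user_input.split()]
    if not terms:
        return "*:*"  # assumption: empty input should match all documents
    return f"{field}:({' AND '.join(terms)})"


print(build_and_query("text", "Richard Chase"))  # text:(Richard AND Chase)
```

Grouping the terms in parentheses keeps every term bound to the given field, whereas in `text:Richard Chase` only the first term carries the field prefix.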

1 Comment

Yes, that's it. I will split my search term and then build my query using AND. Thanks!
9

You have to use a PhraseQuery (text:"Richard Chase") to get documents where both Richard and Chase appear next to each other. If you also want to find, say, Richard X. Chase, you can use text:"richard chase"~1.

See http://www.solrtutorial.com/solr-query-syntax.html
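A minimal sketch of building such a phrase query (with optional slop) and putting it into a request URL. The helper name `phrase_query`, the host, and the core name `mycore` are placeholders assumed for illustration, not values from the question.

```python
from urllib.parse import urlencode


def phrase_query(field, phrase, slop=0):
    """Build field:"phrase" and append ~slop when slop is positive.

    Hypothetical helper; backslashes and double quotes inside the phrase
    are escaped so they cannot terminate the quoted phrase early.
    """
    escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
    query = f'{field}:"{escaped}"'
    return f"{query}~{slop}" if slop > 0 else query


print(phrase_query("text", "Richard Chase"))          # text:"Richard Chase"
print(phrase_query("text", "richard chase", slop=1))  # text:"richard chase"~1

# Assumed local Solr instance and core name; adjust to your setup.
params = urlencode({"q": phrase_query("text", "richard chase", slop=1)})
url = "http://localhost:8983/solr/mycore/select?" + params
print(url)
```

With a slop of 1, one extra token may sit between the two words, which is what lets Richard X. Chase match.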

2 Comments

This will not return only exact matches, as results like "Richard Chase Jr" would also be returned.
@vegemite4me, of course it would also return such documents. There's always a trade-off between precision and recall in any full-text search system. If you want to find an exact match without any other tokens nearby, you can always put named entities into another field with KeywordTokenizer or just StrField.
3

To require every term to match, you can set the mm (Minimum "Should" Match) parameter of your query parser to 100% in your solrconfig.xml:

<str name="mm">100%</str> 

This specifies the minimum number of clauses that must match in a query. Alternatively, you can override this parameter (q.mm) at query time in the request.
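As a sketch of the query-time variant, assuming the request is handled by the (e)dismax parser, where the parameter is passed as `mm`: the helper name `all_terms_params` and the use of `qf` to point the parser at the `text` field are assumptions for illustration.

```python
from urllib.parse import parse_qs, urlencode


def all_terms_params(user_input, field="text"):
    """Request parameters asking edismax to require every query clause.

    Hypothetical helper; assumes the handler accepts defType=edismax.
    """
    return {
        "defType": "edismax",  # mm is a (e)dismax parameter
        "q": user_input,
        "qf": field,           # field(s) the terms are searched in
        "mm": "100%",          # all optional clauses must match
    }


query_string = urlencode(all_terms_params("Richard Chase"))
print(query_string)

# Round-trip check: the percent sign is sent as %25 and decodes back to 100%.
print(parse_qs(query_string)["mm"])  # ['100%']
```

The percent sign has to be URL-encoded, which `urlencode` handles; a hand-written URL containing `mm=100%` may be rejected or misread.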

3

Another option is to use a copyField to copy the value of text to a string-typed field:

<field name="text_orig" type="string" />
<copyField source="text" dest="text_orig" maxChars="1024"/>

When you need only exact matches, use the text_orig field in the query:

text_orig:"Richard Chase" 

Since string types won't be analysed and will be stored as is, only exact queries will match them.
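The difference between the two fields can be illustrated with a toy model in plain Python. This is not how Solr is implemented; it only mimics a tokenized, lowercased field where all terms are required versus an unanalysed string field, using the names from the question as sample data.

```python
def analyzed_match(query, value):
    """Toy stand-in for a tokenized, lowercased text field with AND semantics."""
    return set(query.lower().split()) <= set(value.lower().split())


def string_match(query, value):
    """Toy stand-in for a string field: the whole value must be identical."""
    return query == value


names = ["Richard Chase", "Richard John Chase", "Mr Richard Chase", "Johnny Chase"]

print([n for n in names if analyzed_match("Richard Chase", n)])
# ['Richard Chase', 'Richard John Chase', 'Mr Richard Chase']

print([n for n in names if string_match("Richard Chase", n)])
# ['Richard Chase']
```

Note that the string comparison is also case-sensitive, so `richard chase` would not match the stored value `Richard Chase` in this model.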

1 Comment

This is how we solved it. Elegant and straightforward.
