
I have a field type configured like this:

<fieldType name="gtext" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <!--Needed for efficient trailing wildcard queries-->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" side="front"/>
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true" maxPosAsterisk="2" maxPosQuestion="1" minTrailing="2" maxFractionAsterisk="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="1" stemEnglishPossessive="1" catenateAll="0" preserveOriginal="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="1" stemEnglishPossessive="1" catenateAll="0" preserveOriginal="1"/>
  </analyzer>
</fieldType>

When I search for fun, for example, documents containing funny are returned as well. How can I avoid this and match only fun? Is it caused by the reversed wildcard filter?

1 Answer

This is caused by the EdgeNGramFilterFactory filter in the index analyzer:

<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" side="front"/> 

EdgeNGramFilterFactory indexes every leading substring (front edge n-gram) of each token, from minGramSize up to maxGramSize characters. For example:

funny generates the grams f, fu, fun, funn and funny.

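The query analyzer has no n-gram filter, so a search for fun produces the single term fun. That term matches the gram fun indexed for funny, so documents containing funny match too.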
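A minimal sketch of the fix, assuming whole-word matches are what you want and precomputed grams for trailing wildcards are not essential: remove the EdgeNGramFilterFactory line from the index analyzer and leave the rest of the chain as it is. A query such as fun* can still run as an ordinary prefix query over the term dictionary, just without the precomputed grams. Because the change is in the index analyzer, existing documents must be reindexed before the partial matches disappear.

```xml
<!-- Index analyzer for gtext with the EdgeNGramFilterFactory line removed.
     All other filters are copied unchanged from the question; the query
     analyzer does not need to change. -->
<analyzer type="index">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.StandardFilterFactory"/>
  <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true" maxPosAsterisk="2" maxPosQuestion="1" minTrailing="2" maxFractionAsterisk="0"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="1" stemEnglishPossessive="1" catenateAll="0" preserveOriginal="1"/>
</analyzer>
```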

ReversedWildcardFilterFactory does not cause this issue; it only speeds up queries that start with a wildcard.

For example, funny is additionally indexed in reversed form as ynnuf; with withOriginal="true", the original token is kept as well.

A leading-wildcard query such as *nny can then be rewritten into the prefix query ynn* against the reversed terms, which is much faster than checking every term in the index.
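For comparison, the example schemas shipped with Solr define a text_general_rev type in which ReversedWildcardFilterFactory is the last filter of the index analyzer and does not appear in the query analyzer; the query parser notices the filter on the field and rewrites leading-wildcard queries itself. The sketch below follows that layout under a hypothetical name, text_rev, with no n-grams. The attribute values are copied from the question rather than tuned, and WordDelimiterFilterFactory is left out only to keep it short.

```xml
<!-- Hypothetical field type: plain tokens plus reversed tokens for fast
     leading-wildcard queries such as *nny. No n-grams, so a search for
     fun does not match funny. -->
<fieldType name="text_rev" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Placed last, as in text_general_rev, so no later filter alters the
         reversed tokens. withOriginal="true" also keeps the original token. -->
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
            maxPosAsterisk="2" maxPosQuestion="1" minTrailing="2" maxFractionAsterisk="0"/>
  </analyzer>
  <analyzer type="query">
    <!-- No ReversedWildcardFilterFactory here: the query parser handles the
         rewrite when it finds the filter in the index analyzer. -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```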


2 Comments

Are you sure? The n-gram filter is supposed to make trailing-wildcard queries more efficient. Should I get rid of it?
You need the n-grams for fast trailing-wildcard (prefix) queries and the reversed tokens for leading-wildcard queries. However, the issue you describe is caused by the n-grams, because they also produce partial matches. You can use two fields, one with and one without n-grams; the field without n-grams will not produce partial matches.
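A sketch of the two-field approach, using hypothetical field names title and title_prefix. It assumes a type named gtext_exact, meaning the question's gtext with the EdgeNGramFilterFactory line removed from its index analyzer; copyField fills the n-gram field from the same input, so each value only has to be sent once.

```xml
<!-- Hypothetical fields: title matches whole words only,
     title_prefix also matches leading substrings (fun -> funny). -->
<field name="title" type="gtext_exact" indexed="true" stored="true"/>
<field name="title_prefix" type="gtext" indexed="true" stored="false"/>

<!-- Copy the raw input of title into title_prefix at index time. -->
<copyField source="title" dest="title_prefix"/>
```

Query title:fun when only whole-word matches are wanted, and title_prefix:fun for as-you-type matching. With the edismax parser, a setting such as qf=title^5 title_prefix searches both fields and should rank exact matches above partial ones.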
