As I understand Solr's scoring function, the following two queries should be equivalent.
That is, score(q1, d) = score(q2, d) for every document d in the corpus.
Query 1: evolution OR selection OR germline OR dna OR rna OR mitochondria
Query 2: (evolution OR selection OR germline) OR (dna OR rna OR mitochondria)
The queries are obviously logically equivalent: they both return the same set of documents. Both queries also consist of the same 6 terms, and each term has a boost of 1 in both. Hence each term should contribute the same amount to the total score in either query (same TF, same IDF, same boost).
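The argument above treats a score as a plain sum of per-term contributions. Under that assumption the grouping cannot matter, because addition is associative. Below is a minimal Java sketch of that additive model. The class, methods, and contribution values are hypothetical, and the model encodes the premise being questioned here, not Solr's actual scoring formula.

```java
import java.util.List;
import java.util.Map;

// Toy model of the assumption above: a document's score is nothing more than
// the sum of independent per-term contributions. The contribution values are
// made-up placeholders, not real Solr output.
public class AdditiveScoreModel {

    // Score of a flat OR of terms: sum of each matching term's contribution.
    static double flat(List<String> terms, Map<String, Double> contrib) {
        double sum = 0.0;
        for (String t : terms) {
            sum += contrib.getOrDefault(t, 0.0); // 0.0 when the term does not match
        }
        return sum;
    }

    // Score of an OR of OR-groups: sum of the group scores.
    static double nested(List<List<String>> groups, Map<String, Double> contrib) {
        double sum = 0.0;
        for (List<String> g : groups) {
            sum += flat(g, contrib);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical document that matches four of the six terms.
        Map<String, Double> contrib = Map.of(
            "evolution", 0.50, "germline", 0.25, "dna", 0.75, "rna", 0.125);
        List<String> all = List.of(
            "evolution", "selection", "germline", "dna", "rna", "mitochondria");
        List<List<String>> groups = List.of(
            List.of("evolution", "selection", "germline"),
            List.of("dna", "rna", "mitochondria"));
        System.out.println(flat(all, contrib));      // 1.625
        System.out.println(nested(groups, contrib)); // 1.625
    }
}
```

The contributions are powers of two so that the floating-point sums are exact and the two totals compare equal bit for bit. In this model, any difference between the two queries' scores would have to come from some factor that is not a per-term contribution.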
In spite of that, the queries don't give the same scores.
So, in general, a disjunction of terms (a OR b OR c OR d) is not scored the same as a disjunction of queries ((a OR b) OR (c OR d)), even though both match the same documents. What is the semantic difference between the two types of queries? What causes them to produce different scores?
The reason I'm asking is that I'm building a custom request handler in which I construct the second type of query (a disjunction of queries), while I might actually need to construct the first type (a disjunction of terms). In other words, this is what I'm doing:
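```java
Query q1 = ... // disjunction (OR) of the terms evolution, selection, germline
Query q2 = ... // disjunction (OR) of the terms dna, rna, mitochondria

// Outer query whose two SHOULD clauses are themselves Boolean queries.
// Declared as BooleanQuery, since Query itself has no add() method.
BooleanQuery disjunctionOfQueries = new BooleanQuery();
disjunctionOfQueries.add(q1, BooleanClause.Occur.SHOULD);
disjunctionOfQueries.add(q2, BooleanClause.Occur.SHOULD);
```

while maybe I should actually do: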
```java
List<String> terms = ... // extract all 6 terms from q1 and q2
List<TermQuery> termQueries = ... // create a new TermQuery from each term in terms

// A single flat BooleanQuery with one SHOULD clause per term
BooleanQuery disjunctionOfTerms = new BooleanQuery();
for (TermQuery t : termQueries) {
    disjunctionOfTerms.add(t, BooleanClause.Occur.SHOULD);
}
```
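The "extract all 6 terms from q1 and q2" step can be sketched without any Lucene dependency. The Node, Term, and Or types below are hypothetical stand-ins for TermQuery and for a BooleanQuery whose clauses are all SHOULD; they are not Lucene classes. With real Lucene objects, the same depth-first walk would visit the clauses of each BooleanQuery and recurse whenever a clause is itself a BooleanQuery.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, Lucene-free model of a query tree: a node is either a single
// term or an OR over child nodes (mirroring TermQuery and an all-SHOULD
// BooleanQuery). Boosts and other clause types are deliberately left out.
public class FlattenOr {

    interface Node {}

    static final class Term implements Node {
        final String text;
        Term(String text) { this.text = text; }
    }

    static final class Or implements Node {
        final List<Node> children;
        Or(List<Node> children) { this.children = children; }
    }

    // Depth-first walk that collects every leaf term, in order,
    // regardless of how deeply the ORs are nested.
    static void collectTerms(Node node, List<String> out) {
        if (node instanceof Term) {
            out.add(((Term) node).text);
        } else if (node instanceof Or) {
            for (Node child : ((Or) node).children) {
                collectTerms(child, out);
            }
        }
    }

    // Rebuild the tree as a single, one-level OR over all leaf terms.
    static Or flatten(Node root) {
        List<String> terms = new ArrayList<>();
        collectTerms(root, terms);
        List<Node> leaves = new ArrayList<>();
        for (String t : terms) {
            leaves.add(new Term(t));
        }
        return new Or(leaves);
    }

    public static void main(String[] args) {
        Node q1 = new Or(List.of(new Term("evolution"), new Term("selection"), new Term("germline")));
        Node q2 = new Or(List.of(new Term("dna"), new Term("rna"), new Term("mitochondria")));
        Node nested = new Or(List.of(q1, q2));

        List<String> terms = new ArrayList<>();
        collectTerms(flatten(nested), terms);
        System.out.println(terms); // [evolution, selection, germline, dna, rna, mitochondria]
    }
}
```

Flattening like this only preserves the query's meaning when every nested clause is SHOULD and unboosted. A MUST or MUST_NOT clause inside a subquery would change which documents match once it is hoisted to the top level, and a boosted subquery would lose its boost.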