I need to index three (or more) levels of parent-child relationships. For example, the levels might be an author, a book, and the characters from that book.
However, when indexing more than two levels there is a problem with has_child and has_parent queries and filters. If I have 5 shards, I get about one fifth of the results when running a "has_parent" query on the lowest level (characters) or a "has_child" query on the second level (books).
My guess is that a book is assigned to a shard by its parent id, so it resides together with its parent (the author), but a character is assigned to a shard based on the hash of the book id, which does not necessarily match the shard the book was actually indexed on.
This means that the characters of books by the same author do not necessarily reside in the same shard as those books (which rather cripples the whole parent-child advantage).
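To make the guess above concrete, here is a minimal sketch of the routing behaviour I suspect. It assumes a shard is chosen as hash(routing value) modulo the number of shards, and it uses zlib.crc32 purely as a stand-in hash (it is not the hash Elasticsearch uses). The ids and the helper shard_for are hypothetical names made up for the illustration.

```python
import zlib

NUM_SHARDS = 5  # matches the 5-shard index described in this question


def shard_for(routing_value: str, num_shards: int = NUM_SHARDS) -> int:
    """Assumed routing rule: hash(routing value) % number of shards.

    zlib.crc32 is only a deterministic stand-in, NOT Elasticsearch's real hash.
    """
    return zlib.crc32(routing_value.encode("utf-8")) % num_shards


# Hypothetical ids, invented for this sketch.
author_id = "author-1"
book_ids = [f"book-{i}" for i in range(1000)]

# An author is routed by its own id; each book is routed by its parent (author) id.
author_shard = shard_for(author_id)
book_shards = {book_id: shard_for(author_id) for book_id in book_ids}

# A character is routed by its parent (book) id, regardless of where the book lives.
character_shards = {book_id: shard_for(book_id) for book_id in book_ids}

# Books whose characters happen to land on the same shard as the book itself.
colocated = [b for b in book_ids if character_shards[b] == book_shards[b]]
fraction_colocated = len(colocated) / len(book_ids)

# Hypothetical alternative: route each character by the grandparent (author) id.
rerouted_character_shards = {b: shard_for(author_id) for b in book_ids}
all_rerouted_colocated = all(
    rerouted_character_shards[b] == book_shards[b] for b in book_ids
)

print(f"co-located when routed by book id: {fraction_colocated:.0%}")
print(f"co-located when routed by author id: {all_rerouted_colocated}")
```

If the assumption holds, only the characters whose book id happens to hash to the author's shard sit next to their book, which would be consistent with seeing roughly one fifth of the results on 5 shards.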
Am I doing something wrong? How can I resolve this? I really need complex queries such as "which authors wrote books with female characters".
I made a gist showing the problem: https://gist.github.com/eranid/5299628
The bottom line is that if I have this mapping:

    "author" : {
      "properties" : {
        "name" : { "type" : "string" }
      }
    },
    "book" : {
      "_parent" : { "type" : "author" },
      "properties" : {
        "title" : { "type" : "string" }
      }
    },
    "character" : {
      "_parent" : { "type" : "book" },
      "properties" : {
        "name" : { "type" : "string" }
      }
    }

and a 5-shard index, I cannot get complete results from queries with "has_child" and "has_parent".
The query:

    curl -XPOST 'http://localhost:9200/index1/character/_search?pretty=true' -d '{
      "query": {
        "bool": {
          "must": [
            {
              "has_parent": {
                "parent_type": "book",
                "query": { "match_all": {} }
              }
            }
          ]
        }
      }
    }'

returns only about a fifth of the characters.