Bug introduced in 10.0.2 and fixed in 10.2
A change to the Dataset type system in 10.0.2 breaks a recursive trie constructor that worked in 10.0.1:
byPrefix = Query[
  GroupBy[First],
  If[First@# == {}, First@Keys@#, byPrefix[#]] &,
  Rest];
Test on the FDA NDC product dataset (~4k sorted nonproprietary drug names):
drugNames = {"levetiracetam", "prednisone", "lamotrigine", "alprazolam",
    "amlodipine besylate", "amoxicillin", "diltiazem hydrochloride",
    "bupropion hydrochloride", "topiramate", "quetiapine fumarate", "carvedilol"} //
  AssociationMap[{"SOS"}~Join~Characters[#]~Join~{"EOF"} &] // Dataset
Then,
drugNames[byPrefix] warns:

Element with head Association is not of the form _String
but the normal form is OK:
drugNames[byPrefix] // Normal 
It's not just a warning: it breaks lookup, even though all keys are strings, not composite keys as implicated in this 10.0.2 bug.
Both drugNames[byPrefix]["SOS", Keys] and drugNames[byPrefix]["SOS", "p"] raise the message:
Cannot take part ... of expression of the form _String
while, again, with normal form workaround:
drugNames[byPrefix] // Normal // Dataset // Query["SOS", Keys]
{"l", "p", "a", "d", "b", "t", "q", "c"}
Normalizing an expression once should give the same result as normalizing it twice. That's violated here.
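One possible way to sidestep Dataset's type inference altogether is to run the same recursion on a plain Association. The following is only a sketch under that assumption: plainTrie is a hypothetical name that does not appear in the question, and it has not been checked against the 10.0.x versions discussed here.

```mathematica
(* Hypothetical helper: the same recursion as byPrefix, on a plain Association *)
(* of word -> character list, with no Dataset or Query involved.               *)
plainTrie[words_Association] :=
  Map[
    (* A group whose remaining character list is empty is a leaf: return the word *)
    If[First[#] === {}, First[Keys[#]], plainTrie[#]] &,
    (* Group by the leading character, then drop that character from each list *)
    Map[Rest] /@ GroupBy[words, First]
  ]

(* Three names taken from the drugNames sample, wrapped in the same SOS/EOF markers *)
sample = AssociationMap[
   Join[{"SOS"}, Characters[#], {"EOF"}] &,
   {"alprazolam", "amlodipine besylate", "amoxicillin"}];

trie = plainTrie[sample];
Keys[trie["SOS"]["a"]]  (* {"l", "m"} *)
```

The result is an ordinary nested Association, so lookups such as trie["SOS"]["a"] do not pass through the Dataset type system at all. Wrapping it in Dataset afterwards should be comparable to the Normal // Dataset workaround shown above.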
EDIT // Timing study
Although this question is not about performance, I'm adding a basic benchmark to address the comments; it also reveals another consequence of this bug for StringLength > 145:
randomTestWord[len_] := RandomChoice[CharacterRange["a", "z"], {len}] // <|StringJoin[#] -> #|> &;
Using
randomWordData[len_, n_] := Table[randomTestWord[len], {n}] // Association // Dataset;
test along two axes: by array length, fixing StringLength = 8
words8Test = Range[100, 1000, 100] // AssociationMap[randomWordData[8, #] &] // Dataset;
and by StringLength = {2, 4, 8, 16, 32, 64, 128}, fixing array length = 100:
words100Test = 2^Range[7] // AssociationMap[makeDataset[#, 100] &] // Dataset;
Both look linear in input:
words8Test[All, First @ AbsoluteTiming @ byPrefix[#] &]
words100Test[All, First @ AbsoluteTiming @ byPrefix[#] &]

BTW, another aspect of this bug:
randomTestWord[145] // Dataset // Query[byPrefix] (* ok *)
But longer strings:
randomTestWord[146] // Dataset // Query[byPrefix] // Normal
(* ... <|"m" -> <|"c" -> <|"j" -> <|"b" -> <|"h" -> <|"j" -> <|"b" -> <|"s" -> <|"l" -> <|"r" -> <|"i" -> <|"x" -> <|"i" -> <|"k" -> <|"g" -> <|"z" -> <|"u" -> Dataset`Query`PackagePrivate`query[<|"elhhqbrjgenygqvgmlnnksmttunqfwbsgfifnhmxhirrybnrziyzodaboabnycxhawjsfydvzrkfokuvekhoofakpatqijqvocuauicuvhgqoliqyhntxyyjuwitbhpdmcjbhjbslrixikgzue" -> {"e"}|>]|>|> ... *)
randomTestWord[145] returns immediately on 10.0.2 but runs indefinitely in 10.0.1. randomTestWord[146] runs indefinitely for me in both versions. I can replicate your results for words8Test and words100Test in both versions. I suppose that they dodge the exponential behaviour as they do not nest queries. Note: I assumed that makeDataset[#, 100] in words100Test should actually be randomWordData[100, #].