Timeline for Can a Trie be implemented efficiently?
Current License: CC BY-SA 3.0
33 events
| when toggle format | what | by | license | comment | |
|---|---|---|---|---|---|
| Feb 10, 2017 at 17:11 | comment | added | Leonid Shifrin | @alancalvitti Thanks for the info / link. | |
| Feb 9, 2017 at 22:23 | comment | added | alancalvitti | @LeonidShifrin, fyi > mathematica.stackexchange.com/a/137422/801 | |
| Feb 1, 2017 at 2:13 | comment | added | Daniel Lichtblau | So, five years later, I will ask...what have you, err, tried? (I crack me up.) | |
| Feb 1, 2017 at 1:20 | answer | added | Anton Antonov | timeline score: 8 | |
| Dec 2, 2014 at 13:01 | comment | added | Leonid Shifrin | @alancalvitti This only means the Import is grossly inefficient. Also, list of Associations is by itself not the most efficient format. It has nothing to do with data structures, they must be implemented as efficiently as possible. | |
| Dec 1, 2014 at 21:39 | comment | added | alancalvitti | @LeonidShifrin, I don't expect a reply, or restart the argument but FYI, it took ~340sec to Import and thread Keys for some time series data, and only ~18 sec to make (small Depth ~10) Trie recursively - so in this case your approach doesn't necessarily speed up the overall analytic workflow - though I recognize this is outside OPs question. | |
| Nov 29, 2014 at 1:03 | comment | added | alancalvitti | @LeonidShifrin, re "...defines its own query language. Which data structures it uses under the hood is its own business" - seems an opportunity to insert an interface b/w the syntax and the engine. Knuth & assembly language are long gone. | |
| Nov 28, 2014 at 23:56 | comment | added | Leonid Shifrin | @alancalvitti ... one should not be mixing them. Dataset is a kind of a database, and defines its own query language. Which data structures it uses under the hood is its own business, and the end user could not care less, as long as the speed and convenience is satisfactory. Trie and others are particular data structures. They are constructed to be used in very specific algorithms. They are not general by definition. Asking your database engine to be at the same time a collection of data structures / algorithms doesn't strike me as a good idea. This has been my last comment here. | |
| Nov 28, 2014 at 23:51 | comment | added | Leonid Shifrin | @alancalvitti You seem to not be getting my point, despite my numerous attempts to make it clear. I do not dispute the usefulness of uniform wrapper. And I don't claim my code (or the approach to data structures implementation and design I advocate) to provide generic wrappers. I was just saying that a question about a design of a particular data structure like a trie (the one asked in this page) is largely unrelated to the one of designing uniform wrappers to incorporate many data structures under some uniform syntactic umbrella. These are different layers of abstraction, and ... | |
| Nov 28, 2014 at 22:15 | comment | added | alancalvitti | @LeonidShifrin, re your last comment above - my point was Dataset is a uniform wrapper - unlike ad hoc code below. Let's see how long it takes you to refactor your method to manage Tallys for instance. | |
| Nov 28, 2014 at 18:08 | comment | added | Leonid Shifrin | @alancalvitti I understand your perspective as being a user who wants compact syntax, but for any realistic problem this is not the only dimension. What I have described is a developer's view on how to structure the functionality so that it works well for both performance and convenience dimensions. From this viewpoint, Dataset certainly is a wrapper (in the sense I explained above). Once again: I am not saying that convenience isn't important, I am saying that it is a different aspect, from pure data structure implementation. | |
| Nov 28, 2014 at 18:05 | comment | added | Leonid Shifrin | @alancalvitti ... type-checks data, etc.). But, I insist on the separation of data structures and syntactic layer. Association itself is a new efficient data structure, and it was implemented in C for speed. Even when we implement new data structures in Mathematica (top-level), they must be made as fast as possible. Only then, one adds syntactic layer on top, as a separate step. Both are important, but the core data structures should not IMO be concerned with syntactic convenience - they just need to be fast and have a clear well-defined API. | |
| Nov 28, 2014 at 18:02 | comment | added | Leonid Shifrin | @alancalvitti The reality of Mathematica is that every generalization potentially leads to a serious loss of performance. The language is so high-level, that we don't gain the performance, we can only not lose or lose minimally. Therefore, the utmost concern of the implementor of a generic functionality in Mathematica is performance. I speak from personal experience here. Now, wrapping data structures into wrappers like Dataset, which provide convenient syntax, is a separate step. I don't mean to say that Dataset is just a wrapper, because it is not (it generates powerful queries, ... | |
| Nov 28, 2014 at 17:32 | comment | added | alancalvitti | @LeonidShifrin, I understand the distinction but the reality in statistical analysis is that queries will vary w/ the specifics of the task and (often noSQL) data structures provided. Dataset, Query and even operator form have already reduced my codebase & dev time >5x over V9. It's not a "wrapper". The ideal solution is Dataset syntax w/ fast algorithms. For example, how easily can you modify your code to handle frequencies like here: mathematicaforprediction.wordpress.com/2013/12/06/… | |
| Nov 28, 2014 at 16:42 | comment | added | Leonid Shifrin | @alancalvitti My point has been that it is important to distinguish different aspects. Trie by itself is a specialized data structure, and as such, should be "mean and lean": efficient in the first place, and having some simple well-defined API. How to make it syntactically more pleasant is a different question. One may use Dataset, or some custom wrappers, or whatever. But when things get mixed, the result is that it is much harder to understand performance. In the context of Mathematica, this is particularly important. | |
| Nov 28, 2014 at 14:43 | comment | added | alancalvitti | @LeonidShifrin, in an increasingly agile world, dev time is the bottleneck vs computing time - yet worth optimizing. Based on real data ~250m LeafCount (time series, etc) it often takes longer just to Import and thread keys to construct Associations than to execute the recursive tries, which I use routinely b/c they are extremely useful in indexing & analysis. | |
| Nov 27, 2014 at 16:12 | comment | added | Leonid Shifrin | @alancalvitti I have updated my post below. As to Dataset, I think it will be an overkill and actually not an appropriate thing to do to use it for a trie. It will also be much slower. | |
| Nov 26, 2014 at 19:11 | comment | added | alancalvitti | @LeonidShifrin, when you have the chance, could you look at the link above - btw, for some applications the query can be reduced to a single If- currently testing. Association and Dataset provide very compact syntax - so it's worth optimizing. | |
| Nov 26, 2014 at 19:03 | comment | added | alancalvitti | @Mr.Wizard, will do. Have a good break - hope you'll be back. | |
| Nov 26, 2014 at 17:03 | comment | added | Mr.Wizard | @alancalvitti I am taking a bit of a break from Mathematica and I haven't explored Dataset as an alternative to Leonid's code. I suggest you ask Leonid his opinion of this. | |
| Nov 23, 2014 at 18:38 | comment | added | alancalvitti | @Mr.Wizard, related mathematica.stackexchange.com/questions/8111/building-a-tree. What do you think of the approach via Dataset in terms of performance? | |
| May 28, 2012 at 7:18 | history | edited | Szabolcs | edited tags | |
| May 11, 2012 at 18:33 | comment | added | Leonid Shifrin | Now I finally have a definite answer to the second part of your question: yes, there are non-trivial cases where trie implementation will work much faster than, say, DictionaryLookup, due to the nature of the problem. I added a link to one such in my edit. | |
| Mar 27, 2012 at 3:43 | comment | added | Mr.Wizard | @István add tags as you see fit. | |
| Mar 26, 2012 at 12:39 | comment | added | István Zachar | Strange I don't remember this question at all... Admit that moderators above 10K are allowed to use SE time machines. Also, I wonder whether a tree or tree-graph tag would be useful, as it would covere some questions like this, would fit nicely between list-manipulation and graphs-and-networks (though being a subset of the latter) and (as my major reason) would have helped me to find this post earlier. | |
| Mar 26, 2012 at 11:33 | comment | added | Mr.Wizard | @István well, it seems that it's clear to you. I'll read your answer in detail tomorrow. :-) | |
| Mar 26, 2012 at 11:32 | comment | added | Mr.Wizard | @István no, look at the date on this. It is not clear to me how this structure would be used to solve that problem, though it does appear related. (back to lurking now...) | |
| Mar 26, 2012 at 8:32 | answer | added | István Zachar | timeline score: 12 | |
| Mar 26, 2012 at 8:06 | comment | added | István Zachar | Did this arise because of this question? If so, then once again, thank you for providing the exact formal name of the problem! :) | |
| Mar 25, 2012 at 22:27 | history | edited | nixeagle | edited tags | |
| Jan 28, 2012 at 23:57 | vote | accept | Mr.Wizard | ||
| Jan 21, 2012 at 15:55 | answer | added | Leonid Shifrin | timeline score: 37 | |
| Jan 21, 2012 at 13:22 | history | asked | Mr.Wizard | CC BY-SA 3.0 |