Timeline for File-backed lists/variables for handling large data
Current License: CC BY-SA 3.0
26 events
| when | what | action | by | comment | license |
|---|---|---|---|---|---|
| Jun 6, 2017 at 6:53 | comment | added | b3m2a1 | @Szabolcs I know this question is mad old, but it turns out WDX can support access to certain explicit positions. I dredged this up when investigating how to work with data paclets. See this: mathematica.stackexchange.com/a/146139/38205 for a quick rundown of the layout. Unless, of course, you already knew this and there's a subtlety I'm missing. If so, please do let me know, because it's always good to learn these things. | |
| May 23, 2017 at 12:35 | history | edited | CommunityBot | replaced http://stackoverflow.com/ with https://stackoverflow.com/ | |
| Feb 3, 2016 at 21:45 | comment | added | Athanassios | Sure, I understand, and many thanks for sharing this with the rest of us. As I said, playing at the data model level is certainly easier, but you have to rely on the DBMS data storage. This is how I have been researching solutions for data modeling. I believe Mathematica has to be enhanced with a data structure and in-memory processing similar to that of Qlikview. That would boost popularity and make it super efficient with large volumes of data. Then you only have to combine this with a similar type of DBMS. | |
| Feb 3, 2016 at 21:30 | comment | added | Leonid Shifrin | @Athanassios Well, I wasn't setting overly ambitious goals for this answer; I just tried to build a minimal framework to address the basic needs specific to Mathematica workflows. | |
| Feb 3, 2016 at 21:17 | comment | added | Athanassios | @LeonidShifrin's answer and comments like the one from telefunkenvf14 suggest that the direction of research for such problems is that of database technology. We are in the era of NoSQL databases, and if you are not going to reinvent the wheel on the low-level I/O details of such a DBMS, then at a higher level you have to switch the way you normally think, i.e. from records (rows) to fields (columns) to values (cells). In data modeling terms, the problem you face is that of redundancy. Single-instance storage and associative technology like that in Qlikview are in the right direction. | |
| Sep 3, 2015 at 21:39 | history | edited | Leonid Shifrin | edited tags | |
| Dec 12, 2012 at 17:15 | history | bounty ended | whuber | | |
| Dec 12, 2012 at 17:15 | history | notice removed | whuber | | |
| Dec 5, 2012 at 22:16 | comment | added | Szabolcs | @Chris Unfortunately that is very, very slow with large data, and it also produces huge files. | |
| Dec 5, 2012 at 16:36 | comment | added | Chris Degnen | Rather than DumpSave["mydata.mx", mydata], try Save["mydata.sav", mydata]. I find it very useful. (A minimal sketch of both approaches follows the table.) | |
| Dec 5, 2012 at 16:28 | history | bounty started | whuber | | |
| Dec 5, 2012 at 16:28 | history | notice added | whuber | Reward existing answer | |
| Dec 3, 2012 at 11:44 | history | edited | Mechanical snail | edited tags | |
| Jan 25, 2012 at 23:56 | history | tweeted | | twitter.com/#!/StackMma/status/162323088837591042 | |
| Jan 25, 2012 at 17:33 | vote | accept | Szabolcs | ||
| Jan 18, 2012 at 20:36 | answer | added | Leonid Shifrin | timeline score: 116 | |
| Jan 18, 2012 at 4:33 | history | edited | Mike Bailey | edited tags | |
| Jan 18, 2012 at 1:22 | comment | added | Mike Honeychurch | @Szabolcs as an FYI, I had stored all my (economic/financial) data as WDX before switching it over to MySQL a couple of years ago. It has made updating and retrieving data so much easier. For your problem it seems to me that databases are designed for exactly these sorts of tasks (a minimal DatabaseLink sketch follows the table). Also see Sal Mangano's talk about kdb+ if extracting columns rather than rows suits what you specifically want to do; the advantage appears to be a speed improvement of many orders of magnitude. I don't have links handy, but they should be easy to find. | |
| Jan 18, 2012 at 1:18 | comment | added | acl | @MikeB that doesn't scale though, and is inconvenient. I generally do the same (having access to machines with 512GB of RAM helps), but I'd like to know how to do things in a more reasonable way. | |
| Jan 18, 2012 at 0:14 | comment | added | Szabolcs | @MikeHoneychurch You're right, I only need chunks. I asked about such a large amount of data to get a good feel for the loading speed. Loading the data will hopefully not be the bottleneck. (Compare WDX loading speed to MX; there's a huge difference. See the timing sketch after the table.) | |
| Jan 17, 2012 at 23:31 | comment | added | Mike Bailey | This is one of the times that I've taken the brute force approach: Just throw more memory at the problem. On my last update I upgraded my machine to 12 GB of RAM. On some simulations I was running it was easily eating up in excess of 2 GB per kernel (= 8 GB total). It would be convenient to have some way of streaming data in and out of kernels though. | |
| Jan 17, 2012 at 23:20 | comment | added | Mike Honeychurch | @Szabolcs I haven't worked with stuff that large and I would imagine that you will run into Mma limitations. From your background and question I thought you only wanted to bring into Mma "chunks" of data on demand. In other words do you really need an entire 2GB or can you do some SQL operations to pick out what you need? | |
| Jan 17, 2012 at 23:12 | comment | added | Szabolcs | @Mike I have never done that, so it's good to hear how well it works in practice. E.g. how long would it take to load 2 GB of data into Mathematica that way, compared to MX files? | |
| Jan 17, 2012 at 23:11 | comment | added | Mike Honeychurch | Personally I find life so much easier by having data in a database and linking to Mma. | |
| Jan 17, 2012 at 22:57 | comment | added | Simon | Using databases in Mathematica was discussed in Using Mathematica in MySQL databases. I know that QLink is used in some fairly large Feynman diagram calculations... | |
| Jan 17, 2012 at 22:36 | history | asked | Szabolcs | | CC BY-SA 3.0 |
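For reference, here is a minimal sketch of the two storage approaches mentioned in the comments by Chris Degnen and Szabolcs. The symbol `mydata` and the file names are placeholders, not anything from the original discussion:

```mathematica
(* Minimal sketch; mydata, "mydata.mx" and "mydata.sav" are placeholders. *)
mydata = RandomReal[1, {10^6, 10}];

(* Binary MX dump: fast to write and read, but tied to Mathematica's binary format. *)
DumpSave["mydata.mx", mydata];
Get["mydata.mx"]   (* restores the symbol mydata *)

(* Plain-text Save: portable, human-readable definitions, but much slower and
   far larger for big numerical arrays, as noted in the comments above. *)
Save["mydata.sav", mydata];
Get["mydata.sav"]
```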
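A rough way to compare WDX and MX loading speed, as discussed in the comments, might look like the following. The data set and file names are illustrative, and the actual timings will depend on the data and machine:

```mathematica
(* Illustrative timing comparison; data and the file names are placeholders. *)
data = RandomReal[1, {10^6, 10}];

Export["data.wdx", data];      (* WDX: portable Wolfram data exchange format *)
DumpSave["data.mx", data];     (* MX: binary dump of the symbol data *)

First@AbsoluteTiming[wdx = Import["data.wdx"];]
First@AbsoluteTiming[Get["data.mx"];]   (* re-assigns data from the MX file *)
```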
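And a sketch of the database route suggested by Simon and Mike Honeychurch, using DatabaseLink. The host, database, credentials, table, and column names below are placeholders; the point is only that you can pull a chunk of a large data set on demand rather than loading all of it:

```mathematica
(* Sketch of fetching only a needed chunk from MySQL via DatabaseLink.
   Connection details, table and column names are placeholders. *)
Needs["DatabaseLink`"]

conn = OpenSQLConnection[
   JDBC["MySQL(Connector/J)", "localhost:3306/mydb"],
   "Username" -> "user", "Password" -> "secret"];

(* Pull just the rows/columns required instead of the whole data set. *)
chunk = SQLExecute[conn,
   "SELECT t, value FROM measurements WHERE t BETWEEN 0 AND 1000"];

CloseSQLConnection[conn];
```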