Timeline for File-backed lists/variables for handling large data
Current License: CC BY-SA 3.0
26 events
| when | what | action | by | comment | license |
|---|---|---|---|---|---|
| Jun 6, 2017 at 6:53 | comment | added | b3m2a1 | @Szabolcs I know this question is mad old, but it turns out WDX can support access to certain explicit positions. I dredged this up when investigating how to work with data paclets. See this: mathematica.stackexchange.com/a/146139/38205 for a quick rundown of the layout. Unless, of course, you already knew this and there's a subtlety I'm missing. If so, please do let me know, because it's always good to learn these things. | |
| May 23, 2017 at 12:35 | history | edited | CommunityBot | replaced http://stackoverflow.com/ with https://stackoverflow.com/ | |
| Feb 3, 2016 at 21:45 | comment | added | Athanassios | Sure, I understand, and many thanks for sharing this with the rest of us. As I said, playing at the data model level is certainly easier, but you have to rely on the DBMS data storage. This is how I have been researching solutions for data modeling. I believe Mathematica has to be enhanced with a data structure and in-memory processing similar to that of Qlikview. That would boost popularity and make it super efficient with large volumes of data. Then you only have to combine this with a similar type of DBMS. | |
| Feb 3, 2016 at 21:30 | comment | added | Leonid Shifrin | @Athanassios Well, I wasn't setting overly ambitious goals for this answer; I just tried to build a minimal framework to address the basic needs specific to Mathematica workflows. | |
| Feb 3, 2016 at 21:17 | comment | added | Athanassios | @LeonidShifrin's answer and comments like the one from telefunkenvf14 suggest that the direction of research for such problems is that of database technology. We are in the era of NoSQL databases, and if you are not going to reinvent the wheel on the low-level I/O details of such a DBMS, then at a higher level you have to switch the way you normally think, i.e. from records (rows) to fields (columns) to values (cells). In data modeling terms, the problem you face is that of redundancy. Single-instance storage and associative technology like that in Qlikview are in the right direction. | |
| Sep 3, 2015 at 21:39 | history | edited | Leonid Shifrin | edited tags | |
| Dec 12, 2012 at 17:15 | history | bounty ended | whuber | | |
| Dec 12, 2012 at 17:15 | history | notice removed | whuber | | |
| Dec 5, 2012 at 22:16 | comment | added | Szabolcs | @Chris Unfortunately that is very, very slow with large data, and it also produces huge files. | |
| Dec 5, 2012 at 16:36 | comment | added | Chris Degnen | Rather than DumpSave["mydata.mx", mydata], try Save["mydata.sav", mydata]. I find it very useful. (A minimal sketch of both approaches follows the table.) | |
| Dec 5, 2012 at 16:28 | history | bounty started | whuber | | |
| Dec 5, 2012 at 16:28 | history | notice added | whuber | Reward existing answer | |
| Dec 3, 2012 at 11:44 | history | edited | Mechanical snail | edited tags | |
| Jan 25, 2012 at 23:56 | history | tweeted | | twitter.com/#!/StackMma/status/162323088837591042 | |
| Jan 25, 2012 at 17:33 | vote | accept | Szabolcs | ||
| Jan 18, 2012 at 20:36 | answer | added | Leonid Shifrin | timeline score: 116 | |
| Jan 18, 2012 at 4:33 | history | edited | Mike Bailey | edited tags | |
| Jan 18, 2012 at 1:22 | comment | added | Mike Honeychurch | @Szabolcs as an FYI, I had stored all my (economic/financial) data as WDX before switching it over to MySQL a couple of years ago. It has made updating and retrieving data so much easier. For your problem it seems to me that databases are designed for exactly these sorts of tasks (a minimal DatabaseLink sketch follows the table). Also see Sal Mangano's talk about kdb+ if extracting columns rather than rows suits what you specifically want to do; the advantage appears to be a speed improvement of many orders of magnitude. I don't have links handy, but they should be easy to find. | |
| Jan 18, 2012 at 1:18 | comment | added | acl | @MikeB that doesn't scale though, and is inconvenient. I generally do the same (having access to machines with 512GB of RAM helps), but I'd like to know how to do things in a more reasonable way. | |
| Jan 18, 2012 at 0:14 | comment | added | Szabolcs | @MikeHoneychurch You're right, I only need chunks. I asked about such a large amount of data to get a good feel for the loading speed. Loading the data will hopefully not be the bottleneck. (Compare WDX loading speed to MX; there's a huge difference. See the timing sketch after the table.) | |
| Jan 17, 2012 at 23:31 | comment | added | Mike Bailey | This is one of the times that I've taken the brute force approach: Just throw more memory at the problem. On my last update I upgraded my machine to 12 GB of RAM. On some simulations I was running it was easily eating up in excess of 2 GB per kernel (= 8 GB total). It would be convenient to have some way of streaming data in and out of kernels though. | |
| Jan 17, 2012 at 23:20 | comment | added | Mike Honeychurch | @Szabolcs I haven't worked with stuff that large and I would imagine that you will run into Mma limitations. From your background and question I thought you only wanted to bring into Mma "chunks" of data on demand. In other words do you really need an entire 2GB or can you do some SQL operations to pick out what you need? | |
| Jan 17, 2012 at 23:12 | comment | added | Szabolcs | @Mike I have never done that, so it's good to hear how well it works in practice. E.g. how long would it take to load 2 GB of data into Mathematica that way, compared to MX files? | |
| Jan 17, 2012 at 23:11 | comment | added | Mike Honeychurch | Personally I find life so much easier by having data in a database and linking to Mma. | |
| Jan 17, 2012 at 22:57 | comment | added | Simon | Using databases in Mathematica was discussed in Using Mathematica in MySQL databases. I know that QLink is used in some fairly large Feynman diagram calculations... | |
| Jan 17, 2012 at 22:36 | history | asked | Szabolcs | | CC BY-SA 3.0 |
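For reference, here is a minimal sketch of the two storage approaches mentioned in the comments by Chris Degnen and Szabolcs. The symbol `mydata` and the file names are placeholders, not anything from the original discussion:

```mathematica
(* Minimal sketch; mydata, "mydata.mx" and "mydata.sav" are placeholders. *)
mydata = RandomReal[1, {10^6, 10}];

(* Binary MX dump: fast to write and read, but tied to Mathematica's binary format. *)
DumpSave["mydata.mx", mydata];
Get["mydata.mx"]   (* restores the symbol mydata *)

(* Plain-text Save: portable, human-readable definitions, but much slower and
   far larger for big numerical arrays, as noted in the comments above. *)
Save["mydata.sav", mydata];
Get["mydata.sav"]
```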
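A rough way to compare WDX and MX loading speed, as discussed in the comments, might look like the following. The data set and file names are illustrative, and the actual timings will depend on the data and machine:

```mathematica
(* Illustrative timing comparison; data and the file names are placeholders. *)
data = RandomReal[1, {10^6, 10}];

Export["data.wdx", data];      (* WDX: portable Wolfram data exchange format *)
DumpSave["data.mx", data];     (* MX: binary dump of the symbol data *)

First@AbsoluteTiming[wdx = Import["data.wdx"];]
First@AbsoluteTiming[Get["data.mx"];]   (* re-assigns data from the MX file *)
```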
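And a sketch of the database route suggested by Simon and Mike Honeychurch, using DatabaseLink. The host, database, credentials, table, and column names below are placeholders; the point is only that you can pull a chunk of a large data set on demand rather than loading all of it:

```mathematica
(* Sketch of fetching only a needed chunk from MySQL via DatabaseLink.
   Connection details, table and column names are placeholders. *)
Needs["DatabaseLink`"]

conn = OpenSQLConnection[
   JDBC["MySQL(Connector/J)", "localhost:3306/mydb"],
   "Username" -> "user", "Password" -> "secret"];

(* Pull just the rows/columns required instead of the whole data set. *)
chunk = SQLExecute[conn,
   "SELECT t, value FROM measurements WHERE t BETWEEN 0 AND 1000"];

CloseSQLConnection[conn];
```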