$\begingroup$

I have some data with 19000 sublists such as:

{" 7.9080000e+01 1.9283193e+04"} 

Where the first number is the value for variable A and the second for variable B.

All my attempts to transform this format have failed so far. My best guess was ToExpression, but it did not work.

How can I transform such lists into a "plottable" format? For example, by

  • changing the string format?
  • evaluating the e (exponent) notation?
  • importing the data differently?
$\endgroup$
  • $\begingroup$ How did you import this data? Import supports this (very common) number format: ImportString["12e3", "Table"] $\endgroup$ Commented Jan 24, 2012 at 16:29
  • $\begingroup$ Can you test Andy Ross's and Mr. Wizard's solutions on your real-world problem and post which one is faster? $\endgroup$ Commented Jan 24, 2012 at 22:18

3 Answers

$\begingroup$

You should be able to use ReadList on the string contents of each sublist. Here I'm just creating a small list containing three elements identical to the one you provided. The result can be plotted using ListPlot for example.

In[20]:= in = {{" 7.9080000e+01 1.9283193e+04"}, {" 7.9080000e+01 1.9283193e+04"}, {" 7.9080000e+01 1.9283193e+04"}};

In[22]:= Table[ReadList[StringToStream@First[i], Number], {i, in}]

Out[22]= {{79.08, 19283.2}, {79.08, 19283.2}, {79.08, 19283.2}}
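As a minimal sketch of the same idea taken through to a plot: the helper name readPair is hypothetical (it is not part of the original answer), and the input is just the sample row from the question repeated three times. The helper also closes each stream that StringToStream opens, which matters more once there are thousands of rows.

```mathematica
(* Sample input: the row from the question, repeated *)
in = {{" 7.9080000e+01 1.9283193e+04"},
      {" 7.9080000e+01 1.9283193e+04"},
      {" 7.9080000e+01 1.9283193e+04"}};

(* readPair is a hypothetical helper: parse one string into numbers, then close the stream *)
readPair[s_String] := Module[{stream = StringToStream[s], result},
  result = ReadList[stream, Number];
  Close[stream];
  result];

(* Flatten removes the one-element sublists so each entry is a plain string *)
pairs = readPair /@ Flatten[in];

(* Variable A on the horizontal axis, variable B on the vertical axis *)
ListPlot[pairs, AxesLabel -> {"A", "B"}]
```

With the real data, in would be the full list of 19000 sublists.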

EDIT:

In response to the comments, I should point out that this Table produces an array that is not packed. This means the evaluator does not know ahead of time that all of the values have the same type (real, in this case), so it falls back on more general methods and uses more memory to store the table.

As the documentation for Developer`ToPackedArray points out, using Developer`ToPackedArray will not change results generated by Mathematica, but can enhance speed of execution and reduce memory usage.

To pack the result, we can simply follow ruebenko's suggestion and place Developer`ToPackedArray@ in front of our Table.
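A minimal sketch of that packing step, using the question's sample row as input. The variable names unpacked and packed are only for illustration. Developer`PackedArrayQ and ByteCount are used here to inspect the effect rather than to claim particular numbers.

```mathematica
in = {{" 7.9080000e+01 1.9283193e+04"},
      {" 7.9080000e+01 1.9283193e+04"},
      {" 7.9080000e+01 1.9283193e+04"}};

(* Result of the ReadList approach, before packing *)
unpacked = Table[ReadList[StringToStream@First[i], Number], {i, in}];

(* Packing works here because the array is rectangular and every entry is a machine real *)
packed = Developer`ToPackedArray[unpacked];

(* Check which of the two is packed, and compare their memory use *)
Developer`PackedArrayQ /@ {unpacked, packed}
ByteCount /@ {unpacked, packed}

(* The values themselves are unchanged by packing *)
packed == unpacked
```

If some rows had a different length, or mixed integers and reals, ToPackedArray would return its argument unpacked, so it is worth checking with Developer`PackedArrayQ on the real data.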

TESTING EDIT:

I decided to test whether the ImportString approach proposed by Mr. Wizard or the ReadList approach is faster. To be fair, I kept the ExportString step out of the timing, presuming that the string would already be saved somewhere for importing. It appears that ReadList is much faster (about 0.08 s versus 4.1 s), at least for the fabricated example I've created here. I'd be curious to see if this is true for 500's data.

In[21]:= data = Table[" 7.9080000e+01 1.9283193e+04", {5000}];

In[22]:= Export["numbers.txt", data];

In[23]:= in = Partition[ReadList[StringToStream@Import["numbers.txt", "Plaintext"], Record], 1];

In[24]:= (andyr = Table[ReadList[StringToStream@First[i], Number], {i, in}]); // AbsoluteTiming

Out[24]= {0.0780015, Null}

In[25]:= str = ExportString[in, "Table"];

In[26]:= (mrwiz = ImportString[str, "Table"]); // AbsoluteTiming

Out[26]= {4.1340795, Null}

In[27]:= andyr === mrwiz

Out[27]= True

I should also point out that this comparison is only fair if we assume that the data is already in memory. If not, the cost of importing should be factored into the ReadList approach.

$\endgroup$
  • $\begingroup$ You may want to add a couple of Developer`ToPackedArray@ in front of ReadList and Table. $\endgroup$ Commented Jan 24, 2012 at 16:37
  • $\begingroup$ @ruebenko, true. But, why would a novice want to do that? $\endgroup$ Commented Jan 24, 2012 at 18:14
  • $\begingroup$ @ruebenko, I was trying to get you to explain what a packed array is and does for you. $\endgroup$ Commented Jan 24, 2012 at 18:54
  • $\begingroup$ @rcollyer, I misunderstood that. The reason is that the OP would possibly benefit from lower memory use and possibly faster execution of subsequent commands. $\endgroup$ Commented Jan 24, 2012 at 20:04
  • $\begingroup$ @Mr.Wizard this question came up recently for me when dealing with a custom file format of mine. Without knowing the length of the data section (an oversight in the format :P), it is often easier to read each line in as a string and process from there. So, it isn't necessarily a one-time problem, unfortunately. $\endgroup$ Commented Apr 14, 2012 at 16:33
$\begingroup$

As Szabolcs points out, this may be imported by Import. Here's one way to discover this: assign expr = " 7.9080000e+01 1.9283193e+04", then use ImportString on it with all possible import formats, discard those that return $Failed, and look at the results:

Grid[
 DeleteCases[
  Quiet[{#, ImportString[expr, #]}] & /@ $ImportFormats,
  x_List /; (Last[x] == $Failed)
 ],
 Frame -> All
]

(you need Quiet because many of the $ImportFormats choke on this string). This produces a long table, in which one finds this (somewhere in the middle):

enter image description here

So Import[file,"Table"] will probably work.
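A minimal sketch of what the "Table" format does with this string, using ImportString so that no file is needed. The file name data.txt in the comment is hypothetical and stands in for wherever the raw data actually lives.

```mathematica
expr = " 7.9080000e+01 1.9283193e+04";

(* Parse the whitespace-separated, e-notation numbers directly from the string *)
parsed = ImportString[expr, "Table"]

(* For a file on disk, the equivalent call would be, with a hypothetical file name: *)
(* Import["data.txt", "Table"] *)
```

Each line of the input becomes one row of numbers, so the result can be passed straight to ListPlot.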

Interestingly, I found this brute-force search easier than finding the same information in the documentation!

$\endgroup$
$\begingroup$

I echo Szabolcs's comment that you should probably have done this on acquisition, but now you could use this:

dat = {
  {" 7.9080000e+01 1.9283193e+04"},
  {" 7.9080000e+01 1.9283193e+04"},
  {" 7.9080000e+01 1.9283193e+04"}
};

ImportString@ExportString[dat, "Table"]

{{79.08, 19283.2}, {79.08, 19283.2}, {79.08, 19283.2}}
$\endgroup$
  • $\begingroup$ I interpreted the question as "which import format should I have used?" (which you do in fact answer) $\endgroup$ Commented Jan 24, 2012 at 16:40
  • $\begingroup$ @acl, and that was actually a better question :-) Makes everything easier ! $\endgroup$ Commented Jan 24, 2012 at 16:59
  • $\begingroup$ @Mr Wizard thank you. As suggested, I am now handling this at acquisition, but ImportString@ExportString is really cool! $\endgroup$ Commented Jan 25, 2012 at 7:34
  • $\begingroup$ @acl I answered the actual question, which I believe was asking for a quick fix to the immediate problem. $\endgroup$ Commented Jan 25, 2012 at 10:54
