1

I have a large data set with small values on the order of 10^-3. However, they're written without the "e"; for example, 9.993e-3 is written as 9.993-3 in the file. This creates problems in numpy. How can I add that "e" between the last digit and the negative sign?

Some notes: the data set is contained in a file with no delimiters (each number occupies a fixed width), and some of the data aren't necessarily on the order of 10^-3.

Small snippet of data:

    11.752811.950003-6.973-3
    11.772591.950003 -.00636
    11.792371.950002-5.746-3
2
  • You mention a fixed width, but you didn't tell us what that width was. Is it 8? Commented Feb 26, 2018 at 19:06
  • 1
    Yep it's 8, sorry I left that out. Commented Feb 26, 2018 at 19:09

4 Answers

1

Here's a solution for what I understand the format of your data to be. Please confirm the exact format if I've misunderstood.

    data = '11.752811.950003-6.973-311.772591.950003 -.0063611.792371.950002-5.746-3'
    fixed_width = 8
    numbers = []
    for i in range(0, len(data), fixed_width):
        num = data[i:i+fixed_width]
        num = float(num[:2] + num[2:].replace('-', 'E-'))
        numbers.append(num)
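If the file stores one row per line, a minimal sketch along the same lines could slice each line separately instead of treating the data as one long string. The names `parse_row` and `sample_rows` are made up for illustration, and the sketch keeps the same assumption that a leading minus sign, if present, sits within the first two characters of a field. It also assumes every row's length is a multiple of the field width and that no field is blank.

```python
FIXED_WIDTH = 8  # field width confirmed in the comments on the question


def parse_row(row, width=FIXED_WIDTH):
    """Split one row into fixed-width fields and convert each to a float."""
    row = row.rstrip('\r\n')  # drop the line ending before slicing
    numbers = []
    for i in range(0, len(row), width):
        num = row[i:i + width]
        # Assumption: a leading minus sign, if present, is within the first
        # two characters, so any later '-' must belong to the exponent.
        numbers.append(float(num[:2] + num[2:].replace('-', 'E-')))
    return numbers


# Hypothetical stand-in for the lines read from the real file.
sample_rows = [
    '11.752811.950003-6.973-3\n',
    '11.772591.950003 -.00636\n',
    '11.792371.950002-5.746-3\n',
]
table = [parse_row(row) for row in sample_rows]
print(table)
```

With a real file, the list comprehension would iterate over the open file object instead of `sample_rows`, and the resulting list of lists can then be handed to numpy.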

6 Comments

My data is a bit of a mix. Some values are positive, some are negative, some are on the order of 1, some are on the order of 10, some are 10^-2, etc. I've updated my question with three rows of the 72k rows I'm dealing with.
From the data given, I suspect that the fixed width is 8, not 7. (24 characters total per line ignoring the line-end characters, and 3 decimal points per line.)
Updated my answer after the data sample was posted. It's a bit hacky (it assumes that the negative sign, if it exists, will always be in the first two characters), but it works at least for the example data.
Also, this won't work if your actual data contains newline characters, which I now suspect it might from your comment saying that your data contains rows.
It does. I appreciate your input, though. This could be a step in the right direction.
0
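    s = '9.993-3'
    # rindex finds the last '-', i.e. the exponent's sign, even if the mantissa is negative;
    # this assumes the value actually has an exponent
    i = s.rindex('-')
    s1 = s[:i] + 'e' + s[i:]
    print(s1)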


0

When you reference the data you can just use:

str.replace("-", "e-") 

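Or, if some of your data values are negative, use a regex:

    import re

    # Assumes the values on each line are separated; in undelimited fixed-width
    # data a leading minus can directly follow the previous field's last digit.
    for line in some_file:
        # Python's re needs a fixed-width lookbehind, so look for one preceding digit
        line = re.sub(r'(?<=\d)-', 'e-', line)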


0
    import re

    data = '11.752811.950003-6.973-311.772591.950003 -.0063611.792371.950002-5.746-3'

    # 1. create list of 8-character fields
    dataList = re.findall(r'.{8}', data)

    # 2. add e to the numbers that need it
    >>> [re.sub(r'(-\d)$', r'e\1', number) for number in dataList]
    ['11.75281', '1.950003', '-6.973e-3', '11.77259', '1.950003', ' -.00636', '11.79237', '1.950002', '-5.746e-3']
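The pattern above only covers a single-digit negative exponent. If the file might also contain longer or positive exponents (an assumption, since the posted sample shows neither), a hedged variation is to treat any sign that directly follows a digit or a decimal point as the start of the exponent, and then convert each field with `float`. The names `EXPONENT_SIGN` and `to_float` and the last two sample fields are invented for illustration.

```python
import re

# A sign directly after a digit or a decimal point, followed only by digits up
# to the end of the field, is taken as the start of an exponent. A sign at the
# very start of the field has nothing before it and is left alone.
EXPONENT_SIGN = re.compile(r'(?<=[\d.])([+-])(?=\d+$)')


def to_float(field):
    """Insert the missing 'e' (if any) and convert the field to a float."""
    return float(EXPONENT_SIGN.sub(r'e\1', field.strip()))


# The first three fields come from the sample data; the last two are
# hypothetical cases with a positive and a two-digit exponent.
fields = ['11.75281', '-6.973-3', ' -.00636', '1.2345+5', '3.1416-12']
print([to_float(f) for f in fields])
```

Fields that already contain an "e" pass through unchanged, because the sign then follows a letter rather than a digit or a decimal point.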
