Extract number following a specific string with special chars in a large text file using python

Question

I have large data files (CSV type) that I read with pandas. Each file has an Information column that contains many names and numbers separated by ;. This is what the column looks like:

0    Acid: 74.1 [°C];LeakRate [Bar/Min]: 103 ;P: ...
1    Acid: 73.9 [°C]; LeakRate [µBar/Min]: 371 ; ...
2    Acid: 73.9 [°C]; LeakRate [µBar/Min]: 107 ; ...
3    Acid: 73.9 [°C]; LeakRate [µBar/Min]: 371 ; ...
4    Acid: 74.0 [°C]; LeakRate [µBar/Min]: 107 ; ...
Name: Information, dtype: object

I split the strings with the following line of code to get, for example, LeakRate [µBar/Min] and its corresponding measurement, which is 103 at index 0 above.

 df["Information"].str.split(";", expand=True)[1].str.split(":", expand=True)[1]

Unfortunately, the data files that are produced are not always the same, so the positions are not always the same. Therefore, I would like to locate a specific string with special characters, such as LeakRate [µBar/Min], and then get the corresponding numbers so that I can plot them for further analysis.

Does anyone know an easy way of doing this? I am new to Python, so I appreciate any help.

Thanks,

Eala

The units for LeakRate are different in the first two records. Should they be the same? And should the units be the same for all variables, for all records?
– Bill Bell, Commented Jan 9, 2020 at 20:54
Perhaps you could post the first few records of the csv on pastebin?
– Bill Bell, Commented Jan 9, 2020 at 20:57
You can probably use the approach offered in my answer at stackoverflow.com/a/49014385/131187. Of course you would need to adjust how you parse the csv records.
– Bill Bell, Commented Jan 9, 2020 at 21:32

Trevor Siemens · Accepted Answer · 2020-01-09 21:14:14Z

This sounds like you want to figure out the column index beforehand.

This could be done as:

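firstRow = ...
# firstRow is assumed to be a single row, so firstRow["Information"] is a plain
# string and is split with str.split directly (no .str accessor).
leakRateCols = [i for i, val in enumerate(firstRow["Information"].split(";")) if 'LeakRate' in val]
if len(leakRateCols) > 1:
    # Raise some error here, because there are multiple columns with LeakRate.
    raise ValueError("multiple fields contain 'LeakRate'")
leakRateCol = leakRateCols[0]
for df in ...:
    # expand=True turns the split fields into columns, so [leakRateCol] selects
    # a field position rather than a row label.
    leakRate = df["Information"].str.split(";", expand=True)[leakRateCol].str.split(":", expand=True)[1]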
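As a rough, self-contained sketch of the idea above: the sample DataFrame and the helper name find_field_index are made up for illustration and are not part of the original data or of pandas. It assumes every row in a given file has its fields in the same order, so the position found in the first row applies to the whole column.

```python
import pandas as pd

# Hypothetical sample data modelled on the question's Information column.
df = pd.DataFrame({
    "Information": [
        "Acid: 74.1 [°C];LeakRate [µBar/Min]: 103 ;P: 1",
        "Acid: 73.9 [°C]; LeakRate [µBar/Min]: 371 ;P: 2",
    ]
})


def find_field_index(info, label):
    """Return the position of the single ';'-separated field containing label."""
    matches = [i for i, field in enumerate(info.split(";")) if label in field]
    if len(matches) != 1:
        raise ValueError(
            f"expected exactly one field containing {label!r}, found {len(matches)}"
        )
    return matches[0]


# Look up the position once, using the first row of this file.
leak_rate_col = find_field_index(df["Information"].iloc[0], "LeakRate")

# Split every row into columns, pick the LeakRate field, keep the part after ':'.
fields = df["Information"].str.split(";", expand=True)
values = fields[leak_rate_col].str.split(":", expand=True)[1].str.strip()
leak_rate = pd.to_numeric(values)

print(leak_rate.tolist())  # [103, 371]
```

Repeating the lookup for each file handles files whose field order differs from one another, but not rows that differ within the same file.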

You may want to look into using the csv library though. Might be useful to you. https://docs.python.org/3/library/csv.html
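A different approach, not the one described above, is to search for the label itself so that the field position no longer matters. The sketch below uses re.escape so that the special characters in a label such as LeakRate [µBar/Min] are matched literally, and pandas' Series.str.extract to capture the number after the colon. The sample data and the helper name extract_value are assumptions made for illustration.

```python
import re

import pandas as pd

# Hypothetical sample data: the LeakRate field sits at different positions,
# and the last row does not contain it at all.
df = pd.DataFrame({
    "Information": [
        "Acid: 74.1 [°C];LeakRate [µBar/Min]: 103 ;P: 1",
        "P: 2; Acid: 73.9 [°C]; LeakRate [µBar/Min]: 371 ",
        "Acid: 73.9 [°C];P: 3",
    ]
})


def extract_value(series, label):
    """Return the number that follows 'label:' in each string, NaN if absent."""
    # re.escape makes '[', ']' and similar characters match literally.
    pattern = re.escape(label) + r"\s*:\s*([-+]?\d+(?:\.\d+)?)"
    # With one capture group and expand=False, str.extract returns a Series.
    return pd.to_numeric(series.str.extract(pattern, expand=False))


leak_rate = extract_value(df["Information"], "LeakRate [µBar/Min]")
print(leak_rate.tolist())  # [103.0, 371.0, nan]
```

Because the whole label, including its unit, has to match, rows that use a different unit or lack the field come back as NaN and can be dropped or inspected before plotting.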
