
I have large data files (CSV) that I read with pandas. Each file has an Information column that contains many names and numbers separated by ;. This is what the column looks like:

    0    Acid: 74.1 [°C];LeakRate [Bar/Min]: 103 ;P: ...
    1    Acid: 73.9 [°C]; LeakRate [µBar/Min]: 371 ; ...
    2    Acid: 73.9 [°C]; LeakRate [µBar/Min]: 107 ; ...
    3    Acid: 73.9 [°C]; LeakRate [µBar/Min]: 371 ; ...
    4    Acid: 74.0 [°C]; LeakRate [µBar/Min]: 107 ; ...
    Name: Information, dtype: object

I split the strings with the following line to get, for example, LeakRate [µBar/Min] and its corresponding measurement, which is 103 at index 0 above.

    df["Information"].str.split(";", expand=True)[1].str.split(":", expand=True)[1]

Unfortunately, the data files are not always produced in the same way, so the positions of the fields are not always the same. Therefore, I would like to locate a specific string containing special characters, such as LeakRate [µBar/Min], and then get the corresponding numbers so that I can plot them for further analysis.

Does anyone know an easy way of doing this? I am new to Python, so I appreciate any help.

Thanks,

Eala

  • The units for LeakRate are different in the first two records. Should they be the same? And should the units be the same for all variables, for all records? Commented Jan 9, 2020 at 20:54
  • Perhaps you could post the first few records of the csv on pastebin? Commented Jan 9, 2020 at 20:57
  • You can probably use the approach offered in my answer at stackoverflow.com/a/49014385/131187. Of course you would need to adjust how you parse the csv records. Commented Jan 9, 2020 at 21:32
  • All units are constant. It is a typo error. Commented Jan 10, 2020 at 11:59

1 Answer


This sounds like you want to figure out the column index beforehand.

This could be done as:

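    firstRow = ...  # a single row, so firstRow["Information"] is a plain string

    # Positions of the ;-separated fields in the first row that mention LeakRate
    leakRateCols = [i for i, val in enumerate(firstRow["Information"].split(";")) if 'LeakRate' in val]
    if len(leakRateCols) > 1:
        # There are multiple fields with LeakRate, so the position is ambiguous.
        raise ValueError("multiple fields contain LeakRate")
    leakRateCol = leakRateCols[0]

    for df in ...:
        ... = df["Information"].str.split(";", expand=True)[leakRateCol].str.split(":", expand=True)[1]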
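If the position of the field can change from row to row as well as from file to file, one possible alternative is to search for the label itself rather than rely on an index. The sketch below is a swapped-in technique, not part of the approach above: it assumes the value is a plain number directly after `label:`, and the sample rows, the `P` field values, and the helper name `extract_value` are made up for illustration.

```python
import re

import pandas as pd

# Hypothetical sample data modelled on the output shown in the question.
df = pd.DataFrame({
    "Information": [
        "Acid: 74.1 [°C]; LeakRate [µBar/Min]: 103 ; P: 1.2",
        "P: 1.3; Acid: 73.9 [°C]; LeakRate [µBar/Min]: 371 ",
        "Acid: 73.9 [°C]; P: 1.1",  # no LeakRate field at all
    ]
})


def extract_value(series, label):
    """Return the number that follows 'label:' in each string, NaN if absent."""
    # re.escape makes the brackets in the label match literally
    # instead of being read as a regex character class.
    pattern = re.escape(label) + r"\s*:\s*(-?\d+(?:\.\d+)?)"
    return pd.to_numeric(series.str.extract(pattern, expand=False))


df["LeakRate"] = extract_value(df["Information"], "LeakRate [µBar/Min]")
print(df["LeakRate"])
```

Rows where the label is missing end up as NaN, which plotting functions generally skip, so files with a different layout should not break the analysis.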

You may also want to look into the csv library, which might be useful to you: https://docs.python.org/3/library/csv.html
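As a rough idea of what the csv route could look like, here is a minimal sketch. It assumes a comma-delimited file with a header row containing an `Information` column; the `Id` column, the sample rows, and the helper name `parse_information` are hypothetical, and the real files may well use a different delimiter.

```python
import csv
import io


def parse_information(text):
    """Split 'name: value; name: value' into a dict of name -> value string."""
    fields = {}
    for part in text.split(";"):
        name, sep, value = part.partition(":")
        if sep:  # skip fragments that contain no colon
            fields[name.strip()] = value.strip()
    return fields


# Hypothetical file contents; a real file would be opened with
# open(path, newline="") instead of io.StringIO.
sample = io.StringIO(
    "Id,Information\n"
    "1,Acid: 74.1 [°C]; LeakRate [µBar/Min]: 103 \n"
    "2,LeakRate [µBar/Min]: 371 ; Acid: 73.9 [°C]\n"
)

leak_rates = []
for row in csv.DictReader(sample):
    info = parse_information(row["Information"])
    # Look the field up by name, so its position in the string does not matter.
    leak_rates.append(float(info["LeakRate [µBar/Min]"]))

print(leak_rates)
```

Because each record is turned into a dictionary keyed by field name, the lookup works regardless of the order in which the fields appear.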
