Found a solution using .fillna

As you can guess, my title is already confusing, and I am confused too! I have a dataframe like this:

    Index   Values
    0       NaN
    1       NaN
    ...
    230     350.21
    231     350.71
    ...
    1605    922.24

I have values for indices 230 through 1605, but not for the entries before that (indices 0 to 229). So I calculated the slope to approximate the missing data and stored it in 'slope'.

    Y1 = df['Values'].min()
    X1ID = df['Values'].idxmin()
    Y2 = df['Values'].max()
    X2ID = df['Values'].idxmax()
    slope = (Y2 - Y1)/(X2ID - X1ID)
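For a sense of scale, here is the same calculation on a toy frame. This is only a sketch: the frame is an assumption built from the three values visible in the excerpt above, and it assumes the minimum sits at index 230 and the maximum at index 1605. Under those assumptions the slope comes out at roughly 0.416 per index step.

```python
import numpy as np
import pandas as pd

# Toy frame (an assumption for illustration): NaN for indices 0-229,
# then only the three values shown in the question.
index = list(range(230)) + [230, 231, 1605]
values = [np.nan] * 230 + [350.21, 350.71, 922.24]
df = pd.DataFrame({"Values": values}, index=index)

Y1 = df["Values"].min()        # smallest known value
X1ID = df["Values"].idxmin()   # index label of the smallest value
Y2 = df["Values"].max()        # largest known value
X2ID = df["Values"].idxmax()   # index label of the largest value

slope = (Y2 - Y1) / (X2ID - X1ID)
print(X1ID, X2ID, slope)
```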

In essence, I want to take the .min of Values, subtract the slope, and insert the new value at the index just before the previous .min. However, I am completely lost. I tried something like this:

 df['Values2'] = df['Values'].min().apply(lambda x: x.min() - slope) 

But that is obviously rubbish. I would greatly appreciate some advice.

EDIT

So after trying multiple ways I found a crude solution that at least works for me.

    loopcounter = 0
    missingValue = []
    missingindex = []
    missingindex.append(loopcounter)
    missingValue.append(Y1)
    # the list grows while it is iterated, so each pass sees the value appended last
    for minValue in missingValue:
        minValue = minValue - slope
        missingValue.append(minValue)
        loopcounter += 1
        missingindex.append(loopcounter)
        if loopcounter == 230:
            break
    del missingValue[0]       # drop Y1 itself
    missingValue.reverse()    # smallest value first, to match index 0
    del missingindex[-1]      # keep indices 0..229

First I created two lists, one for the missing values and one for the index. Then I added my minimum value (Y1) to the value list and started the loop, which I wanted to stop after 230 iterations (the number of missing values). Each iteration subtracts the slope from the latest item in the list, starting with the minimum value, and adds the counter to the missingindex list.

Deleting the first value and reversing the order transformed the list into the correct order.

missValue = dict(zip(missingindex,missingValue)) 

I then combined the two lists into a dictionary

df['Values'] = df['Values'].fillna(missValue) 

Afterwards I used the .fillna function to fill up my dataframe with the dictionary.

This worked for me. I know it's probably not the most elegant solution...
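The same fill can probably be written without the loop and the two lists. The following is a minimal sketch, not a definitive implementation: `extrapolate_leading` is a hypothetical helper name, the toy series is made up, and it assumes a numeric index with the minimum at the first known value.

```python
import numpy as np
import pandas as pd

def extrapolate_leading(values: pd.Series) -> pd.Series:
    """Fill NaNs before the minimum by extending the min-to-max line backwards."""
    y1, x1 = values.min(), values.idxmin()
    y2, x2 = values.max(), values.idxmax()
    slope = (y2 - y1) / (x2 - x1)
    labels = values.index.to_numpy()
    # Value of the line at every index label: y1 - slope * (x1 - label)
    line = pd.Series(y1 - slope * (x1 - labels), index=values.index)
    # Only NaNs located before the minimum are replaced; known values stay untouched.
    mask = values.isna().to_numpy() & (labels < x1)
    return values.where(~mask, line)

# Toy data (hypothetical): three leading NaNs, then values rising by 2 per step.
s = pd.Series([np.nan, np.nan, np.nan, 10.0, 12.0, 14.0])
filled = extrapolate_leading(s)
print(filled.tolist())  # → [4.0, 6.0, 8.0, 10.0, 12.0, 14.0]
```

On the real frame this would be called as `df['Values'] = extrapolate_leading(df['Values'])`, which should give index 229 the value Y1 - slope and index 0 the value Y1 - 230 * slope, matching the loop.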

I would like to thank everyone that invested their time in trying to help me, thanks a lot.

3 Answers

Check this. However, I think you would have to put it in a loop, because the minimum has to be recalculated after each insertion.

    import pandas as pd
    import numpy as np

    df = pd.DataFrame(columns=['Values'], data=[np.nan, np.nan, 350.21, 350.71, 922.24])

    Y1 = df['Values'].min()
    X1ID = df['Values'].idxmin()
    Y2 = df['Values'].max()
    X2ID = df['Values'].idxmax()
    slope = (Y2 - Y1)/(X2ID - X1ID)

    # new row holding the extrapolated value
    line = pd.DataFrame(data=[Y1-slope], columns=['Values'], index=[X1ID])
    # split the frame at X1ID and place the new row in between
    df2 = pd.concat([df.loc[:X1ID-1], line, df.loc[X1ID:]]).reset_index(drop=True)
    print(df2)

The insert logic is described here: "Is it possible to insert a row at an arbitrary position in a dataframe using pandas?"

1 Comment

Thanks for your reply. It will probably need to be looped. I'll look into it.

I think you can use loc with interpolate:

    print(df)
           Values
    Index
    0         NaN
    1         NaN
    2         NaN
    3         NaN
    4         NaN
    5         NaN
    6         NaN
    229       NaN
    230    350.21
    231    350.71
    1605   922.24

    # add value 0 to index = 0
    df.at[0, 'Values'] = 0
    # add value Y1 - slope (349.793978) to the last NaN row
    df.at[X1ID-1, 'Values'] = Y1 - slope
    print(df)
               Values
    Index
    0        0.000000
    1             NaN
    2             NaN
    3             NaN
    4             NaN
    5             NaN
    6             NaN
    229    349.793978
    230    350.210000
    231    350.710000
    1605   922.240000

    print(df.loc[0:X1ID-1, 'Values'])
    Index
    0        0.000000
    1             NaN
    2             NaN
    3             NaN
    4             NaN
    5             NaN
    6             NaN
    229    349.793978
    Name: Values, dtype: float64

    # filter values by index and interpolate
    df.loc[0:X1ID-1, 'Values'] = df.loc[0:X1ID-1, 'Values'].interpolate(method='linear')
    print(df)
               Values
    Index
    0        0.000000
    1       49.970568
    2       99.941137
    3      149.911705
    4      199.882273
    5      249.852842
    6      299.823410
    229    349.793978
    230    350.210000
    231    350.710000
    1605   922.240000
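A possible variation, shown only as a sketch on made-up toy data: instead of pinning index 0 to 0, pin it to the value the min-to-max line would have there, so the interpolated stretch keeps the overall slope. The toy frame is hypothetical, and `method="index"` is used on the assumption that the index is numeric, so that gaps in the index are spaced by label rather than by row position.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): three leading NaNs, then values rising by 2 per step.
df = pd.DataFrame({"Values": [np.nan, np.nan, np.nan, 10.0, 12.0, 14.0]})

Y1, X1ID = df["Values"].min(), df["Values"].idxmin()
Y2, X2ID = df["Values"].max(), df["Values"].idxmax()
slope = (Y2 - Y1) / (X2ID - X1ID)

# Anchor the first row on the extended line instead of at 0.
first = df.index[0]
df.loc[first, "Values"] = Y1 - slope * (X1ID - first)

# Interpolate using the index labels as x-coordinates.
df["Values"] = df["Values"].interpolate(method="index")
print(df["Values"].tolist())
```

With the anchor on the line, the filled values fall by one slope per index step all the way down to the first row, rather than dropping to 0.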

2 Comments

This technically works, but the values are not what I expected: they decrease very fast. Sadly, the slope can only be calculated over longer ranges, because the data is more like stairs than strictly linear.
Maybe try changing the interpolation method.

I will revise this a little bit:

    df['Values2'] = df['Values']
    df.loc[df.Values2.isnull(), 'Values2'] = (Y1 - slope)

EDIT

Or try putting this in a loop like the one below. It recursively fills in the values, one step at a time, until it reaches the start of the series (index 0):

    def fix_rec(series):
        Y1 = series.min()
        X1ID = series.idxmin()  ## print(X1ID)
        Y2 = series.max()
        X2ID = series.idxmax()
        slope = (Y2 - Y1) / (X2ID - X1ID)
        if X1ID == 0:  ## termination condition
            return series
        series.loc[X1ID-1] = Y1 - slope
        return fix_rec(series)

Call it like this:

    df['Values2'] = fix_rec(df['Values'].copy())
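For longer gaps the recursion could run into Python's default recursion limit, so an iterative variant may be safer. This is a sketch under stated assumptions: `fix_iter` is a hypothetical name, the index is assumed to be the default 0, 1, 2, ... integer index, and the slope is computed once because each new point lies on the same line and so leaves it unchanged.

```python
import numpy as np
import pandas as pd

def fix_iter(series: pd.Series) -> pd.Series:
    """Iterative variant of the recursive fill; returns a filled copy."""
    out = series.copy()
    y1, x1 = out.min(), int(out.idxmin())
    y2, x2 = out.max(), int(out.idxmax())
    slope = (y2 - y1) / (x2 - x1)
    # Walk backwards from the row before the minimum down to label 0,
    # subtracting one slope per step.
    for label in range(x1 - 1, -1, -1):
        y1 -= slope
        out.loc[label] = y1
    return out

# Toy data (hypothetical): three leading NaNs, then values rising by 2 per step.
s = pd.Series([np.nan, np.nan, np.nan, 10.0, 12.0, 14.0])
result = fix_iter(s)
print(result.tolist())  # → [4.0, 6.0, 8.0, 10.0, 12.0, 14.0]
```

Because the function works on a copy, the original column is left as it was and the result can be assigned to a new column.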

I hope that helps!

1 Comment

Thanks for your suggestion. Now we get the same value for all 229 entries but they should decrease further and further the closer we get to 0. Maybe we need to loop because technically the .min of values (y1) should always be a new one after each subtraction.
