
I am not sure what I am doing wrong here. I am simply trying to call a function containing an if-then-else filter and apply it to a DataFrame.

    In [7]: df.dtypes
    Out[7]:
    Filler     float64
    Spot       float64
    Strike     float64
    cp          object
    mid        float64
    vol        float64
    usedvol    float64
    dtype: object

    In [8]: df.head()
    Out[8]:
       Filler  Spot  Strike cp  mid   vol
    0     0.0   100      50  c  0.0  25.0
    1     0.0   100      50  p  0.0  25.0
    2     1.0   100      55  c  1.0  24.5
    3     1.0   100      55  p  1.0  24.5
    4     2.5   100      60  c  2.5  24.0

I have the below function:

    def badvert(df):
        if df['vol']>24.5:
            df['vol2'] = df['vol']*2
        else:
            df['vol2'] = df['vol']/2
        return(df['vol2'])

Which I call here:

df['vol2']=badvert(df) 

Which generates this error message:

    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-14-bbf7a11a17c9> in <module>()
    ----> 1 df['vol2']=badvert(df)

    <ipython-input-13-6132a4be33ca> in badvert(df)
          1 def badvert(df):
    ----> 2     if df['vol']>24.5:
          3         df['vol2'] = df['vol']*2
          4     else:
          5         df['vol2'] = df['vol']/2

    C:\Users\camcompco\AppData\Roaming\Python\Python34\site-packages\pandas\core\generic.py in __nonzero__(self)
        712         raise ValueError("The truth value of a {0} is ambiguous. "
        713                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    --> 714                          .format(self.__class__.__name__))
        715
        716     __bool__ = __nonzero__

    ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

My gut tells me that this is a simple "syntax" issue, but I am at a loss. Any help would be greatly appreciated.

2 Answers


You want to apply the function to each row. The following does that using apply and a lambda:

    df["vol2"] = df.apply(lambda row: row['vol'] * 2 if row['vol'] > 24.5 else row['vol'] / 2, axis=1)
    print(df)

Which should output something like:

       Filler  Spot  Strike cp  mid   vol   vol2
    0     0.0   100      50  c  0.0  25.0  50.00
    1     0.0   100      50  p  0.0  25.0  50.00
    2     1.0   100      55  c  1.0  24.5  12.25
    3     1.0   100      55  p  1.0  24.5  12.25
    4     2.5   100      60  c  2.5  24.0  12.00

Or using your own function:

    def badvert(df):
        if df['vol']>24.5:
            df['vol2'] = df['vol']*2
        else:
            df['vol2'] = df['vol']/2
        return df['vol2']

    df["vol2"] = df.apply(badvert,axis=1)

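axis=0 applies the function to each column; axis=1 applies it to each row. With axis=1, the argument passed to badvert is a single row, so df['vol'] inside the function is a scalar and the if test works.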
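To make the axis distinction concrete, here is a minimal sketch on a small made-up frame (the values only mirror the question's sample; the names mask, col_max and row_vol2 are hypothetical). It shows that comparing a whole column produces one boolean per row, which is why a plain if cannot use it, while a row-wise apply hands the function one scalar at a time:

```python
import pandas as pd

# Hypothetical sample that mirrors two of the question's columns
df = pd.DataFrame({
    "Strike": [50.0, 50.0, 55.0, 55.0, 60.0],
    "vol": [25.0, 25.0, 24.5, 24.5, 24.0],
})

# Comparing a whole column gives a boolean Series (one value per row),
# so `if df["vol"] > 24.5:` has no single truth value to test
mask = df["vol"] > 24.5
print(mask.tolist())

# axis=0: the function receives each column as a Series
col_max = df.apply(lambda col: col.max(), axis=0)
print(col_max)

# axis=1: the function receives each row as a Series, so row["vol"] is a scalar
row_vol2 = df.apply(
    lambda row: row["vol"] * 2 if row["vol"] > 24.5 else row["vol"] / 2,
    axis=1,
)
print(row_vol2.tolist())
```

Note that 24.5 itself is not greater than 24.5, so those rows take the else branch and are halved.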


1 Comment

Thank you very much! And +1 for the multiple solutions

df.apply with axis=1 has performance comparable to a Python for-loop. Sometimes using apply or a for-loop to compute row-by-row is unavoidable, but in this case a quicker alternative is to express the calculation as one done on whole columns.

Because of the way the underlying data is stored in a DataFrame, and since there are usually many more rows than columns, calculations done on whole columns are usually quicker than calculations done row-by-row:

    import numpy as np
    df['vol2'] = np.where(df['vol']>24.5, df['vol']*2, df['vol']/2)
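As a quick sanity check, the sketch below compares the column-wise np.where result with the row-wise apply result on a small made-up column (the data and the names vectorized, row_wise and same are assumptions for illustration, not from the original post). np.where evaluates the condition element by element and picks from the second or third argument accordingly:

```python
import numpy as np
import pandas as pd

# Hypothetical sample; only the "vol" column matters here
df = pd.DataFrame({"vol": [25.0, 25.0, 24.5, 24.5, 24.0]})

# Column-wise: one vectorized pass over the whole column
vectorized = np.where(df["vol"] > 24.5, df["vol"] * 2, df["vol"] / 2)

# Row-wise: the same rule applied one row at a time
row_wise = df.apply(
    lambda row: row["vol"] * 2 if row["vol"] > 24.5 else row["vol"] / 2,
    axis=1,
)

# Both approaches should produce identical values
same = np.array_equal(vectorized, row_wise.to_numpy())
print(same)
```

np.where returns a NumPy array rather than a Series; assigning it to df['vol2'] works because its length matches the number of rows.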

1 Comment

Thanks for the added color. I just checked: in my application, this is 5.3 times faster than my "def" solution and 2.8 times faster than the df.apply approach. Thank you.
