I have two dataframes (df_1, df_2):

    import pandas as pd

    df_1 = pd.DataFrame({'O' : [1,2,3], 'M' : [2,8,3]})

    df_2 = pd.DataFrame({'O' : [1,1,1, 2,2,2, 3,3,3],
                         'M' : [9,2,4, 6,7,8, 5,3,4],
                         'X' : [2,4,6, 4,8,7, 3,1,9],
                         'Y' : [3,6,1, 4,6,5, 1,0,7],
                         'Z' : [2,4,8, 3,5,4, 7,5,1]})

and a function (fun):

    # Index
    df_1 = df_1.set_index('O')
    df_1_M = df_1.M
    df_1_M = df_1_M.sort_index()

    # Fun
    def fun(z, *params):
        A,B,C = z

        # Score
        df_2['S'] = df_2['X']*A + df_2['Y']*B + df_2['Z']*C

        # Top score
        df_Sort = df_2.sort_values(['S', 'X', 'M'], ascending=[False, True, True])
        df_O = df_Sort.set_index('O')
        M_Top = df_O[~df_O.index.duplicated(keep='first')].M
        M_Top = M_Top.sort_index()

        # Compare the top scoring row for each O to df_1
        df_1_R = df_1_M.reindex(M_Top.index) # Nan
        T_N_T = M_Top == df_1_R

        # Record the results for the given values of A,B,C
        df_Res = pd.DataFrame({'it_is':T_N_T}) # is this row of df_1 the same as this row of M_Top?

        # p_hat = TP / (TP + FP)
        p_hat = df_Res.sum() / len(df_Res.index)

        return -p_hat

I can optimise it using brute force:

    from scipy.optimize import brute

    # Range
    min_ = -2
    max_ = 2
    step = .5
    ran_ge = slice(min_, max_+step, step)
    ranges = (ran_ge,ran_ge,ran_ge)

    # Params
    params = (df_1, df_2)

    # Brute
    resbrute = brute(fun,ranges,args=params,full_output=True,finish=None)

    print('Global maximum ', resbrute[0])
    print('Function value at global maximum ',-resbrute[1])

Which gives:

    Global maximum [-2. 0.5 1.5]
    Function value at global maximum 0.6666666666666666

But that takes too long when the dimensionality and the resolution increase. To save time, I would like to optimise it by differential evolution (DE) please. I tried:

    from scipy.optimize import differential_evolution

    # Bounds
    min_ = -2
    max_ = 2
    ran_ge = (min_, max_)
    bounds = [ran_ge,ran_ge,ran_ge]

    # Params
    params = (df_1, df_2)

    # DE
    DE = differential_evolution(fun,bounds,args=params)

But I got:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). 

Any ideas why it works by brute force but not by differential evolution please? How do I get it working by differential evolution?

1 Answer

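Looking at the code, fun(z, *params) returns a pandas Series rather than a scalar, because df_Res.sum() produces one value per column. differential_evolution needs a single number from the objective: when it compares the returned values, the Series ends up in a boolean context and pandas raises the ValueError above. brute appears to tolerate this because it only collects the returned values into an array, so the one-element Series is converted without complaint.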

    # pandas.core.series.Series
    type(p_hat)
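To see the problem in isolation, here is a minimal sketch that rebuilds p_hat from a made-up boolean column (the three values are illustrative, not taken from the question's data) and then uses it in an `if`, which is roughly what an optimiser does when it compares objective values:

```python
import pandas as pd

# Illustrative stand-in for df_Res inside fun; the values are made up.
df_Res = pd.DataFrame({'it_is': [True, False, True]})

# DataFrame.sum() reduces each column, so the result is a Series
# with one entry per column, not a plain number.
p_hat = df_Res.sum() / len(df_Res.index)
print(type(p_hat), p_hat.shape)

# Using that Series where a single True/False is required raises ValueError.
message = None
try:
    if p_hat < 0.5:
        pass
except ValueError as err:
    message = str(err)
print(message)
```

Even a Series holding a single value raises here, which is why the length of p_hat does not help.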

Change the return value of fun(z, *params) to:

return -p_hat[0] 
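A side note on the indexing: p_hat is labelled by column name ('it_is'), so `p_hat[0]` relies on pandas falling back to positional lookup, which recent pandas releases warn about. The sketch below shows three other ways to get the same scalar, assuming a made-up T_N_T with illustrative values:

```python
import pandas as pd

# Stand-in for the comparison result T_N_T inside fun (values are illustrative).
T_N_T = pd.Series([True, False, True], index=[1, 2, 3])

df_Res = pd.DataFrame({'it_is': T_N_T})
p_hat = df_Res.sum() / len(df_Res.index)   # one-element Series, as in the question

by_label = float(p_hat['it_is'])     # select by column label
by_position = float(p_hat.iloc[0])   # select by position, explicitly
direct = float(T_N_T.mean())         # skip the DataFrame: fraction of True values

print(by_label, by_position, direct)
```

Any of these could replace `p_hat[0]` in the return statement; the last one also makes df_Res unnecessary.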

With that change, differential_evolution runs and reaches the same value as brute force:

    # Function value at global maximum 0.6666666666666666
    print('Function value at global maximum ',-DE.fun)

Code Fix:

    import pandas as pd

    df_1 = pd.DataFrame({'O' : [1,2,3], 'M' : [2,8,3]})

    df_2 = pd.DataFrame({'O' : [1,1,1, 2,2,2, 3,3,3],
                         'M' : [9,2,4, 6,7,8, 5,3,4],
                         'X' : [2,4,6, 4,8,7, 3,1,9],
                         'Y' : [3,6,1, 4,6,5, 1,0,7],
                         'Z' : [2,4,8, 3,5,4, 7,5,1]})

    # Index
    df_1 = df_1.set_index('O')
    df_1_M = df_1.M
    df_1_M = df_1_M.sort_index()

    # Fun
    def fun(z, *params):
        A, B, C = z

        # Score
        df_2['S'] = df_2['X'] * A + df_2['Y'] * B + df_2['Z'] * C

        # Top score
        df_Sort = df_2.sort_values(['S', 'X', 'M'], ascending=[False, True, True])
        df_O = df_Sort.set_index('O')
        M_Top = df_O[~df_O.index.duplicated(keep='first')].M
        M_Top = M_Top.sort_index()

        # Compare the top scoring row for each O to df_1
        df_1_R = df_1_M.reindex(M_Top.index) # Nan
        T_N_T = M_Top == df_1_R

        # Record the results for the given values of A,B,C
        df_Res = pd.DataFrame({'it_is': T_N_T}) # is this row of df_1 the same as this row of M_Top?

        # p_hat = TP / (TP + FP)
        p_hat = df_Res.sum() / len(df_Res.index)

        return -p_hat[0]

    from scipy.optimize import differential_evolution

    # Bounds
    min_ = -2
    max_ = 2
    ran_ge = (min_, max_)
    bounds = [ran_ge,ran_ge,ran_ge]

    # Params
    params = (df_1, df_2)

    # DE
    DE = differential_evolution(fun,bounds,args=params)

    print('Function value at global maximum ',-DE.fun)
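As an optional variation, the objective can take its data from `args` instead of reading and modifying the global df_2 on every call. The sketch below is one way to do that, not the original author's code: the names `neg_hit_rate`, `targets` and `candidates` are made up for illustration, and `maxiter=20` with `polish=False` are arbitrary choices to keep the run short.

```python
import pandas as pd
from scipy.optimize import differential_evolution

df_1 = pd.DataFrame({'O': [1, 2, 3], 'M': [2, 8, 3]})
df_2 = pd.DataFrame({'O': [1, 1, 1, 2, 2, 2, 3, 3, 3],
                     'M': [9, 2, 4, 6, 7, 8, 5, 3, 4],
                     'X': [2, 4, 6, 4, 8, 7, 3, 1, 9],
                     'Y': [3, 6, 1, 4, 6, 5, 1, 0, 7],
                     'Z': [2, 4, 8, 3, 5, 4, 7, 5, 1]})

targets = df_1.set_index('O')['M'].sort_index()   # expected M for each O
candidates = df_2                                 # rows to be scored


def neg_hit_rate(z, targets, candidates):
    """Return minus the fraction of O groups whose top-scoring M matches targets."""
    a, b, c = z
    # assign() returns a new frame, so the caller's data is left untouched.
    scored = candidates.assign(
        S=candidates['X'] * a + candidates['Y'] * b + candidates['Z'] * c)
    # Same ordering as the question: best score first, ties broken by X, then M.
    top = (scored.sort_values(['S', 'X', 'M'], ascending=[False, True, True])
                 .drop_duplicates('O', keep='first')
                 .set_index('O')['M'])
    hits = top == targets.reindex(top.index)
    return -float(hits.mean())   # plain Python float, as the optimiser expects


bounds = [(-2, 2)] * 3
result = differential_evolution(neg_hit_rate, bounds,
                                args=(targets, candidates),
                                maxiter=20, polish=False)
print('Best parameters found ', result.x)
print('Function value at best point ', -result.fun)
```

Because the objective is piecewise constant in A, B and C, many parameter vectors share the same score, so the reported parameters can differ between runs even when the function value agrees.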