
The following code transforms a given pandas column FEAT into a new binary feature named STREAM. The program works as long as there are no NaN values in the original dataframe. If there are NaN values, the following exception occurs: ValueError: Length of values does not match length of index. I need to carry the NaN values over to the new column. Is that doable? Here is the code that fails:

    import pandas as pd
    import numpy as np

    data = {
        'FEAT': [8, 15, 7, np.nan, 5, 2, 11, 15]
    }
    customer = pd.DataFrame(data)
    customer = pd.DataFrame(data, index=['June', 'Robert', 'Lily', 'David',
                                         'Bob', 'Sally', 'Mia', 'Luis'])

    # create binary variable STREAM  0: mainstream  1: avantgarde
    stream_0 = [1, 3, 5, 8, 10, 12, 14]
    stream_1 = [2, 4, 6, 7, 9, 11, 13, 15]

    # convert FEAT to list_0
    list_0 = customer['FEAT'].values.tolist()

    # create a list of length = len(customer) whose elements are:
    # 0 if the value of 'FEAT' is in stream_0
    # 1 if the value of 'FEAT' is in stream_1
    L = []
    for i in list_0:
        if i in stream_0:
            L.append(0)
        elif i in stream_1:
            L.append(1)

    # convert the list to a new column of customer df
    customer['STREAM'] = L
    print(customer)
Comment: What value should it get in the else block, which is missing (i.e. when the value is in neither stream_0 nor stream_1)? Commented May 13, 2020 at 15:41

1 Answer


The issue is that you are missing an else block. When a value (like NaN) is in neither stream_0 nor stream_1, nothing is appended, so L ends up with fewer elements than customer has rows.
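If you want to keep the loop, a minimal sketch of the fix is to add the missing else branch so that every row contributes exactly one entry to L. The data is copied from the question; appending np.nan in the else branch is an assumption about what the "neither list" case should produce, based on the stated goal of carrying NaN over to the new column.

```python
import numpy as np
import pandas as pd

customer = pd.DataFrame(
    {'FEAT': [8, 15, 7, np.nan, 5, 2, 11, 15]},
    index=['June', 'Robert', 'Lily', 'David', 'Bob', 'Sally', 'Mia', 'Luis'],
)

stream_0 = [1, 3, 5, 8, 10, 12, 14]
stream_1 = [2, 4, 6, 7, 9, 11, 13, 15]

L = []
for value in customer['FEAT']:
    if value in stream_0:
        L.append(0)
    elif value in stream_1:
        L.append(1)
    else:
        # NaN (or any value found in neither list) still gets an entry,
        # so len(L) always matches len(customer).
        L.append(np.nan)

customer['STREAM'] = L
print(customer)
```

Because one NaN is present, pandas stores STREAM as a float column.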

Looping is unnecessary here: np.select can handle the column creation, and its default argument plays the role of the else block.

    customer['STREAM'] = np.select(
        [customer.FEAT.isin(stream_0), customer.FEAT.isin(stream_1)],
        [0, 1],
        default=np.nan,
    )

            FEAT  STREAM
    June     8.0     0.0
    Robert  15.0     1.0
    Lily     7.0     1.0
    David    NaN     NaN
    Bob      5.0     0.0
    Sally    2.0     1.0
    Mia     11.0     1.0
    Luis    15.0     1.0

You could also map the few values with a dict; everything that is in neither list becomes NaN.

    d = {key: value
         for l, value in zip([stream_0, stream_1], [0, 1])
         for key in l}
    customer['STREAM'] = customer['FEAT'].map(d)

The dict is built with a comprehension that creates the key-value pairs: every key in stream_0 is assigned a value of 0, and every key in stream_1 a value of 1. The comprehension is a bit complicated, so an easier-to-understand method that accomplishes the same thing is to create each dictionary separately and then combine them.

    d_1 = {k: 0 for k in stream_0}
    d_2 = {k: 1 for k in stream_1}
    d = {**d_1, **d_2}  # Combine
    # {1: 0, 2: 1, 3: 0, 4: 1, 5: 0, 6: 1, 7: 1,
    #  8: 0, 9: 1, 10: 0, 11: 1, 12: 0, 13: 1, 14: 0, 15: 1}
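As a rough sanity check, the sketch below rebuilds the question's data and compares the dict-based map approach with the np.select approach. The names via_map and via_select are only illustrative, and the comparison uses np.allclose with equal_nan=True so that the NaN row counts as a match.

```python
import numpy as np
import pandas as pd

customer = pd.DataFrame(
    {'FEAT': [8, 15, 7, np.nan, 5, 2, 11, 15]},
    index=['June', 'Robert', 'Lily', 'David', 'Bob', 'Sally', 'Mia', 'Luis'],
)

stream_0 = [1, 3, 5, 8, 10, 12, 14]
stream_1 = [2, 4, 6, 7, 9, 11, 13, 15]

# Build the lookup dict from two simple comprehensions, then merge them.
d_1 = {k: 0 for k in stream_0}
d_2 = {k: 1 for k in stream_1}
d = {**d_1, **d_2}

# Values missing from the dict (here: NaN) map to NaN.
via_map = customer['FEAT'].map(d)

# Same result with np.select; default covers the "neither list" case.
via_select = np.select(
    [customer['FEAT'].isin(stream_0), customer['FEAT'].isin(stream_1)],
    [0, 1],
    default=np.nan,
)

same = np.allclose(via_map.to_numpy(dtype=float), via_select, equal_nan=True)
print(same)
```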

4 Comments

The second solution WORKS. However, it is very cryptic. Can you explain a bit more or provide easier-to-understand code? Thanks
@josephpareti I added some explanation, and also a more straightforward way to create the dictionary. It's a bit more typing but I think it's clearer. map uses that dict to transform the values.
Thank you. I understand the option with the 2 dictionaries better, but I still do not understand d = {**d_1, **d_2}
@josephpareti See stackoverflow.com/questions/38987/…. In Python 3.5+ it's one very concise way to merge dictionaries; otherwise you can use dict.update.
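To illustrate the d = {**d_1, **d_2} question from the comments, here is a minimal sketch with two small made-up dictionaries (the names merged and via_update are only illustrative). The ** syntax unpacks each dictionary's key-value pairs into a new dict literal, which gives the same result as copying the first dictionary and calling dict.update with the second.

```python
d_1 = {1: 0, 3: 0}   # illustrative subset of the stream_0 keys
d_2 = {2: 1, 4: 1}   # illustrative subset of the stream_1 keys

# Python 3.5+: unpack both dicts into a new one.
merged = {**d_1, **d_2}

# Equivalent with dict.update: copy first, then update the copy in place.
via_update = dict(d_1)
via_update.update(d_2)

print(merged == via_update)

# If a key appears in both dicts, the right-most value wins in either form.
overlap = {**{'a': 1}, **{'a': 2}}
print(overlap)
```

Neither form modifies d_1 or d_2, since the update is applied to a copy.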
