
I'm using the Titanic dataset and have created a series Famsize. I'd like to create a second series that outputs 'single' if Famsize == 1, 'small' if 1 < Famsize < 5, and 'large' if Famsize >= 5.

    Famsize  FamsizeDisc
    1        single
    2        small
    5        large

I've tried using np.where, but since I have three possible outputs I haven't been able to find a solution.

Any suggestions?

Comment: Do share what you've attempted so far. (Commented Oct 5, 2017 at 11:00)

2 Answers


It's called binning, so use pd.cut, i.e.:

    # bins are right-closed by default: (0, 1] -> single, (1, 4] -> small, (4, inf) -> large
    df['new'] = pd.cut(df['Famsize'], bins=[0, 1, 4, np.inf], labels=['single', 'small', 'large'])

Output:

       Famsize FamsizeDisc     new
    0        1      single  single
    1        2       small   small
    2        5       large   large
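A self-contained sketch of the pd.cut approach, assuming a small made-up frame in place of the Titanic data. Because pd.cut bins are right-closed by default, (1, 4] captures 2, 3 and 4, which matches 1 < Famsize < 5 only because family size is a whole number (it is usually built as SibSp + Parch + 1); a fractional value such as 4.5 would land in 'large'.

```python
import numpy as np
import pandas as pd

# Illustrative data only; in practice df comes from the Titanic CSV
df = pd.DataFrame({'Famsize': [1, 2, 3, 4, 5, 8]})

# Right-closed bins: (0, 1] -> single, (1, 4] -> small, (4, inf] -> large
df['FamsizeDisc'] = pd.cut(
    df['Famsize'],
    bins=[0, 1, 4, np.inf],
    labels=['single', 'small', 'large'],
)

print(df)
```

Note that the resulting column has a categorical dtype, which is memory-friendly and keeps the labels in bin order.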

You could either create a function that does the mapping:

    def get_sizeDisc(x):
        if x == 1:
            return 'single'
        elif x < 5:  # assumes Famsize >= 1, so this covers 2-4
            return 'small'
        elif x >= 5:
            return 'large'

    df['FamsizeDisc'] = df.Famsize.apply(get_sizeDisc)

Or you could use .loc:

    df.loc[df.Famsize == 1, 'FamsizeDisc'] = 'single'
    # inclusive='neither' selects 1 < Famsize < 5 (pandas < 1.3 used inclusive=False)
    df.loc[df.Famsize.between(1, 5, inclusive='neither'), 'FamsizeDisc'] = 'small'
    df.loc[df.Famsize >= 5, 'FamsizeDisc'] = 'large'
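Since the question mentions np.where, here is a sketch (not from either answer) of how the three-way split can stay vectorized with NumPy, again on an assumed toy frame. np.select takes a list of boolean conditions and a matching list of choices, using the first condition that is True for each row; a nested np.where does the same thing by putting the second where inside the "else" branch of the first.

```python
import numpy as np
import pandas as pd

# Illustrative data only
df = pd.DataFrame({'Famsize': [1, 2, 4, 5, 7]})

# Conditions are evaluated in order; the first True one picks the label
conditions = [
    df['Famsize'] == 1,
    (df['Famsize'] > 1) & (df['Famsize'] < 5),
    df['Famsize'] >= 5,
]
choices = ['single', 'small', 'large']
df['FamsizeDisc'] = np.select(conditions, choices, default='unknown')

# Nested np.where: the inner call handles everything that is not 1
df['FamsizeDisc2'] = np.where(
    df['Famsize'] == 1,
    'single',
    np.where(df['Famsize'] < 5, 'small', 'large'),
)

print(df)
```

np.select scales more readably past three categories, while the nested np.where is closest to what was attempted in the question.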

1 Comment

My bad, hadn't reloaded the page to see your answer. I'll remove it from my answer and upvote yours, as it's clearly the more concise solution :D
