
I have read this link: Check which columns in DataFrame are Categorical

I have a DataFrame in which salaries are written with a leading $ sign. Because of that, the Salary column is also being shown as categorical data.

Moreover, suppose my nominal data is not stored as strings such as 'F' and 'M'. How do we then determine which columns are numeric, categorical (with strings), and nominal?

Say my data looks like this:

    ID  Gender  Salary  HasPet
    1   M       $250    0
    2   F       $5000   0
    3   M       $4500   1
  • Can you add a Minimal, Complete, and Verifiable example? Commented Apr 24, 2016 at 11:45
  • Thank you for adding a sample. What is the desired output? Commented Apr 24, 2016 at 11:55
  • @jezrael I want to know which columns are numeric, categorical (with strings), and nominal. In the link given, they found the numeric data. But in the case of Salary, the $ sign causes the column to be shown as non-numeric, so it is tagged as categorical data. Commented Apr 24, 2016 at 12:05
  • Suppose your DataFrame is df, how about df.dtypes or df.info()? Commented Apr 24, 2016 at 12:30
  • For Salary it will give me object, I guess. And for nominal data coded as 0/1 it will show me int64. Commented Apr 24, 2016 at 12:41

1 Answer


You are confusing the categorical data type with strings (which pandas shows as object).

Numbers cannot contain a $ sign, so pandas treats the Salary column as strings, and this is the correct behavior!

You can easily convert your salary column to integer/float if you want though:

    In [180]: df
    Out[180]:
       Gender Salary
    0       F  $3283
    1       M  $6958
    2       F  $3721
    3       F  $7732
    4       M  $7198
    5       F  $5475
    6       F  $7410
    7       M  $8673
    8       F  $8582
    9       M  $4115
    10      F  $8658
    11      F  $6331
    12      M  $6174
    13      F  $6261
    14      M  $6212

    In [181]: df.dtypes
    Out[181]:
    Gender    object
    Salary    object
    dtype: object

Let's remove the leading $ and convert Salary to int:

    In [182]: df.Salary = df.Salary.str.lstrip('$').astype(int)

    In [183]: df.dtypes
    Out[183]:
    Gender    object
    Salary     int32
    dtype: object
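If some salary strings might also contain thousands separators or unparseable values, a slightly more defensive conversion may be useful. The following is only a sketch: it assumes the sample data from the question, and the comma handling is an assumption that does not appear in the original data.

```python
import pandas as pd

# Sample data taken from the question.
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "Gender": ["M", "F", "M"],
    "Salary": ["$250", "$5000", "$4500"],
    "HasPet": [0, 0, 1],
})

# Strip dollar signs and (assumed) thousands separators.
cleaned = df["Salary"].str.replace(r"[$,]", "", regex=True)

# errors="coerce" turns anything that still is not a number into NaN
# instead of raising, so bad rows can be inspected afterwards.
salary_numeric = pd.to_numeric(cleaned, errors="coerce")

print(salary_numeric.tolist())
```

Unlike astype(int), this does not fail on a malformed entry; the trade-off is that a column containing NaN is returned as float rather than int.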

And convert your Gender column to categorical:

    In [186]: df.Gender = df.Gender.astype('category')

    In [187]: df.dtypes
    Out[187]:
    Gender    category
    Salary       int32
    dtype: object
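For the other part of the question (nominal data stored as numbers, such as HasPet), dtypes alone cannot tell a 0/1 code from a real measurement. One possible heuristic is to treat numeric columns with very few distinct values as nominal. The sketch below assumes the question's sample data with Salary already converted; classify_columns and its max_levels threshold are hypothetical names chosen for illustration, not part of pandas.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

def classify_columns(df, max_levels=2):
    """Hypothetical helper: label each column by a simple heuristic.

    Numeric columns with at most `max_levels` distinct values are
    guessed to be nominal codes; other numeric columns are "numeric";
    everything else (strings, category dtype) is "categorical".
    """
    kinds = {}
    for name in df.columns:
        col = df[name]
        if is_numeric_dtype(col):
            if col.nunique() <= max_levels:
                kinds[name] = "nominal"
            else:
                kinds[name] = "numeric"
        else:
            kinds[name] = "categorical"
    return kinds

# Sample data from the question, with Salary already converted to int.
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "Gender": ["M", "F", "M"],
    "Salary": [250, 5000, 4500],
    "HasPet": [0, 0, 1],
})

kinds = classify_columns(df)
print(kinds)
```

This is only a guess based on cardinality: an identifier such as ID is still reported as numeric, and a genuine measurement that happens to take two values would be mislabeled, so the threshold needs to be chosen with knowledge of the data.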