
Some features are numerical, such as "graduation rate from school", while others are categorical, such as the name of the school. I used a label encoder on the categorical features to transform them into integers.

I now have a dataframe with both floats and integers, representing the numerical features and the categorical features (transformed with the label encoder), respectively.
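
For context, here is a minimal plain-Python sketch of what label encoding does to a single categorical column. The `label_encode` helper and the school names are hypothetical. It assigns codes in sorted order of the distinct values, which is the same convention scikit-learn's LabelEncoder uses for its `classes_`.

```python
def label_encode(values):
    """Map each distinct value to an integer code.

    Hypothetical helper for illustration: codes follow the sorted order of
    the distinct values, like sklearn's LabelEncoder.classes_.
    """
    classes = sorted(set(values))
    index = {c: i for i, c in enumerate(classes)}
    return [index[v] for v in values], classes

schools = ["Yale", "Brown", "Duke", "Brown"]
codes, classes = label_encode(schools)
print(classes)  # ['Brown', 'Duke', 'Yale']
print(codes)    # [2, 0, 1, 0]
```

The resulting integers imply an ordering (Brown < Duke < Yale) that means nothing for school names. This is why such columns are usually one-hot encoded before being passed to models that treat their inputs as numeric, such as linear models.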

I am unsure how to proceed with a learner. Do I need to use one-hot encoding, and if so, how can I do it? According to my current understanding, I cannot simply pass the dataframe to sklearn's OneHotEncoder because it contains floats. Should I just apply the label encoder to all features to solve the issue?

Sample data from my dataframe. OPEID and opeid6 were transformed using a label encoder.


1 Answer


Just use the OneHotEncoder categorical_features argument to select which features are categorical:

categorical_features: “all” or array of indices or mask :

Specify what features are treated as categorical.

  • ‘all’ (default): All features are treated as categorical.
  • array of indices: Array of categorical feature indices.
  • mask: Array of length n_features and with dtype=bool.

    Non-categorical features are always stacked to the right of the matrix.
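
Note that the categorical_features argument was deprecated in scikit-learn 0.20 and removed in 0.22. Below is a minimal sketch of the equivalent in newer versions, assuming a recent scikit-learn and pandas. ColumnTransformer applies OneHotEncoder only to the listed categorical columns, and remainder="passthrough" appends the untouched numeric columns to the right, mirroring the old behaviour. The sample values and the grad_rate column name are hypothetical; only OPEID and opeid6 come from the question.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical sample data: two label-encoded categorical columns
# (named after OPEID and opeid6 from the question) plus one float feature.
df = pd.DataFrame({
    "OPEID": [0, 1, 2, 1],
    "opeid6": [0, 0, 1, 1],
    "grad_rate": [0.71, 0.55, 0.90, 0.62],
})

categorical_cols = ["OPEID", "opeid6"]

# One-hot encode only the categorical columns. "passthrough" keeps the
# remaining (numeric) columns unchanged and stacks them to the right.
preprocess = ColumnTransformer(
    transformers=[
        ("onehot", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    remainder="passthrough",
    sparse_threshold=0.0,  # always return a dense array for easy inspection
)

X = preprocess.fit_transform(df)
print(X.shape)  # (4, 6): 3 OPEID columns + 2 opeid6 columns + grad_rate
```

The fitted transformer can be placed in a Pipeline ahead of the learner, so the same encoding is applied at prediction time. Since 0.20, OneHotEncoder also accepts string categories directly, so the LabelEncoder step is not needed. If you are working purely in pandas, pandas.get_dummies(df, columns=categorical_cols) is another option.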
