I would like to apply one-hot encoding and label encoding to my dataset using a Pipeline that feeds into my random forest model. I have written a function that uses scikit-learn's Pipeline together with OneHotEncoder and LabelEncoder.

    def create_pipeline(self, train_feature, train_label, encoding_method, model):
        if encoding_method == EncodingMethod.ONE_HOT:
            categorical_cols = [col for col in train_feature.columns
                                if train_feature[col].dtype == 'object']
            categorical_transformer = Pipeline(steps=[
                ('onehot', OneHotEncoder(handle_unknown='ignore', sparse=False))])
            label_encoder = Pipeline(steps=[('label', LabelEncoder())])
            preprocessor = ColumnTransformer(
                transformers=[('category', categorical_transformer, categorical_cols),
                              ('label', label_encoder, [train_label.name])],
                remainder='passthrough')
        elif encoding_method == EncodingMethod.LABEL:
            categorical_cols = [col for col in train_feature.columns
                                if train_feature[col].dtype == 'object']
            categorical_transformer = Pipeline(steps=[('label', LabelEncoder())])
            preprocessor = ColumnTransformer(
                transformers=[('category', categorical_transformer, categorical_cols),
                              ('label', categorical_transformer, [train_label.name])],
                remainder='passthrough')
        pipeline = Pipeline(steps=[('preprocessor', preprocessor),
                                   ('classifier', model)])
        return pipeline

I call the function above from my model script (using the iris dataset, code below) and expect y_train (the species column) to be encoded as 0, 1, 2, etc. However, when I print it, it still contains the categorical values.
Partial script:

    df = self._dataset.as_dataframe()
    train_feature = df[self._train_configs.feature_cols]
    train_label = df[self._train_configs.target_col]
    self._model = self.create_pipeline(train_feature, train_label,
                                       self._train_configs.encoding_method,
                                       self._model)
    print("\n")
    print("This is model")
    print(self._model)
    X_train, X_test, y_train, y_test = train_test_split(
        train_feature, train_label, random_state=0, train_size=0.8)
    print("\n")
    print("This is y_train")
    print(y_train)

Output of print(y_train):

    137     Iris-virginica
    84     Iris-versicolor
    27         Iris-setosa
    127     Iris-virginica
    132     Iris-virginica
    ...
    9          Iris-setosa
    103     Iris-virginica
    67     Iris-versicolor
    117     Iris-virginica
    47         Iris-setosa
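
For reference, here is a minimal sketch of one way the target could be encoded, under some assumptions. Building a Pipeline does not transform any data by itself, and a ColumnTransformer only ever transforms the feature matrix X, never y, so printing y_train right after train_test_split shows the original strings. LabelEncoder is designed to be applied to the target directly, outside the pipeline. The toy DataFrame, the `colour` column, and the variable names below are hypothetical stand-ins for the real dataset, not part of the script above. Note also that RandomForestClassifier accepts string class labels as they are, so encoding the target may not be required at all.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical toy data standing in for the real dataset.
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8, 5.0, 6.9, 6.5, 4.6],
    "colour": ["red", "blue", "red", "green", "blue",
               "green", "red", "blue", "green", "red"],
    "species": ["Iris-setosa", "Iris-setosa", "Iris-versicolor",
                "Iris-versicolor", "Iris-virginica", "Iris-virginica",
                "Iris-setosa", "Iris-versicolor", "Iris-virginica",
                "Iris-setosa"],
})
X = df[["sepal_length", "colour"]]
y = df["species"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, train_size=0.8)

# Encode the target on its own: LabelEncoder works on y, not on feature columns.
label_encoder = LabelEncoder()
y_train_encoded = label_encoder.fit_transform(y_train)
y_test_encoded = label_encoder.transform(y_test)

# The preprocessor only handles feature columns; the target is not listed here.
categorical_cols = [col for col in X.columns if X[col].dtype == "object"]
preprocessor = ColumnTransformer(
    transformers=[("category", OneHotEncoder(handle_unknown="ignore"),
                   categorical_cols)],
    remainder="passthrough")

pipeline = Pipeline(steps=[
    ("preprocessor", preprocessor),
    ("classifier", RandomForestClassifier(random_state=0)),
])

# The feature encoding happens inside fit/predict, not when the pipeline is built.
pipeline.fit(X_train, y_train_encoded)

# Map integer predictions back to the original species names.
predicted_species = label_encoder.inverse_transform(pipeline.predict(X_test))

print(y_train_encoded)
print(predicted_species)
```

With this arrangement, `y_train_encoded` holds the integer codes, while `y_train` itself is left unchanged, and `inverse_transform` recovers the species names from the predictions.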