An example using ColumnTransformer might help:
    from sklearn.compose import ColumnTransformer
    from sklearn.impute import SimpleImputer
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import OneHotEncoder, StandardScaler
    # MyW2VTransformer, create_model and KerasClassifier are not part of
    # scikit-learn; see the linked file for where they come from.

    # FOREGOING TRANSFORMATIONS ON 'data'
    ...

    # filter data
    data = data[data['county'].isin(COUNTIES_OF_INTEREST)]

    # define the feature encoding of the data
    # (scikit-learn >= 1.2 renames `sparse` to `sparse_output`)
    impute_and_one_hot_encode = Pipeline([
        ('impute', SimpleImputer(strategy='most_frequent')),
        ('encode', OneHotEncoder(sparse=False, handle_unknown='ignore'))
    ])
    featurisation = ColumnTransformer(transformers=[
        ("impute_and_one_hot_encode", impute_and_one_hot_encode, ['smoker', 'county', 'race']),
        ('word2vec', MyW2VTransformer(min_count=2), ['last_name']),
        ('numeric', StandardScaler(), ['num_children', 'income'])
    ])

    # define the training pipeline for the model
    neural_net = KerasClassifier(build_fn=create_model, epochs=10, batch_size=1,
                                 verbose=0, input_dim=109)
    pipeline = Pipeline([
        ('features', featurisation),
        ('learner', neural_net)])

    # train-test split
    train_data, test_data = train_test_split(data, random_state=0)

    # model training
    model = pipeline.fit(train_data, train_data['label'])
You can find the full code at: https://github.com/stefan-grafberger/mlinspect/blob/19ca0d6ae8672249891835190c9e2d9d3c14f28f/example_pipelines/healthcare/healthcare.py
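If you want to try the idea without the project-specific pieces, here is a minimal self-contained sketch of the same pattern. It is not the code from the linked repository: the toy data is made up, the word2vec step is left out, and `LogisticRegression` stands in for the Keras model purely as an assumption to keep it runnable with scikit-learn alone.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data; the column names mirror the excerpt above.
data = pd.DataFrame({
    "smoker": pd.Series(["yes", "no", np.nan, "no", "yes", "no"], dtype=object),
    "county": pd.Series(["a", "b", "a", "c", "b", "a"], dtype=object),
    "num_children": [0, 2, 1, 3, 0, 1],
    "income": [30000.0, 52000.0, 41000.0, 78000.0, 25000.0, 60000.0],
    "label": [1, 0, 1, 0, 1, 0],
})

# Categorical columns: fill missing values, then one-hot encode.
impute_and_one_hot_encode = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

# Each transformer is applied only to the columns listed for it.
featurisation = ColumnTransformer(transformers=[
    ("impute_and_one_hot_encode", impute_and_one_hot_encode, ["smoker", "county"]),
    ("numeric", StandardScaler(), ["num_children", "income"]),
])

# Stand-in learner (assumption): LogisticRegression instead of the Keras model.
pipeline = Pipeline([
    ("features", featurisation),
    ("learner", LogisticRegression()),
])

model = pipeline.fit(data, data["label"])

# Inspect the feature matrix produced by the fitted ColumnTransformer.
features = model.named_steps["features"].transform(data)
print(features.shape)  # (6, 7): 2 smoker + 3 county + 2 numeric columns
predictions = model.predict(data)
print(predictions)
```

Passing the whole frame (including `label`) to `fit` works here because ColumnTransformer drops every column that is not listed in a transformer (`remainder='drop'` is the default), so the label never reaches the learner as a feature.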