TL;DR: for your use case, jump to the last section
One ColumnTransformer, many Pipelines
This is what I suggested in the comments and have put forward elsewhere on this site. We use a single ColumnTransformer, each of whose transformers is a Pipeline, with one pipeline for each combination of preprocessing steps you would like to perform. The advantage is that you can specify columns by name for each transformer. The downsides: there are many separate copies of the scaler, so if you wanted to hyperparameter-tune something about a transformer you'd have to change it in many places; and if you have many distinct combinations of preprocessing steps, this takes a lot of code to specify, though there are some ways to partially mitigate that.
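```python
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
# Assumption: scikit-learn's TargetEncoder (added in 1.3); the answer does not
# say which library its TargetEncoder comes from.
from sklearn.preprocessing import OneHotEncoder, StandardScaler, TargetEncoder

# One pipeline per combination of preprocessing steps.
pipe_target_encode = Pipeline([
    ("te", TargetEncoder()),
    ("sc", StandardScaler()),
])
pipe_impute = Pipeline([
    ("imp", SimpleImputer()),
    ("sc", StandardScaler()),
])
ColumnTransformer([
    ("target_enc", pipe_target_encode, ["c1", "c2", "c3"]),
    ("impute", pipe_impute, ["c4"]),
    ("scale", StandardScaler(), ["c5", "c6", "c7", "c8"]),
    ("ohe", OneHotEncoder(), ["c9", "c10"]),
])
```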
One Pipeline, many ColumnTransformers
This one will become easier once transformers can output dataframes, but if you can keep track of column ordering it can be done now.
```python
# Column positions refer to the input of each step, not to the original frame.
impute = ColumnTransformer(
    [("impute", SimpleImputer(), [3])],  # c4
    remainder="passthrough",
)
target_enc = ColumnTransformer(
    [("target_enc", TargetEncoder(), [1, 2, 3])],  # c4 is now first; c1, c2, c3
    remainder="passthrough",
)
scale = ColumnTransformer(
    [("scale", StandardScaler(), [0, 1, 2, 3, 4, 5, 6, 7])],  # c1-3 are first, then c4, then c5-8
    remainder="passthrough",
)
ohe = ColumnTransformer(
    [("ohe", OneHotEncoder(), [8, 9])],  # c9, c10
    remainder="passthrough",
)
pipe = Pipeline([
    ("impute", impute),
    ("target_enc", target_enc),
    ("scale", scale),
    ("ohe", ohe),
])
```
The columns will be output in the order c9-dummies, c10-dummies, c1, c2, c3, c4, c5, c6, c7, c8. You could move the steps around to try to get the columns into a better order, but since OHE produces an a priori unknown number of columns, it might be tough to get the column indices right.
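The index bookkeeping can be checked without scikit-learn. The sketch below uses a hypothetical helper, `passthrough_order`, that mimics only the ordering rule of a ColumnTransformer with `remainder="passthrough"`: selected columns come first, in the order listed, followed by the untouched columns in their existing order. It tracks source columns only, so the OHE step's expansion into several dummy columns is not modelled.

```python
def passthrough_order(columns, selected):
    """Return the source-column order after one ColumnTransformer step.

    `selected` holds the positional indices given to the single transformer;
    the remaining columns are passed through and appended on the right.
    """
    picked = set(selected)
    chosen = [columns[i] for i in selected]
    rest = [c for i, c in enumerate(columns) if i not in picked]
    return chosen + rest


cols = [f"c{i}" for i in range(1, 11)]               # c1 .. c10
after_step1 = passthrough_order(cols, [3])            # first step uses [3]
after_step2 = passthrough_order(after_step1, [1, 2, 3])
after_step3 = passthrough_order(after_step2, list(range(8)))
after_step4 = passthrough_order(after_step3, [8, 9])  # the OHE step

for name, order in [("step 1", after_step1), ("step 2", after_step2),
                    ("step 3", after_step3), ("step 4", after_step4)]:
    print(name, order)
```

Running a check like this before wiring up the real pipeline makes it easier to see which column each index points at when a step is moved.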
Hybrid
This is the best option for this particular case: it's the cleanest and most semantically correct. Because your transformers operate in a hierarchical way, we can get away with all column specifications being strings; if we had to specify columns in a ColumnTransformer after any sklearn step, its input would be an array and we'd have to resort to index specification as above (again, until pandas-out is a thing).
```python
# Encode c1-c3 and impute c4; c5-c8 pass through untouched.
step1 = ColumnTransformer(
    [
        ("target_enc", TargetEncoder(), ["c1", "c2", "c3"]),
        ("impute", SimpleImputer(), ["c4"]),
    ],
    remainder="passthrough",
)
# Scale everything coming out of step1.
num_pipe = Pipeline([
    ("prep", step1),
    ("scale", StandardScaler()),
])
preproc = ColumnTransformer([
    ("num", num_pipe, ["c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8"]),
    ("ohe", OneHotEncoder(), ["c9", "c10"]),
])
```
The Hybrid approach's diagram (scikit-learn's HTML rendering of the estimator) shows an outer ColumnTransformer with two parallel branches: num, a Pipeline of the inner prep ColumnTransformer (target_enc on c1, c2, c3; impute on c4; remainder passed through) followed by StandardScaler, applied to c1 through c8; and ohe, a OneHotEncoder applied to c9 and c10. See the Stack Overflow answer for diagrams of the other two approaches.
ColumnTransformer. That won't preserve column order, but get_feature_names_out should give you the ability to reorder the columns as desired.