96 questions
2 votes
1 answer
55 views
How to fit scaler for different subsets of rows depending on group variable and include it in a Pipeline?
I have a data set like the following and want to scale the data using any of the scalers in sklearn.preprocessing. Is there an easy way to fit this scaler not over the whole data set, but per group? ...
1 vote
1 answer
624 views
Pipeline FutureWarning: This Pipeline instance is not fitted yet [closed]
I am working on a fairly simple machine learning problem in the form of a practicum. I am using the following code to preprocess the data: from preprocess.date_converter import DateConverter from ...
1 vote
1 answer
199 views
Pass parameters to custom transformer in sklearn
I am trying to pass a parameter DummyTransformer__feature_index_sec to my sklearn custom transformer via a pipeline. It seems like I need to implement metadata routing in order to do this. However, I ...
-2 votes
1 answer
64 views
Error in Pipeline code in ScikitLearn using Python
In the pipeline code below, even though I have encoded the sex column, I am getting a string-to-float conversion error. from sklearn.compose import ColumnTransformer from sklearn.pipeline import Pipeline from ...
0 votes
1 answer
53 views
Does the pipeline approach with StandardScaler generalize to tree-based ensembles or neural networks?
I’m using a Pipeline in scikit-learn to combine feature scaling with a classifier. This works well for logistic regression, but I’m curious if this approach would generalize effectively to more ...
1 vote
1 answer
115 views
How to pass parameters to this sklearn Cox model in a Pipeline?
If I run the following Python code it works well: target = 'churn' tranOH = ColumnTransformer([ ('one', OneHotEncoder(drop='first', dtype='int'), make_column_selector(dtype_include='category', ...
0 votes
0 answers
82 views
Pipeline does not apply the functions when I add a scaler
I am trying to deploy the model as a .pkl file. While building the pipeline, I am running into some problems. Here is the code that causes no trouble: from sklearn.pipeline import FunctionTransformer, ...
0 votes
0 answers
42 views
How to implement pipeline into machine learning model
I would like to apply one-hot encoding and label encoding to my dataset using a Pipeline that feeds into my random forest model. I have created a function that uses Pipeline from scikit-learn together with ...
0 votes
0 answers
68 views
Sklearn transformer output returns more columns with some columns not having the transformation
I am building a scikit-learn pipeline. I downloaded a dataset from an online ML repository and generated descriptive stats for it. I am using the processed.cleveland.data dataset found here: https://...
0 votes
0 answers
207 views
Combining sequential feature selection with column transformer
I am trying to implement a pipeline with sklearn combining a column transformer for numeric and categorical data and sequential feature selection. The issue is when doing the complete pipeline it gets ...
0 votes
1 answer
384 views
Dropping a column in sklearn Pipeline after using it to create new features
I have example data where one column contains string values (e.g. "34 12"). I created two new columns during the preprocessing step, storing the right and left integers of the string ...
1 vote
1 answer
122 views
What methods of its last estimator does a scikit-learn pipeline have?
I'm trying to understand scikit-learn Pipelines. According to a note in the scikit-learn user guide, a Pipeline "has all the methods that the last estimator in the pipeline has". So I wrote my own ...
-1 votes
1 answer
222 views
Why is my double underscore notation not working with nested pipelines in scikit-learn? [closed]
I'm trying to build a pipeline that contains a pre-processing transformer (it simply removes columns from the data) and an LDA classifier. I wanted to tweak hyperparameters for each, and from looking ...
2 votes
3 answers
806 views
How can I use sklearn's make_column_selector to select all valid datetime columns?
I want to select columns based on their datetime data types. My DataFrame has for example columns with types np.dtype('datetime64[ns]'), np.datetime64 and 'datetime64[ns, UTC]'. Is there a generic way ...
0 votes
1 answer
347 views
Value error using scikit-learn transformers
I am having trouble with a piece of code I am writing, specifically a pipeline. The data is a simple numerical dataframe (firewall logs) that is split into X_train and X_test in the usual way. After ...