I am trying to generate a pipeline using sklearn, and am not really sure how to go about it. Here is a minimal example:

    def numFeat(data):
        return data[['AGE', 'WASTGIRF']]

    def catFeat(data):
        return pd.get_dummies(data[['PAI', 'smokenow1']])

    features = FeatureUnion([
        ('f1', FunctionTransformer(numFeat)),
        ('f2', FunctionTransformer(catFeat)),
    ])

    pipeline = Pipeline([('f', features), ('lm', LinearRegression())])

    data = pd.DataFrame({'AGE': [1, 2, 3, 4],
                         'WASTGIRF': [23, 5, 43, 1],
                         'PAI': ['a', 'b', 'a', 'd'],
                         'smokenow1': ["lots", "some", "none", "some"]})

    pipeline.fit(data, y)
    print(pipeline.transform(data))

In the above example, data is a Pandas DataFrame that contains the columns ['AGE', 'WASTGIRF', 'PAI', 'smokenow1'] among others.

Of course, in the FeatureUnion example, I want to supply many more transformation operations, but all of them take a pandas DataFrame and return another pandas DataFrame. In effect, I want to do something like this:

    data --+-->num features-->num transforms--+-->FeatureUnion-->model
           |                                  |
           +-->cat features-->cat transforms--+

How do I go about doing this?

For the example above, the error I get is:

TypeError: float() argument must be a string or a number 

1 Answer

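You need to initialise FunctionTransformer with validate=False (IMO this is a bad default that should be changed). With validation enabled, the input DataFrame is converted to a numeric NumPy array before your function is called, so the string columns trigger the TypeError and the column lookups would no longer work anyway: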

    features = FeatureUnion([
        ('f1', FunctionTransformer(numFeat, validate=False)),
        ('f2', FunctionTransformer(catFeat, validate=False)),
    ])
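
For reference, here is a minimal end-to-end sketch of the question's example with validate=False applied, written for Python 3. The target y and the StandardScaler step are assumptions added for illustration (the question defines neither); the scaler just stands in for a "num transforms" stage from the diagram.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler


def numFeat(data):
    # Select the numeric columns; input and output are DataFrames.
    return data[['AGE', 'WASTGIRF']]


def catFeat(data):
    # One-hot encode the categorical columns; returns a DataFrame.
    return pd.get_dummies(data[['PAI', 'smokenow1']])


# Each branch is its own Pipeline, so more steps can be chained per branch.
num_branch = Pipeline([
    ('select', FunctionTransformer(numFeat, validate=False)),
    ('scale', StandardScaler()),  # assumed example of a numeric transform
])
cat_branch = Pipeline([
    ('select', FunctionTransformer(catFeat, validate=False)),
])

features = FeatureUnion([('f1', num_branch), ('f2', cat_branch)])
pipeline = Pipeline([('f', features), ('lm', LinearRegression())])

data = pd.DataFrame({'AGE': [1, 2, 3, 4],
                     'WASTGIRF': [23, 5, 43, 1],
                     'PAI': ['a', 'b', 'a', 'd'],
                     'smokenow1': ["lots", "some", "none", "some"]})
y = np.array([1.0, 2.0, 3.0, 4.0])  # hypothetical target, not in the question

pipeline.fit(data, y)

# A pipeline ending in a regressor has no transform(); inspect the
# fitted feature step directly and use predict() for the model output.
X = pipeline.named_steps['f'].transform(data)
predictions = pipeline.predict(data)
print(X.shape)
print(predictions)
```

One caveat with this design: pd.get_dummies is stateless, so data passed to predict later may produce a different set of dummy columns than the training data did.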

See also sklearn pipeline - how to apply different transformations on different columns
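
As one possible way to apply different transformations to different columns, scikit-learn 0.20 and later provide ColumnTransformer. The sketch below is an alternative to the FunctionTransformer approach, not necessarily what the linked question recommends; the target y and the choice of StandardScaler and OneHotEncoder are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Route each group of columns to its own transformer by column name.
preprocess = ColumnTransformer([
    ('num', StandardScaler(), ['AGE', 'WASTGIRF']),
    # The encoder learns its categories in fit(); unseen ones are ignored later.
    ('cat', OneHotEncoder(handle_unknown='ignore'), ['PAI', 'smokenow1']),
])

model = Pipeline([('f', preprocess), ('lm', LinearRegression())])

frame = pd.DataFrame({'AGE': [1, 2, 3, 4],
                      'WASTGIRF': [23, 5, 43, 1],
                      'PAI': ['a', 'b', 'a', 'd'],
                      'smokenow1': ["lots", "some", "none", "some"]})
target = np.array([1.0, 2.0, 3.0, 4.0])  # hypothetical target

model.fit(frame, target)
preds = model.predict(frame)
print(preds)
```

Because the encoder is fitted as part of the pipeline, the dummy columns stay the same between fit and predict, which a bare pd.get_dummies call does not guarantee.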
