Here is an example of how to get pandas and sklearn to play nice.
Say you have two string columns that you wish to vectorize, but you have no idea which vectorization parameters will give the best downstream performance.
The snippets below assume these imports:

    from copy import deepcopy
    from sklearn.pipeline import Pipeline
    from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
    from sklearn.linear_model import SGDClassifier
    from sklearn.model_selection import GridSearchCV
    from sklearn_pandas import DataFrameMapper
Create the vectorizer:

    to_vect = Pipeline([
        ('vect', CountVectorizer(min_df=1, max_df=.9, ngram_range=(1, 2), max_features=1000)),
        ('tfidf', TfidfTransformer()),
    ])

Create the DataFrameMapper object:
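
    # give each text column its own copy so the two columns
    # do not share one fitted vectorizer
    full_mapper = DataFrameMapper([
        ('col_name1', deepcopy(to_vect)),
        ('col_name2', deepcopy(to_vect)),
        ('col_name3', None),
    ])

This is the full pipeline: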

    # newer scikit-learn releases name the n_iter parameter max_iter
    full_pipeline = Pipeline([
        ('mapper', full_mapper),
        ('clf', SGDClassifier(n_iter=15, warm_start=True)),
    ])

Define the parameters you want to scan:

    full_params = {
        'clf__alpha': [1e-2, 1e-3, 1e-4],
        'clf__loss': ['modified_huber', 'hinge'],
        'clf__penalty': ['l2', 'l1'],
        # each entry replaces the mapper's whole feature list
        'mapper__features': [
            [('col_name1', deepcopy(to_vect)),
             ('col_name2', deepcopy(to_vect))],
            [('col_name1', deepcopy(to_vect).set_params(vect__analyzer='char_wb')),
             ('col_name2', deepcopy(to_vect))],
        ],
    }

That's it! Note, however, that mapper__features is a single entry in this dictionary, so use a for loop or itertools.product to generate a FLAT list of all the to_vect options you wish to consider. That is a separate task, decoupled from sklearn and pandas.
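The flattening step mentioned above could be sketched roughly like this. It is only an illustration: the helper name flat_feature_candidates, the column_options dictionary, and the parameter values in it are all made up for the example, and plain dicts stand in for the to_vect copies so the snippet runs on its own.

```python
from itertools import product

# Hypothetical per-column option grids; the keys reuse the column names from
# the mapper above, and the parameter values are only illustrative.
column_options = {
    'col_name1': [{'vect__analyzer': 'word'}, {'vect__analyzer': 'char_wb'}],
    'col_name2': [{'vect__analyzer': 'word'}, {'vect__ngram_range': (1, 3)}],
}


def flat_feature_candidates(options, make_transformer):
    """Build one feature list per combination of per-column options."""
    columns = list(options)
    candidates = []
    for combo in product(*(options[col] for col in columns)):
        candidates.append(
            [(col, make_transformer(params)) for col, params in zip(columns, combo)]
        )
    return candidates


# dict is a stand-in factory so the sketch runs without scikit-learn. With the
# pipeline above it might instead be:
#     lambda params: deepcopy(to_vect).set_params(**params)
candidates = flat_feature_candidates(column_options, make_transformer=dict)
print(len(candidates))  # 2 options x 2 options = 4 candidate feature lists
```

The resulting flat list is what you would pass as the value of 'mapper__features' in the parameter dictionary.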
Go on to create the optimal classifier, or whatever else your pipeline ends with:

    gs_clf = GridSearchCV(full_pipeline, full_params, n_jobs=-1)
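Before launching the search it can help to estimate how many fits it will run. A small sketch of that arithmetic, using a dictionary with the same shape as the parameter grid above; the strings under 'mapper__features' are placeholders for the real feature lists, and the fold count is an assumption, since the cross-validation default depends on your scikit-learn version.

```python
from math import prod

# Same shape as the parameter grid above; the two strings are placeholders
# for the two candidate feature lists.
grid = {
    'clf__alpha': [1e-2, 1e-3, 1e-4],
    'clf__loss': ['modified_huber', 'hinge'],
    'clf__penalty': ['l2', 'l1'],
    'mapper__features': ['word features', 'char_wb features'],
}

# A grid search tries every combination, so the sizes multiply.
n_candidates = prod(len(values) for values in grid.values())

n_folds = 5  # assumption: depends on the cv setting in your version
total_fits = n_candidates * n_folds

print(n_candidates, total_fits)  # 24 120
```

Every extra to_vect option multiplies this number, which is worth keeping in mind when generating the flat list of feature candidates.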