Here is an example of how to get pandas and sklearn to play nicely together.
Say you have two string columns that you want to vectorize, but you have no idea which vectorization parameters will give the best downstream performance.
First, create the vectorizer:

from copy import deepcopy
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn_pandas import DataFrameMapper

to_vect = Pipeline([('vect', CountVectorizer(min_df=1, max_df=.9, ngram_range=(1, 2), max_features=1000)),
                    ('tfidf', TfidfTransformer())])

Create the DataFrameMapper object:

full_mapper = DataFrameMapper([('col_name1', to_vect), ('col_name2', to_vect)])

This is the full pipeline:

# max_iter replaces the older n_iter argument of SGDClassifier
full_pipeline = Pipeline([('mapper', full_mapper), ('clf', SGDClassifier(max_iter=15, warm_start=True))])

Define the parameters you want the scan to consider:

full_params = {'clf__alpha': [1e-2, 1e-3, 1e-4],
               'clf__loss': ['modified_huber', 'hinge'],
               'clf__penalty': ['l2', 'l1'],
               'mapper__features': [[('col_name1', deepcopy(to_vect)),
                                     ('col_name2', deepcopy(to_vect))],
                                    [('col_name1', deepcopy(to_vect).set_params(vect__analyzer='char_wb')),
                                     ('col_name2', deepcopy(to_vect))]]}

That's it! Note, however, that mapper__features is a single item in this dictionary, so each entry in its list must be a complete feature list for the mapper. Use a for loop or itertools.product to generate a FLAT list of all the to_vect options you wish to consider, but that is a separate task outside the scope of the question.
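As a rough sketch of how that flat list could be generated with itertools.product: the helper make_vect, the vect_options list and its values are hypothetical names and settings chosen for illustration, not part of the original answer. Each combination gets freshly built vectorizers so that no two columns share a transformer instance.

```python
from itertools import product

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.pipeline import Pipeline


def make_vect(**vect_params):
    """Hypothetical helper: build a fresh CountVectorizer + TF-IDF pipeline."""
    return Pipeline([
        ('vect', CountVectorizer(min_df=1, max_df=.9, max_features=1000, **vect_params)),
        ('tfidf', TfidfTransformer()),
    ])


# Illustrative vectorizer settings to try for each column (assumed values)
vect_options = [
    {'analyzer': 'word', 'ngram_range': (1, 2)},
    {'analyzer': 'char_wb', 'ngram_range': (2, 4)},
]
columns = ['col_name1', 'col_name2']

# One complete feature list per combination of settings across the columns
mapper_features = [
    [(col, make_vect(**opts)) for col, opts in zip(columns, combo)]
    for combo in product(vect_options, repeat=len(columns))
]

# The result can then be used as the grid entry:
# full_params['mapper__features'] = mapper_features
```

With two settings and two columns this yields four candidate feature lists, and the grid search multiplies that by the number of classifier combinations, so keep the option list short.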
Then go on to build the optimal classifier, or whatever else your pipeline ends with:

gs_clf = GridSearchCV(full_pipeline, full_params, n_jobs=-1)