Learn from given data and apply it on new data

Question

I'm a beginner to machine learning and scikit-learn, so this might be a stupid question.

I'm trying to do something like this:

    features = [['adam'], ['james'], ['amy']]
    labels = ['hello adam', 'hello james', 'hello amy']
    clf = clf.fit(features, labels)
    print clf.predict(['john'])  # This should give out 'hello john'

Is this possible using scikit-learn?

Thanks in advance!

elyase · Accepted Answer · 2017-03-15 21:28:39Z

The principled way to solve this would be sequence-to-sequence learning, which is a more complicated beast and outside of scikit-learn's scope.

With enough feature engineering and the right problem formulation, you can still help a simpler algorithm, like the ones in scikit-learn, achieve this task. There are two main difficulties that need to be tackled:

- how to convert your features and your labels into a numeric representation (one-hot, embeddings, ...)
- how to encode a variable-length sequence into a fixed-length vector that can be fed to scikit-learn algorithms (bag of words, mean pooling, RNN)
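
As a rough illustration of those two steps, here is a minimal sketch that reuses the names and greetings from the question. The specific choices are assumptions made for the example, not the only way to do it: a bag of character n-grams (CountVectorizer with analyzer="char") as the fixed-length encoding, LabelEncoder for the labels, and a 1-nearest-neighbour classifier as a stand-in for any scikit-learn estimator.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import LabelEncoder

# Data from the question, with the names as plain strings.
names = ["adam", "james", "amy"]
labels = ["hello adam", "hello james", "hello amy"]

# Difficulty 2: encode variable-length strings as fixed-length vectors.
# Assumption for this sketch: a bag of character 1- and 2-grams.
vectorizer = CountVectorizer(analyzer="char", ngram_range=(1, 2))
X = vectorizer.fit_transform(names)

# Difficulty 1: turn the string labels into integer class ids.
label_encoder = LabelEncoder()
y = label_encoder.fit_transform(labels)

# Any scikit-learn classifier could go here; 1-NN is just an example.
clf = KNeighborsClassifier(n_neighbors=1)
clf.fit(X, y)

# A new name is encoded with the same vocabulary, so it has the same width.
X_new = vectorizer.transform(["john"])
predicted = label_encoder.inverse_transform(clf.predict(X_new))

print(X.shape, X_new.shape)
print(predicted[0])
```

Note the limitation this exposes: a classifier can only return a label it saw during training, so the prediction for 'john' is one of the three known greetings and never 'hello john'. Producing an unseen output string needs either a reformulated problem (for example, predicting a template and filling in the name afterwards) or the sequence-to-sequence approach mentioned above.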
