
I'm a beginner to machine learning and scikit-learn, so this might be a stupid question.

I'm trying to do something like this:

```python
features = [['adam'], ['james'], ['amy']]
labels = ['hello adam', 'hello james', 'hello amy']
clf = clf.fit(features, labels)  # clf: some scikit-learn estimator
print(clf.predict([['john']]))  # This should output 'hello john'
```

Is this possible using scikit-learn?

Thanks in advance!

Answer


The principled way to solve this would be sequence-to-sequence (seq2seq) learning, which is a more complicated beast and outside scikit-learn's scope.
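To see why plain classification falls short, here is a minimal sketch. It assumes a recent scikit-learn, and the choice of `CountVectorizer(analyzer='char')` plus a 1-nearest-neighbour classifier is only illustrative; any classifier behaves the same way in this respect. A classifier's `predict` can only return one of the labels stored in `clf.classes_` during `fit`, so `'hello john'` is unreachable no matter how the names are encoded.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neighbors import KNeighborsClassifier

names = ['adam', 'james', 'amy']
labels = ['hello adam', 'hello james', 'hello amy']

# Turn each name into a fixed-length vector of character counts
vectorizer = CountVectorizer(analyzer='char')
X = vectorizer.fit_transform(names)

# Illustrative choice of classifier: 1-nearest neighbour
clf = KNeighborsClassifier(n_neighbors=1)
clf.fit(X, labels)

# The set of possible outputs is fixed at fit time
print(clf.classes_)

# Characters never seen during fit ('o', 'h', 'n') are simply ignored
prediction = clf.predict(vectorizer.transform(['john']))[0]
print(prediction)  # one of the three training labels, never 'hello john'
```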

With enough feature engineering and the right problem formulation, you can still help a simpler algorithm, like the ones in scikit-learn, achieve this task. There are two main difficulties to tackle:

  • how to convert your features and labels into a numeric representation (one-hot, embeddings, ...)
  • how to encode a variable-length sequence into a fixed-length vector that can be fed to scikit-learn algorithms (bag of words, mean pooling, RNN).
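Both points can be sketched with NumPy alone. The snippet below is one possible encoding, not the only one: it assumes names use only the letters a to z (anything else is dropped), one-hot encodes each character, and mean-pools the rows so every name, whatever its length, becomes a 26-dimensional vector. The helper names `one_hot_chars` and `mean_pool` are hypothetical.

```python
import numpy as np

ALPHABET = 'abcdefghijklmnopqrstuvwxyz'
CHAR_TO_INDEX = {c: i for i, c in enumerate(ALPHABET)}


def one_hot_chars(name):
    """Encode each character of `name` as a one-hot row: shape (n_chars, 26)."""
    # Assumption for this sketch: characters outside a-z are dropped
    chars = [c for c in name.lower() if c in CHAR_TO_INDEX]
    encoded = np.zeros((len(chars), len(ALPHABET)))
    for row, c in enumerate(chars):
        encoded[row, CHAR_TO_INDEX[c]] = 1.0
    return encoded


def mean_pool(sequence):
    """Collapse a variable-length (n, d) sequence into one fixed-length (d,) vector."""
    if len(sequence) == 0:
        return np.zeros(sequence.shape[1])
    return sequence.mean(axis=0)


# Names of different lengths all map to vectors of the same size
X = np.stack([mean_pool(one_hot_chars(n)) for n in ['adam', 'james', 'amy', 'john']])
print(X.shape)  # (4, 26)
```

Mean pooling one-hot rows gives relative character frequencies, i.e. a normalized bag of characters, and it throws away character order; keeping the order is what an RNN encoder would add, and that lives outside scikit-learn. The resulting `X` can be passed to any scikit-learn estimator.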