I am having an issue training my Naive Bayes Classifier. I have a feature set and targets that I want to use but I keep getting errors. I've had a look at other people who have similar problems but I can't seem to figure out the issue. I'm sure there's a simple solution but I'm yet to find it.
Here's an example of the structure of the data that I'm trying to use to train the classifier.
    In [1] >> train[0]
    Out[1]
    ({
      u'profici': [False],
      u'saver': [False],
      u'four': [True],
      u'protest': [False],
      u'asian': [True],
      u'upsid': [False],
      . . .
      u'captain': [False],
      u'payoff': [False],
      u'whose': [False]
    }, 0)

Where train[0] is the first tuple in a list and contains:
A dictionary of features and boolean values to indicate the presence or absence of words in document[0]
The target label for the binary classification of document[0]
Obviously, the rest of the train list has the features and labels for the other documents that I want to classify.
When running the following code
    from nltk.classify.scikitlearn import SklearnClassifier
    from sklearn.naive_bayes import MultinomialNB

    MNB_clf = SklearnClassifier(MultinomialNB())
    MNB_clf.train(train)

I get the error message:
    TypeError: float() argument must be a string or a number

Edit:
The features are created here, from a dataframe post_sent that contains the posts in column 1 and the sentiment classification in column 2.
    stopwords = set(stopwords.words('english'))
    tokenized = []
    filtered_posts = []
    punc_tokenizer = RegexpTokenizer(r'\w+')

    # tokenizing and removing stopwords
    for post in post_sent.post:
        tokenized = [word.lower() for word in punc_tokenizer.tokenize(post)]
        filtered = ([w for w in tokenized if not w in stopwords])
        filtered_posts.append(filtered)

    # stemming
    tokened_stemmed = []
    for post in filtered_posts:
        stemmed = []
        for w in post:
            stemmed.append(PorterStemmer().stem_word(w))
        tokened_stemmed.append(stemmed)

    # frequency dist
    all_words = list(itertools.chain.from_iterable(tokened_stemmed))
    frequency = FreqDist(all_words)

    # Feature selection
    word_features = list(frequency.keys())[:3000]

    # IMPORTANT PART
    #######################
    #------ featuresets creation ---------
    def find_features(list_of_posts):
        features = {}
        wrds = set(post)
        for w in word_features:
            features[w] = [w in wrds]
        return features

    # zipping inputs with targets
    words_and_sent = zip(tokened_stemmed, post_sent.sentiment)

    # IMPORTANT PART
    ##########################
    # feature sets created here
    featuresets = [(find_features(words), sentiment) for words, sentiment in words_and_sent]
The values in your feature dictionaries are wrapped in one-element lists, such as [False]. Instead, they probably should directly be the boolean values True/False, without being wrapped in a list.

    AttributeError: 'list' object has no attribute 'iteritems'
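To make the suggestion concrete, here is a minimal sketch of a feature function that stores plain booleans instead of one-element lists. It is not the asker's exact pipeline: the tiny `word_features`, `stemmed_posts` and `sentiments` values are made-up stand-ins for the real data, and `word_features` is passed as an argument here for self-containment. It also reads the words from the function's own argument rather than from the outer `post` variable, which is a deliberate change from the posted code. The last few lines illustrate the likely reason for the original `TypeError`: a bool converts to a float, a list does not.

```python
def find_features(post_words, word_features):
    """Map each vocabulary word to a plain bool: is it present in this post?"""
    words = set(post_words)
    # Store the bool itself, not [bool]
    return {w: (w in words) for w in word_features}


# Hypothetical stand-ins for the real vocabulary, stemmed posts and labels
word_features = ["four", "asian", "captain"]
stemmed_posts = [["four", "asian", "protest"], ["captain", "payoff"]]
sentiments = [0, 1]

# Same (feature dict, label) tuple structure as in the question
train = [
    (find_features(words, word_features), label)
    for words, label in zip(stemmed_posts, sentiments)
]

# A bool converts cleanly to a float ...
bool_as_float = float(True)

# ... whereas a list wrapped around it raises TypeError
try:
    float([True])
    list_value_ok = True
except TypeError:
    list_value_ok = False
```

Assuming NLTK and scikit-learn are installed, a `train` list built this way should be usable with the call from the question, `SklearnClassifier(MultinomialNB()).train(train)`, since every feature value is now a single number-like value.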