GaussianNB: Could not convert string to float: 'Thu Apr 16 23:58:58 2015'

Question

I'm a beginner in Python, so please bear with me. I'm trying to solve a machine learning problem using GaussianNB. Some fields are stored as Unix timestamps rather than in a proper date format. For example, the column state_changed_at has the value 1449619185 in the CSV. I'm converting these fields into a proper date format.

Now the problem is that when I select those date features to train my model, I get an error:

Could not convert string to float: 'Thu Apr 16 23:58:58 2015'

    import pandas as pd
    import numpy as np
    from sklearn import metrics
    from sklearn.naive_bayes import BernoulliNB
    from sklearn.naive_bayes import MultinomialNB
    import time
    from sklearn.naive_bayes import GaussianNB

    train = pd.read_csv("datasets/train2.csv")
    test = pd.read_csv("datasets/test.csv")
    train.head()

    import time

    # state_changed_at,deadline,created_at,launched_at are date time fields
    # and I'm converting it into unix format
    unix_cols = ['deadline','state_changed_at','launched_at','created_at']
    for x in unix_cols:
        train[x] = train[x].apply(lambda k: time.ctime(k))
        test[x] = test[x].apply(lambda k: time.ctime(k))

    # state_changed_at,deadline,created_at,launched_at are date time fields.
    cols_to_use = ['keywords_len' ,'keywords_count','state_changed_at','deadline','created_at','launched_at']
    target = train['final_status']

    # data for modeling
    k_train = train[cols_to_use]
    k_test = test[cols_to_use]

    gnb = GaussianNB()
    model = MultinomialNB()
    model.fit(k_train, target)  # this line gives me error saying: could not convert string to float: 'Thu Apr 16 23:58:58 2015'
    expected = target
    predicted = model.predict(k_test)
    print(model.score(k_test, predicted, sample_weight=None))

Any help would be really appreciated. Thank you

Tarek · Accepted Answer · 2018-05-22 08:07:46Z

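To cast those columns of your data frame to float, try:

    k_train = train[cols_to_use].astype(float)
    target = train['final_status'].astype(float)

This only works if the date columns still hold the numeric Unix timestamps, so skip the time.ctime conversion for any column you feed to the model. A string such as 'Thu Apr 16 23:58:58 2015' cannot be cast to float.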
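A minimal sketch of that cast on a made-up two-row frame (the frame and its values are assumptions for illustration, not the asker's data), contrasting raw Unix timestamps with strings produced by time.ctime:

```python
import time

import pandas as pd

# Hypothetical two-row frame standing in for the real training data.
train = pd.DataFrame({
    "keywords_len": [12, 30],
    "deadline": [1449619185, 1429228738],
})

# Raw Unix timestamps are integers, so the cast to float works.
numeric = train[["keywords_len", "deadline"]].astype(float)
print(numeric.dtypes)

# After time.ctime the column holds strings such as 'Thu Apr 16 ...',
# which cannot be parsed as floats.
as_text = train["deadline"].apply(time.ctime)
try:
    as_text.astype(float)
    cast_failed = False
except ValueError as err:
    cast_failed = True
    print("cast failed:", err)
```

The second cast should fail with the same kind of message as in the question, which suggests the conversion to readable strings, not GaussianNB itself, is what triggers the error.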

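More documentation can be found here. Alternatively, you can cast the columns when loading the CSV file by passing a dtype mapping keyed by the actual column names. Passing sep=',' may also be useful, assuming the values in your CSV file are separated by commas:

    float_cols = cols_to_use + ['final_status']
    train = pd.read_csv("datasets/train2.csv", sep=',', dtype={c: float for c in float_cols})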
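As a rough illustration of casting at load time, the sketch below reads a small in-memory CSV instead of datasets/train2.csv. The CSV contents are invented for the example; only the column names come from the question.

```python
import io

import pandas as pd

# Hypothetical CSV contents; the real file is datasets/train2.csv.
csv_text = (
    "keywords_len,deadline,final_status\n"
    "12,1449619185,1\n"
    "30,1429228738,0\n"
)

# Keys of the dtype mapping should be actual column names in the file.
dtypes = {"keywords_len": float, "deadline": float, "final_status": float}
train = pd.read_csv(io.StringIO(csv_text), sep=",", dtype=dtypes)
print(train.dtypes)
```

With this approach the columns arrive as float64 straight from read_csv, so no separate astype step is needed afterwards.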

Note that converting a Unix timestamp to a readable date is very simple with the datetime library, for example:

    import datetime
    datetime.datetime.fromtimestamp(1526972723).strftime('%Y-%m-%d %H:%M:%S')
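A small variation on the call above, sketched with an explicit UTC time zone so the result does not depend on the machine's local settings. The feature names in the dictionary are hypothetical and only show that numeric parts of a date, unlike the formatted string, can be passed to a model.

```python
from datetime import datetime, timezone

ts = 1526972723  # Unix timestamp from the example above

# Passing tz makes the result independent of the machine's local time zone.
dt = datetime.fromtimestamp(ts, tz=timezone.utc)
readable = dt.strftime("%Y-%m-%d %H:%M:%S")
print(readable)

# Numeric parts of the date can serve as model features,
# whereas the formatted string cannot be cast to float.
features = {"year": dt.year, "month": dt.month, "weekday": dt.weekday()}
print(features)
```

The readable string is best kept for display only, while the original integer column (or numeric parts like these) goes into the feature matrix.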

I hope it's helpful.
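Putting the pieces together, here is a hedged end-to-end sketch with GaussianNB. The six rows are invented, only three of the question's columns are used, and the training data is reused for scoring purely to keep the example short.

```python
import pandas as pd
from sklearn.naive_bayes import GaussianNB

# Hypothetical rows; column names follow the question, values are made up.
train = pd.DataFrame({
    "keywords_len": [12, 30, 18, 25, 9, 40],
    "deadline": [1449619185, 1429228738, 1433116800,
                 1440000000, 1425000000, 1445000000],
    "launched_at": [1447027185, 1426636738, 1430524800,
                    1437408000, 1422408000, 1442408000],
    "final_status": [1, 0, 1, 0, 1, 0],
})

cols_to_use = ["keywords_len", "deadline", "launched_at"]

# Leave the timestamps as numbers (no time.ctime) and cast to float.
X = train[cols_to_use].astype(float)
y = train["final_status"]

model = GaussianNB()
model.fit(X, y)

# score compares predictions against the true labels passed as y.
accuracy = model.score(X, y)
print(accuracy)
```

Note that score expects the true labels as its second argument. Passing the model's own predictions, as in the question's last line, compares the predictions with themselves and always reports 1.0.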

Add a few lines explaining what you did and why...
– Aditya, Commented May 21, 2018 at 16:04

It's done, I just updated my answer :)
– Tarek, Commented May 22, 2018 at 8:08
