1
$\begingroup$

I'm a beginner in Python, so please bear with me. I'm trying to solve a machine learning problem using GaussianNB. Some fields are stored as UNIX timestamps rather than in a readable date format. For example, the column state_changed_at has the value 1449619185 in the CSV. I'm converting these fields into a readable date format.

The problem is that when I select those date features to train my model, I get this error:

Could not convert string to float: 'Thu Apr 16 23:58:58 2015'

    import pandas as pd
    import numpy as np
    from sklearn import metrics
    from sklearn.naive_bayes import BernoulliNB
    from sklearn.naive_bayes import MultinomialNB
    import time
    from sklearn.naive_bayes import GaussianNB

    train = pd.read_csv("datasets/train2.csv")
    test = pd.read_csv("datasets/test.csv")
    train.head()

    import time

    # state_changed_at,deadline,created_at,launched_at are date time fields
    # and I'm converting it into unix format
    unix_cols = ['deadline','state_changed_at','launched_at','created_at']
    for x in unix_cols:
        train[x] = train[x].apply(lambda k: time.ctime(k))
        test[x] = test[x].apply(lambda k: time.ctime(k))

    # state_changed_at,deadline,created_at,launched_at are date time fields.
    cols_to_use = ['keywords_len' ,'keywords_count','state_changed_at','deadline','created_at','launched_at']
    target = train['final_status']

    # data for modeling
    k_train = train[cols_to_use]
    k_test = test[cols_to_use]

    gnb = GaussianNB()
    model = MultinomialNB()
    model.fit(k_train, target)  # this lines gives me error saying: could not convert string to float: 'Thu Apr 16 23:58:58 2015'
    expected = target
    predicted = model.predict(k_test)
    print(model.score(k_test, predicted, sample_weight=None))

Any help would be really appreciated. Thank you

$\endgroup$

1 Answer

2
$\begingroup$

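To cast those columns of your DataFrame to float, apply astype to the raw numeric columns, that is, before (or instead of) the time.ctime conversion, because a string such as 'Thu Apr 16 23:58:58 2015' cannot be converted to a float: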

    k_train = train[cols_to_use].astype(float)
    target = train['final_status'].astype(float)
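As a minimal sketch of why the error in the question appears, the snippet below uses only the standard library and the example timestamp from the question. The variable names `as_text`, `cast_ok`, and `as_number` are illustrative assumptions, not part of the original code.

```python
import time

timestamp = 1449619185  # example value from the question's CSV

# time.ctime returns a human-readable string; the exact text depends on
# the local time zone of the machine running it.
as_text = time.ctime(timestamp)

# A date string cannot be parsed as a number, which is the same failure
# scikit-learn reports when it tries to build a float array.
try:
    float(as_text)
    cast_ok = True
except ValueError:
    cast_ok = False

# The raw UNIX timestamp, by contrast, casts to float without trouble.
as_number = float(timestamp)

print(type(as_text).__name__, cast_ok, as_number)
```

The practical consequence is that the timestamp columns can be passed to the model as they are, without the `time.ctime` step.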

More documentation can be found here. Alternatively, you can cast the columns when loading the CSV file by passing the dtype argument. Passing sep=',' may also be useful, assuming that the data in your CSV file is separated by commas:

    # cols_to_use is the list of feature columns defined in the question
    dtypes = {col: float for col in cols_to_use + ['final_status']}
    train = pd.read_csv("datasets/train2.csv", dtype=dtypes)
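To try the load-time cast without the original dataset, here is a small sketch that reads an in-memory CSV. The sample rows and the reduced column list are made up for illustration. Only the column names `deadline`, `created_at`, and `final_status` come from the question.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for datasets/train2.csv.
csv_text = (
    "deadline,created_at,final_status\n"
    "1449619185,1429228738,1\n"
    "1526972723,1526000000,0\n"
)

cols_to_use = ["deadline", "created_at"]

# Map every feature column and the target to float at load time.
dtypes = {col: float for col in cols_to_use + ["final_status"]}
train = pd.read_csv(io.StringIO(csv_text), dtype=dtypes)

k_train = train[cols_to_use]
target = train["final_status"]

print(train.dtypes)
```

Because the timestamps stay numeric, `k_train` can be handed to an estimator such as GaussianNB directly.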

Note that converting a UNIX timestamp to a readable date is very simple with the datetime library, for example:

    import datetime
    datetime.datetime.fromtimestamp(1526972723).strftime('%Y-%m-%d %H:%M:%S')
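If calendar information is wanted as model input, one option is to derive numeric features from the timestamp instead of a formatted string. This is a sketch rather than something the original answer proposes. The helper name `timestamp_features` and the choice of features are assumptions. UTC is used so the result does not depend on the local time zone.

```python
from datetime import datetime, timezone


def timestamp_features(ts):
    """Return numeric calendar features for a UNIX timestamp (UTC)."""
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    return {
        "year": dt.year,
        "month": dt.month,
        "weekday": dt.weekday(),  # Monday is 0, Sunday is 6
        "hour": dt.hour,
    }


features = timestamp_features(1526972723)
print(features)
```

Each value is an integer, so the resulting columns can be cast to float and used for training.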

I hope it's helpful.

$\endgroup$
2
  • $\begingroup$ Add few lines explaining what you did and why... $\endgroup$ Commented May 21, 2018 at 16:04
  • 1
    $\begingroup$ It's done just updated my answer :) $\endgroup$ Commented May 22, 2018 at 8:08
