4
$\begingroup$

I want to make simple predictions with Keras and I'm not really sure if I am doing it right. My data looks like this:

col1,col2
1.68,237537
1.69,240104
1.70,244885
1.71,246196
1.72,246527
1.73,254588
1.74,255112
1.75,259035
1.76,267229
1.77,267314
1.78,268931
1.79,273497
1.80,273900
1.81,277132
1.82,278066

Now, I want to predict col2 by col1 and this is how I'm doing it:

import numpy as np
import pandas
from decimal import Decimal
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense

df = pandas.read_csv('data.csv', usecols=[0, 1], header=None)
X = df.iloc[:, :-1].values.astype(np.float64)
y = df.iloc[:, -1:].values.astype(np.float64)

scalarX, scalarY = MinMaxScaler(), MinMaxScaler()
scalarX.fit(X)
scalarY.fit(y.reshape(len(y), 1))
X = scalarX.transform(X)
y = scalarY.transform(y.reshape(len(y), 1))

model = Sequential()
model.add(Dense(4, input_dim=1, activation='relu'))
model.add(Dense(4, activation='relu'))
model.add(Dense(1, activation='linear'))
model.compile(loss='mse', optimizer='adam')
model.fit(x=X, y=y, epochs=3, verbose=1)

for num in range(1, 21):
    Xnew = np.array([[float(Decimal('2.{}'.format(num)))]])
    ynew = model.predict(Xnew)
    print("X=%s, Predicted=%s" % (Xnew[0], ynew[0]))
$\endgroup$
10
  • $\begingroup$ How many instances do you have? $\endgroup$ Commented May 21, 2018 at 7:44
  • $\begingroup$ about 18'000'000 $\endgroup$ Commented May 21, 2018 at 7:47
  • $\begingroup$ And you want to predict an output which is stored in the second column using a single input variable. So the network will be doing a 1-to-1 mapping? $\endgroup$ Commented May 21, 2018 at 7:52
  • $\begingroup$ What's your question exactly? $\endgroup$ Commented May 21, 2018 at 7:52
  • 1
    $\begingroup$ @MonteCristo, so you have data (hits) coming in every 0.01 seconds, or whatever unit, and you want to predict the future hits? $\endgroup$ Commented May 21, 2018 at 8:04

1 Answer

5
$\begingroup$

What you are trying to do here is forecast the future values of a time series. This is a predictive problem where the future values depend on a number of latent factors. I will assume that all we have access to is the historical data from the series, as your question indicates.

If you want to predict a future value of the time series, you should not use only the current value as input; instead, use a chunk of the historical data. Since you have 18,000,000 instances, which is a lot, you can make your network fairly complex in order to capture latent trends hidden in your data that help predict the future value. To predict the value at time $t$ we will use the $k$ previous values. This hyper-parameter $k$ needs to be tuned.

Restructure the data

We will structure the data such that the features $X$ are the $k$ previous time measurements, and the target $Y$ is the current time measurement, i.e. the one being estimated by the model.

import numpy as np

k = 3
X, Y = [], []
for i in range(len(col2) - k):
    X.append(col2[i:i+k])   # the k previous measurements
    Y.append(col2[i+k])     # the value to predict
X = np.asarray(X)
Y = np.asarray(Y)
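For instance, with the sample rows from the question and k = 3, the first training pairs come out as follows (a quick check, assuming col2 holds the second column of the CSV as a plain list):

print(X[0], Y[0])   # [237537 240104 244885] 246196
print(X[1], Y[1])   # [240104 244885 246196] 246527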

Split your data

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.33)
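As a quick sanity check on the shapes (a sketch, assuming the X and Y arrays built above), each row of x_train should contain exactly k values:

print(x_train.shape, y_train.shape)   # e.g. (n_train, k) and (n_train,)
print(x_test.shape, y_test.shape)     # e.g. (n_test, k) and (n_test,)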

Using the data in a Keras model

This is a simple Keras model which should work as a first iteration. However, due to the small amount of data you provided, I cannot get any meaningful results after training.

import keras
from keras.models import Sequential
from keras.layers import Dense

x_train = x_train.reshape(len(x_train), k)
x_test = x_test.reshape(len(x_test), k)
input_shape = (k,)

model = Sequential()
model.add(Dense(32, activation='tanh', input_shape=input_shape))
model.add(Dense(32, activation='tanh'))
model.add(Dense(1, activation='linear'))

model.compile(loss=keras.losses.mean_squared_error,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])
model.summary()

epochs = 10
batch_size = 128

# Fit the model weights.
history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_data=(x_test, y_test))
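Once trained, the model expects each input sample to have shape (k,), i.e. the k most recent measurements in one row, which is also what the shape error mentioned in the comments below points at. A minimal prediction sketch, assuming the trained model and the col2 list from above:

# Take the k most recent measurements as one sample of shape (1, k).
last_window = np.asarray(col2[-k:]).reshape(1, k)

# Predict the next value of the series.
next_value = model.predict(last_window)
print("Predicted next value: %s" % next_value[0])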
$\endgroup$
6
  • $\begingroup$ Thanks, but how do I do the prediction now? I get: ValueError: Error when checking : expected dense_1_input to have shape (3,) but got array with shape (1,) $\endgroup$ Commented May 21, 2018 at 8:48
  • $\begingroup$ I also got a very high loss: Epoch 10/10 1218417/1218417 [==============================] - 16s 13us/step - loss: 3020611384714833.0000 - acc: 0.0000e+00 - val_loss: 3027083056611234.5000 - val_acc: 0.0000e+00 $\endgroup$ Commented May 21, 2018 at 8:54
  • $\begingroup$ @MonteCristo, you need to restructure your data as mentioned above. $\endgroup$ Commented May 21, 2018 at 9:13
  • $\begingroup$ @MonteCristo, Can you give us access to your data please. $\endgroup$ Commented May 21, 2018 at 9:13
  • $\begingroup$ like this? col1, col2 = [], [] for row in df.values: col1.append(row[0]) col2.append(row[1]) $\endgroup$ Commented May 21, 2018 at 9:14
