I have a rather simple classification problem that I am trying to solve. Each instance in my problem is a list of 1024 bytes (each byte is an integer between 0 and 255). There are two classes, say class 'A' and class 'B'. In class 'A', all instances share a common feature: a particular 2-byte pattern, say "200 180", occurs with high frequency in every instance. In class 'B' this is not the case: any occurrence of the byte pattern "200 180" is an artefact of randomness rather than anything else.
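To make the setup concrete, here is a minimal sketch of synthetic data matching this description. The pattern count per class-'A' instance (`n_patterns`) and the planting scheme are illustrative assumptions, not part of the original problem; adjacent planted positions may occasionally overwrite each other, so the effective count is approximate.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_instance(cls, length=1024, n_patterns=20):
    """Generate one instance; class 'A' gets the '200 180' pattern planted ~n_patterns times."""
    x = rng.integers(0, 256, size=length)
    if cls == 'A':
        # plant the 2-byte pattern at distinct random start positions
        # (adjacent starts can clobber each other, so the count is approximate)
        for pos in rng.choice(length - 1, size=n_patterns, replace=False):
            x[pos], x[pos + 1] = 200, 180
    return x

# small toy dataset: 10 'A' instances (label 1) and 10 'B' instances (label 0)
X = np.stack([make_instance('A') for _ in range(10)]
             + [make_instance('B') for _ in range(10)])
y = np.array([1] * 10 + [0] * 10)
```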
I would like my CNN to differentiate between classes A and B without explicitly feeding it the byte pattern. I am using the following code to attempt this:
```python
model = models.Sequential()
model.add(layers.Conv1D(filters=1, kernel_size=2, activation='relu', input_shape=(1024, 1)))
model.add(layers.MaxPooling1D(pool_size=64))
model.add(layers.Flatten())
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
```

When I use some simple heuristics to gauge the "distinguishability" of the two classes, I can see that it is possible to get more than 90% accuracy simply by looking at the frequency of the byte pattern. However, my CNN above fails miserably at this. For example, here is the result of a 5-epoch training attempt:
```
Epoch 1/5
1563/1563 [==============================] - 65s 41ms/step - loss: 0.5297 - accuracy: 0.8468
Epoch 2/5
1563/1563 [==============================] - 61s 39ms/step - loss: 0.7534 - accuracy: 0.5000
Epoch 3/5
1563/1563 [==============================] - 61s 39ms/step - loss: 0.7100 - accuracy: 0.5029
Epoch 4/5
1563/1563 [==============================] - 61s 39ms/step - loss: 0.6977 - accuracy: 0.5216
Epoch 5/5
1563/1563 [==============================] - 62s 40ms/step - loss: 0.6934 - accuracy: 0.5303
```

As you can see, the result is closer to random guessing than anything else. Do you have any advice on how this task could be handled better?
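For reference, the frequency heuristic mentioned above can be written out directly. The function names and the decision threshold here are illustrative, not taken from the original heuristics:

```python
import numpy as np

PATTERN = (200, 180)

def pattern_count(x):
    """Count occurrences of the 2-byte pattern in a 1-D array of bytes."""
    x = np.asarray(x)
    return int(np.sum((x[:-1] == PATTERN[0]) & (x[1:] == PATTERN[1])))

def classify(x, threshold=3):
    # illustrative threshold: in 1024 random bytes, only ~1023/65536 ≈ 0.016
    # matches are expected by chance, so even a small count signals class 'A'
    return 'A' if pattern_count(x) >= threshold else 'B'
```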
Comment: It's very unusual in deep learning to use only one Conv1D filter.
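Following up on the comment about using only one filter, here is a hedged sketch of changes that often help with this kind of frequency-counting task: more filters, inputs rescaled from 0–255 to 0–1, and global average pooling instead of max pooling (an average over positions is proportional to how *often* a filter fires, whereas max pooling only records whether it fired at all). The filter count and layer choices are illustrative assumptions, not a verified fix:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(1024, 1)),
    layers.Rescaling(1.0 / 255),   # raw bytes 0..255 are large inputs; rescale to 0..1
    layers.Conv1D(filters=8, kernel_size=2, activation='relu'),  # several filters, not one
    layers.GlobalAveragePooling1D(),  # mean activation ~ pattern frequency, which is the signal here
    layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
```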