I have a rather simple classification problem that I am trying to solve. Each instance in my problem is a list of 1024 bytes (each byte is an integer between 0 and 255). There are two classes, say class 'A' and class 'B'. In class 'A', all instances share a common feature: a particular 2-byte pattern, say "200 180", occurs with high frequency in every instance. In class 'B' this is not the case: any occurrence of the byte pattern "200 180" is an artefact of randomness rather than anything else.
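To make the setup concrete, here is a minimal sketch of synthetic data matching this description. The pattern count per class-'A' instance (`n_patterns`) and the planting scheme are illustrative assumptions, not part of the original problem; adjacent planted positions may occasionally overwrite each other, so the effective count is approximate.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_instance(cls, length=1024, n_patterns=20):
    """Generate one instance; class 'A' gets the '200 180' pattern planted ~n_patterns times."""
    x = rng.integers(0, 256, size=length)
    if cls == 'A':
        # plant the 2-byte pattern at distinct random start positions
        # (adjacent starts can clobber each other, so the count is approximate)
        for pos in rng.choice(length - 1, size=n_patterns, replace=False):
            x[pos], x[pos + 1] = 200, 180
    return x

# small toy dataset: 10 'A' instances (label 1) and 10 'B' instances (label 0)
X = np.stack([make_instance('A') for _ in range(10)]
             + [make_instance('B') for _ in range(10)])
y = np.array([1] * 10 + [0] * 10)
```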
I would like my CNN to differentiate between classes A and B without explicitly feeding it the byte pattern. I am using the following code to attempt this:
```python
model = models.Sequential()
model.add(layers.Conv1D(filters=1, kernel_size=2, activation='relu', input_shape=(1024, 1)))
model.add(layers.MaxPooling1D(pool_size=64))
model.add(layers.Flatten())
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
```

When I use some simple heuristics to gauge the "distinguishability" of the two classes, I can see that it is possible to get more than 90% accuracy simply by looking at the frequency of the byte pattern. However, my CNN above fails miserably at this. For example, here is the result of a 5-epoch training attempt:
```
Epoch 1/5
1563/1563 [==============================] - 65s 41ms/step - loss: 0.5297 - accuracy: 0.8468
Epoch 2/5
1563/1563 [==============================] - 61s 39ms/step - loss: 0.7534 - accuracy: 0.5000
Epoch 3/5
1563/1563 [==============================] - 61s 39ms/step - loss: 0.7100 - accuracy: 0.5029
Epoch 4/5
1563/1563 [==============================] - 61s 39ms/step - loss: 0.6977 - accuracy: 0.5216
Epoch 5/5
1563/1563 [==============================] - 62s 40ms/step - loss: 0.6934 - accuracy: 0.5303
```

As you can see, the result is closer to random guessing than anything else. Do you have any advice on how this task could be handled better?
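For reference, the frequency heuristic mentioned above can be written out directly. The function names and the decision threshold here are illustrative, not taken from the original heuristics:

```python
import numpy as np

PATTERN = (200, 180)

def pattern_count(x):
    """Count occurrences of the 2-byte pattern in a 1-D array of bytes."""
    x = np.asarray(x)
    return int(np.sum((x[:-1] == PATTERN[0]) & (x[1:] == PATTERN[1])))

def classify(x, threshold=3):
    # illustrative threshold: in 1024 random bytes, only ~1023/65536 ≈ 0.016
    # matches are expected by chance, so even a small count signals class 'A'
    return 'A' if pattern_count(x) >= threshold else 'B'
```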
Comment: It's very unusual in deep learning to use only one Conv1D filter.
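Following up on the comment about using only one filter, here is a hedged sketch of changes that often help with this kind of frequency-counting task: more filters, inputs rescaled from 0–255 to 0–1, and global average pooling instead of max pooling (an average over positions is proportional to how *often* a filter fires, whereas max pooling only records whether it fired at all). The filter count and layer choices are illustrative assumptions, not a verified fix:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(1024, 1)),
    layers.Rescaling(1.0 / 255),   # raw bytes 0..255 are large inputs; rescale to 0..1
    layers.Conv1D(filters=8, kernel_size=2, activation='relu'),  # several filters, not one
    layers.GlobalAveragePooling1D(),  # mean activation ~ pattern frequency, which is the signal here
    layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
```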