I am trying to write my own convolutional neural network from scratch in Python. After reading several articles and watching tutorials on CNNs, there are still a couple of points I don't understand, and I would appreciate it very much if someone could help clarify them.
I understand the idea of the filter/kernel (and have implemented it successfully), and I have a working, trainable ANN that I wrote myself. I also have a working max-pooling layer.
What is unclear to me is how the output of k kernels (filters) applied to c channels (e.g. RGB) gets converted into the eventual reduced h * w neurons that feed the so-called fully connected network at the end.
If I start with an h * w image with 3 channels and, for simplicity, use 10 kernels, that would mean I have a rank-4 weight tensor on the convolution layer, (h, w, c, k). But in all the sources I could find, after flattening, all the kernels and channels are gone and only the reduced w and h remain (reduced by the convolution if there is no padding, and again by the max pooling).
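For concreteness, the shape bookkeeping I have in mind can be sketched as below. This is only a sketch under assumptions of mine: a 28 x 28 input, 3 x 3 kernels, stride 1, no padding, non-overlapping 2 x 2 max pooling, and one feature map kept per kernel rather than merged. `conv_output_shape` is a hypothetical helper name, not something from a library.

```python
def conv_output_shape(h, w, kh, kw, k, pool=2):
    """Shape bookkeeping for one 'valid' convolution followed by max pooling."""
    # No padding, stride 1: each spatial dimension shrinks by (kernel size - 1).
    conv_h, conv_w = h - kh + 1, w - kw + 1
    # Non-overlapping pool x pool max pooling (floor division drops any remainder).
    pooled_h, pooled_w = conv_h // pool, conv_w // pool
    # Assumption: one feature map per kernel is kept, so k survives as a dimension.
    return (pooled_h, pooled_w, k)


shape = conv_output_shape(h=28, w=28, kh=3, kw=3, k=10)
flat_len = shape[0] * shape[1] * shape[2]
print(shape, flat_len)
```

Under these assumptions the flattened vector would have reduced_h * reduced_w * k entries, which is exactly the part that does not match what I think I see in the sources.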
So after this long (and hopefully clear) exposition: I don't understand what is done with all the data from the different channels and kernels. I have seen different code where some people apply the same filter to all channels and then sum the results, but this seems like it would lose the color information. Is that indeed the solution? Are all the results of the filters and channels added together before being passed to the next layer? If so, is this done before ReLU, or is ReLU applied to each result separately and then they are all summed?
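To make the question concrete, here is a minimal NumPy sketch of one possible reading. It is not necessarily what every source does. The assumptions are mine: each kernel has its own weights for every input channel, the products are summed over the channels into a single feature map per kernel, a bias is added, ReLU is applied after that sum, and the k feature maps are all kept and flattened together. The names `conv2d_valid` and `relu` are hypothetical, and pooling is left out for brevity.

```python
import numpy as np


def conv2d_valid(image, kernels, biases):
    """Naive 'valid' convolution (cross-correlation), stride 1.

    image:   (h, w, c)
    kernels: (kh, kw, c, k), i.e. separate weights per input channel
    biases:  (k,)
    returns: (h - kh + 1, w - kw + 1, k)
    """
    h, w, c = image.shape
    kh, kw, kc, k = kernels.shape
    assert c == kc, "kernel depth must match the number of image channels"
    out = np.zeros((h - kh + 1, w - kw + 1, k))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw, :]  # shape (kh, kw, c)
            # Sum over kernel height, width AND channels, separately per kernel.
            out[i, j, :] = np.tensordot(
                patch, kernels, axes=([0, 1, 2], [0, 1, 2])
            ) + biases
    return out


def relu(x):
    return np.maximum(x, 0.0)


rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8, 3))        # h=8, w=8, c=3 (e.g. RGB)
kernels = rng.standard_normal((3, 3, 3, 10))  # 3x3 kernels, c=3, k=10
biases = np.zeros(10)

# Assumption: ReLU comes after the sum over channels, once per feature map.
feature_maps = relu(conv2d_valid(image, kernels, biases))
flat = feature_maps.reshape(-1)  # input to the fully connected part
print(feature_maps.shape, flat.shape)
```

In this reading the input channels disappear through the sum, but the color information is not simply averaged away, because each channel is weighted differently inside each kernel. The k kernels remain as k output channels until flattening. Whether this is the standard approach is precisely what I am asking.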