Say I want to implement Conv2D in Keras. For each Conv2D layer, if I apply 20 filters of size [2, 3] to an input with depth 10, then there will be 20 * (2*3*10 + 1) = 1220 trainable weights.
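For reference, here is a minimal sketch that checks this count (the 28x28 spatial size is just a placeholder I picked):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 10)),              # input with depth 10
    tf.keras.layers.Conv2D(20, kernel_size=(2, 3)),  # 20 filters of size [2, 3]
])

# Each filter has 2*3*10 = 60 kernel weights plus 1 bias, times 20 filters:
print(model.count_params())  # 1220
```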
Since the L1 norm is a sum of |w| over every weight, its value will grow roughly in proportion to the number of trainable weights. The same holds for the L2 norm.
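A rough numerical illustration of what I mean, assuming weights drawn from the same distribution regardless of layer size (the 0.05 standard deviation is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
for n in (10_000, 1_000_000):
    w = rng.normal(0.0, 0.05, size=n)  # same per-weight scale for both sizes
    print(n, np.abs(w).sum())          # L1 norm grows ~linearly with n
```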
So shouldn't lambda, as in kernel_regularizer=l1(lambda), be inversely proportional to the number of trainable weights?
Intuitively, if a lambda of 0.1 worked well for 10,000 weights, then applying the same or a larger lambda to 1 million weights doesn't make sense to me.
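To make the question concrete, here is a sketch of the scaling I'm imagining; l1_per_weight is my own hypothetical helper, not a Keras API:

```python
from tensorflow.keras import layers, regularizers

def l1_per_weight(base_lambda, num_weights):
    """Scale lambda so the total penalty is roughly independent of layer size."""
    return regularizers.l1(base_lambda / num_weights)

# For the layer above: kernel_regularizer only sees the 2*3*10*20 = 1200
# kernel weights; the 20 biases are excluded unless bias_regularizer is set.
conv = layers.Conv2D(
    20, kernel_size=(2, 3),
    kernel_regularizer=l1_per_weight(0.1, 1200),
)
```

Is something like this normalization standard practice, or is there a reason the same lambda is typically reused across layers of very different sizes?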