Typically, the subgradient is defined for convex functions, and convex functions are continuous.
However, DeepMind's VQ-VAE paper defines a model with a discontinuous vector quantization (VQ) layer, resulting in a discontinuous objective function. Still, the authors remark:
"One could also use the subgradient through the quantisation operation, but this simple estimator worked well for the initial experiments in this paper."
Is there a more general definition of the subgradient that would make sense here? The quantization operation is discontinuous, so isn't the subgradient of the objective function empty?
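For concreteness, here is a minimal sketch of the discontinuity in question, with a made-up two-vector codebook (the codebook values and the function names `quantize` are my own, not from the paper). Nearest-neighbour quantization jumps between codes across the decision boundary, which is why its gradient is zero almost everywhere and undefined on the boundary:

```python
import numpy as np

# Hypothetical tiny codebook: 2 codes in 2 dimensions (illustration only).
codebook = np.array([[0.0, 0.0], [1.0, 1.0]])

def quantize(z_e):
    """Map an encoder output z_e to its nearest codebook vector.

    This map is piecewise constant, hence discontinuous at the
    decision boundary between codes.
    """
    dists = np.sum((codebook - z_e) ** 2, axis=1)
    return codebook[np.argmin(dists)]

# Points on either side of the boundary jump between codes:
print(quantize(np.array([0.49, 0.49])))  # -> [0. 0.]
print(quantize(np.array([0.51, 0.51])))  # -> [1. 1.]

# The paper's "simple estimator" is the straight-through trick:
# the forward pass uses the quantized value, while the backward pass
# copies the decoder's gradient to the encoder output unchanged.
# In autograd frameworks this is commonly written as
#   z_q = z_e + stop_gradient(quantize(z_e) - z_e)
# so that d z_q / d z_e is the identity.
```

The straight-through line at the end is how the estimator is usually implemented in practice, but my question stands: what would "the subgradient through the quantisation operation" mean formally for such a piecewise-constant map?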