Typically, the subgradient is defined for convex functions, and a finite convex function is continuous on the interior of its domain.
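(For concreteness, here is the definition I have in mind, checked numerically on the toy function f(x) = |x|: g is a subgradient of f at x0 if f(y) ≥ f(x0) + g·(y − x0) for all y. The grid and the sample values of g below are arbitrary choices for illustration.)

```python
import numpy as np

# Convex definition: g is a subgradient of f at x0 if
#   f(y) >= f(x0) + g * (y - x0)   for all y.
# Check numerically that every g in [-1, 1] is a subgradient of |x| at 0.
f = np.abs
x0 = 0.0
ys = np.linspace(-5.0, 5.0, 101)  # arbitrary grid of test points

for g in (-1.0, -0.3, 0.0, 0.7, 1.0):
    assert np.all(f(ys) >= f(x0) + g * (ys - x0))

# g = 2 violates the inequality (e.g. at y = 5), so it is not a subgradient.
assert not np.all(f(ys) >= f(x0) + 2.0 * (ys - x0))
```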
However, DeepMind's VQ-VAE paper defines a model with a discontinuous vector quantization (VQ) layer, resulting in a discontinuous objective function. Still, the authors remark:
One could also use the subgradient through the quantisation operation, but this simple estimator worked well for the initial experiments in this paper.
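(For context, the "simple estimator" the paper uses instead is the straight-through estimator: quantization is applied in the forward pass, but in the backward pass the gradient is copied from the decoder input straight to the encoder output, as if quantization were the identity. A minimal NumPy sketch, with a made-up scalar codebook, no real autograd:)

```python
import numpy as np

# Hypothetical toy codebook; the VQ layer snaps each encoder output
# to its nearest codebook entry, which is a discontinuous operation.
codebook = np.array([-1.0, 0.0, 1.0])

def quantize(z_e):
    # Forward pass: nearest-neighbor lookup (piecewise constant in z_e).
    return codebook[np.argmin(np.abs(codebook - z_e))]

def straight_through_grad(grad_z_q):
    # Backward pass: treat quantization as the identity map, so the
    # gradient w.r.t. the quantized code z_q is passed unchanged to z_e.
    return grad_z_q

z_e = 0.4
z_q = quantize(z_e)              # forward: 0.4 snaps to 0.0
g = straight_through_grad(2.0)   # backward: gradient flows through unchanged
```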
Is there a more general definition of the subgradient that would make sense for a discontinuous, non-convex objective like this one?