I have figured out the solution. Let $L^{(1)}, L^{(2)}$ denote the "inner" and "outer" losses respectively. In the example given in the question, we have
$$ L(y, \hat{y}) = \begin{cases} L^{(1)}(y, \hat{y}) = (y-\hat{y})^2 & \text{if } |y-\hat{y}| \leq \delta \\ L^{(2)}(y,\hat{y}) = \delta^2 & \text{otherwise} \end{cases} $$
The trick here is as follows. Instead of viewing this thresholded loss as a piecewise function, we may think of it as
\begin{align*} L(y,\hat{y}) &= \min\left( L^{(1)}(y,\hat{y}), L^{(2)}(y,\hat{y}) \right) \\ &= -\max\left( -L^{(1)}(y,\hat{y}), -L^{(2)}(y,\hat{y}) \right) \end{align*}
since the minimum selects $L^{(1)}$ exactly when $(y-\hat{y})^2 \leq \delta^2$, i.e. when $|y-\hat{y}| \leq \delta$. We can then use various "soft-max" operations from the ML literature, for example log-sum-exp:
$$ L(y,\hat{y}) \approx -\log\left(\exp\left(-L^{(1)}(y,\hat{y})\right) + \exp\left(-L^{(2)}(y,\hat{y})\right)\right) $$
In the example above, a smooth proxy is then
$$ L(y,\hat{y}) \approx -\log\left(e^{-(y-\hat{y})^2} + e^{-\delta^2}\right) $$
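Here is a minimal numerical sketch of this proxy, assuming NumPy/SciPy; the function names and the temperature parameter `tau` are my own additions (the derivation above corresponds to `tau = 1`). Since $\max(x_1, x_2) \leq \operatorname{LSE}(x_1, x_2) \leq \max(x_1, x_2) + \log 2$, the proxy satisfies $L_{\text{hard}} - \tau \log 2 \leq L_{\text{soft}} \leq L_{\text{hard}}$, so smaller `tau` gives a tighter approximation:

```python
import numpy as np
from scipy.special import logsumexp

def clipped_squared_loss(y, y_hat, delta):
    """Hard thresholded loss: min((y - y_hat)^2, delta^2)."""
    return np.minimum((y - y_hat) ** 2, delta ** 2)

def soft_clipped_squared_loss(y, y_hat, delta, tau=1.0):
    """Smooth proxy: -tau * log(exp(-L1/tau) + exp(-L2/tau)).

    tau > 0 is a temperature (an assumption, not part of the
    derivation above); as tau -> 0 the proxy approaches the hard min.
    """
    l1 = (y - y_hat) ** 2                     # inner loss L^(1)
    l2 = np.full_like(l1, float(delta) ** 2)  # outer loss L^(2), constant
    # log-sum-exp of the stacked negated losses, for numerical stability
    return -tau * logsumexp(np.stack([-l1 / tau, -l2 / tau]), axis=0)
```

One practical payoff of the smooth version: the hard loss is flat in $\hat{y}$ (zero gradient) in the clipped region $|y-\hat{y}| > \delta$, while the proxy keeps a small nonzero gradient everywhere, which is usually the point of smoothing.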
