I understand that for multiclass classification problems, we can use the softmax activation function in the output layer of a neural network. Softmax outputs a probability for each label, and the label with the highest probability is predicted as the target label. However, I just read a research paper in which the authors used a regression function instead of softmax in the output layer. The paper says:
"Because regression classification can automatically adjust classification thresholds based on data distribution to maximize classification performance"
I do not understand how the model can learn classification thresholds by itself. Are these thresholds part of the neural network architecture? Are they trained like the weights of the layers?
Link to the paper: https://www.sciencedirect.com/science/article/abs/pii/S016816991931556X
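To make the contrast concrete, here is a minimal sketch of the two output styles as I currently understand them. The softmax part follows the usual definition. The regression part is only my guess at what "thresholds" might mean: a single real-valued output that is cut into classes at fixed points. The values of `logits`, `regression_output`, and `thresholds` are made up for illustration, and nothing here is taken from the paper.

```python
import math
from bisect import bisect_right

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability;
    # this does not change the resulting probabilities.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw outputs of a 3-unit softmax layer for one sample.
logits = [2.0, 0.5, -1.0]
probs = softmax(logits)
# Predicted class = index of the highest probability (argmax).
softmax_label = max(range(len(probs)), key=probs.__getitem__)

# Hypothetical single regression output for the same sample, with the
# classes encoded as 0, 1, 2. The cut points below are assumptions made
# for illustration; I do not know how the paper obtains them.
regression_output = 0.3
thresholds = [0.5, 1.5]
# Count how many cut points the output has passed to get the class index.
regression_label = bisect_right(thresholds, regression_output)

print(softmax_label, regression_label)
```

In this sketch the cut points are hard-coded, which is exactly the part I am unsure about: whether they are fixed by hand, tuned on validation data after training, or learned together with the weights.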
