Questions tagged [softmax]
A normalizing exponential function that transforms a numeric vector so that its entries lie between 0 and 1 and sum to 1. It is often used as the final layer of a neural network performing a classification task.
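As a quick illustration of the tag description, here is a minimal softmax in plain Python (a sketch for illustration, not any particular library's implementation):

```python
import math

def softmax(z):
    # Subtract the max for numerical stability; this does not change the result.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
# Every entry lies in (0, 1) and the entries sum to 1.
```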
235 questions
0 votes
0 answers
60 views
Dealing with an overconfident likelihood
Background Consider the following recursive Bayesian classifier \begin{equation} p_{t}(c)=\frac{\ell(y_t\mid c)p_{t-1}(c)}{\sum_{\nu=1}^C \ell(y_t\mid \nu)p_{t-1}(\nu)}, \qquad c=1,\dots,C \tag{1} \...
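The recursive update in equation (1) can be sketched as a single normalization step; the likelihood values below are placeholders, not the asker's model:

```python
def bayes_update(prior, likelihoods):
    """One step of the recursive classifier in (1): posterior over
    classes c = 1..C given likelihoods l(y_t | c) and the previous
    posterior p_{t-1}. Both arguments are lists of length C."""
    unnorm = [l * p for l, p in zip(likelihoods, prior)]
    z = sum(unnorm)  # the denominator in (1)
    return [u / z for u in unnorm]
```

An overconfident likelihood (values near 0 or spanning many orders of magnitude) makes this update collapse onto one class after very few steps, which is the behaviour the question asks about.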
3 votes
3 answers
294 views
How might softmax cause overfitting in a neural model, even when treated from a Bayesian perspective?
The title is perhaps purposely provocative, but still reflects my ignorance. I am trying to understand carefully why, despite a very nice Bayesian interpretation, softmax might overfit, since I've ...
1 vote
0 answers
80 views
Why does zero initialization seem to give really good results with softmax regression?
Normally, zero initialization is really frowned upon when training a neural network for classification. But today, when I decided to implement softmax regression from scratch and play around with it a ...
0 votes
0 answers
45 views
Moment-based tail bound for concentration of the softmax transform of i.i.d. Gaussians
This question is in the same context as this one. Consider $n$ i.i.d. standard Gaussian random variables, denoted by $X_1, \ldots, X_n$, and I am trying to characterise the concentration of $$T_n = \...
1 vote
0 answers
86 views
Concentration of the softmax transform of i.i.d. Gaussians
Consider $n$ i.i.d. standard Gaussian random variables, denoted by $X_1, \ldots, X_n$. I am looking to characterise the concentration of functions like $\sum_{i=1}^n X_i e^{-X_i/\tau}$ and $\sum_{i=1}^...
0 votes
0 answers
56 views
Why does my uncertainty estimation not give the expected result?
I trained a simple MNIST handwritten-digit classifier with 10 classes, where I have the original test data and its corresponding 90-degree rotated version. It gave me the expected result for the digits zero and one. ...
1 vote
1 answer
231 views
How to compute the confidence and uncertainty of a model without ground truth from softmax output?
Suppose I have 3 classes A, B, C. Performing: y_pred = model.predict(X) # suppose X contains only two samples Returning a vector with length ...
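Two common ground-truth-free scores that can be computed from a softmax output like the one above are the maximum predicted probability (as confidence) and the predictive entropy (as uncertainty). A hedged sketch, with illustrative probability values rather than the asker's actual model output:

```python
import math

def confidence_and_entropy(probs):
    """Given one softmax output row, return the max-probability
    'confidence' and the predictive entropy (in nats) as a simple
    uncertainty score. Higher entropy means a flatter, less certain
    prediction."""
    conf = max(probs)
    ent = -sum(p * math.log(p) for p in probs if p > 0)
    return conf, ent

# e.g. for 3 classes A, B, C:
conf, ent = confidence_and_entropy([0.7, 0.2, 0.1])
```

The entropy is maximized (log 3 for three classes) when the prediction is uniform, and zero when the model puts all mass on one class.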
1 vote
1 answer
93 views
Reason for softmax approximation in Ian Goodfellow's deep learning book
In section 6.2.2.2 (equation 6.31) they state: Overall, unregularized maximum likelihood will drive the model to learn parameters that drive the softmax to predict the fraction of counts of each ...
1 vote
0 answers
84 views
Maxout activation function vs ReLU (Number of weights)
From what I understood, the Maxout function works quite differently from ReLU. The ReLU function is max(0, x), where the input x is (W^T x + b). The Maxout function has many Ws, and it is max(W1^T x + b1, W2^T x + b2, ...
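The weight-count difference the asker describes can be made concrete: a ReLU unit takes the max of one affine piece with zero, while a Maxout unit with k pieces stores k weight vectors and biases. A minimal sketch (names illustrative):

```python
def relu_unit(x, w, b):
    # ReLU applied to a single affine pre-activation: max(0, w^T x + b).
    return max(0.0, sum(wi * xi for wi, xi in zip(w, x)) + b)

def maxout_unit(x, Ws, bs):
    # Maxout takes the max over k affine pieces: max_j (W_j^T x + b_j),
    # so it needs k times the weights of a single ReLU unit.
    return max(sum(wi * xi for wi, xi in zip(w, x)) + b
               for w, b in zip(Ws, bs))
```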
0 votes
0 answers
66 views
Flattening a likelihood
Background Let $y_1,y_2,\dots,y_K$ be a sequence of measurements. I've derived a likelihood $\mathcal{L}(y|i)$ to solve a classification problem via the Bayesian classifier \begin{equation} p_k(i)=\...
0 votes
1 answer
85 views
Is softmax the same as vector normalization in 3D graphics?
I just found out about the softmax function in machine learning. It creates a probability distribution out of a vector of numbers, which means that all the numbers sum to 1. It sounds a lot like the ...
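The two operations in this question are in fact different, and a side-by-side sketch shows why: softmax exponentiates and produces strictly positive entries summing to 1, whereas the vector normalization of 3D graphics rescales to unit Euclidean length and preserves signs:

```python
import math

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [x / s for x in e]

def l2_normalize(v):
    # Vector normalization from 3D graphics: scale to unit Euclidean length.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]
```

For example, softmax of a vector with negative entries is still all-positive, while L2 normalization keeps the negative components.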
0 votes
0 answers
58 views
Is this the correct hypothesis of softmax regression?
Consider a dataset of $m$ training examples, $n$ features and $K$ classes. So we have a feature matrix $\mathbf{X} \in \mathbb{M}_{m, n}(\mathbb{R})$ and a weight matrix $\boldsymbol{\Theta} \in \...
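With those shapes, the standard hypothesis is a row-wise softmax over the score matrix XΘ (m × K). A sketch of that formulation under the stated dimensions, which may not match the asker's exact notation:

```python
import math

def softmax_regression_hypothesis(X, Theta):
    """For each example x (a row of the m x n matrix X), compute
    softmax(x Theta) over the K classes, where Theta is n x K.
    Returns an m x K matrix whose rows are probability vectors."""
    out = []
    for x in X:
        scores = [sum(x[i] * Theta[i][k] for i in range(len(x)))
                  for k in range(len(Theta[0]))]
        m = max(scores)  # stabilize the exponentials
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        out.append([e / z for e in exps])
    return out
```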
1 vote
0 answers
71 views
Compare multiclass probabilities for two classifiers
I have two algorithms that produce, for every observation, a vector of probabilities for 3 classes ...
1 vote
0 answers
138 views
How can the softmax distribution be used to detect out-of-distribution samples?
I am reading this paper and it states that - "In what follows we retrieve the maximum/predicted class probability from a softmax distribution and thereby detect whether an example is erroneously ...
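The baseline the quoted passage describes (maximum softmax probability) is simple enough to sketch: an input whose predicted class probability is low gets flagged as possibly out-of-distribution. The threshold below is illustrative, not from the paper:

```python
def msp_ood_score(probs):
    """Maximum softmax probability (MSP): the model's predicted-class
    probability, used as an in-distribution score."""
    return max(probs)

def is_out_of_distribution(probs, threshold=0.5):
    # Flag examples whose top softmax probability falls below a
    # chosen threshold (illustrative value, typically tuned on data).
    return msp_ood_score(probs) < threshold
```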
6 votes
1 answer
293 views
What's the relation between the output of a neural network and a Multinomial distribution?
I am reading this paper, which has the following paragraph - "The gold standard for deep neural nets is to use the softmax operator to convert the continuous activations of the output layer to ...