Questions tagged [softmax]
A normalizing exponential function that transforms a numeric vector so that its entries lie between 0 and 1 and sum to 1. It is often used as the final layer of a neural network performing a classification task.
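As a quick illustration of the tag description, here is a minimal softmax in plain Python (a sketch for illustration, not any particular library's implementation):

```python
import math

def softmax(z):
    # Subtract the max for numerical stability; this does not change the result.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
# Every entry lies in (0, 1) and the entries sum to 1.
```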
235 questions
0 votes
0 answers
60 views
Dealing with an overconfident likelihood
Background Consider the following recursive Bayesian classifier \begin{equation} p_{t}(c)=\frac{\ell(y_t\mid c)p_{t-1}(c)}{\sum_{\nu=1}^C \ell(y_t\mid \nu)p_{t-1}(\nu)}, \qquad c=1,\dots,C \tag{1} \...
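The recursive update in equation (1) can be sketched as a single normalization step; the likelihood values below are placeholders, not the asker's model:

```python
def bayes_update(prior, likelihoods):
    """One step of the recursive classifier in (1): posterior over
    classes c = 1..C given likelihoods l(y_t | c) and the previous
    posterior p_{t-1}. Both arguments are lists of length C."""
    unnorm = [l * p for l, p in zip(likelihoods, prior)]
    z = sum(unnorm)  # the denominator in (1)
    return [u / z for u in unnorm]
```

An overconfident likelihood (values near 0 or spanning many orders of magnitude) makes this update collapse onto one class after very few steps, which is the behaviour the question asks about.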
3 votes
3 answers
294 views
How might softmax cause overfitting in a neural model, even when treated from a Bayesian perspective?
The title is perhaps purposely provocative, but still reflects my ignorance. I am trying to understand carefully why, despite a very nice Bayesian interpretation, softmax might overfit, since I've ...
1 vote
0 answers
80 views
Why does zero initialization seem to give really good results with softmax regression?
Normally, zero initialization is really frowned upon when training a neural network for classification. But today, when I decided to implement softmax regression from scratch and play around with it a ...
0 votes
0 answers
45 views
Moment-based tail bound for concentration of the softmax transform of i.i.d. Gaussians
This question is in the same context as this one. Consider $n$ i.i.d. standard Gaussian random variables, denoted by $X_1, \ldots, X_n$, and I am trying to characterise the concentration of $$T_n = \...
1 vote
0 answers
86 views
Concentration of the softmax transform of i.i.d. Gaussians
Consider $n$ i.i.d. standard Gaussian random variables, denoted by $X_1, \ldots, X_n$. I am looking to characterise the concentration of functions like $\sum_{i=1}^n X_i e^{-X_i/\tau}$ and $\sum_{i=1}^...
0 votes
0 answers
56 views
Why does my uncertainty estimation not give the expected result?
I trained a simple MNIST handwritten-digit classifier with 10 classes, where I have the original test data and its corresponding 90-degree rotated version. It gave me the expected result for the digits zero and one. ...
1 vote
1 answer
231 views
How to compute the confidence and uncertainty of a model without ground truth from softmax output?
Suppose I have 3 classes A, B, C. Performing: y_pred = model.predict(X) # suppose X contains only two samples Returning a vector with length ...
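Two common ground-truth-free scores that can be computed from a softmax output like the one above are the maximum predicted probability (as confidence) and the predictive entropy (as uncertainty). A hedged sketch, with illustrative probability values rather than the asker's actual model output:

```python
import math

def confidence_and_entropy(probs):
    """Given one softmax output row, return the max-probability
    'confidence' and the predictive entropy (in nats) as a simple
    uncertainty score. Higher entropy means a flatter, less certain
    prediction."""
    conf = max(probs)
    ent = -sum(p * math.log(p) for p in probs if p > 0)
    return conf, ent

# e.g. for 3 classes A, B, C:
conf, ent = confidence_and_entropy([0.7, 0.2, 0.1])
```

The entropy is maximized (log 3 for three classes) when the prediction is uniform, and zero when the model puts all mass on one class.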
1 vote
1 answer
93 views
Reason for softmax approximation in Ian Goodfellow's deep learning book
In section 6.2.2.2 (equation 6.31) they state: Overall, unregularized maximum likelihood will drive the model to learn parameters that drive the softmax to predict the fraction of counts of each ...
1 vote
0 answers
84 views
Maxout activation function vs ReLU (Number of weights)
From what I understood, the Maxout function works quite differently from ReLU. The ReLU function is max(0, x), where the input x is (W^T x + b). The Maxout function has many Ws, and it is max(W1^T x + b1, W2^T x + b2, ...
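The weight-count difference the asker describes can be made concrete: a ReLU unit takes the max of one affine piece with zero, while a Maxout unit with k pieces stores k weight vectors and biases. A minimal sketch (names illustrative):

```python
def relu_unit(x, w, b):
    # ReLU applied to a single affine pre-activation: max(0, w^T x + b).
    return max(0.0, sum(wi * xi for wi, xi in zip(w, x)) + b)

def maxout_unit(x, Ws, bs):
    # Maxout takes the max over k affine pieces: max_j (W_j^T x + b_j),
    # so it needs k times the weights of a single ReLU unit.
    return max(sum(wi * xi for wi, xi in zip(w, x)) + b
               for w, b in zip(Ws, bs))
```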
0 votes
0 answers
66 views
Flattening a likelihood
Background Let $y_1,y_2,\dots,y_K$ be a sequence of measurements. I've derived a likelihood $\mathcal{L}(y|i)$ to solve a classification problem via the Bayesian classifier \begin{equation} p_k(i)=\...
0 votes
1 answer
85 views
Is softmax the same as vector normalization in 3D graphics?
I just found out about the softmax function in machine learning. It creates a probability distribution out of a vector of numbers, which means that all the numbers sum to 1. It sounds a lot like the ...
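The two operations in this question are in fact different, and a side-by-side sketch shows why: softmax exponentiates and produces strictly positive entries summing to 1, whereas the vector normalization of 3D graphics rescales to unit Euclidean length and preserves signs:

```python
import math

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [x / s for x in e]

def l2_normalize(v):
    # Vector normalization from 3D graphics: scale to unit Euclidean length.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]
```

For example, softmax of a vector with negative entries is still all-positive, while L2 normalization keeps the negative components.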
0 votes
0 answers
58 views
Is this the correct hypothesis of softmax regression?
Consider a dataset of $m$ training examples, $n$ features and $K$ classes. So we have a feature matrix $\mathbf{X} \in \mathbb{M}_{m, n}(\mathbb{R})$ and a weight matrix $\boldsymbol{\Theta} \in \...
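With those shapes, the standard hypothesis is a row-wise softmax over the score matrix XΘ (m × K). A sketch of that formulation under the stated dimensions, which may not match the asker's exact notation:

```python
import math

def softmax_regression_hypothesis(X, Theta):
    """For each example x (a row of the m x n matrix X), compute
    softmax(x Theta) over the K classes, where Theta is n x K.
    Returns an m x K matrix whose rows are probability vectors."""
    out = []
    for x in X:
        scores = [sum(x[i] * Theta[i][k] for i in range(len(x)))
                  for k in range(len(Theta[0]))]
        m = max(scores)  # stabilize the exponentials
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        out.append([e / z for e in exps])
    return out
```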
1 vote
0 answers
71 views
Compare multiclass probabilities for two classifiers
I have two algorithms that produce, for every observation, a vector of probabilities for 3 classes ...
1 vote
0 answers
138 views
How can the softmax distribution be used to detect out-of-distribution samples?
I am reading this paper and it states that - "In what follows we retrieve the maximum/predicted class probability from a softmax distribution and thereby detect whether an example is erroneously ...
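The baseline the quoted passage describes (maximum softmax probability) is simple enough to sketch: an input whose predicted class probability is low gets flagged as possibly out-of-distribution. The threshold below is illustrative, not from the paper:

```python
def msp_ood_score(probs):
    """Maximum softmax probability (MSP): the model's predicted-class
    probability, used as an in-distribution score."""
    return max(probs)

def is_out_of_distribution(probs, threshold=0.5):
    # Flag examples whose top softmax probability falls below a
    # chosen threshold (illustrative value, typically tuned on data).
    return msp_ood_score(probs) < threshold
```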
6 votes
1 answer
293 views
What's the relation between the output of a neural network and a Multinomial distribution?
I am reading this paper, which has the following paragraph - "The gold standard for deep neural nets is to use the softmax operator to convert the continuous activations of the output layer to ...