
Questions tagged [softmax]

Normalizing exponential function that transforms a numeric vector so that all its entries lie between 0 and 1 and together sum to 1. It is often used as the final layer of a neural network performing a classification task.
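
For concreteness, a minimal NumPy sketch of the function this tag describes (the max subtraction is the usual numerical-stability trick):

```python
import numpy as np

def softmax(z):
    """Map a real-valued vector to a vector of probabilities that sums to 1."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())        # subtract the max for numerical stability
    return e / e.sum()

print(softmax([2.0, 1.0, 0.1]))    # ~[0.659, 0.242, 0.099]: entries in (0, 1), summing to 1
```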

0 votes
0 answers
60 views

Background: Consider the following recursive Bayesian classifier \begin{equation} p_{t}(c)=\frac{\ell(y_t\mid c)p_{t-1}(c)}{\sum_{\nu=1}^C \ell(y_t\mid \nu)p_{t-1}(\nu)}, \qquad c=1,\dots,C \tag{1} \...
matteogost
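
A minimal sketch of the recursion in equation (1) above, on a toy two-class problem; the Gaussian likelihoods and their means are illustrative assumptions, not taken from the question:

```python
import numpy as np
from scipy.stats import norm

means = [0.0, 2.0]                 # hypothetical class-conditional means, c = 1, 2
p = np.array([0.5, 0.5])           # p_0(c): uniform prior over the C = 2 classes

for y_t in [1.8, 2.1, 0.3]:        # a few measurements y_t
    lik = norm.pdf(y_t, loc=means)       # l(y_t | c) for each class
    p = lik * p / np.sum(lik * p)        # equation (1): recursive Bayes update
    print(p)                             # p_t(c) after seeing y_t
```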
3 votes
3 answers
294 views

The title is perhaps purposely provocative, but still reflects my ignorance. I am trying to understand carefully why, despite a very nice Bayesian interpretation, softmax might overfit, since I've ...
Chris • 322
1 vote
0 answers
80 views

Normally, zero initialization is really frowned upon when training a neural network for classification. But today, when I decided to implement softmax regression from scratch and play around with it a ...
Minh Ngo
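
For what it's worth, plain softmax regression (no hidden layer) has no weight symmetry to break and its cross-entropy loss is convex, so starting all weights at zero is not a problem in that setting. A minimal from-scratch sketch under those assumptions (X is the feature matrix, Y the one-hot labels; the names are illustrative):

```python
import numpy as np

def softmax_rows(Z):
    E = np.exp(Z - Z.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

def fit_softmax_regression(X, Y, lr=0.1, steps=500):
    W = np.zeros((X.shape[1], Y.shape[1]))   # zero initialization
    for _ in range(steps):
        P = softmax_rows(X @ W)              # predicted class probabilities
        W -= lr * X.T @ (P - Y) / len(X)     # gradient of the mean cross-entropy
    return W
```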
0 votes
0 answers
45 views

This question is in the same context as this one. Consider $n$ i.i.d. standard Gaussian random variables, denoted by $X_1, \ldots, X_n$, and I am trying to characterise the concentration of $$T_n = \...
smako • 21
1 vote
0 answers
86 views

Consider $n$ i.i.d. standard Gaussian random variables, denoted by $X_1, \ldots, X_n$. I am looking to characterise the concentration of functions like $\sum_{i=1}^n X_i e^{-X_i/\tau}$ and $\sum_{i=1}^...
smako • 21
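
Before attempting a formal concentration bound for sums like $\sum_{i=1}^n X_i e^{-X_i/\tau}$, a quick Monte Carlo sketch can at least suggest the scale of the fluctuations (the value of $\tau$ here is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n, tau, reps = 1000, 1.0, 5000
X = rng.standard_normal((reps, n))            # reps independent samples of (X_1, ..., X_n)
T = np.sum(X * np.exp(-X / tau), axis=1)      # T = sum_i X_i * exp(-X_i / tau)
print(T.mean(), T.std())                      # empirical centre and spread of the sum
```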
0 votes
0 answers
56 views

I trained a simple MNIST handwritten-digit classifier with 10 classes, where I have the original test data and its corresponding 90-degree rotated version. It gave me the expected result for the digits zero and one. ...
Muhammad Ikhwan Perwira
1 vote
1 answer
231 views

Suppose I have 3 classes A, B, C. Performing y_pred = model.predict(X) # suppose X has only two samples returns a vector with length ...
Muhammad Ikhwan Perwira
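
For a 3-class softmax output, model.predict returns one probability vector per input row; the predicted label is the argmax of each row. A minimal sketch with a stand-in for the returned array (the values are made up):

```python
import numpy as np

# stand-in for y_pred = model.predict(X) with two samples and classes A, B, C
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.3, 0.6]])
classes = np.array(["A", "B", "C"])
print(y_pred.sum(axis=1))               # each row sums to 1 (softmax output)
print(classes[y_pred.argmax(axis=1)])   # -> ['A' 'C'], the predicted class per sample
```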
1 vote
1 answer
93 views

In section 6.2.2.2 (equation 6.31) they state: Overall, unregularized maximum likelihood will drive the model to learn parameters that drive the softmax to predict the fraction of counts of each ...
Philipp • 11
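
The "fraction of counts" statement can be checked in the simplest case of a softmax over free per-class logits $z_c$ with no input dependence: maximizing the unregularized log-likelihood $\sum_c n_c \log \operatorname{softmax}(z)_c$, where $n_c$ is the number of training examples of class $c$ (assuming every class appears at least once), is achieved exactly when

\begin{equation}
\operatorname{softmax}(z)_c = \frac{n_c}{\sum_{\nu=1}^{K} n_\nu},
\end{equation}

i.e. when the model reproduces the empirical class frequencies.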
1 vote
0 answers
84 views

From what I understand, the Maxout function works quite differently from ReLU. The ReLU function is max(0, x), where the input x is (W^T x + b). The Maxout function has many Ws, and it is max(W1^T x + b1, W2^T x + b2,...
kite • 11
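
A minimal NumPy sketch of the contrast described above: ReLU clips a single affine map at zero, while maxout takes the maximum of several learned affine maps (the weights below are illustrative):

```python
import numpy as np

def relu_unit(x, W, b):
    return np.maximum(0.0, W.T @ x + b)        # max(0, W^T x + b): one affine piece, clipped at 0

def maxout_unit(x, Ws, bs):
    return np.max([W.T @ x + b for W, b in zip(Ws, bs)], axis=0)   # max over k affine pieces

x = np.array([1.0, -2.0])
W1, b1 = np.array([[0.5], [0.1]]), np.array([0.0])
W2, b2 = np.array([[-0.3], [0.4]]), np.array([0.2])
print(relu_unit(x, W1, b1))                    # [0.3]
print(maxout_unit(x, [W1, W2], [b1, b2]))      # [0.3]: the larger of the two affine outputs
```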
0 votes
0 answers
66 views

Background: Let $y_1,y_2,\dots,y_K$ be a sequence of measurements. I've derived a likelihood $\mathcal{L}(y|i)$ to solve a classification problem via the Bayesian classifier \begin{equation} p_k(i)=\...
matteogost
0 votes
1 answer
85 views

I just found out about the softmax function in machine learning. It creates a probability distribution out of a vector of numbers, which means that all the numbers sum to 1. It sounds a lot like the ...
jcubic • 103
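
One concrete link between the two, which may be what the question is after: for two classes, softmax reduces to the logistic sigmoid applied to the difference of the inputs. A quick numerical check:

```python
import numpy as np

def softmax(z):
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

z1, z2 = 1.3, -0.4
print(softmax([z1, z2])[0])    # probability assigned to the first entry
print(sigmoid(z1 - z2))        # same value: 2-class softmax == sigmoid of the difference
```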
0 votes
0 answers
58 views

Consider a dataset of $m$ training examples, $n$ features and $K$ classes. So we have a feature matrix $\mathbf{X} \in \mathbb{M}_{m, n}(\mathbb{R})$ and a weight matrix $\boldsymbol{\Theta} \in \...
Sagnik Taraphdar
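
In that notation the class probabilities come from applying softmax row-wise to the logit matrix $\mathbf{X}\boldsymbol{\Theta}$; the sketch below assumes $\boldsymbol{\Theta} \in \mathbb{M}_{n, K}(\mathbb{R})$, which the truncated excerpt does not show:

```python
import numpy as np

m, n, K = 5, 4, 3
rng = np.random.default_rng(0)
X = rng.standard_normal((m, n))          # feature matrix: m examples, n features
Theta = rng.standard_normal((n, K))      # assumed weight shape: n features, K classes

Z = X @ Theta                            # logits, shape (m, K)
P = np.exp(Z - Z.max(axis=1, keepdims=True))
P /= P.sum(axis=1, keepdims=True)        # row-wise softmax: each row is a distribution over K classes
print(P.sum(axis=1))                     # all ones
```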
1 vote
0 answers
71 views

I have two algorithms that produce, for every observation, a vector of probabilities for 3 classes ...
Alessandro Bitetto
1 vote
0 answers
138 views

I am reading this paper, and it states: "In what follows we retrieve the maximum/predicted class probability from a softmax distribution and thereby detect whether an example is erroneously ...
desert_ranger
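
The baseline in that excerpt amounts to scoring each example by the largest entry of its softmax output and flagging low scores as likely errors or out-of-distribution inputs. A minimal sketch on made-up probability vectors (the threshold is illustrative):

```python
import numpy as np

probs = np.array([[0.97, 0.02, 0.01],    # confidently classified example
                  [0.40, 0.35, 0.25]])   # ambiguous example
msp = probs.max(axis=1)                  # maximum softmax probability per example
threshold = 0.5                          # illustrative; chosen on held-out data in practice
print(msp, msp < threshold)              # flag examples whose confidence falls below the threshold
```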
6 votes
1 answer
293 views

I am reading this paper, which has the following paragraph: "The gold standard for deep neural nets is to use the softmax operator to convert the continuous activations of the output layer to ...
desert_ranger
