The Bayesian bootstrap was introduced by Rubin (1981) as a Bayesian analog of the original bootstrap. Given a dataset $X=\{x_1, \dots, x_N\}$, instead of drawing weights $\pi_{n}$ from the discrete set $\left\{0, \frac{1}{N}, \ldots, \frac{N}{N}\right\}$, the Bayesian approach treats the vector of weights $\boldsymbol{\pi}$ as an unknown parameter and derives a posterior distribution for it. Rubin (1981) used the improper, non-informative prior $\prod_{i=1}^{N} \pi_{i}^{-1}$, which, when combined with the multinomial likelihood, leads to a $\text{Dirichlet}(1,\dots,1)$ posterior distribution for $\boldsymbol{\pi}$. In other words, our prior is
\begin{equation} p(\boldsymbol{\pi}) = \text{Dirichlet}(\boldsymbol{\alpha}), \quad \text{with}\ \boldsymbol{\alpha} = [0,\dots,0], \end{equation}
and the posterior is
\begin{equation} p(\boldsymbol{\pi}\mid\boldsymbol{x}) = \text{Dirichlet}(\boldsymbol{\alpha}), \quad \text{with}\ \boldsymbol{\alpha} = [1,\dots,1]. \end{equation}
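To make sure I have the mechanics right, here is a minimal sketch of how I understand one Bayesian-bootstrap replication is drawn (my own illustration in Python/NumPy with made-up data, not code from Rubin 1981): draw $\boldsymbol{\pi} \sim \text{Dirichlet}(1,\dots,1)$ and evaluate the statistic of interest under those weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data (any sample of size N would do here).
x = np.array([2.1, 3.4, 0.7, 5.2, 1.9])
N = len(x)

B = 1000                       # number of Bayesian-bootstrap replications
boot_means = np.empty(B)
for b in range(B):
    # One posterior draw of the weights: pi ~ Dirichlet(1, ..., 1).
    pi = rng.dirichlet(np.ones(N))
    # The statistic of interest (here the mean) evaluated under these weights.
    boot_means[b] = np.sum(pi * x)

# boot_means approximates the posterior distribution of the mean.
print(boot_means.mean(), boot_means.std())
```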
I was asked the following questions, which I was not able to answer:

1. How can you have a posterior distribution that (a) does not depend on the data and (b) is a uniform distribution?
2. Are both the prior and the posterior non-informative? I understand that the posterior is a uniform distribution, which is non-informative. I also see the prior referred to as a non-informative prior. Does that mean it is flat?
I believe that Section 5 in Rubin (1981) addresses these questions, but I do not fully understand that discussion. Any clarification, or pointers to what I may be misunderstanding, would be appreciated.
EDIT: I just noticed one more issue when computing the posterior. Let $d=\left(d_{1}, \ldots, d_{K}\right)$ be the vector of all possible distinct values of $X$, and let $\pi=\left(\pi_{1}, \ldots, \pi_{K}\right)$ be the associated vector of probabilities, $$ P\left(X=d_{i} \mid \pi\right)=\pi_{i}, \quad \sum_i \pi_{i}=1. $$ Let $x_{1}, \ldots, x_{n}$ be an i.i.d. sample from the distribution above and let $n_{i}$ be the number of $x_{j}$ equal to $d_{i}$. If we use the improper prior above over the sampling weights $\pi$, we can compute the posterior over $\pi$:
\begin{align*} p(\boldsymbol{\pi}|X) &\propto p(X|\boldsymbol{\pi})p(\boldsymbol{\pi})\\ & \propto \prod_{i}\pi_i^{n_i}\prod_{i}\pi_{i}^{\alpha_i-1}\\ & \propto \prod_{i}\pi_i^{n_i}\prod_{i}\pi_{i}^{-1} \quad (\text{using } \alpha_i=0)\\ & \propto \prod_i\pi_i^{n_i-1}, \end{align*} i.e., a $\text{Dirichlet}(n_1,\dots,n_K)$ distribution. How does this yield a flat Dirichlet posterior? Are we assuming $n_i=1$ for $i=1,\dots,K$? In that case, is our observation just the vector of all possible values $d=\left(d_{1}, \ldots, d_{K}\right)$, i.e., the original sample that we resample from?
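To make my confusion concrete, here is a small sketch (again my own illustration, with made-up data) of the counting involved: the $n_i$ are obtained by tallying the sample, and under the improper $\text{Dirichlet}(0,\dots,0)$ prior the posterior parameters are $\alpha_i + n_i = n_i$.

```python
import numpy as np
from collections import Counter

# Illustrative sample; d_1, ..., d_K are its distinct values.
x = [2.1, 3.4, 0.7, 3.4, 5.2]

counts = Counter(x)                    # n_i = number of x_j equal to d_i
d = sorted(counts)                     # distinct values d_1, ..., d_K
n = np.array([counts[v] for v in d])   # counts (n_1, ..., n_K)

alpha_prior = np.zeros(len(d))         # improper Dirichlet(0, ..., 0) prior
alpha_post = alpha_prior + n           # posterior parameters (n_1, ..., n_K)

print(d)           # [0.7, 2.1, 3.4, 5.2]
print(alpha_post)  # [1. 1. 2. 1.] -- flat (all ones) only if every x_j is distinct
```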