$\begingroup$

I am trying to understand parameter estimation and learning problems in graphical models, especially directed ones (Bayesian networks). But first of all, I am trying to understand what exactly a parameter means in a Bayesian network. The separation between a "variable" and a "parameter" becomes blurry when describing learning problems, especially in Bayesian learning, where the parameters are themselves random variables.

So generally, we have a stochastic process represented by the graph $G$. This graph has nodes for $S$ random variables, $X_1,X_2,...,X_S$, and the joint distribution over these variables factorizes as $P(X_1,X_2,...,X_S)=\prod_{i=1}^{S} P(X_i|parents(X_i))$. Now, where do the parameters fit in? From what I have understood so far:

  1. By the equation $P(X_1,X_2,...,X_S)=\prod_{i=1}^{S} P(X_i|parents(X_i))$, we implicitly assume that a parameter set $\theta$ is already given, and we actually mean $P(X_1,X_2,...,X_S|\theta_{1},\theta_{2},...,\theta_{S})=\prod_{i=1}^{S} P(X_i|parents(X_i),\theta_{i})$. If we run the process $G$, $N$ times in an i.i.d. fashion, then one of two views applies. Either there is a single correct set $\theta$, and each time we run the process the variables are instantiated according to this one and only true $\theta$ (the maximum likelihood view); or $\theta$ has a prior distribution $P(\theta)$, "nature" draws a $\theta$ a priori, and then all $N$ previous samples and any future samples are generated with this $\theta$ (the Bayesian view). In the latter case, we integrate over all possible $\theta$ when making inference about a new sample. Is this thought pattern correct?
  2. If we think of this as a generative process, can we say that for a variable $X_i$, an instantiation of its parents, $parents(X_{i})=\pi$, selects a subset of $\theta_{i}$, call it $\theta_{i}^{\pi}$, and then $X_{i}$ is generated according to the distribution $P(X_{i}|parents(X_{i})=\pi , \theta_{i}^{\pi})$? I think in terms of such subsets because $X_{i}$ has a different distribution conditioned on each value of its parents, and each of these distributions can have different parameters (or shared ones, of course) within $\theta_{i}$. Again, is this correct, or am I missing or misunderstanding something completely?
  3. In the graphical model, we add for each $X_{i}$ a new node $\theta_{i}$ which is a new parent of $X_{i}$ and has no parents itself. (There can be parameters shared among random variables as well.) So, for each sample of the process $G$, we have a new directed graph $G_{n}$ whose nodes are connected to the corresponding $\theta_{i}$ nodes, and given all $\theta$ nodes, these $N$ samples are independent of each other. (Of course, this is from a theoretical point of view; as far as I know, "plate notation" is used for real applications.) Finally, is this correct or wrong?
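The generative view in points 1 and 2 can be sketched concretely. Below is a minimal two-node network $A \to B$ with binary variables; the names and numbers are illustrative assumptions, not from the question. `theta_A` parameterizes $P(A)$, and `theta_B` is a table with one row per parent instantiation $\pi$, so the sampled value of the parent selects the sub-parameter $\theta_B^{\pi}$ exactly as described in point 2:

```python
import random

# Illustrative parameters (assumed for this sketch):
theta_A = 0.3                # P(A = 1)
theta_B = {0: 0.9, 1: 0.2}   # P(B = 1 | A = pi), one entry per parent value pi

def sample_once(rng):
    a = 1 if rng.random() < theta_A else 0
    # The parent's instantiation selects which parameter row generates B:
    b = 1 if rng.random() < theta_B[a] else 0
    return a, b

rng = random.Random(0)
samples = [sample_once(rng) for _ in range(10000)]
```

Running the process $N$ times with a fixed $\theta$, as in the maximum-likelihood view, the empirical frequencies of the samples approach the table entries.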
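For the maximum-likelihood side of point 1: with tabular CPDs, the likelihood of i.i.d. data factorizes per (node, parent instantiation) pair, so each $\theta_i^{\pi}$ is estimated independently by a ratio of counts. A hedged sketch for the same hypothetical $A \to B$ structure (the data here is made up):

```python
# (A, B) pairs observed from N = 8 i.i.d. runs of the process (toy data):
data = [(0, 1), (0, 1), (0, 0), (1, 0), (1, 0), (1, 1), (0, 1), (1, 0)]

def mle_cpds(data):
    theta_A_hat = sum(a for a, _ in data) / len(data)   # MLE of P(A = 1)
    theta_B_hat = {}
    for pi in (0, 1):
        # Only samples where A = pi inform theta_B^pi:
        rows = [b for a, b in data if a == pi]
        theta_B_hat[pi] = sum(rows) / len(rows)         # MLE of P(B = 1 | A = pi)
    return theta_A_hat, theta_B_hat

theta_A_hat, theta_B_hat = mle_cpds(data)
```

This decomposition (one count ratio per parent configuration) is also what the M-step of EM reduces to when the counts are replaced by expected counts.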

What I aim for with this question is to verify or invalidate my current understanding of the "parameter" concept in graphical models. I need a solid understanding, since I have to cope with advanced concepts like learning with the EM algorithm.

Thanks in advance.

$\endgroup$
    $\begingroup$ Selam Ufuk, 1) and 2) sound correct. In 3), I did not quite get what you mean by "So, for each sample of the process G, we have a new directed graph Gn whose nodes are connected to the corresponding θi nodes and given all θ nodes, these N samples are independent from each other." $\endgroup$ Commented Dec 2, 2013 at 9:11
  • $\begingroup$ Selamlar Zhubarb, to both you and @Daniel: what I meant in 3) was an understanding of how a Bayesian network can be parameterized in a general way. For example, if I have a model like this: Plate Model, in which $A$ generates different $B$s and each $B$ then generates a series of $C$s and $D$s (a nested plate model), how can we determine the parameters? I have an intuition for a process that is repeated $N$ times, for example, but in the link there are nested repetitions within the model itself as well. This is where I get confused. $\endgroup$ Commented Dec 2, 2013 at 11:13
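The nested-plate situation in the comment can be sketched as code; all names and distributions below are illustrative assumptions, not taken from the linked model. The key point is that the parameters sit outside all plates: they are fixed (or drawn) once and shared across every repetition they govern, while the inner loop re-runs once per outer sample:

```python
import random

# Assumed parameters, all shared across repetitions:
theta_B = 0.6               # P(B_j = 1 | A), same for every j
theta_C = {0: 0.2, 1: 0.8}  # P(C_jk = 1 | B_j = pi)
theta_D = {0: 0.5, 1: 0.1}  # P(D_jk = 1 | B_j = pi)

def sample_model(rng, n_b, n_cd):
    out = []
    for _ in range(n_b):                      # outer plate: Bs generated by A
        b = 1 if rng.random() < theta_B else 0
        cds = [(1 if rng.random() < theta_C[b] else 0,
                1 if rng.random() < theta_D[b] else 0)
               for _ in range(n_cd)]          # inner (nested) plate: (C, D) pairs per B
        out.append((b, cds))
    return out

draw = sample_model(random.Random(1), n_b=3, n_cd=4)
```

So nesting multiplies the number of variable instances, not the number of parameters: every $(C, D)$ pair under every $B$ reuses the same `theta_C` and `theta_D` tables.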

1 Answer

$\begingroup$
  1. Yes: $$p(\theta| X) \propto p(X | \theta) p(\theta)$$
  2. Yes.
  3. You can create any model that you want. The question is whether it is a good model for the process you are modeling.
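The proportionality in point 1 can be made concrete for a single binary node. As an illustrative sketch (the conjugate-prior choice is an assumption, not part of the answer): with a $\mathrm{Beta}(\alpha, \beta)$ prior on $\theta$ and a Bernoulli likelihood, $p(\theta|X) \propto p(X|\theta)p(\theta)$ works out in closed form to $\mathrm{Beta}(\alpha + k, \beta + n - k)$ after observing $k$ ones in $n$ trials:

```python
def posterior_params(alpha, beta, observations):
    # Conjugate update: prior Beta(alpha, beta), likelihood theta^k (1-theta)^(n-k),
    # hence posterior Beta(alpha + k, beta + n - k).
    k = sum(observations)   # number of 1s observed
    n = len(observations)
    return alpha + k, beta + n - k

def posterior_mean(alpha, beta):
    # Mean of a Beta(alpha, beta) distribution.
    return alpha / (alpha + beta)

# Uniform prior Beta(1, 1), then observe three 1s and one 0:
a_post, b_post = posterior_params(1.0, 1.0, [1, 1, 0, 1])
```

Integrating out $\theta$ for a new sample, as described in the question, then just means using the posterior mean instead of a point estimate.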
$\endgroup$
