Definition: For random variables $X\in\mathbb R^{d_1}$ and $Y\in\mathbb R^{d_2}$, we define the conditional expectation of $X$ given $Y$ to be any random variable $Z$ satisfying:
- there exists $g:\mathbb R^{d_2}\rightarrow\mathbb R^{d_1}$ such that $Z=g(Y)$ and
- $\mathbb E\left[Z\unicode{x1D7D9}_{\{Y\in A\}}\right]=\mathbb E\left[X\unicode{x1D7D9}_{\{Y\in A\}}\right]$ for all measurable $A\subseteq \mathbb R^{d_2}$.
To be honest, I don't understand this definition. Specifically:
- the reason for requiring $\mathbb E[X|Y]$ to be a function of $Y$;
- why $\mathbb E\left[Z\unicode{x1D7D9}_{\{Y\in A\}}\right]=\mathbb E\left[X\unicode{x1D7D9}_{\{Y\in A\}}\right]$ is needed for all $A\subseteq \mathbb R^{d_2}$.
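For what it's worth, the only case where I can see the second condition doing something is when $Y$ is discrete: taking $A=\{y\}$ for a value $y$ with $\mathbb P(Y=y)>0$, and using $Z=g(Y)$, I get
$$\mathbb E\left[Z\unicode{x1D7D9}_{\{Y=y\}}\right]=g(y)\,\mathbb P(Y=y)=\mathbb E\left[X\unicode{x1D7D9}_{\{Y=y\}}\right],$$
so the condition forces $g(y)=\mathbb E\left[X\unicode{x1D7D9}_{\{Y=y\}}\right]/\mathbb P(Y=y)$ — but I don't see what requiring it for *all* $A$ buys in general.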
Here is one example they mentioned:
$\Omega=[-1,1]$ and $\mathbb P$ is the uniform distribution. Define $$\begin{align}X(\omega)&=-\frac12+\unicode{x1D7D9}_{\{\omega\in[-1,-1/2]\cup[0,1/2]\}}+2\unicode{x1D7D9}_{\{\omega\in[-1/2,0]\}}\\Y(\omega)&=\unicode{x1D7D9}_{\{\omega\geq0\}}\\Z(\omega)&=1-Y(\omega)\end{align}$$ Then $\mathbb E[X|Y]=Z$ and $\mathbb P(X=Z)=0$.
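To convince myself that this $Z$ really satisfies the second condition, I ran a quick Monte Carlo check (my own sketch, not from the source; since $Y$ only takes the values $0$ and $1$, the events $\{Y\in A\}$ reduce to subsets of $\{0,1\}$):

```python
import random

random.seed(0)
N = 500_000
samples = [random.uniform(-1.0, 1.0) for _ in range(N)]  # P = uniform on [-1, 1]

def X(w):
    # X(ω) = -1/2 + 1_{ω ∈ [-1,-1/2] ∪ [0,1/2]} + 2·1_{ω ∈ [-1/2,0]}
    return -0.5 + ((-1 <= w <= -0.5) or (0 <= w <= 0.5)) + 2 * (-0.5 <= w <= 0)

def Y(w):
    return 1 if w >= 0 else 0

def Z(w):
    return 1 - Y(w)

# Compare E[X·1_{Y∈A}] with E[Z·1_{Y∈A}] for every subset A of Y's range {0, 1}
for A in [{0}, {1}, {0, 1}]:
    ex = sum(X(w) for w in samples if Y(w) in A) / N
    ez = sum(Z(w) for w in samples if Y(w) in A) / N
    print(f"A={A}: E[X·1]≈{ex:.3f}, E[Z·1]≈{ez:.3f}")

# ...and yet Z essentially never equals X pointwise
print("P(X = Z) ≈", sum(X(w) == Z(w) for w in samples) / N)
```

The two columns do agree for every $A$, even though $X$ and $Z$ never coincide pointwise — which I suppose is the point of the example, but it still doesn't tell me how one *finds* $Z$ from the definition.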
I didn't get how to compute the conditional expectation using the above definition.
Here is another definition, from *A First Look at Rigorous Probability Theory* by Jeffrey S. Rosenthal:
Definition: If $Y$ is a random variable and $B$ is an event with $\mathbb P(B)>0$, and if we define $v$ by $v(S)=\mathbb P(Y\in S\mid B)=\mathbb P(Y\in S,B)/\mathbb P(B)$, then $v=\mathcal L(Y|B)$ is a probability measure, called the conditional distribution of $Y$ given $B$. $\mathcal L(Y\unicode{x1D7D9}_{B})=\mathbb P(B)\,\mathcal L(Y|B)+\mathbb P(B^c)\,\delta_0$, so taking expectations and re-arranging, $$\mathbb E(Y|B)=\mathbb E(Y\unicode{x1D7D9}_{B})/\mathbb P(B)$$
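If I try to fill in the "taking expectations and re-arranging" step myself: the mean of $\delta_0$ is $0$, so taking the mean on both sides of the identity for $\mathcal L(Y\unicode{x1D7D9}_{B})$ should give
$$\mathbb E(Y\unicode{x1D7D9}_{B})=\mathbb P(B)\,\mathbb E(Y|B)+\mathbb P(B^c)\cdot 0,$$
and dividing by $\mathbb P(B)$ yields the displayed formula — assuming that's the intended reading.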
Here, too, I can't understand the role of $v$, or how this definition connects to the first one above.