Sample data and its corresponding random variables

Question

Simple question but I can't find the logic behind that. In many texts I see expressions like

"Let $\{ x_1, \dots, x_n\}$ denote our sample data and $\{ X_1, \dots, X_n\}$ their corresponding random variables".

The above expression seems quite ambiguous to me since $X_i$ is in general a real-valued function and $x_i$ can be either a scalar $\in \mathbb{R}$, or a vector $\in \mathbb{R}^n$. So $X_i$ can be (for example) the $i$-th column of the matrix with columns $x_i, \forall i \in \{1,\dots,n\}$, or any other real-valued function as well. As a more concrete example, in the link

https://courses.cit.cornell.edu/econ620/reviewm8.pdf,

at the end of page 2, the author says "Let $t = t(y)$ be a function of the observations and let $T = t(Y)$ be the corresponding random variable". There is definitely some underlying logic, but I wasn't able to find it in my bibliography. So can you please explain to me what "silent" identification happens between actual data and random variables? Cheers.

The quotation exemplifies a common (and egregious) abuse of language and mathematical notation, but I doubt it's going to disappear soon. Strictly speaking, $t(y)$ references a value in the image of the function $t$ and "$t=t(y)$" is rarely true, because it equates a function to one of its values. The author meant this as shorthand for "let $t$ be a [measurable] function and let $y$ be the generic name for a value in its domain; and let $T=t\circ Y$ be the random variable corresponding to a random variable $Y$ whose image is in the domain of $t$." You can see why a shorthand language is useful.
– whuber ♦, Commented Feb 12, 2020 at 16:47

Forgottenscience · Accepted Answer · 2020-02-12 16:07:35Z

Consider a function $f: \mathcal X \rightarrow \mathcal X$. How would you distinguish between $f$ and a particular value of it, $f(x)$? In general this is completely clear: the function is an object that lives in some abstract space of functions, while $f(x)$ is a value in $\mathcal X$.

A random variable is exactly the same: $Y$ is in reality a function from an underlying probability space $\Omega$ into $\mathcal X$, which is typically the real numbers or similar. So when we write $Y$ we really mean the function $Y: \Omega \rightarrow \mathcal X$. A sample from $Y$ is just $Y(\omega)$, which lives in $\mathcal X$. So the samples are really just values of the function $Y$ evaluated at different $\omega$'s. It also follows that if we define $Z = f(Y)$, meaning the composite function $f \circ Y$, we get just another random variable.
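To make the distinction concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than something taken from the question or the linked notes: the finite sample space `OMEGA` (two dice rolls), the random variables `Y1` and `Y2`, and the choice of the sample mean as the statistic `t`. The point is only that `Y1`, `Y2` and `T` are functions on `OMEGA`, while the data `y` and the value `t(y)` are plain numbers obtained by evaluating those functions at one outcome.

```python
import random

# Hypothetical probability space: all outcomes of rolling two fair dice.
OMEGA = [(a, b) for a in range(1, 7) for b in range(1, 7)]

def Y1(omega):
    """Random variable: a function on OMEGA returning the first roll."""
    return omega[0]

def Y2(omega):
    """Random variable: a function on OMEGA returning the second roll."""
    return omega[1]

def t(y):
    """A statistic: an ordinary function of observed values (here, their mean)."""
    return sum(y) / len(y)

def T(omega):
    """The random variable t∘(Y1, Y2): again a function on OMEGA, not a number."""
    return t((Y1(omega), Y2(omega)))

rng = random.Random(0)          # seeded only so the sketch is repeatable
omega = rng.choice(OMEGA)       # one outcome is realized
y = (Y1(omega), Y2(omega))      # the sample data: lowercase, plain numbers

print("data y      =", y)
print("value t(y)  =", t(y))
print("T(omega)    =", T(omega))  # the same number, reached through T

# Because T is a function on OMEGA, it has a distribution and an expectation;
# the single number t(y) does not.
expected_T = sum(T(w) for w in OMEGA) / len(OMEGA)
print("E[T]        =", expected_T)
```

Note that nothing here constructs `Y1` and `Y2` from the data. They are part of the model, chosen in advance, and the data are what the model produces at one particular `omega`.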

Thanks for the reply. I don't understand how, given sample data $\{x_1, \dots ,x_n\}$, one constructs $\{X_1, \dots, X_n\}$.
– user430191, Commented Feb 12, 2020 at 16:12
+1. Random variables are not "constructed" from data: they are theoretical models used to analyze the data.
– whuber ♦, Commented Feb 12, 2020 at 16:40
@whuber I understand that we want to create an abstraction when we pass from sample data to random variables. However, there must be some "natural" way of doing this. I am tempted to say a categorical way, but that is not a statistical term.
– user430191, Commented Feb 12, 2020 at 16:45
The process of using random variables to understand data (or quantitative theories) is known generally as "stochastic modeling" or creating "probabilistic models." It often requires experience, creativity, imagination, multiple attempts, good understanding of the intended applications, and skill with manipulating such models. Those with sufficient experience may feel that certain approaches are "natural," but I don't think we could equate "natural" with "automatic" or "routine."
– whuber ♦, Commented Feb 12, 2020 at 16:50
