
Let us consider a pair of random variables $y, x$ with a conditional distribution $p(y \mid x)$ and marginal distributions $p_Y(y), p_X(x)$. We observe an i.i.d. sample $D_x = \{x_i\}_{i = 1}^n$ and let $j = \arg\min_i x_i$ denote the index of the smallest observation.

What is the distribution of $y_j$?

For example, suppose $(y, x)$ has a joint Gaussian (bivariate normal) distribution.
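For concreteness, here is a minimal R sketch of the bivariate-normal version of this setup; the correlation rho and the sample size n are arbitrary illustrative choices, not part of the question.

# Hypothetical illustration: draw (x, y) pairs from a standard bivariate normal
# and pick the y paired with the smallest x.
set.seed(1)
n   <- 30
rho <- 0.8
x <- rnorm(n)
y <- rho * x + sqrt(1 - rho^2) * rnorm(n)   # (x, y) standard bivariate normal, cor = rho
j <- which.min(x)
y[j]                                        # a single draw from the distribution of y_j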

  • A general formula will be complicated due to the need to handle the chance of ties (leading to ambiguity in $j$). Would you like to make some simplifying assumptions, such as that the marginal distribution of $x$ is continuous? (Commented Aug 12 at 15:47)
  • It is incorrect to write $p_y(y)$ or the like. Often $p_Y(y)$ is used, where the capital $Y$ is the random variable and the lower-case $y$ is any of the possible values of the argument. Thus $p_Y(3)$ is the value of the density function of the random variable $Y$ at $3,$ and similarly for $p_X(3).$ Without this distinction, one cannot even understand such things as $\Pr(Y\le y).$ (But this is a good question.) (Commented Aug 12 at 19:27)
  • @MichaelHardy Good points--but "incorrect" is too strong. In my post (for consistency with the question) I continued to use this potentially ambiguous notation because what matters in the analysis are the random variables $x_j$ and $y_j,$ eliminating most of the problems. ("$F_x(x)$" overloads the symbol "$x$" but its meaning is clear.) Moreover, one is not obliged to call a possible value of a random variable "$x.$" Thus, for instance--everyone would hate this but it's legitimate--one could define the CDF of the random variable $x$ as $F_x(X)=\Pr(x \le X)$ or even $F_x(t)=\Pr(x\le t).$ (Commented Aug 13 at 15:49)
  • @whuber And notice how I used the word "often." (Commented Aug 14 at 18:36)
  • @MichaelHardy Certainly! I recall plotting graphs of some functions in high school where I made the y axis horizontal and the x axis vertical. Although they were clearly labeled, the teacher would not give me credit -- and I think they were right, because they saw that although I understood the mathematics just fine, there was a lesson to be learned about communication (that I never forgot). (Commented Aug 14 at 19:01)

1 Answer


Let $F_x$ be the marginal cumulative distribution function (CDF) of $x$ and let $F_{y\mid x}$ be the conditional CDF of $y$ given $x.$ The independence of the data, together with the implicit assumption that $F_x$ is continuous (so that ties among the $x_i$ occur with probability zero and $j$ is well defined), implies the CDF of $x_j = \min_i x_i$ is

$$\Pr(x_j \le x) = 1 - \left(1 - F_x(x)\right)^n.$$
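As a quick sanity check (not part of the original answer), a minimal R sketch comparing the empirical CDF of the sample minimum with this formula for standard Normal data; the values of n, n.sim, and x0 are arbitrary illustrative choices.

# Empirical check of Pr(min(x) <= x0) = 1 - (1 - F(x0))^n for standard Normal data
set.seed(1)
n <- 30; n.sim <- 1e5; x0 <- -2
mins <- apply(matrix(rnorm(n * n.sim), n), 2, min)   # column minima of n.sim samples of size n
mean(mins <= x0)                                     # empirical Pr(x_j <= x0)
1 - (1 - pnorm(x0))^n                                # value predicted by the formula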

Consequently the CDF of $y_j$ is obtained by averaging over $x_j,$

$$\Pr(y_j \le y) = \int_{\mathbb R} F_{y\mid x}(y)\, \mathrm d \left[1 - \left(1 - F_x(x)\right)^n\right].$$
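When $F_x$ has a density $f_x$ (as it does in the Normal example below), the differential expands as $\mathrm d\left[1 - \left(1 - F_x(x)\right)^n\right] = n\left(1 - F_x(x)\right)^{n-1} f_x(x)\,\mathrm d x,$ giving the equivalent form

$$\Pr(y_j \le y) = n\int_{\mathbb R} F_{y\mid x}(y)\left(1 - F_x(x)\right)^{n-1} f_x(x)\,\mathrm d x,$$

which is the integral specialized in $(*)$ below.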


For example, let $(x,y)$ have a Binormal distribution. Choose units of measurement for $x$ in which $F_x$ is standard Normal, $F_x = \Phi.$ Then the regression of $y$ on $x$ is linear with regression function $\hat y = \alpha + \beta x$ for parameters $\alpha$ and $\beta,$ whence (writing $\sigma^2$ for the conditional variance)

$$F_{y\mid x}(y) = \Phi\left(\frac{y - (\alpha + \beta x)}{\sigma}\right)$$

giving (with $\varphi = \Phi^\prime$ the standard normal density function)

$$\Pr(y_j \le y) = n\int_{\mathbb R} \Phi\left(\frac{y - (\alpha + \beta x)}{\sigma}\right)\left(1 - \Phi(x)\right)^{n-1}\varphi(x)\,\mathrm d x.\tag{*}$$
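For reference (a standard fact about the binormal distribution, not stated in the original answer): if $(x, y)$ has means $\mu_x, \mu_y,$ standard deviations $\sigma_x, \sigma_y,$ and correlation $\rho,$ and $x$ has been standardized as above, then

$$\alpha = \mu_y, \qquad \beta = \rho\,\sigma_y, \qquad \sigma^2 = \sigma_y^2\left(1 - \rho^2\right).$$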

I believe this doesn't simplify (except for tiny values of $n$), but it is amenable to numerical integration for efficient evaluation (as shown below in the R code for the function f).


A quick simulation (in R) supports this formula. The histogram plots the empirical density of $10^5$ values of $y_j$ for $n=30,$ $\alpha = 0,$ $\beta = 1,$ and $\sigma^2 = 2/3,$ while the red curve plots the derivative of $(*),$ the density of $y_j.$ They agree to within random variation.

[Figure: histogram of the simulated values of $y_j$ with the density computed from $(*)$ overlaid in red.]

BTW, don't let this fool you into thinking the distribution of $y_j$ is Normal or even approximately so. Although in many cases it will be (small values of $|\beta|$ and $n,$ together with large values of $\sigma,$ are conducive to this appearance), when $y$ and $x$ are strongly correlated and $n$ is sufficiently large, the skewness in the distribution of the minimum of the $x_i$ will become apparent. For instance, changing $n$ to $300$ and $\beta$ to $-5$ yields this obviously skewed histogram:

[Figure: histogram of the simulated values of $y_j$ for $n = 300$ and $\beta = -5,$ showing pronounced skewness.]

n <- 30
alpha <- 0
beta <- 1
sigma <- sqrt(2/3)
#
# Simulation.
#
set.seed(17)
n.sim <- 1e5
x <- apply(matrix(rnorm(n * n.sim), n), 2, min)
y <- beta * x + alpha + rnorm(length(x), 0, sigma)
#
# Plot (a subset of) the (x, y_j) data.
#
j <- seq_len(min(n.sim, 1e3))
plot(x[j], y[j])
#
# Integration.
#
f <- function(x, y, n, alpha, beta, sigma) {
  exp(pnorm(y, alpha + beta * x, sigma, log = TRUE) +
      pnorm(x, log = TRUE, lower.tail = FALSE) * (n - 1) +
      dnorm(x, log = TRUE) + log(n))
}
f.yx <- Vectorize(function(y, ...) {
  integrate(\(x) f(x, y, n, alpha, beta, sigma), lower = -Inf, upper = Inf, ...)$value
})
#
# Numerical differentiation.
#
Y <- seq(min(y), max(y), length.out = 101)
Z <- f.yx(Y, rel.tol = 1e-12) # (Can benefit from high precision)
dZ.dY <- diff(Z) / diff(Y)
y0 <- (head(Y, -1) + tail(Y, -1)) / 2
#
# Plot the results.
#
hist(y, freq = FALSE, ylim = c(0, max(dZ.dY)),
     main = expression("Histogram of " * y[j]), xlab = "Value", breaks = 50)
lines(y0, dZ.dY, col = "Red", lwd = 2)
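To reproduce the second, skewed histogram described above, one would change just these two parameters and rerun the simulation, integration, and plotting steps of the same script:

# Parameter changes for the skewed example described in the text
# (all other settings as in the script above).
n    <- 300
beta <- -5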
  • The first formula, I believe, should be $P(x_j \leq x) = 1 - [1 - F(x)]^n$. Also, in the second formula one should read $F_{y \mid x}$ in place of $F_{x \mid y}$. (Commented Aug 13 at 6:25)
  • @Yves Many thanks for the corrections. (Commented Aug 13 at 14:05)
  • Formula (*) lacks the normal density, say $\phi(x)$, which is correctly used in the R code. Best regards. (Commented Aug 13 at 15:24)
  • Although I agree that this is not the case here, random indices in random sequences are a source of quite subtle problems. Attempt at a short proof: for any "test" function $g(y)$ we have $\mathbb{E}[g(Y_J) \mid X_J] = \sum_{j=1}^n \mathbb{E}[g(Y_J) \mid X_J,\, J = j] \Pr[J = j \mid X_J]$. The conditional expectation on the r.h.s. is $\mathbb{E}[g(Y) \mid X]$ and the probabilities are $1/n$. This is similar to the second sampling you describe. (Commented Aug 19 at 6:42)
  • @Yves I appreciate you drawing attention to these subtleties. (Commented Aug 19 at 13:39)
