
Consider two independent and identically distributed random vectors $\mathbf{x}$ and $\mathbf{y}$ of dimensionality $N$, whose elements are generated iid from a Gaussian with zero mean and variance $\sigma^2$; i.e., $\mathbf{x}, \mathbf{y} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I}_N)$.

What is the probability that $\mathrm{sign}(\mathbf{x})=(\mathrm{sign}(x_1),\mathrm{sign}(x_2), \ldots, \mathrm{sign}(x_N))$ equals $\mathrm{sign}(\mathbf{y})=(\mathrm{sign}(y_1),\mathrm{sign}(y_2), \ldots, \mathrm{sign}(y_N))$ given that their Euclidean distance is at most $r$, $r \geq 0$; that is, $\Pr\left[ \mathrm{sign}(\mathbf{x})=\mathrm{sign}(\mathbf{y})\ \vert\ \Vert \mathbf{x} - \mathbf{y} \Vert \leq r\right]$?

Note: $\mathbf{x}$ and $\mathbf{y}$ are independent.
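
For concreteness, this conditional probability can be estimated by straightforward rejection sampling. The following is a minimal sketch of my own (the function name and all parameter values are illustrative, not part of the question):

    # Monte Carlo sketch: estimate Pr[sign(x) = sign(y) | ||x - y|| <= r]
    # by keeping only the simulated pairs whose distance is at most r.
    cond_sign_prob <- function(N, r, sigma = 1, trials = 1e5) {
      kept <- 0; hits <- 0
      for (i in 1:trials) {
        x <- rnorm(N, 0, sigma)
        y <- rnorm(N, 0, sigma)
        if (sqrt(sum((x - y)^2)) <= r) {           # condition on the distance
          kept <- kept + 1
          hits <- hits + all(sign(x) == sign(y))   # do all N signs agree?
        }
      }
      hits / kept                                  # conditional relative frequency
    }

    cond_sign_prob(N = 3, r = 1)   # example call; returns NaN if no pair satisfies the condition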

  • I assume you meant $\Vert \mathbf{x} - \mathbf{y} \Vert$ in your last line of math? Commented Dec 12, 2011 at 21:04
  • What kind of answer are you looking for? Even in the case $N=1$ there is no closed form solution (except possibly for special values of $r/\sigma$). Commented Dec 13, 2011 at 6:20
  • @jbowman: you are right. Commented Dec 13, 2011 at 20:15
  • @whuber: any upper or lower bound is quite good for me. Commented Dec 13, 2011 at 20:16
  • You mean the obvious bounds of $2^{-N}$ and $1$ will be fine? Commented Dec 13, 2011 at 20:23
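
For reference, $2^{-N}$ is exactly the unconditional agreement probability (a one-line check, not from the thread): the coordinates are independent and each pair of signs agrees with probability $\tfrac12$, so
$$\Pr\left[\mathrm{sign}(\mathbf{x})=\mathrm{sign}(\mathbf{y})\right] = \prod_{i=1}^{N}\Pr\left[\mathrm{sign}(x_i)=\mathrm{sign}(y_i)\right] = 2^{-N},$$
and the conditional probability approaches $1$ as $r \to 0$ and $2^{-N}$ as $r \to \infty$.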

1 Answer


This is not a complete answer, but it may point you in a useful direction. As a hands-on approach, I suggest running a simulation to get an idea of what the resulting probability looks like, and then trying to derive a formula for it.

Here is such a simulation for a fixed standard deviation. Playing around with other standard deviations returned similar results.

    require(lattice)

    stdev  <- 1      # standard deviation of the Gaussian
    N      <- 5      # largest dimension to simulate
    trials <- 2000   # simulated pairs per dimension

    resPerN <- data.frame("N"    = rep(1:N, each = trials),
                          "r"    = rep(-1, N * trials),
                          "prob" = rep(-1, N * trials))

    for (n in 1:N) {
      res <- data.frame("r" = rep(-1, trials), "signum" = rep(-1, trials))
      for (i in 1:trials) {
        x <- rnorm(n, 0, stdev)
        y <- rnorm(n, 0, stdev)
        diff   <- sqrt(sum((x - y)^2))            # Euclidean distance of the pair
        signum <- sum(sign(x) == sign(y)) == n    # TRUE if all n signs agree
        res[i, ] <- c(diff, signum)
      }
      res   <- res[order(res$r), ]                # sort pairs by distance
      range <- (1 + (n - 1) * trials):(n * trials)
      resPerN$N[range] <- n
      resPerN$r[range] <- (res$r - min(res$r)) / (max(res$r) - min(res$r))
      # reciprocal of the cumulative agreement counts (cumulative denominator),
      # with Inf entries (no agreements yet) set to 0, then min-max-scaled
      invprob <- 1 / (cumsum(res$signum) / cumsum(1:trials))
      invprob[which(invprob == Inf)] <- 0
      resPerN$prob[range] <- (invprob - min(invprob)) / (max(invprob) - min(invprob))
    }

    xyplot(prob ~ r, data = resPerN, groups = resPerN$N, type = "b",
           xlab = "(min-max-transformed) r",
           ylab = "(min-max-transformed) 1/prob | <= r",
           auto.key = TRUE,
           par.settings = list(superpose.line   = list(col = rainbow(5), lty = 1),
                               superpose.symbol = list(col = rainbow(5), pch = 15, cex = 0.8)))

which resulted in this plot:

[plot: (min-max-transformed) 1/prob | <= r against (min-max-transformed) r, one curve per dimension N = 1, ..., 5]

which indicates an (inverse) logistic relationship.

Going further, by collecting the beta coefficients and intercepts (one pair per dimension) you should be able to derive a function of $r$. Afterwards I would vary the standard deviation to include it in the equation.
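
One way to carry out that step is sketched below (my own illustration, not from the answer; note that it fits the probability of agreement at a given distance $d$ rather than the cumulative probability for distances up to $r$, which is what the plot above shows):

    # Hypothetical sketch: one logistic fit per dimension, collecting
    # (intercept, slope) pairs as raw material for a formula in r.
    set.seed(1)
    stdev <- 1; trials <- 2000
    coefs <- data.frame(N = 1:5, intercept = NA_real_, slope = NA_real_)
    for (n in 1:5) {
      x <- matrix(rnorm(n * trials, 0, stdev), ncol = n)
      y <- matrix(rnorm(n * trials, 0, stdev), ncol = n)
      d     <- sqrt(rowSums((x - y)^2))                      # Euclidean distances
      agree <- as.numeric(rowSums(sign(x) == sign(y)) == n)  # 1 if all signs agree
      fit   <- glm(agree ~ d, family = binomial)             # logistic regression in d
      coefs[n, c("intercept", "slope")] <- coef(fit)
    }
    coefs   # one (intercept, slope) pair per dimension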

  • There is no need to consider multiple standard deviations. Do you see why? Any answer will be a function of $N$ and $r/\sigma$ only. Commented Dec 14, 2011 at 15:06
  • @cardinal yes I see, thank you for the hint. One can transform $\mathcal{N}(0,\sigma^2)$ to the standard normal distribution and use $\Vert a\mathbf{x} - a\mathbf{y} \Vert = a \Vert \mathbf{x} - \mathbf{y} \Vert$ for $a > 0$. Commented Dec 14, 2011 at 15:38
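
Spelled out (my own write-up of the argument in these comments): write $\mathbf{x} = \sigma\mathbf{z}$ and $\mathbf{y} = \sigma\mathbf{w}$ with $\mathbf{z}, \mathbf{w} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_N)$. Since $\sigma > 0$, $\mathrm{sign}(\mathbf{x}) = \mathrm{sign}(\mathbf{z})$ and $\mathrm{sign}(\mathbf{y}) = \mathrm{sign}(\mathbf{w})$, and $\Vert \mathbf{x} - \mathbf{y} \Vert \leq r$ holds exactly when $\Vert \mathbf{z} - \mathbf{w} \Vert \leq r/\sigma$, so
$$\Pr\left[\mathrm{sign}(\mathbf{x})=\mathrm{sign}(\mathbf{y}) \ \middle\vert\ \Vert \mathbf{x} - \mathbf{y} \Vert \leq r\right] = \Pr\left[\mathrm{sign}(\mathbf{z})=\mathrm{sign}(\mathbf{w}) \ \middle\vert\ \Vert \mathbf{z} - \mathbf{w} \Vert \leq \tfrac{r}{\sigma}\right],$$
which is a function of $N$ and $r/\sigma$ only.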
