$\begingroup$

I want to calculate the Mahalanobis distance between two vectors that represent two points.

For example:

u = {1, 2, 4}; v = {0, 1, -2};
Mahalanobis[u_, v_] := Module[{cov},
  cov = Covariance[{u, v}];
  N@Sqrt[(u - v).PseudoInverse[cov].(u - v)]
]

I wrote this function, but I am not sure whether the covariance matrix is computed correctly. Is this the right approach?

$\endgroup$
  • $\begingroup$ What seems to be the problem? Is the output not what you expect? $\endgroup$ Commented Oct 14, 2014 at 20:13
  • $\begingroup$ That's not what the Mahalanobis distance is. The covariance matrix should be a third parameter independent of u and v. $\endgroup$ Commented Oct 14, 2014 at 20:19
  • $\begingroup$ @RahulNarain, I don't understand what you mean by your comment. $\endgroup$ Commented Oct 14, 2014 at 22:47
  • $\begingroup$ For another explanation of Mahalanobis distance, see stats.stackexchange.com/questions/62092/… $\endgroup$ Commented May 22, 2017 at 13:22
  • $\begingroup$ Example in the docs: reference.wolfram.com/language/ref/TTest.html#75259751 $\endgroup$ Commented May 22, 2017 at 13:29

1 Answer

$\begingroup$

This is not what the Mahalanobis distance is. It is not a distance between two vectors. It is defined as the distance between a vector and a cohort of vectors, characterized by the cohort's mean and its covariance matrix.
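In symbols (the standard definition): for a point $x$ and a cohort with mean $\mu$ and covariance matrix $S$, the Mahalanobis distance is

$$ D_M(x) = \sqrt{(x - \mu)^\mathsf{T}\, S^{-1}\, (x - \mu)}. $$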

Try this instead:

Dm = Compile[{{u, _Real, 1}, {\[Mu], _Real, 1}, {s, _Real, 2}},
   First@Sqrt[(u - \[Mu]).Inverse[s].Transpose[{u - \[Mu]}]],
   CompilationOptions -> {"ExpressionOptimization" -> True},
   RuntimeOptions -> "Quality", RuntimeAttributes -> Listable,
   CompilationTarget -> "C"];

cohort = RandomVariate[BinormalDistribution[{5, 5}, {.5, 1.5}, .3], 1000];
\[Mu] = Mean@cohort;
s = Covariance@cohort;
Print["\[Mu] = ", {\[Mu]}\[Transpose] // MatrixForm, "  S = ", s // MatrixForm];

points = {\[Mu]} ~Join~
   Table[N@5 + 2 {Cos[x], Sin[x]}, {x, 1/16 \[Pi], 2 \[Pi], 1/8 \[Pi]}] ~Join~
   Table[N@5 + 4 {Cos[x], Sin[x]}, {x, 1/16 \[Pi], 2 \[Pi], 1/8 \[Pi]}];

ListPlot[{cohort, Labeled[#, Round[Dm[#, \[Mu], s], .01]] & /@ points},
   PlotRange -> {{0, 10}, {0, 10}}, AspectRatio -> 1,
   PlotStyle -> {Darker@LightBlue, {Red, PointSize[.01]}}]

(Plot: the cohort in light blue, with the red test points labeled by their Mahalanobis distance from the cohort mean.)

A bit of optimization:

(* Inverse[] cannot be parallelized and takes too long *)
manypoints = RandomVariate[NormalDistribution[5, 3], {1000000, 2}];
AbsoluteTiming[Dm[#, \[Mu], s] &[manypoints];]
(* {5.973094, Null} *)

(* using the 2D matrix inverse formula as a special case *)
FastDm2D = Compile[{{u, _Real, 1}, {\[Mu], _Real, 1}, {s, _Real, 2}},
   First@Sqrt[(u - \[Mu]).
      {{s[[2, 2]]/(-s[[1, 2]] s[[2, 1]] + s[[1, 1]] s[[2, 2]]),
        -(s[[1, 2]]/(-s[[1, 2]] s[[2, 1]] + s[[1, 1]] s[[2, 2]]))},
       {-(s[[2, 1]]/(-s[[1, 2]] s[[2, 1]] + s[[1, 1]] s[[2, 2]])),
        s[[1, 1]]/(-s[[1, 2]] s[[2, 1]] + s[[1, 1]] s[[2, 2]])}}.
      Transpose[{u - \[Mu]}]],
   CompilationOptions -> {"ExpressionOptimization" -> True},
   Parallelization -> True, RuntimeOptions -> "Quality",
   RuntimeAttributes -> Listable, CompilationTarget -> "C"];
AbsoluteTiming[FastDm2D[#, \[Mu], s] &[manypoints];]
(* {0.222699, Null} *)
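The hard-coded matrix in FastDm2D is just the closed-form inverse of a 2×2 covariance matrix:

$$ S^{-1} = \frac{1}{s_{11}s_{22} - s_{12}s_{21}} \begin{pmatrix} s_{22} & -s_{12} \\ -s_{21} & s_{11} \end{pmatrix}. $$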

Of course that looks MUCH prettier when typed in Mathematica:

(Screenshot: the same code typeset in a Mathematica notebook.)

EDIT - OPTIMIZATION:

(* adding a wrapper to precompute the inverse of S produces the fastest results *)
FastDmCompiled = Compile[{{u, _Real, 1}, {\[Mu], _Real, 1}, {sInv, _Real, 2}},
   First@Sqrt[(u - \[Mu]).sInv.Transpose[{u - \[Mu]}]],
   CompilationOptions -> {"ExpressionOptimization" -> True},
   Parallelization -> True, RuntimeOptions -> "Quality",
   RuntimeAttributes -> Listable, CompilationTarget -> "C"];
FastDm[u_, \[Mu]_, s_] := FastDmCompiled[u, \[Mu], Inverse[s]];
AbsoluteTiming[FastDm[#, \[Mu], s] &[manypoints];]
(* {0.151167, Null} *)
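For comparison outside Mathematica, the same precompute-the-inverse idea can be sketched in Python with NumPy (an illustration, not part of the original answer; the function and variable names are mine, and the cohort parameters mirror the BinormalDistribution example above, whose covariance is [[0.25, 0.225], [0.225, 2.25]]):

```python
import numpy as np

def mahalanobis(u, mu, s_inv):
    """Mahalanobis distance of point u from a cohort with mean mu,
    given the already-inverted covariance matrix s_inv."""
    d = np.asarray(u, dtype=float) - np.asarray(mu, dtype=float)
    return float(np.sqrt(d @ s_inv @ d))

# Cohort analogous to BinormalDistribution[{5, 5}, {.5, 1.5}, .3]
rng = np.random.default_rng(0)
cohort = rng.multivariate_normal([5, 5], [[0.25, 0.225], [0.225, 2.25]], size=1000)

mu = cohort.mean(axis=0)
s_inv = np.linalg.inv(np.cov(cohort, rowvar=False))  # invert once, reuse many times

print(mahalanobis(mu, mu, s_inv))   # the mean is at distance 0.0 from itself
print(mahalanobis([7.0, 5.0], mu, s_inv))
```

As in the Mathematica version, inverting S once and reusing it is what makes evaluating many points cheap.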

Hope that helps. Good luck!

$\endgroup$
  • $\begingroup$ You might consider using LinearSolve[] instead of Inverse[] here. $\endgroup$ Commented Apr 22, 2017 at 11:01
  • $\begingroup$ Have an example? I tried to plug in LinearSolve[s,IdentityMatrix@Length@s] and had no luck. Much slower than Inverse[]. $\endgroup$ Commented Apr 22, 2017 at 11:11
  • $\begingroup$ That's not the way to use LinearSolve[]; try Sqrt[(u - μ).LinearSolve[s, u - μ]]. You might want to see this. $\endgroup$ Commented Apr 22, 2017 at 11:18
  • $\begingroup$ Still slower on my machine than Inverse, and still cannot be parallelized. I'm running 11.1 on Win7 with 8-thread i7 CPU. It's probably because it is already parallelized internally, and thus refuses to be run in parallel inside a compiled function. Says CompiledFunction::pext: Instruction 3 in CompiledFunction[...] calls ordinary code that can be evaluated on only one thread at a time. $\endgroup$ Commented Apr 22, 2017 at 11:30
  • $\begingroup$ Of course Inverse[s] can simply be passed into the function. ;-) That solution simply FLIES! for any vector dimension. Just edited the post to append this result. $\endgroup$ Commented Apr 22, 2017 at 11:41
