This code (the sliced Python package used below) implements Sliced Inverse Regression (SIR) in an unusual way. When I compare it to the standard algorithm, the modified algorithm does better. By "better" I mean that the function is more tightly collapsed along the principal direction found by the SIR algorithm.

I show a simple example below. The plot shows $y$ plotted against the active direction in the $x$ space. Notice that there is certainly more spread in the blue dots than in the purple dots, even though both are plotted at the same marker size. It is a subtle example, but it is simple and repeatable, and I hope it gets my point across. The underlying function here is $y=(a^Tx)^2$ for an arbitrary vector $a$ (as in the code below), and each component of $x$ is drawn uniformly from 0 to 1.

I have not seen this approach before and am baffled that it improves the performance of the SIR algorithm. Why does the modified algorithm perform better? Is this "matrix whitening"? Is my implementation of the simple algorithm lacking? I read the papers that the code cites, but they do not advocate for this QR-based step.

The simple algorithm is to sample a function $y=y(x)$ $N$ times, resulting in an $N \times M$ matrix $x$ and a vector $y$ with $N$ entries. In preprocessing, subtract the average along each dimension of $x$ so that it has zero mean, $x=x-\overline{x}$, where $\overline{x}$ is the average along each dimension of $x$, then divide by the square root of the sum of the entries of the covariance matrix of $x$ (the scalar rescaling used in the code below). Sort the rows of $x$ by the corresponding values of $y$ and split them into $R$ equal-sized slices $(X_1, Y_1), \dots, (X_R, Y_R)$, where $R$ is the desired number of partitions, such that every member of $Y_r$ is smaller than every member of $Y_{r+1}$ and the $x$-$y$ pairing stays intact. Next, estimate the covariance matrix of the slice means, $C=\frac{1}{N}\sum_{r=1}^R N_r \mu_r \mu_r^T$, where $N_r$ is the number of entries in the $r^\mathrm{th}$ partition and $\mu_r$ is the mean of $X_r$ along each dimension (so dimension$(\mu_r)=$dimension$(x)$). Finally, calculate the important directions from an eigendecomposition of this covariance matrix, $W^T \Lambda W = C$.
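
For concreteness, here is a minimal sketch of the simple algorithm exactly as described above; the function name simple_sir is my own choice, and the scalar rescaling mirrors the full code further down (it does not change the estimated directions):

import numpy as np

def simple_sir(X, Y, n_slices=10):
    # center each column of X and apply the scalar rescaling used below
    X = X - X.mean(axis=0)
    X = X / np.sqrt(np.sum(np.cov(X.T)))
    N, M = X.shape
    # sort the rows of X by the corresponding response values
    Xs = X[np.argsort(Y)]
    # covariance matrix of the slice means, weighted by slice size
    size = N // n_slices
    C = np.zeros((M, M))
    for r in range(n_slices):
        mu_r = Xs[r * size:(r + 1) * size].mean(axis=0)
        C += size * np.outer(mu_r, mu_r)
    C /= N
    # eigenvectors with the largest eigenvalues are the estimated directions
    evals, evecs = np.linalg.eigh(C)
    return evals[::-1], evecs[:, ::-1]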

The modified algorithm is similar, but performs a $QR$ factorization before estimating the covariance matrix and finds the important directions from that matrix's eigendecomposition. Specifically, the centered $x$ is decomposed as $x=QR$, and the covariance matrix is estimated as $\tilde{C}=\frac{1}{N}\sum_{r=1}^R N_r \tilde{\mu}_r \tilde{\mu}_r^T$, where $\tilde{\mu}_r$ is the mean of the $r^\mathrm{th}$ partition of $\sqrt{N}Q$ along each input dimension. Perform an eigendecomposition to yield $\tilde{W}^T\tilde{\Lambda}\tilde{W}=\tilde{C}$. Finally, solve $\sqrt{N}RL=\tilde{M}$, where $\tilde{M}$ is $\tilde{W}$ with its columns reordered so that the eigenvectors appear in order of decreasing eigenvalue. The columns of $L$, normalized to unit length, are the important directions.
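
To make these quantities concrete, here is a small numerical check (my own addition, not part of the original post) of how $Q$, $R$, and $\sqrt{N}Q$ relate to the centered data and its sample covariance:

import numpy as np

np.random.seed(0)
N, M = 1000, 4
X = np.random.uniform(0, 1, (N, M))
X = X - X.mean(axis=0)                 # centered, as in the preprocessing step

Q, R = np.linalg.qr(X)                 # reduced QR: Q is N x M, R is M x M
Z = np.sqrt(N) * Q                     # the data the modified algorithm slices

# R^T R equals X^T X, so R / sqrt(N) is a square root of the biased sample covariance
print(np.allclose(R.T @ R / N, X.T @ X / N))                # True

# Z has identity sample covariance, i.e. it is a whitened version of X
print(np.allclose(Z.T @ Z / N, np.eye(M)))                  # True

# Z is X expressed in those whitened coordinates
print(np.allclose(Z, X @ np.linalg.inv(R / np.sqrt(N))))    # True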

Here is a numerical example. sliced is the Python package from GitHub, the "simple" approach does not use the QR factorization and is consistent with the Wikipedia description, and "SIR with QR" is my implementation of the GitHub code.

import numpy as np
import matplotlib.pyplot as plt
from sliced import SlicedInverseRegression


# sliced inverse regression
# X - NxM matrix of M inputs and N observations
# Y - vector of responses of length N
# nPartitions - number of partitions (slices)
# SIMPLE - flag to skip the QR approach
def SIR(X, Y, nPartitions=10, SIMPLE=False):
    # get number of outputs
    ny = Y.size
    # calculate number of members in each partition
    groupsize = ny // nPartitions
    # transform X so that each input has zero mean
    X = X - np.mean(X, 0)
    X /= np.sqrt(np.sum(np.cov(X.T)))
    # sort X by ordering of y
    sidx = np.argsort(Y)
    sortedX = X[sidx]
    # initiate empty covariance matrix
    C = np.zeros((X.shape[1], X.shape[1]))
    if not SIMPLE:
        # Q is orthonormal, R is upper triangular
        Q, R = np.linalg.qr(X)
        # construct variable Z to estimate slice covariances
        #Z = Q[sidx, :]
        Z = (np.sqrt(ny) * Q)[sidx, :]
    # iterate through each partition
    for ii in range(nPartitions):
        # subx holds the x entries associated with this partition
        if not SIMPLE:
            subx = Z[ii * groupsize : (ii + 1) * groupsize]
        else:
            subx = sortedX[ii * groupsize : (ii + 1) * groupsize]
        # estimate covariance matrix associated with each slice
        nx = subx.shape[0]
        # weigh matrix by number of members
        mur = np.mean(subx, 0)
        C += nx * np.outer(mur, mur.T)
    # divide by number of samples, yielding average covariance matrix
    C /= X.shape[0]
    # just do an eigendecomposition of the covariance matrix in the simple approach
    if SIMPLE:
        v, W = np.linalg.eig(C)
        return (v, W)
    # compute eigenvalues
    evals, evecs = np.linalg.eigh(C)
    # reorder eigenvectors so the largest eigenvalue comes first
    evecs = evecs[:, ::-1]
    # solve for final directions ??
    #dirs = np.linalg.solve(ny * R, evecs)
    dirs = np.linalg.solve(np.sqrt(ny) * R, evecs)
    # normalize eigenvectors
    for ii in range(dirs.shape[0]):
        dirs[:, ii] /= np.sqrt(np.sum(dirs[:, ii] ** 2))
    return (evals[::-1], dirs)


# I use the angle_between function to line up eigenvectors
def unit_vector(vector):
    return vector / np.linalg.norm(vector)


def angle_between(v1, v2):
    v1_u = unit_vector(v1)
    v2_u = unit_vector(v2)
    return np.arccos(np.clip(np.dot(v1_u, v2_u), -1.0, 1.0))


if __name__ == '__main__':
    # construct X and Y data
    np.random.seed(12345)
    X = np.random.uniform(0, 1, (1000, 4))
    Y = (np.array([3, 1, 5, -1]).dot(X.T)) ** 2

    # use the sliced package
    sir = SlicedInverseRegression(n_directions=200)
    sir.fit(X, Y)
    plt.scatter(sir.directions_[0].dot(X.T), Y, s=7, marker='*', c='k', label='Sliced')

    # use the "simple" approach
    v, W = SIR(X, Y, SIMPLE=True)
    if np.rad2deg(angle_between(W[:, 0], sir.directions_[0])) > 90:
        W *= -1
    plt.scatter(W[:, 0].dot(X.T), Y, c='none', edgecolor='blue', label='Simple SIR')

    # use the "not simple" approach, which involves the QR decomposition
    v, W = SIR(X, Y, SIMPLE=False)
    if np.rad2deg(angle_between(W[:, 0], sir.directions_[0])) > 90:
        W *= -1
    plt.scatter(W[:, 0].dot(X.T), Y, c='none', edgecolor='purple', label='SIR with QR')

    plt.legend()
    plt.savefig('compared')
    plt.clf()

[Figure: $y$ plotted against the first SIR direction for the sliced package (black stars), simple SIR (blue circles), and SIR with QR (purple circles).]


1 Answer

Even if this is very late: it is a very good catch. The Python package you are using is consistent with the R package dr, which uses a QR decomposition. I can't say exactly why this improves the SIR algorithm, but I noticed a couple of things about your implementation. First, in the package they weight the Z_means by sqrt(N) and not by N. Second, they apply the eigendecomposition to the outer product of their Z_means, instead of accumulating the outer product of the Z_mean of each slice inside a loop as you do; I am not sure whether this creates a difference, though. Finally (and I think this matters): when you return (v, W) in the SIMPLE case, you do not normalize the directions back. Try normalizing them back with the square root of the inverse of the covariance matrix of X instead of lining them up with angles. It would also be helpful to quantify the deviations from the sliced package's directions, e.g. with a sum of squared deviations.
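
As a sketch of what "normalize the directions back" could look like in code: this is the classical SIR back-transform with the symmetric inverse square root of the sample covariance. The function and variable names are my own, and it assumes the eigenvectors eta were computed from whitened data (as in the QR-based algorithm above).

import numpy as np

def directions_from_whitened(X, eta):
    # X   : N x M centered design matrix
    # eta : M x K eigenvectors computed in whitened coordinates
    Sigma = np.cov(X.T, bias=True)                 # biased sample covariance of X
    evals, evecs = np.linalg.eigh(Sigma)
    Sigma_inv_sqrt = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T
    beta = Sigma_inv_sqrt @ eta                    # directions in the original X coordinates
    beta = beta / np.linalg.norm(beta, axis=0)     # unit-length columns, as in the posted code
    return beta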

  • $\begingroup$ cran.r-project.org/web/packages/dr/vignettes/overview.pdf here is an explanation on the use of QR decomposition $\endgroup$ Commented Apr 9, 2022 at 13:07
  • $\begingroup$ Thanks for this detailed response, it makes a lot of sense. I didn't realize that the QR factorization served to normalize the data. $\endgroup$ Commented Apr 14, 2022 at 0:10
