If anything, it will most likely decrease.
Let $A$ be a matrix with $m$ rows and $n$ columns ($m\le n$) and $\sigma_m(A)>0$. Let $A'$ be the same matrix with one row removed. Their condition numbers are $$\kappa=\frac{\sigma_1(A)}{\sigma_m(A)},\qquad\qquad\kappa'=\frac{\sigma_1(A')}{\sigma_{m-1}(A')}$$ Hypothesis: $$\kappa\ge\kappa'$$ This is an empirical observation. Once I got wind of it, I generated some matrices randomly and iterated through the rows, keeping track of the best and the worst condition numbers of the matrix with one row removed. It turned out the worst one never got bigger than the condition number of the original matrix. The random matrices were made up of uniformly distributed numbers ($[0,1]$) in one case, normally distributed numbers in the second, and in the last case they were sparse matrices with small integer entries. The last case even let me see the rare event of a row whose removal left the condition number unchanged.
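For anyone who wants to reproduce the experiment, here is a minimal sketch (assuming Python/NumPy; this is an illustration, not the exact code I used):

```python
# Generate random matrices, remove each row in turn, and track the best and
# worst condition numbers of the reduced matrix against the original one.
import numpy as np

rng = np.random.default_rng(0)

def cond(M):
    s = np.linalg.svd(M, compute_uv=False)   # singular values, descending
    return s[0] / s[-1]

for trial in range(1000):
    m, n = 5, 8                               # m <= n, full row rank almost surely
    A = rng.uniform(0.0, 1.0, (m, n))         # or rng.standard_normal((m, n))
    kappa = cond(A)
    kappas_reduced = [cond(np.delete(A, i, axis=0)) for i in range(m)]
    # the hypothesis: even the worst reduced matrix is no worse than A
    assert max(kappas_reduced) <= kappa * (1 + 1e-12)
```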
Then I tried to optimize a $3\times 3$ matrix with a fixed condition number, written in the form $$A=U\Sigma$$ and to see what the worst and the best cases would be after removal of the last row. It turned out the worst condition number never got higher than the original one, and in the best case (no surprise) it improved to $1$.
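A sketch of this second experiment (again assuming NumPy; sampling random orthogonal factors rather than doing a proper optimization):

```python
# Fix the singular values of a 3x3 matrix A = U * Sigma, sample random
# orthogonal U, and look at the condition number after dropping the last row.
import numpy as np

rng = np.random.default_rng(1)
Sigma = np.diag([10.0, 3.0, 1.0])              # fixed condition number 10
kappa = Sigma[0, 0] / Sigma[-1, -1]

worst, best = 1.0, np.inf
for _ in range(20000):
    U, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # random orthogonal U
    A = U @ Sigma
    s = np.linalg.svd(A[:-1, :], compute_uv=False)     # last row removed
    k = s[0] / s[-1]
    worst, best = max(worst, k), min(best, k)

print(f"kappa = {kappa}, worst reduced = {worst:.4f}, best reduced = {best:.4f}")
# observed: the worst case stays below kappa, the best case gets close to 1
```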
So, my question is this: is the hypothesis true or false? If the hypothesis is false, then there should exist a counterexample. It's possible that I simply never stumbled on one, so can anybody provide it? And if the hypothesis is true, then there should exist an explanation or, better yet, a theorem that proves it. Can anybody describe what is going on and why, and possibly provide references to some beginner- or intermediate-level books and/or articles discussing this question in depth?
From what is stated above it might be assumed that I want my matrix to have as large a condition number as possible, but that would be a wrong assumption. Like any sane person, I want my matrix to have a condition number as close to $1$ as possible, but to get there with the only operation allowed (removal of rows) I need to understand when and why the worst case happens and how "bad" it can be.
I have a fairly good understanding of linear algebra (at least I think I do) and am on friendly terms with the singular value decomposition (one of the most powerful tools ever thought up), which will most likely be needed in the explanations, seeing as the condition number is computed from the singular values. But I don't have a higher education, so please let your answers and references not be too "mathy".
Thanks everyone in advance.
EDIT: It was rightly pointed out in the comments below that the terms singular/nonsingular do not apply to nonsquare matrices. That was my mistake, made from a desire to use a concise descriptive word, and I'm sorry. What I meant is, of course, that all singular values of the matrix $A$ are distinct from $0$ or, equivalently, that $\operatorname{rank}(A)=\min(m,n)$. This case seems to be somewhat easier to study.
EDIT 2: It was pointed out to me (in several places) that the key to my question is Cauchy's interlace theorem (CIT for short, also called the Poincaré separation theorem by Wikipedia). This required me to assimilate a lot of new information. I found an easy-to-understand elementary proof in this 2004 article. That proof uses only a single outside reference, to one of the most fundamental and widely used theorems of calculus: the intermediate value theorem. Everything else is just straightforward technical work. In contrast, the proof in this article, which aspires to be the shortest possible (and hence the easiest), pawns off all the problems and rigour onto a theorem from an entirely different area of mathematics. Yes, it's an example of mathematical beauty, but one has to know both areas to truly appreciate it. Older books seem to prove CIT in conjunction with the Courant-Fischer min-max theorem (Sturmian separation theorem; Richard Bellman, Introduction to Matrix Analysis, 2nd ed., 1987, pp. 115-118) or by reference to Sylvester's law of inertia (Beresford N. Parlett, The Symmetric Eigenvalue Problem, 1980, par. 10.1). It's possible there are other approaches, but my short self-study allowed only so much information to be consumed. Anyway, in the form most relevant here, CIT is stated as follows:
Cauchy's interlace theorem (for eigenvalues). Let $Q\in\mathbb{R}^{m\times m}$ be a symmetric matrix with eigenvalues: $$\lambda_1\ge\lambda_2\ge\ldots\ge\lambda_m$$ and let $Q'\in\mathbb{R}^{(m-l)\times (m-l)}$ be a principal submatrix of $Q$ with eigenvalues: $$\mu_1\ge\mu_2\ge\ldots\ge\mu_{m-l}$$ Then their eigenvalues interlace: $$\lambda_k\ge\mu_k\ge\lambda_{k+l},\qquad 1\le k\le m-l$$
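The statement is easy to check numerically. A quick sanity check (assuming NumPy) on a random symmetric matrix and one of its principal submatrices:

```python
# Verify eigenvalue interlacing between a symmetric Q and a principal submatrix Q'.
import numpy as np

rng = np.random.default_rng(2)
m, l = 6, 2
B = rng.standard_normal((m, m))
Q = (B + B.T) / 2                                   # symmetric m x m
keep = np.sort(rng.choice(m, size=m - l, replace=False))
Qp = Q[np.ix_(keep, keep)]                          # principal (m-l) x (m-l) submatrix

lam = np.sort(np.linalg.eigvalsh(Q))[::-1]          # eigenvalues of Q, descending
mu = np.sort(np.linalg.eigvalsh(Qp))[::-1]          # eigenvalues of Q', descending

for k in range(m - l):
    assert lam[k] >= mu[k] - 1e-12 and mu[k] >= lam[k + l] - 1e-12
```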
Now, to get the same relations between the singular values of the matrices $A$ and $A'$ (which is $A$ with rows removed), we need to polish the result above a little. It's very straightforward work (which adds nothing new of substance) and uses only the fact that the singular value decomposition (SVD for short) exists (otherwise there would be nothing to compare).
Corollary (CIT for singular values). Let $A\in\mathbb{R}^{m\times n}$, $m\le n$, be a matrix with an economic SVD: $$A=U\Sigma V^T,\quad U^TU=UU^T=V^TV=I_m$$ $$U\in\mathbb{R}^{m\times m},\qquad \Sigma=\operatorname{diag}(\sigma_1,\sigma_2,\ldots,\sigma_m),\qquad V\in\mathbb{R}^{n\times m}$$ Let $A'\in\mathbb{R}^{(m-l)\times n}$ be the matrix obtained from $A$ by removing $l$ rows ($0<l<m$), with an economic SVD: $$A'=U'\Sigma'V'^T,\quad U'^TU'=U'U'^T=V'^TV'=I_{m-l}$$ $$U'\in\mathbb{R}^{(m-l)\times(m-l)},\qquad \Sigma'=\operatorname{diag}(\tau_1,\tau_2,\ldots,\tau_{m-l}),\qquad V'\in\mathbb{R}^{n\times(m-l)}$$ Here $I_m$ and $I_{m-l}$ are identity matrices of the indicated sizes.
Then their singular values interlace: $$\sigma_k\ge\tau_k\ge\sigma_{k+l},\qquad 1\le k\le m-l$$
Proof. Let's first note that the matrix $A'$ can be expressed in terms of the matrices making up the SVD of $A$ (though the expression itself is not an SVD of anything): $$A'=U_{\text{red}}\Sigma V^T$$ where $U_{\text{red}}$ is the matrix $U$ with the same rows removed that were removed from $A$ to get $A'$. Let's introduce two new matrices: $$Q=AA^T=U\Sigma V^TV\Sigma^TU^T=U\Sigma^2U^T\in\mathbb{R}^{m\times m}$$ $$Q'=A'A'^T=U'\Sigma'V'^TV'\Sigma'^TU'^T=U'\Sigma'^2U'^T\in\mathbb{R}^{(m-l)\times(m-l)}$$ $$Q'=U_{\text{red}}\Sigma V^TV\Sigma^TU_{\text{red}}^T=U_{\text{red}}\Sigma^2U_{\text{red}}^T$$ From the first two equations we see that both matrices are symmetric and what their eigenvalues are ($\sigma_k^2$ and $\tau_k^2$, respectively). And from the third equation we see that $Q'$ is a principal submatrix of $Q$ (removing rows of $U$ removes the corresponding rows and columns of $U\Sigma^2U^T$), hence the CIT for eigenvalues is applicable here: $$\sigma^2_k\ge\tau^2_k\ge\sigma^2_{k+l},\qquad 1\le k\le m-l$$ Since all singular values are non-negative by definition, the squares can be dropped: $$\sigma_k\ge\tau_k\ge\sigma_{k+l},\qquad 1\le k\le m-l$$ Q.E.D.
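Both the key step of the proof (that $Q'$ is a principal submatrix of $Q$) and the corollary itself can be checked numerically; a small sketch, assuming NumPy:

```python
# Check that Q' = A'A'^T is a principal submatrix of Q = AA^T, and that the
# singular values of A and A' (A with l rows removed) interlace.
import numpy as np

rng = np.random.default_rng(3)
m, n, removed = 5, 9, [1, 3]                        # remove rows 1 and 3 (l = 2)
A = rng.standard_normal((m, n))
Ap = np.delete(A, removed, axis=0)

Q, Qp = A @ A.T, Ap @ Ap.T
keep = [i for i in range(m) if i not in removed]
assert np.allclose(Qp, Q[np.ix_(keep, keep)])       # Q' is a principal submatrix of Q

sigma = np.linalg.svd(A, compute_uv=False)          # descending
tau = np.linalg.svd(Ap, compute_uv=False)
l = len(removed)
for k in range(m - l):
    assert sigma[k] >= tau[k] - 1e-12 and tau[k] >= sigma[k + l] - 1e-12
```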
Finally, to prove my hypothesis only two of these inequalities are needed (for $k=1$ and $k=m-l$; positivity of $\sigma_m$ is given by the problem statement, and positivity of every other singular value follows): $$\sigma_1\ge\tau_1>0\Longrightarrow\frac{\sigma_1}{\tau_1}\ge 1$$ $$\tau_{m-l}\ge\sigma_m>0\Longrightarrow\frac{\tau_{m-l}}{\sigma_m}\ge 1$$ Let's write the condition numbers being compared as a ratio, then rearrange and substitute: $$\frac{\kappa}{\kappa'}=\frac{\sigma_1}{\sigma_m}\cdot\frac{\tau_{m-l}}{\tau_1}=\frac{\sigma_1}{\tau_1}\cdot\frac{\tau_{m-l}}{\sigma_m}\ge 1$$ Then $$\kappa\ge\kappa'\ge 1$$ Q.E.D.
Armed with the knowledge above, it's easy to suggest a matrix such that removing any single row does not decrease its condition number. Its two greatest and two smallest singular values should be equal: $$\sigma_1=\sigma_2>\sigma_{m-1}=\sigma_m>0$$ This means such an example must have at least 4 rows. It seems impossible to provide a 3-row matrix with this property, but I don't know how to prove it. A 2-row matrix always drops its condition number to $1$ after removal of a row, because a 1-row matrix has $\kappa=1$ by definition. Also, in the worst case, removal of at least half the rows (with the right indices) is needed to decrease the condition number.
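A sketch of the 4-row construction (assuming NumPy; the specific sizes and singular values are just an example):

```python
# Build A = U Sigma V^T with sigma_1 = sigma_2 > sigma_3 = sigma_4 > 0.
# By interlacing, removing any single row should leave the condition number unchanged.
import numpy as np

rng = np.random.default_rng(4)
m, n = 4, 6
U, _ = np.linalg.qr(rng.standard_normal((m, m)))        # random orthogonal 4 x 4
V, _ = np.linalg.qr(rng.standard_normal((n, m)))        # n x 4 with orthonormal columns
Sigma = np.diag([3.0, 3.0, 1.0, 1.0])                   # sigma_1 = sigma_2 > sigma_3 = sigma_4
A = U @ Sigma @ V.T
kappa = 3.0 / 1.0

for i in range(m):
    s = np.linalg.svd(np.delete(A, i, axis=0), compute_uv=False)
    assert abs(s[0] / s[-1] - kappa) < 1e-10             # condition number stays at 3
```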
For the record, I found this question, which is closely related to mine but asked in reverse.
I've got some follow-up and related questions, but as this post has gotten big enough, I'll ask them (or not) separately. Feel free to correct any mistakes and typos I've made (I hope there are none, but I wouldn't bet on it) and to comment if I got something wrong. It seems to me that the facts stated above are the most basic knowledge required in data reduction science (if such a field exists), and I would like to hear a detailed comment about this and advice on which direction to move in to learn more. Thanks everyone for the help.