$\begingroup$

From this answer I have that $ \int_Yf(y)\,\mathrm{d}(g\mu)(y)=\int_Xf(g(x))\,\mathrm{d}\mu(x)$, where $g$ is a map between measurable spaces and $g\mu$ is the image measure.

With $X=[0,r]\times[0,2\pi]$ and $Y=[-r,r]\times\left[-\sqrt{r^2-x^2},\sqrt{r^2-x^2}\right]$ (i.e. the closed disk of radius $r$) I’d like to show $$ \int_{[-r,r]\times\left[-\sqrt{r^2-x^2},\sqrt{r^2-x^2}\right]}f\,\mathrm{d}(\lambda(x)\times\lambda(y))=\int_{[0,r]\times[0,2\pi]}(f\circ g)\,t\,\mathrm{d}(\lambda(t)\times\lambda(\theta))$$ (which is used in calculus but never rigorously proved). For this I need to show that the image measure on the sigma algebra of $Y$ under the map $g(t,\theta)=(t\sin\theta,t\cos\theta)$ is the two-dimensional Lebesgue measure. Basically, if $B$ is in $Y$’s sigma-algebra, $(g(t(\lambda(t)\times\lambda(\theta))))(B)=(\lambda(x)\times\lambda(y))(B)$.

I think it’s sufficient to show this for measurable rectangles as the rest would follow from the Monotone Class Theorem, but I don’t know how to proceed.

So far I have

$$ (g(t(\lambda(t)\times\lambda(\theta))))(B)=(t(\lambda(t)\times\lambda(\theta)))\left(g^{-1}(B)\right)=\int_0^r\int_0^{2\pi}t\chi_{_{g^{-1}(B)}}(t,\theta)\,\mathrm{d}\lambda(\theta)\,\mathrm{d}\lambda(t)=\int_0^r\int_0^{2\pi}t\chi_{_B}(t\sin(\theta),t\cos(\theta))\,\mathrm{d}\lambda(\theta)\,\mathrm{d}\lambda(t)=\int_0^r\int_0^{2\pi}\frac{1}{t}\left(x^2+y^2\right)\chi_{_B}(x,y)\,\mathrm{d}\lambda(\theta)\,\mathrm{d}\lambda(t).$$

$\endgroup$
  • $\begingroup$ See here for the change of variables in $\Bbb{R}^n$. $\endgroup$ Commented Oct 25 at 16:00
  • $\begingroup$ @peek-a-boo Do you think there might be a way to get this special two-dimensional case without Radon-Nikodym? I haven't seen this theorem yet. I'm working my way through this book and it only comes later. The change of variables formula however is used in the proof to show the volume of the unit ball in an earlier chapter (page 141) so I wanted to justify it using the results I've seen so far. In any case thank you for the relevant link. $\endgroup$ Commented Oct 25 at 18:20
  • $\begingroup$ another type of proof (for the general case) is given in Folland’s real analysis book (section 2.6 IIRC), and this is based on the more typical idea of approximating how the transformation $g$ distorts the measures from the domain and target; it boils down to approximating $g$ near a point $a$ by the derivative $Dg_a$ (i.e $g(a+h)\approx g(a)+ Dg_a(h)$, then using translation-invariance, and that linear maps distort measure by absolute value of determinant). Anyway, it is kind of surprising that Axler wrote an entire book on real analysis without proving the change of variables theorem… $\endgroup$ Commented Oct 25 at 19:58
  • $\begingroup$ … especially since he does prove the very important theorems of Lebesgue differentiation and Radon-Nikodym. Rudin does an excellent job showing the utility of these two theorems in Chapter 7 of RCA. Anyway, for now I think you’re better off consulting Folland (elementary but annoying/tedious proof of change of variables). Of course, there are the usual handwavy arguments about drawing inscribed sectors and comparing them to triangles (but there are so many loose ends to be tied up that I think it’s better to just prove the general case precisely, in the spirit of Folland). $\endgroup$ Commented Oct 25 at 20:00
  • $\begingroup$ @peek-a-boo Thank you for the reference; I'll try going through the proof. "it is kind of surprising that Axler wrote an entire book on real analysis without proving the change of variables theorem" I've searched for relevant keywords and didn't find this result but maybe it comes later and is called something else. $\endgroup$ Commented Oct 26 at 0:12

1 Answer

$\begingroup$

Besides the Radon-Nikodym route as in my linked answer (which is a mildly weaker version than in Rudin’s RCA), and the tedious estimates version in Folland, there’s another approach (e.g Spivak’s Calculus on Manifolds or Dieudonne’s chapter 16.24), which is a “bootstrapping method”. This means we keep trying to simplify the class of $f$’s or $\phi$’s for which the theorem needs to be proved, so as a result, we only have to prove “by-hand” a small number of easy cases, then we let some abstract reasoning take care of the other generalities.

Change of Variables Theorem.

Let $\Omega\subset\Bbb{R}^n$ be open, $\phi:\Omega\to\Bbb{R}^n$ an injective $C^1$ map with nowhere vanishing Jacobian determinant (so $\phi[\Omega]$ is open by the inverse function theorem). Then, for every Lebesgue-measurable $f:\phi[\Omega]\to [0,\infty]$, we have \begin{align} \int_{\phi[\Omega]}f\,d\lambda&=\int_{\Omega}(f\circ\phi)\cdot |\det D\phi|\,d\lambda. \end{align}
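
As a sanity check, here is how the polar-coordinate formula from the question would fall out of this theorem. This is only a sketch: it takes the angular range to be $(0,2\pi)$, uses the question's convention $g(t,\theta)=(t\sin\theta,t\cos\theta)$, and writes $D_r$ for the open disk of radius $r$ (a name introduced here). Take $\Omega=(0,r)\times(0,2\pi)$ and $\phi=g|_{\Omega}$, which is injective and $C^1$ with \begin{align} \det D\phi(t,\theta)&=\det\begin{pmatrix}\sin\theta & t\cos\theta\\ \cos\theta & -t\sin\theta\end{pmatrix}=-t\neq 0. \end{align} The image $\phi[\Omega]$ is $D_r$ minus the segment $\{0\}\times[0,r)$, which is a null set, so the theorem together with Tonelli gives \begin{align} \int_{D_r}f\,d\lambda&=\int_{\phi[\Omega]}f\,d\lambda=\int_0^r\int_0^{2\pi}f(t\sin\theta,t\cos\theta)\,t\,d\theta\,dt. \end{align} The closed disk and the closed rectangle $[0,r]\times[0,2\pi]$ differ from these open sets only by null sets, so the integrals over them are the same.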

The proof is in several steps, some of which are reductions (obvious or technical), and others are “easy” special cases.

Step 0: Reduction to one-sided inequality.

If the inequality $\geq$ in the change of variables formula holds for all $f$ and $\phi$, then in fact equality holds. This is because the reverse inequality can be obtained by applying this assumed inequality with the function $g=(f\circ \phi)\cdot |\det D\phi|$ playing the role of $f$ and $\psi:=\phi^{-1}$ playing the role of $\phi$.
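
Spelled out, this might look as follows (a sketch; here $\psi=\phi^{-1}:\phi[\Omega]\to\Omega$ is $C^1$ by the inverse function theorem, and the chain rule gives $D\phi(\psi(y))\,D\psi(y)=\mathrm{id}$ for every $y\in\phi[\Omega]$): \begin{align} \int_{\Omega}(f\circ\phi)\cdot|\det D\phi|\,d\lambda=\int_{\psi[\phi[\Omega]]}g\,d\lambda&\geq\int_{\phi[\Omega]}(g\circ\psi)\cdot|\det D\psi|\,d\lambda\\ &=\int_{\phi[\Omega]}f\cdot|(\det D\phi)\circ\psi|\cdot|\det D\psi|\,d\lambda=\int_{\phi[\Omega]}f\,d\lambda. \end{align}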

Step 1: Localization.

Suppose $(U_{\alpha})_{\alpha\in A}$ is an open cover of $\Omega$. We claim that if the theorem holds for each of the restrictions $\phi|_{U_{\alpha}}:U_{\alpha}\to \phi[U_{\alpha}]$, then it is true for $\phi$ as well.

To see this, first reduce to an at most countable subcover $(U_j)_{j=1}^N$ (with $N$ a positive integer or $\infty$); this is possible because $\Bbb{R}^n$ is second countable. Now, create disjoint sets: $E_1=U_1$ and $E_j= U_j\setminus(U_1\cup\cdots\cup U_{j-1})$ for $j\geq 2$. Then, $E_j\subset U_j$, they are all (Borel) measurable sets which are pairwise disjoint, and their union is $\Omega$. Since $\phi$ is injective (this is one of the few places this assumption is used), the sets $\phi[E_j]$ are also pairwise disjoint and have union $\phi[\Omega]$, and $\chi_{\phi[E_j]}\circ\phi=\chi_{E_j}$. So, \begin{align} &\int_{\phi[\Omega]}f\,d\lambda =\sum_{j=1}^N\int_{\phi[\Omega]}\chi_{\phi[E_j]}f\,d\lambda =\sum_{j=1}^N\int_{\phi[U_j]}\chi_{\phi[E_j]}f\,d\lambda \\ &=\sum_{j=1}^N\int_{U_j}\chi_{E_j}\cdot (f\circ\phi)\cdot |\det D\phi|\,d\lambda =\sum_{j=1}^N\int_{\Omega}\chi_{E_j}\cdot (f\circ\phi)\cdot |\det D\phi|\,d\lambda =\int_{\Omega}(f\circ\phi)\cdot |\det D\phi|\,d\lambda. \end{align} The swapping of the series with the integral is justified by monotone convergence. Changing the domain of integration from $\phi[\Omega]$ to $\phi[U_j]$ is allowed because the integrand vanishes outside $\phi[U_j]$ (likewise from $U_j$ back to $\Omega$). The key step is in going from the first line to the second, where we have applied the theorem to the restriction $\phi|_{U_j}$ and the function $\chi_{\phi[E_j]}f$. This completes the proof of the localization step.

Step 2: Reduction to non-negative $f\in C_c(\phi[\Omega])$.

Suppose first $F$ is non-negative and integrable on $\phi[\Omega]$. By general measure theory arguments, we know there is a sequence $(g_j)_{j=1}^{\infty}$ of continuous compactly supported functions on $\phi[\Omega]$ such that $g_j\to F$ in $L^1$. Consider $f_j=\max(g_j,0)$; then $f_j$ is also continuous, compactly supported, is furthermore non-negative, and we have $|F-f_j|\leq |F-g_j|$. Thus, $f_j\to F$ in $L^1$. By passing to a subsequence, we can assume pointwise convergence a.e., i.e. there is a measure-zero set $Z\subset\phi[\Omega]$ such that $f_j\to F$ outside $Z$. Thus, $(f_j\circ\phi)\cdot |\det D\phi|\to (F\circ\phi)\cdot|\det D\phi|$ outside $\phi^{-1}[Z]$ in $\Omega$. But since $\phi^{-1}$ is $C^1$ hence locally Lipschitz, it follows $\phi^{-1}[Z]$ has measure-zero. Thus we have pointwise a.e. convergence. Hence, by Fatou’s lemma and our assumption that the theorem holds for functions in $C_c(\phi[\Omega])$, it follows that \begin{align} \int_{\Omega}(F\circ\phi)\cdot|\det D\phi|\,d\lambda&\leq \liminf_{j\to\infty}\int_{\Omega}(f_j\circ\phi)\cdot |\det D\phi|\,d\lambda=\liminf_{j\to\infty}\int_{\phi[\Omega]}f_j\,d\lambda=\int_{\phi[\Omega]}F\,d\lambda, \end{align} where the last equality holds because $f_j\to F$ in $L^1$. Hence, the one-sided inequality holds for integrable non-negative $F$. For a general $f$ (i.e. not necessarily integrable) consider the various truncations $f_{M,N}= f\cdot \chi_{\{f\leq M\}}\cdot \chi_{\phi[\Omega]\cap B_N(0)}$, i.e. we have truncated the height of the function and also the support (this is $\sigma$-finiteness of Lebesgue measure in disguise). Then, each $f_{M,N}$ is non-negative and integrable, so the above inequality holds for each of them. Now, we can let $M,N\to\infty$ and use the monotone convergence theorem on both sides to deduce that the above inequality holds for $f$ as well: \begin{align} \int_{\Omega}(f\circ\phi)\cdot |\det D\phi|\,d\lambda&\leq \int_{\phi[\Omega]}f\,d\lambda. \end{align} Since this is true for all $f$ and $\phi$, step 0 now tells us we actually have equality for all $f$ and $\phi$.

Step 3: If the theorem is true for $\phi:U\to V$ and $\psi:V\to W$, then it is true for $\psi\circ\phi$.

This is obvious (one must use chain rule at one stage of the calculation, and multiplicativity of determinants).
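
For the record, the calculation might be written as follows (a sketch, assuming $\phi[U]=V$ and $f:\psi[V]\to[0,\infty]$ measurable): \begin{align} \int_{\psi[V]}f\,d\lambda&=\int_{V}(f\circ\psi)\cdot|\det D\psi|\,d\lambda =\int_{U}(f\circ\psi\circ\phi)\cdot|(\det D\psi)\circ\phi|\cdot|\det D\phi|\,d\lambda\\ &=\int_{U}\bigl(f\circ(\psi\circ\phi)\bigr)\cdot|\det D(\psi\circ\phi)|\,d\lambda. \end{align} The first equality is the theorem for $\psi$, the second is the theorem for $\phi$ applied to the function $(f\circ\psi)\cdot|\det D\psi|$, and the third uses $D(\psi\circ\phi)=(D\psi\circ\phi)\cdot D\phi$ together with multiplicativity of the determinant.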

Step 4: The theorem is true if $\phi$ is affine.

If $\phi(x)=T(x)+a$ with $T$ linear, and $E\subset \phi[\Omega]$ is an arbitrary Lebesgue measurable set, then since $|\det D\phi|=|\det T|$, the claim for $f=\chi_E$ follows from the well-known and important linear case. See for instance this answer of mine. Thus, by considering non-negative linear combinations, the formula holds for all non-negative simple functions, and by monotone convergence, for all Lebesgue-measurable $f:\phi[\Omega]\to [0,\infty]$.
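
Concretely, for $f=\chi_E$ the two sides can be compared as follows (a sketch; $A:=\phi^{-1}[E]$ is notation introduced here, and the only inputs are translation invariance and the linear case $\lambda(T[A])=|\det T|\,\lambda(A)$): \begin{align} \int_{\phi[\Omega]}\chi_E\,d\lambda&=\lambda(E)=\lambda(T[A]+a)=\lambda(T[A])=|\det T|\,\lambda(A)=\int_{\Omega}(\chi_E\circ\phi)\cdot|\det D\phi|\,d\lambda. \end{align} The last equality uses $\chi_E\circ\phi=\chi_A$ and $|\det D\phi|=|\det T|$.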

Step 5: The theorem is true for $n=1$.

By step 2, it suffices to consider non-negative continuous $f$. For each $x\in\Omega$, since $\phi'(x)\neq 0$, there is a $\delta>0$ such that $\phi'$ maintains a constant sign on $(x-2\delta,x+2\delta)$. We may WLOG compose with a reflection (step 4) and assume this sign is positive. Let $U_x= (x-\delta,x+\delta)$. Then, $\phi$ is strictly increasing and continuous on $U_x$, so the image set is the open interval $(\phi(x-\delta), \phi(x+\delta))$. Now, the theorem follows on $U_x$ from the usual 1D “$u$-substitution”, which is just a combination of the chain rule plus the FTC (this is easy since $f$ is continuous). So, we found an open cover $(U_x)_{x\in\Omega}$ on which the theorem holds, hence by step 1, it holds on all of $\Omega$.
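
The $u$-substitution on $U_x$ can be sketched like this, assuming $(x-2\delta,x+2\delta)\subset\Omega$ so that $\phi$ is $C^1$ on the closed interval $[x-\delta,x+\delta]$, and writing $F(y)=\int_{\phi(x-\delta)}^{y}f(s)\,ds$ (a helper introduced here). Since $f$ is continuous, $F'=f$ by the FTC, so $(F\circ\phi)'=(f\circ\phi)\cdot\phi'$ by the chain rule, and hence \begin{align} \int_{x-\delta}^{x+\delta}(f\circ\phi)(t)\cdot|\phi'(t)|\,dt&=\int_{x-\delta}^{x+\delta}(F\circ\phi)'(t)\,dt=F(\phi(x+\delta))-F(\phi(x-\delta))=\int_{\phi(x-\delta)}^{\phi(x+\delta)}f(s)\,ds, \end{align} using $\phi'>0$ to drop the absolute value.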

Step 6: The theorem is true if $\phi(x)=(\alpha(x), x_2,\dots, x_n)$ with $\frac{\partial \alpha}{\partial x_1}$ never $0$.

For each $\xi\in\Bbb{R}^{n-1}$, let $\Omega_{\xi}=\{t\in\Bbb{R}\,: (t,\xi)\in \Omega\}$; this is an open subset of $\Bbb{R}$, possibly empty. If not empty, then $t\mapsto \alpha(t,\xi)$ maps this onto an open set $V_{\xi}\subset\Bbb{R}$ bijectively and has nowhere-vanishing derivative (the image is open by the IFT, and this map is injective since $\phi$ is). So, by the 1D case (step 5) we have \begin{align} \int_{V_{\xi}}f(\tau,\xi)\,d\tau&=\int_{\Omega_{\xi}}f(\alpha(t,\xi),\xi)\cdot \left|\frac{\partial\alpha}{\partial x_1}(t,\xi)\right|\,dt=\int_{\Omega_{\xi}}(f\circ\phi)(t,\xi)\cdot|\det D\phi(t,\xi)|\,dt. \end{align} Now, integrate over all possible $\xi$’s, and use Fubini to deduce the theorem holds for this type of $\phi$.
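
In more detail, the integration over $\xi$ might go as follows (a sketch: since $\phi$ fixes the last $n-1$ coordinates, the $\xi$-slice of $\phi[\Omega]$ is exactly $V_{\xi}$, and $D\phi$ is block upper-triangular with diagonal blocks $\frac{\partial\alpha}{\partial x_1}$ and $\mathrm{id}_{\Bbb{R}^{n-1}}$, so $\det D\phi=\frac{\partial\alpha}{\partial x_1}$): \begin{align} \int_{\phi[\Omega]}f\,d\lambda&=\int_{\Bbb{R}^{n-1}}\int_{V_{\xi}}f(\tau,\xi)\,d\tau\,d\xi =\int_{\Bbb{R}^{n-1}}\int_{\Omega_{\xi}}(f\circ\phi)(t,\xi)\cdot|\det D\phi(t,\xi)|\,dt\,d\xi =\int_{\Omega}(f\circ\phi)\cdot|\det D\phi|\,d\lambda, \end{align} where the inner integrals are read as $0$ when $\Omega_{\xi}$ is empty. The first and last equalities are Fubini in its Tonelli form for non-negative measurable functions, and the middle one is the 1D identity above.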

Step 7: Finishing up.

Suppose for every $x\in \Omega$, there exists an open neighbourhood $U$ of $x$ in $\Omega$, and a finite number of $C^1$ diffeomorphisms $\Phi_1,\dots, \Phi_p$ either affine (i.e step 4) or of the type considered in step 6, such that we have the factorization $\phi|_U=\Phi_1\circ\cdots\circ\Phi_p$. Then, by step 3, the theorem will be true for $\phi|_U$, and since the collection of all such $U$’s form an open cover of $\Omega$, step 1 implies that the claim is true for $\phi$, hence the theorem will be completed in full generality.

This is now a standard exercise in multivariable calculus. Fix $x\in\Omega$. Actually, by composing with suitable affine bijections on the domain and target, we may suppose WLOG that $x=0$, that $\phi(0)=0$, and $D\phi(0)=\mathrm{id}_{\Bbb{R}^n}$. Now, for each $1\leq j\leq n$, define $\psi_j:\Omega\to\Bbb{R}^n$ as \begin{align} \psi_j(x)&=(\phi_1(x),\dots, \phi_j(x), x_{j+1},\dots, x_n). \end{align} This is $C^1$, and $D\psi_j(0)=\mathrm{id}_{\Bbb{R}^n}$ as well. So, by the inverse function theorem, there is a neighbourhood $U_j$ of $0$ on which $\psi_j$ is a $C^1$ diffeomorphism onto its image. Let $U=U_1\cap\cdots\cap U_n$ (so on this smaller domain they’re all diffeomorphisms). Observe that by definition, \begin{align} \phi&=\psi_n=(\psi_n\circ \psi_{n-1}^{-1})\circ\cdots\circ (\psi_2\circ\psi_1^{-1})\circ \psi_1. \end{align}

  • Since $\det D\psi_1(x)=\frac{\partial \phi_1}{\partial x_1}(x)$, and the former is non-zero, it follows that $\psi_1$ is the type of map considered in step 6.
  • For $2\leq j\leq n$, observe that each $\psi_j\circ \psi_{j-1}^{-1}$ will be of the form $x\mapsto (x_1,\dots, x_{j-1},\alpha_j(x),x_{j+1},\dots, x_n)$. The Jacobian determinant is $\frac{\partial \alpha_j}{\partial x_j}$ and this is never $0$ since $\psi_j\circ\psi_{j-1}^{-1}$ is a diffeomorphism. Now, if we simply permute the coordinates in the domain and the target, we can get the $\alpha_j$ into the first slot, and ensure it’s the derivative with respect to $x_1$ which is non-zero.

Thus, $\phi|_U$ is a finite composition of either affine bijections, or maps of the type considered in step 6. This completes the proof of this little lemma, and hence of the entire theorem.
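
To see the factorization in a concrete case, here is a sketch with $n=2$ and the polar-type map $\phi(t,\theta)=(t\sin\theta,t\cos\theta)$ on $U=(0,r)\times(0,\pi)$ (an illustrative choice; no normalization to $D\phi(0)=\mathrm{id}$ is needed here because $\frac{\partial\phi_1}{\partial t}=\sin\theta\neq 0$ on $U$ already). Then \begin{align} \psi_1(t,\theta)&=(t\sin\theta,\theta), & \psi_1^{-1}(u,\theta)&=\left(\frac{u}{\sin\theta},\theta\right), & (\psi_2\circ\psi_1^{-1})(u,\theta)&=(u,u\cot\theta), \end{align} so $\phi=\psi_2=(\psi_2\circ\psi_1^{-1})\circ\psi_1$ changes one coordinate at a time. The Jacobian determinants are $\sin\theta$ and $-u/\sin^2\theta$, and at $u=t\sin\theta$ their product is $-t$, which agrees with $\det D\phi(t,\theta)$ computed directly.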


Summary.

Steps 1 and 2, while they seem long to write out, are actually “obvious” measure-theoretical lemmas. We’re just saying that if the theorem holds for a broad enough collection of $f$ and open sets $U\subset\Omega$, then it’s true in full generality. In Spivak’s Calculus on Manifolds, he uses a partition of unity in step 1; but the only reason he needs to is because he doesn’t develop Lebesgue theory so he can’t deal with the sets $E_j$ (which aren’t necessarily Jordan measurable, either because they’re unbounded or have boundaries with positive measure).

Steps 3 and 5 are obvious, while step 4 (the affine or linear case) is sometimes claimed to be easy though it actually isn’t (see the link).

With the help of Fubini, step 6 is obvious, but the idea of considering this type of transformation is perhaps not so obvious. Step 7 is definitely not obvious, but the point is to inductively go up in dimensions by considering changes of variables which affect only one coordinate at a time (or are affine).

So, I would say step 4 is the most important, then step 7 is the most ingenious, while steps 3,5 are obvious, and steps 0,1,2,6 are “routine”. Also, notice how it is only in steps 1 and 5 that the injectivity of $\phi$ is used (very crucially, to avoid overcounting a set). But while some individual steps may be easy, I think this proof is a wonderful example of reducing to simpler cases (“divide and conquer”) by considering various possible $f,\phi,\Omega$’s. The drawback of this proof is that it’s not clear how to generalize to rougher $\phi$, whereas with the Radon-Nikodym approach, this is slightly better.

$\endgroup$
  • $\begingroup$ Thank you so much for the help and taking the time to write this out. I actually went through Folland's proof as suggested but will try to go through the details (especially step 7) of your answer as well. The general strategy as laid out in the summary is very clear though (also the truncation in step 2 and subsequent use of the monotone convergence theorem is new to me). $\endgroup$ Commented Oct 28 at 23:37
  • $\begingroup$ @user1591353 btw, in Folland, depending on the edition/printing, there is an error in the proof, so you should check out the errata on his website (the mistake was not considering such truncations). In other words, truncations + MCT is actually a very standard technical trick to try to reduce to the case of bounded functions with finite measure supports. $\endgroup$ Commented Oct 29 at 1:08
  • $\begingroup$ oh I guess one final comment I’ll make is that if you observe Folland’s proof of the change of variables theorem, the meat is in proving (a one-sided inequality) when the set is a cube (i.e which is what your original question is asking about). The rest is just the usual measure-theoretic arguments to get to Borel/Lebesgue-measurable sets and subsequently Borel/Lebesgue-measurable $f$. $\endgroup$ Commented Oct 29 at 3:40
  • $\begingroup$ "in Folland, depending on the edition/printing, there is an error in the proof" I think I have a printing with a mistake where the dominated convergence theorem is used where it isn't justified. Thanks for pointing that out. "the meat is in proving (a one-sided inequality) when the set is a cube" Yes; the second part of the proof isn't difficult. I just had to go over some multivariable notions first for the beginning of it. $\endgroup$ Commented Nov 1 at 15:37
