Let $(X, \mathcal{F}, \mu)$ be a measure space. If $f: X \to [0, +\infty)$ is non-negative and measurable, then $$ \int_X f(x) \,d\mu(x) = \int_0^\infty \mu(\{ x \in X: f(x)\geq t\}) \,dt $$ It is not very clear to me how this makes sense intuitively, or why it is called the "layer cake representation". In particular, the right hand side seems to be "summing up" the measures (lengths, in the Lebesgue case) of all the superlevel sets of $f$, whereas the left hand side is the area under the graph of $f$. Why are they equal, intuitively?
Is this somehow related to how abstract integration is defined on measure spaces, namely by partitioning the range instead of the domain? I cannot seem to tie these concepts together into a good enough intuition.
**Update:**
After some pondering and rethinking about the comment of @BrianTung, here are my thoughts on my question:
To understand the equality
$$ \int_X f(x) \,d\mu(x) = \int_0 ^\infty \mu(\{ x \in X: f(x) > t \}) \,dt $$
intuitively, we might as well place ourselves in the case where $\mu$ is the Lebesgue measure. Then the left hand side is exactly the area under the graph of $f$, essentially by definition. The expression on the right hand side of the equality, as @BrianTung commented, $$ \int_0 ^\infty \mu(\{ x \in X: f(x) > t \}) \,dt $$ can be understood intuitively, and almost naively, by treating $\mu(\{ x \in X: f(x) > t \})$ as the length of a rectangle, $dt$ as its width, and the integral sign as a sum over these rectangles. (As far as I know, this naive reading does not correspond directly to the definition of any integration theory; it just reads off the notation with a Riemann-type interpretation.) The picture below can be used to describe this procedure:
As both sides of the equality are the area under the graph, the equality indeed should hold.
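One way to make this heuristic precise, sketched here under the added assumption that $\mu$ is $\sigma$-finite (so that Tonelli's theorem applies to the product of $\mu$ with Lebesgue measure on $[0, +\infty)$), is to write each value $f(x)$ as an integral over the layers lying below it and then swap the order of integration:

$$
\begin{aligned}
f(x) &= \int_0^{f(x)} dt = \int_0^\infty \mathbf{1}_{\{f(x) > t\}} \,dt, \\
\int_X f(x) \,d\mu(x) &= \int_X \int_0^\infty \mathbf{1}_{\{f(x) > t\}} \,dt \,d\mu(x) = \int_0^\infty \int_X \mathbf{1}_{\{f(x) > t\}} \,d\mu(x) \,dt = \int_0^\infty \mu(\{ x \in X: f(x) > t \}) \,dt.
\end{aligned}
$$

The inner integral in the first expression slices the region under the graph vertically (one column above each $x$), while the inner integral after the swap slices it horizontally (one layer at each height $t$).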
**Questions Remaining:**
Note that the right hand side is another Lebesgue integral: $\int_0 ^\infty \mu(\{ x \in X: f(x) > t \}) \,dt$. In particular, this right hand side is "intuitively" the area under the graph of the function $g(t) := \mu(\{ x \in X: f(x) > t \})$ on the domain $t \in [0, +\infty)$. However, what exactly is this, geometrically speaking? I cannot seem to even begin to graph it. Is it related to the above picture somehow?
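To get a feel for what $g$ looks like, here is a minimal numerical sketch for one concrete case. The choices $f(x) = x^2$ on $X = [0, 1]$ with Lebesgue measure, the grid sizes, and the function names are all assumptions made for illustration. In this case $g(t) = 1 - \sqrt{t}$ for $0 \le t \le 1$ and $g(t) = 0$ afterwards, and both integrals equal $1/3$.

```python
import math


def f(x):
    """Hypothetical example function on X = [0, 1]."""
    return x * x


def measure_superlevel(t, n=2000):
    """Approximate the Lebesgue measure of {x in [0, 1] : f(x) > t}
    by counting midpoints of a uniform grid with n cells."""
    count = sum(1 for i in range(n) if f((i + 0.5) / n) > t)
    return count / n


def layer_cake_integral(m=500, t_max=1.0):
    """Midpoint-rule approximation of the integral of g(t) over [0, t_max],
    where g(t) is the measure of the superlevel set at height t."""
    dt = t_max / m
    return sum(measure_superlevel((j + 0.5) * dt) * dt for j in range(m))


if __name__ == "__main__":
    for t in (0.0, 0.25, 0.5, 0.75):
        print(t, measure_superlevel(t), 1 - math.sqrt(t))
    print("layer cake integral:", layer_cake_integral())
    print("integral of x^2 on [0, 1]:", 1 / 3)
```

In this example the region under the graph of $g$ is the region under the graph of $f$ cut into horizontal slices, with each slice slid sideways until it touches the $t$-axis. Sliding a slice does not change its length, so the two regions have the same area.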
**Some Comments on General Lebesgue Integration Intuition:**
We should note that the first picture above is one of the interpretations people have in mind when they say the Lebesgue integral "partitions the range", but we must be cautious in relating this claim to that picture.
I believe it is not a direct interpretation of the Lebesgue integral. A similar concern about this picture has been raised here: How is Lebesgue integration "partitioning the range"? In fact, the highest-voted answer there describes exactly what I think the Lebesgue integral is, and I will summarize it here:
A classical result says: every nonnegative Lebesgue measurable function can be approximated from below by an increasing sequence of simple functions built by partitioning the range of the function into intervals. (One should check the proof of this lemma to be convinced: partition the range of the function into intervals, take the preimages of these pieces, and obtain a partition of the domain whose pieces are not necessarily intervals. Note that the pieces of the range partition and the pieces of the domain partition correspond one to one, so we can assign to each piece of the domain a number from the matching piece of the range (its lower endpoint, to stay below $f$). Summing these numbers times the indicator functions of the domain pieces gives a simple function. Some technical issues, such as unbounded $f$, are handled by cutting the function off at a finite height and constructing the simple function only below the cutoff. Repeating the construction with finer and finer partitions of the range gives a pointwise increasing sequence of simple functions converging to $f$.)
The integral of a simple function is defined by summing the products of each function value with the measure of the corresponding preimage. This quite literally computes the area under the graph of the simple function.
The Monotone Convergence Theorem tells us that the Lebesgue integral of a nonnegative Lebesgue measurable function is the limit of the integrals of such an increasing sequence of approximating simple functions. Thus, at the cost of some accuracy, we can approximate the Lebesgue integral of a nonnegative Lebesgue measurable function by the integral of a simple function. A more accurate picture of how to think about Lebesgue integrals can be found below:
In fact, this is why intuitively Lebesgue integrals are set up to represent the area under the graph of the nonnegative Lebesgue measurable function $f$.
Now as it doesn't matter how we partition and sum up the total area, instead of evaluating the area using the second picture above, we can actually evaluate the area using the first picture.
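The two ways of summing can be compared on a small example. The sketch below assumes, purely for illustration, $f(x) = x^2$ on $[0, 1]$ with Lebesgue measure and the dyadic simple functions $\varphi_n = \lfloor 2^n f \rfloor / 2^n$; the helper names are hypothetical. No cutoff is needed here because $f \le 1$. The first function integrates $\varphi_n$ as "value times measure of preimage", the second adds up layers of thickness $2^{-n}$, and the two agree because they are rearrangements of the same finite sum.

```python
import math


def mu_between(a, b):
    """Lebesgue measure of {x in [0, 1] : a <= x^2 < b}, for 0 <= a <= b."""
    return math.sqrt(min(b, 1.0)) - math.sqrt(min(a, 1.0))


def mu_at_least(t):
    """Lebesgue measure of {x in [0, 1] : x^2 >= t}, for t >= 0."""
    return 1.0 - math.sqrt(min(t, 1.0))


def simple_integral_by_values(n):
    """Integral of phi_n as sum over values k / 2^n of
    (value) * (measure of the set where phi_n takes that value).
    The value 1 is taken only at x = 1, a null set, so it is omitted."""
    N = 2 ** n
    return sum((k / N) * mu_between(k / N, (k + 1) / N) for k in range(N))


def simple_integral_by_layers(n):
    """The same integral as a stack of layers of thickness 1 / 2^n:
    layer j covers the set where f >= j / 2^n."""
    N = 2 ** n
    return sum((1 / N) * mu_at_least(j / N) for j in range(1, N + 1))


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 12):
        print(n, simple_integral_by_values(n), simple_integral_by_layers(n))
```

As $n$ grows, both columns increase towards $1/3$, the value of $\int_0^1 x^2 \,dx$, and they differ from it by at most $2^{-n}$ since $0 \le f - \varphi_n \le 2^{-n}$.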
**Update 2:** After some discussions, I have gained some intuitive understanding of the integral on the right hand side. While it is not geometrically meaningful, we can still interpret it as follows. We are essentially using the function $g(t) = \mu(\{ x \in X: f(x) \geq t \})$ to build up the integral $\int_X f(x) \,d\mu(x)$; that is, we integrate $g$ from $0$ to $+\infty$ to get the area under $f$. Note that $g(t)$ (the height of $g$ at $t$) is intuitively the measure/amount of points $x$ such that $f(x) \geq t$.

A simple-function approximation of $\int_0 ^\infty g(t) \,dt$ sums the products of the values $g(t)$ with the measure of the corresponding preimage under $g$. That is, we sum the amount of points $x$ such that $f(x) \geq t$, each multiplied by a weight given by the measure of the preimage (the size of the set of $t$ at which $g$ takes values in a given interval of its range). In particular, we get a weighted sum of the amounts of points $x$ such that $f(x) \geq t$ to approximate $\int_0 ^\infty g(t) \,dt$.

This makes intuitive sense: points $x$ where $f(x)$ is large get counted many times, while points $x$ where $f(x)$ is small might be counted only once or twice. Indeed, every time we count the points $x$ with $f(x)$ small, we also count the points $x$ with $f(x)$ large. This means that points with a higher value of $f(x)$ contribute more (after weighting) to the integral $\int_X f(x) \,d\mu(x)$ on the left hand side than points with a lower value of $f(x)$.

This, in fact, explains the name "Layer Cake Representation": for every $t \in [0, +\infty)$ (more precisely, for every piece of the partition of $[0, +\infty)$ induced by the interval partition of the range of $g$), we butter the base (the $x$-axis) with one layer of cream (the measure/amount of points $x$ such that $f(x)$ is at least $t$), according to the height of the function $f$. The places on the base (domain) where $f$ is higher get buttered more times than the other places. This forms a layer cake representation of the area under the graph of $f$ as $t \to \infty$.
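The counting described above is easiest to see with the counting measure on a finite set. The following is a small sketch under that assumption, with made-up integer values of $f$ and a hypothetical helper name. For integer-valued $f$, the function $g$ is constant on each interval $(k-1, k]$, so $\int_0^\infty g(t) \,dt$ reduces to the sum of the layer sizes $\#\{x : f(x) \geq k\}$ over $k = 1, 2, \dots$

```python
def layer_sizes(values):
    """For non-negative integer values f(x), return the list
    [#{x : f(x) >= k} for k = 1, ..., max f]."""
    top = max(values, default=0)
    return [sum(1 for v in values if v >= k) for k in range(1, top + 1)]


# Hypothetical values f(x) on a three-point set X with counting measure.
heights = [3, 1, 2]
layers = layer_sizes(heights)

# Left hand side: sum of f over X. Right hand side: sum of the layer sizes.
lhs = sum(heights)
rhs = sum(layers)

if __name__ == "__main__":
    print("layers:", layers)
    print("sum of f:", lhs, "sum of layers:", rhs)
```

The point with $f(x) = 3$ lies in all three layers, the point with $f(x) = 2$ in two, and the point with $f(x) = 1$ in one, so each point is counted exactly $f(x)$ times in total.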

