I read a paper here which claims that, in a two-period difference-in-differences (DiD) setting, the DiD estimator identifies the ATT (average treatment effect on the treated). I am trying to understand why that is. Denote the two time periods by $t^* - 1$ and $t^*$, and define a treatment indicator $D_i$, so that $D_i = 1$ for units that participate in the treatment and $D_i = 0$ for units that do not.
Next, for $t \in \{t^* - 1, t^*\}$, define $Y_{it}(1)$ to be unit $i$'s treated potential outcome in time period $t$ (this is the outcome that it would experience if it were in the treated group), and define $Y_{it}(0)$ to be unit $i$'s untreated potential outcome in time period $t$ (this is the outcome that it would experience if it were in the untreated group).
The paper states that
$$ \text{ATT} = E[Y_{t^*}(1) - Y_{t^*}(0) \mid D = 1] $$
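To make the definition concrete for myself, I put together a small simulation sketch. Everything in it is my own assumption rather than something from the paper: the sample size, the effect sizes (3 for participating units, 1 for the others), the level difference between groups, and the fact that parallel trends holds by construction. In a simulation both potential outcomes are visible for every unit, so the ATT can be computed directly and compared with the DiD estimate from the observed data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000  # assumed sample size, chosen only to keep sampling noise small

# D_i = 1 for units that participate in the treatment, 0 otherwise.
D = rng.integers(0, 2, size=n)

# Untreated potential outcomes Y_it(0) in both periods.
# Assumption: participating units start at a higher level (+2), but both
# groups share the same trend (+1), so parallel trends holds by construction.
y0_pre = 2.0 * D + rng.normal(size=n)
y0_post = y0_pre + 1.0 + rng.normal(size=n)

# Treated potential outcome Y_it*(1), defined for EVERY unit.
# Assumption: the effect is 3 for units with D = 1 and 1 for units with D = 0.
effect = np.where(D == 1, 3.0, 1.0)
y1_post = y0_post + effect

# Observed outcomes: nobody is treated at t* - 1; at t* only D = 1 units are.
y_pre = y0_pre
y_post = np.where(D == 1, y1_post, y0_post)

# ATT = E[Y_t*(1) - Y_t*(0) | D = 1], using the (normally unobservable)
# untreated potential outcome of the participating units.
att = np.mean(y1_post[D == 1] - y0_post[D == 1])

# ATE averages the effect over all units, participating or not.
ate = np.mean(y1_post - y0_post)

# DiD estimate computed from observed outcomes only.
did = (y_post[D == 1].mean() - y_pre[D == 1].mean()) - (
    y_post[D == 0].mean() - y_pre[D == 0].mean()
)

print(f"ATT = {att:.3f}, ATE = {ate:.3f}, DiD = {did:.3f}")
```

In this setup the DiD estimate should land close to the ATT rather than the ATE, because the control group is only used to estimate the trend that the participating group would have followed without treatment.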
There is an existing post here which discusses it, but I feel it fails to explain it intuitively.
It seems that in the DiD scenario there are a treated group and a control group at time $t^*$, since one received the treatment and the other did not. It therefore seems strange to talk about both of them being treated. How should I think about this?