Let's say that we are trying to estimate the effect of a policy whose goal is to increase voter turnout in city-level elections (say, get-out-the-vote campaigns run by local governments). Cities held elections in 2010 and 2020, but only some of them implemented the policy before the 2020 elections.
Whether the treatment can be considered strictly exogenous is unclear, since cities could choose whether or not to implement the policy (although the decision does not seem to have been based on how high or low turnout was in the past).
Let's say that the variables that usually affect voter turnout in city elections are: population size, the proportion of the population above 65, and average education level.
Which of the following models would make more sense from a statistical point of view?
A linear regression of 2020 voter turnout on the following variables: treatment, 2010 turnout, population size, the proportion of the population above 65, and average education level.
A difference-in-differences (DiD) regression of voter turnout on the following variables: treatment, year, and treatment x year (with year being 0 in 2010 and 1 in 2020).
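To make the two candidates concrete, here is a minimal sketch of how I would fit each of them. Everything in it is an assumption for illustration: the data are simulated, the variable names (`log_pop`, `share_65`, `educ`, `city_effect`) and the 3-point "true" effect are made up, and treatment is assigned at random, which may well not hold in the real setting.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000  # hypothetical number of cities

# Simulated covariates (all values are invented for illustration)
log_pop = rng.normal(10, 1, n)         # log population size
share_65 = rng.uniform(0.1, 0.3, n)    # proportion of population above 65
educ = rng.normal(12, 2, n)            # average years of education
treated = rng.integers(0, 2, n)        # 1 if the city ran the campaign before 2020
city_effect = rng.normal(0, 5, n)      # unobserved, time-invariant city effect
true_effect = 3.0                      # assumed policy effect, in percentage points

base = 40 + 0.5 * log_pop + 20 * share_65 + 0.8 * educ + city_effect
turnout_2010 = base + rng.normal(0, 2, n)
turnout_2020 = base + 1.0 + true_effect * treated + rng.normal(0, 2, n)

def ols(y, *cols):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    X = np.column_stack([np.ones(len(y)), *cols])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Model 1: 2020 turnout on treatment, lagged (2010) turnout, and the controls
b_lag = ols(turnout_2020, treated, turnout_2010, log_pop, share_65, educ)
effect_lagged = b_lag[1]  # coefficient on treatment

# Model 2: DiD on stacked data (two rows per city: 2010 and 2020)
y = np.concatenate([turnout_2010, turnout_2020])
treat = np.concatenate([treated, treated])
year = np.concatenate([np.zeros(n), np.ones(n)])  # 0 in 2010, 1 in 2020
b_did = ols(y, treat, year, treat * year)
effect_did = b_did[3]  # coefficient on treatment x year

print(f"lagged-outcome estimate: {effect_lagged:.2f}")
print(f"DiD estimate:            {effect_did:.2f}")
```

In this simulation both estimates should land near the assumed effect, because treatment is random by construction. The question is which one to trust when cities self-select into the policy.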
If the latter makes more sense, how should we include control variables in the difference-in-differences model?
Thank you very much!
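For reference, this is the specification I have in mind when I talk about adding controls. The notation is my own assumption: $Y_{it}$ is turnout in city $i$ and year $t$, $\text{Treat}_i$ is the treatment indicator, $\text{Post}_t$ is the year dummy, and $X_{it}$ collects population size, the share above 65, and average education, each measured in both years.

```latex
\begin{align}
Y_{it} &= \alpha + \beta\,\text{Treat}_i + \gamma\,\text{Post}_t
          + \delta\,(\text{Treat}_i \times \text{Post}_t)
          + X_{it}'\theta + \varepsilon_{it} \\
\Delta Y_i &= \gamma + \delta\,\text{Treat}_i + \Delta X_i'\theta + \Delta\varepsilon_i
\end{align}
```

The second line is just the first one differenced between 2020 and 2010, so with only two periods the controls would seem to enter through their changes, and any control that does not vary over time drops out. I am not sure whether this is the right way to do it, hence the question.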