Shortly after posting this question, I found a paper that explains the signal model for GMT very well. I have attached the link to the paper in case anyone needs clarification: End-to-End Moving Target Indication for Airborne Radar Using Deep Learning.
According to this paper, the space-time steering vector for a single moving target is formed by combining the spatial steering vector (say $A_{s}$) and the temporal steering vector (the steering vector formed from the Doppler frequency, say $A_{d}$). The final steering vector is $A_{s} \otimes A_{d}$, where $\otimes$ represents the Kronecker product (the outer product of the two vectors, stacked into a single vector). This answers my first question about including the Doppler frequency in the array manifold calculation. Including the Doppler frequency in the array manifold calculation also answers my second question. Hope this helps. Also, correct me if I am wrong.
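
For anyone who wants to see the construction concretely, here is a minimal sketch of how the combined steering vector could be built. It assumes a uniform linear array and a constant pulse repetition interval, neither of which is stated above, and the names `steering_vector`, `kron`, `spatial_freq`, and `doppler_freq`, along with all the numeric values, are hypothetical choices for illustration rather than anything taken from the paper.

```python
import cmath
import math


def steering_vector(n, normalized_freq):
    """Return [exp(j*2*pi*f*k) for k = 0..n-1] for a normalized frequency f."""
    return [cmath.exp(2j * math.pi * normalized_freq * k) for k in range(n)]


def kron(a, b):
    """Kronecker product of two vectors: each element of a multiplies all of b."""
    return [x * y for x in a for y in b]


# Hypothetical example values (assumptions, not from the paper)
N = 4                 # number of array elements
M = 3                 # number of pulses
spatial_freq = 0.25   # normalized spatial frequency, d*sin(theta)/lambda
doppler_freq = 0.10   # normalized Doppler frequency, f_d / PRF

a_s = steering_vector(N, spatial_freq)   # spatial steering vector A_s
a_d = steering_vector(M, doppler_freq)   # temporal (Doppler) steering vector A_d
a_st = kron(a_s, a_d)                    # space-time steering vector A_s (x) A_d

print(len(a_st))  # N * M entries
```

With this ordering, entry `i * M + j` of `a_st` equals `a_s[i] * a_d[j]`, so the Doppler phase enters every element of the manifold. Some texts use the opposite ordering ($A_{d} \otimes A_{s}$), which only permutes the entries.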