To begin, it’s always worth following the guidance of your advisor, since there may be a theoretical rationale tied to your broader research design. That said, I understand your uncertainty here. Using a more distant pre-period such as $t = -3$ as the omitted category is less common (and may seem awkward), though there are situations where it can be justified. Let me address your considerations in turn.
Excluding a distant period and then discarding the intervening time periods is generally not advisable. This creates a somewhat artificial baseline that complicates interpretation of the dynamic treatment effects. The more standard practice is to drop the period immediately prior to treatment (i.e., $t = -1$). The reason is that all coefficients in the event-study regression are then interpreted relative to that “last untreated” period. This aligns naturally with the idea of testing for pre-trends in the periods leading up to treatment and then tracing out post-treatment dynamics.
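To make the normalization explicit, here is a sketch of a typical two-way fixed effects event-study specification. The notation is my own assumption rather than anything from your setup: $E_i$ denotes the period in which unit $i$ is first treated, $\alpha_i$ and $\lambda_t$ are unit and time fixed effects, and $k$ runs over the event-time window you choose to estimate.

$$
y_{it} = \alpha_i + \lambda_t + \sum_{k \neq -1} \beta_k \, \mathbf{1}\{t - E_i = k\} + \varepsilon_{it}
$$

Leaving $k = -1$ out of the sum fixes $\beta_{-1} = 0$, so each remaining $\beta_k$ measures the outcome gap at event time $k$ relative to the last untreated period. Omitting $k = -3$ instead only changes which coefficient is pinned to zero.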
To give a concrete example: suppose we have county-year data, and a policy is announced one year before implementation. If we believe anticipation effects may operate in the announcement year, then it may be defensible to omit a more distant year (say, $t = -3$), because you may be specifically interested in estimating the coefficient on the period immediately preceding the first exposure year rather than normalizing it to zero. Conceptually, the announcement year often functions as a partially treated period, so it can be important to estimate and assess the extent of that potential contamination directly. But absent such theory-driven concerns, most applied papers follow the convention of setting $t = -1$ as the base category. Indeed, many software packages (e.g., eventstudyinteract, twowayfeweights, csdid) default to omitting $t = -1$ when producing event-study plots.
In short: there is nothing inherently “wrong” with choosing a distant pre-period if you have a clear theoretical justification. In fact, the authors of this working paper use event time $-3$ quite nicely (see, e.g., Figure 2, p. 22). But in the absence of such a justification, the standard and most transparent choice is to omit $t = -1$. You don’t need to trace out coefficients for every single pre- and post-period either; you can restrict attention to a symmetric window around the event. Ultimately, what matters is that the choice is theoretically grounded and clearly explained to your reader.
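If it helps to see why the base period is mostly a matter of presentation, here is a minimal Python sketch. It assumes the same set of event-time indicators is kept in the model (no intervening periods are discarded), in which case switching the omitted period shifts every point estimate by the same constant. The `rebase` helper and the coefficient values are hypothetical and purely illustrative; they do not come from any paper or package.

```python
def rebase(coefs, new_base):
    """Re-express event-study point estimates relative to a different omitted period.

    coefs maps event time -> coefficient, with the current base period stored as 0.0.
    """
    shift = coefs[new_base]
    # Subtracting the new base period's coefficient pins that period to zero.
    return {k: b - shift for k, b in coefs.items()}


# Hypothetical estimates with t = -1 omitted (illustrative numbers only).
coefs_base_m1 = {-3: 0.02, -2: -0.01, -1: 0.0, 0: 0.30, 1: 0.45, 2: 0.50}

# The same estimates expressed relative to t = -3.
coefs_base_m3 = rebase(coefs_base_m1, new_base=-3)

for k in sorted(coefs_base_m3):
    print(f"t = {k:+d}: {coefs_base_m3[k]:.2f}")
```

Note that this shortcut applies to point estimates only. Standard errors do not carry over by subtraction, so in practice you would re-estimate the model with the new base period (or use the full covariance matrix) to get the correct confidence intervals.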
For examples, see my earlier discussion here where I review how several applied papers plot their event study coefficients.