Whether a 97%+ threshold means the tool misses a lot of true positives is beside the point. It makes no sense to forbid the use of a tool just because it misses many true positives. Lateral flow tests (LFTs) for COVID-19 can miss 20-80% of true positive cases; that just means we can't (and don't) rely solely on LFTs. It doesn't mean we should ban LFTs.
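To make the LFT analogy concrete, here is a back-of-the-envelope Bayes calculation. All numbers are illustrative (not measured values for any real test): the point is that a test which misses most true positives can still be strong evidence on the occasions it *does* fire, provided its false-positive rate is low.

```python
def posterior_positive(prior, sensitivity, specificity):
    """P(condition | positive result) via Bayes' theorem."""
    p_positive = sensitivity * prior + (1 - specificity) * (1 - prior)
    return sensitivity * prior / p_positive

# Illustrative values only: 5% prevalence, 40% sensitivity
# (i.e. the test misses 60% of true positives), 99.5% specificity.
print(round(posterior_positive(0.05, 0.40, 0.995), 3))  # → 0.808
```

Even though this hypothetical test misses 60% of true cases, a positive result moves the probability from 5% to about 81%. Low sensitivity argues for not relying on the tool alone; it doesn't argue for discarding it.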
Additionally, this metric (users who posted >2 answers in a given week and were suspended within three weeks) seems suspiciously precise. Why is >2 answers the cutoff? Why is 1 week the period in which those answers were written, and why is 3 weeks the period within which they were suspended? It smells of cherry-picking to me. How robust is this finding to changes in the metric?
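The robustness check I'm asking for is cheap to run if you have the data. A sketch of the shape of that check, on entirely synthetic data (no real Stack Exchange numbers are used here): recompute the headline figure under nearby choices of cutoff and window, and see whether it moves.

```python
import random

random.seed(0)

# Synthetic users: (answers posted in the week,
#                   weeks until suspension, or None if never suspended)
users = [(random.randint(0, 6), random.choice([None, 1, 2, 3, 4, 5]))
         for _ in range(10_000)]

def suspended_share(min_answers, window_weeks):
    """Share of users above the answer cutoff suspended within the window."""
    eligible = [u for u in users if u[0] > min_answers]
    hit = [u for u in eligible
           if u[1] is not None and u[1] <= window_weeks]
    return len(hit) / len(eligible) if eligible else float('nan')

# If the finding is robust, nearby cells should tell the same story;
# if only the (>2 answers, 3 weeks) cell stands out, that's a red flag.
for min_answers in (1, 2, 3):
    for window in (2, 3, 4):
        print(f">{min_answers} answers, {window}-week window:",
              round(suspended_share(min_answers, window), 3))
```

On random data every cell comes out similar, which is the baseline; the interesting question is whether the real data behaves the same way, or whether the reported numbers only hold at exactly the parameters chosen.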