My question arose from playing around with the Stim "getting_started" notebook, specifically the part on the rotated surface code.
There, 3d rounds of syndrome extraction (three times the code distance d) are performed and decoded to estimate the logical error rate. In particular, this includes locating the threshold from how the logical error rate scales at different distances.
Playing around, I changed the number of rounds to 1, expecting the threshold to vanish (since O(d) rounds are necessary to have confidence in the syndromes). But it did not; it even improved to above 1%.
I suspect this is due to the round of data qubit measurements performed at the end (is this right?). If so, a plot with the correct scaling is not evidence that the syndrome extraction itself is reliable (with a single round it is not), but only that the syndrome extraction circuit is fault tolerant (the circuit is FT, but the syndrome cannot be trusted).
But now my question is: how would I set up an experiment in Stim to check "valid" syndrome extraction? That is, an experiment showing that d (or fewer) rounds are actually necessary.
I can't drop the final data qubit measurement, because then a single error after the last CX of the syndrome extraction could flip the observable without being detected. One could avoid placing errors at those locations, but that does not seem very elegant to me.

