$\begingroup$

I’m conducting a randomized crossover trial with 16 participants, where each subject receives two interventions (sub-occipital muscle inhibition and deep breathing). For each intervention, heart rate variability (HRV) metrics (e.g., RMSSD and HF) are recorded before and after the intervention.

I’m aiming to determine whether one intervention leads to greater parasympathetic activation than the other, based on these HRV measures.

The design involves:

Repeated measures (pre and post)

Two conditions per participant

A small sample size (n = 16)

A few potential covariates (e.g., stress level, respiratory rate)

What statistical approach would you recommend for analyzing this kind of data? Would you use a method that compares pre/post differences (deltas), or would you suggest a model that incorporates all measurements directly? I'm particularly interested in approaches that account for within-subject variability and repeated measures.

$\endgroup$
  • 1
    $\begingroup$ Can you post a sample of your data, even if it is made up, so we can better understand the structure? $\endgroup$ Commented Apr 18 at 13:39
  • $\begingroup$ Are stress level and respiratory rate between or within participant? That is, does a participant have one of those only, or one every time heart rate is measured? $\endgroup$ Commented Apr 18 at 16:04
  • $\begingroup$ @JeremyMiles Thanks for the question! I don’t directly measure stress level—it’s more of a potential uncontrolled confounding variable that might influence HRV responses. However, respiratory rate is measured indirectly via the HRV data using Kubios software, and it's available at each HRV timepoint (i.e., before and after each intervention). So it would be considered a within-subject variable that could vary across conditions and time. $\endgroup$ Commented Apr 19 at 9:49
  • $\begingroup$ @DemetriPananos

        Participant  Condition  Time  RMSSD  HF
        1            SI         Pre   32.5   450
        1            SI         Post  42.1   520
        1            DB         Pre   33.0   460
        1            DB         Post  39     495
        2            DB         Pre   35.2   480
        2            DB         Post  43.5   550
        2            SI         Pre   31     445

    $\endgroup$ Commented Apr 19 at 10:00
  • $\begingroup$ Maybe add to the question so you can format it? $\endgroup$ Commented Apr 19 at 14:29

2 Answers

$\begingroup$

The following answer assumes that the HRV outcomes don't need to be combined explicitly in a single model (e.g., with a MANOVA).

Regarding the first question: should you use a method that compares pre/post differences?

The classical two-step approach to crossover design analysis with two repeated measures per participant takes this comparison as its first step. This gives you a within-subject effect estimate, but it does not by itself control for a period effect (e.g., an effect of getting used to the experiment).

You then take the mean of these differences per treatment sequence (two sequences in your design).

In the second step of the approach, you take the difference of the two resulting means (e.g., the muscle inhibition -> deep breathing average minus the deep breathing -> muscle inhibition average, if you are comparing deep breathing to muscle inhibition).

This second step removes any additive period effect.

In practice, I would do the first step with manual calculations, and use statistical software to do an independent t-test.

Here is an example of the two-step approach in R:

# First step: difference within subjects
crossover_patient_split <- split(crossover_data, crossover_data$PatientID)
patient_diff_df <- do.call("rbind", lapply(crossover_patient_split, FUN = function(x) {
  data.frame(
    period_diff = x$X[x$Period == 1] - x$X[x$Period == 2],
    PatientID   = x$PatientID[1],
    Sequence    = x$Sequence[1]  # Seq. 1 is A -> B, Seq. 2 is B -> A
  )
}))

# Second step: t-test on the difference between sequences
t.test(period_diff ~ Sequence, data = patient_diff_df, var.equal = TRUE)

In the paper referenced below, the authors recommend using a Wilcoxon rank-sum test if you suspect non-normality of the within-subject differences. In small samples with paired differences in continuous outcomes, such non-normality easily shows up due to outliers.
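If you want the nonparametric variant, the second step can be run with wilcox.test instead of t.test. A minimal sketch on invented period differences, in the same structure as patient_diff_df from the two-step example above:

```r
# Made-up within-subject period differences (values are invented),
# in the same structure as patient_diff_df from the two-step example.
patient_diff_df <- data.frame(
  period_diff = c(-9.6, -6.0, -5.1, -7.8, 8.3, 6.2, 9.0, 7.1),
  Sequence    = factor(rep(c(1, 2), each = 4))  # 1 is A -> B, 2 is B -> A
)

# Nonparametric second step: Wilcoxon rank-sum test between sequences
wilcox.test(period_diff ~ Sequence, data = patient_diff_df)
```

With no ties and small groups, wilcox.test uses the exact null distribution by default, which is what you want at this sample size.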

Regarding the second question: would you suggest a model that incorporates all measurements directly?

In your current experimental design, the big advantage of specifying a model is that it lets you add time-varying covariates such as the respiratory rate.

This approach is also more direct and clear in my opinion: you directly include controls for the subjects and the time of the measurements. On top of this, you get a treatment difference estimate.

Here is an R linear regression model excerpt that yields the same t-statistic as the two-step approach above:

fit1 <- lm(X ~ Treatment + factor(PatientID) + Period, data = crossover_data)
summary(fit1)

The Treatment could be muscle inhibition, for example. Notice that I now simply specify in the formula that I want to control for subject and period effects. The treatment sequence is not given explicitly, but it is what identifies the Period effect.
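If you also want to adjust for a time-varying covariate such as respiratory rate, the same formula extends naturally. The following is a sketch on made-up data; RespRate is a hypothetical column name, not one from the original demo table:

```r
set.seed(1)
# Minimal made-up data in the same shape as crossover_data above;
# RespRate is a hypothetical time-varying covariate column.
crossover_data <- data.frame(
  PatientID = rep(1:6, each = 2),
  Period    = rep(c(1, 2), times = 6),
  Treatment = c(rep(c("A", "B"), 3), rep(c("B", "A"), 3)),
  RespRate  = round(rnorm(12, mean = 14, sd = 2), 1)
)
crossover_data$X <- 40 + 3 * (crossover_data$Treatment == "A") +
  0.5 * crossover_data$RespRate + rnorm(12, sd = 1)

# Same fixed-effects model as fit1, with the covariate added
fit2 <- lm(X ~ Treatment + factor(PatientID) + Period + RespRate,
           data = crossover_data)
summary(fit2)
```

Because RespRate varies within subject, its coefficient remains identifiable even with the subject fixed effects in the model.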

Here is a quick peek at the underlying table structure: (screenshot showing the head of the demo table)

Reference: On the proper use of the crossover design in clinical trials

$\endgroup$
$\begingroup$

I will start with some comments/questions about your experimental design.

  1. You say you are recording HRV metrics, and you mention 2 of them (RMSSD and HF). But how many are you really recording? This paper lists no fewer than 26 of them... The issue here is that of multiple comparisons; even if you are looking at just 5-6 such metrics, the use of multiple comparison corrections (MCC) will severely impact your power, which is already challenged due to the small number of subjects (16).
  2. You also state that you have "a few potential covariates", and mention 2 (stress level, respiratory rate). Do you intend to use them, and if so, how many of them do you have? How about age, sex, etc.? This becomes relevant for regressions.
  3. A good approach for your problem could have been multivariate multiple regression (e.g. here or here). But... you have 5-6 DVs and 4-5 covariates, in addition to your 2 treatments and the before/after measurements, yet only 16 subjects. That is much too few; you would simply be overfitting.

So what can you do with the data you have? Given the small number of subjects, and the complexity of your data (multiple metrics, multiple covariates), your best shot at getting something useful is to simplify.

So the first thing I would do is ignore the covariates; you would need many more subjects to be able to use them fruitfully.

Then I would do some (simple) exploratory analysis for all the metrics, for both treatments. Simply compute the (After - Before) differences and see if the observed effect size is clinically relevant (e.g., if HF was elevated on average by 1%, would this matter clinically? I doubt it...). Note that I would not run any hypothesis test here; just look at the descriptive statistics and see if any clinician would care about this difference. There is no point in further evaluating any non-clinically-relevant metric (even if a paired t-test would have been statistically significant). Say you end up with 3 metrics which are clinically relevant for Treatment A, and 2 for Treatment B.
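This exploratory step can be sketched in R. The long-format data frame below is an assumption: the first few rows echo the sample posted in the comments, the rest are invented, and the column names are guesses:

```r
# Made-up long-format HRV data: one row per participant x condition x time.
# First rows echo the sample from the comments; the rest are invented.
hrv <- data.frame(
  Participant = rep(1:4, each = 4),
  Condition   = rep(rep(c("SI", "DB"), each = 2), times = 4),
  Time        = rep(c("Pre", "Post"), times = 8),
  RMSSD       = c(32.5, 42.1, 33.0, 39.0, 35.2, 43.5, 31.0, 36.4,
                  30.1, 38.9, 34.4, 40.2, 33.7, 41.0, 32.2, 37.5)
)

# After - Before delta per participant and condition
pre  <- hrv[hrv$Time == "Pre",  ]
post <- hrv[hrv$Time == "Post", ]
deltas <- merge(pre, post, by = c("Participant", "Condition"),
                suffixes = c(".pre", ".post"))
deltas$RMSSD_delta <- deltas$RMSSD.post - deltas$RMSSD.pre

# Descriptive summary only -- no hypothesis test at this stage
aggregate(RMSSD_delta ~ Condition, data = deltas, FUN = mean)
```

The point of stopping at the descriptive summary is exactly the one made above: judge clinical relevance first, test later.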

Now, I would test just these, say, 5 metrics with a hypothesis test. This would just be a paired t-test (After - Before). I would use a one-tailed test, because, based on the specific metric, I would know whether an increase in HRV (your topic of interest) should result in an increase or a decrease in that specific metric. This will give you a bit more power. But you will need to use an MCC here; so with 5 metrics, you will now have $\alpha \approx 0.01$, and that will really hurt your power. But maybe you can be lucky? Say you end up with 1 metric where Treatment A had a clinically relevant effect and the test was also statistically significant. Note though that while the observed effect was clinically relevant and the test statistically significant, the lower bound of the CI may not be clinically relevant! But at least you have something which may be worth pursuing (you observed a meaningful effect, and it is unlikely to be a fluke).
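A sketch of this testing step, assuming the per-participant deltas for the retained metrics are already computed (all values and metric names below are invented):

```r
# Made-up After - Before deltas for one treatment, for 5 retained metrics
# (metric names are illustrative assumptions)
set.seed(42)
metric_deltas <- replicate(5, rnorm(16, mean = 4, sd = 6), simplify = FALSE)
names(metric_deltas) <- c("RMSSD", "HF", "SDNN", "pNN50", "LF")

# One-tailed paired t-test per metric (here: testing for an increase),
# then a Bonferroni correction across the 5 tests
p_raw <- sapply(metric_deltas, function(d)
  t.test(d, mu = 0, alternative = "greater")$p.value)
p_adj <- p.adjust(p_raw, method = "bonferroni")
p_adj
```

Note that the direction passed to alternative would differ per metric, depending on whether parasympathetic activation raises or lowers it.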

Now, I would compare A to B for all the clinically relevant metrics. Here we run paired t-tests on the paired differences (i.e., run a one-sample t-test on $(\textrm{After}_A-\textrm{Before}_A)-(\textrm{After}_B-\textrm{Before}_B)$). You again need to adjust your $\alpha$; but I would do that MCC separately from the earlier one (because earlier you were testing a single metric for a single treatment, and now you are comparing A to B; a different null). But here I would use a two-tailed test (unless you can argue that, before even seeing the data, A should be the better treatment - e.g., from a previous study). So power is even more challenged.
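This comparison boils down to a one-sample t-test on the difference of the paired deltas. A minimal sketch on made-up deltas for n = 16 participants:

```r
# Made-up per-participant deltas for treatments A and B (n = 16)
set.seed(7)
delta_A <- rnorm(16, mean = 6, sd = 5)   # After_A - Before_A
delta_B <- rnorm(16, mean = 2, sd = 5)   # After_B - Before_B

# One-sample t-test on the difference of the paired differences,
# two-tailed by default, as argued above
t.test(delta_A - delta_B, mu = 0)
```

With more than one metric surviving the screening step, the resulting p-values would again go through p.adjust, as in the previous step.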

With a bit of luck, you may find 1 metric, where the observed effect is clinically relevant for A, and also, where A is statistically better than B.

And then, the next step would be to collect much more data (either to confirm the lucky findings, or simply to gain more power).

$\endgroup$
