Advice on optimal random-effects structure for not fully crossed repeated-measures design?

Question

I have a study that is designed such that it seems to straddle a within- and between-subject design. I posted about this previously here, but at that stage I was hoping for a simple ANOVA-like solution. I am beginning to think one does not exist and have turned to the multilevel modelling literature instead. If I find a solution here, I will post an answer to the original question as well.

To recap, my study is interested in attitudes toward the homeless and in particular whether ethnicity interacts with homelessness to increase stigma. To study this, participants read vignettes about someone and make ratings related to stigma. My independent variables are whether the character in the vignette has a home or not (homed, homeless) and ethnicity (white, black). In a perfect world, everyone would receive all four resulting conditions (homed-white, homed-black, homeless-white, homeless-black). However, due to time, each participant can receive only two. As such, each participant receives one of the following:

- homeless-black / homed-white
- homeless-white / homed-black

(The order of the two vignettes was counterbalanced within each pair, but that isn't important here. Likewise, due to design constraints, we could use only a single name for each black/white character and wanted to use only one vignette for each of the homed and homeless conditions.)

This looks like a 2 x 2 repeated measures design in that each participant receives both levels of homelessness and both levels of ethnicity, but it isn't quite because they don't receive all four conditions, only two.

My question is: if we were to fit this as a multilevel model, would it be appropriate to use the following random-effects structure (for convenience I use lme4-like notation):

stigma ~ home + ethnicity + home:ethnicity + (home + ethnicity | subject)

This would include a random intercept for each participant, as well as random slopes for the main effects, but would exclude the random slope for the interaction, as we lack sufficient information to explore within-subject variation in it. Am I correct that this is the maximal structure permitted by the data?
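One way to check how much of (home + ethnicity | subject) the data can support is to look at each subject's own random-effects design matrix. The sketch below uses base R and a hypothetical two-subject data set coded with home = 1 for homeless and ethnicity = 1 for black. Each subject has only two rows, and within a subject ethnicity is either identical to home (condition homeless-black / homed-white) or its complement (homeless-white / homed-black), so the three random-effect columns (intercept, home, ethnicity) have rank 2 rather than 3.

```r
# Hypothetical data: two subjects, one from each condition.
# home: 1 = homeless, 0 = homed; ethnicity: 1 = black, 0 = white
d <- data.frame(
  subject   = c(1, 1, 2, 2),
  home      = c(1, 0, 1, 0),
  ethnicity = c(1, 0, 0, 1)  # subject 1: homeless-black / homed-white
)                            # subject 2: homeless-white / homed-black

# Rank of each subject's design matrix for the random part of
# (home + ethnicity | subject): columns are intercept, home, ethnicity.
ranks <- sapply(split(d, d$subject), function(ds) {
  Z <- model.matrix(~ home + ethnicity, data = ds)
  qr(Z)$rank
})
print(ranks)  # one rank per subject

# Number of random-effect columns requested per subject
n_re <- ncol(model.matrix(~ home + ethnicity, data = d))
print(n_re)
```

Within a subject, the home and ethnicity slopes therefore cannot be told apart, and with only two ratings per subject even a single random slope is hard to separate from the residual variance. If the fit is attempted anyway, lmer is likely to stop with an identifiability complaint or report a singular fit; a random intercept alone is what two ratings per subject can realistically support.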

I'm also open to any other advice, or alternative proposals for analytic approaches!

This was asked in the previous thread by @Sointu, and is worth asking again. Have you already collected this data or do you still have time to alter the design? The comments and answers in the other thread all seem to converge on the idea that this design is not particularly amenable to your stated research questions. You made it very difficult to de-confound race and homed/homeless in this design and it's not clear that they can be disentangled.
– Erik Ruzek, Commented Feb 3, 2024 at 21:04
@ErikRuzek, unfortunately the design is locked in at this time and not something I have the power to change due to a few constraints. But this is also only a pilot for now and we'll use a different design. Still, we'll have data and need to do something with it. Given these constraints, and accepting that within-participant variability cannot be modelled for the interaction, does the above model sound okay (at this stage, I should ask... the best we can do)? Or is there another way?
– nostatisfaction, Commented Feb 4, 2024 at 16:48
You might want to take into account Latino and Asian populations as well in this vignette. B/W might be too restrictive to lend any useful insights.
– jbuddy_13, Commented May 10, 2024 at 13:40

kjetil b halvorsen · Accepted Answer · 2024-05-10 13:32:00Z

The question of how to model these data is an important one, but before getting to it, I want to show how the structure of the data can provide clues about the design your study actually implies. Note that I assume you have only two ethnicities (black/non-black). To start, I will reiterate that the two conditions you showed form a between-subjects factor. I will encode them as follows:

condition == 0: homeless-black / homed-white

condition == 1: homeless-white / homed-black

You then have indicators for whether the target is homeless (is_homeless), whether the target is black (is_black), an interaction of these two (homeless_by_black), the subject identifier (subject), and the outcome (stigma). This is how I would set up your data:

subject	is_homeless	is_black	homeless_by_black	condition	stigma
1	1	1	1	0	4.3
1	0	0	0	0	4.8
2	1	0	0	1	3.5
2	0	1	0	1	3.2
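A minimal base-R sketch of this layout for two hypothetical subjects, one per condition. The stigma values are illustrative only, and the derived columns are computed from the two indicators rather than typed in, which makes coding slips less likely:

```r
# Two hypothetical subjects: subject 1 in condition 0, subject 2 in condition 1
data <- data.frame(
  subject     = c(1, 1, 2, 2),
  is_homeless = c(1, 0, 1, 0),
  is_black    = c(1, 0, 0, 1),
  stigma      = c(4.3, 4.8, 3.5, 3.2)  # illustrative values
)
# Interaction indicator: 1 only for a homeless black target
data$homeless_by_black <- data$is_homeless * data$is_black
# condition 0 = homeless-black / homed-white; condition 1 = homeless-white / homed-black.
# The two indicators agree in condition 0 and disagree in condition 1.
data$condition <- as.integer(data$is_homeless != data$is_black)
print(data[, c("subject", "is_homeless", "is_black",
               "homeless_by_black", "condition", "stigma")])
```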

Some things that stick out:

- As you said, each participant only sees the targets that correspond to the condition they were assigned to, so no subject sees all four combinations of homelessness and ethnicity.
- The interaction term only "turns on" for those in condition == 0.
  - Accordingly, the interaction is confounded with condition. You could include the interaction term in the model, but it only gives you the contrast of interest for those in condition == 0. The model will use information from those in condition == 1, but they all have a value of 0 on this term and never saw a homeless-black target. Not very useful, unfortunately.
- Each subject receives both a black and a non-black target, and both a homed and a homeless target, to rate.
  - These can be considered within-subjects factors; their effects are the "main effects" of the design.
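The confounding can be checked directly. In this design the condition indicator is an exact linear function of the other three indicators, condition = is_homeless + is_black - 2 * homeless_by_black, as a small base-R sketch over the four possible target types shows:

```r
# All four (is_homeless, is_black) target types
cells <- expand.grid(is_homeless = 0:1, is_black = 0:1)
cells$homeless_by_black <- cells$is_homeless * cells$is_black
# condition 0 shows the targets where the two indicators agree,
# condition 1 the targets where they disagree
cells$condition <- as.integer(cells$is_homeless != cells$is_black)

# condition is an exact linear combination of the three indicator columns
identity_holds <- with(cells,
  all(condition == is_homeless + is_black - 2 * homeless_by_black))
print(cells)
print(identity_holds)
```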

To model this, I would lean toward a minimal model. Depending on a lot of factors (sample size, variability of the outcome, true between-subjects variation in the association of interest, etc.), a maximal model is often overkill and doesn't necessarily do what its proponents claim. You can explore it, but know that you are likely to run into estimation problems due to overfitting and will have to pare down the random-effects structure. I would run the following model:

library(lme4)
m1 <- lmer(stigma ~ is_homeless + is_black + condition + (1 | subject), data = data)
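To see what the condition term estimates, note that condition = is_homeless + is_black - 2 * homeless_by_black in this design, so the fixed-effects part of m1 is a reparameterization of the model with the interaction term. The sketch below checks this on simulated data; the sample size, effect sizes and noise levels are arbitrary assumptions. It uses plain lm() so it runs without lme4, and lm() ignores the repeated measures, so only the point estimates (not the standard errors) are meaningful here.

```r
set.seed(1)
n_per_cond <- 50  # hypothetical number of subjects per condition
n_subj <- 2 * n_per_cond

# Two rows per subject: the homeless vignette, then the homed vignette
subject     <- rep(seq_len(n_subj), each = 2)
condition   <- rep(rep(0:1, each = n_per_cond), each = 2)
is_homeless <- rep(c(1, 0), times = n_subj)
# condition 0: the homeless target is black; condition 1: the homed target is black
is_black    <- ifelse(condition == 0, is_homeless, 1 - is_homeless)
homeless_by_black <- is_homeless * is_black

# Simulated outcome: arbitrary fixed effects + subject intercepts + noise
subj_intercept <- rnorm(n_subj, sd = 0.5)
stigma <- 4 + 0.5 * is_homeless + 0.2 * is_black + 0.3 * homeless_by_black +
  subj_intercept[subject] + rnorm(2 * n_subj, sd = 0.5)
d <- data.frame(subject, condition, is_homeless, is_black,
                homeless_by_black, stigma)

fit_cond <- lm(stigma ~ is_homeless + is_black + condition, data = d)
fit_int  <- lm(stigma ~ is_homeless + is_black + homeless_by_black, data = d)

# Same fitted values: the two models span the same column space
print(all.equal(fitted(fit_cond), fitted(fit_int)))
# Ratio of the condition coefficient to the interaction coefficient
print(coef(fit_cond)[["condition"]] / coef(fit_int)[["homeless_by_black"]])
```

The condition coefficient comes out as exactly -1/2 times the interaction coefficient. The same identity should hold for the fixed effects of m1 under lmer, since swapping homeless_by_black for condition does not change the space spanned by the fixed effects; only the labelling of the coefficients changes.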

This gives you mean differences for the within-subjects factors of homed/homeless and black/non-black, and for the between-subjects factor of condition. The latter, as shown above, gives you some (minimal) information about the interaction, but you cannot conclude that any mean difference is due solely to it. You and others may choose to model the interaction, but I personally don't think it is meaningful, because it isn't a test of what you really want. Specifically, in the current design, it is not a true within-person comparison of whether viewing a homeless black target is associated with more or less stigma than viewing the other targets.
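Writing the four cell means as m11 (homeless-black), m10 (homeless-white), m01 (homed-black) and m00 (homed-white) makes this concrete: condition 0 subjects supply m11 and m00, and condition 1 subjects supply m10 and m01. The sketch below uses simulated ratings (means and sample size are arbitrary assumptions) to check that the interaction contrast m11 - m10 - m01 + m00 depends only on each subject's sum of ratings compared across conditions, a purely between-subjects comparison, while the averaged main effects depend only on the within-subject differences.

```r
set.seed(2)
n_per_cond <- 40  # hypothetical number of subjects per condition
condition  <- rep(0:1, each = n_per_cond)
# One homeless and one homed rating per subject (arbitrary simulated values)
y_homeless <- rnorm(2 * n_per_cond, mean = 5)
y_homed    <- rnorm(2 * n_per_cond, mean = 4)

# Cell means: first digit = homeless, second digit = black
m11 <- mean(y_homeless[condition == 0])  # homeless-black (condition 0)
m00 <- mean(y_homed[condition == 0])     # homed-white    (condition 0)
m10 <- mean(y_homeless[condition == 1])  # homeless-white (condition 1)
m01 <- mean(y_homed[condition == 1])     # homed-black    (condition 1)

interaction_contrast <- m11 - m10 - m01 + m00
homeless_main <- ((m11 - m01) + (m10 - m00)) / 2
black_main    <- ((m11 - m10) + (m01 - m00)) / 2

# Per-subject summaries
subj_sum  <- y_homeless + y_homed  # overall level of a subject's ratings
subj_diff <- y_homeless - y_homed  # within-subject contrast

# Interaction: between-subjects comparison of sums across conditions
int_check <- all.equal(interaction_contrast,
  mean(subj_sum[condition == 0]) - mean(subj_sum[condition == 1]))
# Main effects: average and half-difference of within-subject contrasts
home_check <- all.equal(homeless_main,
  (mean(subj_diff[condition == 0]) + mean(subj_diff[condition == 1])) / 2)
black_check <- all.equal(black_main,
  (mean(subj_diff[condition == 0]) - mean(subj_diff[condition == 1])) / 2)
print(c(int_check, home_check, black_check))
```

Because the interaction lives entirely in the subject sums, its precision is governed by between-subject variability (including the random-intercept variance) rather than by the within-subject residual alone.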

Interaction seems like a simpler answer than multi level, good insight.
– jbuddy_13, Commented May 10, 2024 at 13:42
