Let's suppose that data are collected for clinics across a state. The clinics are located in different counties, but some of the clinics are owned by large healthcare systems that operate in several counties, so the data don't fit the typical design of systems nested within counties. The outcomes are collected for each clinic, while the sociodemographic information is available only at the county level.
What is of interest is the association between the county-level sociodemographic information and the outcomes collected at the clinics.
I'm thinking that I could add a random intercept for each clinic (ClinicID) so that clinics are nested within counties. In the table below there are 8 clinics nested in 5 counties. But this does not account for any characteristics of the healthcare system. Is there a way I could also account for clustering within health systems? Would I add another random effect? I am still figuring out the random-effects specification in glmmTMB, but I think it would be (1 | CountyID) + (1 | ClinicID) because each ClinicID is unique.
Also, where can I find information about population offsets when random effects are nested? I just want to be sure I'm using the right number: I think it would be the population for the clinic, not the county.
I'm an R user and relatively new to multilevel regression. My apologies for such a basic question, and thank you in advance for any help!
Edit: I think that I can just add a random intercept for SystemID as well, with ClinicID nested in CountyID: (1 | SystemID) + (1 | CountyID/ClinicID). I read somewhere that you can add terms like this for a pseudoeffect (for example, gender). I'm not totally sure how this relates to offsets. The problem is that the demographic information is at the county level, so there's no variation within a county cluster if I do it that way.
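To make the specification in this edit concrete, here is a minimal sketch of what I have in mind. The column names `y`, `county_x`, and `clinic_pop` are hypothetical placeholders, and the negative binomial family, the intercept-only zero-inflation formula, and the clinic-level offset are assumptions on my part rather than settled choices. Because each ClinicID label is unique, I've written the clinic term as (1 | ClinicID) instead of the CountyID/ClinicID shorthand.

```r
# Sketch only. Hypothetical column names (not in my real data):
#   y          count outcome
#   county_x   county-level sociodemographic covariate
#   clinic_pop clinic-level population used as the exposure (assumption)
model_formula <- y ~ county_x + offset(log(clinic_pop)) +
  (1 | SystemID) + (1 | CountyID) + (1 | ClinicID)

# Wrapper so that nothing is fitted until real data are supplied.
fit_sketch <- function(dat) {
  if (!requireNamespace("glmmTMB", quietly = TRUE)) {
    stop("glmmTMB is not installed")
  }
  glmmTMB::glmmTMB(
    model_formula,
    ziformula = ~ 1,                            # intercept-only zero inflation (assumption)
    family    = glmmTMB::nbinom2(link = "log"), # assumed count family
    data      = dat
  )
}
```

With this form, SystemID and CountyID enter as two separate grouping factors, so neither has to be nested in the other.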
There are 15 health systems, 27 counties, 36 clinics. Thousands of observations are available per clinic.
Most counties (19) have just one health system, but 7 counties have two health systems and 1 county has three. On the flip side, ten health systems operate in just one county, and the five large health systems operate in 2, 3, 5, 7, and 9 different counties, respectively.
Example:
| SystemID | CountyID | ClinicID |
|---|---|---|
| A | 1 | A1 |
| A | 2 | A2 |
| A | 3 | A3 |
| A | 4 | A4 |
| B | 1 | B1 |
| B | 2 | B2 |
| C | 5 | C5 |
| C | 4 | C4 |
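
To check the structure of the example table, here is a small base R sketch. The helper `is_nested` is a hypothetical name I made up for illustration; it only tests whether every level of one factor occurs within a single level of another.

```r
# Rows copied from the example table above.
clinics <- data.frame(
  SystemID = c("A", "A", "A", "A", "B", "B", "C", "C"),
  CountyID = c(1L, 2L, 3L, 4L, 1L, 2L, 5L, 4L),
  ClinicID = c("A1", "A2", "A3", "A4", "B1", "B2", "C5", "C4"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE if each level of `inner` appears in exactly
# one level of `outer`.
is_nested <- function(inner, outer) {
  all(tapply(outer, inner, function(x) length(unique(x))) == 1)
}

clinic_in_county <- is_nested(clinics$ClinicID, clinics$CountyID)
clinic_in_system <- is_nested(clinics$ClinicID, clinics$SystemID)
system_in_county <- is_nested(clinics$SystemID, clinics$CountyID)
county_in_system <- is_nested(clinics$CountyID, clinics$SystemID)

# Cross-tabulation of systems by counties, to see which combinations occur.
system_by_county <- table(System = clinics$SystemID, County = clinics$CountyID)
print(system_by_county)

# Number of distinct County:Clinic combinations versus distinct clinics.
n_county_clinic <- nlevels(interaction(clinics$CountyID, clinics$ClinicID, drop = TRUE))
n_clinic <- length(unique(clinics$ClinicID))
```

In this example each clinic sits in exactly one county and one system, while systems and counties are only partially crossed (neither is nested in the other). The last two lines show that, with unique ClinicID labels, the County:Clinic grouping has the same levels as ClinicID alone.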
Side note: Unfortunately, I'm modeling zero-inflated data with glmmTMB, and the wrapper mentioned for multiple-membership specification of random effects is only for lme4. But I also don't think this is multiple membership, because according to the answer here: "So to give a definition of multiple membership, I would say this occurs when the lowest level units "belong" to more than one upper-level unit."
In my case, we just have more than one grouping factor, and each clinic belongs to exactly one level of each of the two (one county and one health system).
New update: I realized that I need covariates for the lowest level or else there won't be variation within the county clusters. So I've been working on collecting that data.
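
Here is a small base R sketch of how I plan to combine the two levels of covariates. The covariate names `county_x` and `clinic_x` and all of their values are made-up placeholders, not real data; only the CountyID and ClinicID pairs come from the example table.

```r
# County and clinic IDs from the example table.
clinics <- data.frame(
  CountyID = c(1L, 2L, 3L, 4L, 1L, 2L, 5L, 4L),
  ClinicID = c("A1", "A2", "A3", "A4", "B1", "B2", "C5", "C4"),
  stringsAsFactors = FALSE
)

# Placeholder covariates (hypothetical values for illustration only).
county_covs <- data.frame(CountyID = 1:5,
                          county_x = c(0.10, 0.20, 0.30, 0.40, 0.50))
clinic_covs <- data.frame(ClinicID = clinics$ClinicID,
                          clinic_x = c(1, 2, 3, 4, 5, 6, 7, 8),
                          stringsAsFactors = FALSE)

# Attach county-level covariates by county, then clinic-level ones by clinic.
dat <- merge(merge(clinics, county_covs, by = "CountyID"),
             clinic_covs, by = "ClinicID")

# Count distinct covariate values within each county.
distinct_per_county <- function(x, county) {
  tapply(x, county, function(v) length(unique(v)))
}
county_x_varies <- any(distinct_per_county(dat$county_x, dat$CountyID) > 1)
clinic_x_varies <- any(distinct_per_county(dat$clinic_x, dat$CountyID) > 1)
```

As expected, the county-level covariate is constant within each county, while the clinic-level covariate can differ between clinics in the same county.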