Analysing counting data on R

Question

This is a follow-up to a previous question, where I explained that I have data on ~2000 people with repeated measurements over multiple years between 2000 and 2022 (some people have data for the full period, others only for a subset of these years). Within a single year, each person falls into exactly one of four groups: 0, 1, 2, or 3. Thanks to the answers to that question, I can now count the number of times each person changes group within their sampling period using this code:

df %>% count(ID, wt = diff(CultGroup) != 0)

This is a subset of the data for the first 20 people sampled:

structure(list(ID = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 6, 6, 6, 7, 7, 8, 9, 9, 9, 9, 9, 8, 8, 10, 10, 10, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 12, 12, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 15, 15, 15, 15, 15, 15, 15, 15, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 17, 17, 17, 17, 17, 17, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 19, 19, 19, 19, 20, 20, 20), CultGroup = c(1, 1, 1, 1, 1, 1, 3, 3, 3, 1, 3, 3, 0, 1, 3, 3, 3, 3, 3, 3, 3, 0, 0, 0, 0, 0, 2, 0, 0, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 3, 1, 1, 1, 3, 1, 0, 2, 0, 0, 1, 2, 1, 0, 2, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 3, 0, 0, 0, 0, 0, 3, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 2, 2, 3, 3, 3, 3, 3, 3, 1, 0, 0, 3, 0, 3, 3, 2, 2, 3, 2, 3, 3, 3, 0, 0, 0, 0, 0, 0, 3, 3, 3, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 3, 3, 0, 0, 0, 0, 0, 1, 1), Year = c(2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2001, 2002, 2003, 2004, 2001, 2002, 2003, 2004, 2005, 2007, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2001, 2002, 2002, 2003, 2004, 2009, 2010, 2011, 2009, 2010, 2011, 2012, 2013, 2020, 2021, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2006, 2007, 2001, 2002, 2003, 2004, 2005, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2022, 2009, 2011, 2012, 2013, 2014, 2015, 2017, 2018, 2019, 2020, 2001, 2002, 2003, 2004, 2005, 2007, 2008, 2011, 2002, 2003, 2004, 2005, 2006, 
2007, 2008, 2009, 2010, 2011, 2012, 2013, 2016, 2017, 2018, 2019, 2020, 2021, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2010, 2011, 2012, 2013, 2013, 2014, 2015)), row.names = c(NA, -170L), class = c("tbl_df", "tbl", "data.frame"))

However, I now want to know more about the nature of these changes. For each person, I would like to know whether the changes mostly go in one direction from one group to another (e.g. 1 to 2), or whether there is a lot of back-and-forth (e.g. from group 1 to 2 and back to 1 again). Is there a good way to plot or otherwise visualise the changes in grouping for each person? And are there any statistics that would be advisable for quantifying the nature of these changes?

Thanks!

This looks... complicated and better suited for cross validated. Context is important, the causes aren't in the data.
– user2974951, Commented May 18, 2022 at 10:06
Hi! Thank you for your comment. I am new to this sort of analysis, what do you mean by cross-validated? The groupings refer to different bacterial organisms that were cultured by a microbiology lab within that year. However, the causes of the changes are not known, so I am attempting to dig into the nature of the changes to learn more about them.
– Micaela Mossop, Commented May 18, 2022 at 10:38

Wimpel · Accepted Answer · 2022-05-18 11:07:47Z


A simple visualisation option:

library(tidyverse)

# mydata is the data frame from the question's structure(...) output
ggplot(data = mydata, aes(x = Year, y = CultGroup)) +
  geom_col() +
  facet_wrap(~ID, ncol = 5)
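To put numbers on which changes occur, one option (not part of the original answer, just a sketch) is to tabulate, for each person, the group in one observation against the group in that person's next observation. The example below uses base R only and a toy subset (IDs 1 and 2, copied from the question's data). It assumes the "next observation" is simply the next recorded year, so gaps between years are ignored.

```r
# Toy subset copied from the question's data: IDs 1 and 2 only.
mydata <- data.frame(
  ID        = c(rep(1, 12), rep(2, 4)),
  CultGroup = c(1, 1, 1, 1, 1, 1, 3, 3, 3, 1, 3, 3,  0, 1, 3, 3),
  Year      = c(2010:2021, 2001:2004)
)

# Make sure rows are in time order within each person.
mydata <- mydata[order(mydata$ID, mydata$Year), ]

# Pair every observation with the following row, keeping only pairs
# where both rows belong to the same person.
n       <- nrow(mydata)
same_id <- mydata$ID[-1] == mydata$ID[-n]
groups  <- 0:3
from <- factor(mydata$CultGroup[-n][same_id], levels = groups)
to   <- factor(mydata$CultGroup[-1][same_id], levels = groups)

# Rows = group in one observation, columns = group in the next one.
# The diagonal counts "no change"; off-diagonal cells count changes.
transitions <- table(from = from, to = to)
print(transitions)

# Row-wise proportions: given the current group, where does a person go next?
# Rows for groups that never occur as a starting point come out as NaN.
print(round(prop.table(transitions, margin = 1), 2))
```

On the full data, a table dominated by a few off-diagonal cells would suggest changes mostly in particular directions, while roughly symmetric pairs of cells (e.g. 1 to 3 and 3 to 1) would point towards back-and-forth switching. Whether any formal test on such a table is appropriate depends on the study design, so treat it as descriptive.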


2 Comments

user2974951 Over a year ago

So what does this visual tell you? What is the interpretation? How did you treat the repeated measurements?

Wimpel Over a year ago

@user2974951 There are no repeated measurements, only one measurement each year. Each bar just indicates which CultGroup the ID is in for that year. This way, you can visually see changes back and forth between CultGroups. No more, no less. So ID == 11 stays in the same group (1) in all years, while ID == 1 switches a couple of times between groups 1 and 3.
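The back-and-forth pattern described above can also be counted per person. The following is a rough sketch, not something from the thread: `summarise_switches` is a hypothetical helper, and a "return" is defined here (as an assumption) as a change that goes back to the group held immediately before the previous one, i.e. A to B to A. The sequences for IDs 1, 2 and 11 are copied from the question's data.

```r
# CultGroup sequences for IDs 1, 2 and 11, copied from the question's data
# (already in year order).
seqs <- list(
  `1`  = c(1, 1, 1, 1, 1, 1, 3, 3, 3, 1, 3, 3),
  `2`  = c(0, 1, 3, 3),
  `11` = rep(1, 19)
)

# Hypothetical helper: summarise one person's sequence of groups.
summarise_switches <- function(x) {
  runs <- rle(x)$values   # collapse consecutive identical values into spells
  k <- length(runs)
  n_changes <- k - 1      # each new spell is one change of group
  # A "return" is a change back to the group held before the previous
  # spell, i.e. the pattern A -> B -> A.
  n_returns <- if (k >= 3) sum(runs[3:k] == runs[1:(k - 2)]) else 0
  c(changes = n_changes, returns = n_returns)
}

# One row per ID, with the number of changes and the number of returns.
result <- t(sapply(seqs, summarise_switches))
print(result)
```

For these three people this gives 3 changes and 2 returns for ID 1, 2 changes and 0 returns for ID 2, and no changes for ID 11. A high share of returns among a person's changes suggests oscillation between two groups rather than a one-way move.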
