
I have a potentially very stupid question, but I can't seem to find an easy solution. I'm pretty new to R, so please forgive my ignorance.

I'm looking for a way to loop through all the variables in my data frame, for instance to make two-way tables of every variable against one specific variable (say, Sex or educational level). I used to work with Stata, but since R is free, I am now supposed to work with R (I heard there are a plethora of other benefits to working with R as well, so I am very willing to learn :)).

Say I have 20 variables: 15 are survey answers and 5 are demographic variables. I would like to see how the answers differ across demographic groups.

Normally I would tackle this in Stata with something as simple as:

forvalues i = 1/5 {
    forvalues j = 1/3 {
        tab Sex Var`i'_`j', chi2
    }
}

This produces 15 tables, one for each of Var1_1 to Var5_3 against Sex, each with a Pearson chi2 statistic.

So I tried what I thought was the equivalent in R:

for (i in 1:5) { for (j in 1:3){ print(table(chisq.test(paste(df$Sex, "df$Var",i,"_",j,sep="")))) } } 

but this doesn't work.

Can anyone please point me in the right direction as how to solve this? Any help is highly appreciated!

  • You can use summary(df) or lapply(df, table). The first gives a summary of the data.frame in which numeric variables are summarized by their minimum, quartiles, median, mean and maximum, and categorical (factor) variables by a table of counts. The second gives a list of frequency tables, one per variable. Commented Oct 2, 2019 at 8:59
  • You really need to study help("$"). It explains when you can use $ and when to use [] and [[]] instead. In general, approaches that work well in one language do not necessarily transfer well to another, and this is such a case. Commented Oct 2, 2019 at 9:22
  • Thanks, I'll read up on that and try again. I also edited my question a bit, since my example seems poorly chosen (the first comment already shows how to get similar results another way). Commented Oct 2, 2019 at 9:40
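Following the comment about help("$"), the Stata loop can be translated almost literally into R by building each column name as a string with paste0() and looking it up with df[[name]]. The $ operator cannot do this, because df$name looks for a column literally called "name" rather than using the string stored in the variable. This is a minimal sketch, assuming the columns are named Var1_1 to Var5_3 and that df has a Sex column; the random data frame at the top is made up purely for illustration.

```r
# Hypothetical example data: a Sex column plus 15 answer columns Var1_1 ... Var5_3
set.seed(1)
n <- 200
df <- data.frame(Sex = sample(c("Female", "Male"), n, replace = TRUE))
for (i in 1:5) {
  for (j in 1:3) {
    df[[paste0("Var", i, "_", j)]] <-
      sample(c("Agree", "Neutral", "Disagree"), n, replace = TRUE)
  }
}

# Rough equivalent of the Stata loop: one cross-table plus a Pearson
# chi-squared test for every answer variable against Sex
results <- list()
for (i in 1:5) {
  for (j in 1:3) {
    name <- paste0("Var", i, "_", j)    # build the column name as a string
    tab  <- table(df$Sex, df[[name]])   # [[ ]] looks the column up from the string
    cat("\n==", name, "vs Sex ==\n")
    print(tab)
    results[[name]] <- chisq.test(tab)  # test of independence on the two-way table
    print(results[[name]])
  }
}
```

Because the test objects are kept in a named list, individual results can be inspected afterwards, for example results$Var2_3$p.value.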

1 Answer


Assuming df is your data frame and its first 15 columns are the survey answers, you can use:

lapply(df[,1:15], function(x) {chisq.test(x, df$Sex)}) 
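The lapply() call above returns a list of full test objects. If a compact overview is more useful, the same idea can be extended to pull the statistic, degrees of freedom and p-value for each answer column into one data frame. This is a sketch under the same assumption (the first 15 columns are the answers and Sex is a column of df); the data are randomly generated for illustration only. It also passes correct = FALSE, because chisq.test() applies Yates' continuity correction to 2x2 tables by default, whereas Stata's tab ..., chi2 reports the uncorrected Pearson statistic.

```r
# Hypothetical example data: 15 yes/no answer columns (X1 ... X15) followed by Sex
set.seed(42)
n <- 150
df <- data.frame(
  matrix(sample(c("Yes", "No"), n * 15, replace = TRUE), ncol = 15),
  Sex = sample(c("Female", "Male"), n, replace = TRUE)
)

# One Pearson chi-squared test per answer column, without Yates' correction
tests <- lapply(df[, 1:15], function(x) chisq.test(x, df$Sex, correct = FALSE))

# Collect the key numbers into a single data frame, sorted by p-value
summary_tab <- data.frame(
  variable  = names(tests),
  statistic = sapply(tests, function(res) unname(res$statistic)),
  dof       = sapply(tests, function(res) unname(res$parameter)),
  p.value   = sapply(tests, function(res) res$p.value),
  row.names = NULL
)
summary_tab <- summary_tab[order(summary_tab$p.value), ]
print(summary_tab)
```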