R for loop for calculating sums based on a data frame's different columns

Question

My current data frame looks like this:

# Create sample data
my_df <- data.frame(seq(1, 100),
                    rep(c("ind_1", "", "", ""), times = 25),
                    rep(c("", "ind_2", "", ""), times = 25),
                    rep(c("", "", "ind_3", ""), times = 25),
                    rep(c("", "", "", "ind_4"), times = 25))
# Rename columns
names(my_df)[names(my_df)=="seq.1..100."] <- "value"
names(my_df)[names(my_df)=="rep.c..ind_1................times...25."] <- "ind_1"
names(my_df)[names(my_df)=="rep.c......ind_2............times...25."] <- "ind_2"
names(my_df)[names(my_df)=="rep.c..........ind_3........times...25."] <- "ind_3"
names(my_df)[names(my_df)=="rep.c..............ind_4....times...25."] <- "ind_4"
# Replace empty elements with NA
my_df[my_df==''] = NA

What I want to script is a rather simple for loop that calculates, for each of the four ind_* columns, the sum of the value column over the rows where that column is filled in, and prints the result.

So far my very meagre attempt has been:

# Create a vector with all individuals
individuals <- c("ind_1", "ind_2", "ind_3", "ind_4")
# Calculate aggregates for each individual
for (i in individuals){
  ind <- 1
  sum_i <- aggregate(value~ind_1, data = my_df, sum)
  print(paste("Individual", i, "possesses an aggregated value of", sum_i$value))
  ind <- ind + 1
}

As you can see, I am struggling to get the aggregate command to calculate the sum for one column after another; as written, it naturally only calculates the result for ind_1. What needs to change in the aggregate command to achieve the desired result? (I'm a total beginner, but I thought of using indices to move from one column to the next.)
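One way to make the aggregate() call follow the loop variable is to build the formula from the column name with base R's reformulate(), so that each pass uses value ~ ind_1, value ~ ind_2, and so on. The sketch below assumes the columns are already named value and ind_1 to ind_4 (it rebuilds the sample data with those names so it runs on its own); the vector `sums` is an illustrative addition that keeps each result.

```r
# Sample data with the column names already set (NA instead of empty strings)
my_df <- data.frame(
  value = 1:100,
  ind_1 = rep(c("ind_1", NA, NA, NA), times = 25),
  ind_2 = rep(c(NA, "ind_2", NA, NA), times = 25),
  ind_3 = rep(c(NA, NA, "ind_3", NA), times = 25),
  ind_4 = rep(c(NA, NA, NA, "ind_4"), times = 25)
)

individuals <- c("ind_1", "ind_2", "ind_3", "ind_4")
sums <- numeric(length(individuals))  # illustrative: keeps each result

for (k in seq_along(individuals)) {
  i <- individuals[k]
  # Build the formula value ~ <current column> from the column name string
  f <- reformulate(i, response = "value")
  # The formula method of aggregate() drops rows where the grouping column is NA
  sum_i <- aggregate(f, data = my_df, FUN = sum)
  sums[k] <- sum_i$value
  print(paste("Individual", i, "possesses an aggregated value of", sum_i$value))
}
```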

First hint: try colnames(my_df) <- c("value", "ind_1", "ind_2", "ind_3", "ind_4") (Roman, commented Oct 23, 2017 at 9:51)
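The hint can be sketched as follows: build the sample data exactly as in the question, then overwrite all auto-generated column names in a single colnames() assignment instead of the four names()[...] lookups, which depend on how R mangles the deparsed expressions.

```r
# Build the sample data as in the question (R generates long column names)
my_df <- data.frame(seq(1, 100),
                    rep(c("ind_1", "", "", ""), times = 25),
                    rep(c("", "ind_2", "", ""), times = 25),
                    rep(c("", "", "ind_3", ""), times = 25),
                    rep(c("", "", "", "ind_4"), times = 25))

# Replace every auto-generated name in one step, by position
colnames(my_df) <- c("value", "ind_1", "ind_2", "ind_3", "ind_4")

# Replace empty strings with NA
my_df[my_df == ""] <- NA
```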

lactea · Accepted Answer · 2017-10-23 10:00:24Z


Assuming you'd want to sum value over the rows where an ind_* column matches the corresponding entry of your individuals vector:

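individuals <- c("ind_1", "ind_2", "ind_3", "ind_4")
# For each individual, sum value over the rows where its column holds its own name
# (which() drops the NA results produced by the empty rows)
for (i in seq_along(individuals)){
  print(sum(my_df$value[which(my_df[, individuals[i]] == individuals[i])]))
}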

Why do you want to use print() instead of storing the results in a separate vector?


2 Comments

T. K.

I preferred printing for no particular reason. What would be needed to store the results in a separate vector inside the loop?

lactea

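There are two options. You can create a vector beforehand and fill it with the respective results, using your i as the index, or you can append to it while going through the loop (e.g. with c(); cbind() also works but returns a matrix rather than a vector). The latter option is significantly slower because the object is reallocated on every iteration; however, this does not matter for short vectors.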
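Both options can be sketched like this, reusing the which() test from the answer. The sample data is rebuilt with named columns so the block runs on its own; the names `sums_prealloc` and `sums_grown` are illustrative.

```r
my_df <- data.frame(
  value = 1:100,
  ind_1 = rep(c("ind_1", NA, NA, NA), times = 25),
  ind_2 = rep(c(NA, "ind_2", NA, NA), times = 25),
  ind_3 = rep(c(NA, NA, "ind_3", NA), times = 25),
  ind_4 = rep(c(NA, NA, NA, "ind_4"), times = 25)
)
individuals <- c("ind_1", "ind_2", "ind_3", "ind_4")

# Option 1: preallocate a vector of the right length, then fill it by index i
sums_prealloc <- numeric(length(individuals))
for (i in seq_along(individuals)) {
  rows <- which(my_df[[individuals[i]]] == individuals[i])
  sums_prealloc[i] <- sum(my_df$value[rows])
}
names(sums_prealloc) <- individuals

# Option 2: start empty and append with c() (copies the vector each time)
sums_grown <- numeric(0)
for (ind in individuals) {
  rows <- which(my_df[[ind]] == ind)
  sums_grown <- c(sums_grown, sum(my_df$value[rows]))
}
names(sums_grown) <- individuals

print(sums_prealloc)
```

Preallocation is the usual recommendation in R; for four elements the difference is negligible.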

Roman · Accepted Answer · 2017-10-23 10:56:45Z

You can try the tidyverse as well:

library(dplyr)
library(tidyr)

my_df %>%
  gather(key, Inds, -value) %>%
  filter(!is.na(Inds)) %>%
  group_by(key) %>%
  summarise(Sum = sum(value))

# A tibble: 4 x 2
  key     Sum
  <chr> <int>
1 ind_1  1225
2 ind_2  1250
3 ind_3  1275
4 ind_4  1300

The idea is to make the data long using gather(), filter out the NAs, then group by key and summarise the values.
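The same reshape, filter, group and summarise idea can be sketched in plain base R without any packages. This is a swapped-in alternative, not the answer's code: the long table is built by hand with rep() and unlist(), and tapply() does the grouped sum. The names `ind_cols`, `long` and `sums` are illustrative.

```r
my_df <- data.frame(
  value = 1:100,
  ind_1 = rep(c("ind_1", NA, NA, NA), times = 25),
  ind_2 = rep(c(NA, "ind_2", NA, NA), times = 25),
  ind_3 = rep(c(NA, NA, "ind_3", NA), times = 25),
  ind_4 = rep(c(NA, NA, NA, "ind_4"), times = 25)
)
ind_cols <- setdiff(names(my_df), "value")

# Long format: one row per (original row, ind_* column) pair, as gather() makes.
# unlist() stacks the columns one after another, matching rep(..., each = nrow).
long <- data.frame(
  value = rep(my_df$value, times = length(ind_cols)),
  key   = rep(ind_cols, each = nrow(my_df)),
  Inds  = unlist(my_df[ind_cols], use.names = FALSE)
)

# Drop the NA entries, then sum value within each key
long <- long[!is.na(long$Inds), ]
sums <- tapply(long$value, long$key, sum)
print(sums)
```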

A solution closer to base R, using reshape2's melt() for the reshaping and base aggregate() for the sums, would be:

library(reshape2)
my_df_long <- melt(my_df, id.vars = "value", value.name = "ID")
aggregate(value ~ ID, my_df_long, sum, na.rm = TRUE)

     ID value
1 ind_1  1225
2 ind_2  1250
3 ind_3  1275
4 ind_4  1300

The second solution taught me that there was no need for a loop at all; thank you for pointing me to the respective package.
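Along the same no-loop lines, a fully vectorised base R sketch (an alternative, not taken from either answer) turns the ind_* columns into a TRUE/FALSE presence matrix, multiplies it by value (recycled down the rows of each column), and adds up each column with colSums(). It assumes value is the first column; `present` and `sums` are illustrative names.

```r
my_df <- data.frame(
  value = 1:100,
  ind_1 = rep(c("ind_1", NA, NA, NA), times = 25),
  ind_2 = rep(c(NA, "ind_2", NA, NA), times = 25),
  ind_3 = rep(c(NA, NA, "ind_3", NA), times = 25),
  ind_4 = rep(c(NA, NA, NA, "ind_4"), times = 25)
)

# Logical matrix: TRUE where the individual is present in that row
present <- !is.na(my_df[, -1])

# value has one entry per row, so it is recycled down each column;
# colSums() then adds up value over the TRUE rows of every ind_* column
sums <- colSums(present * my_df$value)
print(sums)
```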
