1

My current data frame looks like this:

    # Create sample data
    my_df <- data.frame(seq(1, 100),
                        rep(c("ind_1", "", "", ""), times = 25),
                        rep(c("", "ind_2", "", ""), times = 25),
                        rep(c("", "", "ind_3", ""), times = 25),
                        rep(c("", "", "", "ind_4"), times = 25))

    # Rename columns
    names(my_df)[names(my_df)=="seq.1..100."] <- "value"
    names(my_df)[names(my_df)=="rep.c..ind_1................times...25."] <- "ind_1"
    names(my_df)[names(my_df)=="rep.c......ind_2............times...25."] <- "ind_2"
    names(my_df)[names(my_df)=="rep.c..........ind_3........times...25."] <- "ind_3"
    names(my_df)[names(my_df)=="rep.c..............ind_4....times...25."] <- "ind_4"

    # Replace empty elements with NA
    my_df[my_df == ""] <- NA

What I want to script is a rather simple for loop that calculates the sum of the value column for each of the four ind_* columns and prints the result.

So far my very meagre attempt has been:

    # Create a vector with all individuals
    individuals <- c("ind_1", "ind_2", "ind_3", "ind_4")

    # Calculate aggregates for each individual
    for (i in individuals){
      ind <- 1
      sum_i <- aggregate(value~ind_1, data = my_df, sum)
      print(paste("Individual", i, "possesses an aggregated value of", sum_i$value))
      ind <- ind + 1
    }

As you can see, I am struggling to make the command calculate the sum for one column after another: because ind_1 is hard-coded in the formula, the current output naturally only shows the result for ind_1. What needs to be changed in the aggregate command to achieve the desired result? (I'm a total beginner, but I thought of using indices to move from one column to the next.)

    First hint: try colnames(my_df) <- c("value", "ind_1", "ind_2", "ind_3", "ind_4"). Commented Oct 23, 2017 at 9:51

2 Answers

4

Assuming you'd want to calculate the sum wherever an ind column matches the corresponding entry in your individuals vector:

    individuals <- c("ind_1", "ind_2", "ind_3", "ind_4")

    for (i in 1:(ncol(my_df)-1)){
      print(sum(my_df$value[which(my_df[,individuals[i]] == individuals[i])]))
    }

Why do you want to use print() instead of storing the results in a separate vector?
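If you would rather keep the aggregate() call from the question, one possible adjustment is to build the formula from the loop variable instead of hard-coding ind_1. The following is only a sketch: it assumes a compact reconstruction of my_df with already-named columns (not the exact construction from the question) and uses reformulate() from the stats package to create value ~ ind_1, value ~ ind_2, and so on.

```r
# Compact stand-in for the question's data (assumption: same values, columns named directly)
my_df <- data.frame(
  value = 1:100,
  ind_1 = rep(c("ind_1", NA, NA, NA), times = 25),
  ind_2 = rep(c(NA, "ind_2", NA, NA), times = 25),
  ind_3 = rep(c(NA, NA, "ind_3", NA), times = 25),
  ind_4 = rep(c(NA, NA, NA, "ind_4"), times = 25),
  stringsAsFactors = FALSE
)

individuals <- c("ind_1", "ind_2", "ind_3", "ind_4")

for (i in individuals) {
  # reformulate(i, response = "value") yields the formula value ~ <current column>
  f <- reformulate(i, response = "value")
  # Rows where the current column is NA are dropped by aggregate's default na.action
  sum_i <- aggregate(f, data = my_df, FUN = sum)
  print(paste("Individual", i, "possesses an aggregated value of", sum_i$value))
}
```

The ind counter from the original attempt is not needed here, since i already holds the column name.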


2 Comments

I preferred printing for no particular reason. What would be needed to store them as a separate vector inside the loop?
There are two options. You can create a vector beforehand and fill it with the respective data using your i as the index, or you can cbind() while going through the loop. The latter option is significantly slower, but that does not matter for short vectors.
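The first option from the comment above (a vector created beforehand and filled by index) might look roughly like this. It is a sketch, not the commenter's own code: the my_df construction is a compact stand-in for the question's data, and the name sums is made up for illustration.

```r
# Compact stand-in for the question's data (assumption: same values, columns named directly)
my_df <- data.frame(
  value = 1:100,
  ind_1 = rep(c("ind_1", NA, NA, NA), times = 25),
  ind_2 = rep(c(NA, "ind_2", NA, NA), times = 25),
  ind_3 = rep(c(NA, NA, "ind_3", NA), times = 25),
  ind_4 = rep(c(NA, NA, NA, "ind_4"), times = 25),
  stringsAsFactors = FALSE
)

individuals <- c("ind_1", "ind_2", "ind_3", "ind_4")

# Preallocate the result vector (hypothetical name) and label its elements
sums <- numeric(length(individuals))
names(sums) <- individuals

for (i in seq_along(individuals)) {
  col <- my_df[[individuals[i]]]
  # Sum value over the rows where the current ind column is filled in
  sums[i] <- sum(my_df$value[!is.na(col)])
}

sums
```

Preallocating avoids copying the result on every iteration, which is why it scales better than growing an object inside the loop.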
3

You can try tidyverse as well:

    library(tidyverse)

    my_df %>%
      gather(key, Inds, -value) %>%
      filter(!is.na(Inds)) %>%
      group_by(key) %>%
      summarise(Sum = sum(value))

    # A tibble: 4 x 2
      key     Sum
      <chr> <int>
    1 ind_1  1225
    2 ind_2  1250
    3 ind_3  1275
    4 ind_4  1300

The idea is to make the data long using gather, filter the NAs out, then group by key (the original column name) and summarise the values.
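In newer tidyr versions (1.0.0 and later) gather() is superseded by pivot_longer(), so the same idea could be sketched as follows. The my_df construction is a compact stand-in for the question's data, and result is just an illustrative name.

```r
library(dplyr)
library(tidyr)

# Compact stand-in for the question's data (assumption: same values, columns named directly)
my_df <- data.frame(
  value = 1:100,
  ind_1 = rep(c("ind_1", NA, NA, NA), times = 25),
  ind_2 = rep(c(NA, "ind_2", NA, NA), times = 25),
  ind_3 = rep(c(NA, NA, "ind_3", NA), times = 25),
  ind_4 = rep(c(NA, NA, NA, "ind_4"), times = 25),
  stringsAsFactors = FALSE
)

result <- my_df %>%
  # Reshape every column except value into key/Inds pairs, dropping NA entries directly
  pivot_longer(-value, names_to = "key", values_to = "Inds", values_drop_na = TRUE) %>%
  group_by(key) %>%
  summarise(Sum = sum(value))

result
```

Here values_drop_na = TRUE takes over the role of the separate filter(!is.na(Inds)) step.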

A solution closer to base R (it still needs the reshape2 package for melt) would be:

    library(reshape2)

    my_df_long <- melt(my_df, id.vars = "value", value.name = "ID")
    aggregate(value ~ ID, my_df_long, sum, na.rm = TRUE)

         ID value
    1 ind_1  1225
    2 ind_2  1250
    3 ind_3  1275
    4 ind_4  1300
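If you want to avoid extra packages altogether, a loop-free variant in plain base R is also possible. This is only a sketch under the same assumptions as before: my_df is a compact stand-in for the question's data, and col_sums is a made-up name.

```r
# Compact stand-in for the question's data (assumption: same values, columns named directly)
my_df <- data.frame(
  value = 1:100,
  ind_1 = rep(c("ind_1", NA, NA, NA), times = 25),
  ind_2 = rep(c(NA, "ind_2", NA, NA), times = 25),
  ind_3 = rep(c(NA, NA, "ind_3", NA), times = 25),
  ind_4 = rep(c(NA, NA, NA, "ind_4"), times = 25),
  stringsAsFactors = FALSE
)

individuals <- c("ind_1", "ind_2", "ind_3", "ind_4")

# For each ind column, sum value over the rows where that column is not NA
col_sums <- sapply(my_df[individuals], function(col) sum(my_df$value[!is.na(col)]))

col_sums
# ind_1 ind_2 ind_3 ind_4
#  1225  1250  1275  1300
```

The result is a named vector, so a single sum can be pulled out with col_sums["ind_2"].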

1 Comment

The second solution taught me that there was no need for a loop at all. Thank you for pointing me to the respective package.
