How to group by multiple columns in dataframe using R and do aggregate function

Question

I have a dataframe with columns as defined below. I have provided one set of example, similar to this I have many countries with loan amount and gender variables

 country loan_amount gender 1 Austia 175 F 2 Austia 100 F 3 Austia 825 M 4 Austia 175 F 5 Austia 1025 M 6 Austia 225 F

Here I need to group by countries and then for each country, I need to calculate loan percentage by gender in new columns, so that new columns will have male percentage of total loan amount for that country and female percentage of total loan amount for that country. I need to do two group_by function, first to group all countries together and after that group genders to calculate loan percent.

 Total loan amount = 2525 female_prcent = 175+100+175+225/2525 = 26.73 male_percent = 825+1025/2525 = 73.26

The output should be as below:

 country female_percent male_percent 1 Austia 26.73 73.26

I am trying to do this in R. I tried the below function, but my R session is not producing any result and it is terminating.

 df %>% group_by(country, gender) %>% summarise_each(funs(sum))

Could someone help me in achieving this output? I think this can be achieved using dplyr function, but I am struck inbetween.

Please accept one of the answers as correct, if it helped solve your problem. — Drj
– Drj, Commented Jun 20, 2016 at 11:33

Cabana · Accepted Answer · 2016-06-05 17:03:27Z

library(dplyr) library(tidyr) df %>% group_by(country, gender) %>% summarise(total_loan_amount =sum(loan_amount)) %>% spread(gender, total_loan_amount) %>% ungroup() %>% transmute(country = country, female_percent = F / (F+M), male_percent = M /(F+M))

results in

 Source: local data frame [1 x 3] country female_percent male_percent (fctr) (dbl) (dbl) 1 Austia 0.2673267 0.7326733

Drj · Accepted Answer · 2016-06-05 16:13:20Z

I am sure there are better ways of doing it. Below is my simplistic take.

library(dplyr); library(reshape2) Summary <- df %>% group_by(country, gender) %>% summarise(Net = sum(loan_amount)) final <- recast(Summary, country~gender, id.var = c("country", "gender")) final <- mutate(final, F_percent = final$F/(final$F+final$M), M_percent = final$M/(final$F+final$M) )

Naming the columns with better names and retaining or dropping certain columns should now be easy.

David Bloom · Accepted Answer · 2019-03-08 06:41:36Z

I know this is an old post, but just thought I'd share my solution, which I think is a bit cleaner.

library(tidyverse) df <- data.frame(stringsAsFactors=FALSE, country = c("Austia", "Austia", "Austia", "Austia", "Austia", "Austia"), loan_amount = c(175, 100, 825, 175, 1025, 225), gender = c("F", "F", "M", "F", "M", "F") ) df %>% group_by(country) %>% summarize(female_percent = sum(loan_amount[gender=="F"]) / sum(loan_amount), male_percent = 1-female_percent)

Alan Gómez · Accepted Answer · 2023-07-11 01:03:31Z

Also is possible to do the task using a very reliable and trusted R base functions:

df <- data.frame(stringsAsFactors=FALSE, country = c("Austia", "Austia", "Austia", "Austia", "Austia", "Austia"), loan_amount = c(175, 100, 825, 175, 1025, 225), gender = c("F", "F", "M", "F", "M", "F") ) attach(df)

df2 <- aggregate(loan_amount ~ country + gender, FUN = function(x) {sum(x)/sum(loan_amount)})

Finally reshape de new data frame:

attach(df2) reshape(df2, direction = "wide", idvar = "country", timevar = "gender")

Outputs

 country gender loan_amount 1 Austia F 0.2673267 2 Austia M 0.7326733 country loan_amount.F loan_amount.M 1 Austia 0.2673267 0.7326733

Stack Exchange Network

How to group by multiple columns in dataframe using R and do aggregate function

4 Answers 4

Hot Network Questions

How to group by multiple columns in dataframe using R and do aggregate function

4 Answers 4

Related

Hot Network Questions