How to replace NAs with the column mean and then multiply columns of different lengths in R?

Question

My question might not be very clear, so here is an example.

My final goal is to produce

final=(df1$a*df2$b)+(df1$a*df3$c*df4$d)+(df4$d*df5$e)

I have five data frames (one column each) with different lengths as follows:

df1

   a
1  1
2  2
3  4
4  2

df2

   b
1  2
2  6

df3

   c
1  2
2  4
3  3

df4

   d
1  1
2  2
3  4
4  3

df5

   e
1  4
2  6
3  2

So I want a final data frame that includes them all, as follows:

finaldf

   a  b  c  d  e
1  1  2  2  1  4
2  2  6  4  2  6
3  4 NA  3  4  2
4  2 NA NA  3 NA

I want the NAs in each column to be replaced with the mean of that column, so that all columns of finaldf have the same length:

finaldf

   a  b  c  d  e
1  1  2  2  1  4
2  2  6  4  2  6
3  4  4  3  4  2
4  2  4  3  3  4

so that I can then compute final=(df1$a*df2$b)+(df1$a*df3$c*df4$d)+(df4$d*df5$e) as I need.

user1357015 · Accepted Answer · 2022-01-17 21:49:15Z

The easiest approach by far is to use the qpcR, dplyr and tidyr packages.

library(dplyr)
library(qpcR)
library(tidyr)

df1 <- data.frame(a = c(1, 2, 4, 2))
df2 <- data.frame(b = c(2, 6))
df3 <- data.frame(c = c(2, 4, 3))
df4 <- data.frame(d = c(1, 2, 4, 3))
df5 <- data.frame(e = c(4, 6, 2))

mydf <- qpcR:::cbind.na(df1, df2, df3, df4, df5) %>%
  tidyr::replace_na(., as.list(colMeans(., na.rm = TRUE)))

> mydf
  a b c d e
1 1 2 2 1 4
2 2 6 4 2 6
3 4 4 3 4 2
4 2 4 3 3 4

Depending on your rgl settings, you might need to run the following at the top of your script to make the qpcR package load (see https://stackoverflow.com/a/66127391/2554330 ):

options(rgl.useNULL = TRUE)
library(rgl)
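The code above stops at the filled data frame. As a minimal sketch of the last step, assuming the filled data frame matches the printed mydf output (it is re-typed by hand here so the snippet runs without qpcR), the expression from the question can be evaluated element-wise with with():

```r
# Filled data frame, copied by hand from the printed output above
# (NAs already replaced by the column means).
mydf <- data.frame(
  a = c(1, 2, 4, 2),
  b = c(2, 6, 4, 4),
  c = c(2, 4, 3, 3),
  d = c(1, 2, 4, 3),
  e = c(4, 6, 2, 4)
)

# with() looks up a, b, c, d, e as columns of mydf, so the
# arithmetic is applied row by row over the four rows.
final <- with(mydf, (a * b) + (a * c * d) + (d * e))
final
# [1]  8 40 72 38
```

Because all columns now have the same length, no recycling takes place in the multiplication.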

GuedesBF · Accepted Answer · 2022-01-17 22:35:29Z

With purrr and dplyr (plus tidyr for replace_na), we can first put all data frames in a list with mget(). Second, use set_names to replace the data frame names with their respective column names. Third, extract each data frame's single column as a vector with pluck. Then pad the shorter vectors with NAs so that all vectors have the same length. Finally, bind the vectors back into a data frame with as.data.frame, and use mutate with across and replace_na to fill each column's NAs with that column's mean.

library(dplyr)
library(purrr)
library(tidyr)  # provides replace_na()

mget(ls(pattern = 'df\\d')) %>%
  set_names(map_chr(., colnames)) %>%
  map(pluck, 1) %>%
  map(., `length<-`, max(lengths(.))) %>%
  as.data.frame %>%
  mutate(across(everything(), ~replace_na(.x, mean(.x, na.rm = TRUE))))
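For comparison, the same pad-then-fill steps can be sketched in base R without extra packages. The helper name pad_and_fill is hypothetical (it does not come from any package), and the column list is written out by hand here rather than collected with mget():

```r
df1 <- data.frame(a = c(1, 2, 4, 2))
df2 <- data.frame(b = c(2, 6))
df3 <- data.frame(c = c(2, 4, 3))
df4 <- data.frame(d = c(1, 2, 4, 3))
df5 <- data.frame(e = c(4, 6, 2))

# Pull each single column out as a plain vector, keyed by column name.
cols <- list(a = df1$a, b = df2$b, c = df3$c, d = df4$d, e = df5$e)

# Target length: the longest of the input vectors.
n <- max(lengths(cols))

# Hypothetical helper: pad x to length n with NA, then replace
# those NAs with the mean of the observed values.
pad_and_fill <- function(x, n) {
  length(x) <- n                        # extending a vector pads it with NA
  x[is.na(x)] <- mean(x, na.rm = TRUE)  # fill NAs with the column mean
  x
}

finaldf <- as.data.frame(lapply(cols, pad_and_fill, n = n))
finaldf
```

This is only one way to write it; the tidyverse pipeline above does the same work in a single chain.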
