0

Want to do Bootstrapping while comparing two dataframe column wise with the different number of rows.

I have two dataframe in which row represent values from experiments and column with the dataset names (data1, data2, data3, data4)

emp.data1 <- data.frame( data1 = c(234,0,34,0,46,0,0,0,2.26,0, 5,8,93,56), data2 = c(1.40,1.21,0.83,1.379,2.60,9.06,0.88,1.16,0.64,8.28, 5,8,93,56), data3 =c(0,34,43,0,0,56,0,0,0,45,5,8,93,56), data4 =c(45,0,545,34,0,35,0,35,0,534, 5,8,93,56), stringsAsFactors = FALSE ) emp.data2 <- data.frame( data1 = c(45, 0, 0, 45, 45, 53), data2 = c(23, 0, 45, 12, 90, 78), data3 = c(72, 45, 756, 78, 763, 98), data4 = c(1, 3, 65, 78, 9, 45), stringsAsFactors = FALSE ) 

I am trying to do bootstrapping(n=1000). Values are selected at random replacement from emp.data1(14 * 4) without change in the emp.data2(6 * 4). For example from emp.data2 first column (data1) select 6 values colSum and from emp.data1(data1) select 6 random non zero values colSum Divide the values and store in temp repeat the same 1000 times and take a median value et the end. like this i want to do it for each column of the dataframe. sample code I am providing which is working fine but i am not able get the non-zero random values for emp.data1

nboot <- 1e3 boot_temp_emp<- c() n_data1 <- nrow(emp.data1); n_data2 <- nrow(emp.data2) for (j in seq_len(nboot)) { boot <- sample(x = seq_len(n_data1), size = n_data2, replace = TRUE) value <- colSums(emp.data2)/colSums(emp.data1[boot,]) boot_temp_emp <- rbind(boot_temp_emp, value) } boot_data<- apply(boot_temp_emp, 2, median) 

From the above script i am able get the output but each column emp.data1[boot,] data has zero values and taken sum. I want indivisual ramdomly selected non-zero values column sum so I tried below script not able remove zero values. Not able get desired output please some one help me to correct my script

nboot <- 1e3 boot_temp_emp<- c() for (i in colnames(emp.data2)){ for (j in seq_len(nboot)){ data1=emp.data1[i] data2=emp.data2[i] n_data1 <- nrow(data1); n_data2 <- nrow(data2) boot <- sample(x = seq_len(n_data1), size = n_data2, replace = TRUE) value <- colSums(data2[i])/colSums(data1[boot, ,drop = FALSE]) boot_temp_emp <- rbind(boot_temp_emp, value) } } boot_data<- apply(boot_temp_emp, 2, median) 

Thank you

8
  • 1
    There seem to be too many missing words to understand this request. If English is not your native tongue then perhaps have a colleague review this to help fill in the gaps. Commented Feb 24, 2022 at 19:04
  • @IRTFM I have edited my lines please check Commented Feb 24, 2022 at 19:12
  • Appears you want a bootstrapped estimate of the median of ... something. What is desired to be used to obtain your neo-samples is still unclear to my reading. Commented Feb 24, 2022 at 19:30
  • @IRTFM yes I want median values Commented Feb 24, 2022 at 19:31
  • You want the median of the combined values of two columns? Commented Feb 24, 2022 at 19:32

1 Answer 1

1

Here is a solution.
Write a function to make the code clearer. This function takes the following arguments.

  1. x the input data.frame emp.data1;
  2. s2 the columns sums of emp.data2;
  3. n = 6 the number of vector elements to sample from emp.data1's columns with a default value of 6.

The create a results matrix, pre-compute the column sums of emp.data2 and call the function in a loop.

boot_fun <- function(x, s2, n = 6){ # the loop makes sure ther is no divide by zero nrx <- nrow(x) repeat{ i <- sample(nrx, n, replace = TRUE) s1 <- colSums(x[i, ]) if(all(s1 != 0)) break } s2/s1 } set.seed(2022) nboot <- 1e3 sums2 <- colSums(emp.data2) results <- matrix(nrow = nboot, ncol = ncol(emp.data1)) for(i in seq_len(nboot)){ results[i, ] <- boot_fun(emp.data1, sums2) } ratios_medians <- apply(results, 2, median) old_par <- par(mfrow = c(2, 2)) for(j in 1:4) { main <- paste0("data", j) hist(results[, j], main = main, xlab = "ratios", freq = FALSE) abline(v = ratios_medians[j], col = "blue", lty = "dashed") } par(old_par) 

Created on 2022-02-24 by the reprex package (v2.0.1)


Edit

Following the comments here is a revised version of the bootstrap function. It makes sure there are no zeros in the sampled vectors, before computing their sums.

boot_fun2 <- function(x, s2, n = 6){ nrx <- nrow(x) ncx <- ncol(x) s1 <- numeric(ncx) for(j in seq.int(ncx)) { repeat{ i <- sample(nrx, n, replace = TRUE) if(all(x[i, j] != 0)) { s1[j] <- sum(x[i, j]) break } } } s2/s1 } set.seed(2022) nboot <- 1e3 sums2 <- colSums(emp.data2) results2 <- matrix(nrow = nboot, ncol = ncol(emp.data1)) for(i in seq_len(nboot)){ results2[i, ] <- boot_fun2(emp.data1, sums2) } ratios_medians2 <- apply(results2, 2, median) old_par <- par(mfrow = c(2, 2)) for(j in 1:4) { main <- paste0("data", j) hist(results2[, j], main = main, xlab = "ratios", freq = FALSE) abline(v = ratios_medians2[j], col = "blue", lty = "dashed") } par(old_par) 

Created on 2022-02-27 by the reprex package (v2.0.1)

Sign up to request clarification or add additional context in comments.

7 Comments

Thanks for the code, if I dont have the fixed n=6 because emp.data2 also contains some zero values is it possible to consider nonzero values and make n=? is there way for each column n will change
@Luckysardar You can change the value of n but it will be the same for all columns. Why do you want to change this? As long as the denominators are not zero, all is well, no?
Okay thanks for the suggestion If just want change median instade of colSums can you just suggest how can I change i am getting some error if I pass median. I need both observation colSum and median
there is one confusion in function boot_fun I wanted s1 values should be non zero if n=6 emp.data1$data1 6 values should be nonzero before doing sum same for column data2, data3, data4
@Luckysardar The instruction if(all(s1 != 0)) break guarantees that s1 values are all non zero. That's why the function has a repeat loop. As for the median, in the problem description you say you want to bootstarp the ratios of colSums, then compute the medians of the bootstrapped values. Isn't this right? If it is, then the apply after the for loop already does that.
|

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.