I have a data frame:

DF <- data.frame(Value = c("AB", "BC", "CD", "DE", "EF", "FG", "GH", "HI", "IJ", "JK", "KL", "LM"), ID = c(1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1)) 

My question: I would like to create a new column that contains a binary random number ('0' or '1') for the rows where 'ID' == 1, with fixed counts (a pre-defined prevalence) rather than probabilities: for example, '0' twice and '1' four times across the six rows with 'ID' == 1.

EDIT I: Without the restriction to specific cases, a solution might be:

DF$RANDOM[sample(1:nrow(DF), nrow(DF), FALSE)] <- rep(RANDOM, c(nrow(DF)-4,4)) 

But I still need the case-specific assignment, AND the solution above does not explicitly refer to '0' or '1'.

(Note: The variable 'Value' is not relevant to the question; it is only an identifier.)

I found relevant posts on stratified sampling and random row selection, but neither those nor other posts cover this question.

Thank you VERY much in advance.

2 Answers

1

You can restrict the assignment to the rows where ID == 1. To guarantee an exact number of 1s and 0s, build the pool of values with rep() and draw from it with sample() using replace = FALSE.
Here's a solution.

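library(data.table)

setDT(DF)      # convert the data.frame to a data.table by reference
set.seed(121)  # for reproducibility
# Shuffle exactly two 0s and four 1s across the rows where ID == 1
DF[ID == 1, new_column := sample(rep(c(0, 1), c(2, 4)), .N, replace = FALSE)]
print(DF)

    Value ID new_column
 1:    AB  1          1
 2:    BC  0         NA
 3:    CD  0         NA
 4:    DE  1          1
 5:    EF  0         NA
 6:    FG  1          1
 7:    GH  1          1
 8:    HI  0         NA
 9:    IJ  0         NA
10:    JK  1          0
11:    KL  0         NA
12:    LM  1          0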
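For readers who prefer not to load data.table, the same idea can be sketched in base R. This is only an illustration of the approach described in this answer, not part of the original answer; the seed and the length check are assumptions added for the example.

```r
# Rebuild the example data so the sketch is self-contained
DF <- data.frame(
  Value = c("AB", "BC", "CD", "DE", "EF", "FG", "GH", "HI", "IJ", "JK", "KL", "LM"),
  ID = c(1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1)
)

set.seed(121)  # arbitrary seed, only for reproducibility

cases <- which(DF$ID == 1)         # row positions of the cases
stopifnot(length(cases) == 2 + 4)  # the pool must match the number of cases

DF$new_column <- NA_real_          # non-cases stay NA
# sample() on a vector of length > 1 returns a random permutation of it
DF$new_column[cases] <- sample(rep(c(0, 1), times = c(2, 4)))
```

Because the pool is fixed in advance, the counts (two 0s, four 1s) do not depend on the seed; only their positions among the cases do.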

6 Comments

Yes, I know. BUT: I need a solution without a prior subset, AND you worked with probabilities (I need fixed proportions, not probabilities).
I've made a few changes. After calling set.seed, the prob parameter will always generate the same number of 1s and 0s. Since you want to generate 1 and 0 randomly, such that 1 occurs 4 times and 0 occurs 2 times, that's how it's going to work.
Thanks. But what about the 'prob' part (you computed the proportion of the provided numbers)? I would like to have the fixed numbers. Finally, when you create an additional column (e.g., DF_1[ID == 1, new_column_additional := sample(c(0,1), .N, replace = T, prob = c(0.33,0.67))] ), the result is wrong.
I've added the fix by using rep instead of probabilities. Please check.
GREAT. This works as anticipated. Thanks for your fast replies and your help. Good solution.
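The difference discussed in these comments can be checked with a small simulation. This is a hedged sketch, not code from the thread: the seed, the number of repetitions, and the variable names are assumptions chosen for illustration.

```r
set.seed(1)  # arbitrary seed

n_cases <- 6  # number of rows with ID == 1 in the example data

# Probability-based draw: values are drawn independently,
# so the number of 1s differs from one draw to the next
prob_counts <- replicate(
  1000,
  sum(sample(c(0, 1), n_cases, replace = TRUE, prob = c(1/3, 2/3)))
)

# Fixed-count draw: a shuffled pool of two 0s and four 1s
# always contains exactly four 1s
fixed_counts <- replicate(
  1000,
  sum(sample(rep(c(0, 1), times = c(2, 4))))
)

table(prob_counts)   # spread over several values
table(fixed_counts)  # a single value
```

With prob, the 1/3 to 2/3 split holds only on average; shuffling a pool built with rep() enforces it in every draw.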
0
library(dplyr)

DF <- data.frame(Value = c("AB", "BC", "CD", "DE", "EF", "FG", "GH", "HI", "IJ", "JK", "KL", "LM"),
                 ID = c(1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1),
                 stringsAsFactors = FALSE)

DF %>%
  group_by(ID) %>%
  sample_n(4, replace = FALSE)

3 Comments

Your example does not work since you do not refer to the defined cases.
I edited my answer. Is this what you were looking for?
Thanks. No, since your example is not case-specific.
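For comparison, a case-specific variant in dplyr might look like the sketch below. It is not what this answer proposed; it assumes, as in the question, that exactly six rows have ID == 1 and that two 0s and four 1s are wanted. The column name new_column and the seed are illustrative choices.

```r
library(dplyr)

DF <- data.frame(
  Value = c("AB", "BC", "CD", "DE", "EF", "FG", "GH", "HI", "IJ", "JK", "KL", "LM"),
  ID = c(1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1),
  stringsAsFactors = FALSE
)

set.seed(121)  # arbitrary seed, only for reproducibility

DF <- DF %>%
  mutate(new_column = replace(
    rep(NA_real_, n()),                    # start with NA for every row
    ID == 1,                               # positions to fill: the cases
    sample(rep(c(0, 1), times = c(2, 4)))  # shuffled pool with fixed counts
  ))
```

Rows with ID == 0 keep NA, and no rows are dropped or reordered, which is where this differs from sample_n().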
