
Suppose I have a dataset of 90,000 rows by 17 columns (n x p, where n is the number of observations and p is the number of variables). How can I take a random sample of 20% of the rows from the whole dataset in R?

After taking the random sample, I will perform cluster analysis on it.

I looked at other questions on this topic, but their answers did not give me what I needed.

3
  • 1
    sample() with repeated sampling could help Commented Mar 5, 2019 at 14:33
  • 4
    df[sample(nrow(df), nrow(df)*0.2),] Commented Mar 5, 2019 at 14:34
  • 2
    Remember to fix seed set.seed(1492) (or any number) in order to obtain reproducibility of your sample! Commented Mar 5, 2019 at 14:35

1 Answer

6

You can do it with sample_frac from dplyr. Here is an example using the built-in iris dataset:

 library(dplyr)
 #data(iris)
 sample20 <- iris %>% sample_frac(0.2)
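
If you would rather not depend on dplyr, the approach suggested in the comments (sample() on the row indices, with a fixed seed) can be sketched in base R as follows. The data frame df here is a made-up stand-in with the same shape as the one in the question, and the names n_sample and idx are only illustrative:

```r
# Hypothetical stand-in for the 90,000 x 17 dataset from the question
set.seed(1492)  # any fixed seed makes the sample reproducible
df <- as.data.frame(matrix(rnorm(90000 * 17), nrow = 90000, ncol = 17))

# Number of rows that corresponds to 20% of the data
n_sample <- round(0.2 * nrow(df))

# Draw that many row indices without replacement, then subset the rows
idx <- sample(nrow(df), size = n_sample)
sample20 <- df[idx, ]

dim(sample20)
```

Because sample() draws without replacement by default, no row appears twice in sample20. In dplyr 1.0.0 and later, slice_sample(prop = 0.2) is the newer equivalent of sample_frac(0.2).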