Replicating a data.frame according to a vector of lengths

Question

I have a data.frame:

set.seed(1)
short.df <- data.frame(id=letters[1:10],name=LETTERS[1:10])

I want to replicate each row a number of times given by a vector whose length equals nrow(short.df):

lengths <- sample(10000,10,replace=FALSE)

This takes too long for my real data size:

long.df <- do.call(rbind,lapply(1:length(lengths),function(x) data.frame(id=rep(short.df$id[x],lengths[x]),name=rep(short.df$name[x],lengths[x]))))

Any way to do it faster?

Rich Scriven · Accepted Answer · 2017-01-14 23:08:28Z

You can replicate the rows by using rep() in the i argument of [.data.frame.

long.df <- short.df[rep(1:nrow(short.df), lengths), ]

Check:

identical(nrow(long.df), sum(lengths)) # [1] TRUE
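
To see why this works, here is a minimal sketch on a three-row toy data frame. The names small.df, times, idx and small.long are made up for illustration and are not from the question. When rep() is given a vector as its times argument, it repeats each element the corresponding number of times, and indexing rows with the result copies each row that often (a count of 0 drops the row).

```r
# Hypothetical toy data, not from the question
small.df <- data.frame(id = c("a", "b", "c"),
                       name = c("A", "B", "C"),
                       stringsAsFactors = FALSE)
times <- c(2, 0, 3)

# rep() with a vector `times` repeats each index that many times
idx <- rep(seq_len(nrow(small.df)), times)
idx
# [1] 1 1 3 3 3

# Row indexing with repeated indices replicates the rows
small.long <- small.df[idx, ]
small.long$id
# [1] "a" "a" "c" "c" "c"
```

seq_len(nrow(x)) is used here instead of 1:nrow(x) only because it also behaves sensibly for a zero-row data frame; for the data in the question both give the same result.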

The new row names may not be desirable, but those are easy to change.
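
For example, one common way to reset them is to assign NULL to the row names, which restores the default 1, 2, 3, ... numbering. The sketch below uses a small stand-in data frame (df and long are hypothetical names) rather than the data from the question.

```r
# Toy data standing in for short.df and lengths (assumed values)
df <- data.frame(id = c("a", "b"),
                 name = c("A", "B"),
                 stringsAsFactors = FALSE)
long <- df[rep(seq_len(nrow(df)), c(2, 1)), ]

# Repeated rows get de-duplicated row names
rownames(long)
# [1] "1"   "1.1" "2"

# Assigning NULL resets row names to the default sequence
rownames(long) <- NULL
rownames(long)
# [1] "1" "2" "3"
```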
