187

The following code combines a vector with a dataframe:

    newrow = c(1:4)
    existingDF = rbind(existingDF,newrow)

However, this code always inserts the new row at the end of the dataframe.

How can I insert the row at a specified point within the dataframe? For example, let's say the dataframe has 20 rows; how can I insert the new row between rows 10 and 11?

  • Use a convenient index and sort? Commented Jul 19, 2012 at 13:27
  • existingDF = rbind(existingDF[1:10,],newrow,existingDF[-(1:10),]) Commented Jul 19, 2012 at 13:31
  • With a simple loop and a condition if needed, rows can be appended from one dataframe into another. Sample code: newdataframe[nrow(newdataframe)+1,] <- existingdataframe[i,] Commented May 8, 2016 at 12:56

6 Answers

179

Here's a solution that avoids the (often slow) rbind call:

    existingDF <- as.data.frame(matrix(seq(20),nrow=5,ncol=4))
    r <- 3
    newrow <- seq(4)
    insertRow <- function(existingDF, newrow, r) {
      existingDF[seq(r+1,nrow(existingDF)+1),] <- existingDF[seq(r,nrow(existingDF)),]
      existingDF[r,] <- newrow
      existingDF
    }

    > insertRow(existingDF, newrow, r)
      V1 V2 V3 V4
    1  1  6 11 16
    2  2  7 12 17
    3  1  2  3  4
    4  3  8 13 18
    5  4  9 14 19
    6  5 10 15 20

If speed is less important than clarity, then @Simon's solution works well:

    existingDF <- rbind(existingDF[1:r,],newrow,existingDF[-(1:r),])

    > existingDF
       V1 V2 V3 V4
    1   1  6 11 16
    2   2  7 12 17
    3   3  8 13 18
    4   1  2  3  4
    41  4  9 14 19
    5   5 10 15 20

(Note that r is indexed differently here: insertRow makes the new row row r, whereas this rbind call places it after row r.)
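One edge case of the rbind one-liner is r = 0, because `1:0` evaluates to `c(1, 0)` rather than an empty index. A minimal sketch of a guarded variant follows; the helper name `insertAfter` is made up for illustration and is not part of the answer. It assumes `newrow` has one value per column.

```r
# Hypothetical helper (not from the answer): insert `newrow` after row `r`.
# r = 0 puts the new row first; r = nrow(df) puts it last.
insertAfter <- function(df, newrow, r) {
  n <- nrow(df)
  stopifnot(r >= 0, r <= n)
  # One-row data frame with the same column names, so rbind matches columns.
  newdf <- setNames(as.data.frame(as.list(newrow)), names(df))
  top    <- df[seq_len(r), , drop = FALSE]          # rows 1..r (none if r = 0)
  bottom <- df[r + seq_len(n - r), , drop = FALSE]  # rows r+1..n (none if r = n)
  out <- rbind(top, newdf, bottom)
  rownames(out) <- NULL                             # renumber rows 1..n+1
  out
}

existingDF <- as.data.frame(matrix(seq(20), nrow = 5, ncol = 4))
first  <- insertAfter(existingDF, seq(4), 0)  # new row comes first
middle <- insertAfter(existingDF, seq(4), 3)  # new row becomes row 4
last   <- insertAfter(existingDF, seq(4), 5)  # new row comes last
```

Using `seq_len()` instead of `1:r` keeps both slices empty when they should be, and resetting the row names avoids the duplicated "41" label seen in the output above.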

And finally, benchmarks:

    library(microbenchmark)
    microbenchmark(
      rbind(existingDF[1:r,],newrow,existingDF[-(1:r),]),
      insertRow(existingDF,newrow,r)
    )

    Unit: microseconds
                                                        expr     min       lq   median       uq       max
    1                       insertRow(existingDF, newrow, r) 660.131 678.3675 695.5515 725.2775   928.299
    2 rbind(existingDF[1:r, ], newrow, existingDF[-(1:r), ]) 801.161 831.7730 854.6320 881.6560 10641.417

Benchmarks

As @MatthewDowle always points out to me, benchmarks need to be examined for the scaling as the size of the problem increases. Here we go then:

    library(microbenchmark)
    library(reshape2)  # melt(); assumed source, it names the melted columns Var1/Var2
    library(ggplot2)

    benchmarkInsertionSolutions <- function(nrow=5,ncol=4) {
      existingDF <- as.data.frame(matrix(seq(nrow*ncol),nrow=nrow,ncol=ncol))
      r <- 3 # Row to insert into
      newrow <- seq(ncol)
      m <- microbenchmark(
        rbind(existingDF[1:r,],newrow,existingDF[-(1:r),]),
        insertRow(existingDF,newrow,r),
        insertRow2(existingDF,newrow,r)
      )
      # Now return the median times
      mediansBy <- by(m$time,m$expr, FUN=median)
      res <- as.numeric(mediansBy)
      names(res) <- names(mediansBy)
      res
    }

    nrows <- 5*10^(0:5)
    benchmarks <- sapply(nrows,benchmarkInsertionSolutions)
    colnames(benchmarks) <- as.character(nrows)
    ggplot( melt(benchmarks), aes(x=Var2,y=value,colour=Var1) ) +
      geom_line() + scale_x_log10() + scale_y_log10()

@Roland's solution scales quite well, even with the call to rbind (median times in nanoseconds; columns are the number of rows):

                                                                  5       50     500    5000    50000     5e+05
    insertRow2(existingDF, newrow, r)                      549861.5 579579.0  789452 2512926 46994560 414790214
    insertRow(existingDF, newrow, r)                       895401.0 905318.5 1168201 2603926 39765358 392904851
    rbind(existingDF[1:r, ], newrow, existingDF[-(1:r), ]) 787218.0 814979.0 1263886 5591880 63351247 829650894

Plotted on a linear scale:

[Figure: benchmark medians, linear scale]

And a log-log scale:

[Figure: benchmark medians, log-log scale]


6 Comments

Inserting a row at the end gives weird behaviour!
@Maarten With which function?
I guess it's the same weird behaviour I'm describing here: stackoverflow.com/questions/19927806/…
The weird behaviour does not occur with insertRow2, in my particular data frame and row.
How do you just add a row of numbers to a df? I have df with columns a,b,c,d and I want to add the row 1,2,3,4. How do I do that?
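The comments above do not show the data involved, so the following is only a plausible explanation: when r is nrow(existingDF) + 1, `seq(r+1, nrow(existingDF)+1)` counts downward instead of being empty, so insertRow shifts rows it should leave alone and produces an extra NA row. A minimal sketch of a guarded variant, under the made-up name `insertRowSafe`, skips the shift when appending:

```r
# Hypothetical guarded variant of insertRow (name invented for illustration).
insertRowSafe <- function(existingDF, newrow, r) {
  n <- nrow(existingDF)
  stopifnot(r >= 1, r <= n + 1)
  if (r <= n) {
    # Shift rows r..n down by one. Skipped when appending, because
    # seq(n + 2, n + 1) would count downward instead of being empty.
    existingDF[seq(r + 1, n + 1), ] <- existingDF[seq(r, n), ]
  }
  existingDF[r, ] <- newrow
  existingDF
}

existingDF <- as.data.frame(matrix(seq(20), nrow = 5, ncol = 4))
atEnd    <- insertRowSafe(existingDF, seq(4), 6)  # appended as row 6
inMiddle <- insertRowSafe(existingDF, seq(4), 3)  # same result as insertRow
```

The r = n + 1 branch also covers the last comment's case of simply adding a row of numbers: it reduces to `existingDF[nrow(existingDF) + 1, ] <- newrow`.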
48
    insertRow2 <- function(existingDF, newrow, r) {
      existingDF <- rbind(existingDF,newrow)
      existingDF <- existingDF[order(c(1:(nrow(existingDF)-1),r-0.5)),]
      row.names(existingDF) <- 1:nrow(existingDF)
      return(existingDF)
    }

    insertRow2(existingDF,newrow,r)
      V1 V2 V3 V4
    1  1  6 11 16
    2  2  7 12 17
    3  1  2  3  4
    4  3  8 13 18
    5  4  9 14 19
    6  5 10 15 20

    microbenchmark(
    +   rbind(existingDF[1:r,],newrow,existingDF[-(1:r),]),
    +   insertRow(existingDF,newrow,r),
    +   insertRow2(existingDF,newrow,r)
    + )
    Unit: microseconds
                                                        expr     min       lq   median       uq      max
    1                       insertRow(existingDF, newrow, r) 513.157 525.6730 531.8715 544.4575 1409.553
    2                      insertRow2(existingDF, newrow, r) 430.664 443.9010 450.0570 461.3415  499.988
    3 rbind(existingDF[1:r, ], newrow, existingDF[-(1:r), ]) 606.822 625.2485 633.3710 653.1500 1489.216

2 Comments

This is a cool solution. Still can't figure out why it's so much faster than the simultaneous call to rbind, but I'm intrigued.
Answers with benchmarks should have some extra reputation applied automatically IMO. Thanks!
46

The .before argument in tibble::add_row() can be used to specify the row.

    tibble::add_row(
      cars,
      speed = 0,
      dist = 0,
      .before = 3
    )
    #>   speed dist
    #> 1     4    2
    #> 2     4   10
    #> 3     0    0
    #> 4     7    4
    #> 5     7   22
    #> 6     8   16
    #> ...
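add_row() also accepts an .after argument, so the question's case of inserting between rows 10 and 11 can be sketched as follows. The data frame `df` and its column names are invented for illustration; the tibble package is assumed to be installed.

```r
library(tibble)

# Hypothetical 20-row data frame standing in for the question's existingDF.
df <- data.frame(V1 = 1:20, V2 = 21:40)

# Insert a new row between rows 10 and 11; values are given by column name.
res <- add_row(df, V1 = 0L, V2 = 0L, .after = 10)

res[10:12, ]  # old row 10, the new row, old row 11
```

Because values are matched by column name, any column left out of the call is filled with NA.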

3 Comments

Sometimes it is really worthwhile scrolling down a little. For me this is the best answer. Thanks!
Wow! So useful. I did not know about this dplyr function.
Just an FYI, add_row is imported into dplyr from the tibble package
10

You should try the dplyr package:

    library(dplyr)
    a <- data.frame(A = c(1, 2, 3, 4), B = c(11, 12, 13, 14))

    system.time({
      for (i in 50:1000) {
        b <- data.frame(A = i, B = i * i)
        a <- bind_rows(a, b)
      }
    })

Output

       user  system elapsed
       0.25    0.00    0.25

In contrast, using the rbind function:

    a <- data.frame(A = c(1, 2, 3, 4), B = c(11, 12, 13, 14))

    system.time({
      for (i in 50:1000) {
        b <- data.frame(A = i, B = i * i)
        a <- rbind(a, b)
      }
    })

Output

       user  system elapsed
       0.49    0.00    0.49

There is some performance gain: bind_rows takes about half the elapsed time here (0.25 s versus 0.49 s).


1

To insert a row after row 10 in a dataframe, we can use:

    library(berryFunctions)
    newrow = c(1:4)
    df <- insertRows(df, 10, new = newrow)


0

Data frames are actually lists of vectors. Hence we could use base::append, which adds elements to a vector, inside Map, which is pretty fast. Here is a raw version:

    appendRow <- \(data, x, before) {
      as.data.frame(Map(\(v, y, ...) append(v, y, before - 1L), data, x))
    }
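To make the mechanism concrete, here is a small sketch of what append() does to a single column; the vector `v` is invented for illustration.

```r
v <- c(10, 20, 30, 40)

# append() inserts the new value after the given position, so
# after = before - 1 places it before position `before`.
out <- append(v, -9, after = 2)
out
# [1] 10 20 -9 30 40

# A data frame is a list with one vector per column, which is why
# Map() can apply append() column by column.
is.list(mtcars)
# [1] TRUE
```

Since the result is rebuilt with as.data.frame(), the original row names are not carried over and rows are renumbered.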

Usage

    > appendRow(data=mtcars, x=rep_len(-9, ncol(mtcars)), before=3) |> head()
       mpg cyl disp  hp  drat     wt  qsec vs am gear carb
    1 21.0   6  160 110  3.90  2.620 16.46  0  1    4    4
    2 21.0   6  160 110  3.90  2.875 17.02  0  1    4    4
    3 -9.0  -9   -9  -9 -9.00 -9.000 -9.00 -9 -9   -9   -9
    4 22.8   4  108  93  3.85  2.320 18.61  1  1    4    1
    5 21.4   6  258 110  3.08  3.215 19.44  1  0    3    1
    6 18.7   8  360 175  3.15  3.440 17.02  0  0    3    2

Benchmark

Checked for equivalence, using Rscript --vanilla.

Based on 10M rows of mtcars:

    Unit: milliseconds
               expr       min        lq      mean    median        uq      max neval cld
          appendRow  384.0279  388.8217  715.8154  667.6382  893.8432 1324.993    10 a
         insertRow2 1930.2843 2163.2117 2215.9483 2217.6469 2224.6168 2540.156    10  b
     dplyr::add_row 3320.5795 3326.2406 3948.6887 3646.0097 4489.3242 5014.286    10   c

Based on 10 to 10M rows of mtcars:

[Figure: benchmark timings for 10 to 10M rows]
