Skip to main content
replaced http://stackoverflow.com/ with https://stackoverflow.com/
Source Link
URL Rewriter Bot
URL Rewriter Bot

The R package data.table provides both fast and memory efficient ordering of data.tables with a straightforward syntax (a part of which Matt has highlighted quite nicely in his answerin his answer). There has been quite a lot of improvements and also a new function setorder() since then. From v1.9.5+, setorder() also works with data.frames.

The R package data.table provides both fast and memory efficient ordering of data.tables with a straightforward syntax (a part of which Matt has highlighted quite nicely in his answer). There has been quite a lot of improvements and also a new function setorder() since then. From v1.9.5+, setorder() also works with data.frames.

The R package data.table provides both fast and memory efficient ordering of data.tables with a straightforward syntax (a part of which Matt has highlighted quite nicely in his answer). There has been quite a lot of improvements and also a new function setorder() since then. From v1.9.5+, setorder() also works with data.frames.

setorder works now with data.frames
Source Link
Arun
  • 119.2k
  • 28
  • 290
  • 396

The R package data.table provides both fast and memory efficient ordering of data.tables with a straightforward syntax (a part of which Matt has highlighted quite nicely in his answer). There has been quite a lot of improvements and also a new function setorder() since then. From v1.9.5+, setorder() also works with data.frames.

orderBy( ~ -z + b, data = dat) ## doBy plyr::arrange(dat, desc(z), b) ## plyr arrange(dat, desc(z), b) ## dplyr sort(dat, f = ~ -z + b) ## taRifx dat[with(dat, order(-z, b)), ] ## base R # convert to data.table, by reference setDT(dat) dat[order(-z, b)] ## data.table, base R like syntax setorder(dat, -z, b) ## data.table, using setorder()  ## setorder() now also works with data.frames    # R-session memory usage (BEFORE) = ~2GB (size of 'dat') # ------------------------------------------------------------ # Package function Time (s) Peak memory Memory used # ------------------------------------------------------------ # doBy orderBy 409.7 6.7 GB 4.7 GB # taRifx sort 400.8 6.7 GB 4.7 GB # plyr arrange 318.8 5.6 GB 3.6 GB # base R order 299.0 5.6 GB 3.6 GB # dplyr arrange 62.7 4.2 GB 2.2 GB # ------------------------------------------------------------ # data.table order 6.2 4.2 GB 2.2 GB # data.table setorder 4.5 2.4 GB 0.4 GB # ------------------------------------------------------------ 

The R package data.table provides both fast and memory efficient ordering of data.tables with a straightforward syntax (a part of which Matt has highlighted quite nicely in his answer). There has been quite a lot of improvements and also a new function setorder() since then.

orderBy( ~ -z + b, data = dat) ## doBy plyr::arrange(dat, desc(z), b) ## plyr arrange(dat, desc(z), b) ## dplyr sort(dat, f = ~ -z + b) ## taRifx dat[with(dat, order(-z, b)), ] ## base R # convert to data.table, by reference setDT(dat) dat[order(-z, b)] ## data.table, base R like syntax setorder(dat, -z, b) ## data.table, using setorder() # R-session memory usage (BEFORE) = ~2GB (size of 'dat') # ------------------------------------------------------------ # Package function Time (s) Peak memory Memory used # ------------------------------------------------------------ # doBy orderBy 409.7 6.7 GB 4.7 GB # taRifx sort 400.8 6.7 GB 4.7 GB # plyr arrange 318.8 5.6 GB 3.6 GB # base R order 299.0 5.6 GB 3.6 GB # dplyr arrange 62.7 4.2 GB 2.2 GB # ------------------------------------------------------------ # data.table order 6.2 4.2 GB 2.2 GB # data.table setorder 4.5 2.4 GB 0.4 GB # ------------------------------------------------------------ 

The R package data.table provides both fast and memory efficient ordering of data.tables with a straightforward syntax (a part of which Matt has highlighted quite nicely in his answer). There has been quite a lot of improvements and also a new function setorder() since then. From v1.9.5+, setorder() also works with data.frames.

orderBy( ~ -z + b, data = dat) ## doBy plyr::arrange(dat, desc(z), b) ## plyr arrange(dat, desc(z), b) ## dplyr sort(dat, f = ~ -z + b) ## taRifx dat[with(dat, order(-z, b)), ] ## base R # convert to data.table, by reference setDT(dat) dat[order(-z, b)] ## data.table, base R like syntax setorder(dat, -z, b) ## data.table, using setorder()  ## setorder() now also works with data.frames    # R-session memory usage (BEFORE) = ~2GB (size of 'dat') # ------------------------------------------------------------ # Package function Time (s) Peak memory Memory used # ------------------------------------------------------------ # doBy orderBy 409.7 6.7 GB 4.7 GB # taRifx sort 400.8 6.7 GB 4.7 GB # plyr arrange 318.8 5.6 GB 3.6 GB # base R order 299.0 5.6 GB 3.6 GB # dplyr arrange 62.7 4.2 GB 2.2 GB # ------------------------------------------------------------ # data.table order 6.2 4.2 GB 2.2 GB # data.table setorder 4.5 2.4 GB 0.4 GB # ------------------------------------------------------------ 
added 46 characters in body
Source Link
Arun
  • 119.2k
  • 28
  • 290
  • 396

The R package data.table provides both fast and memory efficient ordering of data.tables with a straightforward syntax (a part of which Matt has highlighted quite nicely in his answerin his answer). There has been quite a lot of improvements and also a new function setorder() since then.

require(plyr) require(doBy) require(data.table) require(dplyr) require(taRifx) set.seed(45L) dat = data.frame(b = as.factor(sample(c("Hi", "Med", "Low"), 1e8, TRUE)),   x = sample(c("A", "D", "C"), 1e8, TRUE),   y = sample(100, 1e8, TRUE),   z = sample(5, 1e8, TRUE),   stringsAsFactors = FALSE) 

The R package data.table provides both fast and memory efficient ordering of data.tables with a straightforward syntax (a part of which Matt has highlighted quite nicely in his answer). There has been quite a lot of improvements and also a new function setorder() since then.

require(plyr) require(doBy) require(data.table) require(dplyr) require(taRifx) set.seed(45L) dat = data.frame(b = as.factor(sample(c("Hi", "Med", "Low"), 1e8, TRUE)), x = sample(c("A", "D", "C"), 1e8, TRUE), y = sample(100, 1e8, TRUE), z = sample(5, 1e8, TRUE), stringsAsFactors = FALSE) 

The R package data.table provides both fast and memory efficient ordering of data.tables with a straightforward syntax (a part of which Matt has highlighted quite nicely in his answer). There has been quite a lot of improvements and also a new function setorder() since then.

require(plyr) require(doBy) require(data.table) require(dplyr) require(taRifx) set.seed(45L) dat = data.frame(b = as.factor(sample(c("Hi", "Med", "Low"), 1e8, TRUE)),   x = sample(c("A", "D", "C"), 1e8, TRUE),   y = sample(100, 1e8, TRUE),   z = sample(5, 1e8, TRUE),   stringsAsFactors = FALSE) 
Added base R to timings.
Source Link
Arun
  • 119.2k
  • 28
  • 290
  • 396
Loading
Source Link
Arun
  • 119.2k
  • 28
  • 290
  • 396
Loading