Revisions to Sort (order) data frame rows by multiple columns

replaced http://stackoverflow.com/ with https://stackoverflow.com/

edited May 23, 2017 at 10:31

URL Rewriter Bot

The R package data.table provides both fast and memory efficient ordering of data.tables with a straightforward syntax (a part of which Matt has highlighted quite nicely in his answer in his answer). There has been quite a lot of improvements and also a new function setorder() since then. From v1.9.5+, setorder() also works with data.frames.

The R package data.table provides both fast and memory efficient ordering of data.tables with a straightforward syntax (a part of which Matt has highlighted quite nicely in his answer). There has been quite a lot of improvements and also a new function setorder() since then. From v1.9.5+, setorder() also works with data.frames.

setorder works now with data.frames

Source Link

edited May 27, 2015 at 12:28

Arun

119.2k
28
290
396

The R package data.table provides both fast and memory efficient ordering of data.tables with a straightforward syntax (a part of which Matt has highlighted quite nicely in his answer). There has been quite a lot of improvements and also a new function setorder() since then. From v1.9.5+, setorder() also works with data.frames.

orderBy( ~ -z + b, data = dat) ## doBy plyr::arrange(dat, desc(z), b) ## plyr arrange(dat, desc(z), b) ## dplyr sort(dat, f = ~ -z + b) ## taRifx dat[with(dat, order(-z, b)), ] ## base R # convert to data.table, by reference setDT(dat) dat[order(-z, b)] ## data.table, base R like syntax setorder(dat, -z, b) ## data.table, using setorder()  ## setorder() now also works with data.frames    # R-session memory usage (BEFORE) = ~2GB (size of 'dat') # ------------------------------------------------------------ # Package function Time (s) Peak memory Memory used # ------------------------------------------------------------ # doBy orderBy 409.7 6.7 GB 4.7 GB # taRifx sort 400.8 6.7 GB 4.7 GB # plyr arrange 318.8 5.6 GB 3.6 GB # base R order 299.0 5.6 GB 3.6 GB # dplyr arrange 62.7 4.2 GB 2.2 GB # ------------------------------------------------------------ # data.table order 6.2 4.2 GB 2.2 GB # data.table setorder 4.5 2.4 GB 0.4 GB # ------------------------------------------------------------

The R package data.table provides both fast and memory efficient ordering of data.tables with a straightforward syntax (a part of which Matt has highlighted quite nicely in his answer). There has been quite a lot of improvements and also a new function setorder() since then.

orderBy( ~ -z + b, data = dat) ## doBy plyr::arrange(dat, desc(z), b) ## plyr arrange(dat, desc(z), b) ## dplyr sort(dat, f = ~ -z + b) ## taRifx dat[with(dat, order(-z, b)), ] ## base R # convert to data.table, by reference setDT(dat) dat[order(-z, b)] ## data.table, base R like syntax setorder(dat, -z, b) ## data.table, using setorder() # R-session memory usage (BEFORE) = ~2GB (size of 'dat') # ------------------------------------------------------------ # Package function Time (s) Peak memory Memory used # ------------------------------------------------------------ # doBy orderBy 409.7 6.7 GB 4.7 GB # taRifx sort 400.8 6.7 GB 4.7 GB # plyr arrange 318.8 5.6 GB 3.6 GB # base R order 299.0 5.6 GB 3.6 GB # dplyr arrange 62.7 4.2 GB 2.2 GB # ------------------------------------------------------------ # data.table order 6.2 4.2 GB 2.2 GB # data.table setorder 4.5 2.4 GB 0.4 GB # ------------------------------------------------------------

The R package data.table provides both fast and memory efficient ordering of data.tables with a straightforward syntax (a part of which Matt has highlighted quite nicely in his answer). There has been quite a lot of improvements and also a new function setorder() since then. From v1.9.5+, setorder() also works with data.frames.

orderBy( ~ -z + b, data = dat) ## doBy plyr::arrange(dat, desc(z), b) ## plyr arrange(dat, desc(z), b) ## dplyr sort(dat, f = ~ -z + b) ## taRifx dat[with(dat, order(-z, b)), ] ## base R # convert to data.table, by reference setDT(dat) dat[order(-z, b)] ## data.table, base R like syntax setorder(dat, -z, b) ## data.table, using setorder()  ## setorder() now also works with data.frames    # R-session memory usage (BEFORE) = ~2GB (size of 'dat') # ------------------------------------------------------------ # Package function Time (s) Peak memory Memory used # ------------------------------------------------------------ # doBy orderBy 409.7 6.7 GB 4.7 GB # taRifx sort 400.8 6.7 GB 4.7 GB # plyr arrange 318.8 5.6 GB 3.6 GB # base R order 299.0 5.6 GB 3.6 GB # dplyr arrange 62.7 4.2 GB 2.2 GB # ------------------------------------------------------------ # data.table order 6.2 4.2 GB 2.2 GB # data.table setorder 4.5 2.4 GB 0.4 GB # ------------------------------------------------------------

added 46 characters in body

Source Link

edited Mar 29, 2015 at 16:12

Arun

119.2k
28
290
396

The R package data.table provides both fast and memory efficient ordering of data.tables with a straightforward syntax (a part of which Matt has highlighted quite nicely in his answerin his answer). There has been quite a lot of improvements and also a new function setorder() since then.

require(plyr) require(doBy) require(data.table) require(dplyr) require(taRifx) set.seed(45L) dat = data.frame(b = as.factor(sample(c("Hi", "Med", "Low"), 1e8, TRUE)),   x = sample(c("A", "D", "C"), 1e8, TRUE),   y = sample(100, 1e8, TRUE),   z = sample(5, 1e8, TRUE),   stringsAsFactors = FALSE)

The R package data.table provides both fast and memory efficient ordering of data.tables with a straightforward syntax (a part of which Matt has highlighted quite nicely in his answer). There has been quite a lot of improvements and also a new function setorder() since then.

require(plyr) require(doBy) require(data.table) require(dplyr) require(taRifx) set.seed(45L) dat = data.frame(b = as.factor(sample(c("Hi", "Med", "Low"), 1e8, TRUE)), x = sample(c("A", "D", "C"), 1e8, TRUE), y = sample(100, 1e8, TRUE), z = sample(5, 1e8, TRUE), stringsAsFactors = FALSE)

The R package data.table provides both fast and memory efficient ordering of data.tables with a straightforward syntax (a part of which Matt has highlighted quite nicely in his answer). There has been quite a lot of improvements and also a new function setorder() since then.

require(plyr) require(doBy) require(data.table) require(dplyr) require(taRifx) set.seed(45L) dat = data.frame(b = as.factor(sample(c("Hi", "Med", "Low"), 1e8, TRUE)),   x = sample(c("A", "D", "C"), 1e8, TRUE),   y = sample(100, 1e8, TRUE),   z = sample(5, 1e8, TRUE),   stringsAsFactors = FALSE)

Added base R to timings.

Source Link

edited Mar 29, 2015 at 16:01

Arun

119.2k
28
290
396

Loading

Source Link

answered Mar 29, 2015 at 15:52

Arun

119.2k
28
290
396

Loading

Collectives™ on Stack Overflow

Return to Answer