The R package data.table provides both fast and memory efficient ordering of data.tables with a straightforward syntax (a part of which Matt has highlighted quite nicely in his answerin his answer). There has been quite a lot of improvements and also a new function setorder() since then. From v1.9.5+, setorder() also works with data.frames.
The R package data.table provides both fast and memory efficient ordering of data.tables with a straightforward syntax (a part of which Matt has highlighted quite nicely in his answer). There has been quite a lot of improvements and also a new function setorder() since then. From v1.9.5+, setorder() also works with data.frames.
The R package data.table provides both fast and memory efficient ordering of data.tables with a straightforward syntax (a part of which Matt has highlighted quite nicely in his answer). There has been quite a lot of improvements and also a new function setorder() since then. From v1.9.5+, setorder() also works with data.frames.
The R package data.table provides both fast and memory efficient ordering of data.tables with a straightforward syntax (a part of which Matt has highlighted quite nicely in his answer). There has been quite a lot of improvements and also a new function setorder() since then. From v1.9.5+, setorder() also works with data.frames.
orderBy( ~ -z + b, data = dat) ## doBy plyr::arrange(dat, desc(z), b) ## plyr arrange(dat, desc(z), b) ## dplyr sort(dat, f = ~ -z + b) ## taRifx dat[with(dat, order(-z, b)), ] ## base R # convert to data.table, by reference setDT(dat) dat[order(-z, b)] ## data.table, base R like syntax setorder(dat, -z, b) ## data.table, using setorder() ## setorder() now also works with data.frames # R-session memory usage (BEFORE) = ~2GB (size of 'dat') # ------------------------------------------------------------ # Package function Time (s) Peak memory Memory used # ------------------------------------------------------------ # doBy orderBy 409.7 6.7 GB 4.7 GB # taRifx sort 400.8 6.7 GB 4.7 GB # plyr arrange 318.8 5.6 GB 3.6 GB # base R order 299.0 5.6 GB 3.6 GB # dplyr arrange 62.7 4.2 GB 2.2 GB # ------------------------------------------------------------ # data.table order 6.2 4.2 GB 2.2 GB # data.table setorder 4.5 2.4 GB 0.4 GB # ------------------------------------------------------------ The R package data.table provides both fast and memory efficient ordering of data.tables with a straightforward syntax (a part of which Matt has highlighted quite nicely in his answer). There has been quite a lot of improvements and also a new function setorder() since then.
orderBy( ~ -z + b, data = dat) ## doBy plyr::arrange(dat, desc(z), b) ## plyr arrange(dat, desc(z), b) ## dplyr sort(dat, f = ~ -z + b) ## taRifx dat[with(dat, order(-z, b)), ] ## base R # convert to data.table, by reference setDT(dat) dat[order(-z, b)] ## data.table, base R like syntax setorder(dat, -z, b) ## data.table, using setorder() # R-session memory usage (BEFORE) = ~2GB (size of 'dat') # ------------------------------------------------------------ # Package function Time (s) Peak memory Memory used # ------------------------------------------------------------ # doBy orderBy 409.7 6.7 GB 4.7 GB # taRifx sort 400.8 6.7 GB 4.7 GB # plyr arrange 318.8 5.6 GB 3.6 GB # base R order 299.0 5.6 GB 3.6 GB # dplyr arrange 62.7 4.2 GB 2.2 GB # ------------------------------------------------------------ # data.table order 6.2 4.2 GB 2.2 GB # data.table setorder 4.5 2.4 GB 0.4 GB # ------------------------------------------------------------ The R package data.table provides both fast and memory efficient ordering of data.tables with a straightforward syntax (a part of which Matt has highlighted quite nicely in his answer). There has been quite a lot of improvements and also a new function setorder() since then. From v1.9.5+, setorder() also works with data.frames.
orderBy( ~ -z + b, data = dat) ## doBy plyr::arrange(dat, desc(z), b) ## plyr arrange(dat, desc(z), b) ## dplyr sort(dat, f = ~ -z + b) ## taRifx dat[with(dat, order(-z, b)), ] ## base R # convert to data.table, by reference setDT(dat) dat[order(-z, b)] ## data.table, base R like syntax setorder(dat, -z, b) ## data.table, using setorder() ## setorder() now also works with data.frames # R-session memory usage (BEFORE) = ~2GB (size of 'dat') # ------------------------------------------------------------ # Package function Time (s) Peak memory Memory used # ------------------------------------------------------------ # doBy orderBy 409.7 6.7 GB 4.7 GB # taRifx sort 400.8 6.7 GB 4.7 GB # plyr arrange 318.8 5.6 GB 3.6 GB # base R order 299.0 5.6 GB 3.6 GB # dplyr arrange 62.7 4.2 GB 2.2 GB # ------------------------------------------------------------ # data.table order 6.2 4.2 GB 2.2 GB # data.table setorder 4.5 2.4 GB 0.4 GB # ------------------------------------------------------------ The R package data.table provides both fast and memory efficient ordering of data.tables with a straightforward syntax (a part of which Matt has highlighted quite nicely in his answerin his answer). There has been quite a lot of improvements and also a new function setorder() since then.
require(plyr) require(doBy) require(data.table) require(dplyr) require(taRifx) set.seed(45L) dat = data.frame(b = as.factor(sample(c("Hi", "Med", "Low"), 1e8, TRUE)), x = sample(c("A", "D", "C"), 1e8, TRUE), y = sample(100, 1e8, TRUE), z = sample(5, 1e8, TRUE), stringsAsFactors = FALSE) The R package data.table provides both fast and memory efficient ordering of data.tables with a straightforward syntax (a part of which Matt has highlighted quite nicely in his answer). There has been quite a lot of improvements and also a new function setorder() since then.
require(plyr) require(doBy) require(data.table) require(dplyr) require(taRifx) set.seed(45L) dat = data.frame(b = as.factor(sample(c("Hi", "Med", "Low"), 1e8, TRUE)), x = sample(c("A", "D", "C"), 1e8, TRUE), y = sample(100, 1e8, TRUE), z = sample(5, 1e8, TRUE), stringsAsFactors = FALSE) The R package data.table provides both fast and memory efficient ordering of data.tables with a straightforward syntax (a part of which Matt has highlighted quite nicely in his answer). There has been quite a lot of improvements and also a new function setorder() since then.
require(plyr) require(doBy) require(data.table) require(dplyr) require(taRifx) set.seed(45L) dat = data.frame(b = as.factor(sample(c("Hi", "Med", "Low"), 1e8, TRUE)), x = sample(c("A", "D", "C"), 1e8, TRUE), y = sample(100, 1e8, TRUE), z = sample(5, 1e8, TRUE), stringsAsFactors = FALSE)