
I have a data frame with 900,000 rows and 11 columns in R. The column names and types are as follows:

  • date: Date
  • mcode: Char
  • mname: Char
  • ycode: Char
  • yname: Char
  • yissue: Numeric
  • bsent: Numeric
  • breturn: Numeric
  • tsent: Numeric
  • treturn: Numeric
  • csales: Numeric

I want to sort the data by those variables in the following order:

  1. date
  2. mcode
  3. ycode
  4. yissue

The order of the keys matters here: rows should be sorted by date first, rows with identical dates should then be sorted by mcode, and so on. How can I do that in R?

Comments
  • Reading the first paragraph of help(sort) answers your question. (Commented Nov 3, 2010 at 15:49)
  • After getting the answers below, I'm sure I've done the right thing. I ♥ Stack Overflow. (Commented Nov 3, 2010 at 19:12)

4 Answers


Perhaps something like this?

```r
> df <- data.frame(a=rev(1:10), b=rep(c(2,1),5), c=rnorm(10))
> df
    a b           c
1  10 2 -0.85212079
2   9 1 -0.46199463
3   8 2 -1.52374565
4   7 1  0.28904717
5   6 2 -0.91609012
6   5 1  1.60448783
7   4 2  0.51249796
8   3 1 -1.35119089
9   2 2 -0.55497745
10  1 1 -0.05723538
> with(df, df[order(a, b, c), ])
    a b           c
10  1 1 -0.05723538
9   2 2 -0.55497745
8   3 1 -1.35119089
7   4 2  0.51249796
6   5 1  1.60448783
5   6 2 -0.91609012
4   7 1  0.28904717
3   8 2 -1.52374565
2   9 1 -0.46199463
1  10 2 -0.85212079
```

The order() function accepts several vectors: it sorts by the first, breaks ties using the second, then the third, and so on.
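Applied to the question's own keys, the same call looks like the sketch below. Only the column names date, mcode, ycode and yissue come from the question; the four rows are invented for illustration.

```r
# Toy data using the question's sort keys (values are made up)
df <- data.frame(
  date   = as.Date(c("2010-11-02", "2010-11-01", "2010-11-01", "2010-11-01")),
  mcode  = c("A", "B", "A", "A"),
  ycode  = c("X", "X", "Y", "X"),
  yissue = c(1, 1, 2, 3),
  stringsAsFactors = FALSE
)

# Sort by date; ties broken by mcode, then ycode, then yissue
sorted <- df[order(df$date, df$mcode, df$ycode, df$yissue), ]
print(sorted)
```

With 900,000 rows the call is the same; order() returns a single index vector that is then used to reorder all columns at once.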


Comment

You can also prefix a numeric argument to order() with - to sort that key descending instead of ascending, e.g., order(df$b, -df$a, df$c). Negation only works on numeric vectors; for a character column, use -xtfrm(x) instead.
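Another way to mix directions, which also works for character keys, is to pass order() one `decreasing` flag per key; this per-key form requires `method = "radix"`, which compares strings in the C locale. A minimal sketch with made-up data:

```r
df <- data.frame(
  name  = c("b", "a", "b", "a"),
  score = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# name ascending, score descending: one `decreasing` flag per key (radix method only)
idx <- order(df$name, df$score, decreasing = c(FALSE, TRUE), method = "radix")
sorted <- df[idx, ]
print(sorted)
```

For a numeric key this gives the same row order as order(df$name, -df$score).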

Building on the earlier solution, here are two other approaches. The second requires the plyr package.

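```r
# Base R: do.call() passes each selected column to order() as a separate argument
df.sorted <- df[do.call(order, df[c("a", "b", "c")]), ]

# plyr
library(plyr)
df.sorted <- arrange(df, a, b, c)
```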
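The do.call() form is handy when the key columns are held in a character vector, for example chosen at run time. The wrapper below is a sketch; `sort_by_cols` is a hypothetical name, not a base R function.

```r
# Hypothetical helper: sort `data` by the columns named in `keys`, in that priority order
sort_by_cols <- function(data, keys) {
  # unname() so a column name can never collide with order()'s own
  # arguments (e.g. a column called "decreasing")
  data[do.call(order, unname(as.list(data[keys]))), , drop = FALSE]
}

df <- data.frame(a = c(2, 1, 2, 1), b = c("y", "x", "x", "y"),
                 stringsAsFactors = FALSE)
res <- sort_by_cols(df, c("a", "b"))
print(res)
```

For the question, the call would be sort_by_cols(unsortedData, c("date", "mcode", "ycode", "yissue")).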



If none of the above answers light your fire, you can always use the orderBy() function from the doBy package:

```r
library(doBy)
sortedData <- orderBy(~ date + mcode + ycode + yissue, data = unsortedData)
```

As you might intuitively expect, you can put a minus sign in front of any variable to sort it in descending order.

There's nothing magical about orderBy(). As the documentation states, it is a "wrapper for the order() function - the important difference being that variables to order by can be given by a model formula."

I find the syntax easier to remember.
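To illustrate what "a wrapper for order() where the variables come from a formula" means, here is a simplified base-R sketch. It is not doBy's implementation: `order_by_formula` is a hypothetical name, and unlike orderBy() it ignores minus signs, so every key is sorted ascending.

```r
# Hypothetical, simplified formula-driven wrapper around order().
# all.vars() extracts the variable names, so any leading minus sign is silently dropped.
order_by_formula <- function(formula, data) {
  keys <- all.vars(formula)   # ~ date + mcode  ->  c("date", "mcode")
  data[do.call(order, unname(as.list(data[keys]))), , drop = FALSE]
}

df <- data.frame(
  date  = as.Date(c("2010-11-02", "2010-11-01", "2010-11-01")),
  mcode = c("A", "B", "A"),
  stringsAsFactors = FALSE
)
res <- order_by_formula(~ date + mcode, data = df)
print(res)
```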



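Additional notes: to reverse-sort a factor or character column, negate its xtfrm() values. The older trick -c(myCol) relied on c() turning a factor into its integer codes, which it no longer does as of R 4.1.0, and it never worked for character columns (unary minus on a character vector is an error).

```r
with(df, df[order(a, b, -xtfrm(myCharCol)), ])
```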
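A small check of the xtfrm() approach on both column types. The data are made up; `myCharCol` echoes the name used above and `myFactorCol` is a hypothetical factor column.

```r
df <- data.frame(
  a = c(1, 1, 1),
  myCharCol = c("apple", "cherry", "banana"),
  myFactorCol = factor(c("low", "high", "mid"), levels = c("low", "mid", "high")),
  stringsAsFactors = FALSE
)

# Character column, descending: xtfrm() maps the strings to their ranks
by_char <- with(df, df[order(a, -xtfrm(myCharCol)), ])

# Factor column, descending by level order: xtfrm() on a factor gives its integer codes
by_factor <- with(df, df[order(a, -xtfrm(myFactorCol)), ])

print(by_char)
print(by_factor)
```

Note that the factor is reversed by level order (high, mid, low), not alphabetically.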

You can also pass a vector of column names as the second index to keep only certain columns:

```r
with(df, df[order(a, b, c), c('a','b','x','y')])
```
