How to sort by column in descending order in Spark SQL?

Question

I tried df.orderBy("col1").show(10) but it sorted in ascending order. df.sort("col1").show(10) also sorts in ascending order. I looked on Stack Overflow, and the answers I found were all outdated or referred to RDDs. I'd like to use the native DataFrame API in Spark.

He means "df.sort("col1").show(10) also sorts in ascending order"
– Josiah Yoder, Commented Jul 15, 2016 at 18:30
This solution worked perfectly for me: stackoverflow.com/a/38575271/5957143
– abc123, Commented Nov 6, 2018 at 3:12

Gabber · Accepted Answer · 2017-04-07 10:21:36Z

You can also sort by a column after importing the Spark SQL functions:

import org.apache.spark.sql.functions._
df.orderBy(asc("col1"))

Or

import org.apache.spark.sql.functions._
df.sort(desc("col1"))

Alternatively, import sqlContext.implicits._ and use the $ column syntax:

import sqlContext.implicits._
df.orderBy($"col1".desc)

Or

import sqlContext.implicits._
df.sort($"col1".desc)

Also, when you are ordering by all columns in ascending order, the asc keyword is not necessary: ..orderBy("col1", "col2").
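For readers on Spark 2.x or later, where SparkSession takes the place of SQLContext, the two descending variants can be put together in a minimal sketch, written in the style of a spark-shell or notebook session. The local session settings, the sample data, and the column names col1 and col2 are assumptions made for illustration and are not part of the original answer.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.desc

// Assumption: a local session for experimentation only.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("sort-desc-sketch")
  .getOrCreate()

// Brings the $"..." column syntax and toDF into scope
// (the SparkSession counterpart of sqlContext.implicits._).
import spark.implicits._

// Hypothetical sample data.
val df = Seq((1, "a"), (3, "c"), (2, "b")).toDF("col1", "col2")

// Variant 1: the desc function from org.apache.spark.sql.functions.
val byFunction = df.orderBy(desc("col1"))

// Variant 2: the .desc method on a Column.
val byColumn = df.sort($"col1".desc)

byFunction.show(10)
byColumn.show(10)
```

Both variants should produce the same ordering; which one to use is largely a matter of taste.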

Sky · Accepted Answer · 2019-07-17 15:58:15Z

116

The sort method is defined on org.apache.spark.sql.DataFrame:

df.sort($"col1", $"col2".desc)

Note the $ prefix, which turns a column name into a Column, and .desc, which marks that column for descending order.
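To see how mixed sort directions behave, here is a small sketch in which col1 is sorted ascending (the default) and ties are broken by col2 descending. The session setup and the sample rows are assumptions for illustration only.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for experimentation only.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("mixed-sort-sketch")
  .getOrCreate()

// Needed for the $"..." syntax and for toDF.
import spark.implicits._

// Hypothetical sample data with a tie on col1.
val df = Seq(("x", 1), ("x", 2), ("y", 1)).toDF("col1", "col2")

// col1 ascending by default; within equal col1 values, col2 descending.
val mixed = df.sort($"col1", $"col2".desc)

mixed.show()
```

With this data, the two "x" rows come first, with col2 = 2 ahead of col2 = 1, followed by the "y" row.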

edited Jul 17, 2019 at 15:58 by Sky

answered May 19, 2015 at 17:48 by Vedom

3 Comments

David Griffin Over a year ago

import org.apache.spark.sql.functions._ and import sqlContext.implicits._ also get you a lot of nice functionality.

kavya Over a year ago

@Vedom: Shows a syntax error: df.sort($"Time1", $"Time2".desc) SyntaxError: invalid syntax at the $ symbol

Rimer Over a year ago

@kaks, need to import functions/implicits as described above to avoid that error

Nic Scozzaro · Accepted Answer · 2019-03-24 13:55:01Z

PySpark only

I came across this post when looking to do the same in PySpark. The easiest way is to just add the parameter ascending=False:

df.orderBy("col1", ascending=False).show(10)

Reference: http://spark.apache.org/docs/2.1.0/api/python/pyspark.sql.html#pyspark.sql.DataFrame.orderBy

The question is tagged scala, but this answer applies to Python only: both the ascending=False syntax and the function signature are specific to PySpark.

Paul Reiners · Accepted Answer · 2018-12-11 18:33:07Z

21

import org.apache.spark.sql.functions.{asc, desc}
df.orderBy(desc("columnname1"), desc("columnname2"), asc("columnname3"))

edited Dec 11, 2018 at 18:33 by Paul Reiners

answered Sep 11, 2018 at 12:31 by Nitya Yekkirala

1 Comment

WestCoastProjects Over a year ago

This is a duplicate answer from the one 3 years earlier by @AmitDubey. should be removed in favor of that one.

OneCricketeer · Accepted Answer · 2018-05-14 16:22:06Z

8

df.sort($"ColumnName".desc).show()

edited May 14, 2018 at 16:22 by OneCricketeer

answered Nov 9, 2017 at 10:38 by Nilesh Shinde


zx485 · Accepted Answer · 2018-09-06 21:10:44Z

In the case of Java:

With DataFrames, when applying a join (here an inner join), we can sort in ascending order after selecting the distinct rows of each DataFrame:

Dataset<Row> d1 = e_data.distinct().join(s_data.distinct(), "e_id").orderBy("salary");

where e_id is the column the join is applied on, and the result is sorted by salary in ascending order.

Also, we can use Spark SQL as:

SQLContext sqlCtx = spark.sqlContext();
sqlCtx.sql("select * from global_temp.salary order by salary desc").show();

where spark is the SparkSession and salary is a global temporary view.
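The same Spark SQL approach can be sketched in Scala, the language the question is tagged with. This is only an illustration: the session settings and the sample rows are assumptions, and the view is registered with createOrReplaceGlobalTempView so that it is reachable as global_temp.salary, matching the query above.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for experimentation only.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("sql-order-by-sketch")
  .getOrCreate()

// Needed for toDF on a local Seq.
import spark.implicits._

// Hypothetical sample data.
val salaries = Seq((1, 3000), (2, 5000), (3, 4000)).toDF("e_id", "salary")

// Global temporary views are registered in the global_temp database.
salaries.createOrReplaceGlobalTempView("salary")

// ORDER BY ... DESC performs the descending sort in SQL.
val sorted = spark.sql("SELECT * FROM global_temp.salary ORDER BY salary DESC")

sorted.show()
```

A session-scoped view created with createOrReplaceTempView would work in the same way, in which case the query would reference the view name without the global_temp prefix.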
