How can I sort a DataFrame column in descending order? I tried df.orderBy("col1").show(10) but it sorted in ascending order, and df.sort("col1").show(10) also sorts in ascending order. The answers I found on Stack Overflow were all outdated or referred to RDDs; I'd like to use the native DataFrame API in Spark.
- He means "df.sort("col1").show(10) also sorts in ascending order" – Josiah Yoder, Jul 15, 2016
- This solution worked perfectly for me: stackoverflow.com/a/38575271/5957143 – abc123, Nov 6, 2018
6 Answers
You can also sort the column by importing the Spark SQL functions:

```scala
import org.apache.spark.sql.functions._
df.orderBy(asc("col1"))
```

or:

```scala
import org.apache.spark.sql.functions._
df.sort(desc("col1"))
```

Alternatively, by importing sqlContext.implicits._:

```scala
import sqlContext.implicits._
df.orderBy($"col1".desc)
```

or:

```scala
import sqlContext.implicits._
df.sort($"col1".desc)
```

1 Comment
- The asc keyword is not necessary: .orderBy("col1", "col2") sorts ascending by default.

It's in org.apache.spark.sql.DataFrame, for the sort method:

```scala
df.sort($"col1", $"col2".desc)
```

Note the $ and .desc inside sort for the column to sort the results by.
3 Comments
- import org.apache.spark.sql.functions._ and import sqlContext.implicits._ also get you a lot of nice functionality.
- df.sort($"Time1", $"Time2".desc) gives SyntaxError: invalid syntax at the $ symbol
- PySpark only
I came across this post when looking to do the same thing in PySpark. The easiest way is to just add the parameter ascending=False:

```python
df.orderBy("col1", ascending=False).show(10)
```

Reference: http://spark.apache.org/docs/2.1.0/api/python/pyspark.sql.html#pyspark.sql.DataFrame.orderBy
```scala
import org.apache.spark.sql.functions.{asc, desc}
df.orderBy(desc("columnname1"), desc("columnname2"), asc("columnname3"))
```
In the case of Java:
If we use DataFrames, while applying joins (here an inner join), we can sort (in ascending order) after selecting distinct elements in each DF:

```java
Dataset<Row> d1 = e_data.distinct().join(s_data.distinct(), "e_id").orderBy("salary");
```

where e_id is the column on which the join is applied, while the result is sorted by salary in ascending order.

Also, we can use Spark SQL:

```java
SQLContext sqlCtx = spark.sqlContext();
sqlCtx.sql("select * from global_temp.salary order by salary desc").show();
```

where
- spark -> the SparkSession
- salary -> a GlobalTemp view