Sort the PySpark DataFrame columns by Ascending or Descending order

To sort a PySpark DataFrame by one or more columns, use the orderBy method (sort is an alias for it), passing the columns to sort by and, where needed, the sort order (ascending or descending). Here's how to do it:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

# Initialize a SparkSession
spark = SparkSession.builder \
    .appName("example") \
    .getOrCreate()

# Assuming you have a PySpark DataFrame named df

# Sort by a single column ascending
df_sorted = df.orderBy("column_name")

# Sort by a single column descending
df_sorted_desc = df.orderBy(col("column_name").desc())

# Sort by multiple columns ascending
df_sorted_multiple = df.orderBy(["column1", "column2"])

# Sort by multiple columns with different sort orders
# Here column1 will be sorted in ascending order (default) and column2 in descending order
df_sorted_diff = df.orderBy(col("column1"), col("column2").desc())

# Show the sorted DataFrames
df_sorted.show()
df_sorted_desc.show()
df_sorted_multiple.show()
df_sorted_diff.show()

Replace "column_name", "column1", and "column2" with the actual column names from your DataFrame.

Note:

  • By default, orderBy sorts the data in ascending order.
  • To specify ascending order explicitly, call the .asc() method on the column: col("column_name").asc().
  • To sort by multiple columns, you can pass them to orderBy as a list.
  • When sorting by multiple columns, priority follows the order in which the columns are given: rows are sorted by the first column, and each later column only breaks ties left by the ones before it.

Remember to stop the SparkSession when you're done to free up resources:

spark.stop() 
