Count values by condition in PySpark Dataframe

In PySpark, you can use the filter and count methods to count values that satisfy a condition in a DataFrame. filter keeps only the rows matching the condition, and count returns the number of rows that remain.

Here's a step-by-step guide on how to count values by condition in a PySpark DataFrame:

  • First, set up your PySpark environment:
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("countValuesByCondition").getOrCreate()
  • Create a sample DataFrame:
from pyspark.sql import Row

data = [Row(name="Alice", age=25),
        Row(name="Bob", age=30),
        Row(name="Charlie", age=25),
        Row(name="David", age=28),
        Row(name="Eva", age=30)]

df = spark.createDataFrame(data)
df.show()
  • Count the values by condition:

Let's count the number of people with age 30:

count_age_30 = df.filter(df.age == 30).count()
print(count_age_30)
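
Equivalently, you can spell the same condition with the col function from pyspark.sql.functions; this is just an alternative form of the filter above, not a different technique:

from pyspark.sql.functions import col

# Same count as above, written with col() instead of attribute access
count_age_30 = df.filter(col("age") == 30).count()
print(count_age_30)  # 2 for the sample data (Bob and Eva)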

If you have multiple conditions, you can use the & (and), | (or), and ~ (not) operators:

# Count people with age 30 and name Bob
count_bob_age_30 = df.filter((df.age == 30) & (df.name == "Bob")).count()
print(count_bob_age_30)

# Count people with age less than 30 or name Eva
count_condition = df.filter((df.age < 30) | (df.name == "Eva")).count()
print(count_condition)

Remember to always wrap individual conditions in parentheses when combining them, because Python's & and | operators bind more tightly than comparisons such as ==.
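
If you prefer, filter also accepts a SQL expression string, which sidesteps the precedence issue because the whole condition is parsed with SQL rules. A rough equivalent of the two counts above:

# Same conditions expressed as SQL strings
count_bob_age_30 = df.filter("age = 30 AND name = 'Bob'").count()
count_condition = df.filter("age < 30 OR name = 'Eva'").count()
print(count_bob_age_30, count_condition)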

  • (Optional) If you want a count for each unique value of a column, you can use groupBy:
# Count the number of people for each age
df.groupBy("age").count().show()
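
If you need several conditional counts in a single pass over the data, one common pattern (a sketch, assuming pyspark.sql.functions is imported as F) is to combine when with count inside agg; count ignores the nulls that when produces for non-matching rows:

from pyspark.sql import functions as F

# Several conditional counts computed in one aggregation
df.agg(
    F.count(F.when(F.col("age") == 30, True)).alias("age_30"),
    F.count(F.when(F.col("age") < 30, True)).alias("under_30"),
).show()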

This way, you can efficiently count values based on conditions in a PySpark DataFrame.

