Running a simple example -

    from pyspark.sql import functions as F

    dept = [("Finance",10),("Marketing",None),("Sales",30),("IT",40)]
    deptColumns = ["dept_name","dept_id"]
    rdd = sc.parallelize(dept)
    df = rdd.toDF(deptColumns)
    df.show(truncate=False)
    print('count the dept_id, should be 3')
    print('count: ' + str(df.select(F.col("dept_id")).count()))

We get the following output -

    +---------+-------+
    |dept_name|dept_id|
    +---------+-------+
    |Finance  |10     |
    |Marketing|null   |
    |Sales    |30     |
    |IT       |40     |
    +---------+-------+

    count the dept_id, should be 3
    count: 4

I'm running on Databricks and this is my stack - Spark 3.0.1, Scala 2.12, DBR 7.3 LTS

Thanks for any help!!