
I would like to drop columns that contain only null values using dropna(). With Pandas you can do this by setting the keyword argument axis='columns' in dropna(). Here's an example in a GitHub post.
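For reference, a minimal sketch of the Pandas behaviour described above; how='all' is needed so that only columns whose every value is null are dropped (the default how='any' would also drop partially-null columns):

    import numpy as np
    import pandas as pd

    # A frame where the 'furniture' column is entirely NaN.
    df = pd.DataFrame({
        'furniture': [np.nan, np.nan, np.nan],
        'myid': ['1-12', '0-11', '2-12'],
        'clothing': ['pants', 'shoes', 'socks'],
    })

    # Drop columns in which ALL values are null.
    cleaned = df.dropna(axis='columns', how='all')
    print(list(cleaned.columns))  # ['myid', 'clothing']
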

How do I do this in PySpark? dropna() is available as a transformation in PySpark; however, axis is not an available keyword.

Note: I do not want to transpose my dataframe for this to work.

How would I drop the furniture column from this dataframe?

    import numpy as np
    import pandas as pd

    data_2 = {
        'furniture': [np.NaN, np.NaN, np.NaN],
        'myid': ['1-12', '0-11', '2-12'],
        'clothing': ["pants", "shoes", "socks"],
    }
    df_1 = pd.DataFrame(data_2)

    # spark is an existing SparkSession
    ddf_1 = spark.createDataFrame(df_1)
    ddf_1.show()