
I would like to drop columns that contain only null values using dropna(). With Pandas you can do this by setting the keyword argument axis='columns' in dropna(). Here's an example in a GitHub post.
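For reference, a minimal sketch of the Pandas behaviour described above; how='all' is needed so that only columns whose every value is null are dropped (the default how='any' would also drop partially-null columns):

    import numpy as np
    import pandas as pd

    # A frame where the 'furniture' column is entirely NaN.
    df = pd.DataFrame({
        'furniture': [np.nan, np.nan, np.nan],
        'myid': ['1-12', '0-11', '2-12'],
        'clothing': ['pants', 'shoes', 'socks'],
    })

    # Drop columns in which ALL values are null.
    cleaned = df.dropna(axis='columns', how='all')
    print(list(cleaned.columns))  # ['myid', 'clothing']
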

How do I do this in PySpark? dropna() is available as a transformation in PySpark; however, axis is not an available keyword.

Note: I do not want to transpose my dataframe for this to work.

How would I drop the furniture column from this dataframe?

    import numpy as np
    import pandas as pd

    data_2 = {
        'furniture': [np.NaN, np.NaN, np.NaN],
        'myid': ['1-12', '0-11', '2-12'],
        'clothing': ["pants", "shoes", "socks"],
    }
    df_1 = pd.DataFrame(data_2)

    # spark is an existing SparkSession
    ddf_1 = spark.createDataFrame(df_1)
    ddf_1.show()