Note: coalesce will not replace NaN values, only nulls:
>>> import pyspark.sql.functions as F
>>> cDf = spark.createDataFrame([(None, None), (1, None), (None, 2)], ("a", "b"))
>>> cDf.show()
+----+----+
|   a|   b|
+----+----+
|null|null|
|   1|null|
|null|   2|
+----+----+

>>> cDf.select(F.coalesce(cDf["a"], cDf["b"])).show()
+--------------+
|coalesce(a, b)|
+--------------+
|          null|
|             1|
|             2|
+--------------+
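As an aside, the final argument to coalesce can also be a literal fallback, so rows where every column is null still get a value. A small sketch along the lines of the PySpark docs, reusing cDf from above:

>>> cDf.select(F.coalesce(cDf["a"], F.lit(0.0))).show()
+----------------+
|coalesce(a, 0.0)|
+----------------+
|             0.0|
|             1.0|
|             0.0|
+----------------+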
Let's now create a pandas.DataFrame with None entries, convert it into a Spark DataFrame, and use coalesce again:
>>> import pandas as pd
>>> cDf_from_pd = spark.createDataFrame(pd.DataFrame({'a': [None, 1, None], 'b': [None, None, 2]}))
>>> cDf_from_pd.show()
+---+---+
|  a|  b|
+---+---+
|NaN|NaN|
|1.0|NaN|
|NaN|2.0|
+---+---+

>>> cDf_from_pd.select(F.coalesce(cDf_from_pd["a"], cDf_from_pd["b"])).show()
+--------------+
|coalesce(a, b)|
+--------------+
|           NaN|
|           1.0|
|           NaN|
+--------------+
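The reason is that pandas represents missing numeric data as NaN, not as null, so the conversion preserves that. You can confirm these entries are genuine NaN values rather than nulls with F.isnan and Column.isNull; a quick check added here, not part of the original session:

>>> cDf_from_pd.select(F.isnan("a"), cDf_from_pd["a"].isNull()).show()
+--------+-----------+
|isnan(a)|(a IS NULL)|
+--------+-----------+
|    true|      false|
|   false|      false|
|    true|      false|
+--------+-----------+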
In that case, you'll first need to call replace on your DataFrame to convert the NaNs to nulls, after which coalesce behaves as in the first example.
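A minimal sketch of that fix, reusing cDf_from_pd from above (Spark SQL treats NaN = NaN as true, so replace can match NaN directly):

>>> fixed = cDf_from_pd.replace(float('nan'), None)
>>> fixed.select(F.coalesce(fixed["a"], fixed["b"])).show()
+--------------+
|coalesce(a, b)|
+--------------+
|          null|
|           1.0|
|           2.0|
+--------------+

Alternatively, F.nanvl(col1, col2) returns the second column whenever the first is NaN, though unlike coalesce it does not also cover nulls.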