If I add any kind of logging to a UDF in PySpark, the output doesn't appear anywhere. Is there some way to make this work?
So far I have tried the standard Python logging module, py4j, and plain print.
We're running PySpark 2.3.2 with the YARN cluster manager on AWS EMR clusters.
For example, here's a function I want to use:
```python
def parse_data(attr):
    try:
        # execute something
        ...
    except Exception as e:
        logger.error(e)
        return None
```

I convert it to a UDF:
```python
import pyspark.sql.functions as F
from pyspark.sql.types import StringType

parse_data_udf = F.udf(parse_data, StringType())
```

And I use it on a DataFrame:
```python
from pyspark.sql import types as pst

dataframe = dataframe.withColumn(
    "new_column",
    parse_data_udf("column").cast(pst.StringType()),
)
```

The logs from the function will NOT appear anywhere.