I'm trying to use `withColumn` to null out bad dates in a DataFrame column, with a `when()` expression to make the update. I have two conditions for "bad" dates: dates before January 1900 and dates in the future. My current code looks like this:
```
d = datetime.datetime.today()
df_out = df.withColumn(my_column, when(col(my_column) < '1900-01-01' | col(my_column) > '2019-12-09 17:01:37.774418', lit(None)).otherwise(col(my_column)))
```

I think my problem is that it doesn't like the or operator `|`. From what I have seen on Google, `|` is what I should use. I have tried `or` as well. Can anyone advise on what I'm doing wrong here?
Here is the stack trace:
```
    df_out = df.withColumn(c, when(col(c) < '1900-01-01' | col(c) > '2019-12-09 17:01:37.774418', lit(None)).otherwise(col(c)))
  File "C:\spark-2.4.4-bin-hadoop2.7\python\pyspark\sql\column.py", line 115, in _
    njc = getattr(self._jc, name)(jc)
  File "C:\spark-2.4.4-bin-hadoop2.7\python\lib\py4j-0.10.7-src.zip\py4j\java_gateway.py", line 1257, in __call__
  File "C:\spark-2.4.4-bin-hadoop2.7\python\pyspark\sql\utils.py", line 63, in deco
    return f(*a, **kw)
  File "C:\spark-2.4.4-bin-hadoop2.7\python\lib\py4j-0.10.7-src.zip\py4j\protocol.py", line 332, in get_return_value
py4j.protocol.Py4JError: An error occurred while calling o48.or. Trace:
py4j.Py4JException: Method or([class java.lang.String]) does not exist
```
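
One way to check how Python itself groups the condition, independent of Spark, is to parse it with the standard `ast` module. The sketch below is only an illustration: the plain name `c` stands in for `col(my_column)`, the shortened date string and the helper name `or_joins_two_comparisons` are made up for this example, and nothing here calls PySpark.

```python
import ast

# The condition from the question, with a plain name `c` standing in for
# col(my_column) (an assumption so that this runs without Spark).
unparenthesized = "c < '1900-01-01' | c > '2019-12-09'"
parenthesized = "(c < '1900-01-01') | (c > '2019-12-09')"


def outermost_operation(expr):
    """Return the AST node type of the top-level operation in expr."""
    return type(ast.parse(expr, mode="eval").body).__name__


def or_joins_two_comparisons(expr):
    """Return True if the `|` in expr has a comparison on both sides."""
    for node in ast.walk(ast.parse(expr, mode="eval")):
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.BitOr):
            return isinstance(node.left, ast.Compare) and isinstance(
                node.right, ast.Compare
            )
    return False


# `|` binds more tightly than `<` and `>`, so without parentheses Python
# reads the condition as the chained comparison c < ('1900-01-01' | c) > '...'.
print(outermost_operation(unparenthesized))       # Compare
print(or_joins_two_comparisons(unparenthesized))  # False

# With parentheses, `|` combines the two comparisons as intended.
print(outermost_operation(parenthesized))         # BinOp
print(or_joins_two_comparisons(parenthesized))    # True
```

If this reading is right, the string `'1900-01-01'` is handed to `|` directly, which would match the `Method or([class java.lang.String]) does not exist` message in the trace. Wrapping each comparison in its own parentheses, as in `when((col(my_column) < '1900-01-01') | (col(my_column) > '2019-12-09 17:01:37.774418'), lit(None))`, should give `when()` a single boolean column to test.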