I have a CSV with headings that I'd like to save as Parquet (actually a Delta table).
The column headings have spaces in them, which Parquet can't handle. How do I change the spaces to underscores?
This is what I have so far, cobbled together from other SO posts:
from pyspark.sql.functions import *

df = spark.read.option("header", True).option("delimiter", "\u0001").option("inferSchema", True).csv("/mnt/landing/MyFile.TXT")

names = df.schema.names
for name in names:
    df2 = df.withColumnRenamed(name, regexp_replace(name, ' ', '_'))

When I run this, the final line gives me this error:
TypeError: Column is not iterable
I thought this would be a common requirement given that Parquet can't handle spaces, but it's quite difficult to find any examples.
import re
from pyspark.sql.functions import col

df2 = df.select([col(a).alias(b) for a, b in zip(df.columns, [re.sub(" ", "_", i) for i in df.columns])])
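As for the TypeError in the question: `regexp_replace` builds a Spark `Column` expression, but `withColumnRenamed` expects the new name as a plain Python string, and column names are ordinary strings anyway, so `str.replace` is enough. A minimal sketch of the fix (the `df` DataFrame and its loading are assumed from the question; the commented Spark lines are illustrative, not run here):

```python
def sanitize(name: str) -> str:
    """Replace spaces with underscores in a column name (plain string op)."""
    return name.replace(" ", "_")

# The renaming logic itself needs no Spark:
new_names = [sanitize(n) for n in ["First Name", "Last Name"]]
# new_names is now ["First_Name", "Last_Name"]

# Applied to a DataFrame (assumed loaded as in the question), either rename
# all columns at once:
# df2 = df.toDF(*[sanitize(c) for c in df.columns])
#
# or loop with withColumnRenamed -- reassigning df each iteration, since the
# original loop overwrote df2 from the unmodified df every time and kept only
# the last rename:
# for c in df.columns:
#     df = df.withColumnRenamed(c, sanitize(c))
```

Note also that `withColumnRenamed` returns a new DataFrame rather than mutating in place, which is why the accumulating reassignment matters.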