
I need to delete certain entries from nested JSON files. As far as I know, I can't just delete them from the JSON file directly, so my next choice would be to load them into a PySpark DataFrame, delete the entries there, create a new JSON file with the same schema (and preferably the same name), and replace the old JSON file. I have extracted the schema into a JSON file (roughly as in the sketch below). Is there a way to write the DataFrame back into a JSON file, somehow parsing the extracted schema?
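For context, this is roughly how I extracted the schema; the paths, and the use of df.schema.json(), are just illustrative:

df = spark.read.json("/tmp/zipcodes.json")  # assumes an existing SparkSession named spark
with open("/tmp/schema.json", "w") as f:
    f.write(df.schema.json())  # the DataFrame's StructType serialized as a JSON string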

Thanks!

1 Answer


Spark's DataFrameWriter also has a mode() method to specify the SaveMode; it accepts either one of the strings below or a constant from the SaveMode class.

overwrite – overwrites the existing file; alternatively, you can use SaveMode.Overwrite.

append – appends the data to the existing file; alternatively, you can use SaveMode.Append.

ignore – ignores the write operation when the file already exists; alternatively, you can use SaveMode.Ignore.

errorifexists or error – the default option; it returns an error if the file already exists; alternatively, you can use SaveMode.ErrorIfExists.

In PySpark, the mode is passed as a string:

df2.write.mode("overwrite").json("/tmp/spark_output/zipcodes.json")
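Putting it together for the original question, here is a minimal PySpark sketch. It assumes an existing SparkSession named spark, that the extracted schema file contains the output of df.schema.json(), and that the entries to delete can be expressed as a filter condition; all paths and the condition itself are placeholders:

import json
from pyspark.sql.types import StructType

# Rebuild the schema from the previously extracted schema file
with open("/tmp/schema.json") as f:
    schema = StructType.fromJson(json.load(f))

# Read the nested JSON using the explicit schema
df = spark.read.schema(schema).json("/tmp/zipcodes.json")

# Drop the unwanted entries (placeholder condition)
cleaned = df.filter(df.state != "XX")

# Write the result back as JSON. Writing to a separate location and swapping it in
# afterwards is safer than overwriting the input path, since Spark reads lazily.
cleaned.write.mode("overwrite").json("/tmp/spark_output/zipcodes_cleaned.json")

Note that Spark writes a directory of part files rather than a single JSON file, so keeping the exact original file name will require an extra move/rename step afterwards.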