I have the following code:

    df = sql_context.sql("select * from table").cache()
    first_df = df.where(df.id > 10)
    second_df = df.where(df.city == 'NY')
    third_df = df.where(df.x == 5)
    unioned_df = first_df.union(second_df).union(third_df)
    unioned_df.write.format('csv').save(path)

My code has only one action (the write to CSV), so is there any point in caching df?
Please ignore the fact that these filters could all be applied together.
I wrote it this way on purpose, in order to understand how the cache mechanism works in the background.
