Is there a point to cache dataframe if my code has only one action

I have the following code:

df = sql_context.sql("select * from table").cache() first_df = df.where(df.id>10) second_df = df.where(df.city = 'NY') third_df = df.where(df.x == 5) unioned_df = first_df.union(second_df).union(third_df) unioned_df.format('csv').save(path)

Because my code has only one action (write to csv). Is there a point for caching df?
Please ignore the fact that this filters could be done all together.
I did it like this in purpose in order to understand how the cache mechanism work in the backgorund.

asked May 10, 2021 at 15:08

dasilva555

1331 gold badge2 silver badges15 bronze badges

A detailed answer: stackoverflow.com/questions/44156365/when-to-cache-a-dataframe

Kafels
– Kafels

2021-05-10 17:02:10 +00:00
Commented May 10, 2021 at 17:02

Add a comment |

0

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.

Collectives™ on Stack Overflow

Is there a point to cache dataframe if my code has only one action [duplicate]

0

Linked

Hot Network Questions