I have the following script to run a SQL query:

val df_joined_sales_partyid = spark.sql("""
  SELECT a.sales_transaction_id, b.customer_party_id, a.sales_tran_dt
  FROM df_sales_tran a
  JOIN df_sales_tran_party_xref b
    ON a.sales_transaction_id = b.sales_transaction_id
  LIMIT 3
""")

I want to know how I can save the output of this query as a permanent DataFrame table. I noticed that every time I run display(df_joined_sales_partyid), it seems to run the query again. How do I avoid running the query multiple times and save the results to a table? I am new to Scala, so forgive me if this is an easy question, but I couldn't find a solution online.

1 Answer

import org.apache.spark.storage.StorageLevel

// caches results in memory
df_joined_sales_partyid.cache()

// or

// memory and disk, see
// https://spark.apache.org/docs/2.4.0/api/java/index.html?org/apache/spark/storage/StorageLevel.html
// for other possible values
df_joined_sales_partyid.persist(StorageLevel.MEMORY_AND_DISK)
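
Note that cache() and persist() are lazy: nothing is stored until an action runs on the DataFrame, and the cached data only lives as long as the Spark session. A minimal sketch of the full flow follows, assuming a local SparkSession and small made-up stand-in rows for the two source views. The table name joined_sales_partyid is hypothetical, and how durable saveAsTable is depends on the metastore your environment is configured with.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object CacheAndSaveSketch {
  def main(args: Array[String]): Unit = {
    // In a notebook `spark` already exists; this local session is an assumption.
    val spark = SparkSession.builder()
      .appName("cache-and-save-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in data for the two views used in the question.
    Seq((1, "2020-01-01"), (2, "2020-01-02"))
      .toDF("sales_transaction_id", "sales_tran_dt")
      .createOrReplaceTempView("df_sales_tran")
    Seq((1, 100), (2, 200))
      .toDF("sales_transaction_id", "customer_party_id")
      .createOrReplaceTempView("df_sales_tran_party_xref")

    val dfJoined = spark.sql("""
      SELECT a.sales_transaction_id, b.customer_party_id, a.sales_tran_dt
      FROM df_sales_tran a
      JOIN df_sales_tran_party_xref b
        ON a.sales_transaction_id = b.sales_transaction_id
      LIMIT 3
    """)

    // Mark the DataFrame for caching; nothing is computed yet.
    dfJoined.persist(StorageLevel.MEMORY_AND_DISK)

    // The first action runs the query and fills the cache.
    dfJoined.count()

    // Later actions can reuse the cached rows instead of re-running the join.
    dfJoined.show()

    // To keep the result beyond this session, write it out as a table.
    // "joined_sales_partyid" is a hypothetical name.
    dfJoined.write.mode("overwrite").saveAsTable("joined_sales_partyid")

    // Release the cached data once it is no longer needed.
    dfJoined.unpersist()
    spark.stop()
  }
}
```

In a later session the saved result could then be read back with spark.table("joined_sales_partyid"), provided the table was registered in a persistent metastore.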