Saving Scala SQL Output as DataFrame

Question

I have the following script to run a SQL query:

val df_joined_sales_partyid = spark.sql("""
  SELECT a.sales_transaction_id, b.customer_party_id, a.sales_tran_dt
  FROM df_sales_tran a
  JOIN df_sales_tran_party_xref b
    ON a.sales_transaction_id = b.sales_transaction_id
  LIMIT 3
""")

I want to know how I can save the output of this query as a permanent DataFrame table. I noticed that every time I run display(df_joined_sales_partyid), it seems to run the query again. How do I avoid running the query multiple times and save the results to a DataFrame table? I am new to Scala, so forgive me if this is an easy question, but I couldn't find a solution online.

Denis Makarenko · Accepted Answer · 2019-04-18 20:51:35Z

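import org.apache.spark.storage.StorageLevel

// Caches results in memory. cache() is lazy: the data is stored the first time an action runs.
df_joined_sales_partyid.cache()

// Or keep the results in memory and spill to disk; see
// https://spark.apache.org/docs/2.4.0/api/java/index.html?org/apache/spark/storage/StorageLevel.html
// for other possible values. Use one of the two calls, not both.
df_joined_sales_partyid.persist(StorageLevel.MEMORY_AND_DISK)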
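For completeness, here is a minimal end-to-end sketch of how the caching above might be used. It is not from the original answer: the sample rows, the local SparkSession, the object name CacheJoinedSales, and the table name joined_sales_partyid are all assumptions made for illustration. In a notebook the spark session already exists, so only the persist, count, and write lines would apply. The final saveAsTable call is one possible way to keep the result beyond the current session, since cache() and persist() only last as long as the application runs.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object CacheJoinedSales {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; a notebook already provides `spark`.
    val spark = SparkSession.builder()
      .appName("cache-joined-sales")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows standing in for the real source tables.
    Seq((1L, "2019-04-01"), (2L, "2019-04-02"))
      .toDF("sales_transaction_id", "sales_tran_dt")
      .createOrReplaceTempView("df_sales_tran")
    Seq((1L, 100L), (2L, 200L))
      .toDF("sales_transaction_id", "customer_party_id")
      .createOrReplaceTempView("df_sales_tran_party_xref")

    val dfJoined = spark.sql("""
      SELECT a.sales_transaction_id, b.customer_party_id, a.sales_tran_dt
      FROM df_sales_tran a
      JOIN df_sales_tran_party_xref b
        ON a.sales_transaction_id = b.sales_transaction_id
      LIMIT 3
    """)

    // Mark the DataFrame for caching; nothing is computed yet.
    dfJoined.persist(StorageLevel.MEMORY_AND_DISK)

    // The first action runs the query and fills the cache.
    dfJoined.count()

    // Later actions on the same DataFrame read from the cache.
    dfJoined.show()

    // Assumption: to keep the result after the session ends, write it out
    // as a table. The table name here is hypothetical.
    dfJoined.write.mode("overwrite").saveAsTable("joined_sales_partyid")

    // Release the cached data once it is no longer needed.
    dfJoined.unpersist()
    spark.stop()
  }
}
```

The count() call is there only to force the query to run once; any action would do. If the result is written with saveAsTable, it can be read back later with spark.table("joined_sales_partyid") without rerunning the join.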
