I had created a bucketed table using below command in Spark:
df.write.bucketBy(200, "UserID").sortBy("UserID").saveAsTable("topn_bucket_test") Size of Table : 50 GB
Then I joined another table (say t2 , size :70 GB)(Bucketed as before ) with above table on UserId column . I found that in the execution plan the table topn_bucket_test was being sorted (but not shuffled) before the join and I expected it to be neither shuffled nor sorted before join as it was bucketed. What can be the reason ? and how to remove sort phase for topn_bucket_test?