
In Spark I have a DataFrame whose columns are in a fixed order:

agg_id,agg_key,agg_val,req_num,clk_num 

When I create a similar table in Cassandra, the order of the non-key columns is not preserved:

CREATE TABLE mytable (
    agg_id int,
    agg_key int,
    agg_val text,
    req_num bigint,
    clk_num bigint,
    PRIMARY KEY ((agg_id, agg_key), agg_val)
) WITH CLUSTERING ORDER BY (agg_val ASC)

So when I run desc mytable it shows me the wrong order (first clk_num, then req_num).

So when the following code runs, the data is inserted in the wrong order:

ds.write
  .format("org.apache.spark.sql.cassandra")
  .options(Map(
    "keyspace" -> "online_aggregation",
    "table" -> cassOutTable))
  .mode(SaveMode.Append)
  .save

My question is: how can I set the column names here? Can I add some property to the options Map, or slightly change the code so it works correctly? One limitation: no changes to the DF itself (it might be written to multiple sources).

1 Answer

Just select the columns in the required order before writing:

ds
  .select("agg_id", "agg_key", ..., "clk_num")
  .write
  .format("org.apache.spark.sql.cassandra")
  .options(Map(
    "keyspace" -> "online_aggregation",
    "table" -> cassOutTable))
  .mode(SaveMode.Append)
  .save
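With the column names from the question's schema filled in, the full call would look like this (a sketch; cassOutTable is the same table-name variable as in the question):

ds.select("agg_id", "agg_key", "agg_val", "req_num", "clk_num")
  .write
  .format("org.apache.spark.sql.cassandra")
  .options(Map(
    "keyspace" -> "online_aggregation",
    "table" -> cassOutTable))
  .mode(SaveMode.Append)
  .save

Since select returns a new DataFrame, the original ds is left untouched, which satisfies the "no changes to the DF itself" constraint.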

1 Comment

Nice trick :-), but I don't know the order; I need to deduce it from the describe...
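One way to deduce the order at runtime, rather than hard-coding it, is to do a schema-only load of the target table and select the DataFrame's columns in the order the connector reports them. This is a minimal sketch, not from the original answer: it assumes the Spark Cassandra Connector is on the classpath, that the reported column order matches what you want to write in, and writeInCassandraOrder is a hypothetical helper name.

import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

// Hypothetical helper: read the target table's column order from
// Cassandra via a schema-only load, then select the DataFrame's
// columns in that order before writing. ds itself is not modified.
def writeInCassandraOrder(spark: SparkSession, ds: DataFrame,
                          keyspace: String, table: String): Unit = {
  // load() only fetches the schema here; no rows are read until an action runs
  val cassandraOrder = spark.read
    .format("org.apache.spark.sql.cassandra")
    .options(Map("keyspace" -> keyspace, "table" -> table))
    .load()
    .columns // column names in the order the connector reports them

  ds.select(cassandraOrder.map(ds.col): _*) // reorder by name
    .write
    .format("org.apache.spark.sql.cassandra")
    .options(Map("keyspace" -> keyspace, "table" -> table))
    .mode(SaveMode.Append)
    .save()
}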
