0 votes
0 answers
111 views

I am trying to migrate Delta Tables to Iceberg using Scala-Spark keeping the data intact, following this - https://iceberg.apache.org/docs/1.4.3/delta-lake-migration/ Here is the sample code (...
asked by Pramit Pakhira
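The Iceberg docs linked above describe metadata-only snapshot/migrate actions; as a hedged fallback that relies only on core Spark APIs, the sketch below simply copies the Delta data into a new Iceberg table with DataFrameWriterV2. The path, the table identifier, and the `iceberg` catalog name are placeholders, and this rewrites the data files rather than reusing them as the documented actions do.

```scala
import org.apache.spark.sql.SparkSession

// Assumes delta-spark and iceberg-spark-runtime are on the classpath and that an
// Iceberg catalog named "iceberg" is configured on the session.
val spark = SparkSession.builder().appName("delta-to-iceberg-copy").getOrCreate()

// Read the existing Delta table (placeholder path).
val deltaDf = spark.read.format("delta").load("/warehouse/my_delta_table")

// Write it back out as an Iceberg table (placeholder identifier). This is a full copy,
// not the in-place snapshot/migrate action from the linked documentation.
deltaDf.writeTo("iceberg.db.my_table").using("iceberg").createOrReplace()
```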
0 votes
1 answer
40 views

We have a use-case wherein we need to cache certain data that has been processed so that Spark does not reprocess the same data in the event of task failures. So say we have a thousand Foo objects for ...
asked by mandar · 125
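One way to keep already-processed data from being recomputed is to persist the intermediate Dataset and, for resilience across retries, checkpoint it. A minimal sketch, assuming the thousand Foo objects come out of some expensive transformation (the `processed` stage here is a stand-in):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

val spark = SparkSession.builder().appName("cache-example").getOrCreate()
spark.sparkContext.setCheckpointDir("/tmp/spark-checkpoints") // use reliable storage (HDFS/S3) in a real job

// Stand-in for the expensive computation that produces the Foo records.
val processed = spark.range(1000).selectExpr("id", "id * 2 AS fooValue")

// persist() keeps partitions in memory/disk, but a lost partition is recomputed from lineage.
val cached = processed.persist(StorageLevel.MEMORY_AND_DISK)

// checkpoint() materializes the data and truncates the lineage, so downstream retries
// re-read the checkpointed files instead of re-running the upstream computation.
val stable = cached.checkpoint()
stable.count() // force materialization once
```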
1 vote
1 answer
64 views

I am trying to register a custom codec (for a map) like below: val session: CqlSession = CassandraConnector.apply(spark.sparkContext).openSession() val codecRegistry: MutableCodecRegistry = session....
asked by Shivam Sajwan
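A sketch of how that registration can be completed, assuming Spark Cassandra Connector 3.x with the Java driver 4.x types shown in the snippet; `myMapCodec` is a placeholder for the custom map codec the question refers to. Note that this only affects the session obtained in the current JVM; executor-side sessions would need the same registration.

```scala
import com.datastax.oss.driver.api.core.CqlSession
import com.datastax.oss.driver.api.core.type.codec.TypeCodec
import com.datastax.oss.driver.api.core.type.codec.registry.MutableCodecRegistry
import com.datastax.spark.connector.cql.CassandraConnector
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("codec-example").getOrCreate()

// Placeholder: the actual custom codec for the map column is defined by the application.
def myMapCodec: TypeCodec[java.util.Map[String, String]] = ???

val session: CqlSession = CassandraConnector.apply(spark.sparkContext).openSession()

// The driver's default registry implements MutableCodecRegistry, so it can be cast and mutated.
val codecRegistry = session.getContext.getCodecRegistry.asInstanceOf[MutableCodecRegistry]
codecRegistry.register(myMapCodec)
```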
0 votes
1 answer
52 views

I want to partition/group rows so that each group's total size is <= limit. For example, if I have: +--------+----------+ | id| size| +--------+----------+ | 1| 3| | 2| 6| ...
asked by Gary Chan Chi Hang
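A sketch of a greedy, order-preserving packing, assuming the rows can be scanned in `id` order on a single partition (fine for modest data sizes); the `limit` value here is an assumption:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("group-by-size").getOrCreate()
import spark.implicits._

val limit = 10L // assumed group-size limit
val df = Seq((1, 3), (2, 6), (3, 5), (4, 4)).toDF("id", "size")

// Greedy scan: open a new group whenever adding the next row would exceed the limit.
// repartition(1) keeps the scan sequential, so this is suited to modest data volumes.
val packed = df.repartition(1).sortWithinPartitions("id")
  .mapPartitions { rows =>
    var group = 0L
    var acc = 0L
    rows.map { r =>
      val id = r.getAs[Int]("id")
      val size = r.getAs[Int]("size").toLong
      if (acc + size > limit) { group += 1; acc = 0L }
      acc += size
      (id, size, group)
    }
  }
  .toDF("id", "size", "groupId")

packed.show()
```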
0 votes
1 answer
43 views

I have two dataframes that have 300 columns and 1000 rows each. They have the same column names. The values are of mixed datatypes like Struct/List/Timestamp/String/etc. I am trying to compare the ...
asked by Noob · 77
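A sketch of one way to compare the two dataframes, using `exceptAll` in both directions; struct, array, timestamp and string columns compare fine, but map-typed columns are not comparable and would need to be converted first (e.g. with `map_entries`). The `diff` helper name is hypothetical:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().appName("df-compare").getOrCreate()

// Hypothetical helper: rows present in one dataframe but not the other, in both directions.
// Columns are re-selected by name so the two frames line up even if column order differs.
def diff(left: DataFrame, right: DataFrame): (DataFrame, DataFrame) = {
  val cols = left.columns.map(col)
  val l = left.select(cols: _*)
  val r = right.select(cols: _*)
  (l.exceptAll(r), r.exceptAll(l))
}

// Usage, with dfA and dfB standing in for the two 300-column frames:
// val (onlyInA, onlyInB) = diff(dfA, dfB)
// println(s"rows only in A: ${onlyInA.count()}, rows only in B: ${onlyInB.count()}")
```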
2 votes
1 answer
1k views

Getting the following error while creating a Delta table using Scala-Spark. _delta_log is getting created at the warehouse, but it runs into this error after the _delta_log creation: Exception in thread "...
asked by Sarthak Sharma
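The exception text is cut off above, but a very common cause is a Spark/Delta version mismatch or a missing session configuration. A minimal sketch of a known-good Delta setup, using the two standard settings from the Delta documentation:

```scala
import org.apache.spark.sql.SparkSession

// The delta-spark (formerly delta-core) artifact version must match the Spark and Scala
// versions on the classpath, and these two settings must be present on the session.
val spark = SparkSession.builder()
  .appName("delta-example")
  .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
  .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
  .getOrCreate()

spark.sql("CREATE TABLE IF NOT EXISTS demo_delta (id LONG, name STRING) USING delta")
spark.range(5).selectExpr("id", "concat('name_', id) AS name")
  .write.format("delta").mode("append").saveAsTable("demo_delta")
```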
1 vote
3 answers
1k views

I am trying to use IntelliJ to build Spark applications written in Scala. I get the following error when I execute the Scala program: Exception in thread "main" java.lang....
asked by PRATIK CHAPADGAONKAR
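The exception is truncated, but with IntelliJ this is frequently a Scala/Spark binary-version mismatch, or Spark jars marked "provided" and therefore missing from the run classpath. A hedged build.sbt sketch; the versions are examples only:

```scala
// build.sbt -- the key points are that the Scala binary version matches the Spark
// artifacts (_2.12 here) and that, when running from IntelliJ, the run configuration's
// option to include "provided"-scope dependencies on the classpath is enabled.
ThisBuild / scalaVersion := "2.12.18"

lazy val root = (project in file("."))
  .settings(
    name := "spark-app",
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "3.5.1" % "provided",
      "org.apache.spark" %% "spark-sql"  % "3.5.1" % "provided"
    )
  )
```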
0 votes
0 answers
79 views

I'm a newbie to Scala Spark programming. I have to build a recommendation system for movies in Scala Spark using Google Cloud Platform. The dataset is composed of (movie_id, user_id, rating) ...
asked by Luca Genova
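A minimal sketch of a collaborative-filtering recommender with spark.ml's ALS on a (movie_id, user_id, rating) dataset; the GCS path and the hyperparameter values are placeholders:

```scala
import org.apache.spark.ml.recommendation.ALS
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("movie-recommender").getOrCreate()

// Placeholder path on Cloud Storage; the file is assumed to have movie_id, user_id, rating columns.
val ratings = spark.read.option("header", "true").option("inferSchema", "true")
  .csv("gs://my-bucket/ratings.csv")

val Array(train, test) = ratings.randomSplit(Array(0.8, 0.2), seed = 42L)

val als = new ALS()
  .setUserCol("user_id")
  .setItemCol("movie_id")
  .setRatingCol("rating")
  .setColdStartStrategy("drop") // avoid NaN predictions for unseen users/items
  .setRank(10)
  .setMaxIter(10)
  .setRegParam(0.1)

val model = als.fit(train)
val top10PerUser = model.recommendForAllUsers(10)
top10PerUser.show(truncate = false)
```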
0 votes
1 answer
73 views

I have a dataframe that looks like this: | Column | |[{a: 2, b: 4}, {a: 2, b: 3}]| ...
asked by vZ10 · 2,756
-1 votes
3 answers
111 views

I have the following dataset: |value| +-----+ | 1| | 2| | 3| I want to create a new column newValue that takes the value of newValue from the previous row and does something with it. For ...
asked by Kewitschka · 1,681
1 vote
2 answers
136 views

I have a Scala Spark dataframe with the schema: root |-- passengerId: string (nullable = true) |-- travelHist: array (nullable = true) | |-- element: integer (containsNull = true)...
asked by Abishek · 857
-1 votes
2 answers
127 views

I want to divide the quantity value into multiple rows, split by the number of months between the start date and end date columns. Each row should have the start date and end date of its month. I also want ...
asked by isrikanthd
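A sketch of one approach: generate the month boundaries with sequence, explode, and split the quantity evenly across the months. The column names and the even split are assumptions; prorating by the number of days in each month would need an extra step.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("split-by-month").getOrCreate()
import spark.implicits._

// Assumed column names: quantity, start_date, end_date.
val df = Seq((120, "2023-01-15", "2023-04-10"))
  .toDF("quantity", "start_date", "end_date")
  .withColumn("start_date", to_date($"start_date"))
  .withColumn("end_date", to_date($"end_date"))

// Number of calendar months covered, then one row per month with that month's start/end
// dates and an even share of the quantity.
val result = df
  .withColumn("months",
    (months_between(date_trunc("month", $"end_date"), date_trunc("month", $"start_date")) + 1).cast("int"))
  .withColumn("month_start", explode(expr(
    "sequence(date_trunc('month', start_date), date_trunc('month', end_date), interval 1 month)")))
  .withColumn("month_start", to_date($"month_start"))
  .withColumn("month_end", last_day($"month_start"))
  .withColumn("monthly_quantity", $"quantity" / $"months")
  .select("month_start", "month_end", "monthly_quantity")

result.show(truncate = false)
```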
0 votes
1 answer
2k views

For some weird reason I need to get the column names of a dataframe and insert them as the first row (I cannot just import without a header). I tried using a for comprehension to create a dataframe that ...
asked by Jiaming Pei
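A sketch of one way to do this: cast every column to string, build a one-row dataframe from the column names with the same schema, and union it on top. The header stays first as long as nothing downstream reorders the rows (e.g. a sort or repartition).

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val spark = SparkSession.builder().appName("header-as-row").getOrCreate()
import spark.implicits._

val df = Seq((1, "a"), (2, "b")).toDF("id", "label") // stand-in for the real dataframe

// Everything has to be a string for the header row to fit, so cast all columns first.
val asStrings = df.select(df.columns.map(c => col(c).cast(StringType).as(c)): _*)

// One-row dataframe whose values are the column names, with the same all-string schema.
val headerSchema = StructType(df.columns.map(c => StructField(c, StringType, nullable = true)))
val headerRow = spark.createDataFrame(
  spark.sparkContext.parallelize(Seq(Row.fromSeq(df.columns.toSeq))),
  headerSchema)

val withHeader = headerRow.union(asStrings)
withHeader.show()
```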
1 vote
1 answer
609 views

I am working on a Spark project and have a performance issue that I am struggling with; any help would be appreciated. I have a column Collection that is an array of struct: root |-- Collection: ...
asked by Yue Wang
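The concrete bottleneck is cut off above, but a frequent performance win with an array-of-struct Collection column is to use higher-order functions on the array instead of explode followed by groupBy. A sketch with an assumed Entry(key, amount) element shape:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

case class Entry(key: String, amount: Double)

val spark = SparkSession.builder().appName("collection-hof").getOrCreate()
import spark.implicits._

val df = Seq(
  (1L, Seq(Entry("a", 1.0), Entry("b", 2.5))),
  (2L, Seq(Entry("a", 4.0)))
).toDF("id", "Collection")

// Higher-order functions operate on the array within each row, avoiding the
// explode + groupBy round trip when only per-row results are needed.
val summarized = df
  .withColumn("amountTotal", expr("aggregate(Collection, 0.0D, (acc, e) -> acc + e.amount)"))
  .withColumn("aOnly", filter($"Collection", e => e("key") === "a"))

summarized.show(truncate = false)
```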
0 votes
1 answer
211 views

I have a Spark dataframe column (custHeader) in the below format and I want to extract the value of the key phone into a separate column. I am trying to use the from_json function, but it is giving me a ...
asked by marc · 319
