1 vote
1 answer
48 views

I have the following PySpark streaming code: parsed_df = df.selectExpr("CAST(value AS STRING) as message", "timestamp") \ .select( from_json(col("...
M_Gh • 1,172
0 votes
0 answers
73 views

I am running a Structured Streaming workflow in Databricks that reads a data stream from Kinesis and then writes the data to an external Delta table (on S3). This workflow runs 4 streams ...
DumbCoder • 515
0 votes
0 answers
86 views

I have a Spark DataFrame of about ~60 columns. There are multi-level structs in this schema, and I have to flatten the DataFrame, which expands it to close to ~1,500 columns. The flattening logic is typical to ...
Tom Slayer
1 vote
0 answers
96 views

Hi all, I am using Databricks Autoloader with PySpark to ingest Parquet files from a directory. Here's a simplified version of my current setup: spark.readStream \ .format("cloudFiles") \ ....
Zeruno • 1,689
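The setup in the excerpt is truncated after `.format("cloudFiles")`. A minimal Auto Loader sketch for Parquet ingestion is below; the paths, table name, and option values are placeholders, not the asker's configuration.

```python
# Illustrative Auto Loader options; schemaLocation path is hypothetical.
autoloader_options = {
    "cloudFiles.format": "parquet",
    "cloudFiles.schemaLocation": "/mnt/checkpoints/_schema",
    "cloudFiles.schemaEvolutionMode": "addNewColumns",
}

def start_autoloader(spark, source_dir, checkpoint_dir, target_table):
    """Read new Parquet files incrementally and append them to a Delta table."""
    return (
        spark.readStream
             .format("cloudFiles")
             .options(**autoloader_options)
             .load(source_dir)
             .writeStream
             .option("checkpointLocation", checkpoint_dir)
             .trigger(availableNow=True)  # drain the backlog, then stop
             .toTable(target_table)
    )
```

`availableNow` suits batch-style catch-up runs; drop the trigger (or use a processing-time trigger) for a continuously running stream.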
0 votes
0 answers
50 views

I have Spark notebooks for ingesting data from a queue (Kafka etc.) into bronze-layer tables (for instance lh_bronze.order_delta) in the lakehouse. These notebooks ingest CDC data. For backfill (...
user125687
0 votes
0 answers
28 views

I'm trying to implement a to_protobuf transformation inside Spark Streaming code that reads data from a Kafka topic. Incoming dataframe: readStreamFromKafka(config).writeStream .foreachBatch { (...
Jelly • 1,434
0 votes
0 answers
19 views

How do I set up dynamic allocation for a Spark job with a data rate of about 450k? I tried the configurations below, but the executor pods always run with the max executors and it's ...
Yashini
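The asker's configurations are truncated. As a hedged starting point (not a tuned answer): streaming jobs often pin at `maxExecutors` because micro-batches keep the task backlog non-empty, so capping `maxExecutors` and shortening the idle timeout is the usual first step. The values below are illustrative only.

```python
# Illustrative dynamic-allocation settings for a Spark-on-Kubernetes job.
dynamic_allocation_conf = {
    "spark.dynamicAllocation.enabled": "true",
    # Needed on Kubernetes, which has no external shuffle service:
    "spark.dynamicAllocation.shuffleTracking.enabled": "true",
    "spark.dynamicAllocation.minExecutors": "2",
    "spark.dynamicAllocation.maxExecutors": "10",
    "spark.dynamicAllocation.executorIdleTimeout": "60s",
    "spark.dynamicAllocation.schedulerBacklogTimeout": "5s",
}

# Applying them when building the session:
# builder = SparkSession.builder
# for k, v in dynamic_allocation_conf.items():
#     builder = builder.config(k, v)
```

Executors holding shuffle data are not released until `shuffleTracking` considers it unneeded, which can also keep the pod count at the maximum.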
0 votes
0 answers
164 views

I'm using Databricks Autoloader to load Parquet files: def run_autoloader(table_name, checkpoint_path, latest_file_location, new_columns): # Configure Auto Loader to ingest parquet data to a Delta ...
Boris • 906
0 votes
1 answer
79 views

I have the below code that fails when I'm attempting to do stream-stream left outer joins. @dlt.view def vw_ix_f_activity_gold(): return ( spark.readStream .option("...
play_something_good
1 vote
0 answers
51 views

I'm trying to read documents from MongoDB into Databricks using Spark Structured Streaming. I'm using to_json() to convert the whole document to a string. When using this, schema evolution is working ...
PATHURI B
0 votes
0 answers
49 views

I have been using the Spark v3.5 streaming functionality for the use case below. I am observing the following issue in one of the environments. I would appreciate some assistance with ...
Saurabh Agrawal
0 votes
0 answers
150 views

Use Case: I am loading the Bronze layer using an external tool, which automatically creates bronze Delta tables in Databricks. However, after the initial load, I need to manually enable changeDataFeed ...
play_something_good
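Enabling Change Data Feed on an existing Delta table is a table property set via SQL, so it can be scripted after the external tool finishes its initial load. A sketch with a placeholder table name:

```python
# Hypothetical table name; run via spark.sql(...) in a notebook or job.
ENABLE_CDF_SQL = (
    "ALTER TABLE bronze.my_table "
    "SET TBLPROPERTIES (delta.enableChangeDataFeed = true)"
)

# spark.sql(ENABLE_CDF_SQL)
#
# To default it on for all NEW tables created in the session instead:
# spark.conf.set(
#     "spark.databricks.delta.properties.defaults.enableChangeDataFeed",
#     "true",
# )
```

Note that CDF only captures changes made after the property is set; the initial load itself is not retroactively available as change data.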
-1 votes
1 answer
190 views

I've inherited some Spark code and am having trouble understanding "why does this even work". The short version is that I load a DF with info from a delta table, and then join that to a streaming ...
Troy Terry
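The pattern the asker describes is a stream-static join, which Structured Streaming supports directly: the static side is treated as a batch source, and for a Delta table its latest snapshot is re-read for each micro-batch. A sketch with placeholder names and a hypothetical path:

```python
def join_with_dimension(spark, stream_df):
    """Stream-static join: no watermark is needed because the static side
    contributes no unbounded state; Spark re-evaluates it per micro-batch."""
    dim_df = spark.read.format("delta").load("/mnt/delta/dim_table")  # hypothetical path
    return stream_df.join(dim_df, on="key", how="inner")
```

This is why the inherited code "just works": the batch `read` on the Delta table is folded into each micro-batch's plan rather than being frozen at stream start.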
0 votes
1 answer
101 views

Objective I plan to use Delta Live Tables (DLT) to deliver near real-time reporting in Power BI. Current Setup I load Bronze Delta tables every 1 minute using Fivetran. These Bronze tables serve as ...
play_something_good
0 votes
1 answer
143 views

Use Case I am ingesting data using Fivetran, which syncs data from an Oracle database directly into my Databricks table. Fivetran manages the creation, updates, and inserts on these tables. As a ...
play_something_good
