99 questions
1 vote
0 answers
55 views
Databricks - LOCATION_OVERLAP Error with AutoLoader pipeline ingesting from external location
I am trying to use pipelines in Databricks to ingest data from an external location to the datalake using AutoLoader, and I am facing this issue. I have noticed other posts with similar errors, but in ...
0 votes
0 answers
191 views
How can I mark Databricks Job as Success Despite a Failed Autoloader Pipeline
Issue: I have a Databricks Workflow/job running a pytest test that is being marked as "Failed" because one of the Autoloader pipelines within it fails, despite the overall job succeeding and ...
0 votes
1 answer
140 views
Validating column names and order in Databricks Autoloader (PySpark) before writing to Delta table?
I am using Databricks Autoloader with PySpark to stream Parquet files into a Delta table. Here's a simplified version of what I am doing: spark.readStream \ .format("cloudFiles") \ ....
1 vote
0 answers
96 views
PySpark Autoloader: How to enforce schema and fail on mismatch?
Hi all, I am using Databricks Autoloader with PySpark to ingest Parquet files from a directory. Here's a simplified version of my current setup: spark.readStream \ .format("cloudFiles") \ ....
0 votes
0 answers
164 views
Why is Databricks Autoloader failing to merge new columns with schema evolution?
I'm using Databricks Autoloader to load Parquet files: def run_autoloader(table_name, checkpoint_path, latest_file_location, new_columns): # Configure Auto Loader to ingest parquet data to a Delta ...
0 votes
0 answers
131 views
Why does Databricks Autoloader crash after an error, and how can I fix it?
We are using Databricks Autoloader to process Parquet files into Delta format. The job is scheduled to run once per day and the code looks like this: def run_autoloader(table_name, checkpoint_path, ...
0 votes
1 answer
335 views
Databricks Auto loader from tables in Delta Share
I am trying to read a Delta table from Delta shares shared from other environments. The pipeline runs okay; however, as the Delta table is updated in the source (Delta share in GCP), the code below gets ...
0 votes
2 answers
258 views
Does cloudFiles.backfillInterval Reprocess Every File in Source Every Time Autoloader Runs?
I'm struggling to understand how to control the backfill process baked into Autoloader: https://docs.databricks.com/en/ingestion/cloud-object-storage/auto-loader/production.html#trigger-regular-...
-1 votes
1 answer
267 views
Autoloader Not Picking Up .text Files in Streaming Mode
I am using Databricks Autoloader to process files in streaming (micro-batch) mode. The source files are in .text format. While the checkpoints are created and the stream does not fail, the Delta table ...
0 votes
0 answers
795 views
Empty Parquet files in Databricks Autoloader causing a read error
Looking for a way to ingest empty Parquet files into Unity Catalog Delta tables with Databricks Autoloader without causing the stream to fail. We use a batch process that processes ...
0 votes
0 answers
186 views
Databricks Autoloader long initialization
We use the Autoloader pattern in Databricks to load raw files from our storage account into Delta tables. Though successful for almost all tables, for 2 tables we face strange behaviour. It takes ...
1 vote
0 answers
213 views
Setup Databricks Autoloader Event Grid Subscription with No Expiry on TTL
I intend to use Autoloader in file notification mode. I want to control the naming of the event grid subscription and storage queues, so I am using the module recommended in the official docs to ...
0 votes
2 answers
616 views
Databricks Autoloader batch mode
I am seeking guidance on handling full load scenarios within Databricks using Autoloader. Please don't go too hard on me, since I lack practical experience at this point in time. My scenario is a ...
0 votes
1 answer
204 views
Change Tracking Using Databricks Autoloader and ForEachBatch
I am using Autoloader in Trigger Once mode to load Parquet files from an S3 location. My goal is to implement change data capture by comparing the source and target Delta tables to identify and ...
3 votes
0 answers
638 views
Databricks Autoloader Schema Hints are not taken into consideration in the schema file
I am using Autoloader with Schema Inference to automatically load some data into S3. I have one column that is a Map which is overwhelming Autoloader (it tries to infer it as struct -> creating a ...