1

I have zip files in my container, and one or more new files arrive every day. I want to process the files as they come in. I have some questions.

  1. Can I use the Databricks Auto Loader feature to process zip files? Are zip files supported by Auto Loader?

  2. What settings need to be enabled to use Auto Loader? I have my container and a SAS token.

  3. Once a zip file is processed (unzipped, with each file inside it read), I should not read that zip again. How can I do this with Auto Loader? Is there a specific setting?

  4. Are there any samples available? I'm new to this area and trying to get more info.

2 Comments
  • Are you talking about a storage container or a compute container? Commented May 2, 2022 at 7:54
  • I'm using Azure Storage Blob container. Any help will be useful. Thanks! Commented May 2, 2022 at 14:56

2 Answers

1

Unfortunately, processing zip files with Auto Loader in Azure Databricks is not supported. Auto Loader supports two modes for detecting new files: directory listing and file notification.

Auto Loader provides a Structured Streaming source called cloudFiles. Given an input directory path on the cloud file storage, the cloudFiles source automatically processes new files as they arrive, with the option of also processing existing files in that directory.
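The cloudFiles source described above can be sketched roughly as follows. This is a minimal sketch, not a definitive setup: the function name `build_zip_stream` and the input path are hypothetical, and `spark` is assumed to be an existing Databricks `SparkSession`. Because there is no zip-specific format, the sketch assumes the `binaryFile` format, which only surfaces each file's path and raw bytes and does not unpack anything.

```python
def build_zip_stream(spark, input_path):
    """Return a streaming DataFrame with one row per newly arrived file.

    `spark` is assumed to be an active SparkSession on Databricks;
    `input_path` is a hypothetical cloud storage directory.
    """
    return (
        spark.readStream
        .format("cloudFiles")                       # the Auto Loader source
        .option("cloudFiles.format", "binaryFile")  # raw bytes, no parsing
        .option("pathGlobFilter", "*.zip")          # only pick up .zip files
        .load(input_path)
    )
```

With `binaryFile`, each row carries the columns `path`, `modificationTime`, `length`, and `content`, so any unpacking would have to happen in a later step.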

Auto Loader can scale to loading data from storage accounts that contain billions of files that need to be backfilled to pipelines where millions of files are loaded in an hour.

For more information, refer to the Microsoft documentation on Auto Loader.

0

Autoloader can read compressed files directly. There is no need to unzip them, and no special Autoloader option is required. Just configure it the same way as if the files were uncompressed.

Autoloader uses the checkpoint folder to remember what files it has processed.
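As a rough illustration of the checkpoint behaviour, the sketch below starts a stream with an explicit `checkpointLocation`. The function name `start_ingest`, the checkpoint path, and the `process_batch` callback are hypothetical; `stream_df` is assumed to be a streaming DataFrame obtained from the cloudFiles source, and `trigger(availableNow=True)` assumes a recent runtime (PySpark 3.3 or later).

```python
def start_ingest(stream_df, checkpoint_path, process_batch):
    """Start a streaming query that tracks processed files in a checkpoint.

    `process_batch(batch_df, batch_id)` is a user-supplied callback
    (hypothetical here) that handles each micro-batch of new files.
    """
    return (
        stream_df.writeStream
        .option("checkpointLocation", checkpoint_path)  # remembers ingested files
        .trigger(availableNow=True)   # process what is pending, then stop
        .foreachBatch(process_batch)
        .start()
    )
```

Rerunning the job with the same checkpoint path should skip files that were already ingested; pointing it at a new, empty checkpoint path would be expected to reprocess everything.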

1 Comment

The question is specifically about zip-compressed files. Spark and Autoloader primarily support gzip...
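To make the distinction concrete: gzip wraps a single compressed stream, whereas a zip file is an archive that can hold many members, so it has to be unpacked explicitly. One possible approach, shown here only as a sketch, is to take the raw bytes of an archive (for example from the `content` column of a `binaryFile` read) and unpack them with Python's standard `zipfile` module. The helper name `unzip_bytes` is hypothetical.

```python
import io
import zipfile


def unzip_bytes(content: bytes) -> list[tuple[str, bytes]]:
    """Return (member_name, member_bytes) for every file in a zip archive."""
    with zipfile.ZipFile(io.BytesIO(content)) as archive:
        return [
            (info.filename, archive.read(info))
            for info in archive.infolist()
            if not info.is_dir()  # skip directory entries
        ]
```

This loads the whole archive into memory, so it is only reasonable for archives that fit comfortably on a single executor.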
