Is there any way of reading files located on my local machine other than navigating to 'Data' > 'Add Data' on Databricks?

In my past experience using Databricks with S3 buckets, I was able to read and load a DataFrame just by specifying the path, e.g.:

df = spark.read.format('delta').load('<path>')

Is there any way I can do something similar in Databricks to read local files?

1 Answer

If you use the Databricks Connect client library, you can read local files into memory on a remote Databricks Spark cluster. See details here.
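As a rough sketch of that pattern: read the local file into memory with pandas, then hand it to the Spark session. The file name and columns below are made up for illustration, and the Databricks Connect part is left as comments because it assumes a configured workspace and running cluster.

```python
import pandas as pd

# Hypothetical local file; a tiny sample is created here so the snippet is self-contained.
pd.DataFrame({"id": [1, 2], "amount": [10.0, 20.0]}).to_csv("sales.csv", index=False)

# Step 1: read the local file into local memory as a pandas DataFrame.
local_df = pd.read_csv("sales.csv")

# Step 2 (assumption: Databricks Connect is installed and configured against
# a workspace with a running cluster) - the session object sends work to the
# remote cluster, so this would create a remote Spark DataFrame:
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.getOrCreate()
# df = spark.createDataFrame(local_df)
```

Note that this loads the whole file into local memory first, so it is only practical for data that fits on your machine.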

The alternative is to use the Databricks CLI (or REST API) to push local data to a location in DBFS, where it can be read into Spark from within a Databricks notebook. A similar approach is to use the AWS CLI to put local data into an S3 bucket that can be accessed from Databricks.
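The CLI route looks roughly like the following. This is a sketch that assumes the Databricks CLI is installed and authenticated (`databricks configure`); the file paths are placeholders.

```shell
# Copy a local file up to DBFS (hypothetical paths).
databricks fs cp /local/path/sales.csv dbfs:/FileStore/tmp/sales.csv

# Then, from within a Databricks notebook, the uploaded file can be read
# with Spark, e.g.:
#   df = spark.read.csv("dbfs:/FileStore/tmp/sales.csv", header=True)
```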

It sounds like what you are looking for is Databricks Connect, which works with many popular IDEs.

2 Comments

Yes, the Databricks CLI is the option for copying your file from your local system to DBFS. Alternatively, you can push your file to GitHub and use a URL module to download it into DBFS and use it from there.
@Raphael I don't think databricks-connect will actually allow you to do this. The best you can do is call toPandas() on a Spark DataFrame to bring it to your local machine, but you can't freely access local data from the cluster. If there's a spot in the documentation that shows how, adding it here would help.
