
I am running Spark in cluster mode and reading data from an RDBMS via JDBC.

As per the Spark docs, these partitioning parameters describe how to partition the table when reading in parallel from multiple workers:

  • partitionColumn
  • lowerBound
  • upperBound
  • numPartitions

These are optional parameters.
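For context, here is a minimal sketch of a read that does specify all four options (the URL, table name, and column are placeholders, not from the original post). Spark splits the range `[lowerBound, upperBound)` into `numPartitions` strides and issues one query per stride:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("jdbc-read").getOrCreate()

// With numPartitions = 4 over [0, 1000), Spark issues 4 parallel queries
// over the partition column, roughly:
//   WHERE id <  250               -- first partition also catches rows below lowerBound
//   WHERE id >= 250 AND id < 500
//   WHERE id >= 500 AND id < 750
//   WHERE id >= 750               -- last partition also catches rows above upperBound
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://host:5432/db")  // placeholder URL
  .option("dbtable", "my_table")                    // placeholder table
  .option("user", "user")
  .option("password", "password")
  .option("partitionColumn", "id")  // must be numeric, date, or timestamp
  .option("lowerBound", "0")        // bounds only shape the stride;
  .option("upperBound", "1000")     // they do not filter any rows
  .option("numPartitions", "4")
  .load()
```

Note that `lowerBound` and `upperBound` do not filter rows: the first and last partitions are open-ended, so all rows are still read.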

What happens if I don't specify these?

  • Does only one worker read the whole table?
  • If it still reads in parallel, how does it partition the data?


If you don't specify either {partitionColumn, lowerBound, upperBound, numPartitions} or {predicates}, Spark will use a single executor and create a single non-empty partition. All data will be processed in a single transaction, and reads will be neither distributed nor parallelized.
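The `predicates` alternative mentioned above is a sketch like the following, assuming the same SparkSession as earlier (the column and values are hypothetical); each predicate string becomes one partition and one parallel query:

```scala
import java.util.Properties

val props = new Properties()
props.setProperty("user", "user")
props.setProperty("password", "password")

// One WHERE clause per partition; together they should cover the table
// without overlap, or rows will be missed or duplicated.
val predicates = Array(
  "region = 'EMEA'",
  "region = 'APAC'",
  "region = 'AMER'"
)

// Three partitions, one per predicate, read in parallel.
val df = spark.read.jdbc("jdbc:postgresql://host:5432/db", "my_table", predicates, props)
```

This form is useful when the table has no convenient numeric or date column to stride over.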


1 Comment

What about writing through JDBC, e.g. `df.write.mode(SaveMode.Append).jdbc("<other database url>", "<same table name>", <some DbProperties>)`?
