I have the following method, which I need to test:
```java
private HashMap<String, Dataset<Row>> getDataSources(SparkSession spark) {
    HashMap<String, Dataset<Row>> ds = new HashMap<String, Dataset<Row>>();

    Dataset<Row> dimTenant = spark.table(dbName + "." + SparkConstants.DIM_TENANT)
            .select("tenant_key", "tenant_id");

    Map<String, String> options = new HashMap<>();
    options.put("table", bookValueTable);
    options.put("zkUrl", zkUrl);

    Dataset<Row> bookValue = spark.read().format("org.apache.phoenix.spark")
            .options(options)
            .load();

    ds.put("dimTenant", dimTenant);
    ds.put("bookValue", bookValue);
    return ds;
}
```

In this case, I actually need to execute `spark.table`, but I need to mock the output of `spark.read().format(formatParam).options(optionsParam).load()` based on `formatParam` and `optionsParam`.
How can I achieve this?
Initially I started by mocking `DataFrameReader.class` with a deep stub (answer), but it turns out `spark.table` itself calls `spark.read().table`, so `spark.table` was affected as well. Then I tried spying on the `spark.read()` object, but since a new `DataFrameReader` object is created on each call, that didn't work either.
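To make the failure mode concrete, here is a minimal, self-contained sketch of why stubbing `read()` also hijacks `table()`. The `Reader` and `Session` types below are hypothetical stand-ins I wrote for this illustration (the real `SparkSession`/`DataFrameReader` are far richer); the key point is that `Session.table` delegates to `read().table(...)`, so any stub installed behind `read()` intercepts both paths:

```java
// Hypothetical, simplified model of the SparkSession/DataFrameReader
// delegation problem described above — not real Spark classes.
public class StubDelegationDemo {

    // Models DataFrameReader: load() for external sources,
    // table(name) for catalog tables.
    interface Reader {
        String load();
        String table(String name);
    }

    // Models SparkSession: table() delegates to read().table(...),
    // just like SparkSession.table does in Spark.
    static class Session {
        private final Reader reader;
        Session(Reader reader) { this.reader = reader; }
        Reader read() { return reader; }
        String table(String name) { return read().table(name); }
    }

    public static void main(String[] args) {
        // A stub reader standing in for a mocked DataFrameReader.
        Reader stubbed = new Reader() {
            public String load() { return "MOCKED"; }
            public String table(String name) { return "MOCKED"; }
        };
        Session session = new Session(stubbed);

        // Stubbing load() is what we want...
        System.out.println(session.read().load());       // MOCKED — desired
        // ...but table() goes through the same read(), so it is
        // intercepted too, which is exactly the problem.
        System.out.println(session.table("dim_tenant")); // MOCKED — not desired
    }
}
```

This is why replacing the reader wholesale (whether by deep stub or by spy) cannot leave `spark.table` untouched: both call paths funnel through the same `read()` entry point.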