i am using Spark and scala 2.4
My salesperson Dataframe looks like: it has total 54 salesperson, i took example of only 4 column
Schema of SalesPerson table.
root |-- col: struct (nullable = false) | |-- salesperson_4: string (nullable = true) | |-- salesperson_10: string (nullable = true) | |-- salesperson_11: string (nullable = true) | |-- salesperson_21: string (nullable = true) Data of Salesperson Table.
+--------------+--------------+--------------+--------------+ |salesperson_4 |salesperson_10|salesperson_11|salesperson_21| +--------------+--------------+--------------+--------------+ | Customer_933 | Customer_1760| Customer_454 | Customer_127 | |Customer_1297 |Customer_2411 |Customer_158 |Customer_2703 | |Customer_861 |Customer_1550 |Customer_812 |Customer_2976 | +--------------+--------------+--------------+--------------+
My salesType dataframe looks like
Schema of salesType
root |-- Type: string (nullable = true) |-- Customer: string (nullable = true) Data of salesType
|Type |customer | +------+-------------+ |Online|Customer_933 | |inshop|Customer_933| |inshop|Customer_1297| |Online|Customer_2411| |Online|Customer_2411| |Online|Customer_1550| |Online|Customer_2976| |Online|Customer_812 | |Online|Customer_812 | |inshop|Customer_127 | +------+-------------+ i am trying to check which all customer from Salesperson table are available in SalesType table. with two additional column, which shows customer belong to specific salespersonand count of customer occurance in SalesPlace table. Basically all customer from salesperson table and it existance in SalesType table
Expected Output: +------+-------------++------+-------------++------+-------------+ CustomerBelongstoSalesperson|Customer |occurance| salesperson_4 |Customer_933 |2 salesperson_10 |Customer_2411|2 salesperson_4 |Customer_1297|1 salesperson_10 |Customer_1550|1 SalesPerson_21 |Customer_2976|1 SalesPerson_11 |Customer_812 |2 SalesPerson_21 |Customer_127 |1 salesperson_4 |Customer_861 |0 salesperson_10 |Customer_1760|0 SalesPerson_11 |Customer_454 |0 SalesPerson_11 |Customer_158 |0 SalesPerson_21 |Customer_2703|0 +------+-------------++------+-------------++------+-------------+ Code:
val stringCol = df1.columns.map(c => s"'$c', cast(`$c` as string)").mkString(", ") val processedDF = df1.selectExpr(s"stack(${df1.columns.length}, $stringCol) as (Salesperson, Customer)") processedDF.show(false) processedDF.join(df2, Seq("Customer"), "left") .groupBy("Customer") .agg(count("Place").as("Occurance"), first("Salesperson").as("Salesperson")) .show(false) 

.agg(count("Place")Place value is not present anywhere in the input provided. so that I can update the answer after validating it.Placewhich name you replaced?