I am trying to unify two distincts notebooks into a single one by adapting the logic of construction of two different tables.
The first one reads:
spark.sql(''' SELECT CD_CLI, MAX(VL_RPTD_UTZO) AS MAX_VL_RPTD_UTZO, '2017-01-31' AS DT_MVTC FROM vl_rptd_utzo WHERE DT_EXTC BETWEEN '2016-07-31' AND '2016-12-31' GROUP BY CD_CLI ''').createOrReplaceTempView('vl_rptd_max_utzo_2017_01_31') And the second one:
spark.sql('''SELECT CD_CLI, CASE WHEN SUM(in_lim_crt) > 0 THEN ROUND(SUM(SUM_VL_TTL_FAT)/SUM(in_lim_crt), 4) ELSE -99999999999 END AS VL_MED_FAT, '2017-01-31' as DT_MVTC FROM in_lim_fat WHERE DT_MVTC BETWEEN '2016-07-31' AND '2016-12-31' GROUP BY CD_CLI ''').createOrReplaceTempView('media_vl_fatura_2017_01_31') My, perhaps naive?, approach was to union the two selects since they are calling the same fields from the same origin:
spark.sql('''SELECT CD_CLI, CASE WHEN SUM(in_lim_crt) > 0 THEN ROUND(SUM(SUM_VL_TTL_FAT)/SUM(in_lim_crt), 4) ELSE -99999999999 END AS VL_MED_FAT, '2017-01-31' as DT_MVTC FROM in_lim_fat WHERE DT_MVTC BETWEEN '2016-07-31' AND '2016-12-31' GROUP BY CD_CLI UNION SELECT CD_CLI, MAX(VL_RPTD_UTZO) AS MAX_VL_RPTD_UTZO, '2017-01-31' AS DT_MVTC FROM vl_rptd_utzo WHERE DT_EXTC BETWEEN '2016-07-31' AND '2016-12-31' GROUP BY CD_CLI ''').createOrReplaceTempView('new_table') But when I ask for a describe:
spark.sql('describe new_table').show(10, False) The output is:
+----------+-------------+-------+ |col_name |data_type |comment| +----------+-------------+-------+ |CD_CLI |int |null | |VL_MED_FAT|decimal(38,4)|null | |DT_MVTC |string |null | +----------+-------------+-------+ Why MAX_VL_RPTD_UTZO is not showing up in the new table? I am new to sql and maybe this is very naive and simple but I cannot solve it.
select c1, c2, c3, nulletc to get 4 columns. And perhaps you want UNION ALL?