I am having trouble reading a CSV file that contains an unusual column.
Schema
root
 |-- Id: integer (nullable = true)
 |-- Lon_tower: double (nullable = true)
 |-- Lat_tower: double (nullable = true)
 |-- Compagny: string (nullable = true)
 |-- Address_tower: string (nullable = true)
 |-- Assigned_band_1: string (nullable = true)
 |-- Assigned_band_2: string (nullable = true)
 |-- Assigned_band_3: string (nullable = true)
 |-- Assigned_band_4: string (nullable = true)
 |-- Assigned_band_5: string (nullable = true)
 |-- raw_geocode: string (nullable = true)

raw_geocode sample
[{'road': 'Calle el Topo', 'residential': 'Los Sauces', 'hamlet': 'El Cardal', 'village': 'Los Sauces', 'city': 'San Andrés y Sauces', 'county': 'Santa Cruz de Tenerife', 'archipelago': 'Canarias', 'postcode': '38720', 'country': 'España', 'country_code': 'es'}]

I would like to use the keys as column headers and fill the Spark DataFrame with the corresponding value, or null if the key doesn't exist for that row. I don't want all the keys, only the ones in a given list. I have already removed the [ ' ] characters from the column.
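As a side note, the raw sample above happens to be valid Python literal syntax (a list holding one dict). A minimal sketch for parsing one cell, assuming the original, unstripped string is still available, could rely on `ast.literal_eval` from the standard library. The helper name `parse_geocode` is hypothetical and only used for illustration.

```python
import ast

# Raw cell value copied from the sample, before any characters are stripped.
raw = (
    "[{'road': 'Calle el Topo', 'residential': 'Los Sauces', "
    "'hamlet': 'El Cardal', 'village': 'Los Sauces', "
    "'city': 'San Andrés y Sauces', 'county': 'Santa Cruz de Tenerife', "
    "'archipelago': 'Canarias', 'postcode': '38720', "
    "'country': 'España', 'country_code': 'es'}]"
)


def parse_geocode(cell):
    """Hypothetical helper: turn one raw_geocode cell into a dict.

    Returns an empty dict when the cell is missing or cannot be parsed.
    """
    try:
        parsed = ast.literal_eval(cell)
    except (ValueError, SyntaxError, TypeError):
        return {}
    # The sample wraps a single dict in a list, so take the first element.
    if isinstance(parsed, list) and parsed and isinstance(parsed[0], dict):
        return parsed[0]
    return parsed if isinstance(parsed, dict) else {}


geocode = parse_geocode(raw)
print(geocode["road"], geocode["country_code"])
```

Parsing the original string this way avoids guessing where one value ends and the next key begins, which can become ambiguous once the quotes have been removed.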
Here is an example to make this clearer:
myList = ['road', 'tourism', 'country_code']

|Id |...|raw_geocode |
|1 |...|{road: Calle el Topo, archipelago: Canarias, postcode: 38720, country_code: es} |
|2 |...|{tourism: Mirador Montaña El Molino, road: Mirador Montaña El Molino, village: Barlovento, country_code: es} |

Desired result
|Id |...|road |tourism |country_code|
|1 |...|Calle el Topo |NULL |es |
|2 |...|Mirador Montaña El Molino |Mirador Montaña El Molino |es |
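The per-row rule described here (keep only the listed keys, use null when a key is absent) can be sketched in plain Python as follows. This is only an illustration of the logic, not a Spark solution: it assumes each raw_geocode value has already been parsed into a dict, and the names `rows`, `select_keys` and `result` are hypothetical.

```python
# Keys to turn into columns, as in the example.
my_list = ["road", "tourism", "country_code"]

# The two example rows, with raw_geocode assumed to be parsed into dicts.
rows = [
    {
        "Id": 1,
        "geocode": {
            "road": "Calle el Topo",
            "archipelago": "Canarias",
            "postcode": "38720",
            "country_code": "es",
        },
    },
    {
        "Id": 2,
        "geocode": {
            "tourism": "Mirador Montaña El Molino",
            "road": "Mirador Montaña El Molino",
            "village": "Barlovento",
            "country_code": "es",
        },
    },
]


def select_keys(geocode, keys):
    """Return one value per requested key, or None when the key is absent."""
    return {key: geocode.get(key) for key in keys}


# One flat record per row: the Id followed by the selected keys.
result = [{"Id": row["Id"], **select_keys(row["geocode"], my_list)} for row in rows]

for record in result:
    print(record)
```

With these two rows, Id 1 gets None for tourism, while Id 2 has a value for all three keys because road, tourism and country_code are all present in its input. In PySpark the same lookup would presumably be applied per requested key, for instance through a map column or a UDF; that part is not shown here.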