list1 = ['SO', 'AE', 'AP']
list2 = ['NM', 'NV', 'OR']

I want to create a dictionary from these lists, assigning a fixed value to all the keys from each list, so the result should be:

list1's value = 'Midwest'
list2's value = 'Northeast'

map = {
    'SO': 'Midwest',
    'AE': 'Midwest',
    'AP': 'Midwest',
    'NM': 'Northeast',
    'NV': 'Northeast',
    'OR': 'Northeast',
}

I'm new to PySpark and not able to figure out how to solve this.

Thanks

  • Is this just a Python problem rather than Spark? I don't see any DataFrames. Commented Nov 10, 2020 at 13:18
  • @mck I'm working in PySpark, so I need to create a dictionary from different lists, which I'm then using to map values in a DataFrame. Commented Nov 10, 2020 at 13:35

1 Answer


Here's an example. This has nothing to do with Spark though.

>>> list1 = ['SO', 'AE', 'AP']
>>> list2 = ['NM', 'NV', 'OR']
>>> dict1 = {k: "Midwest" for k in list1}
>>> dict1
{'SO': 'Midwest', 'AE': 'Midwest', 'AP': 'Midwest'}
>>> dict2 = {k: "Northeast" for k in list2}
>>> dict2
{'NM': 'Northeast', 'NV': 'Northeast', 'OR': 'Northeast'}
>>> dict3 = {**dict1, **dict2}
>>> dict3
{'SO': 'Midwest', 'AE': 'Midwest', 'AP': 'Midwest', 'NM': 'Northeast', 'NV': 'Northeast', 'OR': 'Northeast'}

Alternatively, if the line dict3 = {**dict1, **dict2} gives you a SyntaxError (meaning you're on some pretty old Python, i.e. < 3.5, or 2.x, which is no longer supported), you can merge the dicts like this:

>>> dict3 = dict1.copy()
>>> dict3.update(dict2)
>>> dict3
{'AP': 'Midwest', 'SO': 'Midwest', 'NM': 'Northeast', 'AE': 'Midwest', 'OR': 'Northeast', 'NV': 'Northeast'}
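For reference, on Python 3.9 and newer there is also a dedicated merge operator for dicts (PEP 584), which gives the same result more concisely:

>>> dict3 = dict1 | dict2
>>> dict3
{'SO': 'Midwest', 'AE': 'Midwest', 'AP': 'Midwest', 'NM': 'Northeast', 'NV': 'Northeast', 'OR': 'Northeast'}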

9 Comments

Thanks @Czaporka. As I'm working in PySpark, I'm still struggling with how to merge these two dictionaries, any suggestions?
@SkyMonster the code samples you posted are pure Python, so I gave you a pure Python solution. This will also work perfectly fine in PySpark, except it doesn't use Spark's parallel computation capabilities, so it's not suitable for huge amounts of data. In order to leverage Spark you'll need to operate on RDDs / DataFrames instead of dicts and lists.
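For illustration, here is a minimal sketch of that DataFrame approach; the DataFrame df and its state_name column are hypothetical stand-ins for whatever data you actually have:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical input DataFrame with a state_name column
df = spark.createDataFrame([('SO',), ('NM',), ('OR',)], ['state_name'])

# Put the mapping into a small DataFrame and join it in, so the lookup
# runs inside Spark's distributed plan instead of in driver-side Python
mapping_df = spark.createDataFrame(list(dict3.items()), ['state_name', 'region'])
df = df.join(mapping_df, on='state_name', how='left')

The left join keeps rows whose state_name has no entry in the mapping; their region will simply be null.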
@SkyMonster I guess you already have some DataFrame somewhere and just want to use this dict as a mapping for appending a new column, e.g. "region"? In that case it does sound reasonable to store it as a dict. I recently answered a question about a similar use case; maybe you'll find it useful: link.
Yes, I have created a UDF to add a new column to the table; region_map is my final dict: user_func = udf(lambda x: region_map.get(x), StringType()); df = df.withColumn('region', user_func(df.state_name)). But the point is that I have to build the final dictionary from runtime values for each dictionary (the dict values are not pre-defined).
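As a side note, the same column can be built without a UDF by turning the dict into a literal map expression, which Spark can optimize better than opaque Python functions; a minimal sketch, assuming region_map is the merged dict from the answer and df has a state_name column as in your snippet:

from itertools import chain
from pyspark.sql import functions as F

# create_map expects alternating keys and values: key1, val1, key2, val2, ...
mapping_expr = F.create_map(*[F.lit(x) for x in chain(*region_map.items())])

# Look up each row's state_name in the map literal
df = df.withColumn('region', mapping_expr[F.col('state_name')])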
Ok, in that case you'll have to add some more code to your question showing where the runtime values come from, and explain what exactly your problem is, because the way the question is formulated right now, my answer pretty much exhausts the topic.
