list1 = ['SO', 'AE', 'AP']
list2 = ['NM', 'NV', 'OR']

I want to create a dictionary from these lists, assigning a fixed value to all the keys from each list, so the result should be:

list1's value = 'Midwest'
list2's value = 'Northeast'

map = {
    'SO': 'Midwest',
    'AE': 'Midwest',
    'AP': 'Midwest',
    'NM': 'Northeast',
    'NV': 'Northeast',
    'OR': 'Northeast',
}

I'm new to PySpark and not able to figure out how to solve this.

Thanks

  • Is this just a Python problem rather than Spark? I don't see any DataFrames. Commented Nov 10, 2020 at 13:18
  • @mck I'm working in PySpark, so I need to create a dictionary from different lists, which I'm then using to map values in a DataFrame. Commented Nov 10, 2020 at 13:35

1 Answer


Here's an example. This has nothing to do with Spark though.

>>> list1 = ['SO', 'AE', 'AP']
>>> list2 = ['NM', 'NV', 'OR']
>>> dict1 = {k: "Midwest" for k in list1}
>>> dict1
{'SO': 'Midwest', 'AE': 'Midwest', 'AP': 'Midwest'}
>>> dict2 = {k: "Northeast" for k in list2}
>>> dict2
{'NM': 'Northeast', 'NV': 'Northeast', 'OR': 'Northeast'}
>>> dict3 = {**dict1, **dict2}
>>> dict3
{'SO': 'Midwest', 'AE': 'Midwest', 'AP': 'Midwest', 'NM': 'Northeast', 'NV': 'Northeast', 'OR': 'Northeast'}

Alternatively, if the line dict3 = {**dict1, **dict2} gives you a SyntaxError (meaning you're on some pretty old Python, i.e. < 3.5, or 2.x, which is no longer supported), you can merge the dicts like this:

>>> dict3 = dict1.copy()
>>> dict3.update(dict2)
>>> dict3
{'AP': 'Midwest', 'SO': 'Midwest', 'NM': 'Northeast', 'AE': 'Midwest', 'OR': 'Northeast', 'NV': 'Northeast'}
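For reference, on Python 3.9 and newer there is also a dedicated merge operator for dicts (PEP 584), which gives the same result more concisely:

>>> dict3 = dict1 | dict2
>>> dict3
{'SO': 'Midwest', 'AE': 'Midwest', 'AP': 'Midwest', 'NM': 'Northeast', 'NV': 'Northeast', 'OR': 'Northeast'}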

9 Comments

Thanks @Czaporka. As I'm working in PySpark, I'm still struggling with how to merge these two dictionaries, any suggestions?
@SkyMonster the code samples you posted are pure Python, so I gave you a pure Python solution. This will also work perfectly fine in PySpark, except it doesn't use Spark's parallel computation capabilities, so it's not suitable for huge amounts of data. In order to leverage Spark you'll need to operate on RDDs / DataFrames instead of dicts and lists.
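For illustration, here is a minimal sketch of that DataFrame approach; the DataFrame df and its state_name column are hypothetical stand-ins for whatever data you actually have:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical input DataFrame with a state_name column
df = spark.createDataFrame([('SO',), ('NM',), ('OR',)], ['state_name'])

# Put the mapping into a small DataFrame and join it in, so the lookup
# runs inside Spark's distributed plan instead of in driver-side Python
mapping_df = spark.createDataFrame(list(dict3.items()), ['state_name', 'region'])
df = df.join(mapping_df, on='state_name', how='left')

The left join keeps rows whose state_name has no entry in the mapping; their region will simply be null.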
@SkyMonster I guess you already have some DataFrame somewhere and just want to use this dict as a mapping for appending a new column, e.g. "region"? In that case it does sound reasonable to store it as a dict. I recently answered a question about a similar use case; maybe you'll find it useful: link.
Yes, I have created a UDF to add a new column to the table; region_map is my final dict: user_func = udf(lambda x: region_map.get(x), StringType()); df = df.withColumn('region', user_func(df.state_name)). But the point is that I have to build the final dictionary from runtime values for each dictionary (the dict values are not pre-defined).
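As a side note, the same column can be built without a UDF by turning the dict into a literal map expression, which Spark can optimize better than opaque Python functions; a minimal sketch, assuming region_map is the merged dict from the answer and df has a state_name column as in your snippet:

from itertools import chain
from pyspark.sql import functions as F

# create_map expects alternating keys and values: key1, val1, key2, val2, ...
mapping_expr = F.create_map(*[F.lit(x) for x in chain(*region_map.items())])

# Look up each row's state_name in the map literal
df = df.withColumn('region', mapping_expr[F.col('state_name')])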
Ok, in that case you'll have to add some more code to your question showing where the runtime values come from, and explain what exactly your problem is, because the way the question is formulated right now, my answer pretty much exhausts the topic.
