Assume that I have a column A where every row is a list like this:

    [{"a":"1", "b":"2", "c":"3"}, {"a":"2", "b":"5", "c":"7"}]

How do I count the number of "a"s?
I would like a solution like F.map().
Many thanks
Edited Answer:
Adjusting based on comment from OP. To get the occurrences of a particular key in a list of dictionaries, you can still use list comprehension (with a few adjustments):
    A = [{"a":"1", "b":"2", "c":"3"}, {"a":"2", "b":"5", "c":"7"}]
    A_count = len([y for x in A for y in x if y == 'a'])
    print(A_count)

Output:

    2

We're essentially using the same logic, just in this case we're using a nested list comprehension. x first iterates through A (the dictionaries), and y iterates through x (specifically, the keys in each dictionary). Finally, we use an if condition to make sure the key matches the specified value.
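Since the question asked for something map-like, the same count can also be sketched with one membership test per dictionary. This is only a minimal sketch on the plain Python list, not on the PySpark column, and `count_key` is a hypothetical helper name, not something from the original answer:

```python
# Sample data from the question: a list of dictionaries.
A = [{"a": "1", "b": "2", "c": "3"}, {"a": "2", "b": "5", "c": "7"}]


def count_key(dicts, key):
    """Count how many dictionaries in `dicts` contain `key` (hypothetical helper)."""
    # `key in d` evaluates to True or False; map applies the test to each
    # dictionary, and sum adds the booleans up as 1s and 0s.
    return sum(map(lambda d: key in d, dicts))


print(count_key(A, "a"))  # 2
print(count_key(A, "z"))  # 0
```

Because a dictionary cannot hold the same key twice, counting the dictionaries that contain the key gives the same result as counting matching keys in the nested comprehension.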
Old Answer: Not really sure this provides a solution like "map", but you can use list comprehension which is fairly straightforward:
    A = [{"a":"1", "b":"2", "c":"3"}, {"a":"2", "b":"5", "c":"7"}]
    A_sum = sum([int(x['a']) for x in A])
    print(A_sum)

Output:

    3

Explanation:
Essentially we are collecting the dictionary values for your given key of 'a', parsing each value to an integer, and then using sum to add all the resulting values in that list. Some good reference material is on W3Schools.
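One caveat with the summing approach: `x['a']` raises a `KeyError` if any dictionary lacks the key. A hedged variant is sketched below, assuming that dictionaries without the key should simply be skipped (that assumption is not stated in the question). The third dictionary and the name `sum_key` are made up for illustration:

```python
# Sample data from the question, plus one hypothetical dictionary without "a".
A = [{"a": "1", "b": "2"}, {"a": "2", "b": "5"}, {"b": "9"}]


def sum_key(dicts, key):
    """Sum the integer values stored under `key`, skipping dictionaries without it."""
    return sum(int(d[key]) for d in dicts if key in d)


print(sum_key(A, "a"))  # 3
```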
You can use a UDF to achieve this, assuming each row, as you mentioned, is a list of dictionaries:
    import pyspark
    from pyspark.sql import SQLContext
    import pyspark.sql.functions as F
    from functools import partial

    # `spark` is assumed to be an existing SparkSession,
    # e.g. the one the pyspark shell provides.
    temp_df = spark.createDataFrame(
        [
            [[{"a":"1", "b":"2", "c":"3"}, {"a":"2", "b":"5", "c":"7"}]],
            [[{"a":"10", "b":"2", "c":"3"}, {"a":"20", "b":"5", "c":"7"}]],
            [[{"a":"10", "b":"2", "c":"3"}, {"a":"20", "b":"5", "c":"7"}]],
        ],
        ["A"]
    )

    def key_occurence(inp, key=None):
        # Count the dictionaries in the list that contain `key`.
        res = 0
        for d in inp:
            if key in d:
                res += 1
        return res

    # Bind the key to look for, then wrap the function as a UDF returning an int.
    partial_func = partial(key_occurence, key="a")
    key_occurence_udf = F.udf(partial_func, "int")

    temp_df = temp_df.withColumn("A_occurence", key_occurence_udf("A"))
    temp_df.show()

    +--------------------+-----------+
    |                   A|A_occurence|
    +--------------------+-----------+
    |[[a -> 1, b -> 2,...|          2|
    |[[a -> 10, b -> 2...|          2|
    |[[a -> 10, b -> 2...|          2|
    +--------------------+-----------+

The function behind the UDF additionally takes a key argument, so the same code can check for any key.
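Because the UDF wraps an ordinary Python function, its counting logic can be checked without a Spark session. The sketch below reuses `key_occurence` and `partial` as in the answer; the second row, where one dictionary lacks "a", is hypothetical data added to show a count other than 2:

```python
from functools import partial


def key_occurence(inp, key=None):
    # Count the dictionaries in the list that contain `key`.
    res = 0
    for d in inp:
        if key in d:
            res += 1
    return res


# Same binding as in the answer: fix the key before handing the function to F.udf.
count_a = partial(key_occurence, key="a")

rows = [
    [{"a": "1", "b": "2", "c": "3"}, {"a": "2", "b": "5", "c": "7"}],
    [{"a": "10", "b": "2", "c": "3"}, {"b": "5", "c": "7"}],  # hypothetical row
]

results = [count_a(row) for row in rows]
print(results)  # [2, 1]
```

Checking the function this way first can make it easier to tell whether an unexpected result comes from the logic itself or from how Spark passes the column values to the UDF.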