Given an RDD of key-value pairs where each value is itself a list, how do I flatten the value lists so that I end up with one simple key-value pair per list element?
    from pyspark import SparkConf, SparkContext

    conf = SparkConf()
    sc = SparkContext(conf=conf)

    foo = sc.parallelize([(0, [1, 1, 4]), (1, [3, 5])])
    bar = foo.map(magic)
    bar.collect()
    # desired output: [(0, 1), (0, 1), (0, 4), (1, 3), (1, 5)]

What would magic look like to achieve what I want?
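One way to get that result (a minimal sketch, not necessarily the only approach): map produces exactly one output record per input record, so it cannot split a single pair into several. flatMapValues, which applies a function to each value and pairs every element of the result with the original key, does what magic is meant to do; with an identity function it simply unrolls the lists.

    from pyspark import SparkConf, SparkContext

    conf = SparkConf()
    sc = SparkContext(conf=conf)

    foo = sc.parallelize([(0, [1, 1, 4]), (1, [3, 5])])

    # flatMapValues keeps the key and emits one record per element of the value list
    bar = foo.flatMapValues(lambda values: values)

    print(bar.collect())  # [(0, 1), (0, 1), (0, 4), (1, 3), (1, 5)]

An equivalent alternative is a plain flatMap, e.g. foo.flatMap(lambda kv: [(kv[0], v) for v in kv[1]]), which gives the same output but spells out the key handling explicitly.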