I am new to PySpark and I am trying to apply a flat map to a PySpark RDD object. However, even though this function clearly exists for the PySpark RDD class according to the documentation, I can't manage to use it and get the following error:
```
AttributeError: 'RDD' object has no attribute 'flatmap'
```

I am calling this function in the following line:
```python
my_rdd = my_rdd.flatmap(lambda r: (r[5].split('|')))
```

The imports are the following:
```python
from pyspark.sql import *
from pyspark.sql.functions import *
from pyspark.sql import SparkSession
from pyspark import SparkContext as sc
from pyspark import SparkFiles

spark = SparkSession.builder.getOrCreate()
```

Additionally, other functions such as `my_rdd.count()` work, which makes me think that the SparkContext is correctly set up.
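To make the intended result concrete, here is a plain-Python sketch (no Spark involved) of what I expect the flat-map step to produce. The sample rows and the `flat_map` helper are hypothetical and only for illustration; the sketch assumes each record is a tuple whose sixth field (index 5) holds `|`-separated values.

```python
from itertools import chain

# Hypothetical sample records: only index 5 matters, holding '|'-separated values.
rows = [
    ("a", 1, 2, 3, 4, "x|y"),
    ("b", 5, 6, 7, 8, "z"),
]


def flat_map(func, records):
    """Hypothetical helper: apply func to each record, then flatten the results by one level."""
    return list(chain.from_iterable(func(r) for r in records))


# Same lambda as in the RDD call above.
extract = lambda r: r[5].split('|')

result = flat_map(extract, rows)
print(result)  # ['x', 'y', 'z']
```

Each record contributes zero or more output elements, so the output is a single flat list rather than a list of lists.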
Do you have any idea why it could fail?