
I am new to PySpark and am trying to apply a flat-map operation to a PySpark RDD object. According to the documentation, this function exists on the PySpark RDD class, but I can't manage to use it and get the following error:

AttributeError: 'RDD' object has no attribute 'flatmap' 

I call the function on the following line:

my_rdd = my_rdd.flatmap(lambda r: (r[5].split('|'))) 

The imports are as follows:

from pyspark.sql import *
from pyspark.sql.functions import *
from pyspark.sql import SparkSession
from pyspark import SparkContext as sc
from pyspark import SparkFiles

spark = SparkSession.builder.getOrCreate()

Additionally, other methods such as my_rdd.count work, which leads me to think that the SparkContext is set up correctly.

Do you have any idea why it fails?

1 Answer

my_rdd = my_rdd.flatMap(lambda r: (r[5].split('|'))) 

Uppercase! Method names are case-sensitive: the RDD method is flatMap with a capital M, not flatmap.
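For readers unsure what flatMap does compared with map, here is a rough plain-Python sketch of the semantics. It does not use Spark at all: the sample rows are hypothetical, and itertools.chain stands in for the flattening step that flatMap performs on an RDD.

```python
from itertools import chain

# Hypothetical sample rows; index 5 holds a pipe-delimited string,
# mirroring r[5] in the question.
rows = [
    ("a", 1, 2, 3, 4, "x|y"),
    ("b", 1, 2, 3, 4, "z"),
]

# Same function as the lambda passed to flatMap in the question.
split_field = lambda r: r[5].split('|')

# map-like behaviour: one output element (a list) per input row.
mapped = [split_field(r) for r in rows]

# flatMap-like behaviour: the per-row lists are flattened into one sequence.
flat_mapped = list(chain.from_iterable(split_field(r) for r in rows))

print(mapped)       # [['x', 'y'], ['z']]
print(flat_mapped)  # ['x', 'y', 'z']
```

On a real RDD, my_rdd.flatMap(...) with the same lambda should yield the individual split values as elements, rather than one list per row.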


5 Comments

These things happen: you can't see the forest for the trees.
I'm still experiencing AttributeError: 'DataFrame' object has no attribute 'flatMap'
Not how it works. @jeremy
This post shows how to call this function on a DataFrame: stackoverflow.com/a/37955947/3710514
This question was about RDDs, not DataFrames. Please post a new question.
