
I have a dataframe with 15 columns (4 categorical and the rest numeric).

I have created dummy variables for every categorical variable. Now I want to find the number of variables in my new dataframe.
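For reference, a minimal sketch of one way such dummy columns can be built in PySpark; the column name TYPE and the existing df are assumptions here for illustration:

# A sketch only: build one 0/1 dummy column per distinct value of a
# hypothetical categorical column "TYPE".
from pyspark.sql import functions as F

categories = [row[0] for row in df.select("TYPE").distinct().collect()]
dummies = [F.when(F.col("TYPE") == c, 1).otherwise(0).alias("TYPE_" + str(c))
           for c in categories]
df = df.select("*", *dummies)  # original columns plus the new dummy columns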

I tried calculating the length of printSchema(), but it is NoneType:

print type(df.printSchema()) 

  • What have you tried? Have you searched the web? Commented Mar 15, 2017 at 9:17
  • Try to check len(df.columns). Commented Mar 15, 2017 at 9:23

1 Answer


You are going about it the wrong way. Here is a sample example, followed by a note on printSchema:

df = sqlContext.createDataFrame(
    [
        (1, "A", "X1"),
        (2, "B", "X2"),
        (3, "B", "X3"),
        (1, "B", "X3"),
        (2, "C", "X2"),
        (3, "C", "X2"),
        (1, "C", "X1"),
        (1, "B", "X1"),
    ],
    ["ID", "TYPE", "CODE"],
)

# Python 2:
print len(df.columns)  # 3

# Python 3:
print(len(df.columns))  # 3

columns provides a list of all columns, so we can check its len. printSchema, by contrast, prints the schema of the df, i.e. the columns and their data types, as below:

root
 |-- ID: long (nullable = true)
 |-- TYPE: string (nullable = true)
 |-- CODE: string (nullable = true)
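If you want the count programmatically from the schema object rather than the printed tree, a short sketch of two equivalent ways:

# df.schema is a StructType; its .fields attribute is a list of StructField objects
print(len(df.schema.fields))  # 3

# df.dtypes returns (column name, type) pairs, which can also be counted
print(len(df.dtypes))  # 3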

3 Comments

On the pyspark console, len(df.columns) is enough; print is not needed.
Really hoping there's an OOP solution like .length or .size, etc.
What about an RDD? If I have an RDD, not a DataFrame, how do I display the number of columns (see the sketch below)? @Rakesh Kumar @chuck
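A minimal sketch for the RDD question above, assuming an RDD of equal-length tuples (the data here is hypothetical):

# An RDD carries no schema, so inspect one record and count its fields
rdd = sc.parallelize([(1, "A", "X1"), (2, "B", "X2")])
print(len(rdd.first()))  # 3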
