
I am trying to figure out what data type a column in my Spark DataFrame is and manipulate the column based on that deduction.

Here is what I have so far:

    import pyspark
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName('MyApp').getOrCreate()
    df = spark.read.csv('Path To csv File', inferSchema=True, header=True)

    for x in df.columns:
        if type(x) == 'integer':
            print(x + ": inside if loop")

The print(x+": inside if loop") statement never seems to get executed but I am sure there are several columns that are integer data type. What am I missing here?

3 Answers


You are iterating over the names of your columns, so x is always a string and type(x) is always str; it will never equal "integer".

You need to use pyspark.sql.DataFrame.dtypes instead:

    for x, t in df.dtypes:
        if t == "int":
            print("{col} is integer type".format(col=x))

It can also be useful to look at the schema using df.printSchema().
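For example, here is a minimal sketch using hypothetical in-memory data in place of the CSV (the column names and values are made up for illustration):

    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, StringType, IntegerType

    spark = SparkSession.builder.appName('MyApp').getOrCreate()

    # Hypothetical sample data standing in for the CSV in the question
    schema = StructType([
        StructField("name", StringType()),
        StructField("age", IntegerType()),
    ])
    df = spark.createDataFrame([("alice", 30), ("bob", 25)], schema)

    df.printSchema()
    # root
    #  |-- name: string (nullable = true)
    #  |-- age: integer (nullable = true)

    for col_name, col_type in df.dtypes:
        if col_type == "int":
            print(col_name + " is integer type")   # prints: age is integer type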




You can try:

dict(df.dtypes)['column name'] == 'int' 

df.dtypes returns a list of tuples, and the easiest way to get the type as a string for a given column is to convert that list to a dict.
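A quick sketch of this approach, again using a hypothetical DataFrame with a string 'name' column and an integer 'age' column:

    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, StringType, IntegerType

    spark = SparkSession.builder.appName('MyApp').getOrCreate()
    schema = StructType([StructField("name", StringType()),
                         StructField("age", IntegerType())])
    df = spark.createDataFrame([("alice", 30)], schema)

    type_map = dict(df.dtypes)        # {'name': 'string', 'age': 'int'}

    if type_map['age'] == 'int':
        # Manipulate the column once its type is confirmed,
        # e.g. derive a new integer column from it
        df = df.withColumn('age_plus_one', df['age'] + 1)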



Try:

if type(x) == int: 

type(x) doesn't return the string 'integer'; for integers it returns the class int.

1 Comment

You should use isinstance(x, int) rather than type(x) == int.
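For example (plain Python, independent of Spark):

    x = 5
    print(type(x) == int)      # True, but rejects subclasses of int
    print(isinstance(x, int))  # True, and also accepts subclasses (bool included)

    # Neither check applies to the original loop, though:
    # df.columns yields column *names* (strings), never ints.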
