
I am trying to figure out what data type a column in my Spark DataFrame is and manipulate the column based on that deduction.

Here is what I have so far:

    import pyspark
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName('MyApp').getOrCreate()
    df = spark.read.csv('Path To csv File', inferSchema=True, header=True)

    for x in df.columns:
        if type(x) == 'integer':
            print(x + ": inside if loop")

The print(x+": inside if loop") statement never seems to get executed but I am sure there are several columns that are integer data type. What am I missing here?

3 Answers


You are iterating over the names of your columns, so x is always a string and type(x) is always str; it will never equal "integer".

You need to use pyspark.sql.DataFrame.dtypes instead:

    for x, t in df.dtypes:
        if t == "int":
            print("{col} is integer type".format(col=x))

It can also be useful to look at the schema using df.printSchema().
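For example, here is a minimal sketch using hypothetical in-memory data in place of the CSV (the column names and values are made up for illustration):

    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, StringType, IntegerType

    spark = SparkSession.builder.appName('MyApp').getOrCreate()

    # Hypothetical sample data standing in for the CSV in the question
    schema = StructType([
        StructField("name", StringType()),
        StructField("age", IntegerType()),
    ])
    df = spark.createDataFrame([("alice", 30), ("bob", 25)], schema)

    df.printSchema()
    # root
    #  |-- name: string (nullable = true)
    #  |-- age: integer (nullable = true)

    for col_name, col_type in df.dtypes:
        if col_type == "int":
            print(col_name + " is integer type")   # prints: age is integer type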




You can try:

dict(df.dtypes)['column name'] == 'int' 

df.dtypes returns a list of tuples, and the easiest way to get the type as a string for a given column is to convert that list to a dict.
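A quick sketch of this approach, again using a hypothetical DataFrame with a string 'name' column and an integer 'age' column:

    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, StringType, IntegerType

    spark = SparkSession.builder.appName('MyApp').getOrCreate()
    schema = StructType([StructField("name", StringType()),
                         StructField("age", IntegerType())])
    df = spark.createDataFrame([("alice", 30)], schema)

    type_map = dict(df.dtypes)        # {'name': 'string', 'age': 'int'}

    if type_map['age'] == 'int':
        # Manipulate the column once its type is confirmed,
        # e.g. derive a new integer column from it
        df = df.withColumn('age_plus_one', df['age'] + 1)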



Try:

if type(x) == int: 

type(x) doesn't return the string 'integer'; for integers it returns the class int.

1 Comment

You should use isinstance(x, int) rather than type(x) == int.
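For example (plain Python, independent of Spark):

    x = 5
    print(type(x) == int)      # True, but rejects subclasses of int
    print(isinstance(x, int))  # True, and also accepts subclasses (bool included)

    # Neither check applies to the original loop, though:
    # df.columns yields column *names* (strings), never ints.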
