I have a PySpark DataFrame with 3 columns. The DDL of the Hive table 'test1' declares all columns as string, so if I do df.printSchema() every column shows up as string, as shown below:
    >>> df = spark.sql("select * from default.test1")
    >>> df.printSchema()
    root
     |-- c1: string (nullable = true)
     |-- c2: string (nullable = true)
     |-- c3: string (nullable = true)

    +----------+--------------+-------------------+
    |c1        |c2            |c3                 |
    +----------+--------------+-------------------+
    |April     |20132014      |4                  |
    |May       |20132014      |5                  |
    |June      |abcdefgh      |6                  |
    +----------+--------------+-------------------+

Now I want to filter only those records whose 'c2' value is an integer. So basically I need only the first 2 records, where 'c2' holds an integer-like string such as '20132014', and exclude the other records.
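Something like the following sketch is roughly what I have in mind, assuming a standard SparkSession and the table/column names from above (untested):

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.sql("select * from default.test1")

    # Option 1: keep rows where c2 consists only of digits (regex check).
    digits_only = df.filter(F.col("c2").rlike("^[0-9]+$"))

    # Option 2: cast c2 to int; non-numeric strings become NULL, so drop those.
    # Values outside the int range would also become NULL; "bigint" may be safer.
    castable = df.filter(F.col("c2").cast("int").isNotNull())

    digits_only.show(truncate=False)

Is one of these the right approach, or is there a better way?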