
It has been discussed (see "get datatype of column using pyspark") that the way to find a column's data type in PySpark is df.dtypes. The problem is that for data types like arrays or structs you only get a string such as array<string> or array<integer>.

Question: Is there a native way to get the PySpark data type object, like ArrayType(StringType, true)?
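For illustration, here is a minimal sketch of the problem; the DataFrame and the column name letters are made up:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# A DataFrame with a single array<string> column.
df = spark.createDataFrame([(["a", "b"],)], ["letters"])

print(df.dtypes)
# [('letters', 'array<string>')]  <- just a string, not an ArrayType object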

1 Answer


Just use schema:

df.schema[column_name].dataType 
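For example, continuing with the hypothetical letters DataFrame sketched in the question, this returns the actual type object rather than a string:

from pyspark.sql.types import ArrayType

dt = df.schema["letters"].dataType

print(dt)                         # ArrayType(StringType(), True); exact repr varies by Spark version
print(dt.elementType)             # StringType()
print(isinstance(dt, ArrayType))  # True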

3 Comments

dict(df.dtypes)[column_name] also works (thanks @RobinL). But note you'll get the data type as a string name rather than the formal type object; e.g., timestamp vs. TimestampType.
@snark - you mean a str instead of a type object, like TimestampType()? Thanks for the comment; in the end I might want the string ;)
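A small sketch contrasting the two approaches, using a hypothetical timestamp column added to the DataFrame above:

from pyspark.sql import functions as F

ts_df = df.withColumn("created_at", F.current_timestamp())

print(dict(ts_df.dtypes)["created_at"])      # 'timestamp' (a plain str)
print(ts_df.schema["created_at"].dataType)   # TimestampType()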
