How to check the schema of PySpark DataFrame?

In PySpark, you can check the schema of a DataFrame with the printSchema() method. It prints the schema in a tree format showing each column's name, data type, and whether the column is nullable.

Here's an example:

from pyspark.sql import SparkSession

# Create a Spark session
spark = SparkSession.builder.appName("schema_example").getOrCreate()

# Sample data
data = [("John", 28), ("Sara", 30), ("Mike", 25)]
columns = ["Name", "Age"]

# Create DataFrame
df = spark.createDataFrame(data, columns)

# Check schema
df.printSchema()

This will produce an output similar to:

root
 |-- Name: string (nullable = true)
 |-- Age: long (nullable = true)

If you want the schema in a programmatically accessible format (instead of just printing it), you can use the schema property:

schema = df.schema
print(schema)

This will return a StructType object that you can further process or inspect.
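For example, a StructType is iterable over its StructField entries, so you can inspect column names, data types, and nullability programmatically. Here is a minimal sketch reusing the df from the example above:

# Iterate over the schema's StructField entries
for field in df.schema.fields:
    print(field.name, field.dataType, field.nullable)

# df.dtypes gives a simpler list of (column name, type string) pairs
print(df.dtypes)  # [('Name', 'string'), ('Age', 'bigint')]

This is handy when you need to branch on a column's type, such as selecting only numeric columns before an aggregation.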

