How to check the schema of PySpark DataFrame?

In PySpark, you can check the schema of a DataFrame with the printSchema() method. It prints the schema in a tree format showing each column's name, data type, and whether the column is nullable.

Here's an example:

from pyspark.sql import SparkSession

# Create a Spark session
spark = SparkSession.builder.appName("schema_example").getOrCreate()

# Sample data
data = [("John", 28), ("Sara", 30), ("Mike", 25)]
columns = ["Name", "Age"]

# Create DataFrame
df = spark.createDataFrame(data, columns)

# Check schema
df.printSchema()

This will produce an output similar to:

root
 |-- Name: string (nullable = true)
 |-- Age: long (nullable = true)

If you want the schema in a programmatically accessible format (instead of just printing it), you can use the schema property:

schema = df.schema
print(schema)

This will return a StructType object that you can further process or inspect.
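For example, a StructType is iterable over its StructField entries, so you can inspect column names, data types, and nullability programmatically. Here is a minimal sketch reusing the df from the example above:

# Iterate over the schema's StructField entries
for field in df.schema.fields:
    print(field.name, field.dataType, field.nullable)

# df.dtypes gives a simpler list of (column name, type string) pairs
print(df.dtypes)  # [('Name', 'string'), ('Age', 'bigint')]

This is handy when you need to branch on a column's type, such as selecting only numeric columns before an aggregation.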

