How to Get a Substring from a Column in a PySpark DataFrame?

To extract a substring from a column in a PySpark DataFrame, you can use the substr method of the Column class (pyspark.sql.Column.substr). It takes the start position, which is 1-based, and the length of the substring to extract. The pyspark.sql.functions module provides an equivalent function named substring.

Here's a step-by-step guide:

Initialize PySpark:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder \
        .appName("Substring Extraction") \
        .getOrCreate()

Create a sample DataFrame:

    data = [("JohnDoe",), ("JaneSmith",), ("MikeBrown",)]
    df = spark.createDataFrame(data, ["name"])
    df.show()

Extract a substring:

Call the substr method on the column. For example, to extract the first four characters from the name column:

    from pyspark.sql.functions import col

    df_substring = df.withColumn("short_name", col("name").substr(1, 4))
    df_substring.show()

This extracts four characters from the name column, starting at position 1. Positions are 1-based, so position 1 is the first character of the string.
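To make the 1-based indexing concrete, here is a minimal pure-Python sketch of the same extraction. The helper name substr_1_based is hypothetical and not part of PySpark; it only mirrors the behavior for positive start positions, by shifting the start down by one to match Python's 0-based slicing.

```python
def substr_1_based(text: str, start: int, length: int) -> str:
    """Hypothetical helper (not a PySpark API): take `length` characters
    from `text`, where `start` is a 1-based position. Only positive
    start positions are handled in this sketch."""
    begin = start - 1  # convert the 1-based position to a 0-based index
    return text[begin:begin + length]


names = ["JohnDoe", "JaneSmith", "MikeBrown"]

# Same arguments as col("name").substr(1, 4) in the DataFrame example.
short_names = [substr_1_based(name, 1, 4) for name in names]
print(short_names)
```

In other words, substr(1, 4) on a column corresponds to the Python slice [0:4] on each string value.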

Here's the output you'll get:

    +---------+----------+
    |     name|short_name|
    +---------+----------+
    |  JohnDoe|      John|
    |JaneSmith|      Jane|
    |MikeBrown|      Mike|
    +---------+----------+

You can adjust the start position and length arguments of substr to extract different parts of the string as needed.
