PySpark Linear Regression Get Coefficients

To get the coefficients (weights) of a linear regression model in PySpark, first fit a model with the LinearRegression estimator from the pyspark.ml.regression module. Calling fit() returns a fitted LinearRegressionModel, and its coefficients attribute holds the learned weights as a vector.

Here's a step-by-step demonstration:

  1. Set up your PySpark environment.

  2. Create a sample DataFrame.

  3. Fit a Linear Regression model.

  4. Get and print the coefficients.

Here's the code for the above steps:

from pyspark.sql import SparkSession
from pyspark.ml.regression import LinearRegression
from pyspark.ml.linalg import Vectors

# 1. Set up the PySpark environment
spark = SparkSession.builder.appName("linear_regression_example").getOrCreate()

# 2. Create a sample DataFrame of (label, features) rows
data = [
    (1.0, Vectors.dense(1.0)),
    (2.0, Vectors.dense(2.0)),
    (3.0, Vectors.dense(3.0)),
    (4.0, Vectors.dense(4.0))
]
df = spark.createDataFrame(data, ["label", "features"])

# 3. Fit a Linear Regression model (elastic-net regularized)
lr = LinearRegression(maxIter=10, regParam=0.3, elasticNetParam=0.8)
lr_model = lr.fit(df)

# 4. Get and print the coefficients
print("Coefficients: " + str(lr_model.coefficients))
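In the sample data every label equals its feature value (y = x), so an unregularized least-squares fit should give a slope of 1.0 and an intercept of 0.0. The sketch below computes that closed-form single-feature least-squares solution in plain Python as a reference point; it is not Spark's solver, and the helper name simple_ols is hypothetical. Because the Spark example sets regParam=0.3 with elasticNetParam=0.8, the penalty shrinks the weight toward zero, so the printed Spark coefficient should come out smaller than 1.0 rather than exactly 1.0.

```python
# Closed-form ordinary least squares for a single feature (no regularization):
#   slope     = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
#   intercept = mean_y - slope * mean_x
def simple_ols(xs, ys):
    """Return (slope, intercept) of the unregularized least-squares line (hypothetical helper)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("feature values must not all be equal")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Same (feature, label) values as the Spark sample DataFrame
features = [1.0, 2.0, 3.0, 4.0]
labels = [1.0, 2.0, 3.0, 4.0]
slope, intercept = simple_ols(features, labels)
print(f"OLS slope: {slope}, intercept: {intercept}")  # OLS slope: 1.0, intercept: 0.0
```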

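In this example, we've created a simple linear dataset with one feature. After the model is fitted, lr_model.coefficients holds the weight for that feature. The attribute is always a vector (typically a DenseVector) with one weight per feature, in the same order as the entries of the features column, so with multiple features you get one weight for each. The fitted bias term is not part of this vector; it is available separately as lr_model.intercept.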


More Tags

sendmail urlopen frame-rate oracle-xe mindate pod-install next-redux-wrapper artifact robotframework deployment

More Programming Guides

Other Guides

More Programming Examples