How to map entity primary key to Spark ML predictions?

Question

I trained a Spark ML model, scored my holdout dataset with it, and now need to look up the prediction for specific entities.

How can I figure out which prediction is for whom? Is there a way I can add the entity primary key (e.g. Member_ID) to my prediction output?

More specifically, to score the dataset I used:

    predictions = trained_model.transform(holdout_data)

This produces a DataFrame with the columns "features", "label", and "prediction" (label is the response variable).

How do I find out the corresponding Member_ID for each prediction?

tpain · Accepted Answer · 2019-07-24 06:18:01Z


Does holdout_data contain only the columns ["features", "label"]? If so, add Member_ID to it.

The .transform() method of a fitted pyspark.ml model appends a prediction column to holdout_data and keeps all of the existing columns. So if Member_ID is in holdout_data to begin with, it is carried through to the output, and the problem is solved.


2 Comments

Victor Z Over a year ago

Thanks, it works. It's good to know how the .transform() method in pyspark.ml works. I thought it only took the "features" and "label" columns, so I didn't include things like primary keys.

tpain Over a year ago

No problem. True, I assumed the same.
