I trained a Spark ML model, scored my holdout dataset with it, and now need to look up the prediction for specific entities.

How can I figure out which prediction is for whom? Is there a way I can add the entity primary key (e.g. Member_ID) to my prediction output?

More specifically, to score the dataset I used: predictions = trained_model.transform(holdout_data)

This produces a DataFrame with the columns "features", "label", and "prediction" (label is the response variable).

How do I find out the corresponding Member_ID for each prediction?

1 Answer

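Does holdout_data contain only the columns ["features", "label"]? If so, keep Member_ID as an additional column in holdout_data before you score it.

The .transform() method of a pyspark.ml model appends the prediction column to the DataFrame you pass in and leaves the other columns untouched, so if Member_ID is in holdout_data to begin with, it appears next to each prediction in the output and the problem is solved.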

2 Comments

Thanks, it works. It's good to know how the .transform() method of pyspark.ml works. I thought it only took the "features" and "label" columns, so I didn't include things like primary keys.
No problem. True, I assumed the same.
