
I want to predict a target variable using a NN, but some business logic requires that the prediction be no less than min_value.

I imposed it after training by clipping the predictions:

    import numpy as np

    y_predict = np.maximum(y_predict, min_value)  # clip predictions to the floor

Is there a better way to do this? Should I impose the minimum value during training instead, for example by clipping y_train and y_test to min_value as well?
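For concreteness, the two options I have in mind look roughly like this (a sketch with scikit-learn's MLPRegressor and made-up data; the real model and min_value are different):

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 10, size=(500, 1))                 # made-up features
    y = 0.5 * X.ravel() + rng.normal(0, 0.5, size=500)    # made-up target
    min_value = 2.0                                       # illustrative floor

    # Option A: train on the raw target, clip only the predictions
    model_a = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
    model_a.fit(X, y)
    pred_a = np.maximum(model_a.predict(X), min_value)

    # Option B: clip the target before training, so the network sees the floor
    model_b = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
    model_b.fit(X, np.maximum(y, min_value))
    pred_b = np.maximum(model_b.predict(X), min_value)    # clip again, to be safe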

  • What business logic dictates that the prediction not be below a certain value? Your boss will be unhappy about predicted losses? There's a physical mechanism that prevents values from being that low (e.g., height is not negative)? Those are not the same. Commented Mar 1, 2024 at 0:02
  • I am trying to predict the length of a user's next session in our app, so y < min_value is possible. I know that best practice is to make the best prediction and only then apply the business logic, but this time I cannot. Commented Mar 3, 2024 at 6:05
  • "however this time I can not": Why not? Commented Mar 3, 2024 at 6:09
  • One battle at a time. I am trying to convince the analytics team to change the prediction model from LR to NN. Currently they impose a minimum value on the prediction; I want to keep everything the same except the prediction model. Commented Mar 3, 2024 at 6:20

1 Answer

I tend to prefer that business logic happen outside the model, because modeling the truth seems more likely to be easier than modeling the business. But I think that can be data-, model-, and business-dependent.

Aside from implementation concerns, I think the question is whether modeling y_true or y_business is easier (for you and your model). For parametric models (especially linear ones), the relationship between your features and the true target probably continues past the business floor, and truncating the target there will cause the model to over- and under-estimate in different regions as it tries to compensate:

[Figure: two linear fits; the fit to the truncated data shows a shallower slope]
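A rough sketch of that comparison (not the notebook's code; the data, noise level, and floor are made up):

    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 10, size=(300, 1))
    y = X.ravel() + rng.normal(0, 1, size=300)   # true relationship is linear
    floor = 4.0                                  # made-up business floor

    fit_raw = LinearRegression().fit(X, y)
    fit_trunc = LinearRegression().fit(X, np.maximum(y, floor))

    # The fit to the truncated target has a shallower slope, so it over- and
    # under-shoots the raw relationship in different regions.
    print(fit_raw.coef_[0], fit_trunc.coef_[0])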

As you move to more expressive models (whether by adding feature engineering or just using more complex models), the difference is probably less stark. For example, a tree-based model will prefer the truncated target, since it does not need to make further splits in branches where the target has been (mostly or entirely) truncated.

[Figure: tree-based fits, wiggly and piecewise-constant; the tree fit to the raw data is larger than the tree fit to the truncated data]
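The same comparison with trees (again made-up data and illustrative settings); the tree fit to the truncated target typically needs fewer splits:

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 10, size=(300, 1))
    y = X.ravel() + rng.normal(0, 0.5, size=300)
    floor = 4.0

    tree_raw = DecisionTreeRegressor(min_samples_leaf=10, random_state=0).fit(X, y)
    tree_trunc = DecisionTreeRegressor(min_samples_leaf=10, random_state=0).fit(
        X, np.maximum(y, floor))

    # Compare how many nodes each tree grows; regions sitting at the floor
    # need few or no further splits.
    print(tree_raw.tree_.node_count, tree_trunc.tree_.node_count)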

Splines plus a linear regression only have a problem near the kink:

[Figure: spline fit]
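A sketch of the spline-plus-linear-regression fit, using scikit-learn's SplineTransformer (knot count and data are illustrative):

    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import SplineTransformer
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 10, size=(300, 1))
    y_trunc = np.maximum(X.ravel() + rng.normal(0, 1, size=300), 4.0)

    # Cubic splines follow the flat region and the linear region well,
    # but smooth over the corner where the two meet.
    spline_fit = make_pipeline(SplineTransformer(degree=3, n_knots=8),
                               LinearRegression()).fit(X, y_trunc)
    pred = spline_fit.predict(X)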

You mentioned neural networks. Depending on the activation function, approximating the constant region may be more or less easy. And I think the effect at the kink will be similar to the spline plot.
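For example, with a small MLP (a sketch with scikit-learn's MLPRegressor; sizes, iteration counts, and data are illustrative), piecewise-linear ReLU units can represent the flat region and the kink more easily than tanh units, which smooth over them:

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 10, size=(500, 1))
    y_trunc = np.maximum(X.ravel() + rng.normal(0, 1, size=500), 4.0)

    # Same architecture, two activations: ReLU units are piecewise linear and
    # can represent the flat region and the kink; tanh units smooth them out.
    mlp_relu = MLPRegressor(hidden_layer_sizes=(32, 32), activation="relu",
                            max_iter=5000, random_state=0).fit(X, y_trunc)
    mlp_tanh = MLPRegressor(hidden_layer_sizes=(32, 32), activation="tanh",
                            max_iter=5000, random_state=0).fit(X, y_trunc)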

So I'd advocate for doing some EDA to determine how your situation compares, or just try them both (but be sure to measure their performance on fair grounds).

Colab notebook generating these plots
