$\begingroup$

Let's say I put the following two datasets in the best possible model (same model for both):

  • A raw dataset, the variables as they came just from the query.
  • A feature-engineered dataset, with hundreds of created variables, which came from the same raw dataset I just mentioned.

Could the difference between both AUCs be high? How much?

$\endgroup$
  • $\begingroup$ Any ground-rules here, on what "raw vs feature-engineered" and "best possible model" can mean? $\endgroup$ Commented Jan 17, 2020 at 21:58
  • $\begingroup$ Yes. Raw: the variables have missing values; no grouped variables are derived (e.g., a mean by group); no sums (A+B), differences (A−B), or ratios (A/B) are calculated. Feature-engineered: mean encoding, frequency encoding, impact encoding, binning into ranges, ranks, lagged variables, a new variable derived from clustering. Best model: let's say XGBoost. $\endgroup$ Commented Jan 17, 2020 at 22:07
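As an illustrative sketch of two of the encodings named in the comment above (frequency encoding and mean encoding), assuming pandas is available; the column names `cat` and `y` are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({"cat": ["a", "b", "a", "c", "b", "a"],
                   "y":   [1, 0, 1, 0, 1, 0]})

# Frequency encoding: replace each category by its relative frequency.
df["cat_freq"] = df["cat"].map(df["cat"].value_counts(normalize=True))

# Mean (target) encoding: replace each category by the mean of y within it.
# In practice this should be computed out-of-fold to avoid target leakage.
df["cat_mean"] = df["cat"].map(df.groupby("cat")["y"].mean())
```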

2 Answers

$\begingroup$

Yes, the performance can vary a lot using feature engineering.

Example: suppose a dataset where the response variable $y$ is true if $x$ is odd.

  x    y
 346   F
  13   T
 178   F
  64   F
 987   T
 ...

Most learning models will fail to identify this pattern and will perform poorly, typically falling back to always predicting the majority class. However, simply adding a feature $x \% 2$ to the data will allow virtually any model to perform perfectly.

Of course, this is a toy example, but the point stands: a single well-chosen feature can drastically change performance. Naturally, the size of the improvement depends entirely on the data and the nature of the features added.
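As a rough sketch of the parity example, assuming scikit-learn is available (the dataset, split, and model choice here are illustrative, not from the original post):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
x = rng.integers(0, 1000, size=2000)
y = x % 2  # target: 1 when x is odd

def auc_for(X):
    """Fit a logistic regression and return its held-out AUC."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])

auc_raw = auc_for(x.reshape(-1, 1))                    # raw x only
auc_engineered = auc_for(np.column_stack([x, x % 2]))  # raw x plus parity

# Parity is uncorrelated with magnitude, so the raw AUC hovers near chance,
# while the engineered feature makes the classes (almost) perfectly separable.
print(auc_raw, auc_engineered)
```

The same gap would appear with XGBoost or any other learner that cannot invent modular arithmetic on its own.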

$\endgroup$
$\begingroup$

I would say that the best possible model for the raw data would derive all the meaningful features that you would otherwise have created from the data yourself.

And I would say that the best possible model for the feature-engineered dataset would remove or ignore unnecessary features.

The best possible model would have an AUC of 1 anyway, since it makes every prediction correctly. Even in a noisy setting where an AUC of 1 cannot be achieved, I think the argument holds.

Training time and convergence speed may differ between the two, however.

$\endgroup$
