
I'm using scikit-learn with Python and making predictions with XGBRegressor.

Out of 81 features, 79 are floats that look like 2.537, i.e. always rounded to 3 decimal places.

The predictions are pretty good, but I asked myself whether using ints instead would be better in terms of performance and outcome. The idea is to take each float feature, multiply it by 1000, and cast it to int.

After the prediction I can divide the result by 1000 again.
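For illustration, here is the round-trip I have in mind, on toy data rather than my real features (the array shapes and values below are made up):

```python
import numpy as np

# Toy feature matrix of 3-decimal floats, standing in for the real 79 columns.
rng = np.random.default_rng(0)
X = np.round(rng.uniform(0.0, 10.0, size=(5, 4)), 3)

# Proposed transformation: scale by 1000 and cast to integers.
# np.rint guards against values like 2.537 * 1000 == 2536.999... before the cast.
X_int = np.rint(X * 1000).astype(np.int64)

# After prediction, divide by 1000 to get back to the original scale.
X_back = X_int / 1000.0

# The round-trip reproduces the floats up to (at most) representation noise.
max_err = np.max(np.abs(X - X_back))
```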

What do you think about this? Or do you have any other suggestions?

By the way, my training set is 130,000,000 × 81.

Thanks in advance!

Comments:

  • Welcome to Cross Validated! 1) What do you hope this will achieve? 2) When you try this, does it even change the predictions? – Commented Apr 22 at 16:28
  • Thanks :) 1. A long time ago I learned that floats introduce errors; that's why I'm asking. 2. The script takes ~11 h; I'm running it right now and will report back tomorrow. But basically I'm looking for the theoretical answer. – Commented Apr 22 at 17:17
  • "Floats introduce errors": why would that be? – Commented Apr 22 at 17:29
  • To be honest, I don't quite see the logic of rounding to three decimals in order to address possible imprecision on the order of numerical-representation accuracy... – Commented Apr 22 at 19:47
  • If "even a small error can lead to very wrong results", then you have a very different problem than the one you have stated, and it's not yet evident whether or to what extent integer representations will solve it. Could you tell us how the data got this way (three decimal digits), how they were measured, what they represent, and specifically which calculation is so incredibly sensitive? – Commented Apr 23 at 13:33
