How can I compare the accuracy of imputation models if the dataset already contains missing values?

Question

Let's say I have a dataset of 50,000 records in which about 2% of the values were missing from the start. From what I have learned, we need an indicator (metric) that compares the imputed values with the ground-truth values to check the accuracy of an imputation model. But since my raw dataset already has some missing values, how can I calculate the accuracy of different models and select the best one?

Ansh · Accepted Answer · 2023-06-27 17:58:18Z

There are two possibilities:

- You doubt the library and want to check its accuracy. In that case, write a custom function and check on some data whether the imputation is correct. In general, it is rare for a standard library implementation to be wrong and to impute something other than what it was designed to.
- You want to compare the suitability of imputed values from different methods. The best way is to train the same model (same architecture and configuration) on each differently imputed dataset. Whichever imputation gives the better model performance is the better one.
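The second option can be sketched in a few lines. This is only an illustration under stated assumptions: the data are synthetic, the column names `x`, `z`, `y` and the helpers `mean_imputer`, `regression_imputer`, and `downstream_mse` are hypothetical, the missing rate is raised to 20% so the effect is visible, and a one-feature least-squares line stands in for whatever downstream model you actually use.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def mean_imputer(x_train, z_train):
    """Fill gaps in x with the mean of the observed training values."""
    mean = statistics.fmean([v for v in x_train if v is not None])
    return lambda x, z: mean if x is None else x

def regression_imputer(x_train, z_train):
    """Fill gaps in x with a prediction from the fully observed column z."""
    pairs = [(z, x) for x, z in zip(x_train, z_train) if x is not None]
    a, b = fit_line([p[0] for p in pairs], [p[1] for p in pairs])
    return lambda x, z: a * z + b if x is None else x

def downstream_mse(make_imputer, train, test):
    """Impute x, fit the same downstream model, and score it on held-out rows."""
    x_tr, z_tr, y_tr = zip(*train)
    impute = make_imputer(x_tr, z_tr)  # imputer is fitted on training rows only
    a, b = fit_line([impute(x, z) for x, z in zip(x_tr, z_tr)], y_tr)
    errors = [(a * impute(x, z) + b - y) ** 2 for x, z, y in test]
    return statistics.fmean(errors)

# Synthetic data (assumption): x drives the target y, z is correlated with x,
# and about 20% of x is missing (None).
random.seed(0)
rows = []
for _ in range(2000):
    z = random.gauss(0, 1)
    x = 2 * z + random.gauss(0, 0.5)
    y = 3 * x + random.gauss(0, 1)
    rows.append((None if random.random() < 0.2 else x, z, y))
train, test = rows[:1500], rows[1500:]

scores = {
    "mean": downstream_mse(mean_imputer, train, test),
    "regression": downstream_mse(regression_imputer, train, test),
}
best = min(scores, key=scores.get)
print(scores, "->", best)
```

Note that this never needs the true values of the missing cells: the score is the downstream model's error on the target, which is observed for every row. The imputers are fitted on the training rows only so that the held-out score is not contaminated.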


And what should the indicator be for comparing those different models? Since some data are already missing, we can't use mean squared error between the imputed values and the ground-truth values to select the best model, right? – Amisha Dhimal, Jun 29, 2023 at 13:31

We can use overall model accuracy. If you want, you can use mean squared error for all the different models: the model with the best overall performance in predicting the target variable had the best imputation. – Ansh, Jun 29, 2023 at 16:51

But let's say I have the following raw data:

| X  | Y |
|----|---|
| 2  | 3 |
| 3  | 3 |
| 4  | 3 |
| 5  | 2 |
| 6  | ? |
| 7  | 8 |
| 8  | 5 |
| 9  | ? |
| 10 | 3 |

Now, since my raw data already has missing values, how can I calculate MSE for models A and B to choose one of them as the best for the imputation process? – Amisha Dhimal, Jul 1, 2023 at 18:06

Whatever data you impute is what you use to calculate the metric in each respective scenario. – Ansh, Jul 3, 2023 at 10:37
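For the MSE question raised in the comments, one common workaround (not something stated in this thread) is to hide values that are actually observed, impute them, and score the imputer only on those hidden cells, where the ground truth is known. The sketch below uses the Y column from the table in the comment above. The mean and the median are only placeholders for "model A" and "model B", `masked_mse` is a hypothetical helper, and the approach assumes the hidden values behave like the values that are genuinely missing.

```python
import statistics

# Y column from the comment's table; None marks the two values missing from the start.
y = [3, 3, 3, 2, None, 8, 5, None, 3]

def masked_mse(values, fill):
    """Hide each observed value in turn, impute it, and average the squared errors."""
    observed = [i for i, v in enumerate(values) if v is not None]
    errors = []
    for i in observed:
        # The imputer only sees the other observed values;
        # cells that were missing from the start are never scored.
        rest = [values[j] for j in observed if j != i]
        errors.append((fill(rest) - values[i]) ** 2)
    return statistics.fmean(errors)

# Stand-ins for "model A" and "model B" (assumption): mean and median imputation.
scores = {
    "mean": masked_mse(y, statistics.fmean),
    "median": masked_mse(y, statistics.median),
}
best = min(scores, key=scores.get)
print(scores, "->", best)
```

With 50,000 records you would typically hide a random sample of observed cells (for example a few percent) rather than one at a time, and repeat the masking a few times to see how stable the ranking is.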
