Questions tagged [data-imputation]
Refers to a general class of methods used to "fill in" missing data. These methods are typically related to interpolation (https://en.wikipedia.org/wiki/Interpolation) and require assumptions about why the data are missing (e.g. "missing at random").
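As a rough illustration of the interpolation idea in the tag description, here is a minimal sketch in plain Python. The function name `interpolate_missing` and the sample series are hypothetical, and the sketch assumes an ordered, evenly spaced series with missing entries marked as `None`. It only fills gaps that have observed values on both sides, and it says nothing about whether the missingness mechanism justifies doing so.

```python
from typing import List, Optional


def interpolate_missing(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill interior None gaps by linear interpolation between observed neighbours.

    Leading and trailing gaps are left as None, because interpolation
    needs an observed value on each side of the gap.
    """
    filled = list(values)
    # Indices of the entries that were actually observed.
    observed = [i for i, v in enumerate(values) if v is not None]
    # Walk over consecutive pairs of observed points and fill what lies between.
    for left, right in zip(observed, observed[1:]):
        step = (values[right] - values[left]) / (right - left)
        for j in range(left + 1, right):
            filled[j] = values[left] + step * (j - left)
    return filled


# Hypothetical example series: the trailing None has no right-hand neighbour.
series = [1.0, None, 3.0, None, None, 6.0, None]
print(interpolate_missing(series))
```

The trailing gap is left unfilled on purpose: extending beyond the observed range would be extrapolation, which needs stronger assumptions than the sketch makes.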
699 questions
0 votes
0 answers
12 views
Imputation with mice for multilevel data that is only missing level 1 values
This is my first time attempting data imputation with the mice package. I've read some tutorials but am still confused about how to apply the different examples to ...
0 votes
0 answers
24 views
What are the expected ideal values for the discriminator losses when using a generative adversarial imputation network to impute missing values?
I am new to GAIN (generative adversarial imputation network). I am trying to use GAIN to impute missing values. I have a question about the values of the losses for the discriminator. Are the values ...
4 votes
1 answer
87 views
Imputation of Data for CFA
I have a dataset with some latent variables, and my main one happens to have 9 dichotomous items. I ran Little's MCAR test, which resulted in a very low p-value, so I should conduct imputation before ...
4 votes
2 answers
291 views
Use bigger sample for predictors in regression
I need to regress continuous y on multi-dimensional X (for prediction mostly, not inference, but do I need the betas to make ...
7 votes
1 answer
255 views
How to deal with missing data in the response variable
I am writing to seek your expertise with a project I am working on, which is described briefly here: The data are on disease occurrence ('number of cases' is my dependent variable) from 1960 to 2025. ...
3 votes
1 answer
177 views
Is it appropriate to use *missForest* for imputing missing data, and is there a critical/recommended threshold?
I'm currently working with a dataset from a molecular epidemiology study involving controls and cases for a cardiovascular event. The dataset includes several categorical health and lifestyle-...
8 votes
2 answers
323 views
Test set imputation
Suppose that I have missing values in one of my features, and there are missing values both in the train and test sets. I want to impute using the median of the observed features. Should I: A) ...
1 vote
1 answer
94 views
How does missRanger perform initial imputation before random forest iterations?
I'm using the missRanger package in R to perform imputation on a very large dataset. While the package documentation clearly explains the iterative random forest imputation process, I couldn't find ...
0 votes
0 answers
49 views
Handling Missing Values in the dataset
I'm using this dataset for a regression project, and the goal is to predict the beneficiary risk score (Bene_Avg_Risk_Scre). Now, to protect beneficiary identities and safeguard this information, CMS ...
1 vote
0 answers
46 views
How to estimate the variance of a vector that is a mixture of known and modeled values?
Problem: Imagine we have $X^k_i$, where $i = 1, 2, 3, ..., N$ and $k \in \{train, test, modeled\}$. Each $x^{train}_i$ was used to train a model from covariates to predict $x_i$, and each $X^{test}_i$ ...
0 votes
0 answers
61 views
Model prediction is more accurate with substituted left-censored data than with imputed data
I have a set of environmental variables that are left-censored (measurements of elements in my samples). I have two datasets, one dataset with samples with known origins and one dataset with samples ...
1 vote
0 answers
106 views
Multiple imputation and causality [duplicate]
The question: I am currently trying to write a guideline regarding variable selection for multiple imputation of incompletely observed longitudinal data, with a repeatedly measured outcome and a set of ...
0 votes
0 answers
37 views
Proper order of imputation, scaling, and sampling (for data balancing) before model training
I was preparing a scikit-learn pipeline with these three steps: imputation, scaling, and sampling (for data balancing). I chose this order: Impute: first step because it would not make sense to apply ...
0 votes
1 answer
127 views
Do we handle missing demographic data the same way we handle missing data for other sorts of variables?
I am missing data on demographic variables such as age, gender, ethnicity. I have used stochastic regression to impute the missing data on all other variables of interest, such as psychological ...
0 votes
0 answers
33 views
What is the right procedure for multivariate imputation of categorical data?
I wanted to know the multivariate approach to impute the categorical data. It is apparent that if I want to use sklearn.impute.IterativeImputer I need to encode the ...