
Questions tagged [data-imputation]

Refers to a general class of methods used to "fill in" missing data. These methods are typically related to interpolation (https://en.wikipedia.org/wiki/Interpolation) and require assumptions about why the data are missing (e.g. "missing at random").
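
As a rough illustration of what an assumption such as "missing at random" means, the standard definitions can be written as below. The notation is an assumption introduced here and is not part of the tag description: $R$ is the missingness indicator, and $Y_{obs}$ and $Y_{mis}$ are the observed and missing parts of the data.

$$
\text{MCAR:}\quad P(R \mid Y_{obs}, Y_{mis}) = P(R)
$$

$$
\text{MAR:}\quad P(R \mid Y_{obs}, Y_{mis}) = P(R \mid Y_{obs})
$$

Under MAR, missingness may depend on values that were observed but not on the values that are missing. MCAR is the stronger special case in which missingness depends on neither.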

0 votes
0 answers
12 views

This is my first time attempting data imputation with the mice package. I've read some tutorials but am still confused about how to apply the different examples to ...
vcityx's user avatar
  • 1
0 votes
0 answers
24 views

I am new to GAIN (generative adversarial imputation network). I am trying to use GAIN to impute missing values. I have a question about the values of the losses for the discriminator. Are the values ...
JonathonSoong's user avatar
4 votes
1 answer
87 views

I have a dataset with some latent variables, and my main one happens to have 9 dichotomous items. I ran Little's MCAR test, which resulted in a very low p-value, so I should conduct imputation before ...
Kris Nenezic's user avatar
4 votes
2 answers
291 views

I need to regress continuous y on multi-dimensional X (for prediction mostly, not inference, but do I need the betas to make ...
valya's user avatar
  • 141
7 votes
1 answer
255 views

I am writing to seek your expertise with a project I am working on, which is described briefly here: The data are on disease occurrence ('number of cases' is my dependent variable) from 1960 to 2025. ...
Akhil Kshirsagar's user avatar
3 votes
1 answer
177 views

I'm currently working with a dataset from a molecular epidemiology study involving controls and cases for a cardiovascular event. The dataset includes several categorical health and lifestyle-...
Javier Hernando's user avatar
8 votes
2 answers
323 views

Suppose that I have missing values in one of my features, and there are missing values both in the train and test sets. I want to impute using the median of the observed features. Should I: A) ...
WeakLearner's user avatar
  • 1,631
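
One common way to handle the situation described in the excerpt above can be sketched as follows: compute the median from the training split only, then reuse that same value to fill missing entries in both splits. This is a minimal sketch and not necessarily any of the options the question lists. The helper names `fit_median` and `impute` and the toy data are hypothetical.

```python
import math
from statistics import median


def fit_median(values):
    """Return the median of the observed (non-missing) values."""
    observed = [v for v in values if v is not None and not math.isnan(v)]
    if not observed:
        raise ValueError("no observed values to compute a median from")
    return median(observed)


def impute(values, fill):
    """Replace missing entries (None or NaN) with a fixed fill value."""
    return [fill if v is None or math.isnan(v) else v for v in values]


# Hypothetical toy feature with missing entries in both splits.
train = [1.0, None, 3.0, 10.0, float("nan")]
test = [None, 4.0]

# The fill value is learned from the training split only ...
train_median = fit_median(train)

# ... and then applied unchanged to both splits.
train_filled = impute(train, train_median)
test_filled = impute(test, train_median)

print(train_median, train_filled, test_filled)
```

Reusing the training median on the test split keeps information from the test data out of the preprocessing step, which is the usual reason for fitting it on the training data alone.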
1 vote
1 answer
94 views

I'm using the missRanger package in R to perform imputation on a very large dataset. While the package documentation clearly explains the iterative random forest imputation process, I couldn't find ...
water seethrow's user avatar
0 votes
0 answers
49 views

I'm using this dataset for a regression project, and the goal is to predict the beneficiary risk score (Bene_Avg_Risk_Scre). Now, to protect beneficiary identities and safeguard this information, CMS ...
Anirudh's user avatar
1 vote
0 answers
46 views

Problem: Imagine we have $X^k_i$, where $i = 1, 2, 3, ..., N$ and $k \in \{train, test, modeled\}$. Each $x^{train}_i$ was used to train a model from covariates to predict $x_i$, and each $X^{test}_i$ ...
Mark White's user avatar
  • 11.7k
0 votes
0 answers
61 views

I have a set of environmental variables that are left-censored (measurements of elements in my samples). I have two datasets, one dataset with samples with known origins and one dataset with samples ...
AnneA's user avatar
  • 11
1 vote
0 answers
106 views

The question: I am currently trying to write a guideline regarding variable selection for multiple imputation of incompletely observed longitudinal data, with a repeatedly measured outcome and a set of ...
BlackForestStats's user avatar
0 votes
0 answers
37 views

I was preparing a scikit-learn pipeline with these three steps: imputation, scaling, and sampling (for data balancing). I chose this order: Impute: first step because it would not make sense to apply ...
hlfernandez's user avatar
0 votes
1 answer
127 views

I am missing data on demographic variables such as age, gender, ethnicity. I have used stochastic regression to impute the missing data on all other variables of interest, such as psychological ...
Lee Zhiyuan's user avatar
0 votes
0 answers
33 views

I wanted to know the multivariate approach to impute the categorical data. It is apparent that if I want to use sklearn.impute.IterativeImputer I need to encode the ...
Redowan Sakib's user avatar
