
Questions tagged [data-preprocessing]

A step in data mining in which data are cleaned for analysis.

0 votes, 0 answers, 32 views

I have a dataset with ~20,000 entries containing mean values for different groups. The groups are defined with 4 categorical columns and I have the week number, the number of samples per week and the ...
User: Dee Vee
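The question above is cut off, so its actual goal is unknown. If the weekly means for a group ever need to be pooled into a single group mean, they should be weighted by the weekly sample counts rather than averaged directly. A minimal sketch, assuming hypothetical rows of the form `(group_key, week, n_samples, mean_value)`, where `group_key` is a tuple of the four categorical columns:

```python
from collections import defaultdict


def pooled_means(rows):
    """Combine per-week group means into one mean per group.

    Hypothetical row layout: (group_key, week, n_samples, mean_value).
    Each weekly mean is weighted by its sample count:
    pooled = sum(n * mean) / sum(n).
    """
    weighted_sum = defaultdict(float)
    total_n = defaultdict(int)
    for group, _week, n, mean in rows:
        weighted_sum[group] += n * mean
        total_n[group] += n
    # Skip groups with no samples to avoid dividing by zero.
    return {g: weighted_sum[g] / total_n[g] for g in weighted_sum if total_n[g] > 0}
```

For two weeks with 10 and 30 samples and means 2.0 and 4.0, the pooled mean is (10 × 2.0 + 30 × 4.0) / 40 = 3.5, whereas a plain average of the two weekly means would give 3.0.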
1 vote, 0 answers, 22 views

The cross-validation function cv.glmnet, for regularized regression, does not seem to allow for separate transformation/preprocessing of training and validation ...
User: DriesB (reputation 11)
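The requirement behind the cv.glmnet question is that any preprocessing (centering, scaling, imputation and so on) be estimated on the training folds only and then applied unchanged to the held-out fold. `cv.glmnet` is an R function; the sketch below is in Python and only illustrates the logic of a hand-written fold loop. It is not the glmnet API, and the single numeric feature and the choice of standardization as the transformation are assumptions for illustration:

```python
import statistics


def kfold_indices(n, k):
    """Yield (train_idx, valid_idx) for k contiguous, non-overlapping folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        valid = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, valid
        start += size


def fit_scaler(values):
    """Learn centering/scaling parameters from training values only."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values) or 1.0  # guard against a constant column
    return mu, sd


def apply_scaler(values, params):
    """Apply previously learned parameters; nothing is re-estimated here."""
    mu, sd = params
    return [(v - mu) / sd for v in values]


def cv_preprocessed_folds(x, k):
    """For each fold, scale training and validation data with parameters
    estimated on the training part only, so no validation information leaks."""
    for train_idx, valid_idx in kfold_indices(len(x), k):
        params = fit_scaler([x[i] for i in train_idx])
        x_train = apply_scaler([x[i] for i in train_idx], params)
        x_valid = apply_scaler([x[i] for i in valid_idx], params)
        # A model would be fitted on x_train and scored on x_valid at this point.
        yield params, x_train, x_valid
```

The validation fold is transformed with the training fold's mean and standard deviation, so its scaled values are generally not centered at zero; that is the intended behavior, since it mirrors what happens to genuinely new data.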
3 votes, 1 answer, 32 views

I'm working on a classification problem where the goal is to maximize the F1-score, hopefully above 80%. Despite a very thorough EDA and preprocessing workflow, I've hit a hard performance ceiling ...
User: hijunyng
0 votes, 0 answers, 77 views

I am currently conducting an online survey in a factorial setting ("vignette study"). I have 8 vignettes in total, varying in three dimensions (let us call them Dimension A, Dimension B and ...
User: trimmu (reputation 11)
0 votes, 0 answers, 44 views

I am currently working on a project where I need to assign customers across N recipes before A/B testing, such that customers' KPIs are balanced across recipes (to reduce pre-test bias). Dataset ...
User: Rishab (reputation 1)
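One standard way to get balanced groups before an A/B test is blocked (stratified) randomization: rank customers by a pre-test KPI, cut the ranking into blocks of N, and randomly permute the recipes within each block. A minimal sketch, assuming a single numeric pre-test KPI per customer (several KPIs would first need a combined score or a multivariate matching method, which this sketch does not cover); the function and argument names are illustrative:

```python
import random


def blocked_assignment(customers, kpi, n_recipes, seed=0):
    """Assign customers to recipes 0..n_recipes-1 so a pre-test KPI is balanced.

    Assumes one numeric pre-test KPI per customer (kpi maps customer -> value).
    Customers are sorted by KPI and cut into consecutive blocks of n_recipes;
    recipe labels are shuffled within each block, so every recipe receives one
    customer from each KPI stratum. A final partial block gets a random
    subset of the recipes.
    """
    rng = random.Random(seed)  # fixed seed makes the assignment reproducible
    ordered = sorted(customers, key=lambda c: kpi[c])
    assignment = {}
    for start in range(0, len(ordered), n_recipes):
        block = ordered[start:start + n_recipes]
        labels = list(range(n_recipes))
        rng.shuffle(labels)
        for customer, label in zip(block, labels):
            assignment[customer] = label
    return assignment
```

Because each recipe receives exactly one customer from every full block, the recipes' KPI distributions track each other closely, while the choice of recipe within a block remains random.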
4 votes, 1 answer, 133 views

Conceptually, I understand that models should be built totally blind to the test set in order to most faithfully estimate performance on future data. However, I'm struggling to understand the extent ...
User: Evan (reputation 329)
1 vote, 0 answers, 36 views

I am analyzing NIR spectra, from which I am trying to measure a physical property that I expect to show up mostly as scatter. However, my samples have a complex surface morphology and I need some ...
User: phil27 (reputation 11)
1 vote, 0 answers, 72 views

I've recently learnt unsupervised learning methods such as KMeans and DBSCAN. While working on this dataset, I applied KMeans clustering but faced the following issues: The Elbow Method showed no ...
User: ssmalik (reputation 41)
3 votes, 2 answers, 576 views

I'm applying K-Means clustering to a dataset of ship voyages. The goal is to group voyages into performance-based clusters like cost-efficient, underperforming, etc. I have 12 features in total: 10 ...
User: ssmalik (reputation 41)
3 votes, 1 answer, 189 views

I am working with some data in which the output target values $(Y)$ are all strictly positive, essentially in the range 0.001 to 100. Since these values can inherently never be negative or ...
User: Applesauce44
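A common way to respect a strictly positive target is to model $\log Y$ and back-transform predictions with $\exp$, which can never return a negative value; with targets spanning 0.001 to 100 (five orders of magnitude), the log scale also evens out the spread. A minimal sketch using ordinary least squares on a single hypothetical feature `x`, not necessarily what the asker ended up doing:

```python
import math


def fit_log_linear(x, y):
    """Ordinary least squares of log(y) on a single feature x; needs all y > 0."""
    if any(v <= 0 for v in y):
        raise ValueError("log transform needs strictly positive targets")
    z = [math.log(v) for v in y]
    n = len(x)
    x_bar = sum(x) / n
    z_bar = sum(z) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = z_bar - slope * x_bar
    return intercept, slope


def predict_positive(params, x_new):
    """Back-transform with exp, so every prediction is strictly positive.

    exp of the fitted log-scale mean estimates a geometric mean of Y,
    not its arithmetic mean E[Y].
    """
    intercept, slope = params
    return [math.exp(intercept + slope * xi) for xi in x_new]
```

One caveat: $\exp$ of the fitted mean of $\log Y$ estimates a geometric mean (the conditional median when the log-scale errors are symmetric), which lies below the conditional mean of $Y$; if the mean itself is the quantity of interest, a retransformation correction is needed.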
1 vote, 0 answers, 111 views

I've been reading about different Gaussian processes recently to better fit some data that I'm working with. My data clearly does not follow a multivariate Gaussian as required for a standard exact ...
User: Applesauce44
0 votes, 1 answer, 98 views

Can outlier removal be applied to only one class in a binary classification problem? For example, when facing class imbalance, can it be done only on the majority class? If so, is there any paper on this ...
User: vhd (reputation 25)
6 votes, 0 answers, 320 views

Assume we can only observe a two-way table of counts for a pair of categorical features $x_i,x_j$. $$ \begin{array}{c|ccc} & & x_j & \\ \hline ...
User: Three Diag
1 vote, 0 answers, 60 views

I have some time series data with multiple features. The output is shifted: the times at which I have output values are offset from the corresponding input times, and irregularly so. I have ...
User: Ash Ketchump
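If, as the question suggests, each output is observed some time after the inputs that produced it, one simple alignment is an as-of join: for every output timestamp, take the most recent input row at or before it, optionally rejecting matches older than a tolerance. pandas implements this as `pandas.merge_asof` (with `direction` and `tolerance` arguments); the dependency-free sketch below shows the same idea, with the backward direction and the hypothetical `max_lag` tolerance as assumptions:

```python
from bisect import bisect_right


def align_asof(input_times, input_rows, output_times, max_lag=None):
    """For each output time, pick the latest input row observed at or before it.

    input_times must be sorted ascending and parallel to input_rows.
    Outputs with no earlier input, or whose nearest earlier input is more
    than max_lag older, are matched to None.
    """
    aligned = []
    for t in output_times:
        i = bisect_right(input_times, t) - 1  # index of last input time <= t
        if i < 0 or (max_lag is not None and t - input_times[i] > max_lag):
            aligned.append(None)
        else:
            aligned.append(input_rows[i])
    return aligned
```

Whether the backward direction is right depends on how the outputs are actually delayed relative to the inputs; a forward or nearest match would only change how the index `i` is chosen.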
1 vote, 0 answers, 41 views

I am working on two health-related datasets in Python. One tabular dataset (called A) contains patient-level information (by id) and a bunch of other features which I have already transformed ...
User: hiu (reputation 77)
