Questions tagged [dimensionality-reduction]
Dimensionality reduction refers to techniques for reducing many variables to a smaller number while keeping as much information as possible. One prominent method is PCA (see the [pca] tag).
283 questions
0 votes
0 answers
64 views
With multiple identical datapoints, should I use UMAP min_dist = 0?
Most (if not all) implementations/examples of UMAP dimensionality reduction I have seen use a min_dist value slightly above zero in order to avoid too tight ...
0 votes
1 answer
66 views
The latest approach for feature dimension reduction
I have a feature matrix with 1200 rows and 18930 columns. The matrix is sparse, and the original paper used a stacked denoising autoencoder for dimensionality reduction. Since I want to enhance the ...
0 votes
1 answer
203 views
t-SNE plots of random data subsets are vastly different but labels are still clearly separated: what conclusions can we draw about the dataset?
I scraped a dataset of match data in a video game and labeled them according to their outcome (0 for loss, 1 for win). I wanted to see if there was actually any inherent relationship between the ...
0 votes
1 answer
120 views
Beginner clustering project, what are the input features and how do I analyze the data?
I am a beginner in data science. I have this dataset on natural disaster events in Afghanistan from 2016 to 2017. Columns: REGION (e.g., North, North West, etc.) PROVINCE_NAME (kind of like the 50 US states) ...
1 vote
1 answer
594 views
How to explain the new features after a PCA?
Let's say I ran a PCA that reduced 10 dimensions to 3, and it clusters the classes correctly. How do I explain which dimensions are better for prediction? It is obvious that the 3 ...
1 vote
0 answers
48 views
Density distribution for feature analysis
I trained an ML model on the original data with 6373 features, then I trained the same model on compressed data (using an autoencoder) and got an improvement. Finally, I trained the same model on reduced ...
2 votes
1 answer
135 views
How to improve the preservation of the global data structure in UMAP?
I have a dataset whose features consist of points arranged in a regular grid on a simplex. Each of these points is defined as follows: A point $\mathbf{x}$ on the simplex can be ...
0 votes
1 answer
85 views
Important feature selection using dimensionality reduction algorithms
I have a dataset with more than 25000 features. I performed noise removal using the histogram approach, which reduced the dataset to more than 5000 features. There are two classes, healthy and ...