Questions tagged [parallel]
The parallel tag on Data Science Stack Exchange encompasses questions related to parallel computing and processing within data science workflows. This includes discussions on distributing tasks across multiple processors or machines to enhance computational efficiency.
39 questions
2 votes
1 answer
125 views
XGBoost GPU version not outperforming CPU on small dataset despite parameter tuning – suggestions needed
I'm currently working on a Parallel and Distributed Computing project where I'm comparing the performance of both XGBoost and CatBoost when trained on CPU vs GPU. The goal is to demonstrate how GPU ...
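On a small dataset, one-off costs such as GPU context initialisation can make up much of a single timing run. Here is a minimal, framework-agnostic benchmarking sketch. It assumes each training run is wrapped in a zero-argument callable. The `benchmark` helper and the `train_fn` name are hypothetical and are not part of XGBoost or CatBoost. The helper discards warm-up runs and reports the median of several repeats:

```python
import statistics
import time
from typing import Callable


def benchmark(train_fn: Callable[[], object], warmup: int = 1, repeats: int = 5) -> float:
    """Return the median wall-clock time (seconds) of train_fn over `repeats` runs.

    Hypothetical helper: warm-up runs are executed first and discarded, so
    one-off costs (e.g. device initialisation) are not part of the result.
    """
    for _ in range(warmup):
        train_fn()
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        train_fn()
        timings.append(time.perf_counter() - start)
    # The median is less sensitive than the mean to an occasional slow run.
    return statistics.median(timings)
```

For example, `benchmark(lambda: model_cpu.fit(X, y))` and `benchmark(lambda: model_gpu.fit(X, y))` would give comparable medians. Here `model_cpu`, `model_gpu`, `X`, and `y` are placeholders for the objects in the actual project.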
2 votes
1 answer
316 views
Parallel Data preprocessing
I am looking for a suggestion. Is it possible to implement the data preprocessing steps like missing value imputation, outlier detection, normalization, label encoding in parallel? Can I implement ...
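Min-max normalization uses only per-column statistics. The columns are therefore independent and can be processed concurrently. Here is a minimal sketch using the standard library's `concurrent.futures`. The helper names `min_max_scale` and `scale_columns` are hypothetical, and the data is assumed to be a dict of numeric columns:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional


def min_max_scale(values: List[float]) -> List[float]:
    """Scale one column to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def scale_columns(columns: Dict[str, List[float]],
                  max_workers: Optional[int] = None) -> Dict[str, List[float]]:
    """Submit one independent scaling task per column and collect the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as ex:
        futures = {name: ex.submit(min_max_scale, col) for name, col in columns.items()}
        return {name: fut.result() for name, fut in futures.items()}
```

Threads keep the example simple. For CPU-bound pure-Python work, however, the GIL limits their speedup, so `ProcessPoolExecutor` (with module-level, picklable functions) is the usual drop-in replacement. Steps that need statistics over all rows need an extra step if the data is split into row-wise chunks: the per-chunk results must be combined before the transform is applied. Examples are mean imputation and a global label-encoding map.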
1 vote
2 answers
342 views
How to load and run feature selection on a dataset with 5,000 samples and 500,000 features?
I have a dataset with 5000 samples and 500,000 features (all categorical with a cardinality of 3). Two problems I'm trying to solve: Loading the dataset - I can't load it in memory despite using a ...
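A quick back-of-the-envelope estimate shows why the storage type matters at this size. The matrix has 5,000 × 500,000 = 2.5 billion values. With only 3 categories per feature, each value fits in one byte (or 2 bits if packed), whereas a 64-bit float needs 8 bytes. Here is a minimal sketch of the arithmetic, ignoring container overhead (`dense_matrix_bytes` is a hypothetical helper):

```python
def dense_matrix_bytes(n_samples: int, n_features: int, bits_per_value: int) -> int:
    """Bytes needed for a dense n_samples x n_features matrix (no overhead)."""
    total_bits = n_samples * n_features * bits_per_value
    return (total_bits + 7) // 8  # round up to whole bytes


n, p = 5_000, 500_000
for label, bits in [("float64", 64), ("int8", 8), ("2-bit packed", 2)]:
    gb = dense_matrix_bytes(n, p, bits) / 1e9
    print(f"{label:>12}: {gb:.3f} GB")
# Prints 20.000 GB for float64, 2.500 GB for int8, and 0.625 GB for 2-bit packed.
```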
12 votes
3 answers
18k views
What needs to be done to make n_jobs work properly in sklearn, in particular in ElasticNetCV?
The constructor of sklearn.linear_model.ElasticNetCV takes n_jobs as an argument. Quoting the documentation: n_jobs: int, ...
1 vote
0 answers
64 views
Parallelization of a MIMO linear filter
I would like to implement a Multi-Input Multi-Output (MIMO) filtering operation that runs as fast as possible on batches of data. Here is my current implementation: ...
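The excerpt does not show the original implementation, so what follows is one possible vectorized formulation, not the asker's code. It assumes a causal FIR (finite impulse response) MIMO filter with taps `h` of shape `(n_out, n_in, K)`, applied to a batch `x` of shape `(batch, n_in, T)`. The sketch loops only over the `K` taps and lets `np.einsum` handle the batch, channel, and time axes (`mimo_fir` is a hypothetical name):

```python
import numpy as np


def mimo_fir(x: np.ndarray, h: np.ndarray) -> np.ndarray:
    """Causal MIMO FIR filter applied to a batch.

    x: (batch, n_in, T) input signals
    h: (n_out, n_in, K) filter taps
    returns y: (batch, n_out, T) with
        y[b, o, t] = sum_i sum_k h[o, i, k] * x[b, i, t - k]   (x = 0 for t - k < 0)
    """
    batch, n_in, T = x.shape
    n_out, n_in_h, K = h.shape
    if n_in != n_in_h:
        raise ValueError("input channel count of x and h must match")
    y = np.zeros((batch, n_out, T), dtype=np.result_type(x, h))
    for k in range(min(K, T)):
        # Tap k delays the input by k samples: x[..., :T - k] contributes to y[..., k:].
        # einsum mixes the n_in input channels into n_out outputs for every batch/time.
        y[:, :, k:] += np.einsum("oi,bit->bot", h[:, :, k], x[:, :, : T - k])
    return y
```

With this layout, the Python-level loop runs only `K` times, regardless of batch size or signal length. For very long filters, an FFT-based convolution may be faster.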
0 votes
3 answers
3k views
Specifying number of threads using XGBoost.train
When using the xgboost.train() function, all the threads are used. I would like to use a specific number of threads. Unfortunately, this function does not accept the ...
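`xgboost.train()` takes its settings through the `params` dictionary, and XGBoost's general parameter `nthread` sets the number of parallel threads used for training. Here is a minimal sketch of that dictionary. It assumes a `DMatrix` named `dtrain` already exists in the real script, and the objective shown is only illustrative:

```python
# Settings for xgboost.train() are passed in this dictionary.
params = {
    "objective": "binary:logistic",  # illustrative objective only
    "nthread": 4,  # use 4 threads instead of all available cores
}

# In the real script (dtrain assumed to be an existing xgboost.DMatrix):
# booster = xgboost.train(params, dtrain, num_boost_round=100)
```

The scikit-learn wrapper classes (`XGBClassifier`, `XGBRegressor`) take the equivalent setting as the `n_jobs` constructor argument.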
0 votes
1 answer
1k views
CUDA 8.0 is compatible with my GeForce GTX 670M, Wikipedia says, but TensorFlow raises an error: GTX 670M's Compute Capability is < 3.0
According to Wikipedia, the GeForce GTX 670M has a Compute Capability of 2.1 (and a Fermi micro-architecture), which is confirmed by TensorFlow (I can read "2.1" in the error it raises). ...
1 vote
0 answers
113 views
Updating Weight Using Updates on Related Data
Suppose $$ x = Ay, $$ where $x$ is $M\times 1$, $y$ is $N\times 1$, and $A$ is $M\times N$. We have the data $x$ and would like to know what $y$ is. However, the matrix $A$ is too large for pseudo-...
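The excerpt is cut off before the approach is described, so this is not the asker's method. One standard way to estimate $y$ without forming a pseudo-inverse of a large $A$ is the Kaczmarz (row-action) method. It updates $y$ using one row $a_i^\top$ of $A$ and the matching entry $x_i$ at a time: $y \leftarrow y + \frac{x_i - a_i^\top y}{\lVert a_i \rVert^2}\, a_i$. Here is a minimal pure-Python sketch, assuming $A$ fits in memory as a list of rows (`kaczmarz` is a hypothetical helper name):

```python
import random
from typing import List, Sequence


def kaczmarz(A: Sequence[Sequence[float]], x: Sequence[float],
             sweeps: int = 100, seed: int = 0) -> List[float]:
    """Approximately solve x = A y for y, one row of A at a time.

    Each step projects the current estimate onto the hyperplane a_i . y = x_i,
    so only a single row of A is touched per update and A is never inverted.
    Rows are visited in a random order in each sweep.
    """
    m, n = len(A), len(A[0])
    y = [0.0] * n
    rng = random.Random(seed)
    order = list(range(m))
    for _ in range(sweeps):
        rng.shuffle(order)
        for i in order:
            row = A[i]
            norm_sq = sum(a * a for a in row)
            if norm_sq == 0.0:
                continue  # an all-zero row carries no information about y
            residual = x[i] - sum(a * v for a, v in zip(row, y))
            step = residual / norm_sq
            for j in range(n):
                y[j] += step * row[j]
    return y
```

For a consistent system, this converges to a solution. With noisy $x$ and a fixed step, it only reaches a neighbourhood of the least-squares solution unless the step is relaxed over time. Because each update needs only one row, rows can also be streamed from disk.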