
I want to distribute the training of a simple model, such as a support vector classifier like sklearn.svm.SVC(), across some or all of the CPUs and GPUs on a single machine. I have never used a GPU before and I'm confused about how this works, or whether TensorFlow is even the right choice for this simple task. What I think I need to do is something like this:

    import tensorflow as tf
    from sklearn.svm import SVC
    from sklearn import datasets
    from sklearn.model_selection import train_test_split

    strategy = tf.distribute.MirroredStrategy()
    with strategy.scope():
        iris = datasets.load_iris()
        X = iris.data
        y = iris.target
        class_names = iris.target_names
        X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
        classifier = SVC(kernel='linear', C=0.01).fit(X_train, y_train)

The actual dataset I'm using has $\approx 3 \times 10^8$ training examples with 11 features. Does this code do what I think it does? If not, what would be the best way to go about this task? If so, is there anything that can be improved?

EDIT:

After doing some more googling I discovered that sklearn does not support GPU utilization. See this post: https://stackoverflow.com/questions/41567895/will-scikit-learn-utilize-gpu

I'm still not sure how I can go about utilizing a GPU for simple ML models.


1 Answer


There are a few approaches that allow you to do basic ML modelling using a GPU.

First of all, in the code as you presented it, the TensorFlow MirroredStrategy unfortunately has no effect: it only distributes TensorFlow models, not sklearn ones. In fact, sklearn does not offer any GPU support at all.

1. cuML

An NVIDIA library that provides GPU implementations of many classic ML estimators, often with exactly the same API (classes and functions) as scikit-learn. Coming from NVIDIA, everything is built with the GPU in mind, so your sklearn code often works after swapping only the imports.
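As a minimal sketch of that drop-in style, assuming cuML is installed (as part of RAPIDS) and a supported NVIDIA GPU is available, a GPU-backed SVC on the iris data could look roughly like this:

    # cuml.svm.SVC mirrors sklearn.svm.SVC; fitting and prediction run on the GPU.
    from cuml.svm import SVC
    from sklearn import datasets
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    iris = datasets.load_iris()
    # float32 is the natural dtype on the GPU
    X = iris.data.astype("float32")
    X_train, X_test, y_train, y_test = train_test_split(X, iris.target, random_state=0)

    clf = SVC(kernel='linear', C=0.01)
    clf.fit(X_train, y_train)
    preds = clf.predict(X_test)
    print(accuracy_score(y_test, preds))

The main change from your snippet is the import; the rest of the scikit-learn workflow stays the same.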

cuML is part of the larger RAPIDS toolset (incubated by NVIDIA). Other tools there may also be helpful, such as the GPU-accelerated XGBoost integration.

2. TensorFlow / PyTorch + NumPy

These frameworks are not just for complicated deep learning; you can use them for basic modelling as well and leverage their GPU support. Their documentation contains examples, and something like Hands-On Machine Learning (a book with an accompanying set of Jupyter notebooks) is a nice way to dig in.
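For instance, here is a minimal sketch (my own example, not from the book) of a plain softmax/logistic-regression model in TensorFlow on the iris data; Keras places the computation on a GPU automatically if one is visible:

    import tensorflow as tf
    from sklearn import datasets
    from sklearn.model_selection import train_test_split

    iris = datasets.load_iris()
    X_train, X_test, y_train, y_test = train_test_split(
        iris.data.astype("float32"), iris.target, random_state=0)

    # A single Dense layer with softmax is just multinomial logistic regression.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(3, activation="softmax", input_shape=(4,))
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(X_train, y_train, epochs=50, verbose=0)
    print(model.evaluate(X_test, y_test, verbose=0))

For a dataset of your size you would stream the data in batches (e.g. via tf.data) rather than holding it all in one in-memory array.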

These frameworks also work well with the normal scientific stack in Python (NumPy, SciPy, Pandas), because NumPy arrays and the frameworks' tensor objects are interchangeable in most cases.
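A tiny sketch of that interoperability in TensorFlow (eager mode, which is the default):

    import numpy as np
    import tensorflow as tf

    a = np.arange(6, dtype=np.float32).reshape(2, 3)
    t = tf.constant(a)       # NumPy array -> Tensor
    b = (t * 2).numpy()      # run a (GPU-eligible) op, then Tensor -> NumPy array
    print(b)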


3. Another option:

Stick with sklearn while you are learning about the models themselves and how they work. If your goal is specifically to learn about GPU usage, the two options above are the most modern ways to get started.
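Even on CPU only, sklearn can still use all your cores at the level of cross-validation or hyper-parameter search via n_jobs (SVC itself is single-threaded). A minimal sketch:

    from sklearn import datasets
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    iris = datasets.load_iris()
    clf = SVC(kernel='linear', C=0.01)

    # n_jobs=-1 evaluates the folds in parallel on all available CPU cores.
    scores = cross_val_score(clf, iris.data, iris.target, cv=5, n_jobs=-1)
    print(scores)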

