|
11 | 11 | "cell_type": "markdown", |
12 | 12 | "metadata": {}, |
13 | 13 | "source": [ |
14 | | - "This notebook contains steps and code to demonstrate how serverless computing can provide great benefit for AI data preprocessing. We demonstrate Face Recognition deep learning over Watson Machine Learning service, while letting IBM Cloud Function to do the data preparation phase. As we will show this makes an entire process up to 50 times faster comparing to running the same code without leveraging serverless computing.\n", |
| 14 | + "This notebook contains steps and code to demonstrate how serverless computing can provide great benefit for AI data preprocessing. We demonstrate face recognition using deep learning over the Watson Machine Learning service, while letting IBM Cloud Functions do the data preparation phase. As we will show this makes an entire process up to 50 times faster comparing to running the same code without leveraging serverless computing.\n", |
15 | 15 | "\n", |
16 | | - "Our notebook is based on a blog <a href=\"https://hackernoon.com/building-a-facial-recognition-pipeline-with-deep-learning-in-tensorflow-66e7645015b8\" target=\"_blank\" rel=\"noopener no referrer\">Building a Facial Recognition Pipeline with Deep Learning in Tensorflow </a> written by Cole Murray who kindly allowed us to use code and text from his blog.\n", |
| 16 | + "Our notebook is based on a blog <a href=\"https://hackernoon.com/building-a-facial-recognition-pipeline-with-deep-learning-in-tensorflow-66e7645015b8\" target=\"_blank\" rel=\"noopener no referrer\">Building a Facial Recognition Pipeline with Deep Learning in Tensorflow</a> written by Cole Murray who kindly allowed us to use code and text from his blog.\n", |
17 | 17 | "\n", |
18 | | - "This notebook introduces commands for getting data, training_definition persistance to Watson Machine Learning repository, model training, deployment and scoring.\n", |
| 18 | + "This notebook introduces commands for interacting with your Watson Machine Learning service such as uploading training definitions and kicking off a training session.\n", |
19 | 19 | "\n", |
20 | | - "Some familiarity with Python is helpful. This notebook uses \n", |
| 20 | + "Some familiarity with Python is helpful. This notebook uses:\n", |
21 | 21 | "\n", |
22 | 22 | "- Python 3 \n", |
23 | 23 | "- <a href=\"https://dataplatform.cloud.ibm.com/docs/content/analyze-data/environments-parent.html\" target=\"_blank\" rel=\"noopener no referrer\">Watson Studio environments.</a>\n", |
24 | | - "- IBM Cloud Functions\n", |
| 24 | + "- <a href=\"https://cloud.ibm.com/openwhisk\" target=\"_blank\" rel=\"noopener no referrer\">IBM Cloud Functions</a>\n", |
25 | 25 | "- <a href=\"https://github.com/pywren/pywren-ibm-cloud\" target=\"_blank\" rel=\"noopener no referrer\">PyWren for IBM Cloud</a>\n", |
26 | 26 | "\n", |
27 | 27 | "\n", |
28 | 28 | "\n", |
29 | 29 | "## Learning goals\n", |
30 | 30 | "\n", |
31 | | - "In this notebook, you will learn how to:\n", |
| 31 | + "In this notebook, you will learn:\n", |
32 | 32 | "\n", |
33 | | - "- Work with Watson Machine Learning experiments to train Deep Learning models (Tensorflow)\n", |
34 | | - "- Save trained models in the Watson Machine Learning repository\n", |
35 | | - "- Deploy a trained model online and score\n", |
36 | | - "- How IBM Cloud Functions can be used for data preparation phase\n", |
37 | | - "- Value of PyWren for IBM Cloud\n", |
| 33 | + "- How IBM Cloud Functions can be used for the data preparation phase\n", |
| 34 | + "- The value of PyWren for IBM Cloud\n", |
| 35 | + "- How to work with Watson Machine Learning to train Deep Learning models (TensorFlow + scikit-learn)\n", |
| 36 | + "- How to retrieve and use models trained in WML\n", |
38 | 37 | "\n", |
| 38 | + "## Contents\n", |
39 | 39 | "\n", |
40 | | - "## Contents\n" |
| 40 | + "1. [Set up related IBM Cloud Services](#setup)\n", |
| 41 | + "2. [Dependencies installation](#dependencies-install)\n", |
| 42 | + "3. [Configuration](#configuration)\n", |
| 43 | + "4. [Preprocessing Data using Dlib and Docker](#preprocessing)\n", |
| 44 | + "5. [Setup for WML](#wml-setup)\n", |
| 45 | + "6. [Create the training definitions](#training-definitions)\n", |
| 46 | + "7. [Train the model](#train)\n", |
| 47 | + "8. [Work with the Trained Model](#work)\n", |
| 48 | + "9. [Summary](#summary)\n" |
41 | 49 | ] |
42 | 50 | }, |
43 | 51 | { |
|
87 | 95 | "metadata": {}, |
88 | 96 | "source": [ |
89 | 97 | "### 1.3 Create IBM Cloud Functions account\n", |
90 | | - "Setup IBM Cloud Functions account as described here. Please follow all the steps and make sure you can run \"Hello World\" example based on Python code. This will assure your Cloud Functions service is running" |
| 98 | + "Setup IBM Cloud Functions account as described here. Please follow all the steps and make sure you can run the \"Hello World\" example based on Python code. This will assure your Cloud Functions service is running." |
91 | 99 | ] |
92 | 100 | }, |
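| | + { |
| | + "cell_type": "markdown", |
| | + "metadata": {}, |
| | + "source": [ |
| | + "For reference, a minimal Python action for that \"Hello World\" check looks like the following sketch (the `name` parameter is illustrative):\n", |
| | + "\n", |
| | + "    def main(args):\n", |
| | + "        # Cloud Functions passes invocation parameters as a dict\n", |
| | + "        name = args.get(\"name\", \"World\")\n", |
| | + "        return {\"greeting\": \"Hello \" + name + \"!\"}" |
| | + ] |
| | + }, |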
93 | 101 | { |
94 | 102 | "cell_type": "markdown", |
95 | 103 | "metadata": {}, |
96 | 104 | "source": [ |
| 105 | + "<a id=\"dependencies-install\"></a>\n", |
97 | 106 | "## <span style=\"color:blue\"> 2. Dependencies installation </span>" |
98 | 107 | ] |
99 | 108 | }, |
100 | 109 | { |
101 | 110 | "cell_type": "markdown", |
102 | 111 | "metadata": {}, |
103 | 112 | "source": [ |
104 | | - "Install the needed libraries for the Face Recognition. \n", |
105 | | - "\"dlib\" dependency need to be installed via new environment. Create new environment based on Python 3.5 and add dependency in the customizaion section, as follows\n", |
| 113 | + "Install the needed libraries for the face recognition preprocessing.\n", |
| 114 | + "The \"dlib\" dependency needs to be installed via new environment. Create a new environment based on Python 3.5 and add the dependency in the customization section as follows:\n", |
106 | 115 | "\n", |
| 116 | + " channels:\n", |
| 117 | + " - conda-forge\n", |
107 | 118 | " dependencies:\n", |
108 | | - " - dlib" |
| 119 | + " - dlib" |
109 | 120 | ] |
110 | 121 | }, |
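| | + { |
| | + "cell_type": "markdown", |
| | + "metadata": {}, |
| | + "source": [ |
| | + "Once the notebook is running in that environment, a quick sanity check confirms the library is importable:\n", |
| | + "\n", |
| | + "    import dlib\n", |
| | + "    # A version string here means the custom environment was built correctly\n", |
| | + "    print(dlib.__version__)" |
| | + ] |
| | + }, |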
111 | 122 | { |
|
139 | 150 | "cell_type": "markdown", |
140 | 151 | "metadata": {}, |
141 | 152 | "source": [ |
| 153 | + "<a id=\"configuration\"></a>\n", |
142 | 154 | "## <span style=\"color:blue\">3. Configuration </span>\n", |
143 | | - "This section explains how to configure services" |
| 155 | + "This section explains how to configure the needed services." |
144 | 156 | ] |
145 | 157 | }, |
146 | 158 | { |
|
149 | 161 | "source": [ |
150 | 162 | "### 3.1 Setup a bucket in IBM Cloud Object Storage\n", |
151 | 163 | "\n", |
152 | | - "You need an IBM COS bucket, which you will use to store the input data. If you don't have any existing bucket, please navigate to IBM COS Dashboard UI and create new bucket in the region you prefer. Make sure you copy the correct endpoint for the bucket\n", |
| 164 | + "You need an IBM COS bucket which you will use to store the input data. If you don't know of any of your existing buckets or would like like to create a new one, please navigate to your <a href=\"https://cloud.ibm.com/resources\" target=\"_blank\" rel=\"noopener no referrer\">cloud resource list</a>, then find and select your storage instance. From here, you will be able to view all your buckets and can create a new bucket in the region you prefer. Make sure you copy the correct endpoint for the bucket from the `Endpoint` tab of this COS service dashboard.\n", |
153 | 165 | "\n", |
154 | 166 | "**Note:** The bucket names must be unique." |
155 | 167 | ] |
|
179 | 191 | "metadata": {}, |
180 | 192 | "source": [ |
181 | 193 | "### 3.2 COS Connection\n", |
182 | | - "You need obtain both credentials to the Cloud Functions and COS.\n", |
| 194 | + "Now connect to the Cloud Object Storage service by first obtaining your credentials.\n", |
183 | 195 | "\n", |
184 | | - "You can find COS credentials in your COS instance dashboard under the Service credentials tab.\n", |
| 196 | + "You can find COS credentials in your COS instance dashboard under the `Service credentials` tab.\n", |
185 | 197 | "Note: the HMAC key, described in set up the environment is included in these credentials.\n", |
186 | 198 | "\n" |
187 | 199 | ] |
|
207 | 219 | "}" |
208 | 220 | ] |
209 | 221 | }, |
210 | | - { |
211 | | - "cell_type": "markdown", |
212 | | - "metadata": {}, |
213 | | - "source": [ |
214 | | - "Define the endpoint.\n", |
215 | | - "\n", |
216 | | - "To do this, go to the **Endpoint** tab in the COS instance's dashboard to get the endpoint information, then enter it in the cell below:" |
217 | | - ] |
218 | | - }, |
219 | 222 | { |
220 | 223 | "cell_type": "markdown", |
221 | 224 | "metadata": {}, |
|
237 | 240 | "cell_type": "markdown", |
238 | 241 | "metadata": {}, |
239 | 242 | "source": [ |
240 | | - "Install the boto library. This library allows Python developers to manage Cloud Object Storage (COS)." |
| 243 | + "Install the boto library if necessary. This library allows Python developers to manage IBM Cloud Object Storage (COS). However, most environments on Watson Studio have this preinstalled." |
241 | 244 | ] |
242 | 245 | }, |
243 | 246 | { |
|
253 | 256 | "metadata": {}, |
254 | 257 | "outputs": [], |
255 | 258 | "source": [ |
256 | | - "# Run the command if ibm_boto3 is not installed.\n", |
| 259 | + "# Uncomment and run the following command if ibm_boto3 is not installed.\n", |
257 | 260 | "# !pip install ibm-cos-sdk" |
258 | 261 | ] |
259 | 262 | }, |
|
263 | 266 | "metadata": {}, |
264 | 267 | "outputs": [], |
265 | 268 | "source": [ |
266 | | - "# Install the boto library.\n", |
| 269 | + "# Import the boto library.\n", |
267 | 270 | "import ibm_boto3\n", |
268 | 271 | "from ibm_botocore.client import Config" |
269 | 272 | ] |
|
296 | 299 | "source": [ |
297 | 300 | "### 3.3 Verify you can access your COS Bucket\n", |
298 | 301 | "\n", |
299 | | - "If you fail to access your bucket, make sure you use correct bucket name and endpoint URL for the region where bucket was created" |
| 302 | + "If you fail to access your bucket, make sure you use the correct bucket name and endpoint URL for the region where the bucket was created. If you see no errors after running the following cell, you are good to go." |
300 | 303 | ] |
301 | 304 | }, |
302 | 305 | { |
303 | 306 | "cell_type": "code", |
304 | | - "execution_count": 9, |
305 | | - "metadata": {}, |
306 | | - "outputs": [ |
307 | | - { |
308 | | - "name": "stdout", |
309 | | - "output_type": "stream", |
310 | | - "text": [ |
311 | | - "Error. Bucket can not be empty. Please create bucket in COS Dashboard UI and update 'BUCKET'\n", |
312 | | - "Error. Please create ibm_boto3 instance\n" |
313 | | - ] |
314 | | - } |
315 | | - ], |
| 307 | + "execution_count": null, |
| 308 | + "metadata": {}, |
| 309 | + "outputs": [], |
316 | 310 | "source": [ |
317 | 311 | "try: BUCKET\n", |
318 | 312 | "except NameError: BUCKET = None\n", |
|
327 | 321 | " print(\"Error. Please create ibm_boto3 instance\")\n", |
328 | 322 | "\n", |
329 | 323 | "if cos and not cos.Bucket(BUCKET) in cos.buckets.all():\n", |
330 | | - " print (\"Bucket not found. Please make sure cos_endpoint target the region of the bucket\")" |
| 324 | + " print (\"Error. Bucket not found. Please make sure cos_endpoint targets the region of the bucket\")" |
331 | 325 | ] |
332 | 326 | }, |
333 | 327 | { |
|
341 | 335 | "cell_type": "markdown", |
342 | 336 | "metadata": {}, |
343 | 337 | "source": [ |
344 | | - "Obtain api key and endpoint to the IBM Cloud Functions service. Navigate the \"API Key\" menu and copy namespace, host and key. Make sure to add \"https://\" to the host" |
| 338 | + "Obtain the API key and endpoint to the <a href=\"https://cloud.ibm.com/openwhisk\" target=\"_blank\" rel=\"noopener no referrer\">IBM Cloud Functions service</a>. Navigate to `Getting Started` > `API Key` from the side menu and copy the values for \"Current Namespace\", \"Host\" and \"Key\" into the config below. Make sure to add \"https://\" to the host when adding it as the endpoint." |
345 | 339 | ] |
346 | 340 | }, |
347 | 341 | { |
|
365 | 359 | "cell_type": "markdown", |
366 | 360 | "metadata": {}, |
367 | 361 | "source": [ |
368 | | - "PyWren engine requires it's server side component to be deployed in advance. This step creates a new IBM Cloud Function function with PyWren server side runtime. This action will be used internally by PyWren during execution phases." |
| 362 | + "The PyWren engine requires its server side component to be deployed in advance. This step creates a new IBM Cloud Functions function with the PyWren server side runtime. This action will be used internally by PyWren during execution phases." |
369 | 363 | ] |
370 | 364 | }, |
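| | + { |
| | + "cell_type": "markdown", |
| | + "metadata": {}, |
| | + "source": [ |
| | + "To give a feel for the programming model, here is a minimal sketch of running a function in parallel with PyWren (assuming `config` is the credentials dictionary defined above):\n", |
| | + "\n", |
| | + "    import pywren_ibm_cloud as pywren\n", |
| | + "\n", |
| | + "    def double(x):\n", |
| | + "        # Each invocation runs as a separate Cloud Functions action\n", |
| | + "        return x * 2\n", |
| | + "\n", |
| | + "    pw = pywren.ibm_cf_executor(config=config)\n", |
| | + "    pw.map(double, range(4))\n", |
| | + "    print(pw.get_result())  # [0, 2, 4, 6]" |
| | + ] |
| | + }, |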
371 | 365 | { |
|
382 | 376 | "cell_type": "markdown", |
383 | 377 | "metadata": {}, |
384 | 378 | "source": [ |
| 379 | + "<a id=\"preprocessing\"></a>\n", |
385 | 380 | "## <span style=\"color:blue\">4. Preprocessing Data using Dlib and Docker</span>" |
386 | 381 | ] |
387 | 382 | }, |
|
391 | 386 | "source": [ |
392 | 387 | "### 4.1 Upload input data into IBM Cloud Object Storage\n", |
393 | 388 | "\n", |
394 | | - "Your COS Bucket should contain raw dataset of images by following structure:\n", |
| 389 | + "Your COS Bucket should contain the raw dataset of images with the following structure:\n", |
395 | 390 | "\n", |
396 | 391 | " Directory Structure\n", |
397 | 392 | " ├── Tyra_Banks\n", |
|
401 | 396 | " │ ├── Tyron_Garner_0001.jpg\n", |
402 | 397 | " │ └── Tyron_Garner_0002.jpg\n", |
403 | 398 | " \n", |
404 | | - "If you don't have any images, we will demonstrate how to use the [LFW](http://vis-www.cs.umass.edu/lfw/) (Labeled Faces in the Wild) dataset as training data. Below are instructions how you can upload this dataset into your private COS bucket. \n", |
| 399 | + "If you don't have any images, we will demonstrate how to use the [LFW](http://vis-www.cs.umass.edu/lfw/) (Labeled Faces in the Wild) dataset as training data. Below are instructions how you can upload this dataset into your private COS bucket.\n", |
405 | 400 | "\n", |
406 | | - "**You should run this only once. If images were already created in the previos run you can skip this section**" |
| 401 | + "**You should run this only once. If images were already created in any previous run, you can skip this section.**" |
407 | 402 | ] |
408 | 403 | }, |
409 | 404 | { |
410 | 405 | "cell_type": "markdown", |
411 | 406 | "metadata": {}, |
412 | 407 | "source": [ |
413 | 408 | "The following step copies images from Labeled Faces in the Wild into your COS bucket.\n", |
414 | | - "We demonstrate with small data set of 6MB. If you wish to use entire data set, then use \n", |
| 409 | + "We demonstrate with small data set of about 14MB. If you wish to use entire data set, then use \n", |
415 | 410 | "\n", |
416 | 411 | " url = \"http://vis-www.cs.umass.edu/lfw/lfw.tgz\"" |
417 | 412 | ] |
|
456 | 451 | "\n", |
457 | 452 | " for p in procs:\n", |
458 | 453 | " p.join()\n", |
459 | | - " \n", |
460 | 454 | "\n", |
461 | | - "url = \"http://vis-www.cs.umass.edu/lfw/lfw-bush.tgz\"\n", |
| 455 | + "\n", |
| 456 | + "url = \"http://vis-www.cs.umass.edu/lfw/lfw-a.tgz\"\n", |
462 | 457 | "extractFromStream(url, cos, \"images\")\n" |
463 | 458 | ] |
464 | 459 | }, |
|
476 | 471 | "cell_type": "markdown", |
477 | 472 | "metadata": {}, |
478 | 473 | "source": [ |
479 | | - "### 4.2 Data preprocessing with serveless\n", |
| 474 | + "### 4.2 Data preprocessing with serverless\n", |
| 475 | + "\n", |
| 476 | + "Below, you’ll preprocess the images before passing them into the FaceNet model. Image pre-processing in a facial recognition context typically solves a few problems. These problems range from lighting differences, occlusion, alignment, and segmentation. Below, you’ll address segmentation and alignment.\n", |
480 | 477 | "\n", |
481 | | - "Below, you’ll pre-process the images before passing them into the FaceNet model. Image pre-processing in a facial recognition context typically solves a few problems. These problems range from lighting differences, occlusion, alignment, segmentation. Below, you’ll address segmentation and alignment.\n", |
482 | 478 | "First, you’ll solve the segmentation problem by finding the largest face in an image. This is useful as our training data does not have to be cropped for a face ahead of time.\n", |
| 479 | + "\n", |
483 | 480 | "Second, you’ll solve alignment. In photographs, it is common for a face to not be perfectly center aligned with the image. To standardize input, you’ll apply a transform to center all images based on the location of eyes and bottom lip.\n" |
484 | 481 | ] |
485 | 482 | }, |
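| | + { |
| | + "cell_type": "markdown", |
| | + "metadata": {}, |
| | + "source": [ |
| | + "To make the segmentation step concrete, here is a sketch of finding the largest face with dlib (the preprocessing below applies similar logic to every image, in parallel via PyWren):\n", |
| | + "\n", |
| | + "    import dlib\n", |
| | + "\n", |
| | + "    detector = dlib.get_frontal_face_detector()\n", |
| | + "\n", |
| | + "    def largest_face(image):\n", |
| | + "        # Keep the detection with the largest bounding-box area;\n", |
| | + "        # returns None when no face is found\n", |
| | + "        faces = detector(image, 1)\n", |
| | + "        return max(faces, key=lambda r: r.width() * r.height(), default=None)" |
| | + ] |
| | + }, |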
|
491 | 488 | "\n", |
492 | 489 | "Upload dlib’s face landmark predictor into your COS bucket. You’ll use this face landmark predictor to find the location of the inner eyes and bottom lips of a face in an image. These coordinates will be used to center align the image.\n", |
493 | 490 | "\n", |
494 | | - "**You should run this only once. If preictor was already created in the previos run you can skip this section**\n" |
| 491 | + "**You should run this only once. If the predictor was already created in a previous run, you can skip this section.**\n" |
495 | 492 | ] |
496 | 493 | }, |
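| | + { |
| | + "cell_type": "markdown", |
| | + "metadata": {}, |
| | + "source": [ |
| | + "The predictor commonly used for this is dlib's 68-point facial landmark model. A sketch of fetching it locally, before the extracted `.dat` file is stored in your bucket:\n", |
| | + "\n", |
| | + "    import urllib.request\n", |
| | + "\n", |
| | + "    # Compressed landmark model published on the dlib site\n", |
| | + "    url = \"http://dlib.net/files/shape_predictor_68_face_landmarks.dat.bz2\"\n", |
| | + "    urllib.request.urlretrieve(url, \"shape_predictor_68_face_landmarks.dat.bz2\")" |
| | + ] |
| | + }, |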
497 | 494 | { |
|
640 | 637 | "\n", |
641 | 638 | "Using Dlib, you detected the largest face in an image and aligned the center of the face by the inner eyes and bottom lip. This alignment is a method for standardizing each image for use as feature input.\n", |
642 | 639 | "\n", |
643 | | - "Verify that images were processed with dlib" |
| 640 | + "Verify that images were processed with dlib:" |
644 | 641 | ] |
645 | 642 | }, |
646 | 643 | { |
|
658 | 655 | "cell_type": "markdown", |
659 | 656 | "metadata": {}, |
660 | 657 | "source": [ |
| 658 | + "<a id=\"wml-setup\"></a>\n", |
661 | 659 | "## <span style=\"color:blue\">5. Setup for WML</span>\n", |
662 | 660 | "\n", |
663 | 661 | "Now that we've preprocessed the data, we’ll generate vector embeddings of each identity. These embeddings can then be used as input to a classification, regression, or clustering task. We will use TensorFlow to create the embeddings and then scikit-learn to create the classifier with these embeddings. However, before we do all this, some preliminary setup is needed.\n", |
|
915 | 913 | "cell_type": "markdown", |
916 | 914 | "metadata": {}, |
917 | 915 | "source": [ |
918 | | - "<a id=\"model\"></a>\n", |
| 916 | + "<a id=\"training-definitions\"></a>\n", |
919 | 917 | "## <span style=\"color:blue\">6. Create the training definitions</span>\n", |
920 | 918 | "\n", |
921 | 919 | "With us now connected to our WML service instance, we can now create the training definitions.\n", |
|
959 | 957 | " client.repository.DefinitionMetaNames.EXECUTION_COMMAND: \" \\\n", |
960 | 958 | " python3 train_classifier.py \\\n", |
961 | 959 | " --model-path $DATA_DIR/pretrained-model/20180402-114759.pb \\\n", |
962 | | - " --input-dir $DATA_DIR/processed-images \\\n", |
| 960 | + " --input-dir $DATA_DIR/output/images \\\n", |
963 | 961 | " --output-path $RESULT_DIR/output-classifier.pkl \\\n", |
964 | 962 | " --num-epochs 3\"\n", |
965 | 963 | "}" |
|
991 | 989 | "cell_type": "markdown", |
992 | 990 | "metadata": {}, |
993 | 991 | "source": [ |
994 | | - "The files in this zip file can be viewed in the GitHub <a href=\"https://github.com/IBM/data-pre-processing-with-pywren/tree/master/data/code\" target=\"_blank\" rel=\"noopener noreferrer\">repository</a>" |
| 992 | + "The files in this zip file can be viewed in the GitHub <a href=\"https://github.com/IBM/data-pre-processing-with-pywren/tree/master/data/code\" target=\"_blank\" rel=\"noopener noreferrer\">repository</a>." |
995 | 993 | ] |
996 | 994 | }, |
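| | + { |
| | + "cell_type": "markdown", |
| | + "metadata": {}, |
| | + "source": [ |
| | + "Storing the definition with the WML client then looks along these lines (a sketch; `metadata` stands for the properties dictionary above and the zip filename is illustrative):\n", |
| | + "\n", |
| | + "    definition_details = client.repository.store_definition(\"facenet-classifier.zip\", meta_props=metadata)\n", |
| | + "    definition_uid = client.repository.get_definition_uid(definition_details)\n", |
| | + "    print(definition_uid)" |
| | + ] |
| | + }, |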
997 | 995 | { |
|
1384 | 1382 | "cell_type": "markdown", |
1385 | 1383 | "metadata": {}, |
1386 | 1384 | "source": [ |
1387 | | - "In this notebook, we used PyWren with IBM Cloud Functions to increase preprocessing performance and stored the resulting images in an IBM Cloud Object Storage bucket. From here, we used this bucket along with IBM Watson Machine Learning to create embeddings of each identity using a pretrained TensorFlow FaceNet model, and then create a custom face classifer. " |
| 1385 | + "In this notebook, we used PyWren with IBM Cloud Functions to increase preprocessing performance and stored the resulting images in an IBM Cloud Object Storage bucket. From here, we used this bucket along with IBM Watson Machine Learning to create embeddings of each identity using a pretrained TensorFlow FaceNet model, and then created a custom face classifer. " |
1388 | 1386 | ] |
1389 | 1387 | }, |
1390 | 1388 | { |
|
1412 | 1410 | "source": [ |
1413 | 1411 | "### Authors\n", |
1414 | 1412 | "\n", |
1415 | | - "**Gil Vernik?**\n", |
| 1413 | + "**Gil Vernik**\n", |
1416 | 1414 | "\n", |
1417 | 1415 | "**Paul Van Eck**" |
1418 | 1416 | ] |
|