
I trained a small CNN on my GPU using the NCHW data format. Now I want to export a .pb file that I can use to run inference in other applications.

I wrote a small helper function that calls TensorFlow's freeze_graph function with default values, given a directory containing the checkpoint files and graph.pbtxt:

```python
import os
import argparse
# os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
import tensorflow as tf
dir(tf.contrib)  # fix for tf.contrib undefined ops bug
from tensorflow.python.tools.freeze_graph import freeze_graph


def my_freeze_graph_2(model_dir, output_node_names):
    """Extract the subgraph defined by the output nodes and convert all its variables into constants.

    Args:
        model_dir: the root folder containing the checkpoint state file
        output_node_names: a string containing all the output nodes' names, comma separated
    """
    if not tf.gfile.Exists(model_dir):
        raise AssertionError(
            "Export directory doesn't exist. Please specify an export "
            "directory: %s" % model_dir)

    if not output_node_names:
        print("You need to supply the name of a node to --output_node_names.")
        return -1

    # We retrieve our checkpoint full path
    checkpoint = tf.train.get_checkpoint_state(model_dir)
    input_checkpoint = checkpoint.model_checkpoint_path

    # We specify the full filename of our frozen graph
    absolute_model_dir = os.path.abspath(model_dir)
    output_graph = os.path.join(absolute_model_dir, "frozen_model.pb")

    freeze_graph(input_graph=os.path.join(model_dir, 'graph.pbtxt'),
                 input_saver='',
                 input_binary=False,
                 input_checkpoint=input_checkpoint,
                 output_node_names=output_node_names,
                 restore_op_name="save/restore_all",
                 filename_tensor_name="save/Const:0",
                 output_graph=output_graph,
                 clear_devices=True,
                 initializer_nodes='')
```
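freeze_graph's output_node_names argument takes node names, comma separated, not tensor names: softmax_tensor:0 (the form used with get_tensor_by_name in the test script) means "output 0 of the node softmax_tensor", and passing it as-is makes the freeze fail because no node has that name. A minimal sketch of a hypothetical helper, normalize_output_node_names (not part of TensorFlow), that strips the :N suffix and stray whitespace before the string is handed to my_freeze_graph_2; the names in the usage line are only illustrative:

```python
def normalize_output_node_names(names):
    """Turn 'a:0, b' style input into the 'a,b' form freeze_graph expects.

    Hypothetical helper: strips whitespace, drops empty entries, removes the
    ':<output index>' suffix that tensor names carry, and de-duplicates.
    """
    cleaned = []
    for name in names.split(","):
        name = name.strip()
        if not name:
            continue
        # 'softmax_tensor:0' is a tensor name; the node itself is 'softmax_tensor'
        base, sep, index = name.rpartition(":")
        if sep and index.isdigit():
            name = base
        if name not in cleaned:
            cleaned.append(name)
    return ",".join(cleaned)


print(normalize_output_node_names(" softmax_tensor:0, classes ,"))  # softmax_tensor,classes
```

You would then call, for example, my_freeze_graph_2(model_dir, normalize_output_node_names(raw_names)).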

I then have a small script that attempts to build the graph from frozen_model.pb to test that the freezing actually worked:

```python
import os
# os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
import argparse
import tensorflow as tf
from freeze_graph import load_graph
import cv2

if __name__ == '__main__':
    # Let's allow the user to pass the filename as an argument
    parser = argparse.ArgumentParser()
    parser.add_argument("--frozen_model_filename",
                        default="model-multiple_starts/frozen_model.pb",
                        type=str, help="Frozen model file to import")
    args = parser.parse_args()

    # We use our "load_graph" function
    graph = load_graph(args.frozen_model_filename)

    # We can verify that we can access the list of operations in the graph
    for op in graph.get_operations():
        print(op.name)

    # We access the input and output nodes
    x = graph.get_tensor_by_name('prefix/Reshape:0')
    y = graph.get_tensor_by_name('prefix/softmax_tensor:0')

    # We launch a Session
    with tf.Session(graph=graph,
                    config=tf.ConfigProto(log_device_placement=True)) as sess:
        # Note: we don't need to initialize/restore anything.
        # There are no Variables in this graph, only hardcoded constants.

        # Load an image to use as a test
        im = cv2.imread('57_00000000.png', cv2.IMREAD_GRAYSCALE)
        im = im.T
        im = im / 255.0 - 0.5  # 255.0 keeps this a float division on Python 2 too
        im = im[None, :, :, None]
        y_out = sess.run(y, feed_dict={x: im})
        print(y_out)
```
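The test script feeds a channels-last batch: im[None, :, :, None] adds a batch axis in front and a single channel axis at the end. If the frozen graph's input still expected NCHW, the same batch would have to be transposed first. A minimal NumPy sketch of both conversions, using a random stand-in image (an assumption, not the real test file) and the script's preprocessing:

```python
import numpy as np


def nhwc_to_nchw(batch):
    """Reorder a 4-D batch from (N, H, W, C) to (N, C, H, W)."""
    return np.transpose(batch, (0, 3, 1, 2))


def nchw_to_nhwc(batch):
    """Reorder a 4-D batch from (N, C, H, W) to (N, H, W, C)."""
    return np.transpose(batch, (0, 2, 3, 1))


# Random stand-in for the 8-bit grayscale test image (not the real 57_00000000.png)
rng = np.random.default_rng(0)
im = rng.integers(0, 256, size=(20, 30)).astype(np.uint8)

# Same preprocessing as the test script
im = im.T
im = im / 255.0 - 0.5
nhwc = im[None, :, :, None]   # what the script feeds: channels last
nchw = nhwc_to_nchw(nhwc)     # what an NCHW input placeholder would expect

print(nhwc.shape, nchw.shape)  # (1, 30, 20, 1) (1, 1, 30, 20)
```

This only changes the input layout. It does not help with the CPU BiasOp error, which comes from the NCHW ops inside the graph itself.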

If I try to run my test script, I get the following error:

InvalidArgumentError: CPU BiasOp only supports NHWC. [[Node: prefix/conv2d/BiasAdd = BiasAdd[T=DT_FLOAT, data_format="NCHW", _device="/job:localhost/replica:0/task:0/cpu:0"](prefix/conv2d/convolution, prefix/conv2d/bias/read)]]

I tried different configurations:

  • generate .pb file from CPU-only script, run on CPU-only
  • generate .pb file from script with GPU visible, run with GPU visible
  • generate .pb file from CPU-only script, run with GPU visible

All of them raise the same error.

The problem is that the checkpoint I want to freeze has operations defined with data_format='NCHW'. How do I freeze the checkpoint with the NHWC data format?

Update:

Poking around the files, I see that in graph.pbtxt the data_format attribute is hardcoded to NCHW for many operations. I guess, then, I'll need to build a new model in NHWC format, selectively load the layer weights from the checkpoint, and use that graph to save out a .pb file manually. I'd assume a process for this already exists, but I can't find any documentation or examples.
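To see exactly which nodes have data_format hardcoded, you can list them from graph.pbtxt. The sturdy way is to parse the file into a GraphDef (for example with google.protobuf's text_format.Merge) and read node.attr['data_format'].s; the sketch below is a dependency-free approximation that scans the text format line by line. It assumes the layout TensorFlow writes (one field per line, node-level fields at nesting depth 1, no braces inside strings), so treat it as a quick diagnostic rather than a parser:

```python
import os
import re


def find_data_formats(pbtxt_text):
    """Return (node_name, op, data_format) for every node with a data_format attr.

    Line-based approximation of a text-format GraphDef, not a real parser.
    """
    results = []
    depth = 0
    name = op = None
    in_data_format = False
    for raw in pbtxt_text.splitlines():
        line = raw.strip()
        if depth == 0 and line.startswith("node {"):
            name, op = None, None          # a new top-level node starts
        elif depth == 1:
            m = re.match(r'(name|op): "(.*)"$', line)
            if m:
                if m.group(1) == "name":
                    name = m.group(2)
                else:
                    op = m.group(2)
        if line == 'key: "data_format"':
            in_data_format = True           # the attr value follows a few lines later
        elif in_data_format:
            m = re.match(r's: "(.*)"$', line)
            if m:
                results.append((name, op, m.group(1)))
                in_data_format = False
        depth += line.count("{") - line.count("}")
    return results


if __name__ == "__main__":
    path = "model-multiple_starts/graph.pbtxt"  # directory name taken from the question
    if os.path.exists(path):
        with open(path) as f:
            for node_name, node_op, fmt in find_data_formats(f.read()):
                print(fmt, node_op, node_name)
```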

Update 2:

After trying to import the .pb files in OpenCV's DNN module, I found out the following:

  • Freezing together a checkpoint from a training run with data format NCHW and a graph.pbtxt with data format NHWC leads to an unusable .pb file. I haven't found the exact reason yet, but when I convert the .pb to .pbtxt and compare it to a working frozen graph, the files differ only in the values stored in the weight and bias constants.
  • Freezing together a checkpoint from a training run and a graph.pbtxt, both with data format NHWC, produces a working frozen graph.

It seems, then, that checkpoints are not transferable between graphs with different data formats (even if no error or warning is raised during the freezing process).
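One plausible explanation, offered as an assumption because the question doesn't show the model: tf.nn.conv2d documents its filter as [filter_height, filter_width, in_channels, out_channels] whatever the data_format, so conv kernels and biases carry over between layouts unchanged. A dense layer fed by a tf.reshape of a conv feature map is different: flattening NCHW orders the features as (C, H, W), flattening NHWC as (H, W, C), so the rows of the first dense kernel line up with different features. That would match the symptom exactly: freezing succeeds silently, but the outputs are wrong. A minimal NumPy sketch of permuting those rows, with made-up shapes and a hypothetical helper name:

```python
import numpy as np


def nchw_dense_kernel_to_nhwc(kernel, c, h, w):
    """Reorder the rows of a dense kernel trained on an NCHW flatten.

    kernel: shape (c*h*w, units), rows ordered (C, H, W).
    Returns shape (h*w*c, units), rows ordered (H, W, C), for an NHWC flatten.
    """
    units = kernel.shape[1]
    k = kernel.reshape(c, h, w, units)   # split rows back into (C, H, W)
    k = np.transpose(k, (1, 2, 0, 3))    # reorder to (H, W, C)
    return k.reshape(h * w * c, units)


rng = np.random.default_rng(0)
c, h, w, units = 3, 4, 5, 2
fmap_nchw = rng.standard_normal((1, c, h, w))        # one feature map, NCHW
fmap_nhwc = np.transpose(fmap_nchw, (0, 2, 3, 1))    # same values, NHWC
kernel_nchw = rng.standard_normal((c * h * w, units))

# What the NCHW training graph computed
out_trained = fmap_nchw.reshape(1, -1) @ kernel_nchw
# NHWC graph restored with the unmodified checkpoint weights
out_naive = fmap_nhwc.reshape(1, -1) @ kernel_nchw
# NHWC graph with the dense kernel rows permuted
out_fixed = fmap_nhwc.reshape(1, -1) @ nchw_dense_kernel_to_nhwc(kernel_nchw, c, h, w)

print(np.allclose(out_trained, out_fixed))   # True
print(np.allclose(out_trained, out_naive))   # False: rows are misaligned
```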

  • So are you trying to "save" your model to be able to load it in the future and do predictions? Commented Nov 9, 2017 at 3:32
  • Yes, that's the freezing part. Commented Nov 9, 2017 at 7:47

1 Answer

Typically, you'll want to wrap graph construction in functions so that you can rebuild your graph conditionally for the prediction case, because usually quite a few pieces of the graph change from training to prediction. As you've discovered, the NCHW and NHWC versions of, for example, the convolutional layers are actually different ops in the graph proto: data_format is hardcoded as an attribute on each node, because the optimized kernels only support specific formats (as your error shows, the CPU BiasOp only supports NHWC).

Editing graph protos is very difficult to do correctly, which is why most TensorFlow code that does this follows the pattern I described above. At a very high level:

```python
import tensorflow as tf


def build_graph(data_format='NCHW'):
    # Conditionally use proper ops based on data_format arg
    pass


training_graph = tf.Graph()
with training_graph.as_default():
    build_graph(data_format='NCHW')
    with tf.Session() as sess:
        # train
        # checkpoint session
        pass

prediction_graph = tf.Graph()
with prediction_graph.as_default():
    build_graph(data_format='NHWC')
    # load checkpoint
    # freeze graph
```
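This recipe can restore conv weights trained in NCHW into an NHWC graph because tf.nn.conv2d's filter shape, [filter_height, filter_width, in_channels, out_channels], does not depend on data_format. The NumPy sketch below is a naive stride-1, 'VALID'-padding convolution plus bias add (a stand-in written for illustration, not TensorFlow's implementation) with a separate code path per layout; it checks that one set of "checkpoint" weights gives the same result in both layouts. Dense layers that follow a flatten are a separate matter, since the flatten order does depend on the layout.

```python
import numpy as np


def conv2d_valid(x, kernel, bias, data_format):
    """Naive stride-1 'VALID' convolution (cross-correlation, as in TF) + bias add.

    kernel: (kh, kw, in_c, out_c) for both layouts.
    x: (N, H, W, C) for 'NHWC' or (N, C, H, W) for 'NCHW'.
    """
    kh, kw, _, out_c = kernel.shape
    if data_format == "NHWC":
        n, h, w, _ = x.shape
    elif data_format == "NCHW":
        n, _, h, w = x.shape
    else:
        raise ValueError("unknown data_format: %s" % data_format)
    oh, ow = h - kh + 1, w - kw + 1

    if data_format == "NHWC":
        out = np.zeros((n, oh, ow, out_c))
        for i in range(oh):
            for j in range(ow):
                patch = x[:, i:i + kh, j:j + kw, :]          # (N, kh, kw, in_c)
                out[:, i, j, :] = np.tensordot(patch, kernel, axes=([1, 2, 3], [0, 1, 2]))
        return out + bias                                     # channels on the last axis

    out = np.zeros((n, out_c, oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = x[:, :, i:i + kh, j:j + kw]              # (N, in_c, kh, kw)
            out[:, :, i, j] = np.tensordot(patch, kernel, axes=([1, 2, 3], [2, 0, 1]))
    return out + bias[None, :, None, None]                    # channels on axis 1


rng = np.random.default_rng(0)
kernel = rng.standard_normal((3, 3, 2, 4))   # one set of "checkpoint" weights
bias = rng.standard_normal(4)
x_nhwc = rng.standard_normal((1, 6, 7, 2))
x_nchw = np.transpose(x_nhwc, (0, 3, 1, 2))

y_nhwc = conv2d_valid(x_nhwc, kernel, bias, "NHWC")
y_nchw = conv2d_valid(x_nchw, kernel, bias, "NCHW")
print(np.allclose(y_nhwc, np.transpose(y_nchw, (0, 2, 3, 1))))  # True
```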

Note that the tf.estimator.Estimator framework makes this relatively easy. You can use the mode argument in your model_fn to choose between data formats, then have two different input_fns for training and prediction, and the framework will do the rest. You can find an end-to-end example here (I've linked to the relevant lines): https://github.com/tensorflow/models/blob/master/tutorials/image/cifar10_estimator/cifar10_main.py#L77
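The mode-based switch can be sketched as a small policy function called at the top of a model_fn. This is a hypothetical illustration, not the code from the linked example: the policy (NCHW only when training or evaluating on a GPU, NHWC for prediction/export and CPU) and the params["num_gpus"] key are assumptions. The mode strings are the values of tf.estimator.ModeKeys in TF 1.x, copied here so the sketch runs without TensorFlow; note that tf.layers spells the layouts channels_first/channels_last while tf.nn uses NCHW/NHWC.

```python
# String values of tf.estimator.ModeKeys in TF 1.x, copied so this sketch
# runs without TensorFlow installed.
TRAIN, EVAL, PREDICT = "train", "eval", "infer"

# tf.layers spells the layouts differently from tf.nn
TO_NN_FORMAT = {"channels_first": "NCHW", "channels_last": "NHWC"}


def pick_data_format(mode, num_gpus):
    """Hypothetical policy for a model_fn: NCHW on GPU, NHWC for export/CPU."""
    if mode == PREDICT or num_gpus == 0:
        return "channels_last"    # NHWC: runs on CPU, the format wanted for export
    return "channels_first"       # NCHW: the layout used for GPU training


# Inside a model_fn(features, labels, mode, params) you could then write, e.g.:
#   data_format = pick_data_format(mode, params["num_gpus"])  # hypothetical key
#   net = tf.layers.conv2d(net, 32, 5, data_format=data_format)
print(pick_data_format(TRAIN, 1), pick_data_format(PREDICT, 1))  # channels_first channels_last
```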


7 Comments

Thank you for your answer. My problem is that I'll use OpenCV's DNN module to run inference, not TensorFlow. For that, I need a .pb file in NHWC format, which I thought I could get by generating a graph definition with the correct format and passing it, together with the checkpoint from my training (which was done with NCHW), to freeze_graph(). As I detail in my second update to the question, this does not work, because the checkpoint itself seems to be incompatible.
Instead of manually freezing together the graph.pbtxt with NHWC and the checkpoint with NCHW, can you try building a new graph context (with OpenCV's DNN module), then loading the checkpoint, and then using the graph proto from that graph context to freeze? It's possible your graph protos and checkpoints are compatible, but things are not named deterministically, so there are some incompatibilities in names.
I'm not sure I understand what I should do. How would I build a graph context with OpenCV's DNN module?
However you built the graph in the first place? Can you point me to the call you are making in the OpenCV DNN library? That might help me clarify things.
Sorry that was supposed to be "how did you build the graph in the first place?"