I'm trying to optimize the input pipeline for .h5 data with tf.data, but I get a TypeError: expected str, bytes or os.PathLike object, not Tensor. I did some research but couldn't find anything about converting a string tensor to a Python string.

This simplified code is executable and returns the same error:

    batch_size = 1000
    conv_size = 3
    nb_conv = 32
    learning_rate = 0.0001

    # define parser function
    def parse_function(fname):
        with h5py.File(fname, 'r') as f:  # Error comes from here
            X = f['X'].reshape(batch_size, patch_size, patch_size, 1)
            y = f['y'].reshape(batch_size, patch_size, patch_size, 1)
            return X, y

    # create a list of files path
    flist = []
    for dirpath, _, fnames in os.walk('./proc/'):
        for fname in fnames:
            if fname.startswith('{}_{}'.format(patch_size, batch_size)) and fname.endswith('h5'):
                flist.append(fname)

    # prefetch data
    dataset = tf.data.Dataset.from_tensor_slices((flist))
    dataset = dataset.shuffle(len(flist))
    dataset = dataset.map(parse_function, num_parallel_calls=4)
    dataset = dataset.batch(1)
    dataset = dataset.prefetch(3)

    # simplest model that I think of
    X_ph = tf.placeholder(tf.float32, shape=None)
    y_ph = tf.placeholder(tf.float32, shape=None)
    W = tf.get_variable('w', shape=[conv_size, conv_size, 1, 1],
                        initializer=tf.contrib.layers.xavier_initializer())
    loss = tf.reduce_mean(tf.losses.mean_squared_error(
        tf.nn.softmax(labels=y_ph, predictions=tf.matmul(X_ph, W))))
    train_op = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss)

    # start session
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        print(sess.run(train_op, feed_dict={X_ph: dataset[0], y_ph: dataset[1]}))

Apparently fname is a string tensor, but h5py.File expects a plain Python string as its positional argument. I can't find any documentation on this, and the answer to another post doesn't solve the problem. In my case, I work only with .h5 files, where each file stores one batch.
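For what it's worth, the wording of the error matches what Python's own os.fspath raises when it is given something that is neither a str, bytes, nor a path-like object. The sketch below reproduces the message with the standard library only. The Tensor class here is a hypothetical stand-in, not the real tf.Tensor, and the assumption that h5py converts the filename in a similar way is not confirmed by the post.

```python
import os


class Tensor:
    """Hypothetical stand-in for a symbolic tf.Tensor; it is not path-like."""


def try_as_path(obj):
    """Return the path as str/bytes, or the TypeError message if obj is not path-like."""
    try:
        return os.fspath(obj)
    except TypeError as exc:
        return str(exc)


# A symbolic tensor is just an object to Python's path handling, so it is rejected.
message = try_as_path(Tensor())
print(message)

# A plain string passes through unchanged.
print(try_as_path("./proc/data.h5"))
```

This suggests the fix has to happen before the filename reaches h5py: the function must run as ordinary Python on a concrete value rather than on the symbolic tensor that Dataset.map passes in.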


Update (solution): Thanks to @kvish's comment, loading the .h5 files now works. The code has been updated with a simple conv layer and the placeholders have been removed. Each .h5 file is one batch. I want to prefetch multiple batches in parallel (h5py doesn't support multithreaded reading, so I write the batches into multiple files). The following can be copied, pasted, and run:

    import h5py
    import threading
    import numpy as np
    import tensorflow as tf

    # generate some img data
    for i in range(5):
        with h5py.File('./test_{}.h5'.format(i), 'w') as f:
            f.create_dataset('X', shape=(1000, 100, 100), dtype='float32',
                             data=np.random.rand(10**7).reshape(1000, 100, 100))
            f.create_dataset('y', shape=(1000, 100, 100), dtype='float32',
                             data=np.random.rand(10**7).reshape(1000, 100, 100))
            print(threading.get_ident())

    # params
    num_cores = 3
    shuffle_size = 1
    batch_size = 1

    # read .h5 file
    def parse_file(f):
        # py_func passes the filename as bytes, so decode it first
        print(f.decode('utf-8'))
        with h5py.File(f.decode("utf-8"), 'r') as fi:
            X = fi['X'][:].reshape(1000, 100, 100, 1)
            y = fi['y'][:].reshape(1000, 100, 100, 1)
        print(threading.get_ident())  # to see the thread id
        return X, y

    # py_func wrapper
    def parse_file_tf(filename):
        return tf.py_func(parse_file, [filename], [tf.float32, tf.float32])

    # tf.data input pipeline
    files = tf.data.Dataset.list_files('./test_*.h5')
    dataset = files.map(parse_file_tf, num_parallel_calls=num_cores)
    dataset = dataset.batch(batch_size).shuffle(shuffle_size).prefetch(3)
    it = dataset.make_initializable_iterator()
    iter_init_op = it.initializer
    X_it, y_it = it.get_next()

    # simplest model that I can think of
    with tf.name_scope("Conv1"):
        W = tf.get_variable("W", shape=[3, 3, 1, 1],
                            initializer=tf.contrib.layers.xavier_initializer())
        b = tf.get_variable("b", shape=[1],
                            initializer=tf.contrib.layers.xavier_initializer())
        layer1 = tf.nn.conv2d(X_it, W, strides=[1, 1, 1, 1], padding='SAME') + b
        out = tf.nn.relu(layer1)

    loss = tf.reduce_mean(tf.losses.mean_squared_error(labels=y_it, predictions=out))
    train_op = tf.train.AdamOptimizer(learning_rate=0.0001).minimize(loss)

    # session
    sess = tf.Session()
    sess.run(tf.global_variables_initializer())
    sess.run(iter_init_op)
    sess.run([train_op])
    sess.close()
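The decode call in parse_file is needed because the wrapped Python function receives the filename as bytes rather than str. As a minimal sketch of that one step, using only the standard library: the helper name to_path is hypothetical, and os.fsdecode is used here in place of the explicit .decode("utf-8") from the code above so that the helper accepts both bytes and str.

```python
import os


def to_path(name):
    """Convert a filename received as bytes (or already as str) into a str path."""
    # os.fsdecode leaves str unchanged and decodes bytes with the
    # filesystem encoding (UTF-8 on most systems).
    return os.fsdecode(name)


path_from_bytes = to_path(b"./test_0.h5")
path_from_str = to_path("./test_0.h5")
print(path_from_bytes, path_from_str)
```

Accepting both types makes the same parse function usable directly in plain Python tests as well as inside the py_func wrapper.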

There is also a separate cuDNN issue that isn't related to this post.

tensorflow-cpu v1.12: works fine

tensorflow-gpu v1.12: a runtime error occurs

    Traceback (most recent call last):
      File "/envs/tf/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
        return fn(*args)
      File "/envs/tf/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
        options, feed_dict, fetch_list, target_list, run_metadata)
      File "/envs/tf/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
        run_metadata)
    tensorflow.python.framework.errors_impl.NotFoundError: No algorithm worked!
         [[{{node Conv1/Conv2D}} = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](gradients/Conv1/Conv2D_grad/Conv2DBackpropFilter-0-TransposeNHWCToNCHW-LayoutOptimizer, W/read)]]
         [[{{node mean_squared_error/num_present/broadcast_weights/assert_broadcastable/AssertGuard/Assert/Switch_2/_37}} = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_63_me...t/Switch_2", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

  • You can take a look at py_func, which helps you run python functions on tensors. Commented Mar 27, 2019 at 0:17
  • @kvish Could you please be more explicit? Do I wrap parse_function with it? Commented Mar 27, 2019 at 8:12
  • I have added an example as an answer! Let me know if that works! Commented Mar 27, 2019 at 14:52
  • @kvish I have an update for the issue Commented Mar 27, 2019 at 16:31
  • Sorry, I am held up a bit with work! I will take a look at it today! Commented Mar 28, 2019 at 15:14

1 Answer

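Here is an example of how you can wrap the function with py_func, which runs it as ordinary Python so that it receives a concrete filename instead of a symbolic tensor. Note that tf.py_func is deprecated in TF 2.x in favor of tf.py_function. See the documentation for further details.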

    def parse_function_wrapper(filename):
        # Assuming your data and labels are float32.
        # The wrapped function is parse_function, whose argument is filename;
        # it returns X and y, whose datatypes are given by the tuple argument.
        features, labels = tf.py_func(
            parse_function, [filename], (tf.float32, tf.float32))
        return features, labels

    # Create dataset of filenames.
    dataset = tf.data.Dataset.from_tensor_slices(flist)
    dataset = dataset.shuffle(len(flist))
    dataset = dataset.map(parse_function_wrapper)
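Since py_func is deprecated in TF 2.x, here is a rough sketch of the same wrapper written with tf.numpy_function, assuming TensorFlow 2.x, h5py, and NumPy are installed. The constants N and PATCH, the temporary directory, and the tiny random files are illustrative assumptions made so the example runs on its own; they are not taken from the question.

```python
import os
import tempfile

import h5py
import numpy as np
import tensorflow as tf

N = 4       # hypothetical number of patches per file (one file = one batch)
PATCH = 8   # hypothetical patch size, kept small for the demo


def parse_file(fname):
    # tf.numpy_function hands a scalar string tensor to Python as bytes.
    with h5py.File(fname.decode("utf-8"), "r") as f:
        X = f["X"][:].reshape(N, PATCH, PATCH, 1)
        y = f["y"][:].reshape(N, PATCH, PATCH, 1)
    return X, y


def parse_file_tf(fname):
    X, y = tf.numpy_function(parse_file, [fname], (tf.float32, tf.float32))
    # The wrapper loses static shape information, so restore it here.
    X.set_shape((N, PATCH, PATCH, 1))
    y.set_shape((N, PATCH, PATCH, 1))
    return X, y


# Write a few small demo files into a temporary directory.
tmpdir = tempfile.mkdtemp()
for i in range(3):
    with h5py.File(os.path.join(tmpdir, "test_{}.h5".format(i)), "w") as f:
        f.create_dataset("X", data=np.random.rand(N, PATCH, PATCH).astype("float32"))
        f.create_dataset("y", data=np.random.rand(N, PATCH, PATCH).astype("float32"))

files = tf.data.Dataset.list_files(os.path.join(tmpdir, "test_*.h5"))
dataset = files.map(parse_file_tf, num_parallel_calls=2).prefetch(1)

# In TF 2.x the dataset can be iterated eagerly, without a session or iterator op.
batches = [(X.numpy(), y.numpy()) for X, y in dataset]
print(len(batches), batches[0][0].shape)
```

Calling set_shape after the wrapper matters in practice: without it, layers such as conv2d only see tensors of unknown shape.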