
I'm struggling with the basics of writing a TensorFlow TFRecord file. I'm writing a simple example with an ndarray in Python, but for some reason when I read it back I'm forced to declare the features as variable-length, and they come back as SparseTensors.

Here's the example

    def serialize_tf_record(features, targets):
        record = {
            'shape': tf.train.Int64List(value=features.shape),
            'features': tf.train.FloatList(value=features.flatten()),
            'targets': tf.train.Int64List(value=targets),
        }
        return build_tf_example(record)

    def deserialize_tf_record(record):
        tfrecord_format = {
            'shape': tf.io.VarLenFeature(tf.int64),
            'features': tf.io.VarLenFeature(tf.float32),
            'targets': tf.io.VarLenFeature(tf.int64),
        }
        features_tensor = tf.io.parse_single_example(record, tfrecord_format)
        return features_tensor

Can anybody explain to me why this writes a variable-length record? The shape is fixed in the code, but I can't seem to write it in a way that TensorFlow knows it's fixed. The TensorFlow documentation is pretty horrific here. Can anybody clarify the API for me?

1 Answer


You should provide more contextual code, like your build_tf_example function and examples of your features and targets.

Here is an example which returns dense tensors:

    import numpy as np
    import tensorflow as tf


    def build_tf_example(record):
        return tf.train.Example(features=tf.train.Features(feature=record)).SerializeToString()


    def serialize_tf_record(features, targets):
        record = {
            'shape': tf.train.Feature(int64_list=tf.train.Int64List(value=features.shape)),
            'features': tf.train.Feature(float_list=tf.train.FloatList(value=features.flatten())),
            'targets': tf.train.Feature(int64_list=tf.train.Int64List(value=targets)),
        }
        return build_tf_example(record)


    def deserialize_tf_record(record):
        tfrecord_format = {
            'shape': tf.io.FixedLenSequenceFeature((), dtype=tf.int64, allow_missing=True),
            'features': tf.io.FixedLenSequenceFeature((), dtype=tf.float32, allow_missing=True),
            'targets': tf.io.FixedLenSequenceFeature((), dtype=tf.int64, allow_missing=True),
        }
        features_tensor = tf.io.parse_single_example(record, tfrecord_format)
        return features_tensor


    def main():
        features = np.zeros((3, 5, 7))
        targets = np.ones((4,), dtype=int)
        tf.print(deserialize_tf_record(serialize_tf_record(features, targets)))


    if __name__ == '__main__':
        main()
  • I converted record to a dictionary of tf.train.Feature objects (to easily serialize it)
  • From what I understood, each of your features can be an array (as opposed to a scalar value), so you can parse it with FixedLenSequenceFeature to build a dense tensor instead of a sparse one; see the sketch below for getting the original layout back.
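As for why your original version comes back sparse: tf.io.VarLenFeature always parses into a tf.sparse.SparseTensor, regardless of whether the stored data happens to have a fixed length, because the parser has no way of knowing the length is constant.

If it helps, here is a minimal sketch of how the functions above could be used end to end: writing a few serialized examples to a file, reading them back with tf.data, and using the stored shape entry to restore the original array layout. The file name example.tfrecord and the restore_shape helper are illustrative names I made up, not anything from your code or from the TensorFlow API; the snippet assumes serialize_tf_record and deserialize_tf_record are defined as above.

    import numpy as np
    import tensorflow as tf

    # Write a few serialized examples to a TFRecord file.
    # 'example.tfrecord' is just an illustrative path.
    with tf.io.TFRecordWriter('example.tfrecord') as writer:
        for _ in range(3):
            features = np.zeros((3, 5, 7))
            targets = np.ones((4,), dtype=int)
            writer.write(serialize_tf_record(features, targets))

    # Reshape the flat 'features' vector using the stored 'shape'.
    def restore_shape(example):
        example['features'] = tf.reshape(example['features'], example['shape'])
        return example

    dataset = (
        tf.data.TFRecordDataset('example.tfrecord')
        .map(deserialize_tf_record)  # dense tensors, thanks to FixedLenSequenceFeature
        .map(restore_shape)          # back to the original (3, 5, 7) layout
    )

    for example in dataset:
        print(example['features'].shape, example['targets'].numpy())

If the shape really never changes, you could also skip the reshape entirely and declare tf.io.FixedLenFeature([3, 5, 7], tf.float32) for features, but storing the shape keeps the record format flexible.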