RuntimeError in test_step with multi-GPU: Model input shape mismatch #21800

@innat

Description

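Suppose:

  • The dataloader yields inputs of shape (batch_size, 2 * model_input_shape).
  • The model is built for inputs of shape (batch_size, model_input_shape).
  • To run model.evaluate(dataloader), we override test_step so that it splits each input along the last axis and averages the predictions.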
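The shape arithmetic behind that setup can be sketched with NumPy alone. This is only an illustration: `toy_model` and its weights `w` are hypothetical stand-ins for the wrapped Keras model, and the sizes (10, 32, 2) are taken from the reproduction in this issue.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, model_width, split = 10, 32, 2

# The dataloader hands over inputs that are `split` times wider than the model expects.
x = rng.random((batch_size, split * model_width))  # shape (10, 64)

# Hypothetical stand-in for the wrapped model: a fixed linear map 32 -> 1.
w = rng.random((model_width, 1))


def toy_model(chunk):
    return chunk @ w


# Split along the last axis, run the model on each chunk, then average.
chunks = np.split(x, split, axis=-1)                # two arrays of shape (10, 32)
preds = [toy_model(c) for c in chunks]              # two arrays of shape (10, 1)
y_pred = np.mean(np.stack(preds, axis=0), axis=0)   # shape (10, 1)

print([c.shape for c in chunks], y_pred.shape)
```

Each chunk has the width the model was built for, which is why the model is only ever meant to be called from inside test_step after the split.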

The test_step override works on a single GPU but fails on multi-GPU (and will possibly fail on TPU as well). I've tested the code with keras-nightly.

    import numpy as np
    import tensorflow as tf
    import keras
    from keras import ops

    strategy = tf.distribute.MirroredStrategy()


    class CustomModel(keras.Model):
        def __init__(self, model, split=1, *args, **kwargs):
            super().__init__(*args, **kwargs)
            self.model = model
            self.split = split

        def call(self, inputs, training=None):
            # Forward the training flag to the wrapped model.
            return self.model(inputs, training=training)

        def test_step(self, data):
            x, y = data
            # Split the wide input into `split` chunks of the model's input width.
            x_list = ops.split(x, indices_or_sections=self.split, axis=-1)

            results = []
            for x_in_list in x_list:
                y_pred = self(x_in_list, training=False)
                results.append(y_pred)

            # Average the per-chunk predictions.
            y_pred = ops.mean(ops.stack(results, axis=0), axis=0)
            loss = self.compute_loss(y=y, y_pred=y_pred)
            self.compiled_metrics.update_state(y, y_pred)
            return {m.name: m.result() for m in self.metrics}


    inputs = keras.Input(shape=(32,))
    outputs = keras.layers.Dense(1)(inputs)

    with strategy.scope():
        model = keras.Model(inputs, outputs)
        custom_model = CustomModel(model, split=2)
        custom_model.compile(loss="mse", metrics=["mae"])

    x = np.random.random((10, 64))
    y = np.random.random((10, 1))
    custom_model.evaluate(x, y)
    ---------------------------------------------------------------------------
    RuntimeError                              Traceback (most recent call last)
    /tmp/ipykernel_37/3196480519.py in <cell line: 0>()
         42 x = np.random.random((10, 64))
         43 y = np.random.random((10, 1))
    ---> 44 custom_model.evaluate(x, y)

    /usr/local/lib/python3.11/dist-packages/keras/src/utils/traceback_utils.py in error_handler(*args, **kwargs)
        120             # To get the full stack trace, call:
        121             # `keras.config.disable_traceback_filtering()`
    --> 122             raise e.with_traceback(filtered_tb) from None
        123         finally:
        124             del filtered_tb

    /usr/local/lib/python3.11/dist-packages/keras/src/trainers/trainer.py in _symbolic_build(self, iterator, data_batch)
       1098                 y_pred = backend.compute_output_spec(self, x, training=False)
       1099             except Exception as e:
    -> 1100                 raise RuntimeError(
       1101                     "Unable to automatically build the model. "
       1102                     "Please build it yourself before calling "

    RuntimeError: Unable to automatically build the model. Please build it yourself before calling fit/evaluate/predict. A model is 'built' when its variables have been created and its `self.built` attribute is True. Usually, calling the model on a batch of data is the right way to build it.
    Exception encountered:
    'Exception encountered when calling CustomModel.call().

    Input 0 of layer "functional" is incompatible with the layer: expected shape=(None, 32), found shape=(5, 64)

    Arguments received by CustomModel.call():
      • inputs=tf.Tensor(shape=(5, 64), dtype=float32)
      • training=False'

Again, without the strategy (single GPU), the same code runs without error.
