
I'm trying to run a demo of a TF Object Detection model with Faster R-CNN on a Google Colab Pro GPU (RAM: 25 GB, disk: 147 GB), but it fails with the following error:

Tensorflow/core/common_runtime/bfc_allocator.cc:456] Allocator (GPU_0_bfc) ran out of memory trying to allocate 7.18GiB (rounded to 7707033600) requested by op MultiLevelMatMulCropAndResize/MultiLevelRoIAlign/AvgPool-0-TransposeNHWCToNCHW-LayoutOptimizer
If the cause is memory fragmentation maybe the environment variable 'TF_GPU_ALLOCATOR=cuda_malloc_async' will improve the situation.
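The hint at the end refers to an environment variable that has to be set before TensorFlow initializes the GPU; a minimal sketch of how that would be set (whether it helps depends on whether fragmentation is really the cause):

import os

# The log's own suggestion: switch to the asynchronous CUDA allocator.
# Must be set before TensorFlow initializes the GPU, so set it before importing TF.
os.environ["TF_GPU_ALLOCATOR"] = "cuda_malloc_async"

import tensorflow as tf  # imported only after the variable is set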

Then it gives me these stats:

I tensorflow/core/common_runtime/bfc_allocator.cc:1058] Sum Total of in-use chunks: 7.46GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:1060] total_region_allocated_bytes_: 15034482688 memory_limit_: 16183459840 available bytes: 1148977152 curr_region_allocation_bytes_: 8589934592
I tensorflow/core/common_runtime/bfc_allocator.cc:1066] Stats:
Limit: 16183459840
InUse: 8013051904
MaxInUse: 8081602560
NumAllocs: 6801
MaxAllocSize: 7707033600
Reserved: 0
PeakReserved: 0
LargestFreeBlock: 0

And

tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[2400,1024,28,28] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node MultiLevelMatMulCropAndResize/MultiLevelRoIAlign/AvgPool-0-TransposeNHWCToNCHW-LayoutOptimizer}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. [Op:__inference__dummy_computation_fn_32982]
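For scale, that shape accounts exactly for the 7.18 GiB request, assuming float32 (4 bytes per element):

# Size of the tensor named in the OOM message, assuming float32 (4 bytes per element).
elements = 2400 * 1024 * 28 * 28
size_bytes = elements * 4
print(size_bytes)          # 7707033600 -- the exact "rounded to" figure in the log
print(size_bytes / 2**30)  # ~7.18 GiB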

I don't really understand why it runs out of memory allocating only 7 GB on a 25 GB system. How can I fix it? Here is my config file for this task:

# Faster R-CNN with Resnet-50 (v1)
# Trained on COCO, initialized from Imagenet classification checkpoint
# Achieves -- mAP on COCO14 minival dataset.
# This config is TPU compatible.

model {
  faster_rcnn {
    num_classes: 7
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 640
        max_dimension: 640
        pad_to_max_dimension: true
      }
    }
    feature_extractor {
      type: 'faster_rcnn_resnet50_keras'
      batch_norm_trainable: true
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        scales: [0.25, 0.5, 1.0, 2.0]
        aspect_ratios: [0.5, 1.0, 2.0]
        height_stride: 16
        width_stride: 16
      }
    }
    first_stage_box_predictor_conv_hyperparams {
      op: CONV
      regularizer {
        l2_regularizer {
          weight: 0.0
        }
      }
      initializer {
        truncated_normal_initializer {
          stddev: 0.01
        }
      }
    }
    first_stage_nms_score_threshold: 0.0
    first_stage_nms_iou_threshold: 0.7
    first_stage_max_proposals: 300
    first_stage_localization_loss_weight: 2.0
    first_stage_objectness_loss_weight: 1.0
    initial_crop_size: 14
    maxpool_kernel_size: 2
    maxpool_stride: 2
    second_stage_box_predictor {
      mask_rcnn_box_predictor {
        use_dropout: false
        dropout_keep_probability: 1.0
        fc_hyperparams {
          op: FC
          regularizer {
            l2_regularizer {
              weight: 0.0
            }
          }
          initializer {
            variance_scaling_initializer {
              factor: 1.0
              uniform: true
              mode: FAN_AVG
            }
          }
        }
        share_box_across_classes: true
      }
    }
    second_stage_post_processing {
      batch_non_max_suppression {
        score_threshold: 0.0
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 300
      }
      score_converter: SOFTMAX
    }
    second_stage_localization_loss_weight: 2.0
    second_stage_classification_loss_weight: 1.0
    use_static_shapes: true
    use_matmul_crop_and_resize: true
    clip_anchors_to_image: true
    use_static_balanced_label_sampler: true
    use_matmul_gather_in_matcher: true
  }
}

train_config: {
  batch_size: 8
  sync_replicas: true
  startup_delay_steps: 0
  replicas_to_aggregate: 8
  num_steps: 25000
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        cosine_decay_learning_rate {
          learning_rate_base: .04
          total_steps: 25000
          warmup_learning_rate: .013333
          warmup_steps: 2000
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  fine_tune_checkpoint_version: V2
  fine_tune_checkpoint: "faster_rcnn_resnet50_v1_640x640_coco17_tpu-8/checkpoint/ckpt-0"
  fine_tune_checkpoint_type: "detection"
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  max_number_of_boxes: 100
  unpad_groundtruth_tensors: false
  use_bfloat16: true  # works only on TPUs
}

train_input_reader: {
  label_map_path: "label_map.pbtxt"
  tf_record_input_reader {
    input_path: "train.record"
  }
}

eval_config: {
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
  batch_size: 1;
}

eval_input_reader: {
  label_map_path: "label_map.pbtxt"
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "test.record"
  }
}
  • The GPU seems to have only 16 GB of RAM (the nvidia-smi check sketched after these comments shows what the GPU itself offers), and around 8 GB is already allocated, so it's not a case of allocating 7 GB out of 25 GB; some memory is already in use. This is a very common misconception: allocations do not happen in a vacuum. Also, there is no code here that we can suggest changing. Commented Jul 11, 2021 at 18:12
  • @Dr.Snoopy Thanks for the comment. I just edited the question to add the config file I used to train this model. This task doesn't involve code to build the model, since I only use the Object Detection API. Second, the resource allocation panel in my Google Colab says that I have 24 GB of GPU memory; is there any way to make use of that 24 GB then? Thank you! Commented Jul 11, 2021 at 18:35
  • Ah, I just realized it's because the images in a batch take up a lot of memory. I changed the batch size to 2 and it worked! Commented Jul 11, 2021 at 18:53
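The 25 GB shown in Colab's resource panel is system RAM; the GPU's own memory (what the allocator's memory_limit_ of roughly 15 GiB in the log reflects) is a separate pool. A minimal way to see what the GPU itself provides in a Colab runtime, assuming nvidia-smi is available on the instance:

import subprocess

# Query the GPU's total and used memory. This is the pool the BFC allocator draws from,
# not the system RAM shown in Colab's resource panel.
print(subprocess.run(
    ["nvidia-smi", "--query-gpu=name,memory.total,memory.used", "--format=csv"],
    capture_output=True, text=True,
).stdout)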

2 Answers


I also ran into the same problem, and it took me a week to figure out. I am using Colab Pro+, which allocated me a Tesla P100 GPU (16 GB). My image dimensions are (256, 256, 4) and the batch size is 32. The thing is, when we design an architecture we don't think about the size of the parameters until we run into problems like a ResourceExhaustedError. Then we make changes to try to reduce the parameter count. But there is another factor that takes up memory: in my case, I use four separate variables to hold intermediate tensors.

from tensorflow.keras import layers
import tensorflow_addons as tfa  # for SpectralNormalization

# x is the output of the previous block (a 4-D feature map).
block_0 = layers.Conv2D(filters=32, kernel_size=(1, 1), strides=(1, 1), activation="LeakyReLU", padding='same', name='block_5_layer_1')(x)

block_1 = tfa.layers.SpectralNormalization(layers.Conv2D(filters=64, kernel_size=(3, 3), strides=(1, 1), activation="LeakyReLU", padding='same', name='block_5_layer_2'))(x)
block_1 = tfa.layers.SpectralNormalization(layers.Conv2D(filters=64, kernel_size=(5, 5), strides=(1, 1), padding='same', name='block_5_layer_3'))(block_1)
# block_1 = layers.BatchNormalization()(block_1)
block_1 = layers.Activation('LeakyReLU')(block_1)
block_1 = layers.Dropout(0.7)(block_1)

block_2 = tfa.layers.SpectralNormalization(layers.Conv2D(filters=32, kernel_size=(1, 1), strides=(1, 1), activation="LeakyReLU", padding='same', name='block_5_layer_4'))(x)
block_2 = tfa.layers.SpectralNormalization(layers.Conv2D(filters=96, kernel_size=(3, 3), strides=(1, 1), activation="LeakyReLU", padding='same', name='block_5_layer_5'))(block_2)
block_2 = layers.Dropout(0.7)(block_2)

block_3 = layers.MaxPool2D(pool_size=(3, 3), strides=(1, 1), padding='same', name='block_5_maxpool_1')(x)
block_3 = tfa.layers.SpectralNormalization(layers.Conv2D(filters=64, kernel_size=(3, 3), strides=(1, 1), activation="LeakyReLU", padding='same', name='block_5_layer_6'))(block_3)

# Concatenating the four branches along the channel axis is what produces the wide tensor.
x = layers.concatenate([block_0, block_1, block_2, block_3], axis=3, name='block_5')

For me the ResourceExhaustedError occurred at the very first layer. The resulting shape is (32, 256, 256, 352), which is huge, and I suppose those tensors are stored on the GPU itself. This takes up a lot of space, and that is why TensorFlow can't allocate memory for the layers. When I reduced the dimensions it worked. So I think we should also consider the shapes of the variables holding the convolved feature maps, not just the parameter count. Correct me if I am wrong.
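A rough sketch of that math, assuming float32 activations (4 bytes per element), just for the single tensor mentioned above:

# Memory held by one activation tensor of shape (32, 256, 256, 352), float32.
batch, height, width, channels = 32, 256, 256, 352
size_bytes = batch * height * width * channels * 4
print(size_bytes / 2**30)  # ~2.75 GiB -- before gradients and the other branches are counted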


I realized the problem was the images in each batch taking up too much memory, as per https://github.com/tensorflow/models/issues/1817, so I changed my batch size to 2 and it worked.
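For reference: with batch_size: 8 and first_stage_max_proposals: 300, the second-stage crop-and-resize works on 8 × 300 = 2400 regions at once, which appears to be where the 2400 in the OOM tensor shape comes from. The only edit to the pipeline config above is the train batch size; a sketch of the changed fragment:

train_config: {
  batch_size: 2  # was 8; every image still contributes 300 proposals to the second stage
  # ... all other train_config fields left exactly as in the config above ...
}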
