
I am trying to replicate a GAN study (StarGAN v2) by training the model on a reduced dataset in Google Colab, but I ran into this problem:

Start training...
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py:3063: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
  "See the documentation of nn.Upsample for details.".format(mode))
Traceback (most recent call last):
  File "main.py", line 182, in <module>
    main(args)
  File "main.py", line 59, in main
    solver.train(loaders)
  File "/content/drive/My Drive/stargan-v2/core/solver.py", line 131, in train
    nets, args, x_real, y_org, y_trg, x_refs=[x_ref, x_ref2], masks=masks)
  File "/content/drive/My Drive/stargan-v2/core/solver.py", line 259, in compute_g_loss
    x_rec = nets.generator(x_fake, s_org, masks=masks)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/drive/My Drive/stargan-v2/core/model.py", line 181, in forward
    x = block(x, s)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/drive/My Drive/stargan-v2/core/model.py", line 117, in forward
    out = self._residual(x, s)
  File "/content/drive/My Drive/stargan-v2/core/model.py", line 109, in _residual
    x = F.interpolate(x, scale_factor=2, mode='nearest')
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 3132, in interpolate
    return torch._C._nn.upsample_nearest2d(input, output_size, scale_factors)
RuntimeError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 15.90 GiB total capacity; 14.73 GiB already allocated; 195.88 MiB free; 14.89 GiB reserved in total by PyTorch)

I lowered batch_size, but that didn't work for me. Do you have any idea how I can fix this problem?
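For reference, I launch training roughly like this in a Colab cell (--batch_size and --img_size are flags the repo's main.py exposes; the values shown are illustrative, with --batch_size being the one I lowered):

    !python main.py --mode train --num_domains 2 --w_hpf 1 \
        --train_img_dir data/celeba_hq/train \
        --val_img_dir data/celeba_hq/val \
        --batch_size 2 --img_size 128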

Paper: StarGAN v2: Diverse Image Synthesis for Multiple Domains

Original github repo: stargan-v2

  • What GAN study are you trying to replicate? I can revise my answer to address it specifically if you give me the name. Commented Nov 16, 2020 at 16:30
  • @micpap25 StarGAN v2: arxiv.org/pdf/1912.01865.pdf. However, the problem was related to my non-premium Colab account. Even though I tried different methods to run it on free Colab, I could not succeed. Commented Aug 23, 2022 at 2:51
  • I am also trying to run Llama-2-7b in the free version of Google Colab on a T4 instance, but keep getting a very similar CUDA out-of-memory error (in terms of memory allocated, reserved, etc.). I also tried reducing the batch size to 1, but to no avail. Is it impossible to run this on a free Colab instance at all? Commented Dec 11, 2023 at 19:37

1 Answer

If you aren't using the Pro version of Google Colab, you're going to run into fairly restrictive memory limits. From the Google Colab FAQ...

The amount of memory available in Colab virtual machines varies over time (but is stable for the lifetime of the VM)... You may sometimes be automatically assigned a VM with extra memory when Colab detects that you are likely to need it. Users interested in having more memory available to them in Colab, and more reliably, may be interested in Colab Pro.

You already have a good grasp of the issue, since you understand that lowering batch_size is a reasonable first workaround. Ultimately, though, if you want to replicate this study, you'll have to switch to a training setup that fits within the memory you actually have; a couple of common techniques are sketched below.
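Here is a minimal, self-contained sketch (not taken from the StarGAN v2 codebase; the toy model and tensor shapes are placeholders) of two techniques that often cut GPU memory use substantially: mixed precision via torch.cuda.amp (available in PyTorch 1.6+) and gradient accumulation, which trades a smaller per-step batch for the same effective batch size:

    # Sketch of mixed precision + gradient accumulation (toy model, not StarGAN v2).
    import torch
    import torch.nn as nn

    model = nn.Sequential(
        nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, 3, 3, padding=1),
    ).cuda()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    scaler = torch.cuda.amp.GradScaler()  # rescales the loss so fp16 gradients don't underflow
    accum_steps = 4                       # effective batch = per-step batch * accum_steps

    for step in range(100):
        x = torch.randn(2, 3, 256, 256, device='cuda')   # small per-step batch
        with torch.cuda.amp.autocast():                  # forward pass runs largely in fp16
            loss = model(x).pow(2).mean() / accum_steps  # average over the accumulation window
        scaler.scale(loss).backward()                    # gradients accumulate across steps
        if (step + 1) % accum_steps == 0:
            scaler.step(optimizer)                       # unscales grads, then optimizer.step()
            scaler.update()
            optimizer.zero_grad()

Lowering the input resolution also helps: the repo exposes an --img_size flag, and activation memory shrinks roughly quadratically with resolution, which can matter more than the batch size for a generator that upsamples to 256×256.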
