
I'm getting "can't optimize a non-leaf Tensor" on this bit of code:

self.W_ch1 = nn.Parameter(
    torch.rand(encoder_feature_dim, encoder_feature_dim), requires_grad=True
).to(self.device)
self.W_ch1_optimizer = torch.optim.Adam([self.W_ch1], lr=encoder_lr)

I don't know why this is happening; that should be a leaf tensor, because it has no children connected to it. It's just a torch.rand wrapped in an nn.Parameter. The error is thrown at the initialization of self.W_ch1_optimizer.

[screenshot of the traceback, raised at the optimizer initialization]

  • Can you show the code where you are performing inference and backpropagation? Commented Jun 19, 2022 at 22:55
  • @Ivan actually it throws that error at the optimizer initialization. Not sure why. Commented Jun 19, 2022 at 23:00
  • The below code works for me, so I don't think there is a problem with this snippet: ``` W_ch1 = nn.Parameter(torch.rand(10, 10), requires_grad=True) W_ch1_optimizer = torch.optim.Adam([W_ch1], lr=1e-3) ``` Commented Jun 19, 2022 at 23:43
  • @UmangGupta that's so weird. I added a screenshot to show that's exactly where it's breaking. Commented Jun 20, 2022 at 0:26
  • There may be something else in your code causing this to break. Can you run the two-line snippet from my previous comment and check whether you get the error? Also, which torch version are you using? Commented Jun 20, 2022 at 0:32

1 Answer


The reason it throws an error is that torch.Tensor.cuda (and likewise Tensor.to) returns a copy of the tensor and, since the parameter requires grad, registers that copy as a new node in the autograd graph. In other words, your parameter W_ch1 is no longer a leaf node, because you already have this "computation" tree:

nn.Parameter -> cuda:parameter = W_ch1 

You can compare the following two results:

>>> p = nn.Parameter(torch.rand(1))
>>> p.is_leaf
True

>>> p = nn.Parameter(torch.rand(1)).cuda()
>>> p.is_leaf
False
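The device transfer is what registers the extra node: the non-leaf copy carries a grad_fn (its exact name varies across PyTorch versions), which is what the optimizer refuses to accept.

>>> p = nn.Parameter(torch.rand(1)).cuda()
>>> p.grad_fn is not None  # a backward node was recorded by the transfer
True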

What you need to do is first instantiate your modules and define your optimizer(s); only then can you transfer them to the desired device, not before:

>>> p = nn.Parameter(torch.rand(1))
>>> optimizer = optim.Adam([p], lr=lr)

Then you can transfer the parameter's data in place (note that torch.optim optimizers have no .cuda() method; their state is created lazily on the parameter's device at the first step):

>>> p.data = p.data.cuda()
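Applied to the snippet from the question, a minimal sketch (reusing encoder_feature_dim, encoder_lr and self.device from the question) is to build the parameter directly on the target device, so it stays a leaf, and only then construct the optimizer:

# Create the parameter directly on the device so it remains a leaf tensor,
# then hand it to the optimizer (nn.Parameter already defaults to requires_grad=True).
self.W_ch1 = nn.Parameter(
    torch.rand(encoder_feature_dim, encoder_feature_dim, device=self.device)
)
self.W_ch1_optimizer = torch.optim.Adam([self.W_ch1], lr=encoder_lr)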