
How does torch.nn.Parameter() work?


3 Answers


I will break it down for you. Tensors, as you might know, are multi-dimensional matrices. A Parameter, in its raw form, is a tensor, i.e. a multi-dimensional matrix. It subclasses the Variable class.

The difference between a Variable and a Parameter comes in when either is associated with a module. When a Parameter is assigned to a module as a model attribute, it gets added to the parameter list automatically and can be accessed via the 'parameters' iterator.
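As a minimal sketch of that difference (the class and attribute names here are just illustrative, not from the original code), a plain tensor assigned as a module attribute is not registered, while an nn.Parameter is:

import torch
import torch.nn as nn

class Demo(nn.Module):
    def __init__(self):
        super().__init__()
        self.w = nn.Parameter(torch.randn(3))  # registered: shows up in .parameters()
        self.scratch = torch.randn(3)          # not registered: just a plain attribute

demo = Demo()
print([name for name, _ in demo.named_parameters()])  # ['w']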

Initially in Torch, a Variable (which could, for example, be an intermediate state) would also get added to the model's parameter list upon assignment. Later on, use cases were identified where the variable needed to be cached rather than added to the parameter list.

One such case, as mentioned in the documentation, is that of an RNN, where you need to save the last hidden state so you don't have to pass it again and again. The need to cache a Variable instead of having it automatically registered as a parameter of the model is why we have an explicit way of registering parameters, i.e. the nn.Parameter class.
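As a rough sketch of that use case (a hand-rolled recurrent cell, invented here purely for illustration), the learnable weights are wrapped in nn.Parameter while the cached hidden state is kept as a plain tensor attribute, so only the weights end up in the parameter list:

import torch
import torch.nn as nn

class TinyRecurrentCell(nn.Module):
    def __init__(self, in_dim, hid):
        super().__init__()
        # Learnable weights: wrapped in nn.Parameter, so they are registered
        self.w_in = nn.Parameter(torch.randn(hid, in_dim) * 0.1)
        self.w_hid = nn.Parameter(torch.randn(hid, hid) * 0.1)
        # Cached state: a plain tensor attribute, so it is NOT registered
        self.hidden = torch.zeros(hid)

    def forward(self, x):
        self.hidden = torch.tanh(self.w_in @ x + self.w_hid @ self.hidden)
        return self.hidden

cell = TinyRecurrentCell(4, 3)
print([name for name, _ in cell.named_parameters()])  # ['w_in', 'w_hid']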

For instance, run the following code -

import torch
import torch.nn as nn
from torch.optim import Adam

class NN_Network(nn.Module):
    def __init__(self, in_dim, hid, out_dim):
        super(NN_Network, self).__init__()
        self.linear1 = nn.Linear(in_dim, hid)
        self.linear2 = nn.Linear(hid, out_dim)
        self.linear1.weight = torch.nn.Parameter(torch.zeros(in_dim, hid))
        self.linear1.bias = torch.nn.Parameter(torch.ones(hid))
        self.linear2.weight = torch.nn.Parameter(torch.zeros(in_dim, hid))
        self.linear2.bias = torch.nn.Parameter(torch.ones(hid))

    def forward(self, input_array):
        h = self.linear1(input_array)
        y_pred = self.linear2(h)
        return y_pred

in_d = 5
hidn = 2
out_d = 3
net = NN_Network(in_d, hidn, out_d)

Now, check the parameter list associated with this model -

for param in net.parameters():
    print(type(param.data), param.size())

""" Output
<class 'torch.FloatTensor'> torch.Size([5, 2])
<class 'torch.FloatTensor'> torch.Size([2])
<class 'torch.FloatTensor'> torch.Size([5, 2])
<class 'torch.FloatTensor'> torch.Size([2])
"""

Or try,

list(net.parameters()) 

This can easily be fed to your optimizer -

opt = Adam(net.parameters(), lr=0.001)

Also, note that Parameters have requires_grad set to True by default.
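You can verify that default yourself with a quick check (a tiny sketch, not part of the original answer):

import torch
import torch.nn as nn

p = nn.Parameter(torch.zeros(3))
print(p.requires_grad)   # True by default for a Parameter
t = torch.zeros(3)
print(t.requires_grad)   # False by default for a plain tensor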


6 Comments

Thank you for the nice explanation. I have one quick question regarding the code you provided. Since self.linear2 is a Linear layer with (hid, out_dim) as its input and output dimensions, how does its corresponding parameter self.linear2.weight have the dimension (in_dim, hid) as in torch.zeros(in_dim, hid)? Thank you.
If I can turn off gradient computation via requires_grad=False, what is the point of having the Parameter?
@anurag Parameter is the correct way to tell PyTorch that some parameters are learnable. requires_grad is the flag telling PyTorch whether you want the parameter to be modified (i.e. have gradients computed for it).
@KatherineChen I'm not the author of that code, but on inspection it's most probably a typo. As soon as you run the network, it will fail. Here's my correction: self.linear1.weight = torch.nn.Parameter(torch.zeros(hid, in_dim)); self.linear2.weight = torch.nn.Parameter(torch.zeros(out_dim, hid)); self.linear2.bias = torch.nn.Parameter(torch.ones(out_dim))
Note that when the author talks about a Variable, it's just a Tensor with requires_grad set to True; Parameter makes the distinction between a temporary Variable (with fixed values, or varying during the forward pass) and an actual learnable parameter. pytorch.org/docs/stable/autograd.html#variable-deprecated

Recent PyTorch releases just have Tensors; the concept of the Variable has been deprecated.

Parameters are just Tensors limited to the module they are defined in (typically in the module constructor, the __init__ method).

They will appear inside module.parameters(). This comes in handy when you build custom modules whose parameters learn via gradient descent.

Anything that is true for PyTorch tensors is true for parameters, since they are tensors.

Additionally, if a module goes to the GPU, its parameters go as well. If a module is saved, its parameters will also be saved.
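A small sketch of both behaviors (the module and the file name are made up for illustration):

import torch
import torch.nn as nn

net = nn.Linear(5, 3)  # a small module with weight and bias parameters

# Moving the module moves its parameters with it (if a GPU is available)
if torch.cuda.is_available():
    net = net.cuda()
print(next(net.parameters()).device)

# Saving the module's state_dict saves the parameters as well
torch.save(net.state_dict(), "net.pt")
print(list(torch.load("net.pt").keys()))  # ['weight', 'bias']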

There is a similar concept to model parameters called buffers.

These are named tensors inside the module, but they are not meant to learn via gradient descent; instead, you can think of them as variables. You update your named buffers inside the module's forward() as you like.

For buffers, it is also true that they will go to the GPU with the module and be saved together with it.
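A minimal sketch of a buffer next to a parameter (the names are invented for illustration): the buffer is saved with the module and follows it to the GPU, but it never appears in parameters():

import torch
import torch.nn as nn

class WithBuffer(nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(4))            # learnable
        self.register_buffer("running_sum", torch.zeros(4))   # not learnable

    def forward(self, x):
        # Buffers can be updated freely inside forward()
        self.running_sum += x.detach()
        return x * self.weight

m = WithBuffer()
print([name for name, _ in m.named_parameters()])  # ['weight']
print(list(m.state_dict().keys()))                 # ['weight', 'running_sum']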


3 Comments

Are Parameters limited to being used only in __init__()?
No, but it is most common to define them inside the __init__ method.
If we want, can the parameters be learned during the training process using gradient descent? If yes, then what is the difference between a dense layer having n neurons and a parameter tensor containing n elements?

torch.nn.Parameter is used to explicitly specify which tensors should be treated as the model's learnable parameters, so that those tensors are learned (updated) during the training process to minimize the loss function.

For example, if you are creating a simple linear regression using PyTorch, then in "W * X + b", W and b need to be nn.Parameter.

weight = torch.nn.Parameter(torch.rand(1))

bias = torch.nn.Parameter(torch.rand(1))

Here, I have randomly created one value each for weight and bias, which will be of type float32, and wrapped them in torch.nn.Parameter.
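A minimal sketch of how those two parameters would then be trained (the toy data, optimizer choice, and learning rate are made up for illustration):

import torch

# Toy data for y = 2x + 1
X = torch.linspace(0, 1, 20)
Y = 2 * X + 1

weight = torch.nn.Parameter(torch.rand(1))
bias = torch.nn.Parameter(torch.rand(1))
optimizer = torch.optim.SGD([weight, bias], lr=0.1)

for _ in range(500):
    optimizer.zero_grad()
    loss = ((weight * X + bias - Y) ** 2).mean()  # mean squared error
    loss.backward()
    optimizer.step()

print(weight.item(), bias.item())  # should approach 2.0 and 1.0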

2 Comments

In the documentation here pytorch.org/docs/stable/generated/… it says that requires_grad will still be True in the no_grad state. I am confused. Does it mean that the gradients will be calculated in the no_grad state too?
@Dev_Man, no, it isn't related to gradient calculation. Instead, it's a small clarification that it does not matter whether you create a parameter inside or outside a no_grad context: the default value of requires_grad for that parameter will be True. So no_grad does not change the default values of the Parameter init function.
