
When I want to assign part of a pre-trained model's parameters to another module defined in a new PyTorch model, I get two different outputs using two different methods.

The network is defined as follows:

    import torch
    import torch.nn as nn

    class Net(nn.Module):
        def __init__(self):
            super(Net, self).__init__()
            # Load a pre-trained ResNet18 and drop its final FC layer
            self.resnet = torch.hub.load('pytorch/vision', 'resnet18', pretrained=True)
            self.resnet = nn.Sequential(*list(self.resnet.children())[:-1])
            self.freeze_model(self.resnet)
            self.classifier = nn.Sequential(
                nn.Dropout(),
                nn.Linear(512, 256),
                nn.ReLU(),
                nn.Linear(256, 3),
            )

        def freeze_model(self, model):
            # Disable gradient updates for the backbone parameters
            for param in model.parameters():
                param.requires_grad = False

        def forward(self, x):
            out = self.resnet(x)
            out = out.flatten(start_dim=1)
            out = self.classifier(out)
            return out

What I want is to assign the pre-trained parameters to the classifier in the net module. I used two different ways for this task:

    # First way
    net.load_state_dict(torch.load('model_CNN_pretrained.ptl'))

    # Second way
    params = torch.load('model_CNN_pretrained.ptl')
    net.classifier[1].weight = nn.Parameter(params['classifier.1.weight'], requires_grad=False)
    net.classifier[1].bias = nn.Parameter(params['classifier.1.bias'], requires_grad=False)
    net.classifier[3].weight = nn.Parameter(params['classifier.3.weight'], requires_grad=False)
    net.classifier[3].bias = nn.Parameter(params['classifier.3.bias'], requires_grad=False)
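For reference, the checkpoint holds an entry for every registered parameter and buffer, not just the classifier weights. A minimal way to inspect it (assuming the same checkpoint file as above):

    params = torch.load('model_CNN_pretrained.ptl')
    for name, tensor in params.items():
        print(name, tuple(tensor.shape))
    # Besides 'classifier.1.weight' etc., this also lists backbone entries,
    # including BatchNorm buffers such as 'resnet.1.running_mean'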

The parameters were assigned correctly, but I got two different outputs from the same input data: the first method works correctly, while the second doesn't. Could someone point out the difference between these two methods?

Comments:
  • net.classifier[4].bias - is it intentionally [4] instead of [3]? Or just a typo in your question? (Commented Dec 3, 2020 at 15:11)
  • Sorry, it's just a typo. (Commented Dec 4, 2020 at 1:07)

1 Answer

Finally, I found out where the problem is.

During the training that produced the checkpoint, the buffer parameters (running mean and variance) of the BatchNorm2d layers in the ResNet18 model were changed, even though we set requires_grad of the parameters to False. requires_grad only stops gradient updates; the buffers are recomputed from the input data whenever the model runs in model.train() mode, and they stay unchanged only after model.eval(). This explains the difference: the first method restores the updated buffers together with the weights via load_state_dict, while the second method copies only the classifier parameters and leaves the backbone's original buffers untouched, so the two models produce different outputs.
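A minimal sketch showing this behavior in isolation (nothing here depends on the model above, only on torch):

    import torch
    import torch.nn as nn

    bn = nn.BatchNorm2d(3)
    for p in bn.parameters():
        p.requires_grad = False       # freezes only weight/bias, not the buffers

    bn.train()
    _ = bn(torch.randn(8, 3, 4, 4))
    print(bn.running_mean)            # no longer all zeros: updated during train()

    bn.eval()
    before = bn.running_mean.clone()
    _ = bn(torch.randn(8, 3, 4, 4))
    print(torch.equal(before, bn.running_mean))  # True: buffers fixed in eval()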

Here is a link about how to freeze the BN layers:

How to freeze BN layers while training the rest of network (mean and var wont freeze)
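One common workaround from that thread is to switch the BN layers back to eval() mode after calling net.train(), so their running statistics stay fixed while the rest of the network trains. A sketch, assuming the Net definition above (set_bn_eval is a hypothetical helper, not part of PyTorch):

    def set_bn_eval(module):
        # Keep BatchNorm layers in eval mode so running_mean/var are not updated
        if isinstance(module, nn.BatchNorm2d):
            module.eval()

    net = Net()
    net.train()                    # training mode for the whole model...
    net.resnet.apply(set_bn_eval)  # ...then re-freeze the BN statistics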
