From the documentation:
Parameters are Tensor subclasses, that have a very special property when used with Modules - when they’re assigned as Module attributes they are automatically added to the list of its parameters, and will appear e.g. in parameters() iterator. Assigning a Tensor doesn’t have such effect. This is because one might want to cache some temporary state, like last hidden state of the RNN, in the model. If there was no such class as Parameter, these temporaries would get registered too.
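To make this concrete, here is a minimal sketch (the class and attribute names are only illustrative) of how assigning an nn.Parameter registers it with the module, while assigning a plain Tensor does not:

```python
import torch
import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        # Assigned as a Parameter: automatically registered,
        # shows up in parameters() and state_dict()
        self.weight = nn.Parameter(torch.randn(3, 3))
        # Assigned as a plain Tensor: treated as cached state,
        # ignored by parameters()
        self.last_hidden = torch.zeros(3)

m = MyModule()
print([name for name, _ in m.named_parameters()])  # ['weight']
```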
Consider, for example, what happens when you initialize an optimizer:
optim.SGD(model.parameters(), lr=1e-3)
The optimizer will update only the registered Parameters of the model.
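As a minimal illustration (nn.Linear is used here only because its weight and bias are already Parameters):

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(4, 2)                 # weight and bias are nn.Parameters
opt = optim.SGD(model.parameters(), lr=1e-3)

loss = model(torch.randn(8, 4)).sum()   # dummy scalar loss
loss.backward()
opt.step()                              # updates only the registered Parameters
```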
Variables still exist in PyTorch 0.4, but they are deprecated. From the docs:
The Variable API has been deprecated: Variables are no longer necessary to use autograd with tensors. Autograd automatically supports Tensors with requires_grad set to True.
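In other words, since 0.4 a plain Tensor created with requires_grad=True is enough:

```python
import torch

x = torch.ones(2, 2, requires_grad=True)  # no Variable wrapper needed
y = (x * 3).sum()
y.backward()
print(x.grad)  # tensor([[3., 3.], [3., 3.]])
```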
PyTorch pre-0.4
In PyTorch before version 0.4, one needed to wrap a Tensor in a torch.autograd.Variable in order to track the operations applied to it and perform differentiation. From the docs of Variable in 0.3:
Wraps a tensor and records the operations applied to it. Variable is a thin wrapper around a Tensor object, that also holds the gradient w.r.t. it, and a reference to a function that created it. This reference allows retracing the whole chain of operations that created the data. If the Variable has been created by the user, its grad_fn will be None and we call such objects leaf Variables. Since autograd only supports scalar valued function differentiation, grad size always matches the data size. Also, grad is normally only allocated for leaf variables, and will be always zero otherwise.
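For reference, pre-0.4 code looked roughly like the sketch below; on modern versions the Variable wrapper is a deprecated no-op, so this is shown only for historical context:

```python
# PyTorch <= 0.3 style (deprecated since 0.4)
import torch
from torch.autograd import Variable

x = Variable(torch.ones(2, 2), requires_grad=True)  # user-created leaf Variable
y = (x + 2).sum()
y.backward()
print(x.grad)     # gradient w.r.t. x, same shape as x.data
print(x.grad_fn)  # None, because x is a leaf Variable
```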
The difference with respect to Parameter was much the same. From the docs of Parameter in 0.3:
A kind of Variable that is to be considered a module parameter. Parameters are Variable subclasses, that have a very special property when used with Modules - when they’re assigned as Module attributes they are automatically added to the list of its parameters, and will appear e.g. in parameters() iterator. Assigning a Variable doesn’t have such effect. This is because one might want to cache some temporary state, like last hidden state of the RNN, in the model. If there was no such class as Parameter, these temporaries would get registered too.
Another difference is that parameters can’t be volatile and that they require gradient by default.
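This default is easy to check (the comparison below also holds for nn.Parameter in current versions):

```python
import torch
import torch.nn as nn

p = nn.Parameter(torch.zeros(3))
t = torch.zeros(3)
print(p.requires_grad)  # True  -- Parameters require gradients by default
print(t.requires_grad)  # False -- plain Tensors do not
```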