detach()
One example without detach():
import torch
from torchviz import make_dot

x = torch.ones(2, requires_grad=True)
y = 2 * x
z = 3 + x
r = (y + z).sum()
make_dot(r)

In the rendered graph, the green node r is the root of the AD (automatic differentiation) computational graph, and the blue node is the leaf tensor x. Every operation on x is recorded, so gradients can flow from r all the way back to x.
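To confirm that both paths (through y and through z) are part of the graph, here is a small sketch, using the same x, y, z, r as above, that runs backward and checks the gradient:

import torch

x = torch.ones(2, requires_grad=True)
y = 2 * x
z = 3 + x
r = (y + z).sum()
r.backward()
print(x.grad)   # tensor([3., 3.]) -- 2 from the y = 2*x path plus 1 from z = 3 + x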
Another example with detach():
import torch
from torchviz import make_dot

x = torch.ones(2, requires_grad=True)
y = 2 * x
z = 3 + x.detach()
r = (y + z).sum()
make_dot(r)

This is the same as:
import torch
from torchviz import make_dot

x = torch.ones(2, requires_grad=True)
y = 2 * x
z = 3 + x.data
r = (y + z).sum()
make_dot(r)
However, x.data is the old notation; x.detach() is the recommended way, because in-place changes made through a detached tensor are still reported by autograd, while changes made through x.data are not.
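A small sketch of that difference (this follows the documented version-counter behavior of autograd; the exact error message may vary across PyTorch versions):

import torch

# In-place edit through detach(): the version counter is shared with x,
# so autograd notices the change and raises an error during backward.
x = torch.ones(2, requires_grad=True)
y = x ** 2                # backward of ** needs the original value of x
x.detach().zero_()
try:
    y.sum().backward()
except RuntimeError as err:
    print("detach():", err)

# The same edit through .data is invisible to autograd: backward runs,
# but the gradient is silently wrong (0 instead of 2*x = 2).
x = torch.ones(2, requires_grad=True)
y = x ** 2
x.data.zero_()
y.sum().backward()
print("x.data:", x.grad)  # tensor([0., 0.])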
So what is different about x.detach()?
print(x)
print(x.detach())
Out:
tensor([1., 1.], requires_grad=True)
tensor([1., 1.])
So x.detach() returns a new tensor with requires_grad=False, detached from the AD computational graph; it shares the same underlying data with x, but operations on it are no longer tracked.
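Applied to the earlier example, a minimal sketch (same x, y, z, r as above) shows that the detached path contributes nothing to the gradient:

import torch

x = torch.ones(2, requires_grad=True)
y = 2 * x
z = 3 + x.detach()   # this path is cut out of the graph
r = (y + z).sum()
r.backward()
print(x.grad)        # tensor([2., 2.]) -- only the y = 2*x path contributes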
torch.no_grad
torch.no_grad is actually a class, used as a context manager.
import torch

x = torch.ones(2, requires_grad=True)
with torch.no_grad():
    y = x * 2
print(y.requires_grad)
Out:
False
From help(torch.no_grad):
Disabling gradient calculation is useful for inference, when you are sure that you will not call Tensor.backward(). It will reduce memory consumption for computations that would otherwise have requires_grad=True.

In this mode, the result of every computation will have requires_grad=False, even when the inputs have requires_grad=True.
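Since torch.no_grad is a class, it can also be used as a decorator, which is convenient for inference functions. A minimal sketch (the model below is just a hypothetical stand-in):

import torch
import torch.nn as nn

model = nn.Linear(4, 2)   # hypothetical model for illustration

@torch.no_grad()          # every call runs with gradient tracking disabled
def predict(inputs):
    return model(inputs)

out = predict(torch.randn(3, 4))
print(out.requires_grad)  # False -- no graph was built inside predict()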