
I have a matrix A of dimension 1000x70000. My loss function includes A, and I want to find the optimal value of A using gradient descent, with the constraint that the rows of A remain in the probability simplex (i.e. every row sums to 1). I have initialised A as given below:

import numpy as np
import torch

A = np.random.dirichlet(np.ones(70000), 1000)
A = torch.tensor(A, requires_grad=True)

and my training loop looks like this:

for epoch in range(500):
    y_pred = forward(X)
    y = model(torch.mm(A.float(), X))
    l = loss(y, y_pred)
    l.backward()
    A.grad.data = -A.grad.data
    optimizer.step()
    optimizer.zero_grad()
    if epoch % 2 == 0:
        print("Loss", l, "\n")

1 Answer


An easy way to accomplish that is not to use A directly in the computation, but a row-normalized version of A.

# you can keep 'A' unconstrained
A = torch.rand(1000, 70000, requires_grad=True)

then divide each row by its sum (so every row always sums to 1):

for epoch in range(500):
    y_pred = forward(X)
    B = A / A.sum(-1, keepdim=True)  # normalize rows manually
    y = model(torch.mm(B, X))
    l = loss(y, y_pred)
    ...

So now, at each step, B is the constrained matrix, i.e. the quantity of interest. However, the optimization is still performed on the (unconstrained) A.
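Put together, a minimal sketch of the full training loop with this trick might look as follows. It assumes forward, model, loss, and X are defined as in the question; the optimizer (Adam here, just as an example) is built over the unconstrained parameter A. Note that plain row normalization does not enforce non-negativity; that is handled in the edit below.

import torch

# unconstrained parameter that gradient descent actually updates
A = torch.rand(1000, 70000, requires_grad=True)
optimizer = torch.optim.Adam([A], lr=1e-2)   # example optimizer choice

for epoch in range(500):
    y_pred = forward(X)
    B = A / A.sum(-1, keepdim=True)          # rows of B sum to 1
    y = model(torch.mm(B, X))
    l = loss(y, y_pred)

    optimizer.zero_grad()
    l.backward()                             # gradients flow through B back to A
    optimizer.step()

    if epoch % 2 == 0:
        print("Loss", l.item())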


Edit: @Umang Gupta reminded me in the comment section that the OP wants the rows to lie in the probability simplex, which imposes another constraint, i.e. A >= 0.

To accomplish that, you can simply apply an appropriate element-wise function with positive outputs (e.g. torch.exp, torch.sigmoid) to A in each iteration:

A_ = torch.exp(A)
B = A_ / A_.sum(-1, keepdim=True)  # normalize rows

The exact choice of function depends on the training dynamics and needs to be experimented with.
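As a side note, torch.exp followed by row normalization is mathematically the same as a softmax over each row, so the two lines above can also be written with the built-in, which is also more numerically stable:

B = torch.softmax(A, dim=-1)  # rows of B are non-negative and sum to 1
# equivalent to: A_ = torch.exp(A); B = A_ / A_.sum(-1, keepdim=True)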


2 Comments

You may also want each element of A to be >0.
@UmangGupta right. I have edited the answer.
