TL;DR
It's not clear that the coefficients $b_i$ and $\theta_i$ have the same meanings in the "multi-task logistic regression" (MTLR) formula for $P(\vec y|\vec x)$ as they do in the formula for the probability $P(T \ge t_i|\vec x)$. The way MTLR is parameterized, they don't have to. MTLR is set up similarly to how state probabilities are presented in statistical mechanics: each state gets an unnormalized weight, and its probability is that weight divided by the partition function, the sum of the weights over all states.
MTLR is an attempt to re-invent discrete-time survival models with a focus on the survival function over time, $S(t)$. That leads to some awkwardness that isn't present in standard discrete-time models that focus on hazards. It's not clear that MTLR does anything that a standard discrete-time model can't do.
Standard discrete-time survival
Principles and methods of discrete-time survival analysis are covered in detail in, for example, Tutz and Schmid, Modeling Discrete Time-to-Event Data (Springer, 2016). There are also several pages about discrete-time survival models on this site, for example here and here.
A standard discrete-time survival model, as described by Tutz and Schmid, is a binomial regression that evaluates the discrete-time hazard as a function of covariate values. The discrete-time hazard is the probability of having an event during one time interval, given survival until the start of that interval. The model can be thought of as a sequence of binomial regressions over the time intervals, each evaluating only the individuals who are still at risk for an event during that interval. That ultimately provides a cumulative event probability over time, often written as $F(t)$, as a function of covariate values. The survival function is then simply the complement of the cumulative event probability, $S(t)=1-F(t)$; equivalently, with $h_j$ the hazard in interval $j$, $S(t_i)=\prod_{j \le i}(1-h_j)$.
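As a rough illustration of that last step, here is a minimal sketch of how interval-specific hazards turn into $S(t)$ and $F(t)$. The function name and the hazard values are made up for illustration; in practice the hazards would be the fitted probabilities from the binomial regression for a given set of covariate values.

```python
def survival_from_hazards(hazards):
    """Return S(t_i) at the end of each interval from discrete-time hazards.

    hazards[j] is the probability of an event in interval j, given
    survival to the start of interval j.
    """
    surv = []
    s = 1.0
    for h in hazards:
        s *= (1.0 - h)  # survive this interval, given survival so far
        surv.append(s)
    return surv


# Hypothetical hazards for three intervals (illustrative values only).
hazards = [0.1, 0.2, 0.5]
surv = survival_from_hazards(hazards)
cum_event = [1.0 - s for s in surv]  # F(t) = 1 - S(t)
```

The point is only that the hazard is the modeled quantity, and the survival curve falls out of it by multiplication.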
Focusing on hazards, starting from time 0, simplifies the handling of two common situations in survival analysis: individuals who have events, and individuals who are lost to follow-up and thus have right-censored event times. Those individuals are simply omitted from the analysis of time intervals during which they were no longer at risk.
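One common way to set that up is a "person-period" data layout, with one row per individual per interval at risk. The sketch below is only an illustration under assumptions of my own: the function name and the `(id, last_interval, event)` record format are hypothetical, and a censored individual is treated as at risk, with no event, through the last interval recorded (conventions for the interval in which censoring occurs vary).

```python
def person_period(records, n_intervals):
    """Expand (id, last_interval, event) records into person-period rows.

    last_interval is the 1-based index of the last interval observed;
    event is 1 if the event occurred in that interval, 0 if censored.
    Returns (id, interval, y) rows, with y = 1 only in the event interval.
    """
    rows = []
    for pid, last, event in records:
        # Rows stop after the last observed interval: the individual
        # contributes nothing to intervals in which they are not at risk.
        for j in range(1, min(last, n_intervals) + 1):
            y = 1 if (event == 1 and j == last) else 0
            rows.append((pid, j, y))
    return rows


# Hypothetical data: "a" has an event in interval 2, "b" is censored
# after interval 3, "c" has an event in interval 1.
records = [("a", 2, 1), ("b", 3, 0), ("c", 1, 1)]
rows = person_period(records, n_intervals=3)
```

A binomial regression of `y` on interval and covariates over these rows is then the discrete-time hazard model; nothing special is needed for the censored individual.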
MTLR
MTLR attempts to model the survival function $S(t)=1-F(t)$ directly instead of evaluating the hazard for each interval to first get $F(t)$ and then $S(t)$. It's not clear what advantage that provides, and it leads to several difficulties.
The authors note the following problem arising from their focus on $S(t)$ instead of on hazards, and the way they chose to deal with it:
a death event at or before time $t_i$ implies death at all subsequent time points $t_j$ for all $j > i$. MTLR enforces the dependency of the outputs by predicting the survival status of a patient at each of the time snapshots $t_i$ jointly instead of independently.
To try to estimate the entire survival function, MTLR thus evaluates the set of all possible sequences of alive/dead 0/1 indicators for an individual, encoded in the outcome vector $\vec y$. As events are terminal, all its elements for an individual equal 1 at and after the time of the event, with 0 values prior to that. If there are $m$ time intervals, then there are $m+1$ possible $\vec y$ sequences (one with all 0 values, and $m$ for events at each of the evaluation times).
The authors parameterize a score for each of those $m+1$ sequences, given by the numerator of the formula for $P(\vec y|\vec x)$. Consider the values of the numerator for the 3 possible sequences with $m=2$:
(0,0): $\exp(0)=1$;
(0,1): $\exp(\vec\theta_2\cdot \vec x+b_2)$;
(1,1): $\exp(\vec\theta_2\cdot \vec x+b_2 + \vec\theta_1\cdot \vec x+b_1)$.
The denominator of the formula for $P(\vec y|\vec x)$ with $m=2$ is just the sum of those 3 scores. As a result, you can think of MTLR as parameterizing the probability of each of those sequences in this way.
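To make the normalization concrete, here is a minimal sketch of that calculation for general $m$, using the encoding described above (elements of $\vec y$ equal 1 at and after the event time). The function names and the numerical values are my own, chosen for illustration; this is not the authors' code, and it ignores the penalization and fitting entirely.

```python
import math


def mtlr_sequence_probs(x, thetas, bs):
    """Probabilities of the m+1 allowed sequences under the MTLR form.

    thetas[j] and bs[j] are the coefficient vector and threshold for
    time point j (0-based). Entry k of the result is the probability of
    the sequence with y_j = 1 for all j >= k; k = m is the all-zeros
    sequence (no event by the last time point).
    """
    m = len(bs)
    linear = [sum(t * xi for t, xi in zip(theta, x)) + b
              for theta, b in zip(thetas, bs)]
    # Log of the numerator: sum of the linear terms where y_j = 1.
    log_scores = [sum(linear[k:]) for k in range(m + 1)]
    mx = max(log_scores)  # subtract the max for numerical stability
    weights = [math.exp(s - mx) for s in log_scores]
    z = sum(weights)  # the "partition function", rescaled by exp(-mx)
    return [w / z for w in weights]


def survival_from_probs(probs):
    """P(still alive at time point i) = sum over sequences with y_i = 0."""
    m = len(probs) - 1
    return [sum(probs[i + 1:]) for i in range(m)]


# Hypothetical values with m = 2 and a single covariate.
x = [1.0]
thetas = [[0.0], [0.0]]
bs = [0.0, math.log(2.0)]
probs = mtlr_sequence_probs(x, thetas, bs)  # order: (1,1), (0,1), (0,0)
surv = survival_from_probs(probs)
```

Written this way, it is just a softmax over $m+1$ sequences whose log-scores are nested sums of the same linear terms.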
The interpretations of the coefficients don't really matter; I'm not sure whether or how they relate to standard logistic regression coefficients. The authors call MTLR a "generalization of the logistic regression model." The solution will find parameter values that maximize the likelihood of the data under this parameterization (subject to the penalization constraints for smoothing described in the paper), whether or not the parameterization makes any sense.
Potential problems with MTLR
First, it includes an individual with an early event in the calculations for the coefficients associated with all later events. That somehow seems wrong (although it might be similar to how Fine-Gray models handle competing risks after the first type of event).
Second, MTLR needs to go through an additional expectation maximization or gradient descent step to deal with individuals having right-censored event times. Those individuals are much more readily handled in standard discrete-time survival analysis: just ignore them after they no longer have data to provide.
Third, although the magnitudes and time courses of the regression coefficient vectors $\vec \theta_i$ are smoothed by penalization, I don't see that the "thresholds" $b_i$ are smoothed/penalized at all. I suspect that can lead to overfitting.
Fourth, if you allow for time-varying covariate values, what do you choose for the values of an individual after death?
Is MTLR an advance?
I think that the claims made by the authors for the advantages of MTLR are overstated. With time-varying coefficients, MTLR can handle strange shapes of survival curves, survival curves that cross in time, etc. They only compared it, however, against methods that by design cannot do that: standard Cox proportional hazards and Aalen additive hazards models. Standard survival models with time-varying coefficients, as Tutz and Schmid describe for standard discrete-time models in Section 5.3, "Time-Varying Coefficients," or Cox models with time-varying coefficients, can also handle a wider variety of survival curve shapes.
Another alleged advantage of MTLR, providing individual-specific survival curves, is also available from other survival analysis methods once the covariate values are specified and the baseline hazard that the covariates' regression coefficients alter has been calculated. Several standard methods accommodate time-varying covariate values; it's not clear how MTLR does that properly for an individual who has an early event.
Peer review questions
First, I don't see that the paper has been seriously peer reviewed by experts in survival analysis. It was included in peer-reviewed conference proceedings, Advances in Neural Information Processing Systems 24 (NIPS 2011), Edited by: J. Shawe-Taylor, R. Zemel, P. Bartlett, F. Pereira and K.Q. Weinberger; ISBN: 9781618395993. Given the conference's focus on machine learning, I wonder whether peer review for this paper at NIPS in 2011 would have included a reviewer with substantive expertise in survival analysis per se.
Second, the organizers of the NIPS conference bravely did a study involving re-review of 10% of the papers submitted for the 2014 meeting. See Wikipedia, and this page that notes:
between half and two-thirds of papers accepted at NIPS would have been rejected if reviewed a second time.
That's certainly not a problem specific to the NIPS meeting, but it does raise questions about how thoroughly MTLR was vetted in 2011. I have been unable to find any subsequent peer-reviewed publications documenting the MTLR method itself, and all of the few dozen references to it that I could find were to the 2011 NIPS proceedings.