Revisions to What is the parametrization of 'survreg' in the 'survival' R package?

added 389 characters in body

edited Jul 16, 2020 at 19:26

Calculating the hazard

The general expression for the hazard function at time $x$ is $h(x) = f_T(x)/\Pr(T > x)$, where $f_T(x)$ is the pdf of $T$ at $x$. When $T = \exp\{\mu + \alpha^\top z +\sigma W\}$, the distribution of $T$ is determined by the distribution of $W$. When $W$ has the extreme value distribution, $T$ given $z$ is Weibull, and the hazard function is the Weibull hazard $h(x|z)$ quoted from the vignette.

The form of the hazard will be different when $W$ is differently distributed.

For example, when $W$ is standard normal, then $T$ given $z$ is log-normal. So, $f_T(x) = \frac 1 {x\sigma\sqrt{2\pi}}\ \exp\left(-\frac{\left(\ln x-\mu -\alpha^\top z\right)^2}{2\sigma^2}\right)$ and $\Pr(T > x) = 1 - \Phi\left( \frac{\ln x - \mu - \alpha^\top z} \sigma \right)$, and $$h(x|z) = \dfrac{\frac 1 {x\sigma\sqrt{2\pi}}\ \exp\left(-\frac{\left(\ln x-\mu -\alpha^\top z\right)^2}{2\sigma^2}\right)}{1 - \Phi\left( \frac{\ln x - \mu - \alpha^\top z} \sigma \right)}$$

For other distributions of $W$, the form of the hazard will differ again. The Weibull choice for $T$ (equivalently, extreme value $W$) is the only one for which the model is both an AFT model and a proportional hazards model.

I'm not aware of functionality in R to automatically calculate and extract the hazard curves for all observations. So you would likely need to write up a function yourself.
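A minimal sketch of such a function, using only base R, might look like the following. The name `aft_hazard` and its arguments are hypothetical (they are not part of the survival package), and the sketch assumes a single scale parameter $\sigma$ (no strata) and handles only the Weibull and log-normal cases discussed above. It relies on the fact that, in the `survreg` parametrization, a Weibull $T$ has shape $1/\sigma$ and scale $\exp(\mu + \alpha^\top z)$.

```r
# Hypothetical helper (not part of the survival package).
# times: vector of time points x > 0
# lp:    linear predictors mu + alpha' z, one per observation
# sigma: the survreg scale parameter (assumed to be a single number)
aft_hazard <- function(times, lp, sigma, dist = c("weibull", "lognormal")) {
  dist <- match.arg(dist)
  haz_one <- function(lp_i) {
    if (dist == "weibull") {
      # h(x) = f_T(x) / Pr(T > x) with shape 1/sigma and scale exp(lp)
      dweibull(times, shape = 1 / sigma, scale = exp(lp_i)) /
        pweibull(times, shape = 1 / sigma, scale = exp(lp_i), lower.tail = FALSE)
    } else {
      # log-normal case: log T is normal with mean lp and sd sigma
      dlnorm(times, meanlog = lp_i, sdlog = sigma) /
        plnorm(times, meanlog = lp_i, sdlog = sigma, lower.tail = FALSE)
    }
  }
  # rows = observations, columns = time points
  do.call(rbind, lapply(lp, haz_one))
}

# Made-up linear predictors and scale, for illustration only
haz <- aft_hazard(times = c(100, 365, 730), lp = c(5.5, 6.0), sigma = 0.8)
dim(haz)
```

With a fitted model `fit` from `survreg`, the inputs would presumably come from `predict(fit, type = "lp")` and `fit$scale`, with `dist` chosen to match the distribution used in the fit.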

added section on calculating the hazard

edited Jul 16, 2020 at 19:15

psboonstra

This might be helpful: https://cran.r-project.org/web/packages/SurvRegCensCov/vignettes/weibull.pdf

Quoting from the first page:

Weibull accelerated failure time regression can be performed in R using the survreg function. The results are not, however, presented in a form in which the Weibull distribution is usually given. Accelerated failure time models are usually given by $\log T=Y=\mu+\alpha^\top z +\sigma W$, where $z$ are set of covariates, and $W$ has the extreme value distribution. Given transformations $\gamma = 1/\sigma$, $\lambda =\exp(-\mu/\sigma)$, $\beta =-\alpha/\sigma$, we have a Weibull model with baseline hazard of $h(x|z) = (\gamma \lambda t^{\gamma-1}) \exp(\beta^\top z)$.

So, in the AFT model as parametrized in the survreg function, larger values of $\alpha^\top z$ correspond to an increase in expected survival time (longer survival). In the Cox model as parametrized in coxph, larger values of $\beta^\top z$ correspond to an increase in the hazard (shorter survival). When the AFT error $W$ follows the extreme value distribution, so that $T$ is Weibull, the two are related by $\beta^\top z = -(\alpha^\top z)/ \sigma$.
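A short derivation may help connect the two parametrizations. This sketch assumes $W$ has the standard (minimum) extreme value distribution, so that $\Pr(W > w) = \exp(-e^{w})$; the symbol $H$ for the cumulative hazard is introduced here only for illustration:

$$\Pr(T > x \mid z) = \Pr\left(W > \frac{\ln x - \mu - \alpha^\top z}{\sigma}\right) = \exp\left(-x^{1/\sigma}\, e^{-\mu/\sigma}\, e^{-\alpha^\top z/\sigma}\right) = \exp\left(-\lambda x^{\gamma} e^{\beta^\top z}\right)$$

Hence $H(x|z) = \lambda x^{\gamma} e^{\beta^\top z}$, and differentiating with respect to $x$ gives $h(x|z) = \gamma \lambda x^{\gamma-1} \exp(\beta^\top z)$, with $\gamma = 1/\sigma$, $\lambda = \exp(-\mu/\sigma)$ and $\beta = -\alpha/\sigma$ as in the quoted transformations.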

To confirm directly that the AFT model in R uses $\alpha^\top z$, compare the linear predictors from a fitted AFT model using predict.survreg to the linear predictors calculated 'by hand':

```
library(survival)
library(tidyverse)  # provides near()

# fit AFT model to the lung data using defaults (Weibull)
lung_model_aft <- survreg(Surv(time, status) ~ age + sex + factor(ph.ecog), lung)

# linear predictors from predict() versus design matrix times coefficients
all(near(predict(lung_model_aft, type = 'lp'),
         model.matrix(lung_model_aft) %*% lung_model_aft$coefficients))
# TRUE
```

We can also compare with the coefficients of a fitted Cox model. We should expect them to be similar, not identical, since the Cox model estimates the baseline hazard nonparametrically whereas the Weibull AFT model estimates it parametrically:

```
# Cox fit with the same formula (assumed; the original post did not show this call)
lung_model_cox <- coxph(Surv(time, status) ~ age + sex + factor(ph.ecog), lung)

# AFT coefficients converted to the proportional hazards scale: -alpha / sigma
-coef(lung_model_aft)[-1] / lung_model_aft$scale
#             age              sex factor(ph.ecog)1
#     0.009938741     -0.541893431      0.397749179
#factor(ph.ecog)2 factor(ph.ecog)3
#     0.906020113      1.857645211

coef(lung_model_cox)
#             age              sex factor(ph.ecog)1
#      0.01079468      -0.54583052       0.41004801
#factor(ph.ecog)2 factor(ph.ecog)3
#      0.90330301       1.95454338
```
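The full set of transformations quoted from the vignette, not just $\beta = -\alpha/\sigma$, can be sketched in base R as below. The helper name `weibull_ph_params` and the numbers are made up for illustration; the sketch assumes the coefficient vector has the intercept $\mu$ first (as `coef()` returns for a `survreg` fit) and a single scale $\sigma$. It then checks the resulting proportional hazards form against the hazard computed directly from R's Weibull density and survival functions.

```r
# Hypothetical helper (not part of the survival package): map survreg-style
# Weibull AFT estimates to the proportional hazards form
#   h(x | z) = gamma * lambda * x^(gamma - 1) * exp(beta' z).
# Assumes `coefs` has the intercept (mu) first and `scale` is a single sigma.
weibull_ph_params <- function(coefs, scale) {
  list(
    gamma  = 1 / scale,
    lambda = exp(-coefs[[1]] / scale),
    beta   = -coefs[-1] / scale
  )
}

# Made-up estimates, for illustration only
coefs <- c("(Intercept)" = 6, age = -0.01, sex = 0.4)
sigma <- 0.75
ph <- weibull_ph_params(coefs, scale = sigma)

# Compare the two forms of the hazard at one covariate value
z  <- c(age = 60, sex = 1)
x  <- c(50, 200, 800)
lp <- coefs[[1]] + sum(coefs[-1] * z)   # mu + alpha' z

h_ph  <- ph$gamma * ph$lambda * x^(ph$gamma - 1) * exp(sum(ph$beta * z))
h_aft <- dweibull(x, shape = 1 / sigma, scale = exp(lp)) /
  pweibull(x, shape = 1 / sigma, scale = exp(lp), lower.tail = FALSE)

all.equal(h_ph, h_aft)
```

For a fitted model, the inputs would presumably be `coef(lung_model_aft)` and `lung_model_aft$scale`.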

cleaned up language regarding AFT assumptions; need the tidyverse

edited Jul 16, 2020 at 16:45

psboonstra

answered Jul 16, 2020 at 16:39

psboonstra
