7 Survival outcomes
- Perform causal mediation analysis with survival outcomes using R software
7.1 Time-to-event outcomes
There are many studies conducting mediation analyses with time-to-event outcomes. Survival analysis allows investigators to study these important outcomes with appropriate consideration for variable follow-up times, censoring, and competing risks.
Take the red meat example for instance, we previously demonstrated higher red meat intake increased blood glucose. However, we might be even more interested in whether red meat intake increase the onset of type 2 diabetes (T2D). If we have the diagnosis date or duration of T2D, we can conduct survival analyses on red meat and T2D onset.
Another example is the effect of risk factor on patient survival time. A study investigated the effect of socioeconomic status (SES) on the survival time of cancer patients and how much cancer stage mediated the effect. They found the effect of SES on survival time was partially mediated by stage diagnosis, explaining 12% for lung cancer.
The Cox proportional hazards model is commonly used for dealing with survival data in medical literature. Cox regression estimates the hazard ratios and the values are then used to determine the effect of the mediator variable between the exposure and the survival time of outcome.
7.2 Cox model for common outcomes
Could we use the traditional approach for time-to-event outcomes? We have introduced the difference and product methods for continuous and binary outcomes in previous session. It is tempting to run a linear regression model for the mediator and proportional hazard model for the outcome, then use product or difference method to estimate the direct effect and indirect effect.
However, ‘non-collaspsibility’ is a problem of the hazard ratio as odds ratio (1). Therefore, use of Cox regression to approximately estimate indirect effects via difference or product of coefficients rests on the assumption that the outcome is rare (1).
Where the outcome is common, measures of the indirect effect or proportion mediated will be incorrect. Tein and Mackinnon considered whether the product method and difference method yield comparable results with respect to time-to-event outcomes (2). They found that the methods coincides for the accelerated failure time model but not for the proportional hazards model.
When the outcome is common, we can use weight approach (Lange 2012).
To sum up, we can only use the traditional approaches for rare outcomes. Otherwise, we can use the product method to get an indication of whether there is mediation, but be aware that the estimate is not accurate.
7.2.1 Causal mediation for time-to-event outcomes
For a survival outcome, the outcome of interest will be survival time (SV).
- \(SV (t) = P(V ≥ t)\) the survival function at time t
- \(SV (t\|c)=P(V ≥ t\|c)\) the survival function conditional on covariates C
- \(λV (t)\) : the hazard at time t
- \(λV (t\|c)\): conditional hazard at time t
If we consider the survival functions for a time-to-event outcome T, we could decompose the survival function as follows:
\(ST_a(t) - ST_a*(t) = [ST_aM_a(t)-ST_aM_a*(t)] + [ST_aM_a*(t)-ST_a*M_a*(t)]\)
The first expression in brackets is the natural indirect effect on the survival function scale and the second is the natural direct effect on the survival function scale.
Similarly, we can demcompose the overall difference in hazards on the hazard scale:
\(λT_a(t) - λT_a*(t) = [λT_aM_a(t)-λT_aM_a*(t)] + [λT_aM_a*(t)-λT_a*M_a*(t)]\)
7.2.2 Assumptions of mediation analysis with a time-to-event outcome
Similar as our context, mediation analysis with a time-to-event outcome have to satisfy below assumptions:
- no unmeasured confounding of the exposure-outcome relation;
- no unmeasured confounding of the mediator-outcome relation;
- no unmeasured confounding of the exposure-mediation relation;
- no exposure induced mediation-outcome confounding
- Additionally, we assume that the mediator is measured for everyone before the outcome occurs.
7.3 Example
We will continue working on the obesity-CVD example in the Framingham dataset.
The outcome of interest is death from cardiovascular diseases (CVD). The underlying time scale is in days, starting at participants entered the cohort. The exposure of interest is obesity status at baseline, where a=1 indicates obese, a=0 indicates non-obese. The mediator is blood pressure, the counterfactual mediator was M(a), where M(a) is the mediator when the exposure is obesity (1) and M(0) is the mediator values when the exposure is not-obese.
The question of interest here is whether the blood pressure mediates the impact of obesity on CVD-related death (measured in years).
library(tidyverse)
framingham <- read_csv(here::here("data/framingham_dataset.csv"))
framingham <- framingham %>%
select(
id = randid,
bmi,
m = sysbp,
y = cvd,
w1 = sex,
w2 = cursmoke,
w3 = age,
y_time = timecvd
) %>%
na.omit() %>%
mutate(
a = case_when(
bmi >= 25 ~ 1,
TRUE ~ 0
),
y_time = y_time / 365.25 #change time-scale to years
)
The new thing here is that we need to calculate the time to event. This is already done in the dataset.
We need the outcome to be the time to event and specify event as the outcome variable. Then we use the package as before.
library(CMAverse)
library(survival)
res_rb_coxph <- cmest(
data = framingham, model = "rb", outcome = "y_time", event = "y", exposure = "a",
mediator = "m", basec = c("w1", "w2", "w3"), EMint = TRUE,
mreg = list("linear"), yreg = "coxph",
astar = 0, a = 1, mval = list(120),
estimation = "imputation", inference = "bootstrap"
)
#>
|
| | 0%
|
|= | 1%
|
|= | 2%
|
|== | 2%
|
|== | 3%
|
|== | 4%
|
|=== | 4%
|
|==== | 5%
|
|==== | 6%
|
|===== | 6%
|
|===== | 7%
|
|===== | 8%
|
|====== | 8%
|
|====== | 9%
|
|======= | 10%
|
|======== | 11%
|
|======== | 12%
|
|========= | 12%
|
|========= | 13%
|
|========= | 14%
|
|========== | 14%
|
|========== | 15%
|
|=========== | 16%
|
|============ | 16%
|
|============ | 17%
|
|============ | 18%
|
|============= | 18%
|
|============= | 19%
|
|============== | 20%
|
|=============== | 21%
|
|=============== | 22%
|
|================ | 22%
|
|================ | 23%
|
|================ | 24%
|
|================= | 24%
|
|================== | 25%
|
|================== | 26%
|
|=================== | 26%
|
|=================== | 27%
|
|=================== | 28%
|
|==================== | 28%
|
|==================== | 29%
|
|===================== | 30%
|
|====================== | 31%
|
|====================== | 32%
|
|======================= | 32%
|
|======================= | 33%
|
|======================= | 34%
|
|======================== | 34%
|
|======================== | 35%
|
|========================= | 36%
|
|========================== | 36%
|
|========================== | 37%
|
|========================== | 38%
|
|=========================== | 38%
|
|=========================== | 39%
|
|============================ | 40%
|
|============================= | 41%
|
|============================= | 42%
|
|============================== | 42%
|
|============================== | 43%
|
|============================== | 44%
|
|=============================== | 44%
|
|================================ | 45%
|
|================================ | 46%
|
|================================= | 46%
|
|================================= | 47%
|
|================================= | 48%
|
|================================== | 48%
|
|================================== | 49%
|
|=================================== | 50%
|
|==================================== | 51%
|
|==================================== | 52%
|
|===================================== | 52%
|
|===================================== | 53%
|
|===================================== | 54%
|
|====================================== | 54%
|
|====================================== | 55%
|
|======================================= | 56%
|
|======================================== | 56%
|
|======================================== | 57%
|
|======================================== | 58%
|
|========================================= | 58%
|
|========================================= | 59%
|
|========================================== | 60%
|
|=========================================== | 61%
|
|=========================================== | 62%
|
|============================================ | 62%
|
|============================================ | 63%
|
|============================================ | 64%
|
|============================================= | 64%
|
|============================================== | 65%
|
|============================================== | 66%
|
|=============================================== | 66%
|
|=============================================== | 67%
|
|=============================================== | 68%
|
|================================================ | 68%
|
|================================================ | 69%
|
|================================================= | 70%
|
|================================================== | 71%
|
|================================================== | 72%
|
|=================================================== | 72%
|
|=================================================== | 73%
|
|=================================================== | 74%
|
|==================================================== | 74%
|
|==================================================== | 75%
|
|===================================================== | 76%
|
|====================================================== | 76%
|
|====================================================== | 77%
|
|====================================================== | 78%
|
|======================================================= | 78%
|
|======================================================= | 79%
|
|======================================================== | 80%
|
|========================================================= | 81%
|
|========================================================= | 82%
|
|========================================================== | 82%
|
|========================================================== | 83%
|
|========================================================== | 84%
|
|=========================================================== | 84%
|
|============================================================ | 85%
|
|============================================================ | 86%
|
|============================================================= | 86%
|
|============================================================= | 87%
|
|============================================================= | 88%
|
|============================================================== | 88%
|
|============================================================== | 89%
|
|=============================================================== | 90%
|
|================================================================ | 91%
|
|================================================================ | 92%
|
|================================================================= | 92%
|
|================================================================= | 93%
|
|================================================================= | 94%
|
|================================================================== | 94%
|
|================================================================== | 95%
|
|=================================================================== | 96%
|
|==================================================================== | 96%
|
|==================================================================== | 97%
|
|==================================================================== | 98%
|
|===================================================================== | 98%
|
|===================================================================== | 99%
|
|======================================================================| 100%
#> Causal Mediation Analysis
#>
#> # Outcome regression:
#> Call:
#> survival::coxph(formula = Surv(y_time, y) ~ a + m + a * m + w1 +
#> w2 + w3, data = getCall(x$reg.output$yreg)$data, weights = getCall(x$reg.output$yreg)$weights)
#>
#> n= 11575, number of events= 2879
#>
#> coef exp(coef) se(coef) z Pr(>|z|)
#> a 0.561695 1.753642 0.222973 2.519 0.0118 *
#> m 0.018012 1.018175 0.001243 14.491 < 2e-16 ***
#> w1 -0.785559 0.455865 0.039143 -20.069 < 2e-16 ***
#> w2 0.323100 1.381404 0.039835 8.111 5.02e-16 ***
#> w3 0.041113 1.041970 0.002182 18.843 < 2e-16 ***
#> a:m -0.002688 0.997316 0.001526 -1.761 0.0782 .
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> exp(coef) exp(-coef) lower .95 upper .95
#> a 1.7536 0.5702 1.1328 2.7148
#> m 1.0182 0.9821 1.0157 1.0207
#> w1 0.4559 2.1936 0.4222 0.4922
#> w2 1.3814 0.7239 1.2777 1.4936
#> w3 1.0420 0.9597 1.0375 1.0464
#> a:m 0.9973 1.0027 0.9943 1.0003
#>
#> Concordance= 0.707 (se = 0.005 )
#> Likelihood ratio test= 1532 on 6 df, p=<2e-16
#> Wald test = 1503 on 6 df, p=<2e-16
#> Score (logrank) test = 1563 on 6 df, p=<2e-16
#>
#>
#> # Mediator regressions:
#>
#> Call:
#> glm(formula = m ~ a + w1 + w2 + w3, family = gaussian(), data = getCall(x$reg.output$mreg[[1L]])$data,
#> weights = getCall(x$reg.output$mreg[[1L]])$weights)
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 78.58792 1.40219 56.047 < 2e-16 ***
#> a 8.62088 0.39394 21.884 < 2e-16 ***
#> w1 2.97594 0.39564 7.522 5.8e-14 ***
#> w2 -0.51009 0.40731 -1.252 0.21
#> w3 0.88577 0.02068 42.840 < 2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for gaussian family taken to be 421.4903)
#>
#> Null deviance: 6003812 on 11574 degrees of freedom
#> Residual deviance: 4876643 on 11570 degrees of freedom
#> AIC: 102812
#>
#> Number of Fisher Scoring iterations: 2
#>
#>
#> # Effect decomposition on the hazard ratio scale via the regression-based approach
#>
#> Direct counterfactual imputation estimation with
#> bootstrap standard errors, percentile confidence intervals and p-values
#>
#> Estimate Std.error 95% CIL 95% CIU P.val
#> Rcde 1.270225 0.073876 1.122775 1.399 <2e-16 ***
#> Rpnde 1.196575 0.048878 1.093248 1.289 <2e-16 ***
#> Rtnde 1.169171 0.049772 1.058512 1.264 <2e-16 ***
#> Rpnie 1.167981 0.016451 1.139200 1.201 <2e-16 ***
#> Rtnie 1.141232 0.012236 1.119934 1.165 <2e-16 ***
#> Rte 1.365570 0.056912 1.254930 1.465 <2e-16 ***
#> ERcde 0.193799 0.049864 0.094114 0.278 <2e-16 ***
#> ERintref 0.002776 0.031676 -0.061817 0.066 0.87
#> ERintmed 0.001013 0.019462 -0.039025 0.039 0.87
#> ERpnie 0.167981 0.016451 0.139200 0.201 <2e-16 ***
#> ERcde(prop) 0.530129 0.105676 0.296793 0.719 <2e-16 ***
#> ERintref(prop) 0.007594 0.090467 -0.206858 0.176 0.87
#> ERintmed(prop) 0.002772 0.055793 -0.131490 0.105 0.87
#> ERpnie(prop) 0.459505 0.089013 0.337848 0.728 <2e-16 ***
#> pm 0.462277 0.064126 0.374533 0.615 <2e-16 ***
#> int 0.010366 0.146214 -0.338348 0.283 0.87
#> pe 0.469871 0.105676 0.281092 0.703 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Rcde: controlled direct effect hazard ratio; Rpnde: pure natural direct effect hazard ratio; Rtnde: total natural direct effect hazard ratio; Rpnie: pure natural indirect effect hazard ratio; Rtnie: total natural indirect effect hazard ratio; Rte: total effect hazard ratio; ERcde: excess relative hazard due to controlled direct effect; ERintref: excess relative hazard due to reference interaction; ERintmed: excess relative hazard due to mediated interaction; ERpnie: excess relative hazard due to pure natural indirect effect; ERcde(prop): proportion ERcde; ERintref(prop): proportion ERintref; ERintmed(prop): proportion ERintmed; ERpnie(prop): proportion ERpnie; pm: overall proportion mediated; int: overall proportion attributable to interaction; pe: overall proportion eliminated)
#>
#> Relevant variable values:
#> $a
#> [1] 1
#>
#> $astar
#> [1] 0
#>
#> $mval
#> $mval[[1]]
#> [1] 120