11  Types of path-specific causal mediation effects

11.1 Controlled direct effects

$$\psi_{\text{CDE}} = E(Y_{1,m} - Y_{0,m})$$

  • Set the mediator to a reference value M=m uniformly for everyone in the population
  • Compare A=1 vs A=0 with M=m fixed

11.1.1 Identification assumptions:

  • Confounder assumptions:
    • $A \perp Y_{a,m} \mid W$
    • $M \perp Y_{a,m} \mid W, A$
  • Positivity assumptions:
    • $P(M=m \mid A=a, W) > 0$ a.e.
    • $P(A=a \mid W) > 0$ a.e.

Under the above identification assumptions, the controlled direct effect can be identified:

$$E(Y_{1,m} - Y_{0,m}) = E\{E(Y \mid A=1, M=m, W) - E(Y \mid A=0, M=m, W)\}$$

  • For intuition about this formula in R, let’s continue with a toy example:

    n <- 1e6
    w <- rnorm(n)
    a <- rbinom(n, 1, 0.5)
    m <- rnorm(n, w + a)
    y <- rnorm(n, w + a + m)
  • First we fit a correct model for the outcome

    lm_y <- lm(y ~ m + a + w)
  • Assume we would like the CDE at m=0

  • Then we generate predictions $E(Y \mid A=1, M=m, W)$ and $E(Y \mid A=0, M=m, W)$:

    pred_y1 <- predict(lm_y, newdata = data.frame(a = 1, m = 0, w = w))
    pred_y0 <- predict(lm_y, newdata = data.frame(a = 0, m = 0, w = w))
  • Then we compute the difference between the predicted values, $E(Y \mid A=1, M=m, W) - E(Y \mid A=0, M=m, W)$, and average across values of $W$

    ## CDE at m = 0
    mean(pred_y1 - pred_y0)
    #> [1] 1.003848
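
As a quick check on the steps above, the following sketch wraps them in a helper and evaluates the CDE at several mediator values. The helper name `cde_at`, the seed, and the smaller sample size are our own choices for illustration. Because the toy outcome model has no interaction between A and M, the CDE should be close to 1 at every m; with an interaction it would vary with m.

```r
set.seed(1)
n <- 1e5
w <- rnorm(n)
a <- rbinom(n, 1, 0.5)
m <- rnorm(n, w + a)
y <- rnorm(n, w + a + m)

# Correctly specified outcome regression, as in the toy example
lm_y <- lm(y ~ m + a + w)

# Hypothetical helper: plug-in estimate of the CDE at a fixed mediator value
cde_at <- function(m_val) {
  pred_y1 <- predict(lm_y, newdata = data.frame(a = 1, m = m_val, w = w))
  pred_y0 <- predict(lm_y, newdata = data.frame(a = 0, m = m_val, w = w))
  mean(pred_y1 - pred_y0)
}

# CDE at m = -1, 0, 1; all should be close to the true value of 1
cde_values <- sapply(c(-1, 0, 1), cde_at)
cde_values
```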

11.1.2 Is this the estimand I want?

  • Makes the most sense if we can intervene directly on $M$
    • And can think of a policy that would set everyone to a constant level $m \in \mathcal{M}$.
    • Judea Pearl calls this prescriptive.
    • Can you think of an example? (Air pollution, rescue inhaler dosage, hospital visits…)
    • Does not provide a decomposition of the average treatment effect into direct and indirect effects.

What if our research question doesn’t involve intervening directly on the mediator?

What if we want to decompose the average treatment effect into its direct and indirect counterparts?

11.2 Natural direct and indirect effects

Still using the same DAG as above,

  • Recall the definition of the nested counterfactual:

$$Y_{1,M_0} = f_Y(W, 1, M_0, U_Y)$$

  • Interpreted as the outcome for an individual in a hypothetical world where treatment was given but the mediator was held at the value it would have taken under no treatment

  • Recall that, because of the definition of counterfactuals

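$$Y_{1,M_1} = Y_1$$

Then we can decompose the average treatment effect $E(Y_1 - Y_0)$ as follows

$$E[Y_{1,M_1} - Y_{0,M_0}] = \underbrace{E[Y_{1,M_1} - Y_{1,M_0}]}_{\text{natural indirect effect}} + \underbrace{E[Y_{1,M_0} - Y_{0,M_0}]}_{\text{natural direct effect}}$$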
  • Natural direct effect (NDE): Varying treatment while keeping the mediator fixed at the value it would have taken under no treatment
  • Natural indirect effect (NIE): Varying the mediator from the value it would have taken under treatment to the value it would have taken under control, while keeping treatment fixed
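
To make the decomposition concrete, the sketch below simulates the nested counterfactuals directly from structural equations. It assumes the same linear toy model used in this chapter (M = W + A + U_M, Y = W + A + M + U_Y). The function names `f_m` and `f_y` and the seed are ours. This is only possible in a simulation, where the exogenous errors are known; with real data the counterfactuals are never jointly observed.

```r
set.seed(2)
n <- 1e5
w <- rnorm(n)
u_m <- rnorm(n)  # exogenous error of the mediator
u_y <- rnorm(n)  # exogenous error of the outcome

# Assumed structural equations of the toy model
f_m <- function(a) w + a + u_m
f_y <- function(a, m) w + a + m + u_y

# Counterfactual mediators and nested counterfactual outcomes
m1 <- f_m(1)
m0 <- f_m(0)
y_1_m1 <- f_y(1, m1)
y_1_m0 <- f_y(1, m0)
y_0_m0 <- f_y(0, m0)

ate <- mean(y_1_m1 - y_0_m0)
nie <- mean(y_1_m1 - y_1_m0)
nde <- mean(y_1_m0 - y_0_m0)
c(ate = ate, nie = nie, nde = nde)
```

In this model each effect equals 1, so the ATE of 2 splits evenly between the direct and indirect paths.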

11.2.1 Identification assumptions:

  • $A \perp Y_{a,m} \mid W$
  • $M \perp Y_{a,m} \mid W, A$
  • $A \perp M_a \mid W$
  • $M_0 \perp Y_{1,m} \mid W$
  • and positivity assumptions

11.2.2 Cross-world independence assumption

What does $M_0 \perp Y_{1,m} \mid W$ mean?

  • Conditional on $W$, knowledge of the mediator value in the absence of treatment, $M_0$, provides no information about the outcome under treatment, $Y_{1,m}$.
  • Can you think of a data-generating mechanism that would violate this assumption?
  • Example: in a randomized study, whenever we believe that treatment assignment works through adherence (i.e., almost always), we are violating this assumption (more on this later).
  • Cross-world assumptions are problematic for other reasons, including:
    • You can never design a randomized study where the assumption holds by design.

If the cross-world assumption holds, we can write the NDE as a weighted average of controlled direct effects at each level of $M=m$:

$$E\left[\sum_m \{E(Y_{1,m} \mid W) - E(Y_{0,m} \mid W)\} P(M_0 = m \mid W)\right]$$

  • If CDE(m) is constant across m, then CDE = NDE.
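
The weighted-average representation can be sketched in R. The example below departs from the toy model: it assumes a binary mediator and an A-by-M interaction, both introduced here only for illustration, so that CDE(m) differs across m. Under this hypothetical model CDE(0) = 1, CDE(1) = 2, and the NDE is 1.5. The helper name `cde_w` is ours.

```r
set.seed(3)
n <- 1e5
w <- rnorm(n)
a <- rbinom(n, 1, 0.5)
m <- rbinom(n, 1, plogis(w + a))     # binary mediator (assumption)
y <- rnorm(n, w + a + m + a * m)     # A-by-M interaction (assumption)

lm_y <- lm(y ~ a * m + w)
glm_m <- glm(m ~ a + w, family = binomial())

# Hypothetical helper: CDE(m, W) from the outcome regression
cde_w <- function(m_val) {
  predict(lm_y, newdata = data.frame(a = 1, m = m_val, w = w)) -
    predict(lm_y, newdata = data.frame(a = 0, m = m_val, w = w))
}

# P(M_0 = 1 | W), identified here by P(M = 1 | A = 0, W)
p_m0 <- predict(glm_m, newdata = data.frame(a = 0, w = w), type = "response")

# NDE as the average over W of CDEs weighted by the distribution of M_0
nde <- mean(cde_w(0) * (1 - p_m0) + cde_w(1) * p_m0)
c(cde_0 = mean(cde_w(0)), cde_1 = mean(cde_w(1)), nde = nde)
```

Because CDE(m) is not constant here, the NDE falls between CDE(0) and CDE(1) rather than matching either one.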

11.2.3 Identification formula:

  • Under the above identification assumptions, the natural direct effect can be identified:

$$E(Y_{1,M_0} - Y_{0,M_0}) = E[E\{E(Y \mid A=1, M, W) - E(Y \mid A=0, M, W) \mid A=0, W\}]$$

  • The natural indirect effect can be identified similarly.

  • Let’s dissect this formula in R:

    n <- 1e6
    w <- rnorm(n)
    a <- rbinom(n, 1, 0.5)
    m <- rnorm(n, w + a)
    y <- rnorm(n, w + a + m)
  • First we fit a correct model for the outcome

    lm_y <- lm(y ~ m + a + w)
  • Then we generate predictions $E(Y \mid A=1, M, W)$ and $E(Y \mid A=0, M, W)$ with $A$ fixed but letting $M$ and $W$ take their observed values

    pred_y1 <- predict(lm_y, newdata = data.frame(a = 1, m = m, w = w))
    pred_y0 <- predict(lm_y, newdata = data.frame(a = 0, m = m, w = w))
  • Then we compute the difference between the predicted values, $E(Y \mid A=1, M, W) - E(Y \mid A=0, M, W)$,

  • and use this difference as a pseudo-outcome in a regression on $A$ and $W$: $E\{E(Y \mid A=1, M, W) - E(Y \mid A=0, M, W) \mid A=0, W\}$

    pseudo <- pred_y1 - pred_y0
    lm_pseudo <- lm(pseudo ~ a + w)
  • Now we predict the value of this pseudo-outcome under A=0, and average the result

    pred_pseudo <- predict(lm_pseudo, newdata = data.frame(a = 0, w = w))
    ## NDE:
    mean(pred_pseudo)
    #> [1] 0.9943349
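
The natural indirect effect can be computed with the same two-step regression idea. The sketch below assumes the identification formula $E[E\{E(Y \mid A=1, M, W) \mid A=1, W\} - E\{E(Y \mid A=1, M, W) \mid A=0, W\}]$ for $E(Y_{1,M_1} - Y_{1,M_0})$, and reuses the toy data-generating process with a seed and sample size of our choosing. The true NIE in this model is 1.

```r
set.seed(4)
n <- 1e5
w <- rnorm(n)
a <- rbinom(n, 1, 0.5)
m <- rnorm(n, w + a)
y <- rnorm(n, w + a + m)

# Outcome regression and predictions with A fixed at 1, M and W as observed
lm_y <- lm(y ~ m + a + w)
pred_y1 <- predict(lm_y, newdata = data.frame(a = 1, m = m, w = w))

# Regress the predictions on A and W, then contrast A = 1 vs A = 0
lm_pseudo <- lm(pred_y1 ~ a + w)
pred_a1 <- predict(lm_pseudo, newdata = data.frame(a = 1, w = w))
pred_a0 <- predict(lm_pseudo, newdata = data.frame(a = 0, w = w))

## NIE:
nie <- mean(pred_a1 - pred_a0)
nie
```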

11.2.4 Is this the estimand I want?

  • Makes sense to intervene on A but not directly on M.
  • Want to understand a natural mechanism underlying an association / total effect. J. Pearl calls this descriptive.
  • NDE + NIE = total effect (ATE).
  • Okay with the assumptions.

What if our data structure involves a post-treatment confounder of the mediator-outcome relationship (e.g., adherence)?

11.2.5 Unidentifiability of the NDE and NIE in this setting

  • In this example, natural direct and indirect effects are not generally point identified from observed data O=(W,A,Z,M,Y).

  • The reason for this is that the cross-world counterfactual assumption $Y_{1,m} \perp M_0 \mid W$ does not hold in the above directed acyclic graph.

  • To give intuition, we focus on the counterfactual outcome $Y_{A=1, M_{A=0}}$.

    • This counterfactual outcome involves two counterfactual worlds simultaneously: one in which $A=1$ for the first portion of the counterfactual outcome, and one in which $A=0$ for the nested portion of the counterfactual outcome.
    • Setting $A=1$ induces a counterfactual treatment-induced confounder, denoted $Z_{A=1}$. Setting $A=0$ induces another counterfactual treatment-induced confounder, denoted $Z_{A=0}$.
    • The two treatment-induced counterfactual confounders, $Z_{A=1}$ and $Z_{A=0}$, share unmeasured common causes, $U_Z$, which creates a spurious association.
    • Because $Z_{A=1}$ is causally related to $Y_{A=1, M=m}$, and $Z_{A=0}$ is also causally related to $M_{A=0}$, the path through $U_Z$ means that the backdoor criterion is not met for identification of $Y_{A=1, M_{A=0}}$, i.e., $M_0 \not\perp Y_{A=1,m} \mid W$, where $W$ denotes baseline covariates.
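
The role of the shared cause $U_Z$ can be illustrated with a small simulation. Everything in this sketch is a hypothetical structural model chosen for illustration (no baseline covariates, a uniform $U_Z$, and simple linear equations for the mediator and outcome). It shows that the two counterfactual confounders are correlated, and that this correlation carries over to $M_0$ and $Y_{1,m}$.

```r
set.seed(5)
n <- 1e5
u_z <- runif(n)  # unmeasured cause shared by both counterfactual worlds

# Hypothetical structural equation: Z_a = 1 if U_Z < 0.5 + 0.2 * a
z1 <- as.numeric(u_z < 0.7)
z0 <- as.numeric(u_z < 0.5)

# The two counterfactual confounders are associated through U_Z
cor_z <- cor(z1, z0)

# M_0 depends on Z_0, while Y_{1,m} (here with m fixed at 0) depends on Z_1
m0 <- rnorm(n, 0 - z0)
y_1_m <- rnorm(n, 1 + z1 + 0)

# Cross-world dependence between M_0 and Y_{1,m}
cor_cross <- cor(m0, y_1_m)
c(cor_z = cor_z, cor_cross = cor_cross)
```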

However:

  • We can actually identify the NIE/NDE in the above setting if we are willing to invoke monotonicity between a treatment and one or more binary treatment-induced confounders.
  • Assuming monotonicity is also sometimes referred to as assuming “no defiers” – in other words, assuming that there are no individuals who would do the opposite of the encouragement.
  • Monotonicity may seem like a restrictive assumption, but may be reasonable in some common scenarios (e.g., in trials where the intervention is randomized treatment assignment and the treatment-induced confounder is whether or not treatment was actually taken – in this setting, we may feel comfortable assuming that there are no “defiers”, frequently assumed when using IVs to identify causal effects)

Note: CDEs are still identified in this setting. They can be identified and estimated similarly to a longitudinal data structure with a two-time-point intervention.

11.3 Interventional (in)direct effects

  • Let $G_a$ denote a random draw from the distribution of $M_a \mid W$
  • Define the counterfactual $Y_{1,G_0}$ as the counterfactual variable in a hypothetical world where $A$ is set to $A=1$ and $M$ is set to $M=G_0$ with probability one.

  • Define $Y_{0,G_0}$ and $Y_{1,G_1}$ similarly

  • Then we can define:

$$E[Y_{1,G_1} - Y_{0,G_0}] = \underbrace{E[Y_{1,G_1} - Y_{1,G_0}]}_{\text{interventional indirect effect}} + \underbrace{E[Y_{1,G_0} - Y_{0,G_0}]}_{\text{interventional direct effect}}$$

  • Note that $E[Y_{1,G_1} - Y_{0,G_0}]$ is still a total effect of treatment, even if it is different from the ATE $E[Y_1 - Y_0]$
  • We gain in the ability to solve a problem, but lose in terms of interpretation of the causal effect (cannot decompose the ATE)

An alternative definition of the effects:

  • Above we defined $G_a$ as a random draw from the distribution of $M_a \mid W$
  • What if instead we define $G_a$ as a random draw from the distribution of $M_a \mid (Z_a, W)$?
  • It turns out the indirect effect defined in this way only measures the path $A \to M \to Y$, and not the path $A \to Z \to M \to Y$
  • There may be important reasons to choose one over another (e.g., survival analyses where we want the distribution conditional on Z, instrumental variable designs where it doesn’t make sense to condition on Z)
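
For the first definition (draws from the distribution of $M_a \mid W$), the interventional effects can be sketched by simulating the counterfactuals directly. The structural model below is an assumption made for illustration: a binary treatment-induced confounder with $P(Z_a = 1) = 0.5 + 0.2a$, a mediator $M = W + A - Z + U_M$, and an outcome $Y = W + A + Z + M + U_Y$. The function names are ours. A draw of $G_a$ is obtained by regenerating $Z_a$ and the mediator noise independently of everything except $W$.

```r
set.seed(6)
n <- 1e5
w <- rnorm(n)
u_z <- runif(n)
u_y <- rnorm(n)

# Counterfactual treatment-induced confounder Z_a (assumed structural equation)
z_cf <- function(a) as.numeric(u_z < 0.5 + 0.2 * a)

# G_a: a fresh draw from the distribution of M_a given W
draw_g <- function(a) {
  z_new <- rbinom(n, 1, 0.5 + 0.2 * a)
  rnorm(n, w + a - z_new)
}

# Counterfactual outcome with treatment set to a and mediator set to g
y_cf <- function(a, g) w + a + z_cf(a) + g + u_y

g1 <- draw_g(1)
g0 <- draw_g(0)
total <- mean(y_cf(1, g1) - y_cf(0, g0))
iie <- mean(y_cf(1, g1) - y_cf(1, g0))
ide <- mean(y_cf(1, g0) - y_cf(0, g0))
c(total = total, indirect = iie, direct = ide)
```

Under this hypothetical model the interventional indirect effect is 0.8 and the interventional direct effect is 1.2.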

11.3.1 Identification assumptions:

  • $A \perp Y_{a,m} \mid W$
  • $M \perp Y_{a,m} \mid W, A, Z$
  • $A \perp M_a \mid W$
  • and positivity assumptions.

Under these assumptions, the population interventional direct and indirect effects are identified through:

$$E(Y_{a, G_{a'}}) = E\left[E\left\{\sum_z E(Y \mid A=a, Z=z, M, W) P(Z=z \mid A=a, W) \mid A=a', W\right\}\right]$$

  • Let’s dissect this formula in R:

    n <- 1e6
    w <- rnorm(n)
    a <- rbinom(n, 1, 0.5)
    z <- rbinom(n, 1, 0.5 + 0.2 * a)
    m <- rnorm(n, w + a - z)
    y <- rnorm(n, w + a + z + m)
  • Let us compute $E(Y_{1,G_0})$ (so that $a=1$ and $a'=0$).

  • First, fit a regression model for the outcome, and compute $E(Y \mid A=a, Z=z, M, W)$ for all values of $z$

    lm_y <- lm(y ~ m + a + z + w)
    pred_a1z0 <- predict(lm_y, newdata = data.frame(m = m, a = 1, z = 0, w = w))
    pred_a1z1 <- predict(lm_y, newdata = data.frame(m = m, a = 1, z = 1, w = w))
  • Now we fit the true model for $Z \mid A, W$ and get the conditional probability that $Z=1$ fixing $A=1$

    prob_z <- lm(z ~ a)
    pred_z <- predict(prob_z, newdata = data.frame(a = 1))
  • Now we compute the following pseudo-outcome: $\sum_z E(Y \mid A=a, Z=z, M, W) P(Z=z \mid A=a, W)$

    pseudo_out <- pred_a1z0 * (1 - pred_z) + pred_a1z1 * pred_z
  • Now we regress this pseudo-outcome on $A, W$, and compute the predictions setting $A=0$, that is, the conditional expectation of the pseudo-outcome given $A=a'=0$ and $W$

    fit_pseudo <- lm(pseudo_out ~ a + w)
    pred_pseudo <- predict(fit_pseudo, data.frame(a = 0, w = w))
  • And finally, just average those predictions!

    ## Mean(Y(1, G(0)))
    mean(pred_pseudo)
    #> [1] 1.197622
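  • This was for $(a, a') = (1, 0)$. We can do the same with $(a, a') = (1, 1)$ and $(a, a') = (0, 0)$ to obtain an effect decomposition

$$E[Y_{1,G_1} - Y_{0,G_0}] = \underbrace{E[Y_{1,G_1} - Y_{1,G_0}]}_{\text{interventional indirect effect}} + \underbrace{E[Y_{1,G_0} - Y_{0,G_0}]}_{\text{interventional direct effect}}$$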
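
One way to obtain all three quantities is to wrap the steps above in a function of $(a, a')$. The sketch below does this for the same toy data-generating process. The helper name `mean_y_a_g`, its argument names, the seed, and the sample size are ours. As in the walkthrough, the linear regressions are correctly specified only because of how the data are simulated.

```r
set.seed(7)
n <- 1e5
w <- rnorm(n)
a <- rbinom(n, 1, 0.5)
z <- rbinom(n, 1, 0.5 + 0.2 * a)
m <- rnorm(n, w + a - z)
y <- rnorm(n, w + a + z + m)

lm_y <- lm(y ~ m + a + z + w)  # outcome regression
lm_z <- lm(z ~ a)              # saturated model for P(Z = 1 | A)

# Hypothetical helper: plug-in estimate of E(Y_{a, G_{a'}})
mean_y_a_g <- function(a_val, a_star) {
  pred_z0 <- predict(lm_y, newdata = data.frame(m = m, a = a_val, z = 0, w = w))
  pred_z1 <- predict(lm_y, newdata = data.frame(m = m, a = a_val, z = 1, w = w))
  p_z <- predict(lm_z, newdata = data.frame(a = a_val))
  # Average the outcome predictions over the distribution of Z given A = a_val
  pseudo <- pred_z0 * (1 - p_z) + pred_z1 * p_z
  # Regress on A and W, then predict with A set to a_star
  fit_pseudo <- lm(pseudo ~ a + w)
  mean(predict(fit_pseudo, newdata = data.frame(a = a_star, w = w)))
}

ey_11 <- mean_y_a_g(1, 1)
ey_10 <- mean_y_a_g(1, 0)
ey_00 <- mean_y_a_g(0, 0)
c(indirect = ey_11 - ey_10, direct = ey_10 - ey_00, total = ey_11 - ey_00)
```

In this model the interventional indirect effect is 0.8 and the interventional direct effect is 1.2, so the estimates should land close to those values.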

11.3.2 Is this the estimand I want?

  • Makes sense to intervene on A but not directly on M.
  • Goal is to understand a descriptive type of mediation.
  • Okay with the assumptions!

11.3.3 But, there is an important limitation of interventional effects

Recent work uncovered an important limitation of these effects, which can be described as follows. The sharp mediational null hypothesis can be defined as

$$H_0: Y(a, M(a')) = Y(a, M(a^\star)) \text{ for all } a, a', a^\star.$$

The problem is that interventional effects are not guaranteed to be null when the sharp mediational null hypothesis is true.

This could present a problem in practice if some subgroup of the population has a relationship between A and M, but not between M and Y, while another distinct subgroup has a relationship between M and Y but not between A and M. In such a scenario, the interventional indirect effect would be nonzero, but there would be no one person in the population whose effect of A on Y would be mediated by M.
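
The two-subgroup scenario can be sketched with a simulation. The structural model below is hypothetical and chosen only to mirror the description: an unobserved subgroup indicator `s`, no baseline covariates, A affecting M only when `s = 1`, and M affecting Y only when `s = 0`. Draws of $G_a$ from the marginal distribution of $M_a$ are approximated by randomly permuting the counterfactual mediators across individuals.

```r
set.seed(8)
n <- 1e5
s <- rbinom(n, 1, 0.5)  # hypothetical unobserved subgroup indicator
u_m <- rnorm(n)
u_y <- rnorm(n)

# Subgroup s = 1: A affects M, but M does not affect Y
# Subgroup s = 0: A does not affect M, but M affects Y
m_cf <- function(a) s * a + u_m
y_cf <- function(a, m) (1 - s) * m + u_y

# Individual-level mediated effect Y(1, M(1)) - Y(1, M(0)) is zero for everyone
max_individual_effect <- max(abs(y_cf(1, m_cf(1)) - y_cf(1, m_cf(0))))

# G_a: draws from the marginal distribution of M_a, independent of the individual
g1 <- sample(m_cf(1))
g0 <- sample(m_cf(0))

# Interventional indirect effect: nonzero although no one has a mediated effect
iie <- mean(y_cf(1, g1) - y_cf(1, g0))
c(max_individual_effect = max_individual_effect, iie = iie)
```

Here the sharp mediational null holds exactly, yet the interventional indirect effect is about 0.25, because the random draw breaks the link between an individual's subgroup and their mediator value.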

More details in the original paper.

11.4 Estimand Summary