17 Time-varying treatments, mediators, and covariates

We’ll now turn to thinking about how to define and estimate effects for cases where treatment, mediators, and covariates are time-varying.

Before we delve into the specific, here are a couple of papers on this topic that are interesting/useful:

Mediation analysis with time varying exposures and mediators by Tyler J. VanderWeele and Eric J. Tchetgen Tchetgen
Longitudinal Mediation Analysis with Time-varying Mediators and Exposures, with Application to Survival Outcomes by Wenjing Zheng and Mark J. van der Laan
Efficient and flexible causal mediation with time-varying mediators, treatments, and confounders by Iván Díaz, Nicholas Williams, and Kara E. Rudolph
Identification and estimation of mediational effects of longitudinal modified treatment policies by Brian Gilbert, Katherine L. Hoffman, Nicholas Williams, Kara E. Rudolph, Edward J. Schenck, Iván Díaz.

In this chapter, we will be using the lcm R package, which supports binary time-varying treatments and categorical mediators, and the lcmmtp R package package, which supports continuous, multivariate, and binary time-varying treatments and categorical mediators.

17.1 Illustrative example

As an illustrative example, we will use a dataset from an observational study looking at the effect of invasive mechanical ventilation (IMV) on the survival of COVID-19 patients, considering acute kidney injury (AKI) as a mediating factor.

Briefly, IMV is a treatment for acute respiratory distress syndrome (ARDS). While IMV is a potentially life-saving therapy in patients with ARDS, its usage has been associated with several iatrogenic risks. Of interest to our illustrative study is acute kidney injury (AKI), a critical condition that complicates ICU stays and is associated with increased mortality. The causal model underlying this problem is as follows:

The data on a single observational unit can be represented by the vector $O = (L_{1}, A_{1}, Z_{1}, M_{1}, L_{2}, \dots, A_{τ}, Z_{τ}, M_{τ}, Y)$ , with the data pooled across all participants denoted $O_{1}, \dots, O_{n}$ , for a of $n$ i.i.d. observations of $O$ . The associated DAG is

\dimendef\prevdepth=0
\pgfdeclarelayer{background}
\pgfsetlayers{background,main}
\usetikzlibrary{arrows,positioning}
\tikzset{
>=stealth',
punkt/.style={
rectangle,
rounded corners,
draw=black, very thick,
text width=6.5em,
minimum height=2em,
text centered},
pil/.style={
->,
thick,
shorten <=2pt,
shorten >=2pt,}
}
\newcommand{\Vertex}[3]
{\node[minimum width=0.6cm,inner sep=0.05cm] (#2) at (#1) {#3};
}
\pgfarrowsdeclare{arcs}{arcs}{...}
{
  \pgfsetdash{}{0pt} % do not dash
  \pgfsetroundjoin   % fix join
  \pgfsetroundcap    % fix cap
  \pgfpathmoveto{\pgfpoint{-5pt}{5pt}}
  \pgfpatharc{180}{270}{5pt}
  \pgfpatharc{90}{180}{5pt}
  \pgfusepathqstroke
}
\newcommand{\VertexR}[2]
{\node[rectangle, draw, minimum width=0.6cm,inner sep=0.05cm] (#2) at (#1) {$#2$};
}
\newcommand{\ArrowR}[3]
{ \begin{pgfonlayer}{background}
\draw[->,#3] (#1) to[bend right=30] (#2);
\end{pgfonlayer}
}
\newcommand{\ArrowB}[3]%
{ \begin{pgfonlayer}{background}
    \draw[|-arcs,line width=0.4mm,shorten <= 0.3cm,shorten >= 0.3cm,#3] (#1) -- +(#2);
  \end{pgfonlayer}
}
\newcommand{\ArrowL}[3]
{ \begin{pgfonlayer}{background}
\draw[->,#3] (#1) to[bend left=45] (#2);
\end{pgfonlayer}
}
\newcommand{\EdgeL}[3]
{ \begin{pgfonlayer}{background}
\draw[dashed,#3] (#1) to[bend right=-45] (#2);
\end{pgfonlayer}
}
\newcommand{\Arrow}[3]
{ \begin{pgfonlayer}{background}
\draw[->,#3] (#1) -- +(#2);
\end{pgfonlayer}
}
\begin{tikzpicture}
    \Vertex{-1, 0}{W}{$L_1$}
    \Vertex{1, 0}{A1}{$A_1$}
    \Vertex{2, 0}{M1}{$M_1$}
    \Vertex{3, 0}{Z1}{$L_2$}
    \Vertex{2, -1}{L1}{$Z_1$}
    \ArrowB{W}{A1}{black}
    \Arrow{A1}{M1}{black}
    \Arrow{A1}{L1}{black}
    \Arrow{M1}{Z1}{black}
    \Arrow{L1}{M1}{black}
    \Arrow{L1}{Z1}{black}
    \ArrowL{A1}{Z1}{black}
    \Vertex{5, 0}{A2}{$A_2$}
    \Vertex{6, 0}{M2}{$M_2$}
    \Vertex{7, 0}{Z2}{$L_3$}
    \Vertex{6, -1}{L2}{$Z_2$}
    \node (dots) at (8, 0) {$\cdots$};
    \Arrow{A2}{M2}{black}
    \Arrow{A2}{L2}{black}
    \Arrow{M2}{Z2}{black}
    \Arrow{L2}{M2}{black}
    \Arrow{L2}{Z2}{black}
    \ArrowL{A2}{Z2}{black}
    \ArrowB{Z1}{A2}{black}
    \Vertex{9, 0}{At}{$A_\tau$}
    \Vertex{10, 0}{Mt}{$M_\tau$}
    \Vertex{11, 0}{Zt}{$Y$}
    \Vertex{10, -1}{Lt}{$Z_\tau$}
    \Arrow{At}{Mt}{black}
    \Arrow{At}{Lt}{black}
    \Arrow{Mt}{Zt}{black}
    \Arrow{Lt}{Mt}{black}
    \Arrow{Lt}{Zt}{black}
    \ArrowL{At}{Zt}{black}
\end{tikzpicture}

Where we are using the following notation:

$W$ : baseline variables such as comorbidities, demographics, etc.
$L_{t}$ and $Z_{t}$ : time-varying covariates such as lab results, vitals, treatments, etc.
$A_{t}$ : type of oxygen support at time $t$ (0: no oxygen support, 1: oxygen support excluding IMV, 2: IMV)
$M_{t}$ : indicator of AKI at time $t$
$Y$ : mortality at end of study
We will use $H_{t}$ as a shorthand to denote all the data measured up until right before $A_{t}$ occurs

17.2 Defining causal effects in this example

How can we define (total) causal effects in this example?

Main challenge: cannot consider static treatmemt regimes (e.g., do not intubate)
Such regimes would not be supported in the data (doctors would always intubate a person whose blood oxygen is too low)
We use modified treatment policies to address this
Main idea: consider a slight modification to the treatment a patient actually received
For example, can consider the effect of a small delay in receiving IMV
Specifically, we will consider a delay of one day in receiving IMV
In notation, the treatment regime would be as follows:

\begin{matrix} (1) & d_{t} (a_{t}, h_{t}) = {\begin{cases} 1 & if a_{t} = 2 and a_{s} \leq 1 for all s < t, \\ a_{t} & otherwise. \end{cases} \end{matrix}

We could then define the total effect as $E [Y (d) - Y]$ , where $Y (d)$ is the counterfactual mortality if the above rule had been implemented every day, i.e., the intervention is $d = (d_{1}, d_{2}, \dots, d_{τ})$ .
This is a contrast of the mortality rate under a treatment rule that would delay intubation by one day vs the mortality rate that was actually observed.

17.3 How do we define mediation causal effects with time-varying data?

The above causal effect could be decomposed into natural direct and indirect effects as follows

\begin{aligned} E [Y (d) - Y] & = E [Y (d, M (d)) - Y (A, M)] \\ = \underset{natural indirect effect}{\underset{⏟}{E [Y (d, M (d)) - Y (d, M)]}} + \underset{natural direct effect}{\underset{⏟}{E [Y (d, M) - Y (A, M)]}} \end{aligned}

However, as before, these natural mediation effects are not identified.
The reason is that time-varying mediators exacerbate the issue of intermediate confounding. To see why, let us look at the DAG again:

\dimendef\prevdepth=0
\pgfdeclarelayer{background}
\pgfsetlayers{background,main}
\usetikzlibrary{arrows,positioning}
\tikzset{
>=stealth',
punkt/.style={
rectangle,
rounded corners,
draw=black, very thick,
text width=6.5em,
minimum height=2em,
text centered},
pil/.style={
->,
thick,
shorten <=2pt,
shorten >=2pt,}
}
\newcommand{\Vertex}[3]
{\node[minimum width=0.6cm,inner sep=0.05cm] (#2) at (#1) {#3};
}
\pgfarrowsdeclare{arcs}{arcs}{...}
{
  \pgfsetdash{}{0pt} % do not dash
  \pgfsetroundjoin   % fix join
  \pgfsetroundcap    % fix cap
  \pgfpathmoveto{\pgfpoint{-5pt}{5pt}}
  \pgfpatharc{180}{270}{5pt}
  \pgfpatharc{90}{180}{5pt}
  \pgfusepathqstroke
}
\newcommand{\VertexR}[2]
{\node[rectangle, draw, minimum width=0.6cm,inner sep=0.05cm] (#2) at (#1) {$#2$};
}
\newcommand{\ArrowR}[3]
{ \begin{pgfonlayer}{background}
\draw[->,#3] (#1) to[bend right=30] (#2);
\end{pgfonlayer}
}
\newcommand{\ArrowB}[3]%
{ \begin{pgfonlayer}{background}
    \draw[|-arcs,line width=0.4mm,shorten <= 0.3cm,shorten >= 0.3cm,#3] (#1) -- +(#2);
  \end{pgfonlayer}
}
\newcommand{\ArrowL}[3]
{ \begin{pgfonlayer}{background}
\draw[->,#3] (#1) to[bend left=45] (#2);
\end{pgfonlayer}
}
\newcommand{\EdgeL}[3]
{ \begin{pgfonlayer}{background}
\draw[dashed,#3] (#1) to[bend right=-45] (#2);
\end{pgfonlayer}
}
\newcommand{\Arrow}[3]
{ \begin{pgfonlayer}{background}
\draw[->,#3] (#1) -- +(#2);
\end{pgfonlayer}
}
\begin{tikzpicture}
    \Vertex{-1, 0}{W}{$L_1$}
    \Vertex{1, 0}{A1}{$A_1$}
    \Vertex{2, 0}{M1}{$M_1$}
    \Vertex{3, 0}{Z1}{$L_2$}
    \Vertex{2, -1}{L1}{$Z_1$}
    \ArrowB{W}{A1}{black}
    \Arrow{A1}{M1}{black}
    \Arrow{A1}{L1}{black}
    \Arrow{M1}{Z1}{black}
    \Arrow{L1}{M1}{black}
    \Arrow{L1}{Z1}{black}
    \ArrowL{A1}{Z1}{black}
    \Vertex{5, 0}{A2}{\textcolor{teal}{$A_2$}}
    \Vertex{6, 0}{M2}{\textcolor{orange}{$M_2$}}
    \Vertex{7, 0}{Z2}{\textcolor{orange}{$L_3$}}
    \Vertex{6, -1}{L2}{\textcolor{orange}{$Z_2$}}
    \node (dots) at (8, 0) {$\cdots$};
    \Arrow{A2}{M2}{black}
    \Arrow{A2}{L2}{black}
    \Arrow{M2}{Z2}{black}
    \Arrow{L2}{M2}{black}
    \Arrow{L2}{Z2}{black}
    \ArrowL{A2}{Z2}{black}
    \ArrowB{Z1}{A2}{black}
    \Vertex{9, 0}{At}{\textcolor{orange}{$A_\tau$}}
    \Vertex{10, 0}{Mt}{\textcolor{violet}{$M_\tau$}}
    \Vertex{11, 0}{Zt}{\textcolor{violet}{$Y$}}
    \Vertex{10, -1}{Lt}{\textcolor{orange}{$Z_\tau$}}
    \Arrow{At}{Mt}{black}
    \Arrow{At}{Lt}{black}
    \Arrow{Mt}{Zt}{black}
    \Arrow{Lt}{Mt}{black}
    \Arrow{Lt}{Zt}{black}
    \ArrowL{At}{Zt}{black}
\end{tikzpicture}

Note that all the variables in orange are confounders of the mediator $M_{τ}$ and the outcome, and are also affected by treatment at time $t = 2$ .

One possible solution to the above issues involves considering randomized versions of the above effects. Specifically:

Define $G (d)$ to be a random draw from the distribution of $M (d)$ conditional on baseline variables $W$ .
We can then obtain the decomposition

$E [Y (d, G (d)) - Y (A, G (A))] = \underset{randomized interventional indirect effect}{\underset{⏟}{E [Y (d, G (d)) - Y (d, G (A))]}} + \underset{randomized interventional direct effect}{\underset{⏟}{E [Y (d, G (A)) - Y (A, G (A))]}}$

As an example, consider the counterfactual $Y (d, G (d))$ . - $M (d)$ is the observed AKI status of patients under a delay in intubation. If $W$ is age, and we are deciding how to intervene on a patient who is 45 years old, we take all the AKI statuses of 45 year olds and draw one of these AKI values at random. Call this random draw $G (d)$ - For a 45 year old patient, $Y (d, G (d))$ is the counterfactual mortality of a patient if intubation had been delayed, and their AKI status would have been assigned to a random draw from the AKI status of 45 year patients.

17.4 Identification assumptions and formula

The above effects are identified under the following assumptions:

All the common causes of $A_{t}$ and $(Z_{s}, M_{s}, A_{s + 1}, L_{s + 1})$ are measured for $s \geq t$
All the common causes of $M_{t}$ and $(Z_{s + 1}, A_{s + 1}, L_{s + 1})$ are measured for $s \geq t$
The intervention $d$ is supported in the data, meaning that for every patient with covariates $h_{t}$ who had treatment status $a_{t}$ , it is possible to find a patient with covariates $h_{t}$ who had treatment status $d (a_{t}, h_{t})$
- In our example, this translates roughly as: for every patient with covariate history $h_{t}$ who was intubated at time $t$ , it is possible to find a patient covariate history $h_{t}$ who was intubated at time $t + 1$ .
There is a positive probability of the mediator $M_{t}$ for all feasible covariate histories.

The identification formula is complex, but we will explain it in the case of two time points. That is, assume the data are $O = (W, A_{1}, Z_{1}, M_{1}, L_{1}, A_{2}, Z_{2}, M_{2}, Y)$ .

Identification can be based on the following procedure. First, for each value $m_{1}$ and $m_{2}$ of the mediator, compute outcome regressions as follows:

Regress $Y$ on $W, A_{1}, Z_{1}, M_{1}, L_{1}, A_{2}, Z_{2}, M_{2}$ . Use this regression to predict the outcome had the mediator been set to $M_{2} = m_{2}$ . Let ${\tilde{Y}}_{2} (m_{2})$ denote this prediction.
Regress ${\tilde{Y}}_{2} (m_{2})$ on $W, A_{1}, Z_{1}, M_{1}, L_{1}, A_{2}$ . Use this regression to predict the outcome had the treatment $A_{2}$ been set to $d (A_{2}, H_{2})$ . Let ${\tilde{Y}}_{2}^{d} (m_{2})$ denote this prediction.
Regress ${\tilde{Y}}_{2} (m_{2}, d_{2})$ on $W, A_{1}, Z_{1}, M_{1}$ . Use this regression to predict the outcome had the mediator been set to $M_{1} = m_{1}$ . Let ${\tilde{Y}}_{1} (m_{1}, m_{2})$ denote this prediction.
Regress ${\tilde{Y}}_{1} (m_{1}, m_{2})$ on $W, A_{1}$ . Use this regression to predict the outcome had the treatment $A_{1}$ been set to $d (A_{1}, H_{1})$ . Let ${\tilde{Y}}_{1}^{d} (m_{1}, m_{2})$ denote this prediction.

Then, for each value $m_{1}$ and $m_{2}$ of the mediator, compute the mediator distribution as follows:

Regress the binary variable $I (M_{2} = m_{2})$ on $W, A_{1}, Z_{1}, M_{1}, L_{1}, A_{2}$ . Use this model to predict the probability of $M_{2} = m_{2}$ under an intervention that sets $A_{2}$ to $d (A_{2}, H_{2})$ . Let this predicted probability be denoted with $P (m_{2})$ .
Regress the binary variable $I (M_{1} = m_{1}) P (m_{2})$ on $W, A_{1}$ . Use this model to predict under an intervention that sets $A_{1}$ to $d (A_{1}, H_{1})$ . Let this prediction be denoted with $P (m_{1}, m_{2})$ .

At the end of these two sequential regression procedures, we have values ${\tilde{Y}}_{1}^{d} (m_{1}, m_{2})$ and $P (m_{1}, m_{2})$ for each value of the mediator $(m_{1}, m_{2})$ . Then, under identification assumptions, we have:

$E [Y (d, G (d)) = \sum_{m_{1}, m_{2}} {\tilde{Y}}_{1}^{d} (m_{1}, m_{2}) P (m_{1}, m_{2}) .$

17.5 Estimators and R package

As before, we can develop inverse probability weighted estimators, as well as substitution estimators based on the g-computation formula and doubly robust (DR) estimators.

All of these estimators get significantly more complex. For instance, an g-computation estimator may be developed by running the regressions indicated in the above sequential regression procedures.

Fortunately, the doubly robust estimators are coded in a package that can be used off-the-shelf without having to code any complicated sequential regression strategies on your own. Let us look at an example from the lcmmtp R package. First, let’s take a look at a simulated dataset available in the package:

library(lcmmtp)
data <- as.data.frame(apply(lcmmtp_foo, 2, as.numeric))
head(data)
dim(lcmmtp_foo)

Now, let us perform an analysis where we assume our intent is to estimate $E [Y (1), G (0)]$ :

library(mlr3extralearners)
vars <- lcmmtp:::lcmmtp_variables$new(
    L = list(c("L_1"), c("L_2")),
    A = c("A_1", "A_2"),
    Z = list(c("Z_1"), c("Z_2")),
    M = c("M_1", "M_2"),
    Y = "Y",
    cens = c("c1", "c2")
)
lrnrs <- c("mean", "earth", "glm")
d_ap <- function(data, trt) rep(1, length(data[[trt]]))
d_as <- function(data, trt) rep(0, length(data[[trt]]))

EY10 <- lcmmtp(
  data, vars, d_ap, d_as,
  control = .lcmmtp_control(folds = 2,
                            learners_trt = lrnrs,
                            learners_mediator = lrnrs,
                            learners_QL = lrnrs,
                            learners_QZ = lrnrs,
                            learners_QM = lrnrs)
)

Now, assume that we want to estimate the direct effect by contrasting $E [Y (1), G (0)] - E [Y (0), G (0)]$ :

EY00 <- lcmmtp(
  lcmmtp_foo, vars, d_as, d_as,
  control = .lcmmtp_control(folds = 2,
                            learners_trt = lrnrs,
                            learners_mediator = lrnrs,
                            learners_QL = lrnrs,
                            learners_QZ = lrnrs,
                            learners_QM = lrnrs)
)

And we can contrast the two using a convenient function from the lmtp R package:

library(lmtp)
class(EY00) <- class(EY10) <- 'lmtp'
EY00$estimator <- EY10$estimator <- 'SDR'
names(EY00)[3] <- names(EY10)[3] <- 'eif'
lmtp_contrast(EY10, ref = EY00)

17.6 Pros and cons of this methodology

Pros

Allows the non-parametric definition, identification, and estimation of mediational causal effects for general longitudinal data structures
Allows for the use of machine learning to alleviate model misspecification bias, and is equipped with formulas for the computation of correct standard errors and confidence intervals
Easy-to-use software

Cons

Some limitations remain: mediators $M$ need to be discrete random variables
As before, interventional effects do not satisfy the mediational sharp null criteria, meaning that they may be different from zero when no individual in the population experiences mediational effects
- This is probably not a big worry in practice, but it is something we are keeping in mind as we develop novel estimators