5  Day 1 group work

Objective of this session
  • Conduct mediation analysis using the traditional approaches
  • Conduct mediation analysis using the ‘mediation’ R package
  • Understanding summary output with binary outcome variable
  • The limitations of the traditional approaches

First, load the dataset and take a few minutes to explore the variables in the dataset.

library(here)
library(tidyverse)

load(here::here("data/frmghamdata.RData"))

5.1 Variables

The dataset includes 11627 participants and 39.

str(frmgham)
#> 'data.frame':    11627 obs. of  39 variables:
#>  $ randid  : int  2448 2448 6238 6238 6238 9428 9428 10552 10552 11252 ...
#>  $ sex     : int  1 1 2 2 2 1 1 2 2 2 ...
#>  $ totchol : int  195 209 250 260 237 245 283 225 232 285 ...
#>  $ age     : int  39 52 46 52 58 48 54 61 67 46 ...
#>  $ sysbp   : num  106 121 121 105 108 ...
#>  $ diabp   : num  70 66 81 69.5 66 80 89 95 109 84 ...
#>  $ cursmoke: int  0 0 0 0 0 1 1 1 1 1 ...
#>  $ cigpday : int  0 0 0 0 0 20 30 30 20 23 ...
#>  $ bmi     : num  27 NA 28.7 29.4 28.5 ...
#>  $ diabetes: int  0 0 0 0 0 0 0 0 0 0 ...
#>  $ bpmeds  : int  0 0 0 0 0 0 0 0 0 0 ...
#>  $ heartrte: int  80 69 95 80 80 75 75 65 60 85 ...
#>  $ glucose : int  77 92 76 86 71 70 87 103 89 85 ...
#>  $ educ    : int  4 4 2 2 2 1 1 3 3 3 ...
#>  $ prevchd : int  0 0 0 0 0 0 0 0 0 0 ...
#>  $ prevap  : int  0 0 0 0 0 0 0 0 0 0 ...
#>  $ prevmi  : int  0 0 0 0 0 0 0 0 0 0 ...
#>  $ prevstrk: int  0 0 0 0 0 0 0 0 0 0 ...
#>  $ prevhyp : int  0 0 0 0 0 0 0 1 1 0 ...
#>  $ time    : int  0 4628 0 2156 4344 0 2199 0 1977 0 ...
#>  $ period  : int  1 3 1 2 3 1 2 1 2 1 ...
#>  $ hdlc    : int  NA 31 NA NA 54 NA NA NA NA NA ...
#>  $ ldlc    : int  NA 178 NA NA 141 NA NA NA NA NA ...
#>  $ death   : int  0 0 0 0 0 0 0 1 1 0 ...
#>  $ angina  : int  0 0 0 0 0 0 0 0 0 0 ...
#>  $ hospmi  : int  1 1 0 0 0 0 0 0 0 0 ...
#>  $ mi_fchd : int  1 1 0 0 0 0 0 0 0 0 ...
#>  $ anychd  : int  1 1 0 0 0 0 0 0 0 0 ...
#>  $ stroke  : int  0 0 0 0 0 0 0 1 1 0 ...
#>  $ cvd     : int  1 1 0 0 0 0 0 1 1 0 ...
#>  $ hyperten: int  0 0 0 0 0 0 0 1 1 1 ...
#>  $ timeap  : int  8766 8766 8766 8766 8766 8766 8766 2956 2956 8766 ...
#>  $ timemi  : int  6438 6438 8766 8766 8766 8766 8766 2956 2956 8766 ...
#>  $ timemifc: int  6438 6438 8766 8766 8766 8766 8766 2956 2956 8766 ...
#>  $ timechd : int  6438 6438 8766 8766 8766 8766 8766 2956 2956 8766 ...
#>  $ timestrk: int  8766 8766 8766 8766 8766 8766 8766 2089 2089 8766 ...
#>  $ timecvd : int  6438 6438 8766 8766 8766 8766 8766 2089 2089 8766 ...
#>  $ timedth : int  8766 8766 8766 8766 8766 8766 8766 2956 2956 8766 ...
#>  $ timehyp : int  8766 8766 8766 8766 8766 8766 8766 0 0 4285 ...

5.2 Research question

In this exercise, we will investigate the effect of obesity on the risk of cardiovascular disease (CVD), and how much this relation is mediated by high blood pressure. Mediation analysis could help us answer this research question.

For the exercise on day 1, we will work with the following variables:

  • bmi = body mass index (BMI) (continuous variable in kg/m2)
  • sysbp = systolic blood pressure (continous)
  • cvd = (binary: 0=no, 1=yes).
  • cursmoke = smoking status (binary: 0 = no smoking, 1 = smoking)
  • educ = education (categorical variable)
  • sex = sex (0: male, 1: female)
  • age = age (continuous variable)

Note: in this example, the outcome of interest is the presence of CVD, which is a binary variable. We will not think about time-to-event outcomes for now. More on this will come later.

Before we start, let’s think about the research question and draw a directed acyclic graphs. You can either draw by hand or use daggity (https://www.dagitty.net/).

Address the following questions:

  1. Estimate the relationship between obesity and blood pressure (fit the mediator model).

Cleaning the data:

frmgham_clean <- frmgham %>%
  select(
    id = randid, 
    w1 = sex,
    w2 = age,
    w3 = cursmoke,
    w4 = educ,
    a = bmi, # this is the exposure
    m = sysbp, # this is the mediator
    y = cvd # this is the outcome
  ) %>% 
  na.omit()

frmgham_clean <- frmgham_clean %>%
  mutate(a = case_when(a >= 30 ~ 1,
                       TRUE ~ 0))
  1. Estimate the effect of obesity on the prevalence of CVD, adjusting for SES, or other confounding factors based on your DAG.

  2. Estimate the effect of blood pressure on the prevalence of CVD, adjusting for obesity and other confounding factors based on your DAG.

  3. Now, calculate the direct effect and indirect effect using the product method and difference method, and compare results from the two approaches. Note: you do not need to estimate the confidence intervals.

Product method:

Difference method:

  1. Calculate the direct effect and indirect effect using the mediation package.
library(CMAverse)

res_rb_confounders <- cmest(
  data = frmgham_clean,
  model = "rb",
  outcome = "y",
  exposure = "a",
  mediator = "m",
  basec = c("w1", "w2", "w3", "w4"),
  EMint = FALSE,
  mreg = list("linear"),
  yreg = "logistic",
  astar = 0,
  a = 1,
  mval = list(120),
  estimation = "paramfunc",
  inference = "delta"
)

summary(res_rb_confounders)
  1. Discuss the limitations of the traditional approach.