5 Day 1 group work
- Conduct mediation analysis using the traditional approaches
- Conduct mediation analysis using the ‘mediation’ R package
- Understanding summary output with binary outcome variable
- The limitations of the traditional approaches
First, load the dataset and take a few minutes to explore the variables in the dataset.
5.1 Variables
The dataset includes 11627 participants and 39.
#> 'data.frame': 11627 obs. of 39 variables:
#> $ randid : int 2448 2448 6238 6238 6238 9428 9428 10552 10552 11252 ...
#> $ sex : int 1 1 2 2 2 1 1 2 2 2 ...
#> $ totchol : int 195 209 250 260 237 245 283 225 232 285 ...
#> $ age : int 39 52 46 52 58 48 54 61 67 46 ...
#> $ sysbp : num 106 121 121 105 108 ...
#> $ diabp : num 70 66 81 69.5 66 80 89 95 109 84 ...
#> $ cursmoke: int 0 0 0 0 0 1 1 1 1 1 ...
#> $ cigpday : int 0 0 0 0 0 20 30 30 20 23 ...
#> $ bmi : num 27 NA 28.7 29.4 28.5 ...
#> $ diabetes: int 0 0 0 0 0 0 0 0 0 0 ...
#> $ bpmeds : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ heartrte: int 80 69 95 80 80 75 75 65 60 85 ...
#> $ glucose : int 77 92 76 86 71 70 87 103 89 85 ...
#> $ educ : int 4 4 2 2 2 1 1 3 3 3 ...
#> $ prevchd : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ prevap : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ prevmi : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ prevstrk: int 0 0 0 0 0 0 0 0 0 0 ...
#> $ prevhyp : int 0 0 0 0 0 0 0 1 1 0 ...
#> $ time : int 0 4628 0 2156 4344 0 2199 0 1977 0 ...
#> $ period : int 1 3 1 2 3 1 2 1 2 1 ...
#> $ hdlc : int NA 31 NA NA 54 NA NA NA NA NA ...
#> $ ldlc : int NA 178 NA NA 141 NA NA NA NA NA ...
#> $ death : int 0 0 0 0 0 0 0 1 1 0 ...
#> $ angina : int 0 0 0 0 0 0 0 0 0 0 ...
#> $ hospmi : int 1 1 0 0 0 0 0 0 0 0 ...
#> $ mi_fchd : int 1 1 0 0 0 0 0 0 0 0 ...
#> $ anychd : int 1 1 0 0 0 0 0 0 0 0 ...
#> $ stroke : int 0 0 0 0 0 0 0 1 1 0 ...
#> $ cvd : int 1 1 0 0 0 0 0 1 1 0 ...
#> $ hyperten: int 0 0 0 0 0 0 0 1 1 1 ...
#> $ timeap : int 8766 8766 8766 8766 8766 8766 8766 2956 2956 8766 ...
#> $ timemi : int 6438 6438 8766 8766 8766 8766 8766 2956 2956 8766 ...
#> $ timemifc: int 6438 6438 8766 8766 8766 8766 8766 2956 2956 8766 ...
#> $ timechd : int 6438 6438 8766 8766 8766 8766 8766 2956 2956 8766 ...
#> $ timestrk: int 8766 8766 8766 8766 8766 8766 8766 2089 2089 8766 ...
#> $ timecvd : int 6438 6438 8766 8766 8766 8766 8766 2089 2089 8766 ...
#> $ timedth : int 8766 8766 8766 8766 8766 8766 8766 2956 2956 8766 ...
#> $ timehyp : int 8766 8766 8766 8766 8766 8766 8766 0 0 4285 ...
5.2 Research question
In this exercise, we will investigate the effect of obesity on the risk of cardiovascular disease (CVD), and how much this relation is mediated by high blood pressure. Mediation analysis could help us answer this research question.
For the exercise on day 1, we will work with the following variables:
- bmi = body mass index (BMI) (continuous variable in kg/m2)
- sysbp = systolic blood pressure (continous)
- cvd = (binary: 0=no, 1=yes).
- cursmoke = smoking status (binary: 0 = no smoking, 1 = smoking)
- educ = education (categorical variable)
- sex = sex (0: male, 1: female)
- age = age (continuous variable)
Note: in this example, the outcome of interest is the presence of CVD, which is a binary variable. We will not think about time-to-event outcomes for now. More on this will come later.
Before we start, let’s think about the research question and draw a directed acyclic graphs. You can either draw by hand or use daggity (https://www.dagitty.net/).
Address the following questions:
- Estimate the relationship between obesity and blood pressure (fit the mediator model).
Cleaning the data:
Estimate the effect of obesity on the prevalence of CVD, adjusting for SES, or other confounding factors based on your DAG.
Estimate the effect of blood pressure on the prevalence of CVD, adjusting for obesity and other confounding factors based on your DAG.
Now, calculate the direct effect and indirect effect using the product method and difference method, and compare results from the two approaches. Note: you do not need to estimate the confidence intervals.
Product method:
Difference method:
- Calculate the direct effect and indirect effect using the mediation package.
library(CMAverse)
res_rb_confounders <- cmest(
data = frmgham_clean,
model = "rb",
outcome = "y",
exposure = "a",
mediator = "m",
basec = c("w1", "w2", "w3", "w4"),
EMint = FALSE,
mreg = list("linear"),
yreg = "logistic",
astar = 0,
a = 1,
mval = list(120),
estimation = "paramfunc",
inference = "delta"
)
summary(res_rb_confounders)
- Discuss the limitations of the traditional approach.