18 NetCoupler

All documentation for the algorithm and package can be found on the NetCoupler package website.

18.1 What is NetCoupler?

The goal of NetCoupler is to estimate potential causal links between a set of -omic (e.g. metabolomics, lipidomics) or other high-dimensional metabolic data as a conditional dependency network and either a disease outcome, an exposure, or both.

These potential causal links are classified as direct effects, ambiguous effects, or no effect.

This algorithm is largely meant to be used with -omic style data to generate the networks. While non-omic data could theoretically be used, we have not tested it in that context.

Given the algorithm’s nature, it’s primarily designed for exploring potential mechanisms and for complementing other analyses of a research question. It could also be used to confirm a pre-specified and explicit hypothesis, similar to how structural equation models are used. However, this might be a more niche use.

Figure 18.1: Overview of the NetCoupler algorithm

18.2 Why or when might you want to use NetCoupler?

You are interested in a research question about how some factor might influence another factor and how that influence might be mediated through a metabolic network.
You want to explore how a factor might influence a metabolic network or how a metabolic network might influence a factor.
You have an -omic dataset and want another method to explore how it relates to your variable of interest.

18.3 Input and assumptions

The input for NetCoupler includes:

Standardized metabolic or other high-dimensional data.
Exposure or outcome data.
Network estimation method (the default is the PC algorithm from the pcalg package; Colombo and Maathuis 2014).
Modeling method (e.g. linear regression with lm()), including confounders to adjust for.

The final output is the modeling results along with the results from NetCoupler’s classification. Results can then be displayed as a joint network model in graphical format.

There are a few key assumptions to consider before using NetCoupler for your own research purposes.

-omics data is the basis for the network. We haven’t tested NetCoupler on non-omics datasets, so we can’t guarantee it works as intended.
The variables used for the metabolic network are numerical.
Metabolic data should have a theoretical network underlying it.
Missing data are not used in any of the NetCoupler processes.

18.4 Installation

To install the official CRAN version, use:

install.packages("NetCoupler")

18.5 Example

18.5.1 Estimating the metabolic network

For estimating the network, it’s (basically) required to standardize the metabolic variables with nc_standardize() before passing them to nc_estimate_network().

nc_standardize() log-transforms and scales (mean-centers and z-score normalizes) the values of the metabolic variables.

We do this because the network estimation algorithm can be sensitive to differences in the numerical scale of the variables (e.g. a mean of 1 vs a mean of 1000).

library(NetCoupler)
library(here)

load(file = here::here("data", "simulated_data.Rda"))

std_metabolic_data <- simulated_data %>%
  nc_standardize(starts_with("metabolite"))
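
To make the idea of log-transforming and scaling concrete, here is a minimal base R sketch on toy data. It is not NetCoupler's implementation: the data, the helper name log_scale(), and the exact order of operations are assumptions made for illustration only.

```r
# Hypothetical toy data: two positive-valued "metabolites" on very different scales.
set.seed(1)
toy <- data.frame(
  metabolite_a = rlnorm(100, meanlog = 0, sdlog = 0.5),
  metabolite_b = rlnorm(100, meanlog = 7, sdlog = 0.5)
)

# Illustrative helper (not part of NetCoupler): log-transform, then
# mean-center and divide by the standard deviation.
log_scale <- function(x) {
  as.numeric(scale(log(x)))
}
toy_std <- as.data.frame(lapply(toy, log_scale))

# After standardizing, both columns are on the same scale
# (mean close to 0, standard deviation of 1).
colMeans(toy_std)
apply(toy_std, 2, sd)
```

Once both variables are on the same scale, differences such as a mean of 1 vs a mean of 1000 no longer affect downstream estimation.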

If you have potential confounders that you need to adjust for during the estimating links phase of NetCoupler, you’ll need to include these confounding variables when standardizing the metabolic variables. You do this by regressing the metabolic variables on the confounding variables, using the regressed_on argument of nc_standardize().

This will automatically standardize the variables first, fit models of the metabolic variables that include the confounding variables, and then extract the residuals from those models. The residuals are then used to construct the network. Here’s an example:

std_metabolic_data <- simulated_data %>%
  nc_standardize(starts_with("metabolite"),
    regressed_on = "age"
  )
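
The residual step described above can be sketched in base R for a single variable. This is only an illustration of the idea under assumed toy data (the variables age and metabolite are simulated here), not the code that nc_standardize() runs internally.

```r
# Hypothetical toy data: one metabolite whose log-level depends partly on age.
set.seed(2)
n <- 200
age <- rnorm(n, mean = 50, sd = 10)
metabolite <- exp(0.03 * age + rnorm(n, sd = 0.3))

# Step 1: standardize (log-transform and scale) the metabolite.
std_metabolite <- as.numeric(scale(log(metabolite)))

# Step 2: regress the standardized metabolite on the confounder.
fit <- lm(std_metabolite ~ age)

# Step 3: keep the residuals, i.e. the part of the metabolite
# that is not linearly explained by age.
metabolite_resid <- residuals(fit)

cor(std_metabolite, age)   # clearly non-zero before adjustment
cor(metabolite_resid, age) # essentially zero after adjustment
```

Because the residuals are uncorrelated with age by construction, a network built from them does not pick up connections that exist only because several metabolites change with age.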

After that, you can estimate the network. By default, the network is estimated using the PC algorithm.

# Make partial independence network from metabolite data
metabolite_network <- std_metabolic_data %>%
  nc_estimate_network(starts_with("metabolite"))
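
To give some intuition for what a conditional dependency network captures, the sketch below uses partial correlations on simulated data. Note that this is a deliberately simpler technique than the PC algorithm and is not what nc_estimate_network() does; the variables m1, m2, and m3 are hypothetical.

```r
# Hypothetical chain m1 -> m2 -> m3: m1 and m3 are related only through m2.
set.seed(3)
n <- 500
m1 <- rnorm(n)
m2 <- 0.8 * m1 + rnorm(n)
m3 <- 0.8 * m2 + rnorm(n)
x <- scale(cbind(m1, m2, m3))

# Marginal correlations: m1 and m3 look related.
marginal_cor <- cor(x)

# Partial correlations (each pair conditional on the remaining variable),
# obtained from the inverse of the correlation matrix.
precision <- solve(marginal_cor)
partial_cor <- -cov2cor(precision)
diag(partial_cor) <- 1

round(marginal_cor, 2)
round(partial_cor, 2)
```

In this toy setup the marginal correlation between m1 and m3 is sizeable, while their partial correlation given m2 is close to zero, so a conditional dependency network would keep the m1-m2 and m2-m3 edges but not an m1-m3 edge.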

18.5.2 Estimating exposure and outcome-side connections

For the exposure and outcome side, you should again standardize the metabolic variables, but this time without regressing on the confounders, since they will be included in the models.

standardized_data <- simulated_data %>%
  nc_standardize(starts_with("metabolite"))

Now you can estimate the outcome-side or exposure-side links and identify direct effects for either the exposure side (exposure -> metabolite) or the outcome side (metabolite -> outcome).

For the exposure side, the function identifies whether a link between the exposure and an index node (one metabolic variable in the network) exists, independent of potential confounders and of neighbouring nodes (other metabolic variables linked to the index variable).

Depending on how consistent and strong the link is, the effect is classified as “direct”, “ambiguous”, or “none”.
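
One way to picture what "consistent" means here is to fit the exposure-to-index-node model several times, each time adjusting for a different subset of the neighbouring nodes, and compare the exposure estimates. The base R sketch below does this on simulated data. The enumeration of neighbour subsets and all variable names are assumptions for illustration; this is not NetCoupler's actual classification rule.

```r
# Hypothetical toy data: the exposure acts on the index node, which is
# also linked to two neighbouring nodes in the network.
set.seed(4)
n <- 500
exposure <- rnorm(n)
neighbour_1 <- rnorm(n)
neighbour_2 <- rnorm(n)
index_node <- 0.4 * exposure + 0.5 * neighbour_1 + 0.5 * neighbour_2 + rnorm(n)
dat <- data.frame(exposure, index_node, neighbour_1, neighbour_2)

# Every subset of the neighbours, including the empty set.
neighbours <- c("neighbour_1", "neighbour_2")
subsets <- c(
  list(character(0)),
  unlist(
    lapply(seq_along(neighbours), function(k) combn(neighbours, k, simplify = FALSE)),
    recursive = FALSE
  )
)

# Exposure estimate from one linear model per neighbour subset.
estimates <- sapply(subsets, function(adjust_for) {
  model <- lm(reformulate(c("exposure", adjust_for), response = "index_node"), data = dat)
  coef(model)[["exposure"]]
})

estimates
range(estimates)
```

If the exposure estimate keeps the same sign and a similar size no matter which neighbours are adjusted for, the link looks direct in an informal sense. If it shrinks or flips once neighbours are included, the link is more plausibly carried by the network.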

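In the example below, we specifically generated the simulated data so that the exposure is associated with metabolites 1, 8, and 12. In the exposure-side results (exposure_estimates, shown after the outcome-side results), the links to metabolites 1 and 8 are classified as direct, while the link to metabolite 12 is classified as none.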

outcome_estimates <- standardized_data %>%
  nc_estimate_outcome_links(
    edge_tbl = as_edge_tbl(metabolite_network),
    outcome = "outcome_continuous",
    model_function = lm
  )
outcome_estimates

#> # A tibble: 12 × 6
#>    outcome            index_node    estimate std_error fdr_p_value effect   
#>    <chr>              <chr>            <dbl>     <dbl>       <dbl> <chr>    
#>  1 outcome_continuous metabolite_1   0.0466     0.0254   0.124     ambiguous
#>  2 outcome_continuous metabolite_10  0.00449    0.0254   0.947     ambiguous
#>  3 outcome_continuous metabolite_11 -0.00700    0.0254   0.912     none     
#>  4 outcome_continuous metabolite_12  0.350      0.0242   0         direct   
#>  5 outcome_continuous metabolite_2  -0.0280     0.0255   0.424     none     
#>  6 outcome_continuous metabolite_3  -0.0936     0.0252   0.000620  direct   
#>  7 outcome_continuous metabolite_4   0.0267     0.0256   0.453     ambiguous
#>  8 outcome_continuous metabolite_5   0.103      0.0253   0.000167  ambiguous
#>  9 outcome_continuous metabolite_6   0.113      0.0252   0.0000237 ambiguous
#> 10 outcome_continuous metabolite_7   0.00171    0.0255   0.956     none     
#> 11 outcome_continuous metabolite_8   0.0212     0.0253   0.548     none     
#> 12 outcome_continuous metabolite_9   0.201      0.0250   0         direct

exposure_estimates <- standardized_data %>%
  nc_estimate_exposure_links(
    edge_tbl = as_edge_tbl(metabolite_network),
    exposure = "exposure",
    model_function = lm
  )
exposure_estimates

#> # A tibble: 12 × 6
#>    exposure index_node    estimate std_error fdr_p_value effect   
#>    <chr>    <chr>            <dbl>     <dbl>       <dbl> <chr>    
#>  1 exposure metabolite_1   0.173      0.0228      0      direct   
#>  2 exposure metabolite_10  0.318      0.0219      0      direct   
#>  3 exposure metabolite_11  0.0543     0.0232      0.0409 ambiguous
#>  4 exposure metabolite_12  0.0242     0.0231      0.380  none     
#>  5 exposure metabolite_2  -0.0430     0.0231      0.106  ambiguous
#>  6 exposure metabolite_3   0.0411     0.0231      0.123  ambiguous
#>  7 exposure metabolite_4   0.00344    0.0232      0.920  none     
#>  8 exposure metabolite_5   0.0479     0.0232      0.0717 ambiguous
#>  9 exposure metabolite_6  -0.0189     0.0230      0.506  none     
#> 10 exposure metabolite_7  -0.162      0.0229      0      direct   
#> 11 exposure metabolite_8  -0.355      0.0216      0      direct   
#> 12 exposure metabolite_9   0.0571     0.0230      0.0292 ambiguous
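
Since the results are returned as a table with an effect column, a natural next step is to keep only the links classified as direct. The sketch below does this in base R on a small data frame that copies the first four rows of the exposure-side output above, so it runs without NetCoupler. The object names exposure_subset and direct_links are illustrative.

```r
# First four rows of the exposure-side output shown above,
# copied by hand into a plain data frame.
exposure_subset <- data.frame(
  index_node = c("metabolite_1", "metabolite_10", "metabolite_11", "metabolite_12"),
  estimate = c(0.173, 0.318, 0.0543, 0.0242),
  effect = c("direct", "direct", "ambiguous", "none"),
  stringsAsFactors = FALSE
)

# Keep only the links classified as direct effects.
direct_links <- exposure_subset[exposure_subset$effect == "direct", ]
direct_links$index_node
```

The same subsetting should work on the full exposure_estimates or outcome_estimates objects, assuming they keep the effect column shown in the printed output.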

If you want to adjust for confounders and have already used regressed_on in nc_standardize() when estimating the network, also add the confounders to nc_estimate_outcome_links() or nc_estimate_exposure_links() using the adjustment_vars argument:

outcome_estimates <- standardized_data %>%
  nc_estimate_outcome_links(
    edge_tbl = as_edge_tbl(metabolite_network),
    outcome = "outcome_continuous",
    model_function = lm,
    adjustment_vars = "age"
  )

outcome_estimates