In this document, we illustrate the main features of the bkmr R package through examples. Additional information on the statistical methodology and on the computational details is provided in Bobb et al. (2015).

Brief overview of kernel machine regression

Kernel machine regression (KMR), also called Gaussian process regression, is a popular tool in the machine learning literature. The main idea behind KMR is to flexibly model the relationship between a large number of variables and a particular outcome (dependent variable). The general modeling framework we consider here is

\[ g(\mu_i) = h(z_{i1}, \ldots, z_{iM}) + {\bf x}_i^{\rm T}\beta, \quad i = 1, \ldots, n \] where \(g\) is a monotonic link function, \(\mu_i = E(Y_i)\), \(h\) is a flexible function of the predictor variables \(z_{i1}, \ldots, z_{iM}\), and \({\bf x}_i\) is a vector of covariates assumed to have a linear relationship with the outcome (\(\beta\) is the corresponding vector of coefficients). We will refer to the predictors \({\bf z}\) as exposure variables and to \(h(\cdot)\) as the exposure-response function. When there are a large number of exposures, or when the exposure-response function is complex and potentially nonlinear and non-additive, it may be challenging to specify a set of basis functions to represent \(h\). In these settings, an alternative way to characterize \(h\) is through a kernel machine representation.

There are several choices for the kernel function used to represent \(h\) under KMR. Here we focus on the Gaussian kernel, which flexibly captures a wide range of underlying functional forms for \(h\), and can be expressed as \[ K({\bf z}, {\bf z}^\prime) = \exp\left\{ -\sum_{m=1}^M r_m \left( z_m - z_m^\prime \right)^{2}\right\}. \] Here \({\bf z}\) and \({\bf z^\prime}\) represent vectors of predictors for two different individuals, and \(r_m \ge 0\) denotes the tuning parameter that controls the smoothness of \(h\) as a function of the exposure \(z_m\). Intuitively, the kernel function shrinks the estimated health effects of two individuals with similar exposure profiles toward each other.
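
To make the kernel concrete, here is a minimal sketch (not part of the bkmr package) of the Gaussian kernel evaluated for two exposure vectors, with the smoothness parameters \(r_1, \ldots, r_M\) supplied as a vector r:

gauss_kernel <- function(z, zprime, r) {
  # Gaussian kernel: exp(-sum_m r_m * (z_m - z'_m)^2)
  exp(-sum(r * (z - zprime)^2))
}
gauss_kernel(z = c(0.1, 0.5), zprime = c(0.2, 0.3), r = c(1, 1))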

This package includes functions for conducting Bayesian inference for the model above. The main function kmbayes() fits the model. We also provide several functions to summarize the model output in different ways and to visually display the results.

Basic example with component-wise variable selection

First, load the R package.

library(bkmr)

Now, to illustrate the main features of the R package bkmr, let’s first generate some data. We have built a few functions directly into the R package for this purpose.

set.seed(111)
dat <- SimData(n = 50, M = 4)
y <- dat$y
Z <- dat$Z
X <- dat$X

Let’s view the true exposure-response function used to generate the data:

z1 <- seq(min(dat$Z[, 1]), max(dat$Z[, 1]), length = 20)
z2 <- seq(min(dat$Z[, 2]), max(dat$Z[, 2]), length = 20)
hgrid.true <- outer(z1, z2, function(x,y) apply(cbind(x,y), 1, dat$HFun))

res <- persp(z1, z2, hgrid.true, theta = 30, phi = 20, expand = 0.5, 
             col = "lightblue", xlab = "", ylab = "", zlab = "")

Fit BKMR

To fit the BKMR model, we use the kmbayes function. This function implements the Markov chain Monte Carlo (MCMC) algorithm. The argument iter indicates the number of iterations of the MCMC sampler; y is the vector of outcomes; Z is a matrix of exposures (each column is an exposure variable); X is a matrix of covariates (each column is a covariate); verbose indicates whether interim output summarizing the progress of the model fitting should be printed; and varsel indicates whether to conduct variable selection on the predictors \(z_{im}\).

set.seed(111)
fitkm <- kmbayes(y = y, Z = Z, X = X, iter = 10000, verbose = FALSE, varsel = TRUE)

The interim output tells you about the algorithm as it is running, including the number of iterations completed and the amount of time elapsed since the algorithm started. With verbose = TRUE the interim output will additionally include the acceptance rates of parameters that are updated using the Metropolis-Hastings algorithm.

Investigate model convergence

Let’s visually inspect the trace plots, showing how various parameter values change as the sampler runs.

TracePlot(fit = fitkm, par = "beta")

TracePlot(fit = fitkm, par = "sigsq.eps")

TracePlot(fit = fitkm, par = "r", comp = 1)
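
Beyond visual inspection, you can also run standard MCMC diagnostics on the posterior samples. The sketch below assumes that the draws of \(\beta\) are stored as an iterations-by-parameters matrix in the fitkm$beta component (the same component that TracePlot() displays) and uses the coda package:

library(coda)
niter <- nrow(fitkm$beta)
sel <- seq(floor(niter / 2) + 1, niter)  # discard the first half of the chain as burn-in
beta_mcmc <- mcmc(fitkm$beta[sel, , drop = FALSE])
effectiveSize(beta_mcmc)   # effective sample size
geweke.diag(beta_mcmc)     # Geweke convergence diagnostic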

Estimated posterior inclusion probabilities

Because we fit the model with variable selection (we set the argument varsel to be TRUE), we can estimate the posterior inclusion probability (PIP) for each of the exposures \(z_{im}\).

ExtractPIPs(fitkm)
##   variable    PIP
## 1       z1 1.0000
## 2       z2 1.0000
## 3       z3 0.1122
## 4       z4 0.3034

Estimating h

To estimate \(h({\bf z})\) at a particular vector of exposures \({\bf z}\), we note that the posterior distribution of \(h\) is normal with mean \(\mu_h(\theta)\) and variance \(V_h(\theta)\), where \(\theta\) denotes the vector of model parameters (e.g., \(\beta\), \(r_m\)); the specific forms of these mean and variance functions are given in the supplemental material of Bobb et al. (2015).

The bkmr package provides three post-processing functions for estimating \(h({\bf z})\).

  • Approach 1: estimates the posterior mean as \(\mu_h(\hat\theta)\) and the posterior variance as \(V_h(\hat\theta)\) where \(\hat\theta\) is the posterior mean estimate of \(\theta\).
    • This approach is very fast but is only an approximate method
  • Approach 2: estimates the posterior mean as \(E[\mu_h(\theta)]\) by taking the mean of the posterior samples of \(\mu_h(\theta)\), and the posterior variance as \(E[V_h(\theta)] + Var[\mu_h(\theta)]\) by taking the mean of the posterior samples of \(V_h(\theta)\) and the variance of the posterior samples of \(\mu_h(\theta)\) and then summing these quantities.
    • This approach is fast for moderate sized datasets but can be slow for large datasets
  • Approach 3: generates posterior samples of \(h({\bf z})\) by sampling \(h \sim N(\mu_h(\theta), V_h(\theta))\), given particular samples of \(\theta\) from the fitted BKMR model.
    • This approach is the slowest, but allows for estimating the full posterior distribution of \(h({\bf z})\) rather than just the posterior mean and variance.

Approaches 2 and 3 are exact methods in the sense that they will provide unbiased estimates of the posterior summaries if the BKMR model is correctly specified and if the model has converged. To illustrate the three different approaches, here we estimate \(h\) at the median value of the exposures:

med_vals <- apply(Z, 2, median)
Znew <- matrix(med_vals, nrow = 1)
h_true <- dat$HFun(Znew)

h_est1 <- ComputePostmeanHnew(fitkm, Znew = Znew, method = "approx")
h_est2 <- ComputePostmeanHnew(fitkm, Znew = Znew, method = "exact")
set.seed(111)
samps3 <- SamplePred(fitkm, Znew = Znew, Xnew = cbind(0))

h_est_compare <- data.frame(
  method = c("truth", 1:3),
  post_mean = c(h_true, h_est1$postmean, h_est2$postmean, mean(samps3)),
  post_sd = c(NA, sqrt(h_est1$postvar), sqrt(h_est2$postvar), sd(samps3))
)
h_est_compare
##   method post_mean   post_sd
## 1  truth  2.037946        NA
## 2      1  2.310482 0.1647542
## 3      2  2.283688 0.2425458
## 4      3  2.282308 0.2417609

We see that the posterior mean estimates are similar for all of the approaches, with the two exact methods (2 and 3) having very similar posterior standard deviation (SD) estimates. To speed up computation for methods 2 and 3, one can select which iterations from the original model fit to use, using the sel argument.
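
For example, here is a sketch of using the sel argument with the exact method, keeping every 10th iteration from the second half of the chain (the particular burn-in and thinning choices are illustrative assumptions, not recommendations):

sel_thin <- seq(5001, 10000, by = 10)  # every 10th iteration after discarding the first half
h_est2_thin <- ComputePostmeanHnew(fitkm, Znew = Znew, method = "exact", sel = sel_thin)
h_est2_thin$postmean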

Summarize model output

Let’s now explore the different functions included in the bkmr package that can be used to summarize the model output. These include functions to visualize different cross-sections of the exposure-response surface, as well as to calculate a variety of summary statistics that may be of scientific interest.

Plot the predictor-response function

Once we have fit the BKMR model, we often would like to visualize \(h(\cdot)\). Because we can’t view a high-dimensional surface, we instead look at different cross-sections of this surface. We do this by focusing on the relationships of 1 or 2 exposures with the outcome, setting the remaining exposures to specific values.

One cross section of interest is the univariate relationship between each \(z_m\) and the outcome, where all of the other exposures are fixed to a particular percentile. This can be done using the function PredictorResponseUnivar. The argument specifying the quantile at which to fix the other exposures is given by q.fixed (the default value is q.fixed = 0.5).

pred.resp.univar <- PredictorResponseUnivar(fit = fitkm)

We use the ggplot2 package to plot the resulting cross section of \(h\).

library(ggplot2)
ggplot(pred.resp.univar, aes(z, est, ymin = est - 1.96*se, ymax = est + 1.96*se)) + 
    geom_smooth(stat = "identity") + 
    facet_wrap(~ variable) +
  ylab("h(z)")

Here we see that for \(z_1\) and \(z_2\), increasing values of the exposures are associated with increasing values of the outcome \(Y\) (with each of the other predictors in \(Z\) fixed at its 50th percentile and the covariates in \({\bf x}\) held constant). It also looks like \(z_1\) may have a nonlinear relationship with \(Y\), with a similar suggestion of nonlinearity for \(z_2\).

Building upon the previous example, we can similarly visualize the bivariate exposure-response function for two predictors, where all of the other predictors are fixed at a particular percentile.

pred.resp.bivar <- PredictorResponseBivar(fit = fitkm, min.plot.dist = 1)

This output can be used to create image or contour plots (e.g., by using the geom_raster function from the ggplot2 package).

ggplot(pred.resp.bivar, aes(z1, z2, fill = est)) + 
    geom_raster() + 
    facet_grid(variable2 ~ variable1) +
  scale_fill_gradientn(colours=c("#0000FFFF","#FFFFFFFF","#FF0000FF")) +
  xlab("expos1") +
  ylab("expos2") +
  ggtitle("h(expos1, expos2)")

Because it can be hard to see what’s going on in these types of image plots, an alternative approach is to investigate the predictor-response function of a single predictor in Z with the second predictor in Z fixed at various quantiles (and the remaining predictors fixed to a particular value). These can be obtained using the PredictorResponseBivarLevels function, which takes as input the bivariate exposure-response function output from the previous command; the argument qs specifies a sequence of quantiles at which to fix the second predictor.

pred.resp.bivar.levels <- PredictorResponseBivarLevels(
  pred.resp.df = pred.resp.bivar, Z = Z, qs = c(0.1, 0.5, 0.9))
ggplot(pred.resp.bivar.levels, aes(z1, est)) + 
    geom_smooth(aes(col = quantile), stat = "identity") + 
    facet_grid(variable2 ~ variable1) +
    ggtitle("h(expos1 | quantiles of expos2)") +
  xlab("expos1")

Summary statistics of the predictor-response function

In addition to visually inspecting the estimated predictor-response function \(h\), one may also wish to calculate a range of summary statistics that highlight specific features of the (potentially) high-dimensional surface. One summary measure of interest is the overall effect of the predictors, computed by comparing the value of \(h\) when all of the predictors are at a particular percentile to the value when all of them are at their 50th percentile. The function OverallRiskSummaries allows one to specify a sequence of quantiles using the argument qs and the fixed quantile (the default is the 50th percentile) using the argument q.fixed.

risks.overall <- OverallRiskSummaries(fit = fitkm, y = y, Z = Z, X = X, 
                                      qs = seq(0.25, 0.75, by = 0.05), 
                                      q.fixed = 0.5, method = "exact")
risks.overall
##    quantile         est         sd
## 1      0.25 -1.24022859 0.20999691
## 2      0.30 -1.07409824 0.17907496
## 3      0.35 -0.57098339 0.10056121
## 4      0.40 -0.37647134 0.06862383
## 5      0.45 -0.06985162 0.02041499
## 6      0.50  0.00000000 0.00000000
## 7      0.55  0.28945731 0.05089159
## 8      0.60  0.51300959 0.09587608
## 9      0.65  0.67740749 0.13012125
## 10     0.70  0.88153349 0.16111132
## 11     0.75  1.10039068 0.19844371

We can also plot the overall risk summaries; here we use the ggplot2 package.

ggplot(risks.overall, aes(quantile, est, ymin = est - 1.96*sd, ymax = est + 1.96*sd)) + 
    geom_pointrange()

Another summary of \(h\) that may be of interest is the contribution of an individual predictor to the response. For example, we may wish to compare the risk when a single predictor is at the 75th percentile to the risk when that predictor is at its 25th percentile, with all of the remaining predictors fixed at a particular percentile. We refer to these as the single-predictor health risks, and they can be computed using the function SingVarRiskSummaries. The two quantiles at which to compare the risk are specified using the qs.diff argument, and a sequence of values at which to fix the remaining pollutants can be specified using the q.fixed argument.

risks.singvar <- SingVarRiskSummaries(fit = fitkm, y = y, Z = Z, X = X, 
                                      qs.diff = c(0.25, 0.75), 
                                      q.fixed = c(0.25, 0.50, 0.75),
                                      method = "exact")
risks.singvar
## # A tibble: 12 × 4
##    q.fixed variable           est         sd
##     <fctr>   <fctr>         <dbl>      <dbl>
## 1     0.25       z1  0.9900863616 0.27305455
## 2     0.25       z2  0.6538145079 0.25585205
## 3     0.25       z3  0.0013583345 0.07331792
## 4     0.25       z4 -0.0028078663 0.11460949
## 5      0.5       z1  1.3820953638 0.26906484
## 6      0.5       z2  1.1379994710 0.23862341
## 7      0.5       z3 -0.0008100762 0.05338260
## 8      0.5       z4  0.0365624832 0.10957626
## 9     0.75       z1  1.6852732951 0.32844982
## 10    0.75       z2  1.3105576463 0.27502934
## 11    0.75       z3 -0.0047191015 0.06051743
## 12    0.75       z4  0.0481485003 0.12991681

Here we see, for example, that the change in \(h\) associated with a change in the predictor \(z_1\) from its 25th to its 75th percentile, with the predictors \(z_2\), \(z_3\), and \(z_4\) fixed at their 75th percentile (and the covariates \(x\) held fixed), is estimated to be 1.69. It is easier to investigate trends in the estimates by plotting the results:

ggplot(risks.singvar, aes(variable, est, ymin = est - 1.96*sd, 
                          ymax = est + 1.96*sd, col = q.fixed)) + 
    geom_pointrange(position = position_dodge(width = 0.75)) + 
  coord_flip()

We see that the predictors \(z_3\) and \(z_4\) do not contribute to the risk, and that higher values of \(z_1\) and \(z_2\) are associated with higher values of the \(h\) function. In addition, the plot suggests that for \(z_1\), as the remaining predictors increase in value from their 25th to their 75th percentile, the risk of the outcome associated with \(z_1\) increases. A similar pattern occurs for \(z_2\). This indicates the potential for interaction of \(z_1\) and \(z_2\).

To make this notion a bit more formal, we may wish to compute specific ‘interaction’ parameters. For example, we may wish to compare the single-predictor health risks when all of the other predictors in Z are fixed to their 75th percentile to when all of the other predictors in Z are fixed to their 25th percentile. In the previous plot, this corresponds to subtracting the estimate represented by the red circle from the estimate represented by the blue circle. This can be done using the function SingVarIntSummaries.

risks.int <- SingVarIntSummaries(fit = fitkm, y = y, Z = Z, X = X, 
                                 qs.diff = c(0.25, 0.75), 
                                 qs.fixed = c(0.25, 0.75),
                                 method = "exact")
risks.int
## # A tibble: 4 × 3
##   variable          est         sd
##     <fctr>        <dbl>      <dbl>
## 1       z1  0.695186934 0.31647915
## 2       z2  0.656743138 0.31785138
## 3       z3 -0.006077436 0.09010269
## 4       z4  0.050956367 0.16267033

We see that the single-predictor risk associated with a change in \(z_1\) from its 25th to its 75th percentile is larger by 0.7 when \(z_2\) through \(z_4\) are fixed at their 75th percentile than when \(z_2\) through \(z_4\) are fixed at their 25th percentile.

Simple example with hierarchical variable selection

In the example above, setting the argument varsel = TRUE in the kmbayes function led the function to conduct variable selection on each of the individual exposures (\(z_{m}\)), which allows you to estimate the posterior inclusion probability for each exposure.

A situation where this approach may not work well is when two or more of the exposures are highly correlated with each other. Let’s next consider an example where two exposures are very correlated:

set.seed(111)
d2 <- SimData(n = 100, M = 4, Zgen = "corr", sigsq.true = 2.2)
round(cor(d2$Z), 2)
##       z1    z2    z3    z4
## z1  1.00  0.25  0.96 -0.06
## z2  0.25  1.00  0.42 -0.06
## z3  0.96  0.42  1.00 -0.04
## z4 -0.06 -0.06 -0.04  1.00

In this situation, exposures \(z_1\) and \(z_3\) are highly correlated. An alternative to component-wise variable selection is hierarchical variable selection, where you categorize the exposure variables into non-overlapping groups. There are then two levels of variable selection. At the first level, variable selection is done at the group level. At the second level, for those groups that are selected into the model, variable selection is done on the exposures within the group. The groups may be selected by using prior knowledge on the structure of how the variables are related. For example, exposures representing components of the air pollution mixture may be grouped using information on pollutant sources. Another possibility for defining exposure groups is based on the empirical correlation structure. In our simulated dataset, we will use this latter approach to group the highly correlated exposures \(z_1\) and \(z_3\) into one group (call it Group 1). We will keep each of the other exposures in a single-variable group by itself (e.g., \(z_2\) will be Group 2, and \(z_4\) will be Group 3). To implement this in the kmbayes function, add the argument groups = c(1,2,1,3), where the group number of each exposure variable is listed in the order that the variables appear as columns in the Z matrix argument. If the groups argument is not specified, the default component-wise variable selection is conducted when varsel = TRUE.

Here we apply both the original component-wise variable selection and the hierarchical variable selection.

set.seed(111)
fitkm_corr <- kmbayes(y = d2$y, Z = d2$Z, X = d2$X, iter = 10000, varsel = TRUE, verbose = FALSE)
fitkm_hier <- kmbayes(y = d2$y, Z = d2$Z, X = d2$X, iter = 10000, varsel = TRUE, 
                      groups = c(1,2,1,3), verbose = FALSE)

Let’s compare the posterior inclusion probabilities (PIPs) estimated from the two approaches.

ExtractPIPs(fitkm_corr)
##   variable    PIP
## 1       z1 0.8076
## 2       z2 1.0000
## 3       z3 0.6702
## 4       z4 0.4252
ExtractPIPs(fitkm_hier)
##   variable group groupPIP   condPIP
## 1       z1     1   0.9976 0.6046512
## 2       z2     2   1.0000 1.0000000
## 3       z3     1   0.9976 0.3953488
## 4       z4     3   0.3896 1.0000000

For the component-wise variable selection fit (fitkm_corr), the PIPs clearly indicate that \(z_2\) should be included in the model, but there is not strong evidence that \(z_1\) should be included. On the other hand, for the hierarchical variable selection fit (fitkm_hier), we estimated a group-specific PIP for Group 1 of nearly 1, providing evidence that at least one of the exposures in Group 1 should be included. Of the exposures \(z_1\) and \(z_3\) within Group 1, the results suggest that \(z_1\) is more likely to be associated with the outcome. Based on these results, we might wrongly conclude from the component-wise variable selection approach that only \(z_2\) is associated with the outcome, whereas the hierarchical variable selection approach would lead us to correctly identify both \(z_1\) and \(z_2\) as predictors of the outcome.

As in the example above, we can generate visual displays of the exposure-response function and numerical summaries of this surface that are of scientific interest.
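
For example, a short sketch (using the same calls demonstrated above for the first example) that computes and plots overall risk summaries from the hierarchical fit:

risks.overall.hier <- OverallRiskSummaries(fit = fitkm_hier, y = d2$y, Z = d2$Z, X = d2$X, 
                                           qs = seq(0.25, 0.75, by = 0.05), 
                                           q.fixed = 0.5, method = "exact")
ggplot(risks.overall.hier, aes(quantile, est, ymin = est - 1.96*sd, ymax = est + 1.96*sd)) + 
    geom_pointrange()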

Changing the tuning parameters for fitting the algorithm

The BKMR model is fit using Markov chain Monte Carlo. Most of the parameters are updated using Gibbs steps, except the \(r_m\) and \(\lambda\) parameters, which are updated using Metropolis-Hastings (M-H) steps. These M-H steps use a random walk proposal distribution centered at the current value. Because the \(r_m\) and \(\lambda\) parameters are \(\geq 0\), we use a gamma proposal distribution with mean equal to the current value. The standard deviation (SD) of the proposal distribution is a tuning parameter that can be specified by the user in order to obtain good acceptance rates. Acceptance rates that are too high lead to slow mixing of the chain, while acceptance rates that are too low prevent the algorithm from fully exploring the parameter space. Thus, good acceptance rates will enable the algorithm to converge more quickly to the target distribution. Generally, increasing the SD of the proposal distribution leads to lower acceptance rates and decreasing the SD leads to higher acceptance rates. Acceptance rates can be monitored as the algorithm is running by changing the verbose argument of the main kmbayes function.

To change the values of the tuning parameters, you can specify the control.params argument, which is a list with named components.

data.frame(fitkm$control.params)
##   lambda.jump mu.lambda sigma.lambda a.p0 b.p0 r.prior a.sigsq b.sigsq
## 1          10        10           10    1    1 invunif   0.001   0.001
##   mu.r sigma.r r.muprop r.jump r.jump1 r.jump2 r.a r.b
## 1    5       5        1    0.1       2     0.1   0 100

Some of these components are tuning parameters; others are hyperparameters for the prior distributions (see next section).

The tuning parameter lambda.jump is the standard deviation of the proposal distribution for \(\lambda = \tau/\sigma^2\), where \(\tau\) is a variance component in the kernel matrix that controls the overall smoothness of the exposure-response function; the tuning parameter r.jump is the standard deviation of the proposal distribution for the \(r_m\) parameters when there is no variable selection (varsel = FALSE).

When there is variable selection (either component-wise or hierarchical), there are two different types of proposal distributions, corresponding to the two different moves of the M-H algorithm. Move 1 occurs when a variable, or a variable group under hierarchical variable selection, goes from being in the model (\(r_m > 0\)) to not being included in the model (\(r_m = 0\)) or vice versa. The tuning parameter r.jump1 is the standard deviation of the gamma proposal distribution under move 1, where variable \(m\) goes from not being in the model to being included in the model, and the tuning parameter r.muprop corresponds to the mean of the gamma proposal distribution. Move 2 is the refining step where a variable that is included in the model gets updated. The tuning parameter r.jump2 is the standard deviation of the random walk proposal distribution under move 2.

The following summarizes these tuning parameters.

  • lambda.jump (for \(\lambda\); used for all models): specifies the standard deviation (SD) of the proposal distribution for \(\lambda = \tau/\sigma^2\), where \(\tau\) is a variance component in the kernel matrix that controls the overall smoothness of the exposure-response function. Note: when the model includes a random intercept, \(\lambda\) is a vector with 2 components; in this case, lambda.jump must also be a vector with 2 components.
  • r.jump (for \(r_m\); when varsel = FALSE): specifies the SD of the proposal distribution for \(r_m\).
  • r.jump1 (for \(r_m\); when varsel = TRUE): specifies the SD of the proposal distribution for \(r_m\) under the move of the M-H algorithm where the exposure variable (or group of exposure variables under hierarchical variable selection) goes from not being in the model to being in the model (the “switching” step).
  • r.jump2 (for \(r_m\); when varsel = TRUE): specifies the SD of the proposal distribution for \(r_m\) under the move where a new value is proposed for a variable that stays in the model (the “refinement” step).
  • r.muprop (for \(r_m\); when varsel = TRUE): specifies the mean of the proposal distribution for \(r_m\) under the move where the variable goes from not being in the model to being in the model.
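
As an example, here is a sketch of re-fitting the first model with modified tuning parameters passed through control.params. The particular values are illustrative only, and this assumes that components omitted from the list keep their default values:

fitkm_tuned <- kmbayes(y = y, Z = Z, X = X, iter = 10000, verbose = FALSE, varsel = TRUE,
                       control.params = list(lambda.jump = 20, r.jump2 = 0.2))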

Changing the prior distributions

The BKMR fit depends on the specification of prior distributions. When we ran the kmbayes function to fit the models for the two simulated examples, we didn’t change any of the default settings, so the default prior specifications were used.

The prior distributions can be specified as part of the control.params argument.

data.frame(fitkm$control.params)
##   lambda.jump mu.lambda sigma.lambda a.p0 b.p0 r.prior a.sigsq b.sigsq
## 1          10        10           10    1    1 invunif   0.001   0.001
##   mu.r sigma.r r.muprop r.jump r.jump1 r.jump2 r.a r.b
## 1    5       5        1    0.1       2     0.1   0 100

Some of these components are hyperparameters for the prior distributions; others are tuning parameters for running the algorithm (see previous section).

The following summarizes the choices for specifying the prior distributions.

  • a.sigsq, b.sigsq (for \(\sigma^2\); when family = gaussian): shape and rate parameters for the gamma prior.
  • mu.lambda, sigma.lambda (for \(\lambda\)): mean and SD for the gamma prior. Note: when the model includes a random intercept, \(\lambda\) is a vector with 2 components; in this case, mu.lambda and sigma.lambda must each be vectors with 2 components.
  • r.prior (for \(r_m\); used for all models): specifies which family of prior distributions to use for the \(r_m\) parameters. Current options are “gamma”, “invunif”, and “unif”.
  • mu.r, sigma.r (for \(r_m\); when control.params$r.prior = "gamma"): mean and SD for the gamma prior.
  • r.a, r.b (for \(r_m\); when control.params$r.prior = "unif" or "invunif"): lower and upper bounds for the uniform prior on \(r_m\) (when r.prior = "unif") or on \(1/r_m\) (when r.prior = "invunif").
  • a.p0, b.p0 (for \(\pi\); when varsel = TRUE): shape parameters for the beta prior.

Because the model fit can be particularly sensitive to the choice of the prior distribution on the \(r_m\) parameters, we have implemented three different families of prior distributions from which the user can choose. We recommend checking the sensitivity of the results to the choice of the prior distribution. Details for how to do this are in the next section.
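
For example, one sketch of a simple sensitivity check is to re-fit the model with a gamma prior on the \(r_m\) parameters and compare the resulting PIPs and exposure-response summaries to those from the default fit. The hyperparameter values below are illustrative assumptions, and components omitted from control.params are assumed to keep their defaults:

fitkm_rgamma <- kmbayes(y = y, Z = Z, X = X, iter = 10000, verbose = FALSE, varsel = TRUE,
                        control.params = list(r.prior = "gamma", mu.r = 5, sigma.r = 5))
ExtractPIPs(fitkm_rgamma)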

Prior distributions for the \(r_m\) parameters

The \(r_m\) parameters control how smooth the exposure-response function is as a function of the variable \(z_m\). To investigate how these parameters affect the model fit, we can use the InvestigatePrior function.

priorfits <- InvestigatePrior(y = y, Z = Z, X = X, 
                              q.seq = c(2, 1/2, 1/4, 1/16))

This function fits separate, univariate KMR models (with the Gaussian kernel) to each of the exposure variables \(z_m\), fixing the \(r_m\) parameters to different values. You can specify values of \(r_m\) directly using the r.seq argument. Alternatively, you can specify a sequence of \(q\) values using the q.seq argument, which correspond to specific \(r_m\) values. This is easiest to see from an example.

We can use the PlotPriorFits function to visualize the role of the \(r_m\) parameters.

PlotPriorFits(y = y, Z = Z, X = X, 
              fits = priorfits)

The top row of the plot shows the correlation between two individuals’ values \(h_i\) and \(h_j\) as a function of their “distance” in exposure to component \(z_m\), which is given by \(\mbox{cor}(h_{i},h_{j}) \propto\exp\left\{-r\cdot d_{ij}^{2}\right\}\), for different values of \(r\), where \(d_{ij} = |z_{i} - z_{j}|\). Rows 2–5 show, for each exposure variable \(z_1,\ldots,z_4\) in our first simulated example, the estimated \(h\) from the univariate KMR model across different values of \(r_m\). These values of \(r_m\) are calculated from the q.seq argument, and correspond to a decay in the correlation by 50% over different fractions \(q\) of the range of the exposure data \(z_m\).
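
As a rough sketch of this mapping (a reconstruction based on the description above, which may differ in detail from what InvestigatePrior does internally), the value of \(r\) at which the correlation decays by 50% over a fraction \(q\) of the range of an exposure satisfies \(\exp\{-r (q \cdot \mbox{range})^2\} = 0.5\):

q.seq <- c(2, 1/2, 1/4, 1/16)
d <- q.seq * diff(range(Z[, 1]))  # distances equal to a fraction q of the range of z1
r.seq <- log(2) / d^2             # solve exp(-r * d^2) = 0.5 for r
round(r.seq, 4)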

We can see that large values of \(r_m\) correspond to an estimated exposure-response function that is likely overfit, whereas the smallest values of \(r_m\) may oversmooth the data. Using these functions can help investigators specify prior distributions that are informed by prior knowledge on how smooth the exposure-response function is expected to be. They can also inform which prior distributions to select for conducting sensitivity analyses that investigate the impact of the prior specification on the results.

References

Bobb JF, Valeri L, Claus Henn B, Christiani DC, Wright RO, Mazumdar M, Godleski JJ, Coull BA. Bayesian kernel machine regression for estimating the health effects of multi-pollutant mixtures. Biostatistics. 2015;16(3):493–508. doi:10.1093/biostatistics/kxu058