Title: | Genomic Mediation Analysis with Adaptive Confounding Adjustment |
---|---|
Description: | Performs genomic mediation analysis with adaptive confounding adjustment (GMAC) proposed by Yang et al. (2017) <doi:10.1101/078683>. It implements large-scale mediation analysis and adaptively selects potential confounding variables to adjust for in each mediation test from a pool of candidate confounders. The package is tailored for, but not limited to, genomic mediation analysis (e.g., the cis-gene mediating trans-gene regulation pattern, where an eQTL, its cis-linking gene transcript, and its trans-gene transcript play the roles of treatment, mediator and outcome, respectively). It is restricted to scenarios with cis-association (i.e., treatment-mediator association) and a random eQTL (i.e., treatment). |
Authors: | Fan Yang, Jiebiao Wang and Lin Chen |
Maintainer: | Jiebiao Wang <[email protected]> |
License: | GPL |
Version: | 3.1 |
Built: | 2025-01-30 04:36:36 UTC |
Source: | https://github.com/cran/GMAC |
This simulated data list is for demonstration.
A list containing
known.conf |
The matrix of known confounders, which are adjusted for in all mediation tests. Each row is a confounder, each column is a sample. |
cov.pool |
The pool of candidate confounding variables from which potential confounders are adaptively selected to adjust for each mediation test. Each row is a covariate, each column is a sample. |
exp.dat |
The gene expression matrix. Each row is for one gene, each column is a sample. |
snp.dat.cis |
The cis-eQTL genotype matrix. Each row is an eQTL, each column is a sample. |
trios.idx |
The matrix of selected trios indexes (row numbers) for mediation tests. Each row consists of the index (i.e., row number) of the eQTLs in |
data(example)
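To make the layout described above concrete, the following sketch builds a small toy list with the same element names and checks that the pieces fit together. The sizes, the random values, the column names of `trios.idx`, and the assumed column order (eQTL row, cis-gene row, trans-gene row) are illustrative assumptions, not taken from the packaged data.

```r
set.seed(1)
n_samples <- 20   # hypothetical sample size
n_snps    <- 5    # hypothetical number of cis-eQTLs
n_genes   <- 8    # hypothetical number of genes

toy <- list(
  known.conf  = matrix(rnorm(2 * n_samples), nrow = 2),
  cov.pool    = matrix(rnorm(4 * n_samples), nrow = 4),
  exp.dat     = matrix(rnorm(n_genes * n_samples), nrow = n_genes),
  snp.dat.cis = matrix(rbinom(n_snps * n_samples, 2, 0.3), nrow = n_snps),
  # assumed column order: eQTL row, cis-gene row, trans-gene row
  trios.idx   = cbind(eqtl = c(1, 2, 3), cis = c(1, 2, 3), trans = c(6, 7, 8))
)

# every matrix stores samples in columns, so the column counts must agree
n_cols <- sapply(toy[c("known.conf", "cov.pool", "exp.dat", "snp.dat.cis")], ncol)
stopifnot(length(unique(n_cols)) == 1)

# trio indices must point at existing rows
stopifnot(all(toy$trios.idx[, "eqtl"] <= nrow(toy$snp.dat.cis)),
          all(toy$trios.idx[, c("cis", "trans")] <= nrow(toy$exp.dat)))
```

Checks of this kind are cheap and catch the most common input mistake, which is passing a matrix with samples in rows rather than columns.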
The gmac function performs genomic mediation analysis with adaptive confounding adjustment. It tests for mediation effects for a set of user-specified mediation trios (e.g., eQTL, cis- and trans-genes) in the genome, assuming the presence of cis-association. As the pool of potential confounders, the gmac function uses either a user-provided pool of variables (real or constructed by other methods) or all the PCs based on the expression data. It returns the mediation p-values and the proportions mediated (e.g., the percentage of reduction in trans-effects after accounting for cis-mediation), based on the mediation tests i) adjusting for known confounders only, and ii) adjusting for known confounders and adaptively selected potential confounders for each mediation trio. It also provides plots of mediation p-values (on the -log10 scale) versus the proportions mediated based on the above two adjustments.
gmac( cl = NULL, known.conf, cov.pool = NULL, exp.dat, snp.dat.cis, trios.idx, nperm = 10000, fdr = 0.05, fdr_filter = 0.1, nominal.p = FALSE )
cl |
A parallel backend, if one has been set up, used for parallel computing. The default is NULL. |
known.conf |
A matrix of known confounders, which are adjusted for in all mediation tests. Each row is a confounder, each column is a sample. |
cov.pool |
The pool of candidate confounding variables from which potential confounders are adaptively selected to adjust for each mediation test. Each row is a covariate, each column is a sample. |
exp.dat |
A gene expression matrix. Each row is for one gene, each column is a sample. |
snp.dat.cis |
The cis-eQTL genotype matrix. Each row is an eQTL, each column is a sample. |
trios.idx |
A matrix of selected trios indexes (row numbers) for mediation tests. Each row consists of the index (i.e., row number) of the eQTL in |
nperm |
The number of permutations for testing mediation. |
fdr |
The false discovery rate to select confounders. We set |
fdr_filter |
The false discovery rate to filter common child and intermediate variables. We set |
nominal.p |
A logical option to obtain the nominal p-value instead of the permutation-based p-value. The permutation-based p-value is the default (nominal.p = FALSE). |
In genomic studies, a large number of mediation tests are often performed, and it is challenging to adjust for unmeasured confounding effects for the cis- and trans-genes (i.e., mediator-outcome) relationship. The current function adaptively selects the variables to adjust for in each mediation trio, given a large pool of constructed or real potential confounding variables. The function allows the input of variables known to be potential cis- and trans-genes (mediator-outcome) confounders in all mediation tests (`known.conf`), and the input of the pool of candidate confounders from which potential confounders for each mediation test will be adaptively selected (`cov.pool`). When no pool is provided (`cov.pool = NULL`), all the PCs based on expression data (`exp.dat`) will be constructed as the potential confounder pool.
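As a rough illustration of what a PC-based confounder pool looks like, the sketch below computes principal components from a toy expression matrix with base R's `prcomp` and transposes them into the covariates-by-samples orientation used for `cov.pool`. The toy matrix and its dimensions are assumptions, and the package's internal PC construction may differ in its details (for example, in centering or scaling).

```r
set.seed(2)
n_genes   <- 8    # hypothetical number of genes
n_samples <- 20   # hypothetical number of samples

# toy expression matrix: genes in rows, samples in columns
exp_toy <- matrix(rnorm(n_genes * n_samples), nrow = n_genes)

# prcomp expects observations (samples) in rows, hence the transpose
pc <- prcomp(t(exp_toy), scale. = TRUE)

# pc$x is samples x PCs; transpose so that each row is a covariate (PC)
# and each column is a sample
cov_pool_toy <- t(pc$x)
dim(cov_pool_toy)
```

The number of PCs is at most the smaller of the number of genes and the number of samples, so the pool grows with the data rather than being fixed in advance.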
The algorithm assumes the presence of cis-association (treatment-mediator association), a random eQTL (treatment), and the standard identification assumption in the causal mediation literature that no effect of the eQTL (treatment) confounds the cis- and trans-genes (mediator-outcome) relationship. The algorithm first filters out common child (Figure 1.B) and intermediate variables (Figure 1.C) from `cov.pool` for each mediation trio at a pre-specified FDR threshold (`fdr_filter`), using their associations with the eQTL (treatment). Then, the confounder (Figure 1.A) set for each mediation trio is selected from the retained pool of candidate variables using a stratified FDR approach. Specifically, for each trio, the p-value of association between each candidate variable and the cis-gene (mediator) and trans-gene (outcome) pair is obtained from the F-test for the joint association with either the cis-gene (mediator) or the trans-gene (outcome). For each candidate variable, a pre-specified FDR threshold (`fdr`) is applied to the p-values corresponding to the joint associations of this variable with all the potential mediation trios. Lastly, mediation is tested for each mediation trio. Adjusting for the adaptively selected confounder set, we calculate the mediation statistic as the Wald statistic for testing the indirect mediation effect ($H_0: \beta_1 = 0$) based on the regression

$$T_j = \beta_0 + \beta_1 C_i + \beta_2 L_i + \tau X_{ij} + \epsilon$$

where $L_i$, $C_i$, $T_j$ and $X_{ij}$ are the eQTL genotype (treatment), the cis-gene expression level (mediator), the trans-gene expression level (outcome) and the selected set of potential confounding variables. P-values are calculated based on within-genotype group permutation of the cis-gene expression level, which maintains the cis- and trans-associations while breaking the potential mediation effect from the cis- to the trans-gene transcript.
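The mediation test described above can be sketched for a single simulated trio in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the variable names (`L` for the eQTL genotype, `C` for the cis-gene, `T_out` for the trans-gene, `X` for one selected confounder), the effect sizes, the helper functions `med_stat` and `perm_within`, and the `(1 + count) / (nperm + 1)` p-value convention are all hypothetical choices.

```r
set.seed(42)
n <- 200
L <- rbinom(n, 2, 0.4)                  # eQTL genotype (treatment)
X <- rnorm(n)                           # one selected confounder
C <- 0.8 * L + 0.5 * X + rnorm(n)       # cis-gene expression (mediator)
T_out <- 0.6 * C + 0.5 * X + rnorm(n)   # trans-gene expression (outcome)

# Wald (t) statistic for the mediator coefficient in T ~ C + L + X
med_stat <- function(C, L, T_out, X) {
  fit <- lm(T_out ~ C + L + X)
  coef(summary(fit))["C", "t value"]
}

obs <- med_stat(C, L, T_out, X)

# Permute C within each genotype group: the C-L (cis) association is kept,
# L and T_out are untouched, and only the C-to-T link is broken.
perm_within <- function(C, L) {
  out <- C
  for (g in unique(L)) {
    idx <- which(L == g)
    out[idx] <- C[idx][sample.int(length(idx))]
  }
  out
}

nperm <- 200
null_stats <- replicate(nperm, med_stat(perm_within(C, L), L, T_out, X))

# two-sided permutation p-value with the common +1 correction
p_perm <- (1 + sum(abs(null_stats) >= abs(obs))) / (nperm + 1)
```

Permuting within genotype groups rather than across all samples is the key design choice: a global shuffle would also destroy the cis-association, so the null distribution would no longer match the assumed setting.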
Figure 1. Graphical illustrations of (A) a potential mediation relationship among an eQTL $L_i$, its cis-gene transcript $C_i$, and a trans-gene transcript $T_j$, with confounders $X_{ij}$ (i.e., variables affecting both $C_i$ and $T_j$), allowing $L_i$ to affect $T_j$ via a pathway independent of $C_i$. For the mediation effect tests to have a causal interpretation, adjustment must be made for the confounders. (B) A potential mediation trio with common child variables (i.e., variables affected by both $C_i$ and $T_j$). Adjusting for common child variables in mediation analysis would "marry" $C_i$ and $T_j$ and make $C_i$ appear to regulate $T_j$ even if there is no such effect. (C) A potential mediation trio with intermediate variables (i.e., variables affected by $C_i$ and affecting $T_j$). Adjusting for intermediate variables in mediation analysis would prevent the detection of the true mediation effect from $C_i$ to $T_j$.
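The filtering and selection logic can be sketched for one simulated trio. The example assumes three hypothetical candidate variables (a true confounder, a common child, and pure noise), uses Benjamini-Hochberg adjustment from base R as a stand-in for the q-value method, and applies the FDR across candidates within a single trio. The stratified FDR across all trios described in the text is not reproduced here, and the direction of the F-test regression is an assumption.

```r
set.seed(7)
n <- 300
L <- rbinom(n, 2, 0.4)                  # eQTL genotype for one trio
U <- rnorm(n)                           # a true mediator-outcome confounder
C <- 0.8 * L + 0.7 * U + rnorm(n)       # cis-gene expression
T_out <- 0.5 * C + 0.7 * U + rnorm(n)   # trans-gene expression

pool <- rbind(
  conf  = U,                             # affects C and T, independent of L
  child = C + T_out + rnorm(n),          # affected by both C and T
  noise = rnorm(n)                       # unrelated to the trio
)

# Step 1: association of each candidate with the eQTL. Variables downstream
# of the cis-gene inherit an association with L; true confounders do not.
p_eqtl <- apply(pool, 1, function(v) coef(summary(lm(v ~ L)))["L", "Pr(>|t|)"])
fdr_filter <- 0.1
keep <- p.adjust(p_eqtl, method = "BH") >= fdr_filter
retained <- pool[keep, , drop = FALSE]

# Step 2: overall F-test for joint association of each retained candidate
# with the cis- and trans-gene (small p-value = candidate confounder)
p_joint <- apply(retained, 1, function(v) {
  f <- summary(lm(v ~ C + T_out))$fstatistic
  unname(pf(f[1], f[2], f[3], lower.tail = FALSE))
})
```

The filter relies on the random-eQTL assumption: because nothing in the pool is allowed to cause the genotype, any candidate associated with the eQTL must lie downstream of it and is therefore unsafe to adjust for.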
The algorithm returns the mediation p-values (`pvals`) and the proportions mediated (`beta.change`, i.e., the percentage of reduction in trans-effects after accounting for cis-mediation), based on the mediation tests i) adjusting for known confounders only, and ii) adjusting for known confounders and adaptively selected potential confounders for each mediation trio. It also returns an indicator matrix for the selected potential confounders (`sel.conf.ind`). Plots of mediation p-values (on the -log10 scale) versus the proportions mediated based on adjustments i) and ii) are provided. The plots can also serve as a diagnostic check for the sufficiency of confounding adjustment in scenarios such as the cis-gene mediating trans-gene regulation pattern, where we expect the trios with very significant mediation p-values to have positive proportions mediated. Therefore, a J-shaped pattern is expected when most, if not all, confounding effects have been well adjusted, whereas a U-shaped pattern may indicate the presence of unadjusted confounders.
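One common way to express a proportion mediated is the relative drop in the eQTL's effect on the trans-gene once the cis-gene is added to the model. The sketch below computes it for a simulated trio in which the trans-effect runs entirely through the cis-gene. The simulation settings and variable names are assumptions, and the package's exact definition of `beta.change` may differ from this formula.

```r
set.seed(3)
n <- 500
L <- rbinom(n, 2, 0.4)                  # eQTL genotype (treatment)
X <- rnorm(n)                           # an adjusted confounder
C <- 0.8 * L + 0.5 * X + rnorm(n)       # cis-gene expression (mediator)
T_out <- 0.7 * C + 0.5 * X + rnorm(n)   # trans-effect fully mediated by C

# trans-effect of the eQTL without and with adjustment for the cis-gene
total  <- coef(lm(T_out ~ L + X))[["L"]]
direct <- coef(lm(T_out ~ L + C + X))[["L"]]

# relative reduction in the trans-effect after accounting for cis-mediation
prop_mediated <- (total - direct) / total
```

Under full mediation the value is close to 1. A clearly negative value means the trans-effect grew after adjusting for the cis-gene, which is the kind of trio that produces the left arm of the U-shaped pattern described above.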
The algorithm returns a list containing the p-values, the beta changes, and an indicator matrix for the selected confounders.
pvals |
The mediation p-values. A matrix with dimension of the number of trios by two ("Adjust Known Covariates Only", "Adjust Known + Selected Covariates"). |
beta.change |
The proportions mediated. A matrix with dimension of the number of trios by two ("Adjust Known Covariates Only", "Adjust Known + Selected Covariates"). |
sel.conf.ind |
An indicator matrix with dimension of the number of trios by the number of covariates in |
pc.matrix |
PCs will be returned if the PCs based on expression data are used as the pool of potential confounders. Each column is a PC. |
Fan Yang, Jiebiao Wang, the GTEx consortium, Brandon L. Pierce, and Lin S. Chen. (2017) Identifying cis-mediators for trans-eQTLs across many human tissues using genomic mediation analysis. Genome Research. Volume 27, pp. 1859-1871. doi:10.1101/078683
John D. Storey with contributions from Andrew J. Bass, Alan Dabney and David Robinson (2015). qvalue: Q-value estimation for false discovery rate control. R package version 2.8.0. doi:10.18129/B9.bioc.qvalue
    library(GMAC)
    data(example)

    # a fast example with only 50 permutations
    output <- gmac(known.conf = dat$known.conf, cov.pool = dat$cov.pool,
                   exp.dat = dat$exp.dat, snp.dat.cis = dat$snp.dat.cis,
                   trios.idx = dat$trios.idx[1:40, ], nperm = 50,
                   nominal.p = TRUE)
    plot(output)

    ## Not run:
    ## the construction of PCs as cov.pool
    pc <- prcomp(t(dat$exp.dat), scale. = TRUE)
    cov.pool <- t(pc$x)

    ## generate a cluster with 2 nodes for parallel computing
    library(parallel)
    cl <- makeCluster(2)
    output <- gmac(cl = cl, known.conf = dat$known.conf, cov.pool = cov.pool,
                   exp.dat = dat$exp.dat, snp.dat.cis = dat$snp.dat.cis,
                   trios.idx = dat$trios.idx, nominal.p = TRUE)
    stopCluster(cl)
    ## End(Not run)