This repository contains a methodology that adapts the StructLMM framework to study Gene-Gene (GxG) interactions instead of Gene-Environment (GxE) interactions. Our approach leverages local ancestry Principal Components (PCs) as a proxy for the "environment" in the StructLMM model.
We extend the StructLMM framework by Moore et al. (2018) to detect GxG interactions using the following inputs:
- A query SNV that has a large effect size for a phenotype
- A query region of interest to test for interactions (can be cis or trans)
- The phenotype
The key innovation is using local ancestry Principal Components as the "environment" matrix in the StructLMM model. This allows us to capture interaction effects between the query SNV and genetic variants in the region of interest.
The adapted model is structured as:
where:
- y is the phenotype vector
- M contains covariates
- g is the query SNV with large effect size
- E is a matrix representing local ancestry PCs from the query region
- $\rho \in [0, 1]$ dictates the relevance of the interaction effect
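For orientation, the mean structure of a StructLMM-style model can be written with the symbols listed above roughly as follows. This is a sketch rather than the exact parametrization: the names $\alpha$, $\beta_0$, $\beta_1$, $e$ and $\varepsilon$ are notation assumed here for illustration.

$$
y = M\alpha + g\,\beta_0 + g \odot \beta_1 + e + \varepsilon,
\qquad \operatorname{Cov}(\beta_1) \propto E E^{\top}
$$

Here $\beta_0$ is a persistent effect of the query SNV, $\beta_1$ is a per-sample interaction effect whose covariance is shaped by the local ancestry PCs, $e$ is an additive random effect and $\varepsilon$ is i.i.d. noise. The parameter $\rho$ splits the variance of the SNV effect between $\beta_0$ and $\beta_1$. Which of the two terms carries $\rho$ and which carries $1 - \rho$ depends on the convention used, so check the StructLMM documentation for the version you have installed.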
We assume that the genotypes are in PLINK format, split by chromosome, and reside in a directory called "testPlink" here. Download the example files (97 files in total) and place them in the testPlink folder.
We extract Principal Components from a specified genomic region to capture local ancestry patterns:
./dataPrepScripts/getPCfromPlinkDirectory.sh testPlink/ chr6:29944513-29945558 output/PCoutput
The script:
- Extracts variants from the region of interest
- Performs quality control
- Calculates the leading PCs that together explain up to 90% of the variance
- Outputs PC coordinates and variance explained
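The PC step listed above can be illustrated with a minimal NumPy sketch. It is not the code the shell script runs. The function name `local_pcs`, the dosage-matrix input, and the reading of "up to 90%" as "cumulative variance at or below 90%, keeping at least one PC" are all assumptions made for illustration.

```python
import numpy as np


def local_pcs(genotypes, max_variance=0.90):
    """Return (PC coordinates, variance fraction per kept PC).

    genotypes: (n_samples, n_variants) array of 0/1/2 dosages
    (a hypothetical in-memory input, not the script's file format).
    """
    G = np.asarray(genotypes, dtype=float)
    # Drop monomorphic variants: they carry no signal and cannot be scaled.
    G = G[:, G.std(axis=0) > 0]
    # Standardize each variant to zero mean and unit variance.
    Z = (G - G.mean(axis=0)) / G.std(axis=0)
    # PCA via SVD: sample coordinates on PC j are U[:, j] * S[j].
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    cumulative = np.cumsum(explained)
    # Number of leading PCs whose cumulative variance is <= max_variance
    # (assumed meaning of "up to 90%"); always keep at least one PC.
    k = max(1, int(np.searchsorted(cumulative, max_variance, side="right")))
    return U[:, :k] * S[:k], explained[:k]
```

The returned coordinates play the role of the "environment" matrix E, one row per sample and one column per retained PC.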
Extract the single variant of interest:
./dataPrepScripts/getSingleVariantFromPlinkDirectory.sh testPlink/ chr6:32529369:C:A output/SNVoutput.txt
The query variant can be from any source and can even be an aggregate score over multiple variants.
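As one way an aggregate score could be built, the sketch below computes a weighted sum of dosages over several variants. The function name `aggregate_score`, the mean imputation of missing dosages, and the idea of supplying per-variant weights are assumptions for illustration. The repository does not prescribe a particular scoring scheme.

```python
import numpy as np


def aggregate_score(dosages, weights):
    """Weighted sum of variant dosages per sample.

    dosages: (n_samples, n_variants) array, NaN for missing calls.
    weights: (n_variants,) array of per-variant weights (hypothetical,
             e.g. effect sizes from an external study).
    """
    D = np.asarray(dosages, dtype=float)
    w = np.asarray(weights, dtype=float)
    if D.ndim != 2 or D.shape[1] != w.shape[0]:
        raise ValueError("dosages must be (n_samples, n_variants) matching weights")
    # Replace missing dosages with the per-variant mean (an assumed, simple choice).
    col_means = np.nanmean(D, axis=0)
    D = np.where(np.isnan(D), col_means, D)
    return D @ w
```

The resulting vector has one value per sample and can stand in for the query SNV column.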
We then apply the StructLMM framework using:
- The query SNV as the genetic variant
- Local ancestry PCs as the "environment"
- The phenotype of interest
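Before fitting, the three inputs have to refer to the same samples in the same order. The sketch below shows one way to line them up and build the arrays y, M, g and E. The function name `align_inputs`, the dictionary-based inputs, and the intercept-only covariate matrix are assumptions for illustration and are not taken from `gxg-structlmm-script.py`.

```python
import numpy as np


def align_inputs(pheno, snv, pcs):
    """Align inputs on shared sample IDs.

    pheno, snv: dict mapping sample ID -> numeric value.
    pcs:        dict mapping sample ID -> sequence of PC coordinates.
    Returns (ids, y, M, g, E) with rows in the same sample order.
    """
    # Keep only samples present in all three inputs, in a fixed order.
    ids = sorted(set(pheno) & set(snv) & set(pcs))
    if not ids:
        raise ValueError("no samples shared by phenotype, SNV and PCs")
    y = np.array([[pheno[i]] for i in ids], dtype=float)  # phenotype, (n, 1)
    g = np.array([[snv[i]] for i in ids], dtype=float)    # query SNV, (n, 1)
    E = np.array([pcs[i] for i in ids], dtype=float)      # local ancestry PCs, (n, k)
    M = np.ones((len(ids), 1))                            # intercept-only covariates
    return ids, y, M, g, E
```

The returned arrays match the roles of y, M, g and E in the model description. The StructLMM call itself is left out here because its exact signature depends on the installed release.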
Requirements:
- PLINK 2.0
- Python 3.6+
- numpy
- pandas
- scipy
- StructLMM
Setting up StructLMM Environment via Conda
To install StructLMM and all required dependencies in a clean environment, follow these steps:
# Create a conda environment
conda create --name structlmm_env
# Activate the environment
conda activate structlmm_env
# Install necessary packages
conda install -c conda-forge -c bioconda liknorm-py=1.2.6 glimix-core chi2comb -y
# Install StructLMM
pip install struct-lmm
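After installation, a quick check that the Python dependencies can be found may save a failed run later. The sketch below uses only the standard library. The helper name `missing_modules` is hypothetical, and the import name `struct_lmm` for the `struct-lmm` package is an assumption.

```python
import importlib.util


def missing_modules(names=("numpy", "pandas", "scipy", "struct_lmm")):
    """Return the module names that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All required modules were found")
```

Run it inside the activated `structlmm_env` environment so that it inspects the right interpreter.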
# Extract PCs from region of interest
./dataPrepScripts/getPCfromPlinkDirectory.sh testPlink/ chr6:29944513-29945558 output/PCoutput
# Extract the variant of interest
./dataPrepScripts/getSingleVariantFromPlinkDirectory.sh testPlink/ chr6:32529369:C:A output/SNVoutput.txt
python gxg-structlmm-script.py \
--pcs output/PCoutput_local_ancestry_pcs.csv \
--snv output/SNVoutput.txt \
--phenotype testPlink/synthetic_small_v1.pheno1 \
--phenotype-column "Phenotype(binary)" \
--output results.csv
This methodology builds upon the StructLMM framework. The original StructLMM implementation is available at: https://github.com/limix/struct-lmm
If you use this method in your research, please cite:
Original StructLMM paper:
Moore, R., Casale, F. P., Bonder, M. J., Horta, D., Franke, L., Barroso, I., & Stegle, O. (2018). A linear mixed-model approach to study multivariate gene–environment interactions. Nature Genetics, 50(7), 1167-1174.