This instruction outlines the mean effect, variance effect test of POE(Parent-of-Origin Effect) on FTO gene variation on obesity-associated traits.

This instruction requires a data frame, a trait of interest, and a specific SNP with its corresponding minor allele as inputs. It generates a table displaying mean and variance test results.

Required Packages

suppressPackageStartupMessages({
library(tidyverse)
library(dplyr)
library(stringr)
library(car)
library(tidyr)
library(rlang)
})

Before proceeding, ensure the data frame is accessible. Our original dataset contains each row as the genotype for a single SNP for one individual, along with their phenotype, resulting 1,752,394 observations from 4,403 individuals and 398 SNPs. We convert this data from a long to a wide format for better demonstration. The columns in the wide table include: Indiv_ID, SNP_1_Minor_Allele, SNP_2_Minor_Allele, SNP_3_Minor_Allele, …, SEX, AGE, DIABETES, and other traits of interest.

SEX, AGE, and DIABETES are covariates used in the mean effect analysis. We’ll demonstrate the trait of interest using Log_BMI, but feel free to add more traits as needed.

The name of each SNP_Minor_Allele column identifies a specific SNP using its rs ID followed by the minor allele. You might need to refer to cohort genotype data to determine these information. The value in these columns indicates the inheritance origin of the minor allele, which follows four possible categories: maternal, paternal, both, or none.

For example, in the Framingham Heart Study cohort, individual ID 9995 has the minor allele “C” at SNP rs8058618 inherited from the mother, while the minor allele “G” at SNP rs4389136 is inherited from the father (The first row of the data overview below). This inheritance pattern can be determined by comparing the individual’s genotype with the parental genotypes.

Data Overview

An overview of a example data is as follows:

mQTL, vQTL Function

The following function performs mean and variance tests for a specified SNP and trait of interest. It requires three parameters: df, which is your dataset; SNP_minor_allele, representing the variant with its rs ID and the corresponding minor allele; and trait, which is the trait of interest.

It consists of two sections:

Mean Effect: To investigate how the maternally or paternally inherited minor allele affects trait means, we perform an mQTL POE contrast between two heterozygous genotype (Maternal vs Paternal) groups using a linear regression model, adjusting for age, sex, and diabetes. This analysis will return the mean effect p-value and the effect size, which is mean difference of the trait of interest from maternal minus paternal inheritance, adjusting for the covariates. If the difference is greater than 0, it indicates a stronger maternal effect, while a negative value indicates a stronger paternal effect.

Variance Effect: To evaluate how these inherited minor alleles influence trait dispersion, we apply Levene’s test to assess variance equality across the two heterozygous genotypes (Maternal vs. Paternal). The analysis returns the variance effect p-value and the effect size, which is the standard deviation ratio of the trait of interest from maternal divided by paternal inheritance. If the ratio is greater than 1, it indicates greater variability in the maternal inheritance group, while a ratio less than 1 indicates greater variability in the paternal inheritance group.

##################################################################
### Create a function to perform mQTL vQTL and Bimodality Test
##################################################################
mQTL_vQTL_test <- function(df, SNP_minor_allele, trait) {
  
  # Convert string column names to symbols
  SNP_minor_allele_sym <- rlang::sym(SNP_minor_allele)
  trait_sym <- rlang::sym(trait)
  
  # Filter the dataframe
  df_filtered <- df %>%
    dplyr::select(!!SNP_minor_allele_sym, SEX, AGE, !!trait_sym, DIABETES) ## !!SNP_minor_allele_sym evaluate the symbols stored in SNP_minor_allele_sym and injects it as a column name
  
  ###################################
  ## Mean Effect ##
  ###################################
  # Set the reference level as "Paternal" to ensure "Maternal - Paternal" in the mean effect analysis
  df_filtered[[SNP_minor_allele]] <- relevel(factor(df_filtered[[SNP_minor_allele]]), ref = "Paternal") 
  
  # Create formula dynamically
  formula_str <- paste0(trait, " ~ AGE + SEX + DIABETES + ", SNP_minor_allele)
  lm_model <- lm(as.formula(formula_str), data = df_filtered)
  
  # Extract p-value and effect
  mQTL_p <- round(summary(lm_model)$coefficients[paste0(SNP_minor_allele, "Maternal"), "Pr(>|t|)"], 3)
  mQTL_Eff <- round(lm_model$coefficients[paste0(SNP_minor_allele, "Maternal")], 3)
  
  #################################
  ## Variance Effect ##
  #################################
  # Filter for heterozygous cases
  df_filtered_heter <- df_filtered %>%
    dplyr::filter(!!rlang::sym(SNP_minor_allele) %in% c("Maternal", "Paternal"))
  
  # Extract data for different genotypes
  data_ma <- df_filtered %>%
    dplyr::filter(!!rlang::sym(SNP_minor_allele) == "Maternal") %>%
    dplyr::select(!!trait_sym) %>%
    dplyr::pull() %>%
    na.omit()
  
  data_pa <- df_filtered %>%
    dplyr::filter(!!rlang::sym(SNP_minor_allele) == "Paternal") %>%
    dplyr::select(!!trait_sym) %>%
    dplyr::pull() %>%
    na.omit()
  
  data_both <- df_filtered %>%
    dplyr::filter(!!rlang::sym(SNP_minor_allele) == "Both") %>%
    dplyr::select(!!trait_sym) %>%
    dplyr::pull() %>%
    na.omit()
  
  data_none <- df_filtered %>%
    dplyr::filter(!!rlang::sym(SNP_minor_allele) == "None") %>%
    dplyr::select(!!trait_sym) %>%
    dplyr::pull() %>%
    na.omit()
  
  # Calculate variance QTL statistics
  vQTL_formula <- as.formula(paste0(trait, " ~ ", SNP_minor_allele))
  vQTL_p <- round(car::leveneTest(vQTL_formula, data = df_filtered_heter)$`Pr(>F)`[1], 3)
  vQTL_Eff <- round(sd(data_ma)/sd(data_pa), 3)
  

  
  # Collect results
  results <- data.frame(
    SNP_Minor_Allele = SNP_minor_allele,
    ma_pa_mQTL_p = mQTL_p,
    ma_pa_mQTL_Eff = mQTL_Eff,
    ma_pa_vQTL_p = vQTL_p,
    ma_pa_vQTL_Eff = vQTL_Eff
  )
  
  # Transpose results and set column names
  results <- as.data.frame(t(results))
  colnames(results)[1] <- results[1,1]
  rownames(results)[1] <- "SNP_Minor_Allele"
  
  return(results)
}

Call the mQTL_vQTL_test function

The code below demonstrates how to call the mQTL_vQTL_test function using a SNP and Log_BMI as examples. A comprehensive result will be generated.

rs115140649_G (This SNP shows mQTL and vQTL from Log_BMI in Framingham cohort))

mQTL_vQTL_mixR_result = mQTL_vQTL_test(geno_pheno_snp_df_wide, "rs115140649_G", "Log_BMI")
##                  rs115140649_G
## SNP_Minor_Allele rs115140649_G
## ma_pa_mQTL_p             0.026
## ma_pa_mQTL_Eff           0.066
## ma_pa_vQTL_p             0.038
## ma_pa_vQTL_Eff           1.347