This instruction outlines the mean effect, variance effect and Bimodality test of POE(Parent-of-Origin Effect) on FTO gene variation on obesity-associated traits.

This instruction requires a data frame, a trait of interest, and a specific SNP with its corresponding minor allele as inputs. It generates a table displaying mean, variance, and bimodality test results, while also producing a violin-histogram plot for visual analysis.

Required Packages

suppressPackageStartupMessages({
library(tidyverse)
library(dplyr)
library(stringr)
library(mixR)
library(car)
library(patchwork)
library(ggplot2)
library(tidyr)
library(gridExtra)
library(grid)
library(rlang)
})

Before proceeding, ensure the data frame is accessible. Our original dataset contains each row as the genotype for a single SNP for one individual, along with their phenotype, resulting 1,752,394 observations from 4,403 individuals and 398 SNPs. We convert this data from a long to a wide format for better demonstration. The columns in the wide table include: Indiv_ID, SNP_1_Minor_Allele, SNP_2_Minor_Allele, SNP_3_Minor_Allele, …, SEX, AGE, DIABETES, and other traits of interest.

SEX, AGE, and DIABETES are covariates used in the mean effect analysis. We’ll demonstrate the trait of interest using Log_BMI, but feel free to add more traits as needed.

The name of each SNP_Minor_Allele column identifies a specific SNP using its rs ID followed by the minor allele. You might need to refer to cohort genotype data to determine these information. The value in these columns indicates the inheritance origin of the minor allele, which follows four possible categories: maternal, paternal, both, or none.

For example, in the Framingham Heart Study cohort, individual ID 9995 has the minor allele “C” at SNP rs8058618 inherited from the mother, while the minor allele “G” at SNP rs4389136 is inherited from the father (The first row of the data overview below). This inheritance pattern can be determined by comparing the individual’s genotype with the parental genotypes.

Data Overview

An overview of a example data is as follows:

mQTL, vQTL, Bimodality Function

The following function performs mean, variance, and bimodality tests for a specified SNP and trait of interest. It requires three parameters: df, which is your dataset; SNP_minor_allele, representing the variant with its rs ID and the corresponding minor allele; and trait, which is the trait of interest.

It consists of three sections:

Mean Effect: To investigate how the maternally or paternally inherited minor allele affects trait means, we perform an mQTL POE contrast between two heterozygous genotype (Maternal vs Paternal) groups using a linear regression model, adjusting for age, sex, and diabetes. This analysis will return the mean effect p-value and the effect size, which is mean difference of the trait of interest from maternal minus paternal inheritance, adjusting for the covariates. If the difference is greater than 0, it indicates a stronger maternal effect, while a negative value indicates a stronger paternal effect.

Variance Effect: To evaluate how these inherited minor alleles influence trait dispersion, we apply Levene’s test to assess variance equality across the two heterozygous genotypes (Maternal vs. Paternal). The analysis returns the variance effect p-value and the effect size, which is the standard deviation ratio of the trait of interest from maternal divided by paternal inheritance. If the ratio is greater than 1, it indicates greater variability in the maternal inheritance group, while a ratio less than 1 indicates greater variability in the paternal inheritance group.

Bimodality Test: We use the mixR package to evaluate bimodality in the trait of interest across genotype groups from a specific SNP. This analysis generates the mean and standard deviation of both components, the proportion of the first component, and the p-value of the test for all four genotype groups respectively. Note that a minimum sample size of 30 is required for effective mixR fitting; results will return NA if this requirement is not met. We employ the default settings of the mixR methods (mixR::mixfit, mixR::bs.test). If these do not work for specific data sets, please refer to the mixR documentation for parameter tuning: https://cran.r-project.org/web/packages/mixR/mixR.pdf .

##################################################################
### Create a function to perform mQTL vQTL and Bimodality Test
##################################################################
mQTL_vQTL_Bimodality <- function(df, SNP_minor_allele, trait) {
  
  # Convert string column names to symbols
  SNP_minor_allele_sym <- rlang::sym(SNP_minor_allele)
  trait_sym <- rlang::sym(trait)
  
  # Filter the dataframe
  df_filtered <- df %>%
    dplyr::select(!!SNP_minor_allele_sym, SEX, AGE, !!trait_sym, DIABETES) ## !!SNP_minor_allele_sym evaluate the symbols stored in SNP_minor_allele_sym and injects it as a column name
  
  ###################################
  ## Mean Effect ##
  ###################################
  # Set the reference level as "Paternal" to ensure "Maternal - Paternal" in the mean effect analysis
  df_filtered[[SNP_minor_allele]] <- relevel(factor(df_filtered[[SNP_minor_allele]]), ref = "Paternal") 
  
  # Create formula dynamically
  formula_str <- paste0(trait, " ~ AGE + SEX + DIABETES + ", SNP_minor_allele)
  lm_model <- lm(as.formula(formula_str), data = df_filtered)
  
  # Extract p-value and effect
  mQTL_p <- round(summary(lm_model)$coefficients[paste0(SNP_minor_allele, "Maternal"), "Pr(>|t|)"], 3)
  mQTL_Eff <- round(lm_model$coefficients[paste0(SNP_minor_allele, "Maternal")], 3)
  
  #################################
  ## Variance Effect ##
  #################################
  # Filter for heterozygous cases
  df_filtered_heter <- df_filtered %>%
    dplyr::filter(!!rlang::sym(SNP_minor_allele) %in% c("Maternal", "Paternal"))
  
  # Extract data for different genotypes
  data_ma <- df_filtered %>%
    dplyr::filter(!!rlang::sym(SNP_minor_allele) == "Maternal") %>%
    dplyr::select(!!trait_sym) %>%
    dplyr::pull() %>%
    na.omit()
  
  data_pa <- df_filtered %>%
    dplyr::filter(!!rlang::sym(SNP_minor_allele) == "Paternal") %>%
    dplyr::select(!!trait_sym) %>%
    dplyr::pull() %>%
    na.omit()
  
  data_both <- df_filtered %>%
    dplyr::filter(!!rlang::sym(SNP_minor_allele) == "Both") %>%
    dplyr::select(!!trait_sym) %>%
    dplyr::pull() %>%
    na.omit()
  
  data_none <- df_filtered %>%
    dplyr::filter(!!rlang::sym(SNP_minor_allele) == "None") %>%
    dplyr::select(!!trait_sym) %>%
    dplyr::pull() %>%
    na.omit()
  
  # Calculate variance QTL statistics
  vQTL_formula <- as.formula(paste0(trait, " ~ ", SNP_minor_allele))
  vQTL_p <- round(car::leveneTest(vQTL_formula, data = df_filtered_heter)$`Pr(>F)`[1], 3)
  vQTL_Eff <- round(sd(data_ma)/sd(data_pa), 3)
  
  ####################################################
  ## Bimodality Test ##
  ###################################################
  # Define mixR test function
  mixR_test <- function(x) {
    # Initialize results list with NAs
    results <- list(
      fit_mu1 = NA,
      fit_mu2 = NA,
      fit_sd1 = NA,
      fit_sd2 = NA,
      fit_prop1 = NA,
      fit_p_value = NA
    )
    
    # Check if length of x is greater than 30
    if (length(x) <= 30) {
      warning("mixR warning: the sample size should be greater than 30.")
      return(results)
    }
    
    # Try to fit the model and perform the test
    tryCatch({
      bimo_fit <- mixR::mixfit(x, ncomp = 2)
      bimo_test <- mixR::bs.test(x, ncomp = c(1, 2))
      
      # Update results with values from the fit
      results$fit_mu1 <- round(bimo_fit$mu[1], 2)
      results$fit_mu2 <- round(bimo_fit$mu[2], 2)
      results$fit_sd1 <- round(bimo_fit$sd[1], 2)
      results$fit_sd2 <- round(bimo_fit$sd[2], 2)
      results$fit_prop1 <- round(bimo_fit$pi[1], 2)
      results$fit_p_value <- bimo_test$pvalue
      
      return(results)
    }, error = function(e) {
      # Return the list with NAs if there's an error
      message("Error in mixR test: ", e$message)
      return(results)
    })
  }
  
  # Apply mixR test to each genotype
  ma <- mixR_test(data_ma)
  pa <- mixR_test(data_pa)
  both <- mixR_test(data_both)
  none <- mixR_test(data_none)
  
  # Collect results
  results <- data.frame(
    SNP_Minor_Allele = SNP_minor_allele,
    ma_pa_mQTL_p = mQTL_p,
    ma_pa_mQTL_Eff = mQTL_Eff,
    ma_pa_vQTL_p = vQTL_p,
    ma_pa_vQTL_Eff = vQTL_Eff,
    mixR_ma_mu1_mu2 = paste0(ma$fit_mu1, "_", ma$fit_mu2),
    mixR_ma_sd1_sd2 = paste0(ma$fit_sd1, "_", ma$fit_sd2),
    mixR_ma_prop1 = ma$fit_prop1,
    mixR_ma_p = ma$fit_p_value,
    mixR_pa_mu1_mu2 = paste0(pa$fit_mu1, "_", pa$fit_mu2),
    mixR_pa_sd1_sd2 = paste0(pa$fit_sd1, "_", pa$fit_sd2),
    mixR_pa_prop1 = pa$fit_prop1,
    mixR_pa_p = pa$fit_p_value,
    mixR_both_mu1_mu2 = paste0(both$fit_mu1, "_", both$fit_mu2),
    mixR_both_sd1_sd2 = paste0(both$fit_sd1, "_", both$fit_sd2),
    mixR_both_prop1 = both$fit_prop1,
    mixR_both_p = both$fit_p_value,
    mixR_none_mu1_mu2 = paste0(none$fit_mu1, "_", none$fit_mu2),
    mixR_none_sd1_sd2 = paste0(none$fit_sd1, "_", none$fit_sd2),
    mixR_none_prop1 = none$fit_prop1,
    mixR_none_p = none$fit_p_value
  )
  
  # Transpose results and set column names
  results <- as.data.frame(t(results))
  colnames(results)[1] <- results[1,1]
  rownames(results)[1] <- "SNP_Minor_Allele"
  
  return(results)
}

Violin and Histogram Function

The following function displays the violin plot and histogram of the SNP and trait of interest across 6 genotype groups(“Maternal”, “Paternal”, “None_Maternal_Heter”, “None_Paternal_Heter”, “None_Both_Parents_Homo”, “None_Both_Parents_Heter”). Note if the count of any group is less than 5, it won’t appear in the plot. It requires 4 parameters: df, which is your dataset; SNP_minor_allele, representing the variant with its rs ID and the minor allele; trait, which is the trait of interest; trait_display_name is optional, when it’s not specified, the function will display the trait instead.

violin_histogram_combined <- function(df, snp_name, trait, trait_display_name = NULL) {
  
  # If trait_display_name is not provided, use trait
  if(is.null(trait_display_name)) {
    trait_display_name <- trait
  }
  
  df_filtered = df %>%
    dplyr::filter(SNP_Minor_Allele == snp_name, Minor_Allele_Inheritance_Interested_Subgroups != "Both") %>%
    dplyr::select(!!sym(trait), Minor_Allele_Inheritance_Interested_Subgroups) 
  
  df_filtered$Minor_Allele_Inheritance_Interested_Subgroups = factor(df_filtered$Minor_Allele_Inheritance_Interested_Subgroups,
                                                                     levels = c("Maternal", "Paternal", "None_Maternal_Heter", "None_Paternal_Heter", "None_Both_Parents_Homo", "None_Both_Parents_Heter"))
  
  # Store the original levels and their corresponding colors for consistent mapping
  # Assuming the order in genotype_number matches the expected groups
  all_inheritance_types <- c("Maternal", "Paternal", "None_Maternal_Heter", "None_Paternal_Heter", "None_Both_Parents_Homo", "None_Both_Parents_Heter")
  
  genotype_number = c(2,3,4,5,6,7)
  color_mapping <- setNames(genotype_number, all_inheritance_types)
  
  # Count observations in each Minor_Allele_Inheritance_Interested_Subgroups group
  group_counts <- table(df_filtered$Minor_Allele_Inheritance_Interested_Subgroups)
  
  # Filter out groups with fewer than 5 observations
  groups_to_keep <- names(group_counts[group_counts >= 5])
  
  #groups_to_keep = names(group_counts)
  
  # Filter the dataframe to only include groups with sufficient observations
  df_filtered <- df_filtered[df_filtered$Minor_Allele_Inheritance_Interested_Subgroups %in% groups_to_keep, ]
  
  # If no groups remain after filtering, return a blank plot with a message
  if(nrow(df_filtered) == 0) {
    empty_plot <- ggplot() + 
      annotate("text", x = 0.5, y = 0.5, label = "No groups with 5 or more observations") +
      theme_void()
    return(arrangeGrob(empty_plot))
  }
  
  # Ensure Minor_Allele_Inheritance_Interested_Subgroups is a factor with only the levels that have 5+ observations
  df_filtered$Minor_Allele_Inheritance_Interested_Subgroups <- factor(df_filtered$Minor_Allele_Inheritance_Interested_Subgroups)
  
  # Get the subset of color mapping for the groups that remain after filtering
  filtered_color_mapping <- color_mapping[groups_to_keep]
  
  # Create a named vector for secondary labels
  secondary_labels <- c(
    "Maternal" = "Offspring Ref|Alt",
    "Paternal" = "Offspring Alt|Ref",
    "Both" = "Offspring Alt|Alt",
    "None_Maternal_Heter" = "Offspring Ref|Ref Mother is Heter",
    "None_Paternal_Heter" = "Offspring Ref|Ref Father is Heter",
    "None_Both_Parents_Homo" = "Offspring Ref|Ref Parents Ref|Ref ",
    "None_Both_Parents_Heter" = "Offspring Ref|Ref Parents are Heter"
  )
  
  # Get secondary labels for groups that remain after filtering
  filtered_secondary_labels <- secondary_labels[groups_to_keep]
  
  # Calculate the maximum value for positioning the top labels
  y_max <- max(df_filtered[[trait]], na.rm = TRUE)
  y_min <- min(df_filtered[[trait]], na.rm = TRUE)
  y_range <- y_max - y_min
  
  # Violin Plot
  violin_plot <- ggplot(df_filtered, aes(x = Minor_Allele_Inheritance_Interested_Subgroups, y = .data[[trait]], 
                                         fill = Minor_Allele_Inheritance_Interested_Subgroups)) +
    geom_violin(trim = FALSE, alpha = 0.3) +
    geom_point(color = "darkgrey", position = position_jitter(width = 0.1, height = 0), 
               alpha = 0.5) +
    stat_summary(fun.data = "mean_sdl", geom = "pointrange", color = "black", na.rm = TRUE) +
    labs(title = paste(trait_display_name, " - ", snp_name), # Combined title with trait and SNP
         x = "",  # Remove x-axis label as we're using custom annotations
         y = paste(trait_display_name)) +
    scale_fill_manual(values = filtered_color_mapping) +
    scale_color_manual(values = filtered_color_mapping) +
    theme_minimal(base_size = 30) +
    theme(
      plot.title = element_text(size = 30),
      axis.title.x = element_text(size = 30), 
      axis.title.y = element_text(size = 30), 
      axis.text.x = element_text(size = 30),  
      axis.text.y = element_text(size = 30),   
      legend.title = element_text(size = 30), 
      legend.text = element_text(size = 30),
      # Add margin at the top of the plot for secondary labels
      plot.margin = margin(t = 40, r = 5, b = 5, l = 5, unit = "pt")
    )
  
  # Add secondary labels at the top of each violin
  x_positions <- 1:length(groups_to_keep)
  
  # Add text at the top for offspring labels
  violin_plot <- violin_plot + 
    annotate("text", 
             x = x_positions, 
             y = rep(y_max + y_range * 0.05, length(x_positions)),
             label = filtered_secondary_labels,
             size = 8)
  
  # Get the range of the trait column for density plot
  trait_range <- range(df_filtered[[trait]], na.rm = TRUE)
  trait_min <- floor(trait_range[1] * 100) / 100
  trait_max <- ceiling(trait_range[2] * 100) / 100
  step_size <- (trait_max - trait_min) / 4
  breaks <- seq(trait_min, trait_max, by = step_size)
  
  # Density Plot with histogram - explicitly move facet labels to the bottom
  # Create a modified theme that ensures strip position at bottom
  bottom_strip_theme <- theme_minimal(base_size = 30) +
    theme(
      plot.title = element_text(size = 30),
      axis.title.x = element_text(size = 30), 
      axis.title.y = element_text(size = 30), 
      axis.text.x = element_text(size = 30),  
      axis.text.y = element_text(size = 30),   
      legend.title = element_text(size = 30), 
      legend.text = element_text(size = 30),
      # Critical settings for bottom strips
      strip.placement = "outside",
      strip.text = element_text(size = 30),
      strip.background = element_rect(fill = "white", color = NA),
      panel.spacing = unit(1, "lines"),
      # Explicitly set strip position to bottom
      strip.position = "bottom"
    )
  
  density_plot <- ggplot(df_filtered, aes(x = .data[[trait]], fill = Minor_Allele_Inheritance_Interested_Subgroups, 
                                          color = Minor_Allele_Inheritance_Interested_Subgroups)) +
    geom_histogram(aes(y = after_stat(density)), position = "identity", 
                   alpha = 0.2, bins = 30) +
    geom_density(alpha = 0.3) +
    scale_fill_manual(values = filtered_color_mapping) +
    scale_color_manual(values = filtered_color_mapping) +
    scale_x_continuous(limits = c(trait_min, trait_max), 
                       breaks = breaks, 
                       labels = round(breaks, 1)) +
    # Use facet_grid instead of facet_wrap to have more control over strip placement
    facet_grid(cols = vars(Minor_Allele_Inheritance_Interested_Subgroups), 
               switch = "x") +  # This is key - switches the strip to the bottom
    ggtitle(paste(trait_display_name, " - ", snp_name)) + # Combined title
    ylab("Density") +
    xlab("") + # Remove x-axis label
    bottom_strip_theme
  
  # Remove legends from both plots
  violin_plot <- violin_plot + theme(legend.position = "none")
  density_plot <- density_plot + theme(legend.position = "none")
  
  # Use grid.arrange to display the plot immediately
  grid.arrange(violin_plot, density_plot, nrow = 2)
}

Call the mQTL_vQTL_Bimodality function

The code below demonstrates how to call the mQTL_vQTL_Bimodality function using a SNP and Log_BMI as examples. A comprehensive result will be generated.

rs115140649_G (This SNP shows mQTL and vQTL from Log_BMI in Framingham cohort))

mQTL_vQTL_mixR_result = mQTL_vQTL_Bimodality(geno_pheno_snp_df_wide, "rs115140649_G", "Log_BMI")
##                   rs115140649_G
## SNP_Minor_Allele  rs115140649_G
## ma_pa_mQTL_p              0.026
## ma_pa_mQTL_Eff            0.066
## ma_pa_vQTL_p              0.038
## ma_pa_vQTL_Eff            1.347
## mixR_ma_mu1_mu2        3.22_3.7
## mixR_ma_sd1_sd2       0.15_0.04
## mixR_ma_prop1              0.94
## mixR_ma_p                  0.02
## mixR_pa_mu1_mu2       3.13_3.24
## mixR_pa_sd1_sd2       0.14_0.12
## mixR_pa_prop1              0.44
## mixR_pa_p                  0.94
## mixR_both_mu1_mu2         NA_NA
## mixR_both_sd1_sd2         NA_NA
## mixR_both_prop1            <NA>
## mixR_both_p                <NA>
## mixR_none_mu1_mu2     3.19_3.39
## mixR_none_sd1_sd2     0.14_0.21
## mixR_none_prop1            0.69
## mixR_none_p                   0

The code below generates a violin plot and histogram of Log_BMI for a SNP of interest across different genotype groups. We’ll use the original long format data here; alternatively, you can use the code below to convert the wide data back to the long format. Note that we use “Minor_Allele_Inheritance_Interested_Subgroups” instead of “Minor_Allele_Inheritance”, because in addition to the four original groups (“Paternal”, “Maternal”, “None”, “Both”) from “Minor_Allele_Inheritance”, “Minor_Allele_Inheritance_Interested_Subgroups” includes 4 more subgroups within the offspring “None” category that are associated with parental genotypes (“None_Maternal_Heter”, “None_Paternal_Heter”, “None_Both_Parents_Homo”, “None_Both_Parents_Heter”).

geno_pheno_snp_df_long <- geno_pheno_snp_df_wide %>%
  pivot_longer(
    cols = -c(Indiv_ID, SEX, AGE, Log_BMI, DIABETES),
    names_to = "SNP_Minor_Allele",
    values_to = "Minor_Allele_Inheritance_Interested_Subgroups"
  )
violin_histogram_combined(df = geno_pheno_snp_df_long, 
                              snp_name = "rs115140649_G", 
                              trait = "Log_BMI", 
                              trait_display_name = "log(BMI)")

rs1421085_C (This SNP is considered linked to Adipocyte IRX3 Expression and BMI of Children)

##                   rs1421085_C
## SNP_Minor_Allele  rs1421085_C
## ma_pa_mQTL_p            0.045
## ma_pa_mQTL_Eff          0.015
## ma_pa_vQTL_p            0.374
## ma_pa_vQTL_Eff          0.997
## mixR_ma_mu1_mu2      3.2_3.45
## mixR_ma_sd1_sd2     0.15_0.21
## mixR_ma_prop1            0.79
## mixR_ma_p                   0
## mixR_pa_mu1_mu2     3.12_3.32
## mixR_pa_sd1_sd2     0.12_0.19
## mixR_pa_prop1            0.39
## mixR_pa_p                   0
## mixR_both_mu1_mu2   3.23_3.46
## mixR_both_sd1_sd2   0.15_0.22
## mixR_both_prop1          0.79
## mixR_both_p                 0
## mixR_none_mu1_mu2       NA_NA
## mixR_none_sd1_sd2       NA_NA
## mixR_none_prop1          <NA>
## mixR_none_p              <NA>
violin_histogram_combined(df = geno_pheno_snp_df_long, 
                          snp_name = "rs1421085_C", 
                          trait = "Log_BMI", 
                          trait_display_name = "log(BMI)")