Extra Credit - Take Home

Background and Data Preparation

Long chain 3-hydroxyacyl-CoA dehydrogenase deficiency (LCHADD) is a fatty acid oxidation disorder (FAOD) caused by a pathogenic variant, c.1528 G > C, in HADHA encoding the alpha subunit of trifunctional protein (TFPα). Individuals with LCHADD develop chorioretinopathy and peripheral neuropathy not observed in other FAODs in addition to the more ubiquitous symptoms of hypoketotic hypoglycemia, rhabdomyolysis and cardiomyopathy. We report a CRISPR/Cas9 generated knock-in murine model of G1528C in Hadha that recapitulates aspects of the human LCHADD phenotype. Gaston (2023) is investigating cardiomyopathy in terms of ejection fraction comparing LCHADD with WT among female and male mice. Note: Modifications are made to fit our midterm test purpose.

Reference: Gaston, G., Babcock, S., Ryals, R. et al. A G1528C Hadha knock-in mouse model recapitulates aspects of human clinical phenotypes for long-chain 3-hydroxyacyl-CoA dehydrogenase deficiency. Commun Biol 6, 890 (2023). https://doi.org/10.1038/s42003-023-05268-1

library(IntroStats)

ejection.frac.male.WT <- c(53.33, 61.08, 59.07, 58.68, 
                           50.27, 55.3, 37.84, 67.19, 
                           56.18, 57.95, 59.3, 61.75)

n1 = length(ejection.frac.male.WT)


ejection.frac.male.LCHADD <- c(52.01,   44.99,  42.6,   35.81,  
                               45.05,   52.43,  50.69,  48.11,  
                               50.24,   46.72,  52.9,   43.48)

n2 = length(ejection.frac.male.LCHADD)

ejection.frac.female.WT <- c(59.47, 66.39,  54.49,  56.83,  
                             59.06, 49.72,  49.57,  57.73,  
                             64.29, 54.26,  69.23,  58.51)

n3 = length(ejection.frac.female.WT)

ejection.frac.female.LCHADD <- c(47.02, 44.98,  48.60,  34.56,  
                                 52.64, 49.24,  48.23,  61.77,  
                                 57.64, 67.40,  40.66,  48.23)
n4 = length(ejection.frac.female.LCHADD)

ef <- c(ejection.frac.male.WT, 
        ejection.frac.male.LCHADD,
        ejection.frac.female.WT,
        ejection.frac.female.LCHADD)

sex <- c(rep("male", n1), rep("male", n2), rep("female", n3), rep("female", n4))
genotype <- c(rep("WT", n1), rep("LCHADD", n2), rep("WT", n3), rep("LCHADD", n4))
group <- c(rep(1, n1),rep(2, n2),rep(3, n3),rep(4, n4))

#Group 1: Male WT; 2: male LCHADD; 3: female WT; 4: Female LCHADD

dat <- data.frame(ef)
dat$sex=sex
dat$genotype = genotype
dat$group = factor(group)
dat

##       ef    sex genotype group
## 1  53.33   male       WT     1
## 2  61.08   male       WT     1
## 3  59.07   male       WT     1
## 4  58.68   male       WT     1
## 5  50.27   male       WT     1
## 6  55.30   male       WT     1
## 7  37.84   male       WT     1
## 8  67.19   male       WT     1
## 9  56.18   male       WT     1
## 10 57.95   male       WT     1
## 11 59.30   male       WT     1
## 12 61.75   male       WT     1
## 13 52.01   male   LCHADD     2
## 14 44.99   male   LCHADD     2
## 15 42.60   male   LCHADD     2
## 16 35.81   male   LCHADD     2
## 17 45.05   male   LCHADD     2
## 18 52.43   male   LCHADD     2
## 19 50.69   male   LCHADD     2
## 20 48.11   male   LCHADD     2
## 21 50.24   male   LCHADD     2
## 22 46.72   male   LCHADD     2
## 23 52.90   male   LCHADD     2
## 24 43.48   male   LCHADD     2
## 25 59.47 female       WT     3
## 26 66.39 female       WT     3
## 27 54.49 female       WT     3
## 28 56.83 female       WT     3
## 29 59.06 female       WT     3
## 30 49.72 female       WT     3
## 31 49.57 female       WT     3
## 32 57.73 female       WT     3
## 33 64.29 female       WT     3
## 34 54.26 female       WT     3
## 35 69.23 female       WT     3
## 36 58.51 female       WT     3
## 37 47.02 female   LCHADD     4
## 38 44.98 female   LCHADD     4
## 39 48.60 female   LCHADD     4
## 40 34.56 female   LCHADD     4
## 41 52.64 female   LCHADD     4
## 42 49.24 female   LCHADD     4
## 43 48.23 female   LCHADD     4
## 44 61.77 female   LCHADD     4
## 45 57.64 female   LCHADD     4
## 46 67.40 female   LCHADD     4
## 47 40.66 female   LCHADD     4
## 48 48.23 female   LCHADD     4

library(ggplot2)

Problem 1. Data Visualization (10 points)

Draw box plots to show the 4 groups of genotype and sex combinations, i.e., female wt, female LCHADD, male wt, and male LCHADD. Please interpret the boxplots regarding the comparison of ejection fraction between male and female, and between WT and LCHADD.

# Imported 
# Base R Boxplot
boxplot(ef ~ interaction(sex, genotype), 
        data = dat,
        main = "Boxplots of Ejection Fraction by Group",
        xlab = "Group (Sex and Genotype)",
        ylab = "Ejection Fraction",
        col = c("lightblue", "pink", "lightgreen", "orange"),
        names = c("Male WT", "Male LCHADD", "Female WT", "Female LCHADD"))

Problem 2. Checking normality using normal probability plots and qqplots (10 points)

Draw normal probability plots and qqplots to check data normality within each of the 4 groups above. Hint: You can use the code here to select group, eg, group 1 data: ef[group == 1].

par(mfrow = c(2, 2))  # Set up a 2x2 plotting layout

# Group 1: Male WT
qqnorm(ef[group == 1], main = "QQ Plot: Male WT")
qqline(ef[group == 1])

# Group 2: Male LCHADD
qqnorm(ef[group == 2], main = "QQ Plot: Male LCHADD")
qqline(ef[group == 2])

# Group 3: Female WT
qqnorm(ef[group == 3], main = "QQ Plot: Female WT")
qqline(ef[group == 3])

# Group 4: Female LCHADD
qqnorm(ef[group == 4], main = "QQ Plot: Female LCHADD")
qqline(ef[group == 4])

# Reset plotting layout
par(mfrow = c(1, 1))

Problem 3. Perform Kolmogorov–Smirnov tests for each of the four groups

Perform Kolmogorov–Smirnov tests for each of the four groups to conclude whether there is sufficient evidence of non-normality.

# Group 1: Male WT
ks.test(ef[group == 1], "pnorm", mean(ef[group == 1]), sd(ef[group == 1]))

## 
##  Exact one-sample Kolmogorov-Smirnov test
## 
## data:  ef[group == 1]
## D = 0.18479, p-value = 0.7426
## alternative hypothesis: two-sided

# Group 2: Male LCHADD
ks.test(ef[group == 2], "pnorm", mean(ef[group == 2]), sd(ef[group == 2]))

## 
##  Exact one-sample Kolmogorov-Smirnov test
## 
## data:  ef[group == 2]
## D = 0.15091, p-value = 0.9101
## alternative hypothesis: two-sided

# Group 3: Female WT
ks.test(ef[group == 3], "pnorm", mean(ef[group == 3]), sd(ef[group == 3]))

## 
##  Exact one-sample Kolmogorov-Smirnov test
## 
## data:  ef[group == 3]
## D = 0.17337, p-value = 0.8057
## alternative hypothesis: two-sided

# Group 4: Female LCHADD
ks.test(ef[group == 4], "pnorm", mean(ef[group == 4]), sd(ef[group == 4]))

## Warning in ks.test.default(ef[group == 4], "pnorm", mean(ef[group == 4]), :
## ties should not be present for the one-sample Kolmogorov-Smirnov test

## 
##  Asymptotic one-sample Kolmogorov-Smirnov test
## 
## data:  ef[group == 4]
## D = 0.20422, p-value = 0.6988
## alternative hypothesis: two-sided

3.1 What is the null and alternative hypotheses of Kolmogorov–Smirnov test in this setting? (5 points)

Solution: # Null Hypothesis (𝐻0 H 0): The data in the sample follows a normal distribution and the Alternative Hypothesis (𝐻𝐴H A): The data in the sample does not follow a normal distribution. Therefore based on the Kolmogrov-Smirnov test results for each group: The p-values were greater than 0.05. This indicates insufficient evidence to reject H0, meaning we cannot conclude non-normality for any of the groups. The data for all groups appear consistent with normality.

3.2 Find the sample mean and sample standard deviation of each group (5 points)

# Group 1: Male WT
mean_1 <- mean(ef[group == 1])
sd_1 <- sd(ef[group == 1])

# Group 2: Male LCHADD
mean_2 <- mean(ef[group == 2])
sd_2 <- sd(ef[group == 2])

# Group 3: Female WT
mean_3 <- mean(ef[group == 3])
sd_3 <- sd(ef[group == 3])

# Group 4: Female LCHADD
mean_4 <- mean(ef[group == 4])
sd_4 <- sd(ef[group == 4])


cat("Group 1 (Male WT): Mean =", mean_1, ", SD =", sd_1, "\n")

## Group 1 (Male WT): Mean = 56.495 , SD = 7.278326

cat("Group 2 (Male LCHADD): Mean =", mean_2, ", SD =", sd_2, "\n")

## Group 2 (Male LCHADD): Mean = 47.08583 , SD = 5.041103

cat("Group 3 (Female WT): Mean =", mean_3, ", SD =", sd_3, "\n")

## Group 3 (Female WT): Mean = 58.29583 , SD = 6.074809

cat("Group 4 (Female LCHADD): Mean =", mean_4, ", SD =", sd_4, "\n")

## Group 4 (Female LCHADD): Mean = 50.08083 , SD = 8.919168

3.3 Perform the KS test for each group and interpret the results (5 points)

# Perform Kolmogorov–Smirnov tests for each group

# Group 1: Male WT
ks_test_1 <- ks.test(ef[group == 1], "pnorm", mean(ef[group == 1]), sd(ef[group == 1]))

# Group 2: Male LCHADD
ks_test_2 <- ks.test(ef[group == 2], "pnorm", mean(ef[group == 2]), sd(ef[group == 2]))

# Group 3: Female WT
ks_test_3 <- ks.test(ef[group == 3], "pnorm", mean(ef[group == 3]), sd(ef[group == 3]))

# Group 4: Female LCHADD
ks_test_4 <- ks.test(ef[group == 4], "pnorm", mean(ef[group == 4]), sd(ef[group == 4]))

## Warning in ks.test.default(ef[group == 4], "pnorm", mean(ef[group == 4]), :
## ties should not be present for the one-sample Kolmogorov-Smirnov test

cat("Group 1 (Male WT): p-value =", ks_test_1$p.value, "\n")

## Group 1 (Male WT): p-value = 0.7425544

cat("Group 2 (Male LCHADD): p-value =", ks_test_2$p.value, "\n")

## Group 2 (Male LCHADD): p-value = 0.9101327

cat("Group 3 (Female WT): p-value =", ks_test_3$p.value, "\n")

## Group 3 (Female WT): p-value = 0.8056705

cat("Group 4 (Female LCHADD): p-value =", ks_test_4$p.value, "\n")

## Group 4 (Female LCHADD): p-value = 0.6988165

solutions Null Hypothesis (H0H_0H0): The group data follows a normal distribution. Alternative Hypothesis (HAH_AHA): The group data does not follow a normal distribution. Evaluate p-values: p > 0.05: Fail to reject H0. The data is consistent with normality. p ≤ 0.05: Reject H0. The data is not normally distributed.

Problem 4. Perform a t-test comparing ejection fraction between two genotypes within male

4.1 t-test assumptions (5 points)

What are the assumptions of t-test? Based on the previous normality check, do you feel it is appropriate to use t-test? Do you have any concerns using t-test?

Solution: Summary of T-Test Assumptions and Applicability Assumptions: Independence: Samples are independent. Normality: Data in each group follows a normal distribution. Equal Variance: Variances in the groups are approximately equal. Evaluation: Independence: Assumed based on study design. Normality: Check with normality tests (p>0.05); violation (𝑝≤ 0.05) may require alternatives. Equal Variance: Use Levene’s or F-test; if unequal, apply Welch’s t-test. Concerns: Normality Violation: Consider Mann-Whitney U test for small samples. Variance Violation: Use Welch’s t-test. Conclusion: Use t-test if normality and variance assumptions hold; otherwise, use alternative methods.

4.2 pooled t-test vs unpooled t-test (5 points)

Based on the sample standard deviations in the two groups, do you think it is appropriate to use pooled t-test? Why?

Solution: Results Standard Deviation (Male WT): 7.28 Standard Deviation (Male LCHADD): 5.04 Ratio of Standard Deviations: 1.44 Conclusion Since the ratio of standard deviations is 1.44, which is less than 2:1, the variance difference is not substantial. It is appropriate to use a pooled t-test in this case.

4.3 perform the pooled t-test (5 points)

male_wt <- c(53.33, 61.08, 59.07, 58.68, 50.27, 55.3, 37.84, 67.19, 56.18, 57.95, 59.3, 61.75)
male_lchadd <- c(52.01, 44.99, 42.6, 35.81, 45.05, 52.43, 50.69, 48.11, 50.24, 46.72, 52.9, 43.48)


t_test_result <- t.test(male_wt, male_lchadd, var.equal = TRUE)


print(t_test_result)

## 
##  Two Sample t-test
## 
## data:  male_wt and male_lchadd
## t = 3.6815, df = 22, p-value = 0.001307
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   4.108715 14.709618
## sample estimates:
## mean of x mean of y 
##  56.49500  47.08583

4.4 CI for sample mean (10 points)

What is the 95%CI for mean ejection fraction in male WT and male LCHADD? (5 points)

compute_ci <- function(data, confidence = 0.95) {
  n <- length(data)                     # Sample size
  mean_val <- mean(data)                # Sample mean
  std_error <- sd(data) / sqrt(n)       # Standard error
  t_crit <- qt((1 + confidence) / 2, df = n - 1)  # Critical t-value
  lower <- mean_val - t_crit * std_error  # Lower bound
  upper <- mean_val + t_crit * std_error  # Upper bound
  return(c(lower, upper))
}


ci_male_wt <- compute_ci(male_wt)
ci_male_lchadd <- compute_ci(male_lchadd)


cat("95% CI for Male WT:", ci_male_wt, "\n")

## 95% CI for Male WT: 51.87057 61.11943

cat("95% CI for Male LCHADD:", ci_male_lchadd, "\n")

## 95% CI for Male LCHADD: 43.88287 50.2888

What is the 95% CI for mean difference between ejection fraction among male WT and male LCHADD? If the 95%CI of mean difference doesn’t include 0, what does that mean? (5 points)

Solution:

compute_diff_ci <- function(group1, group2, confidence = 0.95) {
  n1 <- length(group1)
  n2 <- length(group2)
  mean_diff <- mean(group1) - mean(group2)  # Difference in means
  pooled_var <- ((sd(group1)^2 / n1) + (sd(group2)^2 / n2))  # Pooled variance
  std_error_diff <- sqrt(pooled_var)  # Standard error of the difference
  df <- n1 + n2 - 2  # Degrees of freedom
  t_crit <- qt((1 + confidence) / 2, df = df)  # Critical t-value
  lower <- mean_diff - t_crit * std_error_diff  # Lower bound
  upper <- mean_diff + t_crit * std_error_diff  # Upper bound
  return(c(lower, upper))
}

ci_diff <- compute_diff_ci(male_wt, male_lchadd)

cat("95% CI for mean difference:", ci_diff, "\n")

## 95% CI for mean difference: 4.108715 14.70962

Problem 5. One-way ANOVA

Suppose we would like to perform one-WAY ANOVA to test whethere this is difference among 4 groups, i.e., ejection fraction w.r.t group.

5.1 Hypotheses in one-way ANOVA (5 points)

What is the null and alternative hypotheses in one-way ANOVA?

solutions Hypotheses in One-Way ANOVA Null Hypothesis (𝐻0):All group means are equal. Mathematically:𝐻0:μ 1 = μ 2 = μ 3 = μ 4 (Where μ 1,μ 2,μ 3,μ 4 are the means of the four groups.) Alternative Hypothesis (H A): At least one group mean is different. Mathematically:𝐻𝐴:Not all𝜇𝑖are equal (for𝑖= 1,2,3,4).HA : Not all μi are equal (for i = 1,2,3,4).

5.2 ANOVA model assumptions (5 points)

What are the modelling assumptions in one-way anova?

Solution: Summary of One-Way ANOVA Assumptions Independence: Observations must be independent across and within groups. Normality: Each group’s data should follow a normal distribution (check via QQ plots or Shapiro-Wilk test). Homogeneity of Variances: All groups should have equal variances (test with Levene’s or Bartlett’s test). Random Sampling: Data should be randomly drawn from the population. Note: Violations of normality or equal variances can lead to unreliable results. Consider alternatives like the Kruskal-Wallis test if assumptions are not met.

5.3 Perform ANOVA (5 points)

Perform the ANOVA analysis and produce the one-way ANOVA table.

Hint: Use factor(group) when fitting ANOVA model because group should be treated as categorical variable rather than continuous. It is a good practice, always add factor() in anova for group.

dat$group <- factor(dat$group)

anova_model <- aov(ef ~ group, data = dat)

anova_table <- summary(anova_model)
print(anova_table)

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## group        3   1005   335.0   6.878 0.000672 ***
## Residuals   44   2143    48.7                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

5.4 Interpretation of ANOVA (5 points)

What is the degrees of freedom for group variable? Why? What is the p value and interpretation of the p value?

solutions Summary of ANOVA Interpretation Degrees of Freedom (DF): Between Groups: 𝐷𝐹=𝑘− 1 = 4 − 1 = 3 (for 4 groups: Male WT, Male LCHADD, Female WT, Female LCHADD). Within Groups (Residual):𝐷𝐹=𝑁−𝑘, where 𝑁is the total sample size. P-Value:Tests the null hypothesis (𝐻0:𝜇1=𝜇2=𝜇3=𝜇).If𝑝≤ 0.05 p≤0.05: Reject𝐻0.At least one group mean differs significantly. If𝑝> 0.05: Fail to reject𝐻0.No significant difference among group means. Key Point: Degrees of freedom and p-value help determine whether group means are significantly different.