Part 3

Q1.1

The response variable is symmetry and the predictor variables are strain and category.

Q1.2

The null hypothesis is that group centroids are not significantly different from one another. Group centroids are simply multivariate means – means in two or more dimensions.

mus_data <- read.csv("mouse_data.csv")
#convert symm to a matrix so it can be used in manova
mus_shape <- as.matrix(mus_data[,4:27])
mus_manova1 <- manova(mus_shape ~ mus_data$Category)
summary(mus_manova1, test=c('Pillai', 'Wilks', 'Hotelling-Lawley', 'Roy'))

Q1.3

Table below:

# Fit the MANOVA model
mus_manova2 <- manova(mus_shape ~ Category + Strain, data = mus_data)
# Extract the summary of the MANOVA model
mus_summary <- summary(mus_manova2, test = c('Pillai', 'Wilks', 'Hotelling-Lawley', 'Roy'))
# Create a data frame with test statistics and p-values
summary_df <- data.frame(df = mus_summary$stats[,1],
                         Pillai = mus_summary$stats[,2],
                         `approx F` = mus_summary$stats[,3],
                         `num Df` = mus_summary$stats[,4],
                         `den Df` = mus_summary$stats[,5],
                         `p-value` = mus_summary$stats[, "Pr(>F)"])
# Use kable to format the table
table <- knitr::kable(summary_df, digits = 3, format = "markdown")
# Print the table
table
df Pillai approx.F num.Df den.Df p.value
Category 2 1.825 89.748 48 414 0
Strain 7 4.769 18.888 168 1484 0
Residuals 229 NA NA NA NA NA

Table 1. Test statistics and p-values of a MANOVA.

Q1.4

For the category factor:

The Pillai’s trace statistic is 1.8, indicating a significant multivariate effect (differences among groups). The approximate F-statistic is 98.7, with degrees of freedom 2 and 48 for the numerator and denominator. The p-value is extremely small (~0), suggesting strong evidence of difference among groups.

For the strain factor:

The Pillai’s trace statistic is 4.7, indicating a significant multivariate effect. The approximate F-statistic is 18.88, with degrees of freedom 7 and 168 for the numerator and denominator, respectively. The p-value is extremely small (~0), suggesting strong evidence of difference among groups.

Biologically, these results suggest that both Category and Strain factors have a significant effect on the skull shape and that here are significant differences among the predictors.

Part 4

Q2.1

This output shows the 24 univariate responses individually compared to strain and category/strain. You should be able to conclude the significance of each individual dependent variable within the MANOVA. This allows you to identify which specific response variables are driving the multivariate effects that lead to the results in MANOVA.

summary.aov(mus_manova2)

Q2.2

The proportion of the variance DF-1 explains is ~56%

mus_dfa1 <- lda(mus_data$Category ~ mus_shape, CV=F)
#sum the “coefficients of linear discriminants” (the eigenvectors)
#To sum correctly take the absolute value
DF1var <- sum(abs(mus_dfa1$scaling[,1]))
DF2var <- sum(abs(mus_dfa1$scaling[,2]))
#sum together
tve <- sum(DF1var, DF2var)
#proportion of the variance DF-1 explains
print(DF1var / tve)
## [1] 0.5587257

Q2.3

The five most important (most important to least important) response variables for determining wild, lab and mutant mice are as follows:

  1. Symm1
  2. Symm9
  3. Symm13
  4. symm5
  5. Symm17
# Order coefficients from largest to smallest
total <- abs(mus_dfa1$scaling[,1]) + abs(mus_dfa1$scaling[,2])
ord <- sort(total, decreasing = T)
#Extract names of the top five response variables
ord[1:5]
##  mus_shapeSymm1  mus_shapeSymm9 mus_shapeSymm13  mus_shapeSymm5 mus_shapeSymm17 
##        1636.873        1301.138        1275.008        1263.214        1136.157

Q2.4

78 individuals in the lab group were correctly identified and 2 were incorrectly classified as mutant.

mus_dfa1a <- lda(mus_data$Category ~ mus_shape, CV=T)
mus_dfa1a_table = table(mus_data$Category, mus_dfa1a$class)
mus_dfa1a_table
##         
##          Lab Mutant Wild
##   Lab     78      2    0
##   Mutant   5    106    0
##   Wild     0      0   48

Q2.5

The total number of correct classifications was 232 mice.

Table below:

#proportion of successful classification using
diag(prop.table(mus_dfa1a_table,1))
##      Lab   Mutant     Wild 
## 0.975000 0.954955 1.000000
#calculate the overall proportion of correct classifications
sum(diag(prop.table(mus_dfa1a_table)))
## [1] 0.9707113
# Create a data frame
summary_df <- data.frame(Lab = mus_dfa1a_table[,1],
                         Mutant = mus_dfa1a_table[,2],
                         Wild = mus_dfa1a_table[,3],
                         `Prop Correct` = diag(prop.table(mus_dfa1a_table,1)))
# Use kable to format the table
table <- knitr::kable(summary_df, digits = 3, format = "markdown")
# Print the table
table
Lab Mutant Wild Prop.Correct
Lab 78 2 0 0.975
Mutant 5 106 0 0.955
Wild 0 0 48 1.000

Table 2. The overall proportion of correct classifications. Rows represent the group to which individuals actually belong.

Q2.6

The two that appear to be the most similar are Lab and Mutant where wild is much further away.

plot(mus_dfa1)
Figure 1. The DF for each individual as a discriminant function space. This plot shows which individuals may look most alike.

Figure 1. The DF for each individual as a discriminant function space. This plot shows which individuals may look most alike.

Q2.7

There seems to be ~11 outliers, the mutant group seems to have the most outliers of all three groups.

pred_dfa1 <- predict(mus_dfa1)$x
mah_dists <- mahalanobis(pred_dfa1, c(0,0), cov(pred_dfa1))
#Create a box plot of Mahalanobis distances for the three groups
boxplot(mah_dists ~ mus_data$Category, xlab = "Group", ylab = "Mahalanobis Distance")
Figure 2. Boxplot illustraiting the Mahalanobis distance function, of each mouse group and their outliers.

Figure 2. Boxplot illustraiting the Mahalanobis distance function, of each mouse group and their outliers.

Part 6

Q3.1

There seems to be no variables that obviously are related in a curved or non-linnear manner? No violations of linearrity present.

fb_data <- read.csv("flycatcher_data.csv")
maleID <- as.factor(fb_data$male)
pairs(fb_data[,3:15], pch = '.')
Figure 3. Pairwise plots of all variables in the flycatcher dataset.

Figure 3. Pairwise plots of all variables in the flycatcher dataset.

Q3.2

The loadings = FALSE argument specifies that the loadings should not be included in the summary. The cutoff = 0.1 argument specifies the cutoff for printing values. Loadings smaller in absolute value than this cutoff are not printed. loadings = TRUE provide the loadings and have a much smaller cutoff than the earlier summary.

fb_pca1 <- princomp(fb_data[,3:15], cor=F, scores=T, covmat=NULL)
summary(fb_pca1, loadings=FALSE, cutoff=0.1)

foo <- princomp(fb_data)[,3:15], cor=F, scores=T, covmat=NULL)
summary(foo, loadings=TRUE, cutoff=0.1)

foo1 <- princomp(fb_data[,3:15], cor=F, scores=T, covmat=NULL)
summary(foo, loadings=TRUE, cutoff=0.0001)

Q3.3

Table below:

fb_pca1 <- princomp(fb_data[,3:15], cor=T, scores=T, covmat=NULL)
# Eigenvalues are the squares of the standard deviations
eigen <- fb_pca1$sdev^2
# Create a data frame and transpose to make row
df <- t(data.frame(Lambda = eigen[1:10]))
# Use kable to format the table
table <- knitr::kable(df, digits = 3, format = "markdown")
# Print the table
table
Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8 Comp.9 Comp.10
Lambda 3.405 2.482 2.291 1.705 1.306 0.551 0.413 0.325 0.264 0.168

Table 3. The first ten eigenvalues from the Fitz-Bew song data.

Q3.4

Five PCs should be interpretd with the Fitz-Bew dataset with an eigenvalue above 1.

Table below:

screeplot(fb_pca1, type=c('lines'))
Figure 4. Screeplot plot of eigenvalues against PC number.

Figure 4. Screeplot plot of eigenvalues against PC number.

Q3.5

Q3.6

Q3.7

EC.