MA564 Homework 1

Problem 1:

Consider a bivariate normal population with μ1 = 0, μ2 = 2, σ11 = 2, σ22 = 1, and ρ12 = 0.5.

Write out the bivariate normal density.

Ans:

\[ \displaystyle{\phi(x_{1}, x_{2}) = \frac{\sqrt{6}}{6 \pi} e^{\frac{-x_{1}^{2} + \sqrt{2}x_{1}x_{2}-2\sqrt{2}x_{1}-2x_{2}^{2}+8x_{2}-8}{3}}} \]

Manually calculated (Please refer to attached handwritten work).

Write out the squared generalized distance expression (x - μ)’ Σ⁻¹ (x - μ) as a function of x1 and x2.

Ans:

\[ \displaystyle{D^{2} = \frac{2x_{1}^{2} + 4\sqrt{2}x_{1} - 2\sqrt{2}x_{1}x_{2}-16x_{2} + 4x_{2}^{2} + 16}{3}} \]

Manually calculated (Please refer to attached handwritten work).

mu <- c(0, 2)
covmat <- matrix(c(2, ((sqrt(2))/2), ((sqrt(2))/2), 1), nrow = 2, byrow = TRUE)

#Calculating the inverse of the sample covariance.
inv_covmat <- solve(covmat, symmetric = TRUE)
print(inv_covmat)

##            [,1]       [,2]
## [1,]  0.6666667 -0.4714045
## [2,] -0.4714045  1.3333333

#
#
#
#
#
# b) 
# Verifying eigen values and vectors.
ei1 <- eigen(covmat)

# Extract eigenvalues.
ei1_vals <- ei1$values
print(ei1_vals)

## [1] 2.3660254 0.6339746

# Extract eigenvectors.
ei1_vecs <- ei1$vectors
print(ei1_vecs)

##            [,1]       [,2]
## [1,] -0.8880738  0.4597008
## [2,] -0.4597008 -0.8880738

####### Optional: Normalizing vector so y = 1 (Mainly used to verify algebraic steps to
#finding eigenvectors before normalization).
# Normalize the first eigenvector so that the second component is 1.
#ei1_vec1 <- ei1_vecs[, 1]
#if (ei1_vec1[2] != 0) {
#  norm_ei1_vec1 <- ei1_vec1 / ei1_vec1[2]
#} else {
#  stop("The second component of the first eigenvector is zero; cannot normalize to 1.")
#}
#print(norm_ei1_vec1)

# Normalize the second eigenvector so that the second component is 1.
#ei1_vec2 <- ei1_vecs[, 2]
#if (ei1_vec2[2] != 0) {
#  norm_ei1_vec2 <- ei1_vec2 / ei1_vec2[2]
#} else {
#  stop("The second component of the second eigenvector is zero; cannot normalize #to 1.")
#}
#print(norm_ei1_vec2)
# Eigen values and vectors are the same as manual calculations when normalized.
#########

Determine (and sketch) the constant-density contour that contains 50% of the probability.

#c)
#Calculating the Major Axes Endpoints.
e1 <- ei1_vals[1]
ei1_vec1 <- ei1_vecs[, 1]
major_axis1 = mu + ((sqrt((qchisq(p = .5, df = 2, lower.tail = FALSE)) * e1)) * ei1_vec1) 
print(major_axis1)

## [1] -1.608372  1.167445

major_axis2 = mu - ((sqrt((qchisq(p = .5, df = 2, lower.tail = FALSE)) * e1)) * ei1_vec1) 
print(major_axis2)

## [1] 1.608372 2.832555

Determine (and sketch) the constant-density contour that contains 50% of the probability.

#Calculating the Minor Axes Endpoints.
e2 <- ei1_vals[2]
ei1_vec2 <- ei1_vecs[, 2]
minor_axis1 = mu + ((sqrt((qchisq(p = .5, df = 2, lower.tail = FALSE)) * e2)) * ei1_vec2) 
print(minor_axis1)

## [1] 0.430962 1.167445

minor_axis2 = mu - ((sqrt((qchisq(p = .5, df = 2, lower.tail = FALSE)) * e2)) * ei1_vec2) 
print(minor_axis2)

## [1] -0.430962  2.832555

#c)
library(ellipse)

## 
## Attaching package: 'ellipse'

## The following object is masked from 'package:graphics':
## 
##     pairs

library(MASS)
library(mvtnorm)
set.seed(123)

X <- mvrnorm(n=10000,mu=mu, Sigma=covmat)


elps <- t(t(ellipse(covmat, level=0.5, npoints=1000))+mu)


#Plotting the random data followign the covariance of sample.
plot(X[, 1], X[, 2], col = "black", pch = 1, xlab = "X1", ylab = "X2", main = "Bivariate
     Normal Distribution with 50% Confidence Ellipse")

#Drawing confidence ellipse itself.
lines(elps, col = "red", lwd = 1)  

#Plotting the mean coordinates.
points(mu[1], mu[2], col = "red", pch = 1, cex = 1)  
text(mu[1], mu[2], labels = paste0("μ = (", round(mu[1], 2), ", ", round(mu[2], 2), ")"),
     pos = 4, col = "red", cex = 0.8)


#Plotting the  principal axes.
segments(mu[1], mu[2], major_axis1[1], major_axis1[2], col = "red", lwd = 1)
segments(mu[1], mu[2], minor_axis1[1], minor_axis1[2], col = "red", lwd = 1)

#Adding labeled coordinates for major and minor axes.
points(major_axis1[1], major_axis1[2], col = "red", pch = 19)
text(major_axis1[1], major_axis1[2], labels = paste0("(", round(major_axis1[1], 2), ", ", 
                                                     round(major_axis1[2], 2), ")"), 
     pos = 3, col = "red", cex = 1)

points(major_axis2[1], major_axis2[2], col = "red", pch = 19)
text(major_axis2[1], major_axis2[2], labels = paste0("(", round(major_axis2[1], 2), ", ",
                                                     round(major_axis2[2], 2), ")"), 
     pos = 1, col = "red", cex = 1)

points(minor_axis1[1], minor_axis1[2], col = "red", pch = 19)
text(minor_axis1[1], minor_axis1[2], labels = paste0("(", round(minor_axis1[1], 2), ", ", 
                                                     round(minor_axis1[2], 2), ")"), 
     pos = 3, col = "red", cex = 1)

points(minor_axis2[1], minor_axis2[2], col = "red", pch = 19)
text(minor_axis2[1], minor_axis2[2], labels = paste0("(", round(minor_axis2[1], 2), ", ",
                                                     round(minor_axis2[2], 2), ")"), 
     pos = 1, col = "red", cex = 1)

Problem 2:

A physical anthropologist performed a mineral analysis of nine ancient Peruvian hairs. The results for the chromium (x1) and strontium (x2) levels, in parts per million (ppm), were as follows:

Source: Benfer and others, ‘Mineral Analysis of Ancient Peruvian Hair,’ American Journal of Physical Anthropology, 48(3), 1978, pp. 277–282.
	1	2	3	4	5	6	7	8	9
x1(Cr)	48	40.53	2.19	0.55	0.74	0.66	0.93	0.37	0.22
x2(Sr)	12.57	73.68	11.3	20.03	20.29	7.8	4.64	4.3	1.08

It is known that low levels (less than or equal to 0.100 ppm) of chromium suggest the presence of diabetes.

In parts b)-f) first perform calculations manually and verify the manual calculations using R.

Use R to find vector of sample means, sample covariance, and inverse of the sample covariance.

#a)
#Creating Dataframe.
mineraldf <- data.frame(
x1_Cr = c(48, 40.53, 2.19, 0.55, 0.74, 0.66, 0.93, 0.37, 0.22),
x2_Sr = c(12.57, 73.68, 11.3, 20.03, 20.29, 7.8, 4.64, 4.3, 1.08)
)

#Calculating vector of sample means.
x_bar <- colMeans(mineraldf)
print(x_bar)

##    x1_Cr    x2_Sr 
## 10.46556 17.29889

#Calculating sample covariance.
s_cov <- cov(mineraldf)
print(s_cov)

##          x1_Cr    x2_Sr
## x1_Cr 371.0078 252.7872
## x2_Sr 252.7872 492.0105

#Calculate inverse of sample covariance.
s_inv_cov <- solve(s_cov, symmetric = TRUE)
print(s_inv_cov)

##              x1_Cr        x2_Sr
## x1_Cr  0.004147146 -0.002130738
## x2_Sr -0.002130738  0.003127216

Ans:

	Sample Mean 1 (Cr)	Sample Mean 2 (Sr)	Sample Covariance	Inverse of Sample Cov.
Manual Calculations	10.46556	17.29889	\(\begin{bmatrix} 371.0078 & 252.7872\\252.7872 & 492.0105 \end{bmatrix}\)	\(\begin{bmatrix} 0.004147146 & -0.002130738\\-0.002130738 & 0.0003127216 \end{bmatrix}\)
R Results	691	172	\(\begin{bmatrix} 371 & 253\\253 & 492 \end{bmatrix}\)	\(\begin{bmatrix} 0.00415 & -0.00213\\-0.00213 & 0.000313 \end{bmatrix}\)

Calculate manually eigen values and eigen vectors. Check your results using R.

# b) 
#Verifying eigen values and vectors.
ei2 <- eigen(s_cov)

#Eigenvalues.
ei2_vals <- ei2$values
print(ei2_vals)

## [1] 691.4357 171.5826

#Eigenvectors
ei2_vecs <- ei2$vectors
print(ei2_vecs)

##           [,1]       [,2]
## [1,] 0.6193693 -0.7850998
## [2,] 0.7850998  0.6193693

####### Optional: Normalizing vector so y = 1 (Mainly used to verify algebraic steps done
#manually to finding eigenvectors before normalization).
# Normalize the first eigenvector so that the second component is 1.
#ei2_vec1 <- ei2_vecs[, 1]
#if (ei2_vec1[2] != 0) {
#  norm_ei2_vec1 <- ei2_vec1 / ei2_vec1[2]
#} else {
#  stop("The second component of the first eigenvector is zero; cannot normalize to 1#.")
#}
#print(norm_ei2_vec1)

# Normalize the second eigenvector so that the second component is 1.
#ei2_vec2 <- ei2_vecs[, 2]
#if (ei2_vec2[2] != 0) {
#  norm_ei2_vec2 <- ei2_vec2 / ei2_vec2[2]
#} else {
#  stop("The second component of the second eigenvector is zero; cannot normalize to #1.")
#}
#print(norm_ei2_vec2)

Ans:

	Eigen Value 1	Eigen Value 2	Eigen Vector 1	Eigen Vector2
Manual Calculations	691.4356431	171.5826569	\(\begin{bmatrix} 0.619362199 \\0.785105386 \end{bmatrix}\)	\(\begin{bmatrix} -0.785099773 \\0.619369314 \end{bmatrix}\)
R Results	691	172	\(\begin{bmatrix} 0.619 \\0.785 \end{bmatrix}\)	\(\begin{bmatrix} -0.785 \\0.619 \end{bmatrix}\)

Conduct a hypothesis test to determine whether the mean vector significantly differs from [0.1, 15] at 5% significance level.

#c)
#Defining variables.
n <- nrow(mineraldf)
p <- ncol(mineraldf)
mu <- c(0.1, 15)
mean_diff <- x_bar - mu

#Computing Critical Value for T2.
alpha <- 0.05
F_crit <- qf(1 - alpha, p, n - p)  
F_crit

## [1] 4.737414

T2_CV <- (p * (n - 1) / (n - p)) * F_crit

#Computing Statistical Value for T2.
T2_Stat <- n * t(mean_diff) %*% s_inv_cov %*% mean_diff

#Printing Critical Value and Statistical Value for T2 for comparison.
print(T2_CV)

## [1] 10.82838

print(T2_Stat)

##          [,1]
## [1,] 3.245114

#Computing P-value as an alternative method for comparison.
F_Stat <- (((n-p)/((n-1)*p)) * T2_Stat)
print(F_Stat)

##          [,1]
## [1,] 1.419737

F_Pval <- pf(F_Stat, 2, 7, lower.tail = FALSE)
print(F_Pval)

##           [,1]
## [1,] 0.3036977

Ans:

	\(T^2\) Critical Value	\(T^2\) Test Value	F-Stat	P-Value
Manual Calculations	10.83428572	3.245116635	1.419738528	\(p > \alpha = 0.05\)
R Results	10.8	3.25	1.42	0.304

Because our \(T^2\) Test Value is less than our \(T^2\)Critical Value, our \(T^2\) does not lie within the Rejection Region (RR) and thus we cannot reject the null hypothesis.
Furthermore, we computed the P-value of our computed F-statistic \(= 0.304\). Thus, we cannot reject the null hypothesis because our P-value is greater than our significance level \(\alpha = 0.05\).
Therefore, we may conclude that there is not enough evidence to reject the claim that the mean vector does not significantly differ from [0.1, 15] at the 5% significance level.

Construct and plot a 95% joint confidence ellipse for the population mean vector μ = [μ1, μ2] assuming that these nine Peruvian hairs represent a random sample from individuals belonging to a particular ancient Peruvian culture.

#d)
library(ellipse)
set.seed(456)
X <- mvrnorm(n=10000,mu=x_bar, Sigma=s_cov)
# Plot data
plot(X[, 1], X[, 2], col = "black", pch = 1, xlab = "X1", ylab = "X2", 
     main = "95% Joint Confidence Ellipse for Chromium and Strontium")

#Computing the F-distribution scaling factor (Hotelling’s T²).
scale_factor_F <- sqrt((p * (n - 1)) / (n * (n - p)) * F_crit)

#Computing major and minor axis endpoints.
major_axis1 <- x_bar + scale_factor_F * sqrt(ei2_vals[1]) * ei2_vecs[,1]
print(major_axis1)

##    x1_Cr    x2_Sr 
## 28.32985 39.94331

major_axis2 <- x_bar - scale_factor_F * sqrt(ei2_vals[1]) * ei2_vecs[,1]  
print(major_axis2)

##     x1_Cr     x2_Sr 
## -7.398743 -5.345529

minor_axis1 <- x_bar + scale_factor_F * sqrt(ei2_vals[2]) * ei2_vecs[,2]  
print(minor_axis1)

##      x1_Cr      x2_Sr 
## -0.8147768 26.1980014

minor_axis2 <- x_bar - scale_factor_F * sqrt(ei2_vals[2]) * ei2_vecs[,2]  
print(minor_axis2)

##     x1_Cr     x2_Sr 
## 21.745888  8.399776

#Generating the 95% confidence ellipse using Hotelling’s T² scaling.
ellipse_df <- as.data.frame(ellipse(s_cov, centre = x_bar, t = scale_factor_F))
colnames(ellipse_df) <- c("x", "y")

#Plotting the ellipse itself.
lines(ellipse_df, col = "red", lwd = 1)

#Plotting the mean.
points(x_bar[1], x_bar[2], col = "red", pch = 1, cex = 1.5)
text(x_bar[1], x_bar[2], labels = paste0("μ = (", round(x_bar[1], 2), ", ",
                                         round(x_bar[2], 2), ")"), 
     pos = 4, col = "red", cex = 1)

#Plotting the major and minor axes endpoints.
points(major_axis1[1], major_axis1[2], col = "red", pch = 19, cex = 1.2)
points(major_axis2[1], major_axis2[2], col = "red", pch = 19, cex = 1.2)
points(minor_axis1[1], minor_axis1[2], col = "red", pch = 19, cex = 1.2)
points(minor_axis2[1], minor_axis2[2], col = "red", pch = 19, cex = 1.2)

#Drawing principle lines for major and minor axes.
segments(x_bar[1], x_bar[2], major_axis2[1], major_axis2[2], col = "red", lwd = 1)
segments(x_bar[1], x_bar[2], minor_axis2[1], minor_axis2[2], col = "red", lwd = 1)

#Labeling the axes coordinates.
text(major_axis1[1], major_axis1[2], labels = paste0("(", round(major_axis1[1], 2), ", ",
                                                     round(major_axis1[2], 2), ")"), 
     pos = 3, col = "red", cex = 1)

text(major_axis2[1], major_axis2[2], labels = paste0("(", round(major_axis2[1], 2), ", ",
                                                     round(major_axis2[2], 2), ")"), 
     pos = 1, col = "red", cex = 1)

text(minor_axis1[1], minor_axis1[2], labels = paste0("(", round(minor_axis1[1], 2), ", ",
                                                     round(minor_axis1[2], 2), ")"), 
     pos = 3, col = "red", cex = 1)

text(minor_axis2[1], minor_axis2[2], labels = paste0("(", round(minor_axis2[1], 2), ", ",
                                                     round(minor_axis2[2], 2), ")"), 
     pos = 1, col = "red", cex = 1)

Ans:

	Major Axis EP 1	Major Axis EP 2	Minor Axis EP 1	Minor Axis EP 2
Manual Calculations	(28.33452775, 39.94964726)	(-7.40340775, -5.35186726)	(-0.81785167, 26.20043244)	(21.74897167, 8.397347564)
R Results	(28.3, 39.9)	(-7.40, -5.35)	(-0.815, 26.198)	(21.7, 8.4)

Obtain the individual simultaneous 95% confidence intervals for μ1 and μ2.

#e)
#Computing standard errors.
s_i <- sqrt(diag(s_cov) / n)
s_i

##    x1_Cr    x2_Sr 
## 6.420521 7.393770

#Computing F-distribution critical value.
alpha <- 0.05  #At 95% confidence level.
F_crit <- qf(1 - alpha, p, n - p)
F_crit

## [1] 4.737414

#Computing the confidence intervals.
SCI_lower <- x_bar - sqrt(T2_CV) * s_i
SCI_upper <- x_bar + sqrt(T2_CV) * s_i

#Optional: Combining results into a nice data frame.
SCI_results <- data.frame(
  Variable = paste0("mu", 1:p),
  Lower_Bound = SCI_lower,
  Upper_Bound = SCI_upper
)

print(SCI_results)

##       Variable Lower_Bound Upper_Bound
## x1_Cr      mu1  -10.662129    31.59324
## x2_Sr      mu2   -7.031418    41.62920

Ans:

	Mean 1 (Cr) Simultaneous CI	Mean 2 (Sr) Simultaneous CI
Manual Calculations	(-10.66789035, 31.59901035)	(-7.03805572, 41.63583572)
R Results	(-10.66, 31.6)	(-7.03, 41.6)

Obtain 95% Bonferroni confidence intervals for μ1 and μ2.

#f)
#Computing the Bonferroni-adjusted t-value.
t_crit <- qt(1 - (alpha / (2 * p)), df = n - 1) 
t_crit

## [1] 2.751524

cat(sprintf("Bonferroni-adjusted t-critical value: %.4f\n", t_crit))

## Bonferroni-adjusted t-critical value: 2.7515

#Computing the Bonferroni confidence intervals.
BCI_lower <- x_bar - t_crit * s_i
BCI_upper <- x_bar + t_crit * s_i

#Optional: Combining results into a nice data frame.
BCI_results <- data.frame(
  Variable = paste0("mu", 1:p),
  Lower_Bound = BCI_lower,
  Upper_Bound = BCI_upper
)

print(BCI_results)

##       Variable Lower_Bound Upper_Bound
## x1_Cr      mu1   -7.200659    28.13177
## x2_Sr      mu2   -3.045244    37.64302

#Computing the Bonferroni-adjusted t-value using the manually calculated critical t-val to
#see if it matches the value I esteimated from t-table.
t_crit <- 2.896 #Adjusted t-value.
t_crit

## [1] 2.896

cat(sprintf("Bonferroni-adjusted t-critical value: %.4f\n", t_crit))

## Bonferroni-adjusted t-critical value: 2.8960

#Computing the Bonferroni confidence intervals.
BCI_lower <- x_bar - t_crit * s_i
BCI_upper <- x_bar + t_crit * s_i

#Optional: Combining results into a nice data frame.
BCI_results <- data.frame(
  Variable = paste0("mu", 1:p),
  Lower_Bound = BCI_lower,
  Upper_Bound = BCI_upper
)

print(BCI_results)

##       Variable Lower_Bound Upper_Bound
## x1_Cr      mu1   -8.128272    29.05938
## x2_Sr      mu2   -4.113470    38.71125

Ans:

	Mean 1 (Cr) Simultaneous CI	Mean 2 (Sr) Simultaneous CI
Manual Calculations	(-8.12826859, 29.05938859)	(-4.11346763, 38.71124763)
R Results (t = 2.75)	(-7.20, 28.1)	(-3.05, 37.6)
R Results (t = 2.896)	(-8.13, 29.1)	(-4.11, 38.7)

Do these data appear to be bivariate normal? Discuss their status with reference to Q-Q plots and a scatter diagram. If the data are not bivariate normal, what implications does this have for the results in Parts c) - f)?

#g)
#Graphing the Normal Q-Q probability plots.
qqnorm(mineraldf$x1_Cr, main = "Chromium Normal Q-Q Probability Plot",
       xlab = "Theoretical Quantiles", 
       ylab = "Sample Quantiles")
qqline(mineraldf$x1_Cr, col = "red", lwd = 2)

qqnorm(mineraldf$x2_Sr, main = "Strontium Normal Q-Q Probability Plot",
       xlab = "Theoretical Quantiles",
       ylab = "Sample Quantiles")
qqline(mineraldf$x2_Sr, col = "red", lwd = 2)

#Graphing the Chi-square Q-Q Plot for Mahalanobis Distances.
mahal_dist <- mahalanobis(mineraldf, center = x_bar, cov = s_cov)
print(mahal_dist)

## [1] 6.6689704 6.4658858 0.1849973 0.5464687 0.5442094 0.2839870 0.3638145
## [8] 0.3918495 0.5498174

qqplot(qchisq(ppoints(nrow(mineraldf)), df = 2), mahal_dist,
       main = "Chi-Square Q-Q Plot for Mahalanobis Distances",
       xlab = "Theoretical Quantiles", ylab = "Mahalanobis Distance")
abline(0, 1, col = "red")

#Grapphing the Scatter-Plot diagram.
plot(mineraldf$x1_Cr, mineraldf$x2_Sr,
     main = "Scatter Plot of Mineral Levels in Ancient Peruvian Hairs",
     xlab = "Chromium",
     ylab = "Strontium")

Ans:

Based off the Normal Q-Q Plot and the Q-Q Plot for Mahalanobis Distances, there are strong deviations from normaility with extreme outliers towards theright end for both Chromium and Strontium univariate wise and multivariate wise suggesting that they are not normally distributed. The scatter plot also shows no obvious elliptical pattern and a cluster of data towards zero solidifying that the variables follow a true bivariate normal distribution.

Repeat the analysis (using R, no need for manual calculations) with the outlier observation removed (observation with the largest Mahalanobis distance). Do the inferences change? Comment.

#h)
#Identifying the index of the observation with the largest Mahalanobis distance.
max_mahal_index <- which.max(mahal_dist)  

#Creating a new dataset without this observation.
mineraldf2 <- mineraldf[-max_mahal_index, ]  

print(mineraldf2)

##   x1_Cr x2_Sr
## 2 40.53 73.68
## 3  2.19 11.30
## 4  0.55 20.03
## 5  0.74 20.29
## 6  0.66  7.80
## 7  0.93  4.64
## 8  0.37  4.30
## 9  0.22  1.08

Analysis Repeated

############REPEATING ANALYSIS#############
#a)
#Creating Dataframe.

#Calculating vector of sample means.
x_bar <- colMeans(mineraldf2)
print(x_bar)

##    x1_Cr    x2_Sr 
##  5.77375 17.89000

#Calculating sample covariance.
s_cov <- cov(mineraldf2)
print(s_cov)

##          x1_Cr    x2_Sr
## x1_Cr 197.5891 317.4259
## x2_Sr 317.4259 558.7038

#Calculate inverse of sample covariance.
s_inv_cov <- solve(s_cov, symmetric = TRUE)
print(s_inv_cov)

##             x1_Cr       x2_Sr
## x1_Cr  0.05798953 -0.03294658
## x2_Sr -0.03294658  0.02050836

#b) 
#Verifying eigen values and vectors.
ei2 <- eigen(s_cov)

#Eigenvalues.
ei2_vals <- ei2$values
print(ei2_vals)

## [1] 743.33153  12.96133

#Eigenvectors
ei2_vecs <- ei2$vectors
print(ei2_vecs)

##           [,1]       [,2]
## [1,] 0.5027788 -0.8644151
## [2,] 0.8644151  0.5027788

#c)
#Defining variables.
n <- nrow(mineraldf2)
p <- ncol(mineraldf2)
mu <- c(0.1, 15)
mean_diff <- x_bar - mu

#Computing Critical Value for T2.
alpha <- 0.05
F_crit <- qf(1 - alpha, p, n - p)  
F_crit

## [1] 5.143253

T2_CV <- (p * (n - 1) / (n - p)) * F_crit

#Computing Statistical Value for T2.
T2_Stat <- n * t(mean_diff) %*% s_inv_cov %*% mean_diff

#Printing Critical Value and Statistical Value for T2 for comparison.
print(T2_CV)

## [1] 12.00092

print(T2_Stat)

##          [,1]
## [1,] 7.660761

#Computing P-value as an alternative method for comparison.
F_Stat <- (((n-p)/((n-1)*p)) * T2_Stat)
print(F_Stat)

##          [,1]
## [1,] 3.283183

F_Pval <- pf(F_Stat, 2, n, lower.tail = FALSE)
print(F_Pval)

##            [,1]
## [1,] 0.09098189

Ans:

After hypothesis testing and excluding the extreme outlier observation, our decision does not change and remains to not reject the null hypothesis. There is still not enough evidence that the mean vector differs significantly from [0.1, 15] at the 5% significance level.

#d)
set.seed(789)
X <- mvrnorm(n=10000,mu=x_bar, Sigma=s_cov)
# Plot data
plot(X[, 1], X[, 2], col = "black", pch = 1, xlab = "X1", ylab = "X2", 
     main = "95% Joint Confidence Ellipse for Chromium and Strontium")

#Computing the F-distribution scaling factor (Hotelling’s T²).
scale_factor_F <- sqrt((p * (n - 1)) / (n * (n - p)) * F_crit)

#Computing major and minor axis endpoints.
major_axis1 <- x_bar + scale_factor_F * sqrt(ei2_vals[1]) * ei2_vecs[,1]
print(major_axis1)

##    x1_Cr    x2_Sr 
## 22.56297 46.75529

major_axis2 <- x_bar - scale_factor_F * sqrt(ei2_vals[1]) * ei2_vecs[,1]  
print(major_axis2)

##     x1_Cr     x2_Sr 
## -11.01547 -10.97529

minor_axis1 <- x_bar + scale_factor_F * sqrt(ei2_vals[2]) * ei2_vecs[,2]  
print(minor_axis1)

##     x1_Cr     x2_Sr 
##  1.962131 20.106991

minor_axis2 <- x_bar - scale_factor_F * sqrt(ei2_vals[2]) * ei2_vecs[,2]  
print(minor_axis2)

##     x1_Cr     x2_Sr 
##  9.585369 15.673009

#Generating the 95% confidence ellipse using Hotelling’s T² scaling.
ellipse_df <- as.data.frame(ellipse(s_cov, centre = x_bar, t = scale_factor_F))
colnames(ellipse_df) <- c("x", "y")

#Plotting the ellipse itself.
lines(ellipse_df, col = "red", lwd = 1)

#Plotting the mean.
points(x_bar[1], x_bar[2], col = "red", pch = 1, cex = 1.5)
text(x_bar[1], x_bar[2], labels = paste0("μ = (", round(x_bar[1], 2), ", ",
                                         round(x_bar[2], 2), ")"), 
     pos = 4, col = "red", cex = 1)

#Plotting the major and minor axes endpoints.
points(major_axis1[1], major_axis1[2], col = "red", pch = 19, cex = 1.2)
points(major_axis2[1], major_axis2[2], col = "red", pch = 19, cex = 1.2)
points(minor_axis1[1], minor_axis1[2], col = "red", pch = 19, cex = 1.2)
points(minor_axis2[1], minor_axis2[2], col = "red", pch = 19, cex = 1.2)

#Drawing principle lines for major and minor axes.
segments(x_bar[1], x_bar[2], major_axis2[1], major_axis2[2], col = "red", lwd = 1)
segments(x_bar[1], x_bar[2], minor_axis2[1], minor_axis2[2], col = "red", lwd = 1)

#Labeling the axes coordinates.
text(major_axis1[1], major_axis1[2], labels = paste0("(", round(major_axis1[1], 2), ", ",
                                                     round(major_axis1[2], 2), ")"), 
     pos = 3, col = "red", cex = 1)

text(major_axis2[1], major_axis2[2], labels = paste0("(", round(major_axis2[1], 2), ", ",
                                                     round(major_axis2[2], 2), ")"), 
     pos = 1, col = "red", cex = 1)

text(minor_axis1[1], minor_axis1[2], labels = paste0("(", round(minor_axis1[1], 2), ", ",
                                                     round(minor_axis1[2], 2), ")"), 
     pos = 3, col = "red", cex = 1)

text(minor_axis2[1], minor_axis2[2], labels = paste0("(", round(minor_axis2[1], 2), ", ",
                                                     round(minor_axis2[2], 2), ")"), 
     pos = 1, col = "red", cex = 1)

#e)
#Computing standard errors.
s_i <- sqrt(diag(s_cov) / n)
s_i

##    x1_Cr    x2_Sr 
## 4.969772 8.356912

#Computing F-distribution critical value.
alpha <- 0.05  #At 95% confidence level.
F_crit <- qf(1 - alpha, p, n - p)
F_crit

## [1] 5.143253

#Computing the confidence intervals.
SCI_lower <- x_bar - sqrt(T2_CV) * s_i
SCI_upper <- x_bar + sqrt(T2_CV) * s_i

#Optional: Combining results into a nice data frame.
SCI_results <- data.frame(
  Variable = paste0("mu", 1:p),
  Lower_Bound = SCI_lower,
  Upper_Bound = SCI_upper
)

print(SCI_results)

##       Variable Lower_Bound Upper_Bound
## x1_Cr      mu1   -11.44271    22.99021
## x2_Sr      mu2   -11.06031    46.84031

#f)
#Computing the Bonferroni-adjusted t-value.
t_crit <- qt(1 - (alpha / (2 * p)), df = n - 1) 
t_crit

## [1] 2.841244

cat(sprintf("Bonferroni-adjusted t-critical value: %.4f\n", t_crit))

## Bonferroni-adjusted t-critical value: 2.8412

#Computing the Bonferroni confidence intervals.
BCI_lower <- x_bar - t_crit * s_i
BCI_upper <- x_bar + t_crit * s_i

#Optional: Combining results into a nice data frame.
BCI_results <- data.frame(
  Variable = paste0("mu", 1:p),
  Lower_Bound = BCI_lower,
  Upper_Bound = BCI_upper
)

print(BCI_results)

##       Variable Lower_Bound Upper_Bound
## x1_Cr      mu1   -8.346586    19.89409
## x2_Sr      mu2   -5.854028    41.63403

#Computing the Bonferroni-adjusted t-value using the manually calculated critical t-val 
#to see if it matches the value I esteimated from t-table.
t_crit <- 2.998 #Adjusted t-value.
t_crit

## [1] 2.998

cat(sprintf("Bonferroni-adjusted t-critical value: %.4f\n", t_crit))

## Bonferroni-adjusted t-critical value: 2.9980

#Computing the Bonferroni confidence intervals.
BCI_lower <- x_bar - t_crit * s_i
BCI_upper <- x_bar + t_crit * s_i

#Optional: Combining results into a nice data frame.
BCI_results <- data.frame(
  Variable = paste0("mu", 1:p),
  Lower_Bound = BCI_lower,
  Upper_Bound = BCI_upper
)

print(BCI_results)

##       Variable Lower_Bound Upper_Bound
## x1_Cr      mu1   -9.125626    20.67313
## x2_Sr      mu2   -7.164022    42.94402

#g)
#Graphing the Normal Q-Q probability plots.
qqnorm(mineraldf2$x1_Cr, main = "Chromium Normal Q-Q Probability Plot",
       xlab = "Theoretical Quantiles",
       ylab = "Sample Quantiles")
qqline(mineraldf$x1_Cr, col = "red", lwd = 2)

qqnorm(mineraldf2$x2_Sr,
       main = "Strontium Normal Q-Q Probability Plot",
       xlab = "Theoretical Quantiles",
       ylab = "Sample Quantiles")
qqline(mineraldf$x2_Sr, col = "red", lwd = 2)

#Graphing the Chi-square Q-Q Plot for Mahalanobis Distances.
mahal_dist <- mahalanobis(mineraldf2, center = x_bar, cov = s_cov)
print(mahal_dist)

##          2          3          4          5          6          7          8 
## 6.11372626 0.07922092 2.41292118 2.38355915 0.20443075 0.73204157 0.64198029 
##          9 
## 1.43211988

qqplot(qchisq(ppoints(nrow(mineraldf2)), df = 2), mahal_dist,
       main = "Chi-Square Q-Q Plot for Mahalanobis Distances",
       xlab = "Theoretical Quantiles",
       ylab = "Mahalanobis Distance")
abline(0, 1, col = "red")

#Grapphing the Scatter-Plot diagram.
plot(mineraldf2$x1_Cr, mineraldf2$x2_Sr,
     main = "Scatter Plot of Mineral Levels in Ancient Peruvian Hairs",
     xlab = "Chromium",
     ylab = "Strontium")

Ans:

After removing the outlier with the largest Mahalanobis distance, the overall inferences remain unchanged, as the analytical changes are minimal. Our hypothesis remains the same, and we still fail to reject the null hypothesis, indicating that the mean vector does not differ significantly from the hypothesized means. While the confidence intervals have shifted slightly—Chromium’s CI has narrowed, while Strontium’s has expanded—these changes do not suggest a meaningful pattern.

The Normal Q-Q plots show slight improvement, and the Chi-Square plot exhibits noticeable enhancement compared to the previous version. However, the data points still do not perfectly align with the 45-degree reference line, and a potential outlier may continue to skew the distribution. Additionally, the scatterplot lacks the expected elliptical shape characteristic of a bivariate normal distribution. Thus, these results may not accurately represent the inferences of the Confidence Intervals and Hypothesis Testing and their results remain unreliable.

Therefore, despite these minor adjustments, the data still do not exhibit bivariate normal behavior.

MA564 Homework 1

Michael Cajigal

Problem 1:

Problem 2: