Consider a bivariate normal population with μ1 = 0, μ2 = 2, σ11 = 2, σ22 = 1, and ρ12 = 0.5.
Ans:
\[ \displaystyle{\phi(x_{1}, x_{2}) = \frac{\sqrt{6}}{6 \pi} e^{\frac{-x_{1}^{2} + \sqrt{2}x_{1}x_{2}-2\sqrt{2}x_{1}-2x_{2}^{2}+8x_{2}-8}{3}}} \]
Manually calculated (Please refer to attached handwritten work).
Ans:
\[ \displaystyle{D^{2} = \frac{2x_{1}^{2} + 4\sqrt{2}x_{1} - 2\sqrt{2}x_{1}x_{2}-16x_{2} + 4x_{2}^{2} + 16}{3}} \]
Manually calculated (Please refer to attached handwritten work).
mu <- c(0, 2)
covmat <- matrix(c(2, ((sqrt(2))/2), ((sqrt(2))/2), 1), nrow = 2, byrow = TRUE)
#Calculating the inverse of the sample covariance.
inv_covmat <- solve(covmat, symmetric = TRUE)
print(inv_covmat)
## [,1] [,2]
## [1,] 0.6666667 -0.4714045
## [2,] -0.4714045 1.3333333
#
#
#
#
#
# b)
# Verifying eigen values and vectors.
ei1 <- eigen(covmat)
# Extract eigenvalues.
ei1_vals <- ei1$values
print(ei1_vals)
## [1] 2.3660254 0.6339746
# Extract eigenvectors.
ei1_vecs <- ei1$vectors
print(ei1_vecs)
## [,1] [,2]
## [1,] -0.8880738 0.4597008
## [2,] -0.4597008 -0.8880738
####### Optional: Normalizing vector so y = 1 (Mainly used to verify algebraic steps to
#finding eigenvectors before normalization).
# Normalize the first eigenvector so that the second component is 1.
#ei1_vec1 <- ei1_vecs[, 1]
#if (ei1_vec1[2] != 0) {
# norm_ei1_vec1 <- ei1_vec1 / ei1_vec1[2]
#} else {
# stop("The second component of the first eigenvector is zero; cannot normalize to 1.")
#}
#print(norm_ei1_vec1)
# Normalize the second eigenvector so that the second component is 1.
#ei1_vec2 <- ei1_vecs[, 2]
#if (ei1_vec2[2] != 0) {
# norm_ei1_vec2 <- ei1_vec2 / ei1_vec2[2]
#} else {
# stop("The second component of the second eigenvector is zero; cannot normalize #to 1.")
#}
#print(norm_ei1_vec2)
# Eigen values and vectors are the same as manual calculations when normalized.
#########
#c)
#Calculating the Major Axes Endpoints.
e1 <- ei1_vals[1]
ei1_vec1 <- ei1_vecs[, 1]
major_axis1 = mu + ((sqrt((qchisq(p = .5, df = 2, lower.tail = FALSE)) * e1)) * ei1_vec1)
print(major_axis1)
## [1] -1.608372 1.167445
major_axis2 = mu - ((sqrt((qchisq(p = .5, df = 2, lower.tail = FALSE)) * e1)) * ei1_vec1)
print(major_axis2)
## [1] 1.608372 2.832555
#Calculating the Minor Axes Endpoints.
e2 <- ei1_vals[2]
ei1_vec2 <- ei1_vecs[, 2]
minor_axis1 = mu + ((sqrt((qchisq(p = .5, df = 2, lower.tail = FALSE)) * e2)) * ei1_vec2)
print(minor_axis1)
## [1] 0.430962 1.167445
minor_axis2 = mu - ((sqrt((qchisq(p = .5, df = 2, lower.tail = FALSE)) * e2)) * ei1_vec2)
print(minor_axis2)
## [1] -0.430962 2.832555
#c)
library(ellipse)
##
## Attaching package: 'ellipse'
## The following object is masked from 'package:graphics':
##
## pairs
library(MASS)
library(mvtnorm)
set.seed(123)
X <- mvrnorm(n=10000,mu=mu, Sigma=covmat)
elps <- t(t(ellipse(covmat, level=0.5, npoints=1000))+mu)
#Plotting the random data followign the covariance of sample.
plot(X[, 1], X[, 2], col = "black", pch = 1, xlab = "X1", ylab = "X2", main = "Bivariate
Normal Distribution with 50% Confidence Ellipse")
#Drawing confidence ellipse itself.
lines(elps, col = "red", lwd = 1)
#Plotting the mean coordinates.
points(mu[1], mu[2], col = "red", pch = 1, cex = 1)
text(mu[1], mu[2], labels = paste0("μ = (", round(mu[1], 2), ", ", round(mu[2], 2), ")"),
pos = 4, col = "red", cex = 0.8)
#Plotting the principal axes.
segments(mu[1], mu[2], major_axis1[1], major_axis1[2], col = "red", lwd = 1)
segments(mu[1], mu[2], minor_axis1[1], minor_axis1[2], col = "red", lwd = 1)
#Adding labeled coordinates for major and minor axes.
points(major_axis1[1], major_axis1[2], col = "red", pch = 19)
text(major_axis1[1], major_axis1[2], labels = paste0("(", round(major_axis1[1], 2), ", ",
round(major_axis1[2], 2), ")"),
pos = 3, col = "red", cex = 1)
points(major_axis2[1], major_axis2[2], col = "red", pch = 19)
text(major_axis2[1], major_axis2[2], labels = paste0("(", round(major_axis2[1], 2), ", ",
round(major_axis2[2], 2), ")"),
pos = 1, col = "red", cex = 1)
points(minor_axis1[1], minor_axis1[2], col = "red", pch = 19)
text(minor_axis1[1], minor_axis1[2], labels = paste0("(", round(minor_axis1[1], 2), ", ",
round(minor_axis1[2], 2), ")"),
pos = 3, col = "red", cex = 1)
points(minor_axis2[1], minor_axis2[2], col = "red", pch = 19)
text(minor_axis2[1], minor_axis2[2], labels = paste0("(", round(minor_axis2[1], 2), ", ",
round(minor_axis2[2], 2), ")"),
pos = 1, col = "red", cex = 1)
A physical anthropologist performed a mineral analysis of nine ancient Peruvian hairs. The results for the chromium (x1) and strontium (x2) levels, in parts per million (ppm), were as follows:
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | |
|---|---|---|---|---|---|---|---|---|---|
| x1(Cr) | 48 | 40.53 | 2.19 | 0.55 | 0.74 | 0.66 | 0.93 | 0.37 | 0.22 |
| x2(Sr) | 12.57 | 73.68 | 11.3 | 20.03 | 20.29 | 7.8 | 4.64 | 4.3 | 1.08 |
It is known that low levels (less than or equal to 0.100 ppm) of chromium suggest the presence of diabetes.
In parts b)-f) first perform calculations manually and verify the manual calculations using R.
#a)
#Creating Dataframe.
mineraldf <- data.frame(
x1_Cr = c(48, 40.53, 2.19, 0.55, 0.74, 0.66, 0.93, 0.37, 0.22),
x2_Sr = c(12.57, 73.68, 11.3, 20.03, 20.29, 7.8, 4.64, 4.3, 1.08)
)
#Calculating vector of sample means.
x_bar <- colMeans(mineraldf)
print(x_bar)
## x1_Cr x2_Sr
## 10.46556 17.29889
#Calculating sample covariance.
s_cov <- cov(mineraldf)
print(s_cov)
## x1_Cr x2_Sr
## x1_Cr 371.0078 252.7872
## x2_Sr 252.7872 492.0105
#Calculate inverse of sample covariance.
s_inv_cov <- solve(s_cov, symmetric = TRUE)
print(s_inv_cov)
## x1_Cr x2_Sr
## x1_Cr 0.004147146 -0.002130738
## x2_Sr -0.002130738 0.003127216
Ans:
| Sample Mean 1 (Cr) | Sample Mean 2 (Sr) | Sample Covariance | Inverse of Sample Cov. | |
|---|---|---|---|---|
| Manual Calculations | 10.46556 | 17.29889 | \(\begin{bmatrix} 371.0078 & 252.7872\\252.7872 & 492.0105 \end{bmatrix}\) | \(\begin{bmatrix} 0.004147146 & -0.002130738\\-0.002130738 & 0.0003127216 \end{bmatrix}\) |
| R Results | 691 | 172 | \(\begin{bmatrix} 371 & 253\\253 & 492 \end{bmatrix}\) | \(\begin{bmatrix} 0.00415 & -0.00213\\-0.00213 & 0.000313 \end{bmatrix}\) |
# b)
#Verifying eigen values and vectors.
ei2 <- eigen(s_cov)
#Eigenvalues.
ei2_vals <- ei2$values
print(ei2_vals)
## [1] 691.4357 171.5826
#Eigenvectors
ei2_vecs <- ei2$vectors
print(ei2_vecs)
## [,1] [,2]
## [1,] 0.6193693 -0.7850998
## [2,] 0.7850998 0.6193693
####### Optional: Normalizing vector so y = 1 (Mainly used to verify algebraic steps done
#manually to finding eigenvectors before normalization).
# Normalize the first eigenvector so that the second component is 1.
#ei2_vec1 <- ei2_vecs[, 1]
#if (ei2_vec1[2] != 0) {
# norm_ei2_vec1 <- ei2_vec1 / ei2_vec1[2]
#} else {
# stop("The second component of the first eigenvector is zero; cannot normalize to 1#.")
#}
#print(norm_ei2_vec1)
# Normalize the second eigenvector so that the second component is 1.
#ei2_vec2 <- ei2_vecs[, 2]
#if (ei2_vec2[2] != 0) {
# norm_ei2_vec2 <- ei2_vec2 / ei2_vec2[2]
#} else {
# stop("The second component of the second eigenvector is zero; cannot normalize to #1.")
#}
#print(norm_ei2_vec2)
Ans:
| Eigen Value 1 | Eigen Value 2 | Eigen Vector 1 | Eigen Vector2 | |
|---|---|---|---|---|
| Manual Calculations | 691.4356431 | 171.5826569 | \(\begin{bmatrix} 0.619362199 \\0.785105386 \end{bmatrix}\) | \(\begin{bmatrix} -0.785099773 \\0.619369314 \end{bmatrix}\) |
| R Results | 691 | 172 | \(\begin{bmatrix} 0.619 \\0.785 \end{bmatrix}\) | \(\begin{bmatrix} -0.785 \\0.619 \end{bmatrix}\) |
#c)
#Defining variables.
n <- nrow(mineraldf)
p <- ncol(mineraldf)
mu <- c(0.1, 15)
mean_diff <- x_bar - mu
#Computing Critical Value for T2.
alpha <- 0.05
F_crit <- qf(1 - alpha, p, n - p)
F_crit
## [1] 4.737414
T2_CV <- (p * (n - 1) / (n - p)) * F_crit
#Computing Statistical Value for T2.
T2_Stat <- n * t(mean_diff) %*% s_inv_cov %*% mean_diff
#Printing Critical Value and Statistical Value for T2 for comparison.
print(T2_CV)
## [1] 10.82838
print(T2_Stat)
## [,1]
## [1,] 3.245114
#Computing P-value as an alternative method for comparison.
F_Stat <- (((n-p)/((n-1)*p)) * T2_Stat)
print(F_Stat)
## [,1]
## [1,] 1.419737
F_Pval <- pf(F_Stat, 2, 7, lower.tail = FALSE)
print(F_Pval)
## [,1]
## [1,] 0.3036977
Ans:
| \(T^2\) Critical Value | \(T^2\) Test Value | F-Stat | P-Value | |
|---|---|---|---|---|
| Manual Calculations | 10.83428572 | 3.245116635 | 1.419738528 | \(p > \alpha = 0.05\) |
| R Results | 10.8 | 3.25 | 1.42 | 0.304 |
Because our \(T^2\) Test Value is less than our \(T^2\)Critical Value, our \(T^2\) does not lie within the Rejection Region (RR) and thus we cannot reject the null hypothesis.
Furthermore, we computed the P-value of our computed F-statistic \(= 0.304\). Thus, we cannot reject the null hypothesis because our P-value is greater than our significance level \(\alpha = 0.05\).
Therefore, we may conclude that there is not enough evidence to reject the claim that the mean vector does not significantly differ from [0.1, 15] at the 5% significance level.
#d)
library(ellipse)
set.seed(456)
X <- mvrnorm(n=10000,mu=x_bar, Sigma=s_cov)
# Plot data
plot(X[, 1], X[, 2], col = "black", pch = 1, xlab = "X1", ylab = "X2",
main = "95% Joint Confidence Ellipse for Chromium and Strontium")
#Computing the F-distribution scaling factor (Hotelling’s T²).
scale_factor_F <- sqrt((p * (n - 1)) / (n * (n - p)) * F_crit)
#Computing major and minor axis endpoints.
major_axis1 <- x_bar + scale_factor_F * sqrt(ei2_vals[1]) * ei2_vecs[,1]
print(major_axis1)
## x1_Cr x2_Sr
## 28.32985 39.94331
major_axis2 <- x_bar - scale_factor_F * sqrt(ei2_vals[1]) * ei2_vecs[,1]
print(major_axis2)
## x1_Cr x2_Sr
## -7.398743 -5.345529
minor_axis1 <- x_bar + scale_factor_F * sqrt(ei2_vals[2]) * ei2_vecs[,2]
print(minor_axis1)
## x1_Cr x2_Sr
## -0.8147768 26.1980014
minor_axis2 <- x_bar - scale_factor_F * sqrt(ei2_vals[2]) * ei2_vecs[,2]
print(minor_axis2)
## x1_Cr x2_Sr
## 21.745888 8.399776
#Generating the 95% confidence ellipse using Hotelling’s T² scaling.
ellipse_df <- as.data.frame(ellipse(s_cov, centre = x_bar, t = scale_factor_F))
colnames(ellipse_df) <- c("x", "y")
#Plotting the ellipse itself.
lines(ellipse_df, col = "red", lwd = 1)
#Plotting the mean.
points(x_bar[1], x_bar[2], col = "red", pch = 1, cex = 1.5)
text(x_bar[1], x_bar[2], labels = paste0("μ = (", round(x_bar[1], 2), ", ",
round(x_bar[2], 2), ")"),
pos = 4, col = "red", cex = 1)
#Plotting the major and minor axes endpoints.
points(major_axis1[1], major_axis1[2], col = "red", pch = 19, cex = 1.2)
points(major_axis2[1], major_axis2[2], col = "red", pch = 19, cex = 1.2)
points(minor_axis1[1], minor_axis1[2], col = "red", pch = 19, cex = 1.2)
points(minor_axis2[1], minor_axis2[2], col = "red", pch = 19, cex = 1.2)
#Drawing principle lines for major and minor axes.
segments(x_bar[1], x_bar[2], major_axis2[1], major_axis2[2], col = "red", lwd = 1)
segments(x_bar[1], x_bar[2], minor_axis2[1], minor_axis2[2], col = "red", lwd = 1)
#Labeling the axes coordinates.
text(major_axis1[1], major_axis1[2], labels = paste0("(", round(major_axis1[1], 2), ", ",
round(major_axis1[2], 2), ")"),
pos = 3, col = "red", cex = 1)
text(major_axis2[1], major_axis2[2], labels = paste0("(", round(major_axis2[1], 2), ", ",
round(major_axis2[2], 2), ")"),
pos = 1, col = "red", cex = 1)
text(minor_axis1[1], minor_axis1[2], labels = paste0("(", round(minor_axis1[1], 2), ", ",
round(minor_axis1[2], 2), ")"),
pos = 3, col = "red", cex = 1)
text(minor_axis2[1], minor_axis2[2], labels = paste0("(", round(minor_axis2[1], 2), ", ",
round(minor_axis2[2], 2), ")"),
pos = 1, col = "red", cex = 1)
Ans:
| Major Axis EP 1 | Major Axis EP 2 | Minor Axis EP 1 | Minor Axis EP 2 | |
|---|---|---|---|---|
| Manual Calculations | (28.33452775, 39.94964726) | (-7.40340775, -5.35186726) | (-0.81785167, 26.20043244) | (21.74897167, 8.397347564) |
| R Results | (28.3, 39.9) | (-7.40, -5.35) | (-0.815, 26.198) | (21.7, 8.4) |
#e)
#Computing standard errors.
s_i <- sqrt(diag(s_cov) / n)
s_i
## x1_Cr x2_Sr
## 6.420521 7.393770
#Computing F-distribution critical value.
alpha <- 0.05 #At 95% confidence level.
F_crit <- qf(1 - alpha, p, n - p)
F_crit
## [1] 4.737414
#Computing the confidence intervals.
SCI_lower <- x_bar - sqrt(T2_CV) * s_i
SCI_upper <- x_bar + sqrt(T2_CV) * s_i
#Optional: Combining results into a nice data frame.
SCI_results <- data.frame(
Variable = paste0("mu", 1:p),
Lower_Bound = SCI_lower,
Upper_Bound = SCI_upper
)
print(SCI_results)
## Variable Lower_Bound Upper_Bound
## x1_Cr mu1 -10.662129 31.59324
## x2_Sr mu2 -7.031418 41.62920
Ans:
| Mean 1 (Cr) Simultaneous CI | Mean 2 (Sr) Simultaneous CI | |
|---|---|---|
| Manual Calculations | (-10.66789035, 31.59901035) | (-7.03805572, 41.63583572) |
| R Results | (-10.66, 31.6) | (-7.03, 41.6) |
#f)
#Computing the Bonferroni-adjusted t-value.
t_crit <- qt(1 - (alpha / (2 * p)), df = n - 1)
t_crit
## [1] 2.751524
cat(sprintf("Bonferroni-adjusted t-critical value: %.4f\n", t_crit))
## Bonferroni-adjusted t-critical value: 2.7515
#Computing the Bonferroni confidence intervals.
BCI_lower <- x_bar - t_crit * s_i
BCI_upper <- x_bar + t_crit * s_i
#Optional: Combining results into a nice data frame.
BCI_results <- data.frame(
Variable = paste0("mu", 1:p),
Lower_Bound = BCI_lower,
Upper_Bound = BCI_upper
)
print(BCI_results)
## Variable Lower_Bound Upper_Bound
## x1_Cr mu1 -7.200659 28.13177
## x2_Sr mu2 -3.045244 37.64302
#Computing the Bonferroni-adjusted t-value using the manually calculated critical t-val to
#see if it matches the value I esteimated from t-table.
t_crit <- 2.896 #Adjusted t-value.
t_crit
## [1] 2.896
cat(sprintf("Bonferroni-adjusted t-critical value: %.4f\n", t_crit))
## Bonferroni-adjusted t-critical value: 2.8960
#Computing the Bonferroni confidence intervals.
BCI_lower <- x_bar - t_crit * s_i
BCI_upper <- x_bar + t_crit * s_i
#Optional: Combining results into a nice data frame.
BCI_results <- data.frame(
Variable = paste0("mu", 1:p),
Lower_Bound = BCI_lower,
Upper_Bound = BCI_upper
)
print(BCI_results)
## Variable Lower_Bound Upper_Bound
## x1_Cr mu1 -8.128272 29.05938
## x2_Sr mu2 -4.113470 38.71125
Ans:
| Mean 1 (Cr) Simultaneous CI | Mean 2 (Sr) Simultaneous CI | |
|---|---|---|
| Manual Calculations | (-8.12826859, 29.05938859) | (-4.11346763, 38.71124763) |
| R Results (t = 2.75) | (-7.20, 28.1) | (-3.05, 37.6) |
| R Results (t = 2.896) | (-8.13, 29.1) | (-4.11, 38.7) |
#g)
#Graphing the Normal Q-Q probability plots.
qqnorm(mineraldf$x1_Cr, main = "Chromium Normal Q-Q Probability Plot",
xlab = "Theoretical Quantiles",
ylab = "Sample Quantiles")
qqline(mineraldf$x1_Cr, col = "red", lwd = 2)
qqnorm(mineraldf$x2_Sr, main = "Strontium Normal Q-Q Probability Plot",
xlab = "Theoretical Quantiles",
ylab = "Sample Quantiles")
qqline(mineraldf$x2_Sr, col = "red", lwd = 2)
#Graphing the Chi-square Q-Q Plot for Mahalanobis Distances.
mahal_dist <- mahalanobis(mineraldf, center = x_bar, cov = s_cov)
print(mahal_dist)
## [1] 6.6689704 6.4658858 0.1849973 0.5464687 0.5442094 0.2839870 0.3638145
## [8] 0.3918495 0.5498174
qqplot(qchisq(ppoints(nrow(mineraldf)), df = 2), mahal_dist,
main = "Chi-Square Q-Q Plot for Mahalanobis Distances",
xlab = "Theoretical Quantiles", ylab = "Mahalanobis Distance")
abline(0, 1, col = "red")
#Grapphing the Scatter-Plot diagram.
plot(mineraldf$x1_Cr, mineraldf$x2_Sr,
main = "Scatter Plot of Mineral Levels in Ancient Peruvian Hairs",
xlab = "Chromium",
ylab = "Strontium")
Ans:
Based off the Normal Q-Q Plot and the Q-Q Plot for Mahalanobis Distances, there are strong deviations from normaility with extreme outliers towards theright end for both Chromium and Strontium univariate wise and multivariate wise suggesting that they are not normally distributed. The scatter plot also shows no obvious elliptical pattern and a cluster of data towards zero solidifying that the variables follow a true bivariate normal distribution.
#h)
#Identifying the index of the observation with the largest Mahalanobis distance.
max_mahal_index <- which.max(mahal_dist)
#Creating a new dataset without this observation.
mineraldf2 <- mineraldf[-max_mahal_index, ]
print(mineraldf2)
## x1_Cr x2_Sr
## 2 40.53 73.68
## 3 2.19 11.30
## 4 0.55 20.03
## 5 0.74 20.29
## 6 0.66 7.80
## 7 0.93 4.64
## 8 0.37 4.30
## 9 0.22 1.08
Analysis Repeated
############REPEATING ANALYSIS#############
#a)
#Creating Dataframe.
#Calculating vector of sample means.
x_bar <- colMeans(mineraldf2)
print(x_bar)
## x1_Cr x2_Sr
## 5.77375 17.89000
#Calculating sample covariance.
s_cov <- cov(mineraldf2)
print(s_cov)
## x1_Cr x2_Sr
## x1_Cr 197.5891 317.4259
## x2_Sr 317.4259 558.7038
#Calculate inverse of sample covariance.
s_inv_cov <- solve(s_cov, symmetric = TRUE)
print(s_inv_cov)
## x1_Cr x2_Sr
## x1_Cr 0.05798953 -0.03294658
## x2_Sr -0.03294658 0.02050836
#b)
#Verifying eigen values and vectors.
ei2 <- eigen(s_cov)
#Eigenvalues.
ei2_vals <- ei2$values
print(ei2_vals)
## [1] 743.33153 12.96133
#Eigenvectors
ei2_vecs <- ei2$vectors
print(ei2_vecs)
## [,1] [,2]
## [1,] 0.5027788 -0.8644151
## [2,] 0.8644151 0.5027788
#c)
#Defining variables.
n <- nrow(mineraldf2)
p <- ncol(mineraldf2)
mu <- c(0.1, 15)
mean_diff <- x_bar - mu
#Computing Critical Value for T2.
alpha <- 0.05
F_crit <- qf(1 - alpha, p, n - p)
F_crit
## [1] 5.143253
T2_CV <- (p * (n - 1) / (n - p)) * F_crit
#Computing Statistical Value for T2.
T2_Stat <- n * t(mean_diff) %*% s_inv_cov %*% mean_diff
#Printing Critical Value and Statistical Value for T2 for comparison.
print(T2_CV)
## [1] 12.00092
print(T2_Stat)
## [,1]
## [1,] 7.660761
#Computing P-value as an alternative method for comparison.
F_Stat <- (((n-p)/((n-1)*p)) * T2_Stat)
print(F_Stat)
## [,1]
## [1,] 3.283183
F_Pval <- pf(F_Stat, 2, n, lower.tail = FALSE)
print(F_Pval)
## [,1]
## [1,] 0.09098189
Ans:
After hypothesis testing and excluding the extreme outlier observation, our decision does not change and remains to not reject the null hypothesis. There is still not enough evidence that the mean vector differs significantly from [0.1, 15] at the 5% significance level.
#d)
set.seed(789)
X <- mvrnorm(n=10000,mu=x_bar, Sigma=s_cov)
# Plot data
plot(X[, 1], X[, 2], col = "black", pch = 1, xlab = "X1", ylab = "X2",
main = "95% Joint Confidence Ellipse for Chromium and Strontium")
#Computing the F-distribution scaling factor (Hotelling’s T²).
scale_factor_F <- sqrt((p * (n - 1)) / (n * (n - p)) * F_crit)
#Computing major and minor axis endpoints.
major_axis1 <- x_bar + scale_factor_F * sqrt(ei2_vals[1]) * ei2_vecs[,1]
print(major_axis1)
## x1_Cr x2_Sr
## 22.56297 46.75529
major_axis2 <- x_bar - scale_factor_F * sqrt(ei2_vals[1]) * ei2_vecs[,1]
print(major_axis2)
## x1_Cr x2_Sr
## -11.01547 -10.97529
minor_axis1 <- x_bar + scale_factor_F * sqrt(ei2_vals[2]) * ei2_vecs[,2]
print(minor_axis1)
## x1_Cr x2_Sr
## 1.962131 20.106991
minor_axis2 <- x_bar - scale_factor_F * sqrt(ei2_vals[2]) * ei2_vecs[,2]
print(minor_axis2)
## x1_Cr x2_Sr
## 9.585369 15.673009
#Generating the 95% confidence ellipse using Hotelling’s T² scaling.
ellipse_df <- as.data.frame(ellipse(s_cov, centre = x_bar, t = scale_factor_F))
colnames(ellipse_df) <- c("x", "y")
#Plotting the ellipse itself.
lines(ellipse_df, col = "red", lwd = 1)
#Plotting the mean.
points(x_bar[1], x_bar[2], col = "red", pch = 1, cex = 1.5)
text(x_bar[1], x_bar[2], labels = paste0("μ = (", round(x_bar[1], 2), ", ",
round(x_bar[2], 2), ")"),
pos = 4, col = "red", cex = 1)
#Plotting the major and minor axes endpoints.
points(major_axis1[1], major_axis1[2], col = "red", pch = 19, cex = 1.2)
points(major_axis2[1], major_axis2[2], col = "red", pch = 19, cex = 1.2)
points(minor_axis1[1], minor_axis1[2], col = "red", pch = 19, cex = 1.2)
points(minor_axis2[1], minor_axis2[2], col = "red", pch = 19, cex = 1.2)
#Drawing principle lines for major and minor axes.
segments(x_bar[1], x_bar[2], major_axis2[1], major_axis2[2], col = "red", lwd = 1)
segments(x_bar[1], x_bar[2], minor_axis2[1], minor_axis2[2], col = "red", lwd = 1)
#Labeling the axes coordinates.
text(major_axis1[1], major_axis1[2], labels = paste0("(", round(major_axis1[1], 2), ", ",
round(major_axis1[2], 2), ")"),
pos = 3, col = "red", cex = 1)
text(major_axis2[1], major_axis2[2], labels = paste0("(", round(major_axis2[1], 2), ", ",
round(major_axis2[2], 2), ")"),
pos = 1, col = "red", cex = 1)
text(minor_axis1[1], minor_axis1[2], labels = paste0("(", round(minor_axis1[1], 2), ", ",
round(minor_axis1[2], 2), ")"),
pos = 3, col = "red", cex = 1)
text(minor_axis2[1], minor_axis2[2], labels = paste0("(", round(minor_axis2[1], 2), ", ",
round(minor_axis2[2], 2), ")"),
pos = 1, col = "red", cex = 1)
#e)
#Computing standard errors.
s_i <- sqrt(diag(s_cov) / n)
s_i
## x1_Cr x2_Sr
## 4.969772 8.356912
#Computing F-distribution critical value.
alpha <- 0.05 #At 95% confidence level.
F_crit <- qf(1 - alpha, p, n - p)
F_crit
## [1] 5.143253
#Computing the confidence intervals.
SCI_lower <- x_bar - sqrt(T2_CV) * s_i
SCI_upper <- x_bar + sqrt(T2_CV) * s_i
#Optional: Combining results into a nice data frame.
SCI_results <- data.frame(
Variable = paste0("mu", 1:p),
Lower_Bound = SCI_lower,
Upper_Bound = SCI_upper
)
print(SCI_results)
## Variable Lower_Bound Upper_Bound
## x1_Cr mu1 -11.44271 22.99021
## x2_Sr mu2 -11.06031 46.84031
#f)
#Computing the Bonferroni-adjusted t-value.
t_crit <- qt(1 - (alpha / (2 * p)), df = n - 1)
t_crit
## [1] 2.841244
cat(sprintf("Bonferroni-adjusted t-critical value: %.4f\n", t_crit))
## Bonferroni-adjusted t-critical value: 2.8412
#Computing the Bonferroni confidence intervals.
BCI_lower <- x_bar - t_crit * s_i
BCI_upper <- x_bar + t_crit * s_i
#Optional: Combining results into a nice data frame.
BCI_results <- data.frame(
Variable = paste0("mu", 1:p),
Lower_Bound = BCI_lower,
Upper_Bound = BCI_upper
)
print(BCI_results)
## Variable Lower_Bound Upper_Bound
## x1_Cr mu1 -8.346586 19.89409
## x2_Sr mu2 -5.854028 41.63403
#Computing the Bonferroni-adjusted t-value using the manually calculated critical t-val
#to see if it matches the value I esteimated from t-table.
t_crit <- 2.998 #Adjusted t-value.
t_crit
## [1] 2.998
cat(sprintf("Bonferroni-adjusted t-critical value: %.4f\n", t_crit))
## Bonferroni-adjusted t-critical value: 2.9980
#Computing the Bonferroni confidence intervals.
BCI_lower <- x_bar - t_crit * s_i
BCI_upper <- x_bar + t_crit * s_i
#Optional: Combining results into a nice data frame.
BCI_results <- data.frame(
Variable = paste0("mu", 1:p),
Lower_Bound = BCI_lower,
Upper_Bound = BCI_upper
)
print(BCI_results)
## Variable Lower_Bound Upper_Bound
## x1_Cr mu1 -9.125626 20.67313
## x2_Sr mu2 -7.164022 42.94402
#g)
#Graphing the Normal Q-Q probability plots.
qqnorm(mineraldf2$x1_Cr, main = "Chromium Normal Q-Q Probability Plot",
xlab = "Theoretical Quantiles",
ylab = "Sample Quantiles")
qqline(mineraldf$x1_Cr, col = "red", lwd = 2)
qqnorm(mineraldf2$x2_Sr,
main = "Strontium Normal Q-Q Probability Plot",
xlab = "Theoretical Quantiles",
ylab = "Sample Quantiles")
qqline(mineraldf$x2_Sr, col = "red", lwd = 2)
#Graphing the Chi-square Q-Q Plot for Mahalanobis Distances.
mahal_dist <- mahalanobis(mineraldf2, center = x_bar, cov = s_cov)
print(mahal_dist)
## 2 3 4 5 6 7 8
## 6.11372626 0.07922092 2.41292118 2.38355915 0.20443075 0.73204157 0.64198029
## 9
## 1.43211988
qqplot(qchisq(ppoints(nrow(mineraldf2)), df = 2), mahal_dist,
main = "Chi-Square Q-Q Plot for Mahalanobis Distances",
xlab = "Theoretical Quantiles",
ylab = "Mahalanobis Distance")
abline(0, 1, col = "red")
#Grapphing the Scatter-Plot diagram.
plot(mineraldf2$x1_Cr, mineraldf2$x2_Sr,
main = "Scatter Plot of Mineral Levels in Ancient Peruvian Hairs",
xlab = "Chromium",
ylab = "Strontium")
Ans:
After removing the outlier with the largest Mahalanobis distance, the overall inferences remain unchanged, as the analytical changes are minimal. Our hypothesis remains the same, and we still fail to reject the null hypothesis, indicating that the mean vector does not differ significantly from the hypothesized means. While the confidence intervals have shifted slightly—Chromium’s CI has narrowed, while Strontium’s has expanded—these changes do not suggest a meaningful pattern.
The Normal Q-Q plots show slight improvement, and the Chi-Square plot exhibits noticeable enhancement compared to the previous version. However, the data points still do not perfectly align with the 45-degree reference line, and a potential outlier may continue to skew the distribution. Additionally, the scatterplot lacks the expected elliptical shape characteristic of a bivariate normal distribution. Thus, these results may not accurately represent the inferences of the Confidence Intervals and Hypothesis Testing and their results remain unreliable.
Therefore, despite these minor adjustments, the data still do not exhibit bivariate normal behavior.