#Creating Data frame
X = c(10,13,19,16,13,21,23,29,27,16,13,14,21,18,17,23,22,19,11,17,19,21,25,22)
Y = c(29,33,41,47,51,43,31,49,71,42,31,35,62,55,58,72,68,60,41,42,54,57,62,54)
Z = c(17,23,21,29,37,41,39,47,43,18,16,17,26,24,25,32,35,31,28,26,33,42,45,36)
DATA <- data.frame(X,Y,Z)
DATA
## X Y Z
## 1 10 29 17
## 2 13 33 23
## 3 19 41 21
## 4 16 47 29
## 5 13 51 37
## 6 21 43 41
## 7 23 31 39
## 8 29 49 47
## 9 27 71 43
## 10 16 42 18
## 11 13 31 16
## 12 14 35 17
## 13 21 62 26
## 14 18 55 24
## 15 17 58 25
## 16 23 72 32
## 17 22 68 35
## 18 19 60 31
## 19 11 41 28
## 20 17 42 26
## 21 19 54 33
## 22 21 57 42
## 23 25 62 45
## 24 22 54 36
1. Pearson’s r Correlation
# 1
#Pearson's R Correlation
with(DATA, cor.test(X,Y))
##
## Pearson's product-moment correlation
##
## data: X and Y
## t = 3.6458, df = 22, p-value = 0.001425
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2794903 0.8152634
## sample estimates:
## cor
## 0.6136956
with(DATA, cor.test(X,Z))
##
## Pearson's product-moment correlation
##
## data: X and Z
## t = 5.623, df = 22, p-value = 1.183e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.5281076 0.8942831
## sample estimates:
## cor
## 0.767911
with(DATA, cor.test(Y,Z))
##
## Pearson's product-moment correlation
##
## data: Y and Z
## t = 2.7795, df = 22, p-value = 0.01093
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1339544 0.7574317
## sample estimates:
## cor
## 0.509803
Based on the above results, the correlation coefficients indicate a strong positive linear relationship between milk intake and weight (r = 0.614) and between milk intake and age (r = 0.768). The linear relationship between weight and age is moderate (r = 0.510).
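As a quick cross-check, the three coefficients can also be read from a single correlation matrix. The sketch below assumes the same X, Y and Z vectors as above; `r_matrix` is an illustrative name that does not appear in the original analysis.

```r
# Same data as in the data frame above
X <- c(10,13,19,16,13,21,23,29,27,16,13,14,21,18,17,23,22,19,11,17,19,21,25,22)
Y <- c(29,33,41,47,51,43,31,49,71,42,31,35,62,55,58,72,68,60,41,42,54,57,62,54)
Z <- c(17,23,21,29,37,41,39,47,43,18,16,17,26,24,25,32,35,31,28,26,33,42,45,36)
DATA <- data.frame(X, Y, Z)

# Pearson correlation for every pair of columns at once
r_matrix <- cor(DATA, method = "pearson")
round(r_matrix, 3)
```

The off-diagonal entries should agree with the `cor` estimates printed by the three `cor.test()` calls.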
2. Hypothesis Testing
Are the associations for each pair in (1) significant at \(\alpha = 0.05\)? Explain.
A. For Correlation between Milk Intake (X) and Weight (Y)
Hypothesis:
\[H_{0}: \rho_{xy} = 0\] \[vs.\]
\[H_{1}: \rho_{xy} \neq 0\]
Alpha:
\(\alpha = 0.05\)
Test Statistic: t-test
\[t =
\frac{r}{\sqrt{\frac{1-r^2}{n-2}}}\] \[t =
\frac{0.6136956}{\sqrt{\frac{1-(0.6136956)^2}{24-2}}}\approx3.646\]
Decision Rule:
Reject \(H_{0}\) if \(|t| > t_{0.025,22} = 2.074\), otherwise do
not reject \(H_{0}\).
Decision:
Reject \(H_{0}\), since
3.646 > 2.074.
Conclusion:
At the 5% level of significance, the data provide sufficient evidence to conclude that there is a linear relationship between milk intake and weight.
B. For Correlation between Milk Intake (X) and Age (Z)
Hypothesis:
\[H_{0}: \rho_{xz} = 0\] \[vs.\] \[H_{1}:
\rho_{xz} \neq 0\]
Alpha:
\(\alpha = 0.05\)
Test Statistic: t-test
\[t =
\frac{r}{\sqrt{\frac{1-r^2}{n-2}}}\] \[t =
\frac{0.767911}{\sqrt{\frac{1-(0.767911)^2}{24-2}}}\approx5.6229\]
Decision Rule:
Reject \(H_{0}\) if \(|t| > t_{0.025,22} = 2.074\), otherwise do
not reject \(H_{0}\).
Decision:
Reject \(H_{0}\), since 5.6229 > 2.074.
Conclusion:
At the 5% level of significance, the data provide sufficient evidence to conclude that there is a linear relationship between milk intake and age.
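C. For Correlation between Weight (Y) and Age (Z)
Hypothesis:
\[H_{0}: \rho_{yz} = 0\] \[vs.\] \[H_{1}: \rho_{yz} \neq 0\]
Alpha:
\(\alpha = 0.05\)
Test Statistic: t-test
\[t =
\frac{r}{\sqrt{\frac{1-r^2}{n-2}}}\] \[t =
\frac{0.509803}{\sqrt{\frac{1-(0.509803)^2}{24-2}}}\approx2.7795\]
Decision Rule:
Reject \(H_{0}\) if \(|t| > t_{0.025,22} = 2.074\), otherwise do
not reject \(H_{0}\).
Decision:
Reject \(H_{0}\), since 2.7795 > 2.074.
Conclusion:
At the 5% level of significance, the data provide sufficient evidence to conclude that there is a linear relationship between weight and age.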
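The hand computations for the three pairs can be reproduced in a few lines. This is only a sketch: `cor_t_summary` is a hypothetical helper written for illustration, and the correlation values are copied from the `cor.test()` output in (1).

```r
# Hypothetical helper (not part of the original analysis):
# two-sided t test of H0: rho = 0 from a sample correlation r and sample size n
cor_t_summary <- function(r, n, alpha = 0.05) {
  df <- n - 2
  t_stat <- r / sqrt((1 - r^2) / df)                      # test statistic
  t_crit <- qt(1 - alpha / 2, df)                         # two-sided critical value
  p_value <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)  # two-sided p-value
  data.frame(r = r, t = t_stat, t_crit = t_crit, p_value = p_value,
             reject_H0 = abs(t_stat) > t_crit)
}

# Coefficients reported by cor.test() in (1)
r_values <- c(XY = 0.6136956, XZ = 0.767911, YZ = 0.509803)
results <- do.call(rbind, lapply(r_values, cor_t_summary, n = 24))
results
```

Each row should match the `t` and `p-value` that `cor.test()` reports for the same pair, up to rounding of the input coefficients.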
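# Note: t.test() on two columns runs a Welch two-sample t-test, which compares their means.
# The tests of the correlations themselves are the cor.test() results in (1).
t.test(DATA$X,DATA$Y)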
##
## Welch Two Sample t-test
##
## data: DATA$X and DATA$Y
## t = -10.915, df = 29.644, p-value = 6.648e-12
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -36.55590 -25.02744
## sample estimates:
## mean of x mean of y
## 18.70833 49.50000
t.test(DATA$X,DATA$Z)
##
## Welch Two Sample t-test
##
## data: DATA$X and DATA$Z
## t = -5.3944, df = 34.763, p-value = 4.952e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -16.172994 -7.327006
## sample estimates:
## mean of x mean of y
## 18.70833 30.45833
t.test(DATA$Y,DATA$Z)
##
## Welch Two Sample t-test
##
## data: DATA$Y and DATA$Z
## t = 5.8333, df = 42.164, p-value = 6.81e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 12.45478 25.62855
## sample estimates:
## mean of x mean of y
## 49.50000 30.45833
3. Partial Correlation
#Partial Correlations
#install.packages("ppcor")
#install.packages("MASS")
library(ppcor)
## Loading required package: MASS
library(MASS)
pcor(DATA)
## $estimate
## X Y Z
## X 1.0000000 0.40324146 0.66993901
## Y 0.4032415 1.00000000 0.07620291
## Z 0.6699390 0.07620291 1.00000000
##
## $p.value
## X Y Z
## X 0.0000000000 0.0563976 0.0004702747
## Y 0.0563975998 0.0000000 0.7296590106
## Z 0.0004702747 0.7296590 0.0000000000
##
## $statistic
## X Y Z
## X 0.000000 2.0193393 4.1352095
## Y 2.019339 0.0000000 0.3502239
## Z 4.135209 0.3502239 0.0000000
##
## $n
## [1] 24
##
## $gp
## [1] 1
##
## $method
## [1] "pearson"
Hypothesis Testing:
Hypothesis:
\[H_{0}: \rho_{xy,z} = 0\]
\[vs.\]
\[H_{1}: \rho_{xy,z} \neq 0\]
Alpha:
\(\alpha = 0.05\)
Test Statistic: t-test
\[t=
\frac{r_{p}\sqrt{n-v}}{\sqrt{1-r_{p}^2}}\]
\[t=
\frac{0.40324146\sqrt{24-3}}{\sqrt{1-(0.40324146)^2}}\approx2.0193\]
Here \(r_{p}\) is the partial correlation of X and Y controlling for Z, and \(v = 3\) is the number of variables, so the test has \(n - v = 21\) degrees of freedom.
Decision Rule:
Reject \(H_{0}\) if \(|t| > t_{0.025,21} = 2.080\), otherwise do
not reject \(H_{0}\).
Decision:
Do not reject \(H_{0}\), since 2.019
< 2.080.
Conclusion:
At the 5% level of significance, the data do not provide sufficient evidence to conclude that there is a linear relationship between milk intake and weight, controlling for age.
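For a single control variable, the partial correlation can also be obtained directly from the three pairwise coefficients. The sketch below uses the standard first-order partial correlation formula with the values reported in (1); the variable names are illustrative and not taken from the original analysis.

```r
# Pairwise Pearson correlations reported by cor.test() in (1)
r_xy <- 0.6136956
r_xz <- 0.767911
r_yz <- 0.509803
n <- 24   # sample size
v <- 3    # number of variables (X, Y and the control Z)

# First-order partial correlation of X and Y controlling for Z
r_xy_z <- (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))

# t statistic with n - v degrees of freedom and two-sided critical value
t_stat <- r_xy_z * sqrt(n - v) / sqrt(1 - r_xy_z^2)
t_crit <- qt(1 - 0.05 / 2, df = n - v)
p_value <- 2 * pt(abs(t_stat), df = n - v, lower.tail = FALSE)

c(r_xy_z = r_xy_z, t = t_stat, t_crit = t_crit, p_value = p_value)
```

The result should agree with the X-Y entries of `$estimate`, `$statistic` and `$p.value` returned by `pcor(DATA)`, up to rounding of the inputs.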
4. Multiple Correlation
#Multiple Correlations
#Compute for R squared
M <- lm(Y ~ X + Z, data = DATA)
summary(M)
##
## Call:
## lm(formula = Y ~ X + Z, data = DATA)
##
## Residuals:
## Min 1Q Median 3Q Max
## -25.645 -6.748 1.635 8.329 16.252
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 19.2210 8.7153 2.205 0.0387 *
## X 1.4097 0.6981 2.019 0.0564 .
## Z 0.1282 0.3661 0.350 0.7297
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.63 on 21 degrees of freedom
## Multiple R-squared: 0.3802, Adjusted R-squared: 0.3212
## F-statistic: 6.442 on 2 and 21 DF, p-value: 0.006582
r.squared <- summary(M)$r.squared
multiple.R <- sqrt(r.squared)
multiple.R
## [1] 0.6166378
r.squared
## [1] 0.3802422
#Compute for R adjusted
#Adjusted R2
r.adjsquared <- summary(M)$adj.r.squared
r.adjsquared
## [1] 0.3212177
multiple.Radj <- sqrt(r.adjsquared)
multiple.Radj
## [1] 0.5667607
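Based on the result, the multiple correlation is R = 0.6166, so milk intake and age together explain about 38% of the variation in weight (R² = 0.3802). The adjusted R² is lower (0.3212), and the coefficient of Z is not significant (p = 0.7297), which suggests that adding age to a model that already contains milk intake contributes little. About 62% of the variation in weight remains unexplained, so the fitted model follows the data only moderately well.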
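One way to see what the multiple correlation measures: for a least-squares fit with an intercept, it equals the Pearson correlation between the observed response and the fitted values. The sketch below assumes the same data and model as above; `multiple_R`, `r2` and `adj_r2` are illustrative names.

```r
# Same data and model as above
X <- c(10,13,19,16,13,21,23,29,27,16,13,14,21,18,17,23,22,19,11,17,19,21,25,22)
Y <- c(29,33,41,47,51,43,31,49,71,42,31,35,62,55,58,72,68,60,41,42,54,57,62,54)
Z <- c(17,23,21,29,37,41,39,47,43,18,16,17,26,24,25,32,35,31,28,26,33,42,45,36)
DATA <- data.frame(X, Y, Z)
M <- lm(Y ~ X + Z, data = DATA)

# Multiple R as the correlation between observed and fitted Y
multiple_R <- cor(DATA$Y, fitted(M))
r2 <- multiple_R^2

# Adjusted R squared from R squared, with n observations and k predictors
n <- nrow(DATA)
k <- 2
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)

c(multiple_R = multiple_R, r2 = r2, adj_r2 = adj_r2)
```

These values should match `multiple.R`, `r.squared` and `r.adjsquared` computed from `summary(M)`.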
5
n=nrow(DATA)
k=ncol(DATA)-1
#The F statistic uses the unadjusted R squared
R2=r.squared
#Computed F-value
Fc=((n-k-1)*R2)/(k*(1-R2))
Fc
## [1] 6.442103
#solve for the critical value
qf(0.05,2,21, lower.tail = FALSE)
## [1] 3.4668
#Solve for p-value
pf(Fc,2,21, lower.tail = FALSE)
## [1] 0.006581591
Hypothesis Testing:
\(H_{0}: \rho^2 = 0\) vs. \(H_{1}: \rho^2 \neq 0\)
Alpha:
\(\alpha = 0.05\)
Test Statistic: F-test
\[ F =
\frac{(n-k-1)R^2}{k(1-R^2)}\]
\[ F =
\frac{(24-2-1)0.3802422}{2(1-0.3802422)}\approx6.442\]
This agrees with the F-statistic of 6.442 on 2 and 21 DF reported by summary(M).
Decision Rule: Reject \(H_{0}\) if the F statistic is greater than the critical value \(F_{0.05,2,21} = 3.4668\).
Decision:
Reject \(H_{0}\), since 6.442 > 3.4668.
Conclusion:
At the 5% level of significance, the data provide sufficient evidence to conclude that there is a linear association between weight and the combination of milk intake and age.
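The overall F statistic that `summary(M)` prints (6.442 on 2 and 21 DF) can also be extracted from the fitted model rather than recomputed by hand. The sketch below assumes the same data and model as above; `f_value`, `f_crit` and `p_value` are illustrative names.

```r
# Same data and model as above
X <- c(10,13,19,16,13,21,23,29,27,16,13,14,21,18,17,23,22,19,11,17,19,21,25,22)
Y <- c(29,33,41,47,51,43,31,49,71,42,31,35,62,55,58,72,68,60,41,42,54,57,62,54)
Z <- c(17,23,21,29,37,41,39,47,43,18,16,17,26,24,25,32,35,31,28,26,33,42,45,36)
DATA <- data.frame(X, Y, Z)
M <- lm(Y ~ X + Z, data = DATA)

# summary.lm stores the overall F test as a named vector: value, numdf, dendf
fs <- summary(M)$fstatistic
f_value <- unname(fs["value"])

# Critical value and p-value for the upper tail of the F distribution
f_crit <- qf(0.05, fs["numdf"], fs["dendf"], lower.tail = FALSE)
p_value <- unname(pf(fs["value"], fs["numdf"], fs["dendf"], lower.tail = FALSE))

c(F = f_value, F_crit = unname(f_crit), p_value = p_value)
```

Comparing `f_value` with `f_crit`, or `p_value` with 0.05, gives the same decision as the test of the multiple correlation.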