#Creating Data frame

X = c(10,13,19,16,13,21,23,29,27,16,13,14,21,18,17,23,22,19,11,17,19,21,25,22)
Y = c(29,33,41,47,51,43,31,49,71,42,31,35,62,55,58,72,68,60,41,42,54,57,62,54)
Z = c(17,23,21,29,37,41,39,47,43,18,16,17,26,24,25,32,35,31,28,26,33,42,45,36)

DATA <- data.frame(X,Y,Z)
DATA
##     X  Y  Z
## 1  10 29 17
## 2  13 33 23
## 3  19 41 21
## 4  16 47 29
## 5  13 51 37
## 6  21 43 41
## 7  23 31 39
## 8  29 49 47
## 9  27 71 43
## 10 16 42 18
## 11 13 31 16
## 12 14 35 17
## 13 21 62 26
## 14 18 55 24
## 15 17 58 25
## 16 23 72 32
## 17 22 68 35
## 18 19 60 31
## 19 11 41 28
## 20 17 42 26
## 21 19 54 33
## 22 21 57 42
## 23 25 62 45
## 24 22 54 36

1.Pearson’s R Correlation

# 1
#Pearson's R Correlation

with(DATA, cor.test(X,Y))
## 
##  Pearson's product-moment correlation
## 
## data:  X and Y
## t = 3.6458, df = 22, p-value = 0.001425
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2794903 0.8152634
## sample estimates:
##       cor 
## 0.6136956
with(DATA, cor.test(X,Z))
## 
##  Pearson's product-moment correlation
## 
## data:  X and Z
## t = 5.623, df = 22, p-value = 1.183e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.5281076 0.8942831
## sample estimates:
##      cor 
## 0.767911
with(DATA, cor.test(Y,Z))
## 
##  Pearson's product-moment correlation
## 
## data:  Y and Z
## t = 2.7795, df = 22, p-value = 0.01093
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1339544 0.7574317
## sample estimates:
##      cor 
## 0.509803

Based on the results above, the correlation coefficients indicate a strong positive linear relationship between milk intake and age (r = 0.768) and between milk intake and weight (r = 0.614). There is a moderate positive linear relationship between weight and age (r = 0.510).
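The three coefficients can also be read off a single correlation matrix. The sketch below rebuilds the data frame so that it runs on its own; the object name `R` is only an illustration and is not part of the original analysis.

```r
# Same data as in the data frame created at the top of the report
X <- c(10,13,19,16,13,21,23,29,27,16,13,14,21,18,17,23,22,19,11,17,19,21,25,22)
Y <- c(29,33,41,47,51,43,31,49,71,42,31,35,62,55,58,72,68,60,41,42,54,57,62,54)
Z <- c(17,23,21,29,37,41,39,47,43,18,16,17,26,24,25,32,35,31,28,26,33,42,45,36)
DATA <- data.frame(X, Y, Z)

# All pairwise Pearson correlations in one call (illustrative object name R)
R <- cor(DATA, method = "pearson")
round(R, 4)
```

The off-diagonal entries should agree with the `cor` estimates printed by the three `cor.test()` calls.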

2.Hypothesis Testing.

Are the associations for each pair in (1) significant at \(\alpha = 0.05\)? Explain.

A.For Correlation between Milk Intake (X) and Weight (Y)
Hypothesis:
\[H_{0}: \rho_{xy} = 0\] \[vs.\]
\[H_{1}: \rho_{xy} \neq 0\]
alpha:
\(\alpha = 0.05\)
Test Statistic: t-test
\[t = \frac{r}{\sqrt{\frac{1-r^2}{n-2}}}\] \[t = \frac{0.6136956}{\sqrt{\frac{1-(0.6136956)^2}{24-2}}}\approx3.646\]
Decision Rule:
Reject \(H_{0}\) if \(|t| > t_{0.025,22} = 2.074\), otherwise do not reject \(H_{0}\).
Decision:
Reject \(H_{0}\), since 3.646 > 2.074.
Conclusion:

  At the 5% level of significance, the data are sufficient to conclude that there is a linear relationship between milk intake and weight of a person.

B.For Correlation between Milk Intake (X) and Age (Z)
Hypothesis:
\[H_{0}: \rho_{xz} = 0\] \[vs.\] \[H_{1}: \rho_{xz} \neq 0\]
alpha:
\(\alpha = 0.05\)
Test Statistic: t-test
\[t = \frac{r}{\sqrt{\frac{1-r^2}{n-2}}}\] \[t = \frac{0.767911}{\sqrt{\frac{1-(0.767911)^2}{24-2}}}\approx5.6229\]
Decision Rule:
Reject \(H_{0}\) if \(|t| > t_{0.025,22} = 2.074\), otherwise do not reject \(H_{0}\).

Decision:

Reject \(H_{0}\), since 5.6229 > 2.074.

Conclusion:

  At the 5% level of significance, the data are sufficient to conclude that there is a linear relationship between milk intake and age of a person.
C.For Correlation between Weight (Y) and Age (Z)
Hypothesis:
\[H_{0}: \rho_{yz} = 0\] \[vs.\] \[H_{1}: \rho_{yz} \neq 0\]
alpha:
\(\alpha = 0.05\)
Test Statistic: t-test
\[t = \frac{r}{\sqrt{\frac{1-r^2}{n-2}}}\] \[t = \frac{0.509803}{\sqrt{\frac{1-(0.509803)^2}{24-2}}}\approx2.7795\]
Decision Rule:
Reject \(H_{0}\) if \(|t| > t_{0.025,22} = 2.074\), otherwise do not reject \(H_{0}\).

Decision:

Reject \(H_{0}\), since 2.7795 > 2.074.

Conclusion:

  At the 5% level of significance, the data are sufficient to conclude that there is a linear relationship between weight and age of a person.
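
The hand computations in A to C can be reproduced in a few lines of base R. This is a minimal sketch: the helper name `cor_t_stat` and the vector names are hypothetical, and the correlation values are copied from the `cor.test()` output in (1).

```r
# Hypothetical helper: t statistic for H0: rho = 0 given a sample correlation r
cor_t_stat <- function(r, n) {
  r / sqrt((1 - r^2) / (n - 2))
}

n <- 24
# Sample correlations copied from the cor.test() output in (1)
r_values <- c(xy = 0.6136956, xz = 0.767911, yz = 0.509803)

t_values <- cor_t_stat(r_values, n)
t_crit   <- qt(1 - 0.05 / 2, df = n - 2)   # two-sided critical value, alpha = 0.05
p_values <- 2 * pt(abs(t_values), df = n - 2, lower.tail = FALSE)

data.frame(r = r_values, t = t_values, p = p_values,
           reject = abs(t_values) > t_crit)
```

The `t` and `p` columns should match the `t` and `p-value` lines reported by `cor.test()` for each pair.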
t.test(DATA$X,DATA$Y)
## 
##  Welch Two Sample t-test
## 
## data:  DATA$X and DATA$Y
## t = -10.915, df = 29.644, p-value = 6.648e-12
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -36.55590 -25.02744
## sample estimates:
## mean of x mean of y 
##  18.70833  49.50000
t.test(DATA$X,DATA$Z)
## 
##  Welch Two Sample t-test
## 
## data:  DATA$X and DATA$Z
## t = -5.3944, df = 34.763, p-value = 4.952e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -16.172994  -7.327006
## sample estimates:
## mean of x mean of y 
##  18.70833  30.45833
t.test(DATA$Y,DATA$Z)
## 
##  Welch Two Sample t-test
## 
## data:  DATA$Y and DATA$Z
## t = 5.8333, df = 42.164, p-value = 6.81e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  12.45478 25.62855
## sample estimates:
## mean of x mean of y 
##  49.50000  30.45833

3.Partial Correlation

#Partial Correlations
#install.packages("ppcor")
#install.packages("MASS")
library(ppcor)
## Loading required package: MASS
library(MASS)
pcor(DATA)
## $estimate
##           X          Y          Z
## X 1.0000000 0.40324146 0.66993901
## Y 0.4032415 1.00000000 0.07620291
## Z 0.6699390 0.07620291 1.00000000
## 
## $p.value
##              X         Y            Z
## X 0.0000000000 0.0563976 0.0004702747
## Y 0.0563975998 0.0000000 0.7296590106
## Z 0.0004702747 0.7296590 0.0000000000
## 
## $statistic
##          X         Y         Z
## X 0.000000 2.0193393 4.1352095
## Y 2.019339 0.0000000 0.3502239
## Z 4.135209 0.3502239 0.0000000
## 
## $n
## [1] 24
## 
## $gp
## [1] 1
## 
## $method
## [1] "pearson"

Hypothesis Testing:

Hypothesis:
\[H_{0}: \rho_{xy,z} = 0\]
\[vs.\]
\[H_{1}: \rho_{xy,z} \neq 0\]
Alpha:
\(\alpha = 0.05\)
Test Statistic: t-test, where \(v = 3\) is the number of variables
\[t = \frac{r_{xy,z}\sqrt{n-v}}{\sqrt{1-r_{xy,z}^2}}\]
\[t = \frac{0.40324146\sqrt{24-3}}{\sqrt{1-(0.40324146)^2}}\approx2.0193\]
Decision Rule:
Reject \(H_{0}\) if \(|t| > t_{0.025,21} = 2.080\), otherwise do not reject \(H_{0}\).

Decision:
Do not reject \(H_{0}\), since 2.019 < 2.080.

Conclusion:

  At the 5% level of significance, the data are not sufficient to conclude that there is a linear relationship between milk intake and weight of a person, controlling for age.
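
As a cross-check that does not need the `ppcor` package, the first-order partial correlation can be built from the three pairwise correlations. The sketch below uses base R only; the variable names are illustrative, and the degrees of freedom are taken as n - v = 21 with v = 3 variables.

```r
X <- c(10,13,19,16,13,21,23,29,27,16,13,14,21,18,17,23,22,19,11,17,19,21,25,22)
Y <- c(29,33,41,47,51,43,31,49,71,42,31,35,62,55,58,72,68,60,41,42,54,57,62,54)
Z <- c(17,23,21,29,37,41,39,47,43,18,16,17,26,24,25,32,35,31,28,26,33,42,45,36)

n <- length(X)
v <- 3                      # number of variables (X, Y and the control Z)

r_xy <- cor(X, Y)
r_xz <- cor(X, Z)
r_yz <- cor(Y, Z)

# Partial correlation of X and Y controlling for Z
r_xy_z <- (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))

# t statistic, two-sided critical value and p-value on n - v degrees of freedom
t_stat  <- r_xy_z * sqrt(n - v) / sqrt(1 - r_xy_z^2)
t_crit  <- qt(0.975, df = n - v)
p_value <- 2 * pt(abs(t_stat), df = n - v, lower.tail = FALSE)

c(r = r_xy_z, t = t_stat, crit = t_crit, p = p_value)
```

The estimate, statistic and p-value should line up with the X-Y entries of `$estimate`, `$statistic` and `$p.value` in the `pcor(DATA)` output.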

4.Multiple Correlation

#Multiple Correlations

#Compute for R squared
M <- lm(Y ~ X + Z, data = DATA)
summary(M)
## 
## Call:
## lm(formula = Y ~ X + Z, data = DATA)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -25.645  -6.748   1.635   8.329  16.252 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  19.2210     8.7153   2.205   0.0387 *
## X             1.4097     0.6981   2.019   0.0564 .
## Z             0.1282     0.3661   0.350   0.7297  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.63 on 21 degrees of freedom
## Multiple R-squared:  0.3802, Adjusted R-squared:  0.3212 
## F-statistic: 6.442 on 2 and 21 DF,  p-value: 0.006582
r.squared <- summary(M)$r.squared
multiple.R <- sqrt(r.squared)
multiple.R
## [1] 0.6166378
r.squared
## [1] 0.3802422
#Compute for R adjusted
#Adjusted R2
r.adjsquared <- summary(M)$adj.r.squared
r.adjsquared
## [1] 0.3212177
multiple.Radj <- sqrt(r.adjsquared)
multiple.Radj
## [1] 0.5667607

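Based on the result, the multiple correlation is R = 0.617 with \(R^2 = 0.380\), so milk intake and age together account for about 38% of the variation in weight. The adjusted \(R^2\) drops to 0.321, which is not large enough to conclude that the additional input variable is adding value to the model. Roughly two-thirds of the variation in weight is left unexplained by the fitted model.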
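
The adjusted value can be recovered by hand from \(R^2\). A worked check, assuming the usual adjustment formula with \(n = 24\) observations and \(k = 2\) predictors:

\[\tilde{R}^2 = 1 - (1 - R^2)\frac{n-1}{n-k-1}\]
\[\tilde{R}^2 = 1 - (1 - 0.3802422)\frac{24-1}{24-2-1}\approx0.3212\]

This agrees with the `Adjusted R-squared` value printed by `summary(M)`.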

5

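n <- nrow(DATA)          # number of observations
k <- ncol(DATA) - 1      # number of predictors (X and Z)
R2 <- r.adjsquared       # adjusted R-squared computed in the previous step
Fc <- ((n - k - 1) * R2) / (k * (1 - R2))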
Fc
## [1] 4.968876
#Computed F-value 
#solve for the critical value
qf(0.05,2,21, lower.tail = FALSE)
## [1] 3.4668
#Solve for p-value
pf(Fc,2,21, lower.tail = FALSE)
## [1] 0.01710687

Hypothesis Testing:

\(H_{0}: \rho^2 = 0\) vs. \(H_{1}: \rho^2 \neq 0\)

\(\alpha = 0.05\)

\[ F = \frac{(n-k-1)\tilde{R}^2}{k(1-\tilde{R}^2)}\]
\[ F = \frac{(24-2-1)0.3212177}{2(1-0.3212177)}\approx4.969\]

Decision Rule: Reject \(H_{0}\) if the computed F statistic is greater than the critical value \(F_{0.05;2,21}\), otherwise do not reject \(H_{0}\).

#Computing for Critical region.
qf(0.05,2,21, lower.tail = FALSE)
## [1] 3.4668

Decision:

Reject \(H_{0}\) since 4.969 > 3.4668.

Conclusion:

At the 5% level of significance, the data are sufficient to conclude that there is a linear association between weight and the combination of milk intake and age.
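
For comparison, the `F-statistic` line of `summary(M)` in section 4 reports 6.442 on 2 and 21 DF with a p-value of 0.006582. The sketch below shows how that figure appears to arise from the same formula when the unadjusted \(R^2\) is substituted for the adjusted value used above. The names `F_from_R2` and `F_from_lm` are illustrative only.

```r
X <- c(10,13,19,16,13,21,23,29,27,16,13,14,21,18,17,23,22,19,11,17,19,21,25,22)
Y <- c(29,33,41,47,51,43,31,49,71,42,31,35,62,55,58,72,68,60,41,42,54,57,62,54)
Z <- c(17,23,21,29,37,41,39,47,43,18,16,17,26,24,25,32,35,31,28,26,33,42,45,36)
DATA <- data.frame(X, Y, Z)

M <- lm(Y ~ X + Z, data = DATA)
s <- summary(M)

n  <- nrow(DATA)
k  <- 2                    # predictors X and Z
R2 <- s$r.squared          # unadjusted R-squared

# Overall F statistic computed from the unadjusted R-squared
F_from_R2 <- ((n - k - 1) * R2) / (k * (1 - R2))

# F statistic stored by summary.lm, for comparison
F_from_lm <- unname(s$fstatistic["value"])

p_value <- pf(F_from_R2, k, n - k - 1, lower.tail = FALSE)
c(F_from_R2 = F_from_R2, F_from_lm = F_from_lm, p_value = p_value)
```

Either version of the statistic exceeds the critical value 3.4668, so the decision to reject \(H_{0}\) is the same.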