As in Question 2, the aim of this question is to review some of the basic concepts you have learned in the prerequisite courses. The data set is given in the Excel file (tab “Question 3”).
samplemean<-mean(Q3$y)
samplemean
## [1] 2.3875
variance<-var(Q3$y)
variance
## [1] 1.297167
standarddev<-sd(Q3$y)
standarddev
## [1] 1.138932
mean(Q3$x)
## [1] 324.75
var(Q3$x)
## [1] 66387
sd(Q3$x)
## [1] 257.6567
cov(Q3$x,Q3$y)
## [1] -228.47
cor(Q3$x,Q3$y)
## [1] -0.7785558
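As a check on the covariance and correlation above, here is a small sketch that writes both out from their definitions. It uses the n - 1 divisor, which R's `cov()`, `var()` and `sd()` also use. The helpers `sample_cov` and `sample_cor` and the short vectors are hypothetical illustrations, not the Question 3 data. The last line plugs the rounded Question 3 summaries into cor = cov / (sd_x sd_y).

```r
# Sample covariance from its definition (divisor n - 1, as in cov())
sample_cov <- function(x, y) {
  n <- length(x)
  sum((x - mean(x)) * (y - mean(y))) / (n - 1)
}

# Correlation is the covariance rescaled by both standard deviations
sample_cor <- function(x, y) {
  sample_cov(x, y) / (sd(x) * sd(y))
}

# Hypothetical illustrative vectors (not the Question 3 data)
x <- c(100, 250, 400, 700)
y <- c(3.5, 2.9, 2.0, 1.1)
sample_cov(x, y)   # same as cov(x, y)
sample_cor(x, y)   # same as cor(x, y)

# Rounded Question 3 summaries: cov / (sd_x * sd_y) reproduces cor(Q3$x, Q3$y)
-228.47 / (257.6567 * 1.138932)
```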
Answer: The sample correlation coefficient is about -0.78, so x and y are strongly negatively correlated.
Solution: Assume that y is normally distributed, with its mean and standard deviation estimated by the sample values above. Then about 95% of the distribution of y lies within the sample mean ± 1.96 sample standard deviations, which gives the interval [0.1551928, 4.619807].
CIlower<-mean(Q3$y)-1.96*sd(Q3$y)
CIupper<-mean(Q3$y)+1.96*sd(Q3$y)
CIlower
## [1] 0.1551928
CIupper
## [1] 4.619807
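The interval above describes y itself. For contrast, a confidence interval for the population *mean* of y would use the standard error s / sqrt(n) and a t quantile, and would be much narrower. The sketch below computes both from the printed summaries. It assumes n = 16, which is inferred from the 14 residual degrees of freedom of the regression on the same data (n - 2 = 14). The helper names are hypothetical, and `qnorm(0.975)` replaces the rounded 1.96.

```r
# Interval for a normal variable: mean +/- z * sd
# (assumes y is normal with the sample mean and sd as its parameters)
normal_interval <- function(m, s, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)   # about 1.96 for level = 0.95
  c(lower = m - z * s, upper = m + z * s)
}

# Confidence interval for the population mean: standard error s / sqrt(n)
# and a t quantile with n - 1 degrees of freedom
mean_ci <- function(m, s, n, level = 0.95) {
  tq <- qt(1 - (1 - level) / 2, df = n - 1)
  c(lower = m - tq * s / sqrt(n), upper = m + tq * s / sqrt(n))
}

normal_interval(2.3875, 1.138932)
# n = 16 is an assumption inferred from the regression's 14 residual df
mean_ci(2.3875, 1.138932, n = 16)
```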
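Solution: Let W = 3x - 4y - 1. Its mean and variance follow from the sample moments of x and y: E(W) = 3E(x) - 4E(y) - 1 and Var(W) = 9Var(x) + 16Var(y) - 24Cov(x, y).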
meanw<-3*mean(Q3$x)-4*mean(Q3$y)-1
varw<-9*var(Q3$x)+16*var(Q3$y)-2*3*4*cov(Q3$x,Q3$y)
Assuming W is normally distributed, the 90% interval is the mean of W ± 1.645 standard deviations of W, i.e. [-313.68, 2241.08].
CIlower1<-meanw-1.645*sqrt(varw)
CIupper1<-meanw+1.645*sqrt(varw)
CIlower1
## [1] -313.6793
CIupper1
## [1] 2241.079
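The same calculation can be wrapped as a reusable helper for any linear combination W = a*x + b*y + const. This is a sketch that works from the summary moments printed earlier. The function name is hypothetical, and it uses `qnorm(0.95)` = 1.644854 instead of the rounded 1.645.

```r
# Moments of W = a*x + b*y + const:
#   E(W)   = a*E(x) + b*E(y) + const
#   Var(W) = a^2*Var(x) + b^2*Var(y) + 2*a*b*Cov(x, y)
# then a normal-based interval mean(W) +/- z * sd(W)
lin_comb_interval <- function(a, b, const, mx, my, vx, vy, cxy, level = 0.90) {
  mw <- a * mx + b * my + const
  sw <- sqrt(a^2 * vx + b^2 * vy + 2 * a * b * cxy)
  z <- qnorm(1 - (1 - level) / 2)   # about 1.645 for level = 0.90
  c(mean = mw, sd = sw, lower = mw - z * sw, upper = mw + z * sw)
}

# W = 3x - 4y - 1 with the Question 3 sample moments
lin_comb_interval(a = 3, b = -4, const = -1,
                  mx = 324.75, my = 2.3875,
                  vx = 66387, vy = 1.297167, cxy = -228.47)
```

Because `qnorm(0.95)` differs from 1.645 in the fourth decimal, the interval ends shift by roughly 0.1 relative to the values above.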
fit<-lm(y~x,data=Q3)
summary(fit)
##
## Call:
## lm(formula = y ~ x, data = Q3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.1091 -0.3904 -0.1039 0.4125 1.6439
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.5051229 0.3036164 11.545 1.53e-08 ***
## x -0.0034415 0.0007414 -4.642 0.000381 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7399 on 14 degrees of freedom
## Multiple R-squared: 0.6061, Adjusted R-squared: 0.578
## F-statistic: 21.55 on 1 and 14 DF, p-value: 0.000381
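The `lm()` estimates can be cross-checked against the moments computed earlier. In simple regression, slope = Cov(x, y) / Var(x) and intercept = mean(y) - slope * mean(x). The sketch below uses a hypothetical helper, `ols_from_moments`. It applies the identity to the rounded Question 3 moments and to a small hypothetical vector pair, where the result is compared with `coef(lm())`.

```r
# Simple-regression OLS estimates from sample moments:
#   slope     b1 = Cov(x, y) / Var(x)
#   intercept b0 = mean(y) - b1 * mean(x)
ols_from_moments <- function(mx, my, vx, cxy) {
  b1 <- cxy / vx
  c(intercept = my - b1 * mx, slope = b1)
}

# Question 3 moments from above; compare with the coefficients in summary(fit)
ols_from_moments(mx = 324.75, my = 2.3875, vx = 66387, cxy = -228.47)

# Same identity on hypothetical vectors, checked against lm()
x <- c(100, 250, 400, 700)
y <- c(3.5, 2.9, 2.0, 1.1)
ols_from_moments(mean(x), mean(y), var(x), cov(x, y))
coef(lm(y ~ x))
```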
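Answer: The summary shows no obvious violation of the model assumptions. The residual quantiles are roughly symmetric around zero (median -0.10, first and third quartiles -0.39 and 0.41). However, the summary alone cannot confirm linearity, constant variance or normality of the errors; residual plots such as those from plot(fit) are needed for that.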
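One way to go beyond the summary is to draw the standard `lm` diagnostic plots and run a Shapiro-Wilk test on the residuals. The sketch below is an illustration only. The helper name `check_residuals` is hypothetical, and the built-in `cars` data set stands in for the Question 3 data; with the real data, you would pass `fit` instead.

```r
# Residual diagnostics for a fitted lm object
check_residuals <- function(model) {
  op <- par(mfrow = c(2, 2))   # four diagnostic panels on one page
  on.exit(par(op))             # restore graphics settings afterwards
  plot(model)                  # residuals vs fitted, normal Q-Q, scale-location, leverage
  shapiro.test(residuals(model))$p.value   # small p-value suggests non-normal errors
}

# Stand-in example using the cars data set shipped with R
demo_fit <- lm(dist ~ speed, data = cars)
check_residuals(demo_fit)
```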
Answer: Use a t-test of H0: slope = 1, with t = (estimate - 1) / standard error.
tstatistics<-(-0.0034415-1)/0.0007414
tstatistics
## [1] -1353.441
The p-value is essentially 0 (|t| ≈ 1353 on 14 degrees of freedom), so we reject H0: the slope of the regression is significantly different from 1.
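The p-value itself can be computed from the t distribution with n - 2 = 14 degrees of freedom. The sketch below uses a hypothetical helper, `slope_t_test`, that takes the estimate and standard error from `summary(fit)`. Running it with slope = 0 reproduces the p-value that `summary(fit)` reports for x, which serves as a sanity check.

```r
# Two-sided t-test of H0: slope = beta0 from an estimate and its standard error
slope_t_test <- function(est, se, beta0, df) {
  t <- (est - beta0) / se
  c(t = t, p = 2 * pt(-abs(t), df = df))
}

slope_t_test(-0.0034415, 0.0007414, beta0 = 1, df = 14)  # H0: slope = 1
slope_t_test(-0.0034415, 0.0007414, beta0 = 0, df = 14)  # H0: slope = 0, as in summary(fit)
```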
predictvalue<-1000*fit$coefficients[2]+fit$coefficients[1]
predictvalue
## x
## 0.06363588
Answer: We should not trust this predicted value. x = 1000 lies outside the range of x values in the dataset, so the prediction is an extrapolation beyond the data used to fit the line.
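The prediction and the extrapolation concern can both be made explicit. The sketch below computes the point prediction from the printed coefficients and flags whether a new x falls outside the observed range. Both helper names are hypothetical, and the vector passed to `is_extrapolation` is illustrative only; with the real data, the check would be `is_extrapolation(1000, Q3$x)`.

```r
# Point prediction on the fitted line b0 + b1 * x_new
predict_line <- function(b0, b1, x_new) b0 + b1 * x_new

# TRUE when x_new lies outside the range of the observed x values
is_extrapolation <- function(x_new, x_obs) {
  x_new < min(x_obs) || x_new > max(x_obs)
}

predict_line(3.5051229, -0.0034415, 1000)   # about 0.064, as above

# Hypothetical x values, used only to illustrate the range check
is_extrapolation(1000, c(50, 120, 300, 610, 900))
```

With the fitted model, `predict(fit, newdata = data.frame(x = 1000), interval = "prediction")` gives the same point estimate together with a prediction interval. However, the interval does not remove the extrapolation problem.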
The data set is given in the Excel file (tab “Question 4”).
plot(Q4$data,type="l")
samplemean<-mean(Q4$data)
samplemean
## [1] -1.914379
samplevariance<-var(Q4$data)
samplevariance
## [1] 0.2193034
sampleauto<-acf(Q4$data,lag.max = 1,type="covariance")
sampleauto$acf
## , , 1
##
## [,1]
## [1,] 0.217110372
## [2,] 0.009765988
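The lag-0 autocovariance (0.2171) is slightly smaller than the sample variance (0.2193). This is because `acf(type = "covariance")` divides by n, whereas `var()` divides by n - 1. The ratio 0.2171 / 0.2193 ≈ 0.99 = (n - 1) / n is consistent with n = 100 observations, but that count is inferred here rather than stated. The sketch below computes the autocovariance by hand on a hypothetical short series and compares it with `acf()`. The helper name `autocov` is also hypothetical.

```r
# Sample autocovariance at lag k with divisor n, as acf(type = "covariance") uses
autocov <- function(x, k) {
  n <- length(x)
  xbar <- mean(x)
  sum((x[1:(n - k)] - xbar) * (x[(1 + k):n] - xbar)) / n
}

z <- c(-1.2, -2.5, -1.9, -2.2, -1.4, -2.0, -1.7)   # hypothetical series
autocov(z, 0)    # equals var(z) * (n - 1) / n
autocov(z, 1)
acf(z, lag.max = 1, type = "covariance", plot = FALSE)$acf[, 1, 1]
```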
sampleautocorr<-acf(Q4$data,lag.max = 1,type="correlation")
sampleautocorr$acf
## , , 1
##
## [,1]
## [1,] 1.00000000
## [2,] 0.04498167
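The lag-1 autocorrelation is the lag-1 autocovariance divided by the lag-0 autocovariance, so it follows directly from the two values in the previous output. The sketch below checks this with the printed Question 4 numbers and on a hypothetical short series. The helper name is hypothetical.

```r
# Lag-k autocorrelation = lag-k autocovariance / lag-0 autocovariance
autocorr_from_autocov <- function(gamma_k, gamma_0) gamma_k / gamma_0

# Question 4 values printed above
autocorr_from_autocov(0.009765988, 0.217110372)   # matches 0.04498167

z <- c(-1.2, -2.5, -1.9, -2.2, -1.4, -2.0, -1.7)  # hypothetical series
g <- acf(z, lag.max = 1, type = "covariance", plot = FALSE)$acf[, 1, 1]
autocorr_from_autocov(g[2], g[1])
acf(z, lag.max = 1, plot = FALSE)$acf[2, 1, 1]     # same value
```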