As in Question 2, the aim of this question is to review some of the basic concepts you have learned in the prerequisite courses. The data set is given in the Excel file (tab “Question 3”).
samplemean<-mean(Q3$y)
samplemean
## [1] 2.3875
variance<-var(Q3$y)
variance
## [1] 1.297167
standarddev<-sd(Q3$y)
standarddev
## [1] 1.138932
mean(Q3$x)
## [1] 324.75
var(Q3$x)
## [1] 66387
sd(Q3$x)
## [1] 257.6567
cov(Q3$x,Q3$y)
## [1] -228.47
cor(Q3$x,Q3$y)
## [1] -0.7785558
Answer:
The sample correlation coefficient is about -0.78, so x and y have a fairly strong negative linear relationship.
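As a quick check, the correlation can be recomputed from the covariance and the two standard deviations reported above. This is only a sketch: the numbers are copied by hand from the rounded output, and the names `cov_xy`, `sd_x` and `sd_y` are introduced here for illustration.

```r
# Summary statistics copied from the printed output (rounded values)
cov_xy <- -228.47     # cov(Q3$x, Q3$y)
sd_x   <- 257.6567    # sd(Q3$x)
sd_y   <- 1.138932    # sd(Q3$y)

# Pearson correlation = covariance scaled by both standard deviations
r <- cov_xy / (sd_x * sd_y)
r   # approximately -0.7786, in line with cor(Q3$x, Q3$y)
```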
Solution:
In the sample, n = 16, which is less than 30, and the population variance is unknown, so we use the t-distribution with n - 1 = 15 degrees of freedom to find a confidence interval.
t.test(Q3$y,conf.level = 0.95)
##
## One Sample t-test
##
## data: Q3$y
## t = 8.385, df = 15, p-value = 4.804e-07
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 1.780606 2.994394
## sample estimates:
## mean of x
## 2.3875
From the output above, the 95% confidence interval for the mean of y is [1.780606, 2.994394].
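The interval reported by `t.test` can be reproduced by hand from the sample mean and standard deviation computed earlier. The sketch below assumes only those rounded summary values; the names `ybar`, `s`, `t_crit` and `ci` are introduced here for illustration.

```r
# Summary statistics copied from the printed output
n     <- 16
ybar  <- 2.3875      # mean(Q3$y)
s     <- 1.138932    # sd(Q3$y)
alpha <- 0.05

# Two-sided t interval: ybar +/- t_{1 - alpha/2, n - 1} * s / sqrt(n)
t_crit <- qt(1 - alpha / 2, df = n - 1)
se     <- s / sqrt(n)
ci     <- ybar + c(-1, 1) * t_crit * se
ci   # approximately 1.7806 and 2.9944
```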
Solution:
Let W = 3X - 4Y - 1 be a new variable; a confidence interval for its mean follows from a one-sample t-test on w.
w <- 3*Q3$x - 4*Q3$y - 1
t.test(w,conf.level = 0.90)
##
## One Sample t-test
##
## data: w
## t = 4.9642, df = 15, p-value = 0.0001698
## alternative hypothesis: true mean is not equal to 0
## 90 percent confidence interval:
## 623.3793 1304.0207
## sample estimates:
## mean of x
## 963.7
From the output above, the 90% confidence interval for the mean of W is [623.3793, 1304.0207].
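The same interval can be recovered without building w, using the rules for a linear combination: \(E(aX + bY + c) = aE(X) + bE(Y) + c\) and \(\mathrm{Var}(aX + bY + c) = a^2\mathrm{Var}(X) + b^2\mathrm{Var}(Y) + 2ab\,\mathrm{Cov}(X,Y)\), with a = 3, b = -4, c = -1. The sketch below assumes the rounded summary statistics from the earlier output; all variable names are introduced here for illustration.

```r
# Summary statistics copied from the printed output
n      <- 16
mean_x <- 324.75;  mean_y <- 2.3875
var_x  <- 66387;   var_y  <- 1.297167
cov_xy <- -228.47

a <- 3; b <- -4; cc <- -1   # coefficients of W = 3X - 4Y - 1

# Sample mean and variance of W from the moments of X and Y
mean_w <- a * mean_x + b * mean_y + cc
var_w  <- a^2 * var_x + b^2 * var_y + 2 * a * b * cov_xy

# 90% two-sided t interval with n - 1 = 15 degrees of freedom
se_w <- sqrt(var_w / n)
ci_w <- mean_w + c(-1, 1) * qt(0.95, df = n - 1) * se_w
ci_w   # approximately 623.38 and 1304.02
```

Because the sample variance of a linear combination obeys the same identity as the population variance, this agrees with `t.test(w, conf.level = 0.90)` up to rounding of the inputs.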
State your predicted model.
\(\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i\)
Any violation of the assumptions stated in part (1)? Why?
fit<-lm(y~x,data=Q3)
summary(fit)
##
## Call:
## lm(formula = y ~ x, data = Q3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.1091 -0.3904 -0.1039 0.4125 1.6439
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.5051229 0.3036164 11.545 1.53e-08 ***
## x -0.0034415 0.0007414 -4.642 0.000381 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7399 on 14 degrees of freedom
## Multiple R-squared: 0.6061, Adjusted R-squared: 0.578
## F-statistic: 21.55 on 1 and 14 DF, p-value: 0.000381
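Answer: From the summary output, I don’t find any obvious violation of the model assumptions: the residuals are roughly centered on zero (1Q = -0.39, median = -0.10, 3Q = 0.41), although a residual plot would give a firmer check than the summary alone.
Answer: We use a t-test of the null hypothesis that the slope equals 1 against the two-sided alternative that it does not.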
tstatistics<-(-0.0034415-1)/0.0007414
tstatistics
## [1] -1353.441
With 14 degrees of freedom and |t| = 1353.4, the p-value is essentially 0, so we reject the null hypothesis. There is enough evidence to show that the slope of the regression is different from 1.
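The p-value itself is not printed above, so a minimal sketch of the calculation follows. It assumes the slope estimate, its standard error and the residual degrees of freedom as they appear in the regression summary; the names `b1`, `se_b1` and `df_resid` are introduced here for illustration.

```r
# Values copied from the coefficient table of summary(fit)
b1       <- -0.0034415   # estimated slope
se_b1    <- 0.0007414    # standard error of the slope
df_resid <- 14           # residual degrees of freedom (n - 2)

# Test H0: slope = 1 against H1: slope != 1
t_stat  <- (b1 - 1) / se_b1
p_value <- 2 * pt(-abs(t_stat), df = df_resid)   # two-sided p-value

t_stat    # approximately -1353.44
p_value   # far below any conventional significance level
```

If the fitted object is available, the same two inputs can be read from `coef(summary(fit))["x", c("Estimate", "Std. Error")]` rather than typed by hand, which avoids rounding.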
predictvalue<-1000*fit$coefficients[2]+fit$coefficients[1]
predictvalue
## x
## 0.06363588
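Answer: We should not trust the predicted value, since x = 1000 lies outside the range of x values in the data set. The prediction is an extrapolation, and the fitted line is only supported by the data within the observed range of x.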
The data set is given in the Excel file (tab “Question 4”).
plot(Q4$data,type="l")
samplemean<-mean(Q4$data)
samplemean
## [1] -1.914379
samplevariance<-var(Q4$data)
samplevariance
## [1] 0.2193034
sampleauto<-acf(Q4$data,lag.max = 1,type="covariance",plot=FALSE)
sampleauto$acf
## , , 1
##
## [,1]
## [1,] 0.217110372
## [2,] 0.009765988
sampleautocorr<-acf(Q4$data,lag.max = 1,type="correlation",plot=FALSE)
sampleautocorr$acf
## , , 1
##
## [,1]
## [1,] 1.00000000
## [2,] 0.04498167
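The two `acf` outputs are linked: the lag-1 autocorrelation is the lag-1 autocovariance divided by the lag-0 autocovariance. The lag-0 autocovariance is also slightly smaller than `var(Q4$data)` because `acf` divides by n while `var` divides by n - 1. The sketch below checks both points using the printed values only; the names `gamma0`, `gamma1` and `var_unbiased` are introduced here for illustration.

```r
# Values copied from the printed output
var_unbiased <- 0.2193034     # var(Q4$data), divisor n - 1
gamma0       <- 0.217110372   # lag-0 autocovariance from acf(), divisor n
gamma1       <- 0.009765988   # lag-1 autocovariance from acf()

# Lag-1 autocorrelation as a ratio of autocovariances
rho1 <- gamma1 / gamma0
rho1    # approximately 0.04498

# Ratio of the two variance estimates, which equals (n - 1) / n
ratio <- gamma0 / var_unbiased
ratio   # approximately 0.99
```

A ratio of about 0.99 would be consistent with a series of length 100. The length is not stated above, so this is an inference from the output rather than a given.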