Question 03

As in Question 2, the aim of this question is to review some of the basic concepts you have learned in the prerequisite courses. The data set is given in the Excel file (tab “Question 3”).

  1. Obtain the sample mean, sample variance and sample standard deviation for y.
samplemean<-mean(Q3$y)
samplemean
## [1] 2.3875
variance<-var(Q3$y)
variance
## [1] 1.297167
standarddev<-sd(Q3$y)
standarddev
## [1] 1.138932
  1. Repeat part (a) but for x.
mean(Q3$x)
## [1] 324.75
var(Q3$x)
## [1] 66387
sd(Q3$x)
## [1] 257.6567
  1. Obtain the sample covariance and sample correlation between x and y. What does the sample correlation tell you about the association between x and y?
cov(Q3$x,Q3$y)
## [1] -228.47
cor(Q3$x,Q3$y)
## [1] -0.7785558

Answer: The sample correlation of about -0.78 indicates a fairly strong negative linear association between x and y: larger values of x tend to go with smaller values of y.
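The correlation is the covariance rescaled by the two standard deviations, which is why it is unit-free and bounded by -1 and 1. A quick check of that relationship, using only the rounded summary values printed earlier (the variable names are introduced here for illustration):

```r
# Summary statistics copied from the printed output; they are rounded,
# so the result agrees with cor(Q3$x, Q3$y) only to about six decimals.
cov_xy <- -228.47
sd_x   <- 257.6567
sd_y   <- 1.138932

# Correlation = covariance / (sd of x * sd of y)
r <- cov_xy / (sd_x * sd_y)
print(r)
```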

  1. Find a 95% confidence interval for the population mean of Y. State all the necessary assumptions.

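Solution: Assume the observations of y are an independent random sample and that the sample mean is approximately normally distributed, either because Y itself is normal or by the central limit theorem. The interval is built from the standard error sd(y)/sqrt(n), not from sd(y) itself. Because n = 16 is small (the regression output reports 14 residual degrees of freedom for a two-parameter model) and the population variance is unknown, a t quantile with n - 1 = 15 degrees of freedom replaces 1.96. Then the 95% confidence interval is approximately [1.78, 2.99].

n<-length(Q3$y)
se<-sd(Q3$y)/sqrt(n)
CIlower<-mean(Q3$y)-qt(0.975,df=n-1)*se
CIupper<-mean(Q3$y)+qt(0.975,df=n-1)*se
CIlower
## [1] 1.780606
CIupper
## [1] 2.994394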
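A confidence interval for a population mean is centered on the sample mean and scaled by the standard error s/sqrt(n), with a t quantile when the variance is estimated from the sample. The sketch below wraps that recipe in a small helper; `mean_ci` and the data vector `v` are hypothetical names and made-up values used only for illustration.

```r
# Hypothetical helper: t-based confidence interval for a population mean.
mean_ci <- function(v, level = 0.95) {
  n     <- length(v)
  se    <- sd(v) / sqrt(n)                      # standard error of the mean
  tcrit <- qt(1 - (1 - level) / 2, df = n - 1)  # two-sided t quantile
  c(lower = mean(v) - tcrit * se, upper = mean(v) + tcrit * se)
}

# Made-up data, only to show the call; not the Question 3 values.
v  <- c(2.1, 3.4, 1.8, 2.9, 2.5, 3.1)
ci <- mean_ci(v)
print(ci)

# The same interval from base R's one-sample t-test.
print(t.test(v)$conf.int)
```

With the homework data the call would be `mean_ci(Q3$y)`.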
  1. Let W = 3X − 4Y − 1. Based on the data, obtain a 90% confidence interval for the mean of W.

Solution: Calculate the mean and variance of W.

meanw<-3*mean(Q3$x)-4*mean(Q3$y)-1
varw<-9*var(Q3$x)+16*var(Q3$y)-2*3*4*cov(Q3$x,Q3$y)

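The interval for the mean of W uses the standard error sqrt(varw/n) and, since n = 16 is small, a t quantile with n - 1 = 15 degrees of freedom. Then the 90% confidence interval is approximately [623.38, 1304.02].

n<-length(Q3$x)
sew<-sqrt(varw/n)
CIlower1<-meanw-qt(0.95,df=n-1)*sew
CIupper1<-meanw+qt(0.95,df=n-1)*sew
CIlower1
## [1] 623.3793
CIupper1
## [1] 1304.021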
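The mean and variance of W = 3X − 4Y − 1 follow from the rules for linear combinations: the constant shifts the mean but not the variance, and the cross term enters as 2·3·(−4)·Cov(X, Y). A minimal sketch, on made-up paired data rather than the Question 3 values, confirming that the formula route and the direct route agree:

```r
# Made-up paired data, only for illustration; not the Question 3 values.
x <- c(120, 340, 560, 90, 410, 275)
y <- c(3.1, 2.2, 1.4, 3.6, 2.0, 2.7)

# Build W observation by observation.
w <- 3 * x - 4 * y - 1

# Moments of W from the moments of X and Y.
mean_formula <- 3 * mean(x) - 4 * mean(y) - 1
var_formula  <- 9 * var(x) + 16 * var(y) - 2 * 3 * 4 * cov(x, y)

# Both routes agree up to floating-point error.
print(all.equal(mean(w), mean_formula))
print(all.equal(var(w), var_formula))

# 90% t interval for the mean of W, straight from the w values.
print(t.test(w, conf.level = 0.90)$conf.int)
```

Computing `w` directly is often the simpler route when the raw data are available, since any one-sample tool can then be applied to it.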
  1. A simple linear regression model is used to model this data set. (1)(2) See the handwritten sheet.
  1. Any violation of the assumptions stated in part (1)? Why?
fit<-lm(y~x,data=Q3)
summary(fit)
## 
## Call:
## lm(formula = y ~ x, data = Q3)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.1091 -0.3904 -0.1039  0.4125  1.6439 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.5051229  0.3036164  11.545 1.53e-08 ***
## x           -0.0034415  0.0007414  -4.642 0.000381 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7399 on 14 degrees of freedom
## Multiple R-squared:  0.6061, Adjusted R-squared:  0.578 
## F-statistic: 21.55 on 1 and 14 DF,  p-value: 0.000381

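Answer: The summary output shows no obvious violation of the model assumptions: the residuals are roughly centered on zero (median -0.10) and not strongly skewed (minimum -1.11, maximum 1.64). The summary alone cannot confirm linearity, constant variance or normality of the errors, however; those are better judged from residual plots.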
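The usual regression assumptions (linear mean, constant error variance, normal errors) are typically checked with residual diagnostics. The sketch below shows one common way to do that; it runs on simulated data so that it is self-contained, and `sim` and `fit_sim` are hypothetical stand-ins for `Q3` and `fit`.

```r
# Simulated data, only so the sketch runs on its own; with the homework
# data the model would be the `fit` object from lm(y ~ x, data = Q3).
set.seed(1)
sim <- data.frame(x = runif(16, 50, 800))
sim$y <- 3.5 - 0.003 * sim$x + rnorm(16, sd = 0.7)
fit_sim <- lm(y ~ x, data = sim)

res <- resid(fit_sim)

# Residuals vs fitted values: look for curvature (non-linearity)
# or a funnel shape (non-constant variance).
plot(fitted(fit_sim), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normal Q-Q plot: points far from the line suggest non-normal errors.
qqnorm(res)
qqline(res)

# Shapiro-Wilk test as a numerical companion to the Q-Q plot.
sw <- shapiro.test(res)
print(sw$p.value)
```

With only 16 observations these checks have little power, so they can flag gross violations but cannot prove the assumptions hold.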

  1. Is there any evidence that the slope of the linear regression model is different from 1?

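Answer: Use a t-test of H0: slope = 1 against H1: slope ≠ 1, with the slope estimate and its standard error taken from the regression summary.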

tstatistics<-(-0.0034415-1)/0.0007414
tstatistics
## [1] -1353.441

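With 14 residual degrees of freedom, the two-sided p-value for a t-statistic of this size is effectively 0, so we reject the null hypothesis. There is strong evidence that the slope of the regression is different from 1.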
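The p-value can be computed explicitly from the t distribution instead of being read off as zero. A short sketch using the estimate, standard error and degrees of freedom printed in the regression summary (the variable names are introduced here for illustration):

```r
# Values copied from the summary(fit) output.
b1    <- -0.0034415   # estimated slope
se_b1 <- 0.0007414    # its standard error
df    <- 14           # residual degrees of freedom

# Test H0: slope = 1 against H1: slope != 1.
t_stat  <- (b1 - 1) / se_b1
p_value <- 2 * pt(-abs(t_stat), df = df)  # two-sided p-value

print(t_stat)
print(p_value)
```

Note that the `t value` column in `summary(fit)` tests slope = 0, which is why the statistic has to be recomputed for a null value of 1.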

  1. Based on the model obtained in part (2), predict the value of y at x = 1000. Should you trust your prediction? Why?
predictvalue<-1000*fit$coefficients[2]+fit$coefficients[1]
predictvalue
##          x 
## 0.06363588

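Answer: We should not trust the predicted value. Since x = 1000 lies outside the range of x values observed in the data set, the prediction is an extrapolation, and there is no evidence that the fitted linear relationship still holds there.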
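A prediction is easier to judge when it comes with a range check and a prediction interval. The sketch below shows one way to get both with `predict()`; it uses simulated data so that it runs on its own, and `sim`, `fit_sim` and `new_x` are hypothetical names.

```r
# Simulated data (hypothetical); with the homework data, use the Q3 fit.
set.seed(2)
sim <- data.frame(x = runif(16, 50, 800))
sim$y <- 3.5 - 0.003 * sim$x + rnorm(16, sd = 0.7)
fit_sim <- lm(y ~ x, data = sim)

new_x <- 1000

# Extrapolation check: is new_x outside the observed range of x?
extrapolating <- new_x < min(sim$x) || new_x > max(sim$x)
print(extrapolating)

# Point prediction with a 95% prediction interval.
pred <- predict(fit_sim, newdata = data.frame(x = new_x),
                interval = "prediction", level = 0.95)
print(pred)
```

The interval widens as `new_x` moves away from the mean of x, but it still assumes the straight-line model is correct beyond the data, which is exactly what extrapolation cannot verify.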

Question 04

The data set is given in the Excel file (tab “Question 4”).

  1. Obtain the time plot and describe the trend.
plot(Q4$data,type="l")

  1. Calculate the sample mean.
samplemean<-mean(Q4$data)
samplemean
## [1] -1.914379
  1. Calculate the sample variance.
samplevariance<-var(Q4$data)
samplevariance
## [1] 0.2193034
  1. Calculate the sample autocovariance of lag 1.
sampleauto<-acf(Q4$data,lag.max = 1,type="covariance")

sampleauto$acf
## , , 1
## 
##             [,1]
## [1,] 0.217110372
## [2,] 0.009765988
  1. Calculate the sample autocorrelation of lag 1.
sampleautocorr<-acf(Q4$data,lag.max = 1,type="correlation")

sampleautocorr$acf
## , , 1
## 
##            [,1]
## [1,] 1.00000000
## [2,] 0.04498167
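The lag-0 autocovariance (0.2171) is slightly smaller than the sample variance (0.2193) because `acf()` divides by n while `var()` divides by n - 1; the ratio of about 0.99 is consistent with a series of 100 observations. The lag-1 autocorrelation is the lag-1 autocovariance divided by the lag-0 value, 0.009765988/0.217110372 ≈ 0.04498. A minimal sketch of both calculations by hand, on a simulated series rather than the Question 4 data (`sample_acov` is a hypothetical helper):

```r
# Hypothetical helper: lag-h sample autocovariance with divisor n,
# the convention used by acf().
sample_acov <- function(z, h) {
  n <- length(z)
  m <- mean(z)
  sum((z[1:(n - h)] - m) * (z[(1 + h):n] - m)) / n
}

# Simulated series, only for illustration; not the Question 4 data.
set.seed(3)
z <- rnorm(100)

g0 <- sample_acov(z, 0)   # lag-0 autocovariance
g1 <- sample_acov(z, 1)   # lag-1 autocovariance
r1 <- g1 / g0             # lag-1 autocorrelation

# Compare with acf(); plot = FALSE suppresses the correlogram.
acov <- acf(z, lag.max = 1, type = "covariance", plot = FALSE)$acf
acor <- acf(z, lag.max = 1, type = "correlation", plot = FALSE)$acf
print(c(g0, g1) - acov[, 1, 1])
print(r1 - acor[2, 1, 1])

# Lag 0 equals var() rescaled from divisor n - 1 to divisor n.
print(all.equal(g0, var(z) * (length(z) - 1) / length(z)))
```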
  1. See the handwritten sheet.