Question 03

As in Question 2, the aim of this question is to review some of the basic concepts you have learned in the prerequisite courses. The data set is given in the Excel file (tab “Question 3”).

  1. Obtain the sample mean, sample variance and sample standard deviation for y.
samplemean<-mean(Q3$y)
samplemean
## [1] 2.3875
variance<-var(Q3$y)
variance
## [1] 1.297167
standarddev<-sd(Q3$y)
standarddev
## [1] 1.138932
  1. Repeat part (a) but for x.
mean(Q3$x)
## [1] 324.75
var(Q3$x)
## [1] 66387
sd(Q3$x)
## [1] 257.6567
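
The built-in functions used in parts (a) and (b) can be checked against the defining formulas. The following is a minimal sketch on a hypothetical toy vector `v` (these are not the Question 3 values), showing that `var()` and `sd()` use the n - 1 divisor.

```r
# Hypothetical toy data, for illustration only (NOT the Question 3 data)
v <- c(2.1, 3.4, 1.9, 4.0, 2.6)
n <- length(v)

m  <- sum(v) / n                  # sample mean
s2 <- sum((v - m)^2) / (n - 1)    # sample variance, n - 1 divisor
s  <- sqrt(s2)                    # sample standard deviation

# Compare the hand computation with the built-in functions
c(mean = m, variance = s2, sd = s)
c(mean = mean(v), variance = var(v), sd = sd(v))
```
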
  1. Obtain the sample covariance and sample correlation between x and y. What can you tell from the sample correlation about the association between x and y?
cov(Q3$x,Q3$y)
## [1] -228.47
cor(Q3$x,Q3$y)
## [1] -0.7785558

Answer:
The sample correlation of about -0.78 indicates a fairly strong negative linear association between x and y: larger values of x tend to go with smaller values of y.
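
As a consistency check, the correlation is the covariance divided by the product of the two standard deviations. The sketch below only re-uses the rounded values printed in the output above, so agreement is expected to a few decimal places rather than exactly.

```r
# Values copied from the printed output above (rounded)
cov_xy <- -228.47
sd_x   <- 257.6567
sd_y   <- 1.138932

# r = cov(x, y) / (sd(x) * sd(y))
r <- cov_xy / (sd_x * sd_y)
r
```
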

  1. Find a 95% confidence interval for the population mean of Y. State all the necessary assumptions.

Solution:
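The sample size is n = 16 and the population variance is unknown, so we use a t-interval with n - 1 = 15 degrees of freedom. This assumes the observations of Y are an independent random sample from a normally distributed population.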

t.test(Q3$y,conf.level = 0.95)
## 
##  One Sample t-test
## 
## data:  Q3$y
## t = 8.385, df = 15, p-value = 4.804e-07
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  1.780606 2.994394
## sample estimates:
## mean of x 
##    2.3875

From the output above, the 95% confidence interval for the population mean of Y is [1.780606, 2.994394].
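
The same interval can be reproduced from the summary statistics with the formula mean ± t × s/√n. This is a sketch that assumes only the rounded values reported earlier (n = 16, mean 2.3875, standard deviation 1.138932).

```r
# Summary statistics copied from the output above
n    <- 16
ybar <- 2.3875
s_y  <- 1.138932

se     <- s_y / sqrt(n)            # standard error of the mean
t_crit <- qt(0.975, df = n - 1)    # two-sided 95% critical value, 15 df
ci_y   <- ybar + c(-1, 1) * t_crit * se
ci_y
```
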

  1. Let W = 3X − 4Y − 1. Based on the data, obtain a 90% confidence interval for the mean of W.

Solution:
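Define the new variable W = 3X - 4Y - 1 for each observation and apply a one-sample t-interval to the resulting values. This assumes the values of W are an independent random sample from a normal population.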

w <- 3*Q3$x - 4*Q3$y - 1
t.test(w,conf.level = 0.90)
## 
##  One Sample t-test
## 
## data:  w
## t = 4.9642, df = 15, p-value = 0.0001698
## alternative hypothesis: true mean is not equal to 0
## 90 percent confidence interval:
##   623.3793 1304.0207
## sample estimates:
## mean of x 
##     963.7

From the output above, the 90% confidence interval for the mean of W is [623.3793, 1304.0207].
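
The interval for W can also be cross-checked without forming `w`, using the rules for a linear combination: the mean of W is 3x̄ - 4ȳ - 1 and its variance is 9 Var(X) + 16 Var(Y) - 24 Cov(X, Y). The sketch below plugs in the rounded sample statistics from parts (a) to (c), so small rounding differences from the `t.test` output are possible.

```r
# Sample statistics copied from the earlier output (rounded)
n      <- 16
xbar   <- 324.75
ybar   <- 2.3875
var_x  <- 66387
var_y  <- 1.297167
cov_xy <- -228.47

w_bar <- 3 * xbar - 4 * ybar - 1                  # sample mean of W
var_w <- 9 * var_x + 16 * var_y - 24 * cov_xy     # sample variance of W
se_w  <- sqrt(var_w / n)                          # standard error of the mean

# 90% interval: 5% in each tail, 15 degrees of freedom
ci_w <- w_bar + c(-1, 1) * qt(0.95, df = n - 1) * se_w
ci_w
```
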

  1. A simple linear regression model is used to model this data set.
  1. Clearly state the model with all the assumptions.
    The model is \(Y_i = \beta_0 + \beta_1 X_i + \epsilon_i\), \(i = 1, \dots, n\).
    The fundamental assumption is linearity:
    A straight-line equation \(\mu_{Y|X} = \beta_0 + \beta_1 X\) relates the mean of Y to X. That is, there is a linear relationship between Y and X.
    There are 3 inference assumptions:
    The first assumption is constant variance.
    That is, for any \(X = X_i\), \(\mathrm{Var}(Y_i | X_i) = \sigma^2\) (i.e. the different populations of potential values of Y corresponding to different values of X have equal variance).
    The second assumption is independence.
    That is, any one value of Y is statistically independent of any other value of Y.
    The third assumption is normality.
    For any value \(X_i\) of X, the corresponding population of potential values of Y has a normal distribution.
    Equivalently, in terms of the error term, there are 2 model assumptions:
    (i) \(\epsilon_i \sim N(0, \sigma^2)\), i.e. \(\epsilon_i\) is normally distributed with mean 0 and variance \(\sigma^2\).
    (ii) The \(\epsilon_i\) are independent of each other.
  1. State your predicted model.
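    \(\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i\), which with the estimates from the summary(fit) output reported in this question is \(\hat{Y}_i = 3.5051 - 0.0034415\, X_i\).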

  2. Any violation of the assumptions stated in part (1)? Why?

fit<-lm(y~x,data=Q3)
summary(fit)
## 
## Call:
## lm(formula = y ~ x, data = Q3)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.1091 -0.3904 -0.1039  0.4125  1.6439 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.5051229  0.3036164  11.545 1.53e-08 ***
## x           -0.0034415  0.0007414  -4.642 0.000381 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7399 on 14 degrees of freedom
## Multiple R-squared:  0.6061, Adjusted R-squared:  0.578 
## F-statistic: 21.55 on 1 and 14 DF,  p-value: 0.000381

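Answer: The summary output shows no obvious violation of the model assumptions: the residual quartiles (-0.39, -0.10, 0.41) are roughly symmetric about zero, and the extremes (-1.11 and 1.64) are not large relative to the residual standard error of 0.74. The summary alone cannot confirm linearity, constant variance or normality, however; residual plots would be needed for a proper check.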
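
The summary output by itself does not show the residual patterns that the assumptions concern, so a graphical check is a natural complement. The following is a minimal sketch on simulated data: `demo` and `fit_demo` are hypothetical stand-ins, since the `Q3` data frame is not reproduced here, and the simulation parameters are only loosely modelled on the fitted values. With the real data, `fit` would take the place of `fit_demo`.

```r
# Simulated stand-in data (NOT the Question 3 data)
set.seed(42)
demo   <- data.frame(x = runif(16, min = 0, max = 800))
demo$y <- 3.5 - 0.0034 * demo$x + rnorm(16, sd = 0.74)
fit_demo <- lm(y ~ x, data = demo)

op <- par(mfrow = c(1, 2))
# Residuals vs fitted: look for curvature (linearity) or a funnel shape
# (non-constant variance)
plot(fitted(fit_demo), resid(fit_demo),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
# Normal Q-Q plot: points close to the line support normality
qqnorm(resid(fit_demo))
qqline(resid(fit_demo))
par(op)

# Shapiro-Wilk test as a rough numerical complement to the Q-Q plot
sw <- shapiro.test(resid(fit_demo))
sw$p.value
```

With only 16 observations such checks have little power, so they can reveal gross violations but cannot establish that the assumptions hold.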

  1. Is there any evidence that the slope of the linear regression model is different from 1?

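Answer: Use a t-test of \(H_0: \beta_1 = 1\) against \(H_1: \beta_1 \neq 1\), with test statistic \(t = (\hat{\beta}_1 - 1)/\mathrm{se}(\hat{\beta}_1)\) on n - 2 = 14 degrees of freedom.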

tstatistics<-(-0.0034415-1)/0.0007414
tstatistics
## [1] -1353.441

The p-value is essentially 0 (|t| is about 1353 on 14 degrees of freedom), so we reject the null hypothesis that the slope equals 1. There is strong evidence that the slope of the regression differs from 1.
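
The p-value and a confidence interval for the slope can be computed explicitly. This sketch uses only the estimate, standard error and residual degrees of freedom printed in the regression summary.

```r
# Values copied from the regression summary (rounded)
b1     <- -0.0034415    # slope estimate
se_b1  <- 0.0007414     # standard error of the slope
df_res <- 14            # residual degrees of freedom

t_stat  <- (b1 - 1) / se_b1                    # test of H0: slope = 1
p_value <- 2 * pt(-abs(t_stat), df = df_res)   # two-sided p-value

# 95% confidence interval for the slope; 1 lies far outside it
ci_b1 <- b1 + c(-1, 1) * qt(0.975, df = df_res) * se_b1

t_stat
p_value
ci_b1
```
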

  1. Based on the model obtained in part (2), predict the value of y at x = 1000. Should you trust your prediction? Why?
predictvalue<-1000*fit$coefficients[2]+fit$coefficients[1]
predictvalue
##          x 
## 0.06363588

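Answer: We should not trust the predicted value, since x = 1000 lies outside the range of x values in the data set. Predicting there is extrapolation: the data give no evidence that the linear relationship continues to hold that far from the observed values.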
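
One way to quantify the uncertainty is a 95% prediction interval at x = 1000. The sketch below rebuilds it from the rounded summary statistics reported earlier (so it is approximate), using the usual simple-regression formula for the standard error of a new observation. Note that the interval is only valid if the linear model still holds at x = 1000, which is exactly what extrapolation cannot guarantee.

```r
# Values copied from earlier output (rounded)
n     <- 16
xbar  <- 324.75
var_x <- 66387
b0    <- 3.5051229
b1    <- -0.0034415
s     <- 0.7399          # residual standard error
x0    <- 1000

y0  <- b0 + b1 * x0                      # point prediction
Sxx <- (n - 1) * var_x                   # sum of squares of x about its mean

# Standard error for predicting a single new observation at x0
se_pred <- s * sqrt(1 + 1 / n + (x0 - xbar)^2 / Sxx)
pi_95   <- y0 + c(-1, 1) * qt(0.975, df = n - 2) * se_pred

(x0 - xbar) / sqrt(var_x)   # how many sample SDs x0 lies above the mean of x
y0
pi_95
```
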

Question 04

The data set is given in the Excel file (tab “Question 4”).

  1. Obtain the time plot and describe the trend.
plot(Q4$data,type="l")

  1. Calculate the sample mean.
samplemean<-mean(Q4$data)
samplemean
## [1] -1.914379
  1. Calculate the sample variance.
samplevariance<-var(Q4$data)
samplevariance
## [1] 0.2193034
  1. Calculate the sample autocovariance of lag 1.
sampleauto<-acf(Q4$data,lag.max = 1,type="covariance",plot=FALSE)
sampleauto$acf
## , , 1
## 
##             [,1]
## [1,] 0.217110372
## [2,] 0.009765988
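
The first row of the output is the lag-0 autocovariance and the second row is lag 1. The lag-0 value (0.2171) is slightly smaller than the sample variance from part (c) (0.2193) because `acf()` divides by n whereas `var()` divides by n - 1. The sketch below checks the formula by hand on a simulated series `z`, a hypothetical stand-in since the `Q4` data are not reproduced here.

```r
# Simulated stand-in series (NOT the Question 4 data)
set.seed(1)
z    <- rnorm(100)
n    <- length(z)
zbar <- mean(z)

# Sample autocovariances with divisor n, as used by acf()
gamma0 <- sum((z - zbar)^2) / n
gamma1 <- sum((z[-1] - zbar) * (z[-n] - zbar)) / n

acov <- drop(acf(z, lag.max = 1, type = "covariance", plot = FALSE)$acf)

rbind(by_hand = c(gamma0, gamma1), acf = acov)
gamma0 / var(z)   # equals (n - 1) / n
```
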
  1. Calculate the sample autocorrelation of lag 1.
sampleautocorr<-acf(Q4$data,lag.max = 1,type="correlation",plot=FALSE)
sampleautocorr$acf
## , , 1
## 
##            [,1]
## [1,] 1.00000000
## [2,] 0.04498167
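
The lag-1 autocorrelation is the lag-1 autocovariance divided by the lag-0 autocovariance. A quick check using the two values printed in part (d):

```r
# Autocovariances copied from the output of part (d)
gamma0 <- 0.217110372
gamma1 <- 0.009765988

rho1 <- gamma1 / gamma0   # lag-1 sample autocorrelation
rho1
```
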
  1. See the handwritten sheet.