MATH 4130B Assignment 1

Question 03

As in Question 2, the aim of this question is to review some of the basic concepts you have learned in the prerequisite courses. The data set is given in the Excel file (tab “Question 3”).

Obtain the sample mean, sample variance and sample standard deviation for y.

samplemean<-mean(Q3$y)
samplemean

## [1] 2.3875

variance<-var(Q3$y)
variance

## [1] 1.297167

standarddev<-sd(Q3$y)
standarddev

## [1] 1.138932
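As a cross-check on the built-in functions, the three statistics can also be computed from their definitions. This is a minimal sketch: the vector `y_demo` is made-up illustrative data, not the values in the Question 3 tab.

```r
# Hypothetical data for illustration only (not the Q3 data)
y_demo <- c(1.2, 3.4, 2.2, 0.9, 4.1, 2.6)
n <- length(y_demo)

# Sample mean: sum of the observations divided by n
mean_manual <- sum(y_demo) / n

# Sample variance: sum of squared deviations divided by n - 1
var_manual <- sum((y_demo - mean_manual)^2) / (n - 1)

# Sample standard deviation: square root of the sample variance
sd_manual <- sqrt(var_manual)

# Each comparison should return TRUE
all.equal(mean_manual, mean(y_demo))
all.equal(var_manual, var(y_demo))
all.equal(sd_manual, sd(y_demo))
```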

Repeat part (a) but for x.

mean(Q3$x)

## [1] 324.75

var(Q3$x)

## [1] 66387

sd(Q3$x)

## [1] 257.6567

Obtain the sample covariance and sample correlation between x and y. What does the sample correlation tell you about the association between x and y?

cov(Q3$x,Q3$y)

## [1] -228.47

cor(Q3$x,Q3$y)

## [1] -0.7785558

Answer: The sample correlation is about -0.78, so x and y have a strong negative linear association: larger values of x tend to go with smaller values of y.
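The correlation can be checked against the covariance and the two standard deviations, since the sample correlation equals the sample covariance divided by the product of the sample standard deviations. The sketch below reuses the rounded values printed above, so agreement is only up to rounding; the variable names are chosen here for illustration.

```r
# Values copied from the printed output above (rounded)
cov_xy <- -228.47
sd_x <- 257.6567
sd_y <- 1.138932

# Sample correlation = covariance / (sd of x * sd of y)
r_check <- cov_xy / (sd_x * sd_y)
r_check  # agrees with cor(Q3$x, Q3$y) up to rounding of the inputs
```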

Find a 95% confidence interval for the population mean of Y. State all the necessary assumptions.

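Solution: Assume that the observations of y are an independent random sample from a normal population with unknown variance. With n = 16 the sample is too small to rely on the central limit theorem, so the interval uses the standard error sd/sqrt(n) and the t distribution with n - 1 = 15 degrees of freedom. The 95% confidence interval is approximately [1.78, 2.99].

n<-length(Q3$y)
CIlower<-mean(Q3$y)-qt(0.975,df=n-1)*sd(Q3$y)/sqrt(n)
CIupper<-mean(Q3$y)+qt(0.975,df=n-1)*sd(Q3$y)/sqrt(n)

CIlower

## [1] 1.780606

CIupper

## [1] 2.994394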
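For a small sample, an interval for a population mean is usually built from the standard error and a t quantile, and `t.test()` reports the same interval. The sketch below compares the two on made-up data; `y_demo` is hypothetical and stands in for `Q3$y`.

```r
# Hypothetical data for illustration only (not the Q3 data)
y_demo <- c(1.2, 3.4, 2.2, 0.9, 4.1, 2.6, 1.8, 3.0)
n <- length(y_demo)

# Standard error of the sample mean
se <- sd(y_demo) / sqrt(n)

# Two-sided 95% critical value from the t distribution with n - 1 df
tcrit <- qt(0.975, df = n - 1)

# Interval computed by hand
ci_manual <- mean(y_demo) + c(-1, 1) * tcrit * se

# Interval reported by the built-in one-sample t procedure
ci_builtin <- t.test(y_demo, conf.level = 0.95)$conf.int

ci_manual
ci_builtin
```

Both intervals rest on the same assumption: an independent sample from an approximately normal population.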

Let W = 3X − 4Y − 1. Based on the data, obtain a 90% confidence interval for the mean of W.

Solution: Calculate the mean and variance of W.

meanw<-3*mean(Q3$x)-4*mean(Q3$y)-1
varw<-9*var(Q3$x)+16*var(Q3$y)-2*3*4*cov(Q3$x,Q3$y)

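Then, using the standard error sqrt(varw/n) with n = 16 and the t distribution with 15 degrees of freedom, the 90% confidence interval is approximately [623.38, 1304.02].

n<-length(Q3$y)
CIlower1<-meanw-qt(0.95,df=n-1)*sqrt(varw/n)
CIupper1<-meanw+qt(0.95,df=n-1)*sqrt(varw/n)

CIlower1

## [1] 623.3793

CIupper1

## [1] 1304.021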
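The mean and variance of W can be checked by building W observation by observation and comparing with the linear-combination formulas. This is a sketch on made-up data: `x_demo` and `y_demo` are hypothetical stand-ins for `Q3$x` and `Q3$y`.

```r
# Hypothetical paired data for illustration only (not the Q3 data)
x_demo <- c(120, 340, 560, 80, 410, 275, 630, 190)
y_demo <- c(3.1, 2.4, 1.5, 3.6, 2.0, 2.7, 1.1, 3.0)

# Build W = 3X - 4Y - 1 for every observation
w_demo <- 3 * x_demo - 4 * y_demo - 1

# Mean of a linear combination
mean_formula <- 3 * mean(x_demo) - 4 * mean(y_demo) - 1

# Var(aX + bY + c) = a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X, Y), with a = 3, b = -4
var_formula <- 9 * var(x_demo) + 16 * var(y_demo) - 24 * cov(x_demo, y_demo)

# Both comparisons should return TRUE
all.equal(mean(w_demo), mean_formula)
all.equal(var(w_demo), var_formula)

# 90% t interval for the mean of W, computed directly from w_demo
ci_w <- t.test(w_demo, conf.level = 0.90)$conf.int
ci_w
```

Working with `w_demo` directly avoids sign mistakes in the covariance term.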

A simple linear regression model is used to model this data set. (1)(2) See the handwritten solution.

Any violation of the assumptions stated in part (1)? Why?

fit<-lm(y~x,data=Q3)
summary(fit)

## 
## Call:
## lm(formula = y ~ x, data = Q3)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.1091 -0.3904 -0.1039  0.4125  1.6439 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.5051229  0.3036164  11.545 1.53e-08 ***
## x           -0.0034415  0.0007414  -4.642 0.000381 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7399 on 14 degrees of freedom
## Multiple R-squared:  0.6061, Adjusted R-squared:  0.578 
## F-statistic: 21.55 on 1 and 14 DF,  p-value: 0.000381

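Answer: The summary output shows no obvious violation of the model assumptions: the residual quartiles are roughly symmetric about zero. The summary alone cannot confirm linearity, constant variance, or normality of the errors; those would need residual plots.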
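The usual way to examine the regression assumptions is through the residuals. The sketch below fits a model to simulated data and draws the two standard diagnostic plots. The data are generated here for illustration only, and `fit_demo` is a hypothetical stand-in for `fit`.

```r
# Simulated data for illustration only (not the Q3 data)
set.seed(1)
x_demo <- seq(50, 800, length.out = 16)
y_demo <- 3.5 - 0.0034 * x_demo + rnorm(16, sd = 0.7)
fit_demo <- lm(y_demo ~ x_demo)

# Residuals vs fitted values: look for curvature (non-linearity)
# or a funnel shape (non-constant variance)
plot(fitted(fit_demo), resid(fit_demo),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normal Q-Q plot: points far from the line suggest non-normal errors
qqnorm(resid(fit_demo))
qqline(resid(fit_demo))

# Shapiro-Wilk test of normality of the residuals (low power when n is small)
shapiro.test(resid(fit_demo))
```

Calling `plot(fit)` on the fitted model produces a similar set of diagnostic plots.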

Is there any evidence that the slope of the linear regression model is different from 1?

Answer: Use a t-test of the null hypothesis slope = 1 against the alternative slope ≠ 1.

tstatistics<-(-0.0034415-1)/0.0007414
tstatistics

## [1] -1353.441

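Compared with a t distribution with 14 degrees of freedom, the p-value is essentially 0, so we reject the null hypothesis. There is strong evidence that the slope of the regression is different from 1.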
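The p-value for this test can be computed from the t distribution with the residual degrees of freedom. The sketch below uses the estimate, standard error, and degrees of freedom printed in the summary output; the variable names are chosen here for illustration.

```r
# Values copied from the summary(fit) output above
b1 <- -0.0034415     # estimated slope
se_b1 <- 0.0007414   # standard error of the slope
df_resid <- 14       # residual degrees of freedom

# Test statistic for H0: slope = 1
t_stat <- (b1 - 1) / se_b1

# Two-sided p-value from the t distribution with 14 df
p_value <- 2 * pt(-abs(t_stat), df = df_resid)

t_stat
p_value  # extremely small, far below any usual significance level
```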

Based on the model obtained in part (2), predict the value of y at x = 1000. Should you trust your prediction? Why?

predictvalue<-1000*fit$coefficients[2]+fit$coefficients[1]
predictvalue

##          x 
## 0.06363588

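Answer: We should not trust the predicted value. x = 1000 lies outside the range of x values in the data set, so the prediction is an extrapolation, and the fitted linear relationship may not hold there.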
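The same prediction can be obtained with `predict()`, which also gives a prediction interval, and the extrapolation can be checked against the observed range of x. This is a sketch on simulated data: `demo` and `fit_demo` are hypothetical stand-ins for `Q3` and `fit`.

```r
# Simulated data for illustration only (not the Q3 data)
set.seed(2)
demo <- data.frame(x = seq(50, 800, length.out = 16))
demo$y <- 3.5 - 0.0034 * demo$x + rnorm(16, sd = 0.7)
fit_demo <- lm(y ~ x, data = demo)

new_x <- 1000

# Is the new x value outside the observed range? (TRUE for this demo data)
is_extrapolation <- new_x < min(demo$x) || new_x > max(demo$x)
is_extrapolation

# Point prediction with a 95% prediction interval
pred <- predict(fit_demo, newdata = data.frame(x = new_x),
                interval = "prediction", level = 0.95)
pred
```

The interval widens as the new x moves away from the mean of x. Even so, it assumes the straight-line model still holds at the new point, which the data cannot confirm outside the observed range.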

Question 04

The data set is given in the Excel file (tab “Question 4”).

Obtain the time plot and describe the trend.

plot(Q4$data,type="l")

Calculate the sample mean.

samplemean<-mean(Q4$data)
samplemean

## [1] -1.914379

Calculate the sample variance.

samplevariance<-var(Q4$data)
samplevariance

## [1] 0.2193034

Calculate the sample autocovariance of lag 1.

sampleauto<-acf(Q4$data,lag.max = 1,type="covariance")

sampleauto$acf

## , , 1
## 
##             [,1]
## [1,] 0.217110372
## [2,] 0.009765988
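The values returned by `acf()` can be reproduced from the definition of the sample autocovariance, which divides by n rather than n - 1. That is why the lag 0 value is slightly smaller than the sample variance from `var()`. The sketch below uses a simulated series; `z_demo` is hypothetical and stands in for `Q4$data`.

```r
# Simulated series for illustration only (not the Q4 data)
set.seed(3)
z_demo <- rnorm(100, mean = -2, sd = 0.5)
n <- length(z_demo)
zbar <- mean(z_demo)

# Lag 0 autocovariance: squared deviations divided by n (not n - 1)
c0 <- sum((z_demo - zbar)^2) / n

# Lag 1 autocovariance: products of deviations one step apart, divided by n
c1 <- sum((z_demo[-1] - zbar) * (z_demo[-n] - zbar)) / n

# Compare with acf(); plot = FALSE suppresses the correlogram
acf_demo <- acf(z_demo, lag.max = 1, type = "covariance", plot = FALSE)
all.equal(c(c0, c1), as.numeric(acf_demo$acf))

# Relation between c0 and the sample variance
all.equal(c0, var(z_demo) * (n - 1) / n)
```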

Calculate the sample autocorrelation of lag 1.

sampleautocorr<-acf(Q4$data,lag.max = 1,type="correlation")

sampleautocorr$acf

## , , 1
## 
##            [,1]
## [1,] 1.00000000
## [2,] 0.04498167
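The lag 1 autocorrelation is the lag 1 autocovariance divided by the lag 0 autocovariance. The sketch below checks this with the values printed above; the variable names are chosen here for illustration.

```r
# Values copied from the autocovariance output above
c0 <- 0.217110372   # lag 0 autocovariance
c1 <- 0.009765988   # lag 1 autocovariance

# Lag 1 autocorrelation = c1 / c0
r1 <- c1 / c0
r1  # agrees with the lag 1 value from acf(..., type = "correlation") up to rounding
```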

See the handwritten solution.

Yixin Lou
