3 Generating data

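set.seed(20200914)
N <- 100  # number of subjects
n <- 2    # number of time periods per subject
y1 <- rnorm(N, 0, 1)    # period-1 response, sd = 1
y2 <- rnorm(N, 0, 0.6)  # independent noise for period 2, sd = 0.6
y2 <- y2 + 0.5 + 0.8 * y1  # period-2 response, correlated with y1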

3a Scatter plot and empirical estimate of correlation

plot(y1,y2,main = "scatterplot of y1 v.s. y2")

d1 <- y1 - mean(y1)  # centered y1
d2 <- y2 - mean(y2)  # centered y2
# Pearson correlation computed from its definition
result1 <- sum(d1 * d2) / sqrt(sum(d1^2) * sum(d2^2))
result2 <- cor(y1, y2)  # built-in, for comparison
result <- c(result1, result2)
names(result) <- c("mine", "cor(y1,y2)")
(result)
##       mine cor(y1,y2) 
##  0.8141861  0.8141861
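
As a sanity check, the population correlation implied by the simulation above can be worked out by hand. Here \(\varepsilon\) denotes the independent \(N(0, 0.6^2)\) noise added to y2, a symbol introduced only for this illustration, so that \(y_2 = 0.5 + 0.8\,y_1 + \varepsilon\):

\[ \mathrm{Cov}(y_1, y_2) = 0.8\,\mathrm{Var}(y_1) = 0.8, \qquad \mathrm{Var}(y_2) = 0.8^2 \cdot 1 + 0.6^2 = 1, \qquad \rho = \frac{0.8}{\sqrt{1 \cdot 1}} = 0.8 \]

The empirical estimate 0.814 is close to this value.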

3b Report the estimate and p-value of beta_1

## 
## Call:
## lm(formula = y ~ . - 1, data = dat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -3.02354 -0.68563  0.01647  0.77282  2.49934 
## 
## Coefficients:
##       Estimate Std. Error t value Pr(>|t|)   
## beta0   0.1059     0.1075   0.985  0.32598   
## beta1   0.4682     0.1521   3.079  0.00237 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.075 on 198 degrees of freedom
## Multiple R-squared:  0.1296, Adjusted R-squared:  0.1208 
## F-statistic: 14.74 on 2 and 198 DF,  p-value: 1.082e-06
## The estimate of beta1 is 0.468167
## The p-value for beta1 is 0.002373549
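
The code that produced this fit is not shown. The following is a minimal sketch of how it might be reproduced. It assumes the two periods are stacked into one response vector, and that `dat` has a constant column `beta0` and a period indicator `beta1` (0 for period 1, 1 for period 2). The column layout is a guess based on the coefficient names and the `y ~ . - 1` call in the output.

```r
# Regenerate the data exactly as in the data-generation step
set.seed(20200914)
N <- 100
y1 <- rnorm(N, 0, 1)
y2 <- rnorm(N, 0, 0.6) + 0.5 + 0.8 * y1

# Assumed layout: stack both periods and treat all 2N rows as independent
dat <- data.frame(
  y     = c(y1, y2),
  beta0 = 1,                      # constant column (intercept)
  beta1 = rep(c(0, 1), each = N)  # period indicator
)

# "- 1" drops R's automatic intercept; beta0 plays that role instead
fit <- lm(y ~ . - 1, data = dat)
coefs <- summary(fit)$coefficients

cat("The estimate of beta1 is", coefs["beta1", "Estimate"], "\n")
cat("The p-value for beta1 is", coefs["beta1", "Pr(>|t|)"], "\n")
```

Under this coding the estimate of beta1 is simply `mean(y2) - mean(y1)`, and the residual degrees of freedom are 2N - 2 = 198, which matches the summary above.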

3c Basic technique for testing “beta1 = 0” and the p-value

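Under the assumption of 3b (all \(Nn = 200\) observations are independent), the test statistic follows a t distribution with \(Nn - 2 = 198\) degrees of freedom under \(H_0: \beta_1 = 0\):

\[ \frac{\hat{\beta}_1}{ \widehat{se}(\hat{\beta}_1) } \sim t_{Nn-2}\]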

## The p-value for hypothesis testing H0: beta1=0 is 0.002373549
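
As an illustration, the two-sided p-value can be recomputed from the rounded estimate and standard error printed in the 3b summary. This is only a sketch: because the inputs are rounded, the result should agree with the reported value to about three significant digits, not exactly.

```r
# Values copied from the lm() summary in 3b (rounded as printed there)
beta1_hat <- 0.4682
se_beta1  <- 0.1521
df        <- 198  # 200 observations minus 2 coefficients

t_stat  <- beta1_hat / se_beta1
p_value <- 2 * pt(-abs(t_stat), df)  # two-sided p-value

cat("t =", t_stat, ", p-value =", p_value, "\n")
```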

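If we allow for correlation between the two time periods, we apply a one-sample t-test to the within-subject differences z (the output is consistent with z = y1 - y2, so its mean has the opposite sign of \(\hat{\beta}_1\)):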

## t.test of r code:
## 
##  One Sample t-test
## 
## data:  z
## t = -7.1395, df = 99, p-value = 1.588e-10
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -0.5982809 -0.3380530
## sample estimates:
## mean of x 
## -0.468167
## my result:
## mean of z :  -0.468167 , t =  -7.139473 , p-value =  1.587818e-10
## 95 percent confidence interval:
##  -0.5982809 -0.338053
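
The "my result" lines can be obtained from the usual one-sample formulas. The sketch below assumes `z <- y1 - y2` (the sign is inferred from the output, since the original code is not shown) and compares the manual computation with `t.test()`.

```r
# Regenerate the data exactly as in the data-generation step
set.seed(20200914)
N <- 100
y1 <- rnorm(N, 0, 1)
y2 <- rnorm(N, 0, 0.6) + 0.5 + 0.8 * y1

z <- y1 - y2  # assumed sign convention for the paired differences

# Manual one-sample t-test of H0: mean(z) = 0
z_bar   <- mean(z)
se_z    <- sd(z) / sqrt(N)
t_stat  <- z_bar / se_z
p_value <- 2 * pt(-abs(t_stat), df = N - 1)
ci      <- z_bar + c(-1, 1) * qt(0.975, df = N - 1) * se_z

# Built-in version for comparison
ref <- t.test(z)

cat("mean of z :", z_bar, ", t =", t_stat, ", p-value =", p_value, "\n")
cat("95 percent confidence interval:\n", ci, "\n")
```

Differencing within subjects removes the shared component of y1 and y2, so the test stays valid whatever the correlation between the two periods.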

3d Summarize what is the consequence of ignoring the correlation.

If we ignore the correlation, the 95% confidence interval length of \(\beta_1\) is about 2 * 0.30 (half-width \(\approx 1.97 \times 0.1521\)).

If we take into account the correlation, the 95% confidence interval length of \(\beta_1\) is 2 * 0.13.

Both methods give the same point estimate of \(\beta_1\) (0.468 in absolute value; the sign depends only on the direction of the difference).

Thus, if we ignore the correlation, the inference about \(\beta_1\) is less precise: the estimated standard error is larger, the confidence interval is wider, the p-value is bigger, and we are less likely to reject the null hypothesis \(\beta_1 = 0\). The reason is that ignoring the correlation relies on a stronger assumption (independence between the two time periods) and does not use the information in the pairing of the observations.
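
The size of the effect can be seen from the variance of the difference in means. In the sketch below, \(\sigma_1^2\), \(\sigma_2^2\) and \(\rho\) denote the population variances and correlation of y1 and y2, notation introduced here for illustration:

\[ \mathrm{Var}(\bar{y}_2 - \bar{y}_1) = \frac{\sigma_1^2 + \sigma_2^2 - 2\rho\,\sigma_1\sigma_2}{N} \]

With the simulation values \(\sigma_1^2 = \sigma_2^2 = 1\), \(\rho = 0.8\) and \(N = 100\), this gives \(0.4/100\), a standard error of about 0.063. Setting \(\rho = 0\) gives \(2/100\), a standard error of about 0.141. These are in line with the estimated standard errors: about 0.066 (0.468 / 7.14) from the paired test and 0.152 from the regression that ignores the correlation.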