1.28 (a)
crime <- read.table('crime.txt', header = T)
y <- crime$y
x <- crime$x
slr.fit = lm(y~x)
plot(x,y, xlab = "the percentage having at least a high-school diploma", ylab = "The crime rate")
abline(slr.fit)
summary(slr.fit)
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5278.3 -1757.5 -210.5 1575.3 6803.3
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 20517.60 3277.64 6.260 1.67e-08 ***
## x -170.58 41.57 -4.103 9.57e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2356 on 82 degrees of freedom
## Multiple R-squared: 0.1703, Adjusted R-squared: 0.1602
## F-statistic: 16.83 on 1 and 82 DF, p-value: 9.571e-05
The \(R^2\) is 0.1703, which indicates that 17.03% of the variation in Y can be explained by X. This is not a good fit.
(b)
(1) A one-percentage-point increase in the high-school graduation rate is associated with a change of \(b_{1}\) in the mean crime rate. Therefore, the estimated difference in mean crime rate between two counties whose high-school graduation rates differ by one percentage point is \(b_{1} = -170.58\).
(2) \(\widehat{Y}_{h} = -170.58*80 +20517.6 = 6871.2\)
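As a quick check (using the slr.fit object fitted above), the same estimate can be obtained with predict(); it should match 6871.2 up to rounding of the coefficients.

predict(slr.fit, newdata = data.frame(x = 80))  # estimated mean crime rate at X_h = 80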
(3)\(e_{10} = 1401.96\)
y_hat = -170.58*x[10] + 20517.6
e10 = y[10] - y_hat
e10
## [1] 1401.96
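The same residual can also be read directly from the fitted model; any small difference from 1401.96 is due to rounding the coefficients in the hand calculation.

residuals(slr.fit)[10]  # residual for the 10th county, using unrounded coefficients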
(4) \(\hat{\sigma}^{2} = MSE = 2356^2 = 5550736\)
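As a check, the MSE can be recovered from the residual standard error reported by summary(); the value agrees with \(2356^2\) up to rounding of the printed standard error.

summary(slr.fit)$sigma^2  # residual standard error squared = MSE, about 5,550,736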
2.30 (a)
\[H_{0}: \beta_{1} = 0, H_{a}: \beta_{1} \neq 0\] \[t^* = \frac{b_{1} - 0}{s_{b_{1}}} = \frac{-170.58 - 0}{41.57} = -4.103\] \[t(0.995; 82) = 2.6371, P\text{-value} = 9.57e-05 < \alpha\] Since \(|t^*| > 2.6371\), we reject the null hypothesis that \(\beta_{1}=0\). We conclude that there is a linear association between the crime rate and the percentage of high-school graduates at \(\alpha=0.01\).
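A sketch of the same test in R, using the fitted model above; the values agree with the summary() output.

t.star <- coef(summary(slr.fit))["x", "t value"]  # observed t statistic, -4.103
qt(0.995, df = 82)                                # critical value t(0.995; 82) = 2.6371
2 * pt(-abs(t.star), df = 82)                     # two-sided p-value, 9.57e-05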
(b) \[t(0.995,82) = 2.6371, s_{b_{1}} = 41.57 \] \[ -170.58 \pm 2.6371*(41.57)\] The 99% CI is \[[-280.2,-60.956]\]
Under repeated sampling, we would expect 99% of intervals constructed this way to contain the true \(\beta_{1}\); our interval is \([-280.2, -60.956]\).
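The interval can be verified with confint(); the result should match the hand calculation up to rounding.

confint(slr.fit, "x", level = 0.99)  # 99% CI for the slope, approximately (-280.2, -60.96)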
2.31(a)
| Source | SS | df | MS |
|---|---|---|---|
| Regression | \(SSR = F^* * MSE = 16.83*5550736 = 93418887\) | \(1\) | \(MSR = 93418887\) |
| Error | \(SSE = MSE*82 = 455160352\) | \(82\) | \(MSE = 2356^2 = 5550736\) |
| Total | \(SSTO = 93418887 + 455160352 = 548579239\) | \(83\) | |
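The ANOVA decomposition above can be reproduced directly from the fitted model:

anova(slr.fit)  # gives df, SS, MS and F* for the regression and error rows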
(b) \[H_{0}: \beta_{1} = 0, H_{a}: \beta_{1} \neq 0\] \[ F^* = \frac{MSR}{MSE} = \frac{93418887}{5550736} = 16.83\] \[ F(0.99; 1, 82) = 6.95, P\text{-value} = 9.571e-05 < \alpha\] Since \(F^* > 6.95\), we reject the null hypothesis that \(\beta_{1}=0\). We conclude that there is a linear association between the crime rate and the percentage of high-school graduates at \(\alpha=0.01\).
Since \(F(0.99; 1, 82) = 6.95 = t(0.995; 82)^2 = 2.6371^2\) and \(F^* = (t^*)^2 = (-4.103)^2 = 16.83\), the F test and the t test give the same P-value, so the two tests are equivalent in this case.
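A quick numerical check of the equivalence between the two critical values:

qf(0.99, df1 = 1, df2 = 82)  # F(0.99; 1, 82), about 6.95
qt(0.995, df = 82)^2         # t(0.995; 82)^2, the same value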
(c)
The regression sum of squares is 93418887, so \(R^2 = \frac{93418887}{548579239} = 0.1703\). Therefore, the total variation in the crime rate is reduced by 17.03% when the percentage of high-school graduates is introduced. This is a relatively small reduction.
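The same value is reported by the fitted model:

summary(slr.fit)$r.squared  # 0.1703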
(d) \[r = -\sqrt{R^2} = -\sqrt{0.1703} = -0.4127\] We take the negative root because the slope \(b_{1}\) is negative.
2.48(a) \[r_{12} = -\sqrt{R^2} = -\sqrt{0.1703} = -0.4127\]
(b) \[H_{0}: \rho_{12} = 0, H_{a}: \rho_{12} \neq 0\] \[ t^* = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{-0.4127\sqrt{82}}{\sqrt{1-0.4127^2}} = -4.10\] \[t(0.995; 82) = 2.6371\] Since \(|t^*| > 2.6371\), we reject the null hypothesis and conclude that \(\rho_{12} \neq 0\): the crime rate and the percentage of high-school graduates are significantly correlated.
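As a check, the correlation and the test can be reproduced in R; the statistic and p-value agree with the slope test in 2.30(a).

cor(x, y)                          # sample correlation, -0.4127
cor.test(x, y, conf.level = 0.99)  # t = -4.103 on 82 df, p-value = 9.57e-05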
(c) In 2.30(a) and 2.31(b), we concluded that \(\beta_{1}\) is not equal to 0, i.e., there is a linear relationship between the crime rate and the percentage of high-school graduates. This is consistent with the results in parts (a) and (b), which show that the two variables are significantly correlated, with correlation coefficient equal to -0.4127.
3.8(a)
stem(x)
##
## The decimal point is 1 digit(s) to the right of the |
##
## 6 | 1444
## 6 | 5678
## 7 | 00334444
## 7 | 5555666677777778888888999999
## 8 | 000011111112222222233333344444
## 8 | 55578889
## 9 | 11
The distribution is slightly skewed to the left.
(b)
res = residuals(slr.fit)
res.stan = rstandard(slr.fit)
res.stud = rstudent(slr.fit)
yhat = fitted(slr.fit)
boxplot(res)
Yes, the boxplot shows that the residuals are roughly symmetric about zero.
(c)
par(mfrow=c(1,3))
plot(yhat,res, ylab = 'Residuals', xlab = 'Y-hat', main = 'Residual Plot')
abline(0,0)
plot(yhat,res.stan, ylab = 'Residuals', xlab = 'Y-hat', main = 'standardized Residual Plot')
abline(0,0)
plot(yhat,res.stud, ylab = 'Residuals', xlab = 'Y-hat', main = 'Studentized Residual Plot')
abline(0,0)
There is no obvious pattern in any of the residual plots, which is consistent with \(cov(\widehat{y}, \widehat{e}) = 0\). The plots also support the assumptions of linearity and homoscedasticity.
(d)
qqnorm(res)
qqline(res)
shapiro.test(res)
##
## Shapiro-Wilk normality test
##
## data: res
## W = 0.97763, p-value = 0.1515
\[H_{0}: \text{the errors are normally distributed}, \quad H_{a}: \text{the errors are not normally distributed}\] \(W = 0.97763\), p-value = 0.1515.
Since the p-value is large, we cannot reject the null hypothesis. We conclude that the residuals are consistent with a normal distribution.
(e)
crime69L = crime[crime$x <= 69,]  # counties with graduation rate at or below 69%
x1 = crime69L$x
y1 = crime69L$y
crime69R = crime[crime$x > 69,]   # counties with graduation rate above 69%
x2 = crime69R$x
y2 = crime69R$y
slr.fit1 = lm(y1~x1)
slr.fit2 = lm(y2~x2)
library(lmtest)
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
bptest(slr.fit1)
##
## studentized Breusch-Pagan test
##
## data: slr.fit1
## BP = 1.1417e-05, df = 1, p-value = 0.9973
bptest(slr.fit2)
##
## studentized Breusch-Pagan test
##
## data: slr.fit2
## BP = 0.012865, df = 1, p-value = 0.9097
\[H_{0}: \text{equal variance}, \quad H_{a}: \text{not equal variance}\]
For the counties with \(x \le 69\): \(BP = 1.1417e-05\), df = 1, p-value = 0.9973. For the counties with \(x > 69\): \(BP = 0.012865\), df = 1, p-value = 0.9097.
Therefore, we fail to reject the null hypothesis for either group. We conclude that the error variance is constant within both groups, which is consistent with the findings in (c).