The null hypothesis states that neither TV, radio and newspaper have a relationship with the response variable sales. Given the low p-values of TV and radio there is a relationship between these two variables and sales therefore we can reject the null hypothesis and accept the alternative hypothesis. Newspaper has no influence on sales given that the two other predictors are present.
In case of a classification setting (qualitative response) the KNN classifier identifies the K nearest neighbors of x0 and then assigning it to the class with the higher probability. The KNN in a regression setting (quantitative response) on the other hand estimates f(x0) using the average of the K nearest training responses.
female: \(salary_{i}\) = 85 + 10 x GPA + 0.07 x IQ + 0.01 x(GPA x IQ)
male: \(salary_{i}\) = 50 + 20 x GPA + 0.07 x IQ + 0.01 x(GPA x IQ)
50 + 20 x GPA \(\geq\) 85 + 10 x GPA for GPA \(\geq\) 3.5
\(salary_{i}\) = 85 + 10 x 4 + 0.07 x 110 + 0.01 x(4 x 110) = 137.1
The predicted salary is 137100 $.
False. Cannot conclude that there is little evidence based on the given information. P-value and associated test statistic are needed to draw a conclusion.
In general the more flexible polynomial regression should have a lower RSS than the linear fit on the training data even if the underlying relationship is linear.
In the case of test data the linear regression should perform better(lower RSS) than the polynomial regression due to overfitting of the training data.
The more flexible polynomial regression should have a lower RSS than the linear fit on the training data, especially if the underlying relationship is not linear.
We don’t have enough information to answer this question. If the the true relationship is close to linear the linear regressiopn sould perfom better on the test data. If the underlying relationship is far from linear than the polynomial(cubic) approach should perform better.
\[\hat y_i = x_i\hat\beta\] where
\[\hat\beta = \frac{\sum_{i=1}^{n}\left ( x_{i} y_{i} \right )}{\sum_{i'=1}^{n} x_{i'}^{2}}\] show that we can write
\[\hat{y}_{i} = \sum_{i'=1}^{n}a_{i'}y_{i'}\] What is \[a_{i'}\] ?
\[\hat y_i = x_i \Big( \frac{\sum_{j=1}^{n} x_j y_j }{\sum_{k=1}^{n} x_k^2} \Big) = \Bigg [ \sum_{j} \Big( \frac {x_ix_j}{\sum_{k}x_k^2} \Big )\Bigg ]y_j\]
\[a_j= \frac {x_ix_j}{\sum_{k}x_k^2}\]
We can work with the following two equations:
The simple linear regression equation: \[y = \hat\beta_0+\hat\beta_1x \] and equation 3.4 in the book: \[\hat\beta_0 = \bar{y} - \hat\beta_1\bar{x}\]
So let’s substitute \(\bar{x}\) for x in the first equation. Then substitute the second equation for \(\hat\beta_0\).
\[y = \bar{y} - \hat\beta_0\bar{x}+ \hat\beta_0\bar{x}\] \[y = \bar{y}\]
Auto = read.csv("Auto.csv", na.strings = "?")
Auto = na.omit(Auto)
attach(Auto)
lm.fit = lm(mpg~horsepower)
summary(lm.fit)
##
## Call:
## lm(formula = mpg ~ horsepower)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.5710 -3.2592 -0.3435 2.7630 16.9240
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 39.935861 0.717499 55.66 <2e-16 ***
## horsepower -0.157845 0.006446 -24.49 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.906 on 390 degrees of freedom
## Multiple R-squared: 0.6059, Adjusted R-squared: 0.6049
## F-statistic: 599.7 on 1 and 390 DF, p-value: < 2.2e-16
Is there a relationship between the predictor and the response? With a very high F-statistic and a very low corresponding p-value we can reject the null hypothesis \(H_0: \beta_1 = 0\) in favor of the alternative hypothesis. A relationship between mpg and horsepower exists.
How strong is the relationship between the predictor and the response? With a R² of 0.6 horsepower explaines 60% of the variance in mpg which is a strong relationship.
Is the relationship between the predictor and the response positive or negative? With a coefficiant of -0.16 the relationship is negative. A 1 unit increase in horsepower results in a -0.16 decrease in mpg.
What is the predicted mpg associated with a horsepower of 98? What are the associated 95 % confidence and prediction intervals?
predict(lm.fit, data.frame(horsepower=c(98)), interval="confidence")
## fit lwr upr
## 1 24.46708 23.97308 24.96108
predict(lm.fit, data.frame(horsepower=c(98)), interval="prediction")
## fit lwr upr
## 1 24.46708 14.8094 34.12476
plot(horsepower,mpg)
abline(lm.fit)
par(mfrow=c(2,2))
plot(lm.fit)
pairs(Auto)
cor(Auto[1:8])
## mpg cylinders displacement horsepower weight
## mpg 1.0000000 -0.7776175 -0.8051269 -0.7784268 -0.8322442
## cylinders -0.7776175 1.0000000 0.9508233 0.8429834 0.8975273
## displacement -0.8051269 0.9508233 1.0000000 0.8972570 0.9329944
## horsepower -0.7784268 0.8429834 0.8972570 1.0000000 0.8645377
## weight -0.8322442 0.8975273 0.9329944 0.8645377 1.0000000
## acceleration 0.4233285 -0.5046834 -0.5438005 -0.6891955 -0.4168392
## year 0.5805410 -0.3456474 -0.3698552 -0.4163615 -0.3091199
## origin 0.5652088 -0.5689316 -0.6145351 -0.4551715 -0.5850054
## acceleration year origin
## mpg 0.4233285 0.5805410 0.5652088
## cylinders -0.5046834 -0.3456474 -0.5689316
## displacement -0.5438005 -0.3698552 -0.6145351
## horsepower -0.6891955 -0.4163615 -0.4551715
## weight -0.4168392 -0.3091199 -0.5850054
## acceleration 1.0000000 0.2903161 0.2127458
## year 0.2903161 1.0000000 0.1815277
## origin 0.2127458 0.1815277 1.0000000
lm.fit = lm(mpg~.-name, data=Auto)
summary(lm.fit)
##
## Call:
## lm(formula = mpg ~ . - name, data = Auto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.5903 -2.1565 -0.1169 1.8690 13.0604
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -17.218435 4.644294 -3.707 0.00024 ***
## cylinders -0.493376 0.323282 -1.526 0.12780
## displacement 0.019896 0.007515 2.647 0.00844 **
## horsepower -0.016951 0.013787 -1.230 0.21963
## weight -0.006474 0.000652 -9.929 < 2e-16 ***
## acceleration 0.080576 0.098845 0.815 0.41548
## year 0.750773 0.050973 14.729 < 2e-16 ***
## origin 1.426141 0.278136 5.127 4.67e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.328 on 384 degrees of freedom
## Multiple R-squared: 0.8215, Adjusted R-squared: 0.8182
## F-statistic: 252.4 on 7 and 384 DF, p-value: < 2.2e-16
Is there a relationship between the predictors and the response?
There is clear evidence of a relationship. With an F-statistic of 252.4 and a very low p-value we have enough evidence to reject \(H_0\) in favor of the alternative hypothesis.
Which predictors appear to have a statistically significant relationship to the response?
Weight, year, origin are statistically significant on the 0.001 level and displacement is stat. sig. on the 0.01 level.
What does the coefficient for the year variable suggest? Year and mpg have a positive relationship. A one unit increase in year results in a 0.75 increase in mpg.
par(mfrow = c(2, 2))
plot(lm.fit)
* evidence of non-linearity * observation 14 has somewhat high leverage but within the boundary defined by \((p+1)/n = (7+1)/ 392 = 0.02\) * funnel shape of the residuals vs. fitted values plot indicates heteroscedasticity, response should be log transformed
lm.fit2 = lm(mpg~ year*origin, data=Auto)
summary(lm.fit2)
##
## Call:
## lm(formula = mpg ~ year * origin, data = Auto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.3141 -3.7120 -0.6513 3.3621 15.5859
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -83.3809 12.0000 -6.948 1.57e-11 ***
## year 1.3089 0.1576 8.305 1.68e-15 ***
## origin 17.3752 6.8325 2.543 0.0114 *
## year:origin -0.1663 0.0889 -1.871 0.0621 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.199 on 388 degrees of freedom
## Multiple R-squared: 0.5596, Adjusted R-squared: 0.5562
## F-statistic: 164.4 on 3 and 388 DF, p-value: < 2.2e-16
lm.fit3 = lm(mpg~ weight*year, data=Auto)
summary(lm.fit3)
##
## Call:
## lm(formula = mpg ~ weight * year, data = Auto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.0397 -1.9956 -0.0983 1.6525 12.9896
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.105e+02 1.295e+01 -8.531 3.30e-16 ***
## weight 2.755e-02 4.413e-03 6.242 1.14e-09 ***
## year 2.040e+00 1.718e-01 11.876 < 2e-16 ***
## weight:year -4.579e-04 5.907e-05 -7.752 8.02e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.193 on 388 degrees of freedom
## Multiple R-squared: 0.8339, Adjusted R-squared: 0.8326
## F-statistic: 649.3 on 3 and 388 DF, p-value: < 2.2e-16
lm.fit4 = lm(mpg~ year*displacement, data=Auto)
summary(lm.fit4)
##
## Call:
## lm(formula = mpg ~ year * displacement, data = Auto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.8530 -2.4250 -0.2234 2.0823 16.9933
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -7.288e+01 8.368e+00 -8.709 < 2e-16 ***
## year 1.408e+00 1.102e-01 12.779 < 2e-16 ***
## displacement 2.523e-01 4.059e-02 6.216 1.32e-09 ***
## year:displacement -4.080e-03 5.453e-04 -7.482 4.96e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.729 on 388 degrees of freedom
## Multiple R-squared: 0.7735, Adjusted R-squared: 0.7718
## F-statistic: 441.7 on 3 and 388 DF, p-value: < 2.2e-16
year:displacement and weight:year seem to be statistically significant.
fit.lm4 = lm(mpg~weight+year+origin, data=Auto)
summary(fit.lm4)
##
## Call:
## lm(formula = mpg ~ weight + year + origin, data = Auto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.9440 -2.0948 -0.0389 1.7255 13.2722
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.805e+01 4.001e+00 -4.510 8.60e-06 ***
## weight -5.994e-03 2.541e-04 -23.588 < 2e-16 ***
## year 7.571e-01 4.832e-02 15.668 < 2e-16 ***
## origin 1.150e+00 2.591e-01 4.439 1.18e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.348 on 388 degrees of freedom
## Multiple R-squared: 0.8175, Adjusted R-squared: 0.816
## F-statistic: 579.2 on 3 and 388 DF, p-value: < 2.2e-16
fit.lm5 = lm(mpg~I(log(weight))+year+origin, data=Auto)
summary(fit.lm5)
##
## Call:
## lm(formula = mpg ~ I(log(weight)) + year + origin, data = Auto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.9120 -1.9384 -0.0257 1.5961 13.1033
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 115.76550 7.53529 15.363 < 2e-16 ***
## I(log(weight)) -19.19080 0.72701 -26.397 < 2e-16 ***
## year 0.77969 0.04477 17.417 < 2e-16 ***
## origin 0.75026 0.24722 3.035 0.00257 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.123 on 388 degrees of freedom
## Multiple R-squared: 0.8411, Adjusted R-squared: 0.8398
## F-statistic: 684.5 on 3 and 388 DF, p-value: < 2.2e-16
fit.lm6 = lm(mpg~I(weight^2)+year+origin, data=Auto)
summary(fit.lm6)
##
## Call:
## lm(formula = mpg ~ I(weight^2) + year + origin, data = Auto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.8810 -2.2688 -0.0881 1.9049 13.3968
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.824e+01 4.189e+00 -6.743 5.66e-11 ***
## I(weight^2) -8.503e-07 4.191e-08 -20.288 < 2e-16 ***
## year 7.531e-01 5.282e-02 14.259 < 2e-16 ***
## origin 1.661e+00 2.739e-01 6.064 3.15e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.638 on 388 degrees of freedom
## Multiple R-squared: 0.7844, Adjusted R-squared: 0.7827
## F-statistic: 470.5 on 3 and 388 DF, p-value: < 2.2e-16
data(Carseats)
car1_fit = lm(Sales~Price+Urban+US, data=Carseats)
summary(car1_fit)
##
## Call:
## lm(formula = Sales ~ Price + Urban + US, data = Carseats)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.9206 -1.6220 -0.0564 1.5786 7.0581
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.043469 0.651012 20.036 < 2e-16 ***
## Price -0.054459 0.005242 -10.389 < 2e-16 ***
## UrbanYes -0.021916 0.271650 -0.081 0.936
## USYes 1.200573 0.259042 4.635 4.86e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.472 on 396 degrees of freedom
## Multiple R-squared: 0.2393, Adjusted R-squared: 0.2335
## F-statistic: 41.52 on 3 and 396 DF, p-value: < 2.2e-16
\[Sales= 13.043469 - 0.054459\times Price - 0.021916 \times Urban + 1.200573 \times US \]
car2_fit = lm(Sales~Price+US, data=Carseats)
summary(car2_fit)
##
## Call:
## lm(formula = Sales ~ Price + US, data = Carseats)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.9269 -1.6286 -0.0574 1.5766 7.0515
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.03079 0.63098 20.652 < 2e-16 ***
## Price -0.05448 0.00523 -10.416 < 2e-16 ***
## USYes 1.19964 0.25846 4.641 4.71e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.469 on 397 degrees of freedom
## Multiple R-squared: 0.2393, Adjusted R-squared: 0.2354
## F-statistic: 62.43 on 2 and 397 DF, p-value: < 2.2e-16
confint(car2_fit)
## 2.5 % 97.5 %
## (Intercept) 11.79032020 14.27126531
## Price -0.06475984 -0.04419543
## USYes 0.69151957 1.70776632
par(mfrow = c(2, 2))
plot(car2_fit)
plot(predict(car1_fit), rstudent(car1_fit))
set.seed(1)
x=rnorm(100)
y=2*x+rnorm(100)
lm.fit = lm(y∼x+0)
summary(lm.fit)
##
## Call:
## lm(formula = y ~ x + 0)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.9154 -0.6472 -0.1771 0.5056 2.3109
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## x 1.9939 0.1065 18.73 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9586 on 99 degrees of freedom
## Multiple R-squared: 0.7798, Adjusted R-squared: 0.7776
## F-statistic: 350.7 on 1 and 99 DF, p-value: < 2.2e-16
lm.fit = lm(x∼y+0)
summary(lm.fit)
##
## Call:
## lm(formula = x ~ y + 0)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.8699 -0.2368 0.1030 0.2858 0.8938
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## y 0.39111 0.02089 18.73 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4246 on 99 degrees of freedom
## Multiple R-squared: 0.7798, Adjusted R-squared: 0.7776
## F-statistic: 350.7 on 1 and 99 DF, p-value: < 2.2e-16
\[t = \beta / SE(\beta) \\ \beta = \frac {\sum{x_i y_i}} {\sum{x_i^2}} \\ SE(\beta) = \sqrt{\frac {\sum{(y_i - x_i \beta)^2}} {(n-1) \sum{x_i^2}}}\\\]
\[ t = {\frac {\sum{x_i y_i}} {\sum{x_i^2}}} {\sqrt{\frac {(n-1) \sum{x_i^2}} {\sum{(y_i - x_i \beta)^2}}}} \\ t = {\frac {\sqrt{n-1} \sum{x_i y_i}} {\sqrt{\sum{x_i^2} \sum{(y_i - x_i \beta)^2}}}} \\ t = {\frac {\sqrt{n-1} \sum{x_i y_i}} {\sqrt{\sum{x_i^2} \sum{(y_i^2 - 2 \beta x_i y_i + x_i^2 \beta^2)}}}} \\ t = {\frac {\sqrt{n-1} \sum{x_i y_i}} {\sqrt{\sum{x_i^2} \sum{y_i^2} - \sum{x_i^2} \beta (2 \sum{x_i y_i} - \beta \sum{x_i^2})}}} \\ t = {\frac {\sqrt{n-1} \sum{x_i y_i}} {\sqrt{\sum{x_i^2} \sum{y_i^2} - \sum{x_i y_i} (2 \sum{x_i y_i} - \sum{x_i y_i})}}} \\ t = \frac {\sqrt{n-1} \sum{x_i y_i}} {\sqrt{\sum{x_i^2} \sum{y_i^2} - (\sum{x_i y_i})^2 }} \]
n = length(x)
t = sqrt(n - 1)*(x %*% y)/sqrt(sum(x^2) * sum(y^2) - (x %*% y)^2)
t
## [,1]
## [1,] 18.72593
Which equals the t-value calculated by our model.
\[t = \frac {\sqrt{n-1} \sum{x_i y_i}} {\sqrt{\sum{x_i^2} \sum{y_i^2} - (\sum{x_i y_i})^2 }}\]
lm.fit = lm(y ~x)
summary(lm.fit)
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.8768 -0.6138 -0.1395 0.5394 2.3462
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.03769 0.09699 -0.389 0.698
## x 1.99894 0.10773 18.556 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9628 on 98 degrees of freedom
## Multiple R-squared: 0.7784, Adjusted R-squared: 0.7762
## F-statistic: 344.3 on 1 and 98 DF, p-value: < 2.2e-16
lm.fit = lm(x ~ y)
summary(lm.fit)
##
## Call:
## lm(formula = x ~ y)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.90848 -0.28101 0.06274 0.24570 0.85736
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.03880 0.04266 0.91 0.365
## y 0.38942 0.02099 18.56 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4249 on 98 degrees of freedom
## Multiple R-squared: 0.7784, Adjusted R-squared: 0.7762
## F-statistic: 344.3 on 1 and 98 DF, p-value: < 2.2e-16
set.seed(1)
x = rnorm(100)
y = 2*x + rnorm(100)
fit.lmY = lm(y ~ x)
fit.lmX = lm(x ~ y)
summary(fit.lmY)
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.8768 -0.6138 -0.1395 0.5394 2.3462
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.03769 0.09699 -0.389 0.698
## x 1.99894 0.10773 18.556 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9628 on 98 degrees of freedom
## Multiple R-squared: 0.7784, Adjusted R-squared: 0.7762
## F-statistic: 344.3 on 1 and 98 DF, p-value: < 2.2e-16
summary(fit.lmX)
##
## Call:
## lm(formula = x ~ y)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.90848 -0.28101 0.06274 0.24570 0.85736
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.03880 0.04266 0.91 0.365
## y 0.38942 0.02099 18.56 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4249 on 98 degrees of freedom
## Multiple R-squared: 0.7784, Adjusted R-squared: 0.7762
## F-statistic: 344.3 on 1 and 98 DF, p-value: < 2.2e-16
set.seed(1)
x = rnorm(100)
y = -sample(x, 100)
sum(x^2)
## [1] 81.05509
sum(y^2)
## [1] 81.05509
lm.fit = lm(y~x+0)
lm.fit2 = lm(x~y+0)
summary(lm.fit)
##
## Call:
## lm(formula = y ~ x + 0)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.2833 -0.6945 -0.1140 0.4995 2.1665
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## x 0.07768 0.10020 0.775 0.44
##
## Residual standard error: 0.9021 on 99 degrees of freedom
## Multiple R-squared: 0.006034, Adjusted R-squared: -0.004006
## F-statistic: 0.601 on 1 and 99 DF, p-value: 0.4401
summary(lm.fit2)
##
## Call:
## lm(formula = x ~ y + 0)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.2182 -0.4969 0.1595 0.6782 2.4017
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## y 0.07768 0.10020 0.775 0.44
##
## Residual standard error: 0.9021 on 99 degrees of freedom
## Multiple R-squared: 0.006034, Adjusted R-squared: -0.004006
## F-statistic: 0.601 on 1 and 99 DF, p-value: 0.4401
set.seed(1)
x = rnorm(100)
eps = rnorm(100, sd = sqrt(0.25))
y = -1 + (0.5*x) + eps
length(y)
## [1] 100
plot(x, y)
lm.fit1 = lm(y ~ x)
summary(lm.fit1)
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.93842 -0.30688 -0.06975 0.26970 1.17309
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.01885 0.04849 -21.010 < 2e-16 ***
## x 0.49947 0.05386 9.273 4.58e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4814 on 98 degrees of freedom
## Multiple R-squared: 0.4674, Adjusted R-squared: 0.4619
## F-statistic: 85.99 on 1 and 98 DF, p-value: 4.583e-15
plot(x, y)
abline(lm.fit1, col = "blue")
abline(-1, 0.5, col = "red")
legend("topleft", c("model", "population"), col = c("blue", "red"), lty = c(1, 1))
lm.fit_sqrt = lm(y~x+I(x^2))
summary(lm.fit_sqrt)
##
## Call:
## lm(formula = y ~ x + I(x^2))
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.98252 -0.31270 -0.06441 0.29014 1.13500
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.97164 0.05883 -16.517 < 2e-16 ***
## x 0.50858 0.05399 9.420 2.4e-15 ***
## I(x^2) -0.05946 0.04238 -1.403 0.164
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.479 on 97 degrees of freedom
## Multiple R-squared: 0.4779, Adjusted R-squared: 0.4672
## F-statistic: 44.4 on 2 and 97 DF, p-value: 2.038e-14
eps2 = rnorm(100, sd=0.1)
y2 = -1 + 0.5*x + eps2
lm.fit2 = lm(y2 ~ x)
summary(lm.fit2)
##
## Call:
## lm(formula = y2 ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.291411 -0.048230 -0.004533 0.064924 0.264157
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.99726 0.01047 -95.25 <2e-16 ***
## x 0.50212 0.01163 43.17 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1039 on 98 degrees of freedom
## Multiple R-squared: 0.9501, Adjusted R-squared: 0.9495
## F-statistic: 1864 on 1 and 98 DF, p-value: < 2.2e-16
plot(x, y2)
abline(lm.fit2, col = "blue")
abline(-1, 0.5, col = "red")
legend("topleft", c("model", "population"), col = c("blue", "red"), lty = c(1, 1))
eps2 = rnorm(100, sd=1)
y2 = -1 + 0.5*x + eps2
lm.fit3 = lm(y2 ~ x)
summary(lm.fit3)
##
## Call:
## lm(formula = y2 ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.51626 -0.54525 -0.03776 0.67289 1.87887
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.9423 0.1003 -9.397 2.47e-15 ***
## x 0.4443 0.1114 3.989 0.000128 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9955 on 98 degrees of freedom
## Multiple R-squared: 0.1397, Adjusted R-squared: 0.1309
## F-statistic: 15.91 on 1 and 98 DF, p-value: 0.000128
plot(x, y2)
abline(lm.fit3, col = "blue")
abline(-1, 0.5, col = "red")
legend("topleft", c("model", "population"), col = c("blue", "red"), lty = c(1, 1))
confint(lm.fit1)
## 2.5 % 97.5 %
## (Intercept) -1.1150804 -0.9226122
## x 0.3925794 0.6063602
confint(lm.fit2)
## 2.5 % 97.5 %
## (Intercept) -1.0180413 -0.9764850
## x 0.4790377 0.5251957
confint(lm.fit3)
## 2.5 % 97.5 %
## (Intercept) -1.1413399 -0.7433293
## x 0.2232721 0.6653558
set.seed(1)
x1 = runif(100)
x2 = 0.5 * x1 + rnorm(100)/10
y = 2 + 2 * x1 + 0.3 * x2 + rnorm(100)
The last line corresponds to creating a linear model in which y is a function of x1 and x2. Write out the form of the linear model. What are the regression coefficients?
cor(x1, x2)
## [1] 0.8351212
plot(x1, x2)
lm.fit1 = lm(y~ x1 + x2)
summary(lm.fit1)
##
## Call:
## lm(formula = y ~ x1 + x2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.8311 -0.7273 -0.0537 0.6338 2.3359
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.1305 0.2319 9.188 7.61e-15 ***
## x1 1.4396 0.7212 1.996 0.0487 *
## x2 1.0097 1.1337 0.891 0.3754
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.056 on 97 degrees of freedom
## Multiple R-squared: 0.2088, Adjusted R-squared: 0.1925
## F-statistic: 12.8 on 2 and 97 DF, p-value: 1.164e-05
lm.fit2 = lm(y~ x1)
summary(lm.fit2)
##
## Call:
## lm(formula = y ~ x1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.89495 -0.66874 -0.07785 0.59221 2.45560
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.1124 0.2307 9.155 8.27e-15 ***
## x1 1.9759 0.3963 4.986 2.66e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.055 on 98 degrees of freedom
## Multiple R-squared: 0.2024, Adjusted R-squared: 0.1942
## F-statistic: 24.86 on 1 and 98 DF, p-value: 2.661e-06
lm.fit3 = lm(y~ x2)
summary(lm.fit3)
##
## Call:
## lm(formula = y ~ x2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.62687 -0.75156 -0.03598 0.72383 2.44890
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.3899 0.1949 12.26 < 2e-16 ***
## x2 2.8996 0.6330 4.58 1.37e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.072 on 98 degrees of freedom
## Multiple R-squared: 0.1763, Adjusted R-squared: 0.1679
## F-statistic: 20.98 on 1 and 98 DF, p-value: 1.366e-05
x1 = c(x1, 0.1)
x2 = c(x2, 0.8)
y = c(y, 6)
par(mfrow=c(2,2))
# regression with both x1 and x2
lm.fit1 = lm(y~x1+x2)
summary(lm.fit1)
##
## Call:
## lm(formula = y ~ x1 + x2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.73348 -0.69318 -0.05263 0.66385 2.30619
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.2267 0.2314 9.624 7.91e-16 ***
## x1 0.5394 0.5922 0.911 0.36458
## x2 2.5146 0.8977 2.801 0.00614 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.075 on 98 degrees of freedom
## Multiple R-squared: 0.2188, Adjusted R-squared: 0.2029
## F-statistic: 13.72 on 2 and 98 DF, p-value: 5.564e-06
plot(lm.fit1)
# regression with x1 only
lm.fit2 = lm(y~x2)
summary(lm.fit2)
##
## Call:
## lm(formula = y ~ x2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.64729 -0.71021 -0.06899 0.72699 2.38074
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.3451 0.1912 12.264 < 2e-16 ***
## x2 3.1190 0.6040 5.164 1.25e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.074 on 99 degrees of freedom
## Multiple R-squared: 0.2122, Adjusted R-squared: 0.2042
## F-statistic: 26.66 on 1 and 99 DF, p-value: 1.253e-06
plot(lm.fit2)
# regression with x2 only
lm.fit3 = lm(y~x1)
summary(lm.fit3)
##
## Call:
## lm(formula = y ~ x1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.8897 -0.6556 -0.0909 0.5682 3.5665
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.2569 0.2390 9.445 1.78e-15 ***
## x1 1.7657 0.4124 4.282 4.29e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.111 on 99 degrees of freedom
## Multiple R-squared: 0.1562, Adjusted R-squared: 0.1477
## F-statistic: 18.33 on 1 and 99 DF, p-value: 4.295e-05
plot(lm.fit3)
summary(Boston)
## crim zn indus chas
## Min. : 0.00632 Min. : 0.00 Min. : 0.46 Min. :0.00000
## 1st Qu.: 0.08204 1st Qu.: 0.00 1st Qu.: 5.19 1st Qu.:0.00000
## Median : 0.25651 Median : 0.00 Median : 9.69 Median :0.00000
## Mean : 3.61352 Mean : 11.36 Mean :11.14 Mean :0.06917
## 3rd Qu.: 3.67708 3rd Qu.: 12.50 3rd Qu.:18.10 3rd Qu.:0.00000
## Max. :88.97620 Max. :100.00 Max. :27.74 Max. :1.00000
## nox rm age dis
## Min. :0.3850 Min. :3.561 Min. : 2.90 Min. : 1.130
## 1st Qu.:0.4490 1st Qu.:5.886 1st Qu.: 45.02 1st Qu.: 2.100
## Median :0.5380 Median :6.208 Median : 77.50 Median : 3.207
## Mean :0.5547 Mean :6.285 Mean : 68.57 Mean : 3.795
## 3rd Qu.:0.6240 3rd Qu.:6.623 3rd Qu.: 94.08 3rd Qu.: 5.188
## Max. :0.8710 Max. :8.780 Max. :100.00 Max. :12.127
## rad tax ptratio black
## Min. : 1.000 Min. :187.0 Min. :12.60 Min. : 0.32
## 1st Qu.: 4.000 1st Qu.:279.0 1st Qu.:17.40 1st Qu.:375.38
## Median : 5.000 Median :330.0 Median :19.05 Median :391.44
## Mean : 9.549 Mean :408.2 Mean :18.46 Mean :356.67
## 3rd Qu.:24.000 3rd Qu.:666.0 3rd Qu.:20.20 3rd Qu.:396.23
## Max. :24.000 Max. :711.0 Max. :22.00 Max. :396.90
## lstat medv
## Min. : 1.73 Min. : 5.00
## 1st Qu.: 6.95 1st Qu.:17.02
## Median :11.36 Median :21.20
## Mean :12.65 Mean :22.53
## 3rd Qu.:16.95 3rd Qu.:25.00
## Max. :37.97 Max. :50.00
Boston$chas = factor(Boston$chas, labels = c("N","Y"))
summary(Boston)
## crim zn indus chas nox
## Min. : 0.00632 Min. : 0.00 Min. : 0.46 N:471 Min. :0.3850
## 1st Qu.: 0.08204 1st Qu.: 0.00 1st Qu.: 5.19 Y: 35 1st Qu.:0.4490
## Median : 0.25651 Median : 0.00 Median : 9.69 Median :0.5380
## Mean : 3.61352 Mean : 11.36 Mean :11.14 Mean :0.5547
## 3rd Qu.: 3.67708 3rd Qu.: 12.50 3rd Qu.:18.10 3rd Qu.:0.6240
## Max. :88.97620 Max. :100.00 Max. :27.74 Max. :0.8710
## rm age dis rad
## Min. :3.561 Min. : 2.90 Min. : 1.130 Min. : 1.000
## 1st Qu.:5.886 1st Qu.: 45.02 1st Qu.: 2.100 1st Qu.: 4.000
## Median :6.208 Median : 77.50 Median : 3.207 Median : 5.000
## Mean :6.285 Mean : 68.57 Mean : 3.795 Mean : 9.549
## 3rd Qu.:6.623 3rd Qu.: 94.08 3rd Qu.: 5.188 3rd Qu.:24.000
## Max. :8.780 Max. :100.00 Max. :12.127 Max. :24.000
## tax ptratio black lstat
## Min. :187.0 Min. :12.60 Min. : 0.32 Min. : 1.73
## 1st Qu.:279.0 1st Qu.:17.40 1st Qu.:375.38 1st Qu.: 6.95
## Median :330.0 Median :19.05 Median :391.44 Median :11.36
## Mean :408.2 Mean :18.46 Mean :356.67 Mean :12.65
## 3rd Qu.:666.0 3rd Qu.:20.20 3rd Qu.:396.23 3rd Qu.:16.95
## Max. :711.0 Max. :22.00 Max. :396.90 Max. :37.97
## medv
## Min. : 5.00
## 1st Qu.:17.02
## Median :21.20
## Mean :22.53
## 3rd Qu.:25.00
## Max. :50.00
attach(Boston)
lm.zn = lm(crim~zn)
summary(lm.zn) # yes
##
## Call:
## lm(formula = crim ~ zn)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.429 -4.222 -2.620 1.250 84.523
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.45369 0.41722 10.675 < 2e-16 ***
## zn -0.07393 0.01609 -4.594 5.51e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.435 on 504 degrees of freedom
## Multiple R-squared: 0.04019, Adjusted R-squared: 0.03828
## F-statistic: 21.1 on 1 and 504 DF, p-value: 5.506e-06
lm.indus = lm(crim~indus)
summary(lm.indus) # yes
##
## Call:
## lm(formula = crim ~ indus)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.972 -2.698 -0.736 0.712 81.813
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.06374 0.66723 -3.093 0.00209 **
## indus 0.50978 0.05102 9.991 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.866 on 504 degrees of freedom
## Multiple R-squared: 0.1653, Adjusted R-squared: 0.1637
## F-statistic: 99.82 on 1 and 504 DF, p-value: < 2.2e-16
lm.chas = lm(crim~chas)
summary(lm.chas) # no
##
## Call:
## lm(formula = crim ~ chas)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.738 -3.661 -3.435 0.018 85.232
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.7444 0.3961 9.453 <2e-16 ***
## chasY -1.8928 1.5061 -1.257 0.209
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.597 on 504 degrees of freedom
## Multiple R-squared: 0.003124, Adjusted R-squared: 0.001146
## F-statistic: 1.579 on 1 and 504 DF, p-value: 0.2094
lm.nox = lm(crim~nox)
summary(lm.nox) # yes
##
## Call:
## lm(formula = crim ~ nox)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.371 -2.738 -0.974 0.559 81.728
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -13.720 1.699 -8.073 5.08e-15 ***
## nox 31.249 2.999 10.419 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.81 on 504 degrees of freedom
## Multiple R-squared: 0.1772, Adjusted R-squared: 0.1756
## F-statistic: 108.6 on 1 and 504 DF, p-value: < 2.2e-16
lm.rm = lm(crim~rm)
summary(lm.rm) # yes
##
## Call:
## lm(formula = crim ~ rm)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.604 -3.952 -2.654 0.989 87.197
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 20.482 3.365 6.088 2.27e-09 ***
## rm -2.684 0.532 -5.045 6.35e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.401 on 504 degrees of freedom
## Multiple R-squared: 0.04807, Adjusted R-squared: 0.04618
## F-statistic: 25.45 on 1 and 504 DF, p-value: 6.347e-07
lm.age = lm(crim~age)
summary(lm.age) # yes
##
## Call:
## lm(formula = crim ~ age)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.789 -4.257 -1.230 1.527 82.849
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.77791 0.94398 -4.002 7.22e-05 ***
## age 0.10779 0.01274 8.463 2.85e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.057 on 504 degrees of freedom
## Multiple R-squared: 0.1244, Adjusted R-squared: 0.1227
## F-statistic: 71.62 on 1 and 504 DF, p-value: 2.855e-16
lm.dis = lm(crim~dis)
summary(lm.dis) # yes
##
## Call:
## lm(formula = crim ~ dis)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.708 -4.134 -1.527 1.516 81.674
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.4993 0.7304 13.006 <2e-16 ***
## dis -1.5509 0.1683 -9.213 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.965 on 504 degrees of freedom
## Multiple R-squared: 0.1441, Adjusted R-squared: 0.1425
## F-statistic: 84.89 on 1 and 504 DF, p-value: < 2.2e-16
lm.rad = lm(crim~rad)
summary(lm.rad) # yes
##
## Call:
## lm(formula = crim ~ rad)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.164 -1.381 -0.141 0.660 76.433
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.28716 0.44348 -5.157 3.61e-07 ***
## rad 0.61791 0.03433 17.998 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.718 on 504 degrees of freedom
## Multiple R-squared: 0.3913, Adjusted R-squared: 0.39
## F-statistic: 323.9 on 1 and 504 DF, p-value: < 2.2e-16
lm.tax = lm(crim~tax)
summary(lm.tax) # yes
##
## Call:
## lm(formula = crim ~ tax)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.513 -2.738 -0.194 1.065 77.696
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -8.528369 0.815809 -10.45 <2e-16 ***
## tax 0.029742 0.001847 16.10 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.997 on 504 degrees of freedom
## Multiple R-squared: 0.3396, Adjusted R-squared: 0.3383
## F-statistic: 259.2 on 1 and 504 DF, p-value: < 2.2e-16
lm.ptratio = lm(crim~ptratio)
summary(lm.ptratio) # yes
##
## Call:
## lm(formula = crim ~ ptratio)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.654 -3.985 -1.912 1.825 83.353
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -17.6469 3.1473 -5.607 3.40e-08 ***
## ptratio 1.1520 0.1694 6.801 2.94e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.24 on 504 degrees of freedom
## Multiple R-squared: 0.08407, Adjusted R-squared: 0.08225
## F-statistic: 46.26 on 1 and 504 DF, p-value: 2.943e-11
lm.black = lm(crim~black)
summary(lm.black) # yes
##
## Call:
## lm(formula = crim ~ black)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.756 -2.299 -2.095 -1.296 86.822
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 16.553529 1.425903 11.609 <2e-16 ***
## black -0.036280 0.003873 -9.367 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.946 on 504 degrees of freedom
## Multiple R-squared: 0.1483, Adjusted R-squared: 0.1466
## F-statistic: 87.74 on 1 and 504 DF, p-value: < 2.2e-16
lm.lstat = lm(crim~lstat)
summary(lm.lstat) # yes
##
## Call:
## lm(formula = crim ~ lstat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.925 -2.822 -0.664 1.079 82.862
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.33054 0.69376 -4.801 2.09e-06 ***
## lstat 0.54880 0.04776 11.491 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.664 on 504 degrees of freedom
## Multiple R-squared: 0.2076, Adjusted R-squared: 0.206
## F-statistic: 132 on 1 and 504 DF, p-value: < 2.2e-16
lm.medv = lm(crim~medv)
summary(lm.medv) # yes
##
## Call:
## lm(formula = crim ~ medv)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.071 -4.022 -2.343 1.298 80.957
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 11.79654 0.93419 12.63 <2e-16 ***
## medv -0.36316 0.03839 -9.46 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.934 on 504 degrees of freedom
## Multiple R-squared: 0.1508, Adjusted R-squared: 0.1491
## F-statistic: 89.49 on 1 and 504 DF, p-value: < 2.2e-16
lm.all = lm(crim ~ ., data = Boston)
summary(lm.all)
##
## Call:
## lm(formula = crim ~ ., data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.924 -2.120 -0.353 1.019 75.051
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.033228 7.234903 2.354 0.018949 *
## zn 0.044855 0.018734 2.394 0.017025 *
## indus -0.063855 0.083407 -0.766 0.444294
## chasY -0.749134 1.180147 -0.635 0.525867
## nox -10.313535 5.275536 -1.955 0.051152 .
## rm 0.430131 0.612830 0.702 0.483089
## age 0.001452 0.017925 0.081 0.935488
## dis -0.987176 0.281817 -3.503 0.000502 ***
## rad 0.588209 0.088049 6.680 6.46e-11 ***
## tax -0.003780 0.005156 -0.733 0.463793
## ptratio -0.271081 0.186450 -1.454 0.146611
## black -0.007538 0.003673 -2.052 0.040702 *
## lstat 0.126211 0.075725 1.667 0.096208 .
## medv -0.198887 0.060516 -3.287 0.001087 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.439 on 492 degrees of freedom
## Multiple R-squared: 0.454, Adjusted R-squared: 0.4396
## F-statistic: 31.47 on 13 and 492 DF, p-value: < 2.2e-16
x = c(coefficients(lm.zn)[2],
coefficients(lm.indus)[2],
coefficients(lm.chas)[2],
coefficients(lm.nox)[2],
coefficients(lm.rm)[2],
coefficients(lm.age)[2],
coefficients(lm.dis)[2],
coefficients(lm.rad)[2],
coefficients(lm.tax)[2],
coefficients(lm.ptratio)[2],
coefficients(lm.black)[2],
coefficients(lm.lstat)[2],
coefficients(lm.medv)[2])
y = coefficients(lm.all)[2:14]
plot(x, y)
lm.zn = lm(crim~poly(zn,3))
summary(lm.zn) # 1, 2
##
## Call:
## lm(formula = crim ~ poly(zn, 3))
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.821 -4.614 -1.294 0.473 84.130
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.6135 0.3722 9.709 < 2e-16 ***
## poly(zn, 3)1 -38.7498 8.3722 -4.628 4.7e-06 ***
## poly(zn, 3)2 23.9398 8.3722 2.859 0.00442 **
## poly(zn, 3)3 -10.0719 8.3722 -1.203 0.22954
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.372 on 502 degrees of freedom
## Multiple R-squared: 0.05824, Adjusted R-squared: 0.05261
## F-statistic: 10.35 on 3 and 502 DF, p-value: 1.281e-06
lm.indus = lm(crim~poly(indus,3))
summary(lm.indus)
##
## Call:
## lm(formula = crim ~ poly(indus, 3))
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.278 -2.514 0.054 0.764 79.713
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.614 0.330 10.950 < 2e-16 ***
## poly(indus, 3)1 78.591 7.423 10.587 < 2e-16 ***
## poly(indus, 3)2 -24.395 7.423 -3.286 0.00109 **
## poly(indus, 3)3 -54.130 7.423 -7.292 1.2e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.423 on 502 degrees of freedom
## Multiple R-squared: 0.2597, Adjusted R-squared: 0.2552
## F-statistic: 58.69 on 3 and 502 DF, p-value: < 2.2e-16
lm.nox = lm(crim~poly(nox,3))
summary(lm.nox)
##
## Call:
## lm(formula = crim ~ poly(nox, 3))
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.110 -2.068 -0.255 0.739 78.302
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.6135 0.3216 11.237 < 2e-16 ***
## poly(nox, 3)1 81.3720 7.2336 11.249 < 2e-16 ***
## poly(nox, 3)2 -28.8286 7.2336 -3.985 7.74e-05 ***
## poly(nox, 3)3 -60.3619 7.2336 -8.345 6.96e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.234 on 502 degrees of freedom
## Multiple R-squared: 0.297, Adjusted R-squared: 0.2928
## F-statistic: 70.69 on 3 and 502 DF, p-value: < 2.2e-16
lm.rm = lm(crim~poly(rm,3))
summary(lm.rm)
##
## Call:
## lm(formula = crim ~ poly(rm, 3))
##
## Residuals:
## Min 1Q Median 3Q Max
## -18.485 -3.468 -2.221 -0.015 87.219
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.6135 0.3703 9.758 < 2e-16 ***
## poly(rm, 3)1 -42.3794 8.3297 -5.088 5.13e-07 ***
## poly(rm, 3)2 26.5768 8.3297 3.191 0.00151 **
## poly(rm, 3)3 -5.5103 8.3297 -0.662 0.50858
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.33 on 502 degrees of freedom
## Multiple R-squared: 0.06779, Adjusted R-squared: 0.06222
## F-statistic: 12.17 on 3 and 502 DF, p-value: 1.067e-07
lm.age = lm(crim~poly(age,3))
summary(lm.age)
##
## Call:
## lm(formula = crim ~ poly(age, 3))
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.762 -2.673 -0.516 0.019 82.842
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.6135 0.3485 10.368 < 2e-16 ***
## poly(age, 3)1 68.1820 7.8397 8.697 < 2e-16 ***
## poly(age, 3)2 37.4845 7.8397 4.781 2.29e-06 ***
## poly(age, 3)3 21.3532 7.8397 2.724 0.00668 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.84 on 502 degrees of freedom
## Multiple R-squared: 0.1742, Adjusted R-squared: 0.1693
## F-statistic: 35.31 on 3 and 502 DF, p-value: < 2.2e-16
lm.dis = lm(crim~poly(dis,3))
summary(lm.dis)
##
## Call:
## lm(formula = crim ~ poly(dis, 3))
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.757 -2.588 0.031 1.267 76.378
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.6135 0.3259 11.087 < 2e-16 ***
## poly(dis, 3)1 -73.3886 7.3315 -10.010 < 2e-16 ***
## poly(dis, 3)2 56.3730 7.3315 7.689 7.87e-14 ***
## poly(dis, 3)3 -42.6219 7.3315 -5.814 1.09e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.331 on 502 degrees of freedom
## Multiple R-squared: 0.2778, Adjusted R-squared: 0.2735
## F-statistic: 64.37 on 3 and 502 DF, p-value: < 2.2e-16
lm.rad = lm(crim~poly(rad,3))
summary(lm.rad)
##
## Call:
## lm(formula = crim ~ poly(rad, 3))
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.381 -0.412 -0.269 0.179 76.217
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.6135 0.2971 12.164 < 2e-16 ***
## poly(rad, 3)1 120.9074 6.6824 18.093 < 2e-16 ***
## poly(rad, 3)2 17.4923 6.6824 2.618 0.00912 **
## poly(rad, 3)3 4.6985 6.6824 0.703 0.48231
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.682 on 502 degrees of freedom
## Multiple R-squared: 0.4, Adjusted R-squared: 0.3965
## F-statistic: 111.6 on 3 and 502 DF, p-value: < 2.2e-16
lm.tax = lm(crim~poly(tax,3))
summary(lm.tax)
##
## Call:
## lm(formula = crim ~ poly(tax, 3))
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.273 -1.389 0.046 0.536 76.950
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.6135 0.3047 11.860 < 2e-16 ***
## poly(tax, 3)1 112.6458 6.8537 16.436 < 2e-16 ***
## poly(tax, 3)2 32.0873 6.8537 4.682 3.67e-06 ***
## poly(tax, 3)3 -7.9968 6.8537 -1.167 0.244
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.854 on 502 degrees of freedom
## Multiple R-squared: 0.3689, Adjusted R-squared: 0.3651
## F-statistic: 97.8 on 3 and 502 DF, p-value: < 2.2e-16
lm.ptratio = lm(crim~poly(ptratio,3))
summary(lm.ptratio)
##
## Call:
## lm(formula = crim ~ poly(ptratio, 3))
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.833 -4.146 -1.655 1.408 82.697
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.614 0.361 10.008 < 2e-16 ***
## poly(ptratio, 3)1 56.045 8.122 6.901 1.57e-11 ***
## poly(ptratio, 3)2 24.775 8.122 3.050 0.00241 **
## poly(ptratio, 3)3 -22.280 8.122 -2.743 0.00630 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.122 on 502 degrees of freedom
## Multiple R-squared: 0.1138, Adjusted R-squared: 0.1085
## F-statistic: 21.48 on 3 and 502 DF, p-value: 4.171e-13
lm.black = lm(crim~poly(black,3))
summary(lm.black)
##
## Call:
## lm(formula = crim ~ poly(black, 3))
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.096 -2.343 -2.128 -1.439 86.790
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.6135 0.3536 10.218 <2e-16 ***
## poly(black, 3)1 -74.4312 7.9546 -9.357 <2e-16 ***
## poly(black, 3)2 5.9264 7.9546 0.745 0.457
## poly(black, 3)3 -4.8346 7.9546 -0.608 0.544
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.955 on 502 degrees of freedom
## Multiple R-squared: 0.1498, Adjusted R-squared: 0.1448
## F-statistic: 29.49 on 3 and 502 DF, p-value: < 2.2e-16
lm.lstat = lm(crim~poly(lstat,3))
summary(lm.lstat)
##
## Call:
## lm(formula = crim ~ poly(lstat, 3))
##
## Residuals:
## Min 1Q Median 3Q Max
## -15.234 -2.151 -0.486 0.066 83.353
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.6135 0.3392 10.654 <2e-16 ***
## poly(lstat, 3)1 88.0697 7.6294 11.543 <2e-16 ***
## poly(lstat, 3)2 15.8882 7.6294 2.082 0.0378 *
## poly(lstat, 3)3 -11.5740 7.6294 -1.517 0.1299
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.629 on 502 degrees of freedom
## Multiple R-squared: 0.2179, Adjusted R-squared: 0.2133
## F-statistic: 46.63 on 3 and 502 DF, p-value: < 2.2e-16
lm.medv = lm(crim~poly(medv,3))
summary(lm.medv)
##
## Call:
## lm(formula = crim ~ poly(medv, 3))
##
## Residuals:
## Min 1Q Median 3Q Max
## -24.427 -1.976 -0.437 0.439 73.655
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.614 0.292 12.374 < 2e-16 ***
## poly(medv, 3)1 -75.058 6.569 -11.426 < 2e-16 ***
## poly(medv, 3)2 88.086 6.569 13.409 < 2e-16 ***
## poly(medv, 3)3 -48.033 6.569 -7.312 1.05e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.569 on 502 degrees of freedom
## Multiple R-squared: 0.4202, Adjusted R-squared: 0.4167
## F-statistic: 121.3 on 3 and 502 DF, p-value: < 2.2e-16