3. Linear Regression

Simple Linear Regression

\[Y \approx \beta_0 + \beta_1X\] Simple as it gets. Use the training data \(X \in \Bbb R^n\) to produce estimates \(\hat{\beta}_0\) and \(\hat{\beta}_1\) of the coefficients, then create predictions \(\hat{y}\).

Estimating Coefficients

We need a way to measure closeness of fit. Least squares is the most common method.

library(broom)
library(ggplot2)
# simulate a small data set, fit a simple linear model, and plot the residuals
y <- rnorm(10)
x <- 1:10
mod <- lm(y ~ x)
df <- augment(mod)
ggplot(df, aes(x = x, y = y)) + 
  geom_point() +
  geom_point(aes(y = .fitted), shape = 1) +
  geom_segment(aes(xend = x, yend = .fitted), color = "red")

Let \(\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1x_i\) be the prediction for \(Y\) based on the \(i\)th value of \(X\). Then the residual (error) is \(e_i = y_i - \hat{y}_i\). The residual sum of squares is defined as \[RSS = \sum_{i=1}^n e_i^2 = \sum_{i=1}^n (y_i - \hat{\beta}_0 - \hat{\beta}_1x_i)^2\] Using some calculus (set the derivatives to zero) we obtain the minimizing coefficients \[\begin{eqnarray} \hat{\beta}_1 &=& \frac{\sum_{i=1}^n (x_i - \bar{x}) (y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2} \\ \hat{\beta}_0 &=& \bar{y} - \hat{\beta}_1\bar{x} \end{eqnarray}\] where \(\bar{y} \equiv \frac{1}{n} \sum_{i=1}^n y_i\) and \(\bar{x} \equiv \frac{1}{n} \sum_{i=1}^n x_i\) are the sample means of \(Y\) and \(X\) respectively.
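
To make the formulas concrete, here is a minimal sketch (reusing the simulated x, y, and mod from the chunk above) that computes the coefficients by hand and checks them against lm().

beta1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
beta0_hat <- mean(y) - beta1_hat * mean(x)
c(beta0_hat, beta1_hat)  # by-hand estimates
coef(mod)                # lm() estimates; should match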

Assessing Coefficient Estimate Accuracy

In the same way we estimate the population mean with the sample mean \(\mu \approx \hat\mu = \bar{y} = \frac{1}{n} \sum_{i=1}^n y_i\), the least squares coefficients estimate the population regression line. This analogy rests on the concept of bias. If the sample mean \(\hat\mu\) is used to estimate the population mean \(\mu\), the estimate is unbiased: on average \(\hat\mu \approx \mu\). Averaged over a huge number of data sets there is no systematic over- or under-estimate of the true parameter. So how accurate is a single \(\hat\mu\) as an estimate of \(\mu\)? Compute the standard error, defined by \[\text{Var}(\hat\mu) = \text{SE}(\hat\mu)^2 = \frac{\sigma^2}{n}\] where \(\sigma\) is the standard deviation of each of the realizations \(y_i\) of \(Y\). The formulas for the standard errors of the least squares coefficients are \[ \begin{eqnarray} \text{SE}(\hat\beta_0)^2 &=& \sigma^2 \left \lbrack \frac{1}{n} + \frac{\bar x^2}{\sum_{i=1}^n (x_i - \bar x)^2} \right \rbrack\\ \text{SE}(\hat\beta_1)^2 &=& \frac{\sigma^2}{\sum_{i=1}^n (x_i - \bar x)^2} \\ \sigma^2 &=& \text{Var}(\epsilon) \\ \sigma \approx \text{RSE} &=& \sqrt{\text{RSS} / (n - 2)} \end{eqnarray} \] This assumes the errors are uncorrelated and have constant variance. The residual standard error (RSE) is computed from the data set because \(\sigma\) is generally unknown. From the standard error a confidence interval may be computed: a 95% confidence interval means there is approximately a 95% chance the interval \(\hat\beta_1 \pm 2 \cdot \text{SE}(\hat\beta_1)\) contains the true value of \(\beta_1\). Standard errors can also be used for hypothesis testing. The null hypothesis \(H_0\) versus the alternative hypothesis \(H_a\): \[ \begin{eqnarray} H_0 : \text{There is no relationship between } X \text{ and } Y \Rightarrow \beta_1 &=& 0\\ H_a : \text{There is some relationship between } X \text{ and } Y \Rightarrow \beta_1 &\neq& 0 \end{eqnarray} \] So we need to test whether \(\hat\beta_1\), our estimate for \(\beta_1\), is sufficiently far from zero to be declared non-zero. How far is far enough? Use the t-statistic! It measures the number of standard errors \(\hat\beta_1\) is away from \(0\): \[t = \frac{\hat\beta_1 - 0}{\text{SE} (\hat\beta_1)}\] If there is no relationship between \(X\) and \(Y\), then \(t\) follows a t-distribution with \(n-2\) degrees of freedom. We check the probability of observing \(|t|\) or larger if \(\beta_1=0\) using a p-value. A small p-value indicates it is unlikely to observe such a substantial association purely by chance, so we can reject the null hypothesis and declare there is some relationship between \(X\) and \(Y\). Typical cutoff p-values are 5% or 1%.
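
As a sanity check, here is a minimal sketch (again reusing x, y, and mod from the first chunk) that recovers \(\text{SE}(\hat\beta_1)\), the t-statistic, and the p-value by hand and compares them with summary().

n     <- length(y)
rse   <- sqrt(sum(resid(mod)^2) / (n - 2))      # estimate of sigma
se_b1 <- rse / sqrt(sum((x - mean(x))^2))       # SE(beta1_hat)
t_b1  <- unname(coef(mod)["x"]) / se_b1         # t-statistic
p_b1  <- 2 * pt(abs(t_b1), df = n - 2, lower.tail = FALSE)
c(se = se_b1, t = t_b1, p = p_b1)
coef(summary(mod))["x", ]                       # should agree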

Assessing Model Accuracy

Residual Standard Error (RSE)

Even if we knew the true regression line, we could not perfectly predict \(Y\) from \(X\) because of the error term \(\epsilon\). The RSE is an estimate of the standard deviation of \(\epsilon\): roughly speaking, the average amount the response deviates from the true regression line. \[ \text{RSE} = \sqrt{\frac{1}{n-2} \text{RSS}} = \sqrt{\frac{1}{n-2} \sum_{i=1}^n (y_i - \hat y_i)^2} \] Again, RSS is the residual sum of squares. In general, RSE measures the lack of fit of the model to the data.
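
A minimal sketch (reusing mod and y from the first chunk): the RSE is just \(\sqrt{\text{RSS}/(n-2)}\), and summary() stores the same quantity as sigma.

sqrt(sum((y - fitted(mod))^2) / (length(y) - 2))  # RSE by hand
summary(mod)$sigma                                # should match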

R^2 Statistic

Similar to the RSE, but the \(R^2\) statistic measures a proportion, so the value is always between 0 and 1 and is independent of the scale of \(Y\). \[ R^2 = \frac{\text{TSS} - \text{RSS}}{\text{TSS}} = 1 - \frac{\text{RSS}}{\text{TSS}} = 1 - \frac{\sum_{i=1}^n (y_i - \hat y_i)^2}{\sum_{i=1}^n (y_i - \bar y)^2} \] TSS measures the total variance in the response \(Y\). In contrast, RSS measures the amount of variability left unexplained after the regression. Hence, TSS - RSS measures the amount of response variability the model explains. If the model explains the response variability well, RSS \(\ll\) TSS and \(R^2 \approx 1\). Thus \(R^2\) measures how well \(X\) explains the variability in \(Y\).
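
A minimal sketch computing \(R^2\) from its definition (reusing mod and y from the first chunk) and comparing with the value summary() reports.

tss <- sum((y - mean(y))^2)          # total sum of squares
rss <- sum((y - fitted(mod))^2)      # residual sum of squares
1 - rss / tss                        # R^2 by hand
summary(mod)$r.squared               # should match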

3.2 Multiple Linear Regression

Adding the effects of multiple predictors. We’ll add the \(\mathbf 1\) vector in this equation, but omit it later for ease of notation. \[Y_{n} = \begin{bmatrix}\mathbf 1 & X \end{bmatrix}_{n \times (p+1)} \cdot\beta_{(p+1)}\]

Estimating Regression Coefficients

Since \(\beta\) is unknown, we estimate it using the same least squares method. \[ \begin{eqnarray} \hat Y &=& X \cdot \hat\beta \\ \text{RSS} &=& (Y - \hat Y)^T (Y - \hat Y) \end{eqnarray} \] A multiple regression can have different outcomes than a set of single regressions. Each slope coefficient is the change in the response for a unit change in that one variable while holding all the others constant. We can check this by correlating the predictors with each other. A strong correlation means one variable is strongly dependent upon another, and its effect on the response is superfluous.
This counter-intuitive idea can be displayed by stating that high ice cream sales cause more shark attacks. While no one has banned ice cream sales to reduce shark attacks, a more realistic explanation is that higher temperatures draw more people to the beach, where they buy ice cream and succumb to shark attacks.
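
A minimal sketch of the matrix solution \(\hat\beta = (X^TX)^{-1}X^TY\), checked against lm() on the Boston data (the same data set used in the lab below).

library(MASS)                                             # Boston data
Xmat <- cbind(1, as.matrix(Boston[, c("lstat", "age")]))  # prepend the 1 vector
yb   <- Boston$medv
drop(solve(t(Xmat) %*% Xmat, t(Xmat) %*% yb))             # beta_hat by hand
coef(lm(medv ~ lstat + age, data = Boston))               # should match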

Important Questions

  1. Is at least one of the predictors useful in predicting the response? Test the following hypothesis using the F-statistic: \[ \begin{eqnarray} H_0 &:& \beta_1 = \beta_2 = \cdots = \beta_p = 0 \\ H_a &:& \text{ at least one } \beta_j \neq 0 \\ \\ F &=& \frac{(TSS - RSS) / p}{RSS / (n - p - 1)} \end{eqnarray} \] It can be shown that the denominator has expected value \(\sigma^2\) and, if \(H_0\) is true, so does the numerator; thus if there is no relationship between the response and the predictors we expect the F-statistic to be close to 1. If \(H_a\) is true, then the numerator is \(> \sigma^2\) and \(F > 1\).
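
A minimal sketch computing the F-statistic from its definition for a medv ~ lstat + age fit on the Boston data (used in the lab below) and comparing with summary().

library(MASS)
fit_f <- lm(medv ~ lstat + age, data = Boston)
n <- nrow(Boston); p <- 2
tss <- sum((Boston$medv - mean(Boston$medv))^2)
rss <- sum(resid(fit_f)^2)
((tss - rss) / p) / (rss / (n - p - 1))   # F-statistic by hand
summary(fit_f)$fstatistic                 # value, numdf, dendf; should match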

  2. Do all of the predictors help explain \(Y\), or is only a subset useful?

The F-statistic and its associated p-value tell us whether at least one of the predictors is useful. But which ones? Variable selection pares down the full set of predictors to a useful subset. There are a total of \(2^p\) models that contain subsets of \(p\) variables, so brute force is infeasible. Instead, use a more automated and efficient approach with one of three classical methods (a code sketch follows the list):

  • Forward Selection: begin with the null model (containing only an intercept and no predictors), then fit \(p\) simple linear regressions and add to the null model the variable that results in the lowest RSS. Continue adding predictors until some stopping rule is satisfied.

  • Backward Selection: begin with all the model variables, and remove the variable with the largest p-value (the least statistically significant). Continue removing predictors until a stopping rule is reached.

  • Mixed Selection: begin with the null model and add predictors until one of the variables’ p-values rises above a certain threshold, then remove that one. Continue adding and removing predictors until all the model variables are significant and all possible variables to add would have large p-values.
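
A sketch of automated selection using base R’s step(), which adds or drops terms by AIC rather than raw p-values but follows the same forward/backward spirit. The Boston data and medv are used purely as an example here.

library(MASS)
full <- lm(medv ~ ., data = Boston)
null <- lm(medv ~ 1, data = Boston)
# backward: start from the full model, drop terms while AIC improves
back <- step(full, direction = "backward", trace = 0)
# forward: start from the null model, allowed to grow up to the full formula
fwd  <- step(null, scope = formula(full), direction = "forward", trace = 0)
formula(back); formula(fwd)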

  3. How well does the model fit the data?

\(R^2\): the proportion of variance explained. In simple regression it is the squared correlation of the response and the predictor; in multiple linear regression it equals \(\text{Cor}(Y,\hat Y)^2\). Adding more variables always increases the \(R^2\) score, even if the new predictor is only weakly associated with the response, because more predictors give the fit more freedom to match the training data (but not the test data). The RSE, however, can increase slightly when weak predictors are added, because RSE depends on the residual degrees of freedom: adding a predictor shrinks the denominator \(n - p - 1\), so RSE rises unless RSS falls enough to compensate \[\text{RSE} = \sqrt{\frac{1}{n-p-1} \text{RSS}} \]

  4. Given a set of predictor values, what response value is predicted and how accurate is the prediction?

Three sorts of fitted model prediction uncertainty:

  • The least squares plane \(\hat Y = X\hat\beta\) is only an approximation of the true population regression plane \(f(X) = X\beta\). The gap between \(\hat Y\) and \(f(X)\) is reducible error, which we can bound with a confidence interval.
  • The real regression function is probably more complicated than a linear model, introducing another source of error called model bias.
  • Irreducible error \(\epsilon\) caused by noise (unknown predictors). Prediction intervals are wider than confidence intervals because they incorporate both the reducible and irreducible error.

Other Considerations in the Regression Model

Qualitative Predictors

Two Level Predictors

Suppose a predictor has two qualitative (factor) levels: male and female. Incorporating them into a regression model is simple: create a dummy variable that takes on two numerical values

\[x_i = \begin{cases} 1 \quad &\text{if the } i \text{th person is female} \\ 0 \quad &\text{if the } i \text{th person is male} \end{cases} \] \[y_i = \beta_0 + \beta_1 x_i + \epsilon_i = \begin{cases} \beta_0 + \beta_1 + \epsilon_i \quad &\text{if the } i \text{th person is female} \\ \beta_0 + \epsilon_i \quad &\text{if the } i \text{th person is male} \end{cases} \] So \(\beta_0\) is the average predicted response for males and \(\beta_0 + \beta_1\) is the average predicted response for females. But remember the p-value! A high p-value means the dummy predictor is not statistically significant. The coding of the dummy variable is arbitrary; only the interpretation of the coefficients changes.

Multiple Factor Level Predictors

An example is ethnicity, where the levels are Asian, Caucasian, and African American. The model just adds another dummy variable: \[y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \epsilon_i = \begin{cases} \beta_0 + \beta_1 + \epsilon_i \quad &\text{if the } i \text{th person is Asian} \\ \beta_0 + \beta_2 + \epsilon_i \quad &\text{if the } i \text{th person is Caucasian} \\ \beta_0 + \epsilon_i \quad &\text{if the } i \text{th person is African American} \end{cases} \] The new coefficients are the average differences for the new factor levels relative to the baseline (in this case African American).
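
A minimal sketch of how R builds these dummy variables automatically for a factor. ShelveLoc in the Carseats data (used in the lab below) has three levels, so two dummy columns are created and the omitted level becomes the baseline absorbed into the intercept.

library(ISLR)
head(model.matrix(~ ShelveLoc, data = Carseats))  # intercept + two dummies
coef(lm(Sales ~ ShelveLoc, data = Carseats))      # baseline level is "Bad"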

Extensions of the Linear Model

Two of the most important assumptions are that the relationship between predictors and response is additive and linear.

Removing the Additive Assumption

  • The effect of changes in a predictor \(X_j\) on the response \(Y\) is assumed to be independent of the values of the other predictors; there’s no interaction term. In reality, for example, spending $100,000 split between TV and Radio advertising may be more effective than spending it solely on TV or Radio. \[ Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \epsilon \] \(\beta_3\) can be interpreted as the increase in the effectiveness of \(X_1\) for a one-unit increase in \(X_2\). A rule of thumb: if we add an interaction term, we should also include the main effects, even if they’re insignificant. Fitting qualitative data is just as easy and has an intuitive interpretation: the qualitative coefficient changes the slope and intercept of another predictor (see the sketch below).
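
A minimal sketch of a quantitative-qualitative interaction using the Carseats data: crossing Price with the factor US lets both the intercept and the Price slope differ between US and non-US stores (this particular model is an illustration, not one from the text).

library(ISLR)
coef(lm(Sales ~ Price * US, data = Carseats))
# USYes shifts the intercept; Price:USYes is the change in the Price slope
# for US stores relative to non-US stores.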

Non-Linear Relationships

A simple way to extend linear regression to capture non-linear relationships is polynomial regression. We can add second-, third-, fourth-, as many degree polynomial terms as we want, and because the model is still linear in the coefficients it is still linear regression! \[ Y = \beta_0 + \beta_1 X_1 + \beta_2 (X_1)^2 + \epsilon \]

Potential Problems

1. Data is Non-linear

Linear regression assumes a straight line (or plane) relationship between the predictors and response, but non-linearity is the rule, not the exception. Residual plots are useful visualizations for identifying non-linearity. We plot the residuals \(e_i = y_i - \hat y_i\) versus the fitted response values \(\hat y_i\). Ideally the plot shows no discernible pattern; a pattern indicates non-linear effects.
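
A minimal sketch of a residual plot for the simple Boston fit; a clear curve in the smoother is the kind of pattern that flags non-linearity.

library(MASS); library(ggplot2)
fit <- lm(medv ~ lstat, data = Boston)
ggplot(data.frame(fitted = fitted(fit), resid = resid(fit)),
       aes(fitted, resid)) +
  geom_point() +
  geom_smooth(se = FALSE) +
  geom_hline(yintercept = 0, linetype = "dashed")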

2. Error Term Correlation

Given the error terms \(\epsilon_1, \epsilon_2, ... , \epsilon_n\), they are uncorrelated if the value of \(\epsilon_i\) gives no information about \(\epsilon_{i+1}\). If they are correlated, then the estimated standard errors will tend to underestimate the true standard errors. As a result, the computed confidence intervals are narrower than they should be and p-values are smaller than they should be, leading to erroneous conclusions.

3. Error-Term Variance Variability

Homoscedasticity: an important assumption of linear models is that the error terms have constant variance \(\text{Var}(\epsilon_i) = \sigma^2\). The standard errors, confidence intervals, and hypothesis tests rely on this assumption. A residual plot with a funnel shape indicates heteroscedasticity (non-constant variance).

4. Outliers

Outlier: a point \(y_i\) that is far from its predicted value \(\hat y_i\). Outliers inflate the RSE and standard errors, throwing off the associated statistics. A useful idea is to standardize the residuals by dividing each by its estimated standard error; observations with standardized residuals outside roughly \(\pm 3\) would be considered outliers.
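
A minimal sketch using rstandard() on the simple Boston fit to flag observations whose standardized residuals fall outside roughly \(\pm 3\).

library(MASS)
fit <- lm(medv ~ lstat, data = Boston)
which(abs(rstandard(fit)) > 3)   # indices of candidate outliers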

5. High-Leverage Points

Outliers are unusual \(y_i\) values given a predictor \(x_i\). High-leverage observations have an unusual \(x_i\) value. These can strongly pull the regression coefficients. The leverage statistic \(h_i\) quantifies how far \(x_i\) lies from the other observations, with an average value equal to \((p+1)/n\).
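
A minimal sketch using hatvalues() to compare the largest leverage statistics with the average value \((p+1)/n\).

library(MASS)
fit <- lm(medv ~ lstat + age, data = Boston)
head(sort(hatvalues(fit), decreasing = TRUE))   # highest-leverage observations
(2 + 1) / nrow(Boston)                          # average leverage, p = 2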

6. Collinearity

Collinearity: two predictor variables are highly correlated with each other, so one can largely be predicted from the other. Collinearity inflates the variance of the coefficient estimates and reduces the power of the t-statistic for each predictor. The variance inflation factor (VIF) measures collinearity.
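
A minimal sketch of the variance inflation factor via vif() from the car package (assumed installed); as a rule of thumb, values well above 5 to 10 flag problematic collinearity.

library(car); library(MASS)
vif(lm(medv ~ ., data = Boston))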

Ch3 Linear Regression Lab

library(MASS); library(ISLR)
## 
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
## 
##     select
# medv: median value of Boston homes. lstat: percent of households with low socioeconomic status
lm.fit <- lm(medv ~ lstat, data = Boston) # fit a linear model to the data
summary(lm.fit) # linear model information
## 
## Call:
## lm(formula = medv ~ lstat, data = Boston)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -15.168  -3.990  -1.318   2.034  24.500 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 34.55384    0.56263   61.41   <2e-16 ***
## lstat       -0.95005    0.03873  -24.53   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.216 on 504 degrees of freedom
## Multiple R-squared:  0.5441, Adjusted R-squared:  0.5432 
## F-statistic: 601.6 on 1 and 504 DF,  p-value: < 2.2e-16
confint(lm.fit) # coefficient confidence intervals
##                 2.5 %     97.5 %
## (Intercept) 33.448457 35.6592247
## lstat       -1.026148 -0.8739505
# now use the model to predict on unseen values of lstat (x)
# Confidence and prediction intervals are centered on the same point, 
# but prediction intervals include irreducible error as well as model error.
predict(lm.fit, tibble(lstat = c(5,10,15)),
        interval = "confidence") 
##        fit      lwr      upr
## 1 29.80359 29.00741 30.59978
## 2 25.05335 24.47413 25.63256
## 3 20.30310 19.73159 20.87461
predict(lm.fit, tibble(lstat = c(5,10,15)),
        interval = "prediction") 
##        fit       lwr      upr
## 1 29.80359 17.565675 42.04151
## 2 25.05335 12.827626 37.27907
## 3 20.30310  8.077742 32.52846
ggplot(Boston, aes(lstat, medv)) +
  geom_point() +
  geom_smooth(method = "lm")

plot(lm.fit)

# Multiple Linear Regression
lm.fit <- lm(medv ~ lstat + age, data = Boston)
summary(lm.fit)
## 
## Call:
## lm(formula = medv ~ lstat + age, data = Boston)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -15.981  -3.978  -1.283   1.968  23.158 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 33.22276    0.73085  45.458  < 2e-16 ***
## lstat       -1.03207    0.04819 -21.416  < 2e-16 ***
## age          0.03454    0.01223   2.826  0.00491 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.173 on 503 degrees of freedom
## Multiple R-squared:  0.5513, Adjusted R-squared:  0.5495 
## F-statistic:   309 on 2 and 503 DF,  p-value: < 2.2e-16
# short hand for all the predictors (.)
lm.fit <- lm(medv ~ ., data = Boston)
summary(lm.fit)
## 
## Call:
## lm(formula = medv ~ ., data = Boston)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -15.595  -2.730  -0.518   1.777  26.199 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.646e+01  5.103e+00   7.144 3.28e-12 ***
## crim        -1.080e-01  3.286e-02  -3.287 0.001087 ** 
## zn           4.642e-02  1.373e-02   3.382 0.000778 ***
## indus        2.056e-02  6.150e-02   0.334 0.738288    
## chas         2.687e+00  8.616e-01   3.118 0.001925 ** 
## nox         -1.777e+01  3.820e+00  -4.651 4.25e-06 ***
## rm           3.810e+00  4.179e-01   9.116  < 2e-16 ***
## age          6.922e-04  1.321e-02   0.052 0.958229    
## dis         -1.476e+00  1.995e-01  -7.398 6.01e-13 ***
## rad          3.060e-01  6.635e-02   4.613 5.07e-06 ***
## tax         -1.233e-02  3.760e-03  -3.280 0.001112 ** 
## ptratio     -9.527e-01  1.308e-01  -7.283 1.31e-12 ***
## black        9.312e-03  2.686e-03   3.467 0.000573 ***
## lstat       -5.248e-01  5.072e-02 -10.347  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.745 on 492 degrees of freedom
## Multiple R-squared:  0.7406, Adjusted R-squared:  0.7338 
## F-statistic: 108.1 on 13 and 492 DF,  p-value: < 2.2e-16
# exclude a predictor
lm.fit1 <- lm(medv ~ . -age, data = Boston)
lm.fit1 <- update(lm.fit, ~. -age)
summary(lm.fit1)
## 
## Call:
## lm(formula = medv ~ crim + zn + indus + chas + nox + rm + dis + 
##     rad + tax + ptratio + black + lstat, data = Boston)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -15.6054  -2.7313  -0.5188   1.7601  26.2243 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  36.436927   5.080119   7.172 2.72e-12 ***
## crim         -0.108006   0.032832  -3.290 0.001075 ** 
## zn            0.046334   0.013613   3.404 0.000719 ***
## indus         0.020562   0.061433   0.335 0.737989    
## chas          2.689026   0.859598   3.128 0.001863 ** 
## nox         -17.713540   3.679308  -4.814 1.97e-06 ***
## rm            3.814394   0.408480   9.338  < 2e-16 ***
## dis          -1.478612   0.190611  -7.757 5.03e-14 ***
## rad           0.305786   0.066089   4.627 4.75e-06 ***
## tax          -0.012329   0.003755  -3.283 0.001099 ** 
## ptratio      -0.952211   0.130294  -7.308 1.10e-12 ***
## black         0.009321   0.002678   3.481 0.000544 ***
## lstat        -0.523852   0.047625 -10.999  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.74 on 493 degrees of freedom
## Multiple R-squared:  0.7406, Adjusted R-squared:  0.7343 
## F-statistic: 117.3 on 12 and 493 DF,  p-value: < 2.2e-16
# Interaction Terms
# lstat:black means interaction term
# lstat*black is shorthand for lstat + black + lstat:black
summary(lm(medv ~ lstat*age, data = Boston))
## 
## Call:
## lm(formula = medv ~ lstat * age, data = Boston)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -15.806  -4.045  -1.333   2.085  27.552 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 36.0885359  1.4698355  24.553  < 2e-16 ***
## lstat       -1.3921168  0.1674555  -8.313 8.78e-16 ***
## age         -0.0007209  0.0198792  -0.036   0.9711    
## lstat:age    0.0041560  0.0018518   2.244   0.0252 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.149 on 502 degrees of freedom
## Multiple R-squared:  0.5557, Adjusted R-squared:  0.5531 
## F-statistic: 209.3 on 3 and 502 DF,  p-value: < 2.2e-16
# Non-linear predictor transformations
lm.fit2 <- lm(medv ~ lstat + I(lstat^2), data = Boston)
summary(lm.fit2)
## 
## Call:
## lm(formula = medv ~ lstat + I(lstat^2), data = Boston)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -15.2834  -3.8313  -0.5295   2.3095  25.4148 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 42.862007   0.872084   49.15   <2e-16 ***
## lstat       -2.332821   0.123803  -18.84   <2e-16 ***
## I(lstat^2)   0.043547   0.003745   11.63   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.524 on 503 degrees of freedom
## Multiple R-squared:  0.6407, Adjusted R-squared:  0.6393 
## F-statistic: 448.5 on 2 and 503 DF,  p-value: < 2.2e-16
lm.fit <- lm(medv ~ lstat, data = Boston)
# anova performs a hypothesis test comparing the two models. 
# H_0: both models fit the data equally well
# H_a: the full model is superior
anova(lm.fit, lm.fit2)
## Analysis of Variance Table
## 
## Model 1: medv ~ lstat
## Model 2: medv ~ lstat + I(lstat^2)
##   Res.Df   RSS Df Sum of Sq     F    Pr(>F)    
## 1    504 19472                                 
## 2    503 15347  1    4125.1 135.2 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(lm.fit2)

lm.fit5 <- lm(medv ~ poly(lstat, 5), data = Boston)
summary(lm.fit5)
## 
## Call:
## lm(formula = medv ~ poly(lstat, 5), data = Boston)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -13.5433  -3.1039  -0.7052   2.0844  27.1153 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       22.5328     0.2318  97.197  < 2e-16 ***
## poly(lstat, 5)1 -152.4595     5.2148 -29.236  < 2e-16 ***
## poly(lstat, 5)2   64.2272     5.2148  12.316  < 2e-16 ***
## poly(lstat, 5)3  -27.0511     5.2148  -5.187 3.10e-07 ***
## poly(lstat, 5)4   25.4517     5.2148   4.881 1.42e-06 ***
## poly(lstat, 5)5  -19.2524     5.2148  -3.692 0.000247 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.215 on 500 degrees of freedom
## Multiple R-squared:  0.6817, Adjusted R-squared:  0.6785 
## F-statistic: 214.2 on 5 and 500 DF,  p-value: < 2.2e-16
summary(lm(medv ~ log(rm), data = Boston))
## 
## Call:
## lm(formula = medv ~ log(rm), data = Boston)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -19.487  -2.875  -0.104   2.837  39.816 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -76.488      5.028  -15.21   <2e-16 ***
## log(rm)       54.055      2.739   19.73   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.915 on 504 degrees of freedom
## Multiple R-squared:  0.4358, Adjusted R-squared:  0.4347 
## F-statistic: 389.3 on 1 and 504 DF,  p-value: < 2.2e-16
# Qualitative predictors
str(Carseats)
## 'data.frame':    400 obs. of  11 variables:
##  $ Sales      : num  9.5 11.22 10.06 7.4 4.15 ...
##  $ CompPrice  : num  138 111 113 117 141 124 115 136 132 132 ...
##  $ Income     : num  73 48 35 100 64 113 105 81 110 113 ...
##  $ Advertising: num  11 16 10 4 3 13 0 15 0 0 ...
##  $ Population : num  276 260 269 466 340 501 45 425 108 131 ...
##  $ Price      : num  120 83 80 97 128 72 108 120 124 124 ...
##  $ ShelveLoc  : Factor w/ 3 levels "Bad","Good","Medium": 1 2 3 3 1 1 3 2 3 3 ...
##  $ Age        : num  42 65 59 55 38 78 71 67 76 76 ...
##  $ Education  : num  17 10 12 14 13 16 15 10 10 17 ...
##  $ Urban      : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 1 2 2 1 1 ...
##  $ US         : Factor w/ 2 levels "No","Yes": 2 2 2 2 1 2 1 2 1 2 ...
lm.fit <- lm(Sales ~ . +Income:Advertising + Price:Age, data = Carseats)
summary(lm.fit)
## 
## Call:
## lm(formula = Sales ~ . + Income:Advertising + Price:Age, data = Carseats)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.9208 -0.7503  0.0177  0.6754  3.3413 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         6.5755654  1.0087470   6.519 2.22e-10 ***
## CompPrice           0.0929371  0.0041183  22.567  < 2e-16 ***
## Income              0.0108940  0.0026044   4.183 3.57e-05 ***
## Advertising         0.0702462  0.0226091   3.107 0.002030 ** 
## Population          0.0001592  0.0003679   0.433 0.665330    
## Price              -0.1008064  0.0074399 -13.549  < 2e-16 ***
## ShelveLocGood       4.8486762  0.1528378  31.724  < 2e-16 ***
## ShelveLocMedium     1.9532620  0.1257682  15.531  < 2e-16 ***
## Age                -0.0579466  0.0159506  -3.633 0.000318 ***
## Education          -0.0208525  0.0196131  -1.063 0.288361    
## UrbanYes            0.1401597  0.1124019   1.247 0.213171    
## USYes              -0.1575571  0.1489234  -1.058 0.290729    
## Income:Advertising  0.0007510  0.0002784   2.698 0.007290 ** 
## Price:Age           0.0001068  0.0001333   0.801 0.423812    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.011 on 386 degrees of freedom
## Multiple R-squared:  0.8761, Adjusted R-squared:  0.8719 
## F-statistic:   210 on 13 and 386 DF,  p-value: < 2.2e-16
contrasts(Carseats$ShelveLoc) # shows the ordering of dummy variables
##        Good Medium
## Bad       0      0
## Good      1      0
## Medium    0      1

Linear Regression Exercises pg. 120

  1. Explain the conclusions stated by Table 3.4, pg. 74 (Coefficient, Std. error, t-statistic, p-value). The Std. error quantifies how close the coefficient is to the true population parameter; it’s the average amount \(\hat\beta\) differs from \(\beta\). The further the \(x_i\) are spread out, the smaller the standard error, because they provide more leverage. Larger standard errors produce larger confidence intervals. If the coefficient is statistically different from the null hypothesis \(H_0: \beta = 0\) (using the t-statistic to compute a p-value), then we can reject \(H_0\). Table 3.4 states that advertising on TV and radio has a significant effect on Sales, while newspaper advertising does not have a significant effect (p-value = 0.8599).

  2. What’s the difference between KNN classifier and KNN regression methods? KNN regression averages the \(y_i\) values of the closest \(k\) points to \(x_0\) to produce a numeric prediction. The KNN classifier takes a point \(x_0\) and assigns it to the majority class among the closest \(k\) points.

  3. Answer the following about predictors and responses.
  • The female (1) male (0) coefficient is 35, meaning on average, females earn $35k more than males all else equal.
  • See previous answer
  • The interaction term is -10, meaning for each point increase in GPA, females earn $10k less on average.
  • See previous answer
  1. \(\text{Female salary} = 50 + 20 (4.0) + 0.07 (110) + 35 + 0.01 (4.0 \times 110) - 10 (4.0) = 137.1\)
  2. We don’t know the standard error of the interaction between GPA and IQ, so we can’t state whether it’s insignificant or not.
  1. A higher order polynomial model will always have a lower training RSS because its extra flexibility lets it follow the training data more closely.
  2. However, higher order models have higher variance and may perform worse on a test set because of overfitting.
  3. Given a non-linear response, a lower order model will have higher bias, but the higher order model will still achieve the lower training RSS.
  4. On a non-linear test set the winner depends on how far from linear the true relationship is; without that information we can’t say which model performs better.
  1. Not sure what this \(i'\) number is… Basically this is an algebra problem.

  2. Use Eq 3.4 to prove a linear model always passes through \((\bar x, \bar y)\). Since \(\hat\beta_0 = \bar y - \hat\beta_1 \bar x\), plugging \(x = \bar x\) into the fitted line gives \[ \hat y = \hat\beta_0 + \hat\beta_1 \bar x = (\bar y - \hat\beta_1 \bar x) + \hat\beta_1 \bar x = \bar y \] so the line always goes through the point of averages.

  3. Prove eq 3.18 \(\text R^2 = \text{Cor}(X,Y)^2\) given \(\bar x = \bar y = 0\).
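
As a quick numerical sanity check of the identity (not a proof), compare summary()’s \(R^2\) with the squared sample correlation on the Auto data used in the next exercise.

library(ISLR)
fit <- lm(mpg ~ horsepower, data = Auto)
c(r_squared   = summary(fit)$r.squared,
  cor_squared = cor(Auto$mpg, Auto$horsepower)^2)   # should be equal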

lm.fit <- lm(mpg ~ horsepower, data = Auto)
summary(lm.fit)
## 
## Call:
## lm(formula = mpg ~ horsepower, data = Auto)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -13.5710  -3.2592  -0.3435   2.7630  16.9240 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 39.935861   0.717499   55.66   <2e-16 ***
## horsepower  -0.157845   0.006446  -24.49   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.906 on 390 degrees of freedom
## Multiple R-squared:  0.6059, Adjusted R-squared:  0.6049 
## F-statistic: 599.7 on 1 and 390 DF,  p-value: < 2.2e-16
print("predict mpg for horsepower = 98")
## [1] "predict mpg for horsepower = 98"
print("confidence interval")
## [1] "confidence interval"
predict(lm.fit, tibble(horsepower = 98), interval = "confidence")
##        fit      lwr      upr
## 1 24.46708 23.97308 24.96108
print("prediction interval")
## [1] "prediction interval"
predict(lm.fit, tibble(horsepower = 98), interval = "prediction")
##        fit     lwr      upr
## 1 24.46708 14.8094 34.12476

Strong negative relationship between horsepower and mpg.

## 
## Attaching package: 'GGally'
## The following object is masked from 'package:dplyr':
## 
##     nasa

## 
## Call:
## lm(formula = mpg ~ ., data = Auto1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.5903 -2.1565 -0.1169  1.8690 13.0604 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -17.218435   4.644294  -3.707  0.00024 ***
## cylinders     -0.493376   0.323282  -1.526  0.12780    
## displacement   0.019896   0.007515   2.647  0.00844 ** 
## horsepower    -0.016951   0.013787  -1.230  0.21963    
## weight        -0.006474   0.000652  -9.929  < 2e-16 ***
## acceleration   0.080576   0.098845   0.815  0.41548    
## year           0.750773   0.050973  14.729  < 2e-16 ***
## origin         1.426141   0.278136   5.127 4.67e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.328 on 384 degrees of freedom
## Multiple R-squared:  0.8215, Adjusted R-squared:  0.8182 
## F-statistic: 252.4 on 7 and 384 DF,  p-value: < 2.2e-16

library(ISLR)
lm.fit <- lm(Sales ~ Price + Urban + US, data = Carseats)
summary(lm.fit) # Urban not significant
## 
## Call:
## lm(formula = Sales ~ Price + Urban + US, data = Carseats)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.9206 -1.6220 -0.0564  1.5786  7.0581 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 13.043469   0.651012  20.036  < 2e-16 ***
## Price       -0.054459   0.005242 -10.389  < 2e-16 ***
## UrbanYes    -0.021916   0.271650  -0.081    0.936    
## USYes        1.200573   0.259042   4.635 4.86e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.472 on 396 degrees of freedom
## Multiple R-squared:  0.2393, Adjusted R-squared:  0.2335 
## F-statistic: 41.52 on 3 and 396 DF,  p-value: < 2.2e-16
lm.fit1 <- lm(Sales ~ Price + US, data = Carseats)
summary(lm.fit1)
## 
## Call:
## lm(formula = Sales ~ Price + US, data = Carseats)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.9269 -1.6286 -0.0574  1.5766  7.0515 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 13.03079    0.63098  20.652  < 2e-16 ***
## Price       -0.05448    0.00523 -10.416  < 2e-16 ***
## USYes        1.19964    0.25846   4.641 4.71e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.469 on 397 degrees of freedom
## Multiple R-squared:  0.2393, Adjusted R-squared:  0.2354 
## F-statistic: 62.43 on 2 and 397 DF,  p-value: < 2.2e-16
anova(lm.fit, lm.fit1) # basically the models are the same
## Analysis of Variance Table
## 
## Model 1: Sales ~ Price + Urban + US
## Model 2: Sales ~ Price + US
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1    396 2420.8                           
## 2    397 2420.9 -1  -0.03979 0.0065 0.9357
plot(lm.fit1)

set.seed(1)
x <- rnorm(100)
y <- 2 * x + rnorm(100)
ggplot() + geom_point(aes(x, y))

ggplot() + geom_point(aes(y, x))

lm.fit <- lm(y ~ x + 0)
summary(lm.fit)
## 
## Call:
## lm(formula = y ~ x + 0)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.9154 -0.6472 -0.1771  0.5056  2.3109 
## 
## Coefficients:
##   Estimate Std. Error t value Pr(>|t|)    
## x   1.9939     0.1065   18.73   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9586 on 99 degrees of freedom
## Multiple R-squared:  0.7798, Adjusted R-squared:  0.7776 
## F-statistic: 350.7 on 1 and 99 DF,  p-value: < 2.2e-16
lm.fit1 <- lm(x ~ y + 0)
summary(lm.fit1)
## 
## Call:
## lm(formula = x ~ y + 0)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.8699 -0.2368  0.1030  0.2858  0.8938 
## 
## Coefficients:
##   Estimate Std. Error t value Pr(>|t|)    
## y  0.39111    0.02089   18.73   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4246 on 99 degrees of freedom
## Multiple R-squared:  0.7798, Adjusted R-squared:  0.7776 
## F-statistic: 350.7 on 1 and 99 DF,  p-value: < 2.2e-16
  1. Reference eq 3.38 pg. 121
x <- rnorm(100)
y <- rnorm(100)

ggplot() + geom_point(aes(x, y))

ggplot() + geom_point(aes(y, x))

lm.fit <- lm(y ~ x + 0)
summary(lm.fit)
## 
## Call:
## lm(formula = y ~ x + 0)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.69463 -0.51184  0.03249  0.72865  1.89998 
## 
## Coefficients:
##   Estimate Std. Error t value Pr(>|t|)
## x  0.10757    0.09585   1.122    0.264
## 
## Residual standard error: 0.9868 on 99 degrees of freedom
## Multiple R-squared:  0.01256,    Adjusted R-squared:  0.00259 
## F-statistic:  1.26 on 1 and 99 DF,  p-value: 0.2644
lm.fit1 <- lm(x ~ y + 0)
summary(lm.fit1)
## 
## Call:
## lm(formula = x ~ y + 0)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.70567 -0.54321  0.04474  0.65648  2.65721 
## 
## Coefficients:
##   Estimate Std. Error t value Pr(>|t|)
## y   0.1168     0.1041   1.122    0.264
## 
## Residual standard error: 1.028 on 99 degrees of freedom
## Multiple R-squared:  0.01256,    Adjusted R-squared:  0.00259 
## F-statistic:  1.26 on 1 and 99 DF,  p-value: 0.2644
set.seed(1)
x <- rnorm(100)
eps <- rnorm(100, sd = 0.25)
y <- -1 + 0.5 * x + eps
ggplot() + 
  geom_point(aes(x, y)) +
  geom_smooth(method = "lm", se = F, aes(x = x, y = y, color = "lsq")) +
  geom_abline(aes(slope = 0.5, intercept = -1, color = "pop")) +
  labs(color = "Regression Lines")

?geom_abline
summary(lm(y ~ x))
## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.46921 -0.15344 -0.03487  0.13485  0.58654 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.00942    0.02425  -41.63   <2e-16 ***
## x            0.49973    0.02693   18.56   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2407 on 98 degrees of freedom
## Multiple R-squared:  0.7784, Adjusted R-squared:  0.7762 
## F-statistic: 344.3 on 1 and 98 DF,  p-value: < 2.2e-16
summary(lm(y ~ x + I(x^2)))
## 
## Call:
## lm(formula = y ~ x + I(x^2))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.4913 -0.1563 -0.0322  0.1451  0.5675 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.98582    0.02941 -33.516   <2e-16 ***
## x            0.50429    0.02700  18.680   <2e-16 ***
## I(x^2)      -0.02973    0.02119  -1.403    0.164    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2395 on 97 degrees of freedom
## Multiple R-squared:  0.7828, Adjusted R-squared:  0.7784 
## F-statistic: 174.8 on 2 and 97 DF,  p-value: < 2.2e-16
  1. Collinearity
set.seed(1)
x1 <- runif(100)
x2 <- 0.5 * x1 + rnorm(100) / 10
y  <- 2 + 2 * x1 + 0.3 * x2 + rnorm(100)

cor(x1, x2)
## [1] 0.8351212
ggplot() + geom_point(aes(x1, x2))

summary(lm(y ~ x1 + x2))
## 
## Call:
## lm(formula = y ~ x1 + x2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.8311 -0.7273 -0.0537  0.6338  2.3359 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.1305     0.2319   9.188 7.61e-15 ***
## x1            1.4396     0.7212   1.996   0.0487 *  
## x2            1.0097     1.1337   0.891   0.3754    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.056 on 97 degrees of freedom
## Multiple R-squared:  0.2088, Adjusted R-squared:  0.1925 
## F-statistic:  12.8 on 2 and 97 DF,  p-value: 1.164e-05
summary(lm(y ~ x1))
## 
## Call:
## lm(formula = y ~ x1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.89495 -0.66874 -0.07785  0.59221  2.45560 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.1124     0.2307   9.155 8.27e-15 ***
## x1            1.9759     0.3963   4.986 2.66e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.055 on 98 degrees of freedom
## Multiple R-squared:  0.2024, Adjusted R-squared:  0.1942 
## F-statistic: 24.86 on 1 and 98 DF,  p-value: 2.661e-06
summary(lm(y ~ x2))
## 
## Call:
## lm(formula = y ~ x2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.62687 -0.75156 -0.03598  0.72383  2.44890 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.3899     0.1949   12.26  < 2e-16 ***
## x2            2.8996     0.6330    4.58 1.37e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.072 on 98 degrees of freedom
## Multiple R-squared:  0.1763, Adjusted R-squared:  0.1679 
## F-statistic: 20.98 on 1 and 98 DF,  p-value: 1.366e-05
x1 <- c(x1, 0.1)
x2 <- c(x2, 0.8)
y <- c(y, 6)

summary(lm(y ~ x1 + x2))
## 
## Call:
## lm(formula = y ~ x1 + x2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.73348 -0.69318 -0.05263  0.66385  2.30619 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.2267     0.2314   9.624 7.91e-16 ***
## x1            0.5394     0.5922   0.911  0.36458    
## x2            2.5146     0.8977   2.801  0.00614 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.075 on 98 degrees of freedom
## Multiple R-squared:  0.2188, Adjusted R-squared:  0.2029 
## F-statistic: 13.72 on 2 and 98 DF,  p-value: 5.564e-06
summary(lm(y ~ x1))
## 
## Call:
## lm(formula = y ~ x1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.8897 -0.6556 -0.0909  0.5682  3.5665 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.2569     0.2390   9.445 1.78e-15 ***
## x1            1.7657     0.4124   4.282 4.29e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.111 on 99 degrees of freedom
## Multiple R-squared:  0.1562, Adjusted R-squared:  0.1477 
## F-statistic: 18.33 on 1 and 99 DF,  p-value: 4.295e-05
summary(lm(y ~ x2))
## 
## Call:
## lm(formula = y ~ x2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.64729 -0.71021 -0.06899  0.72699  2.38074 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.3451     0.1912  12.264  < 2e-16 ***
## x2            3.1190     0.6040   5.164 1.25e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.074 on 99 degrees of freedom
## Multiple R-squared:  0.2122, Adjusted R-squared:  0.2042 
## F-statistic: 26.66 on 1 and 99 DF,  p-value: 1.253e-06
library(MASS);library(GGally)

ggpairs(Boston)

lm.fit <- lm(medv ~ ., data = Boston)
summary(lm.fit)
## 
## Call:
## lm(formula = medv ~ ., data = Boston)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -15.595  -2.730  -0.518   1.777  26.199 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.646e+01  5.103e+00   7.144 3.28e-12 ***
## crim        -1.080e-01  3.286e-02  -3.287 0.001087 ** 
## zn           4.642e-02  1.373e-02   3.382 0.000778 ***
## indus        2.056e-02  6.150e-02   0.334 0.738288    
## chas         2.687e+00  8.616e-01   3.118 0.001925 ** 
## nox         -1.777e+01  3.820e+00  -4.651 4.25e-06 ***
## rm           3.810e+00  4.179e-01   9.116  < 2e-16 ***
## age          6.922e-04  1.321e-02   0.052 0.958229    
## dis         -1.476e+00  1.995e-01  -7.398 6.01e-13 ***
## rad          3.060e-01  6.635e-02   4.613 5.07e-06 ***
## tax         -1.233e-02  3.760e-03  -3.280 0.001112 ** 
## ptratio     -9.527e-01  1.308e-01  -7.283 1.31e-12 ***
## black        9.312e-03  2.686e-03   3.467 0.000573 ***
## lstat       -5.248e-01  5.072e-02 -10.347  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.745 on 492 degrees of freedom
## Multiple R-squared:  0.7406, Adjusted R-squared:  0.7338 
## F-statistic: 108.1 on 13 and 492 DF,  p-value: < 2.2e-16