library(tidyverse)
library(scales)
options(scipen=999)
data(SaratogaHouses, package="mosaicData")
houses_lm <- lm(price ~ lotSize + age + landValue +
                  livingArea + bedrooms + bathrooms +
                  waterfront, 
                data = SaratogaHouses)

# View summary of model 1
summary(houses_lm)
## 
## Call:
## lm(formula = price ~ lotSize + age + landValue + livingArea + 
##     bedrooms + bathrooms + waterfront, data = SaratogaHouses)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -220208  -35416   -5443   27570  464320 
## 
## Coefficients:
##                   Estimate    Std. Error t value             Pr(>|t|)    
## (Intercept)   139878.80484   16472.92736   8.491 < 0.0000000000000002 ***
## lotSize         7500.79232    2075.13554   3.615             0.000309 ***
## age             -136.04011      54.15794  -2.512             0.012099 *  
## landValue          0.90931       0.04583  19.841 < 0.0000000000000002 ***
## livingArea        75.17866       4.15811  18.080 < 0.0000000000000002 ***
## bedrooms       -5766.75988    2388.43256  -2.414             0.015863 *  
## bathrooms      24547.10644    3332.26775   7.366    0.000000000000271 ***
## waterfrontNo -120726.62066   15600.82783  -7.738    0.000000000000017 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 59370 on 1720 degrees of freedom
## Multiple R-squared:  0.6378, Adjusted R-squared:  0.6363 
## F-statistic: 432.6 on 7 and 1720 DF,  p-value: < 0.00000000000000022

Interpretation

Q1 Build a regression model to predict the volume of trail users using hightemp and precip.

Hint: The variables are available in the RailTrail data set from the mosaicData package.

data(RailTrail, package="mosaicData")
Trails_lm <- lm(volume ~ hightemp + precip, 
                data = RailTrail)

# View summary of the trail volume model
summary(Trails_lm)
## 
## Call:
## lm(formula = volume ~ hightemp + precip, data = RailTrail)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -271.311  -56.545    5.915   48.962  296.453 
## 
## Coefficients:
##              Estimate Std. Error t value        Pr(>|t|)    
## (Intercept)  -31.5197    55.2383  -0.571         0.56973    
## hightemp       6.1177     0.7941   7.704 0.0000000000197 ***
## precip      -153.2608    39.3071  -3.899         0.00019 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 96.68 on 87 degrees of freedom
## Multiple R-squared:  0.4377, Adjusted R-squared:  0.4247 
## F-statistic: 33.85 on 2 and 87 DF,  p-value: 0.00000000001334

Q2 Is the coefficient of hightemp statistically significant at 5%?

Yes. The coefficient of ‘hightemp’ is statistically significant at the 5% level because its p-value (0.0000000000197) is far below 0.05.
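As a minimal sketch, the same check can be done in code rather than by reading the printed table. It assumes the mosaicData package is installed and refits the Q1 model so it runs on its own; the names `coef_table` and `p_hightemp` are introduced here for illustration only.

```r
# Refit the Q1 model so this example is self-contained
data(RailTrail, package = "mosaicData")
Trails_lm <- lm(volume ~ hightemp + precip, data = RailTrail)

# summary() stores the coefficient table as a matrix;
# the p-values sit in the column named "Pr(>|t|)"
coef_table <- summary(Trails_lm)$coefficients
p_hightemp <- coef_table["hightemp", "Pr(>|t|)"]

# Compare the p-value with the 5% significance level
p_hightemp < 0.05
```

Replacing "hightemp" with "(Intercept)" in the row index gives the corresponding check for Q4.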

Q3 Interpret the coefficient of hightemp.

Holding ‘precip’ constant, each additional degree Fahrenheit of ‘hightemp’ is associated with about six more trail users (6.12) on average.
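One way to see this interpretation is to compare predictions for two days that differ by one degree and have the same precipitation. The sketch below assumes mosaicData is installed; the temperatures 70 and 71 and the name `new_days` are hypothetical values chosen only for illustration.

```r
# Refit the Q1 model so this example is self-contained
data(RailTrail, package = "mosaicData")
Trails_lm <- lm(volume ~ hightemp + precip, data = RailTrail)

# Two hypothetical days: same precipitation, one degree apart
new_days <- data.frame(hightemp = c(70, 71), precip = c(0, 0))
preds <- predict(Trails_lm, newdata = new_days)

# The gap between the two predictions is the hightemp coefficient
slope_check <- unname(diff(preds))
slope_check
coef(Trails_lm)["hightemp"]
```

Because the model is linear, the same gap appears for any pair of temperatures one degree apart, whatever the shared value of precip.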

Q4 Is the intercept statistically significant at 5%?

No, the intercept is not statistically significant at the 5% level because its p-value (0.56973) is greater than 0.05.

Q5 Interpret the intercept.

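The intercept (-31.52) is the predicted volume of trail users when both ‘hightemp’ and ‘precip’ equal 0. A negative volume is impossible, so the intercept has no practical meaning here; it mainly anchors the fitted regression.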

Q6 Interpret the reported residual standard error.

The reported residual standard error is 96.68, so the volume predicted by the model typically misses the actual volume by about 97 trail users.
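To make the number less of a black box, the residual standard error can be recomputed from the residuals: the square root of the residual sum of squares divided by the residual degrees of freedom. This is a sketch assuming mosaicData is installed; `rse_manual` is a name introduced here for illustration.

```r
# Refit the Q1 model so this example is self-contained
data(RailTrail, package = "mosaicData")
Trails_lm <- lm(volume ~ hightemp + precip, data = RailTrail)

# Residual sum of squares divided by residual degrees of freedom,
# then square-rooted to return to the units of volume
rse_manual <- sqrt(sum(residuals(Trails_lm)^2) / df.residual(Trails_lm))
rse_manual

# Built-in extractor for the same quantity
sigma(Trails_lm)
```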

Q7 Interpret the reported adjusted R squared.

After adjusting for the number of predictors, the model explains 42.47% of the variability in trail volume.
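The adjustment can be reproduced by hand with the usual formula, 1 - (1 - R squared) * (n - 1) / (n - p - 1), where n is the number of observations and p the number of predictors. The sketch below assumes mosaicData is installed; `fit_summary` and `adj_r2_manual` are illustrative names.

```r
# Refit the Q1 model so this example is self-contained
data(RailTrail, package = "mosaicData")
Trails_lm <- lm(volume ~ hightemp + precip, data = RailTrail)
fit_summary <- summary(Trails_lm)

n <- nobs(Trails_lm)               # number of observations
p <- length(coef(Trails_lm)) - 1   # predictors, excluding the intercept

# Penalize R squared for the number of predictors
adj_r2_manual <- 1 - (1 - fit_summary$r.squared) * (n - 1) / (n - p - 1)
adj_r2_manual

# Value reported by summary()
fit_summary$adj.r.squared
```

The adjusted value is slightly lower than the multiple R squared because the penalty grows with each added predictor.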

Q8 Hide the messages, but display the code and its results on the webpage.

Hint: Use message, echo and results in the chunk options. Refer to the RMarkdown Reference Guide.
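As one possible sketch of what the hint describes, the three knitr options can be set for every chunk from a setup chunk. This assumes the document is knit with knitr; setting the options globally instead of per chunk is a choice made here for illustration, not something the assignment prescribes.

```r
# Set defaults for all later chunks in the document
knitr::opts_chunk$set(
  message = FALSE,     # hide package startup and other messages
  echo    = TRUE,      # display the code
  results = "markup"   # display the results (the knitr default)
)

# Per-chunk alternative: put the same options in the chunk header, e.g.
# {r, message=FALSE, echo=TRUE, results='markup'}
```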

Q9 Display the title and your name correctly at the top of the webpage.

Q10 Use the correct slug.