library(tidyverse)
library(scales)
options(scipen=999)
data(SaratogaHouses, package="mosaicData")
houses_lm <- lm(price ~ lotSize + age + landValue +
                  livingArea + bedrooms + bathrooms +
                  waterfront, 
                data = SaratogaHouses)

# View summary of model 1
summary(houses_lm)
## 
## Call:
## lm(formula = price ~ lotSize + age + landValue + livingArea + 
##     bedrooms + bathrooms + waterfront, data = SaratogaHouses)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -220208  -35416   -5443   27570  464320 
## 
## Coefficients:
##                   Estimate    Std. Error t value             Pr(>|t|)    
## (Intercept)   139878.80484   16472.92736   8.491 < 0.0000000000000002 ***
## lotSize         7500.79232    2075.13554   3.615             0.000309 ***
## age             -136.04011      54.15794  -2.512             0.012099 *  
## landValue          0.90931       0.04583  19.841 < 0.0000000000000002 ***
## livingArea        75.17866       4.15811  18.080 < 0.0000000000000002 ***
## bedrooms       -5766.75988    2388.43256  -2.414             0.015863 *  
## bathrooms      24547.10644    3332.26775   7.366    0.000000000000271 ***
## waterfrontNo -120726.62066   15600.82783  -7.738    0.000000000000017 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 59370 on 1720 degrees of freedom
## Multiple R-squared:  0.6378, Adjusted R-squared:  0.6363 
## F-statistic: 432.6 on 7 and 1720 DF,  p-value: < 0.00000000000000022

Interpretation

Q1 Build a regression model to predict the volume of trail users using hightemp and precip.

Hint: The variables are available in the RailTrail data set from the mosaicData package.

data(RailTrail, package="mosaicData")
volume_users <- lm(volume ~ hightemp + precip, 
                data = RailTrail)

# View summary of the trail volume model
summary(volume_users)
## 
## Call:
## lm(formula = volume ~ hightemp + precip, data = RailTrail)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -271.311  -56.545    5.915   48.962  296.453 
## 
## Coefficients:
##              Estimate Std. Error t value        Pr(>|t|)    
## (Intercept)  -31.5197    55.2383  -0.571         0.56973    
## hightemp       6.1177     0.7941   7.704 0.0000000000197 ***
## precip      -153.2608    39.3071  -3.899         0.00019 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 96.68 on 87 degrees of freedom
## Multiple R-squared:  0.4377, Adjusted R-squared:  0.4247 
## F-statistic: 33.85 on 2 and 87 DF,  p-value: 0.00000000001334

Q2 Is the coefficient of hightemp statistically significant at 5%?

Yes. The p-value for hightemp (0.0000000000197) is well below 0.05.
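
As a rough cross-check, the p-value can be recomputed from the reported t value and the residual degrees of freedom. This is only a sketch: it uses the rounded numbers printed in the summary output, so it will agree with the table only approximately, and the variable names are made up for illustration.

```r
# Values copied from the summary output above (rounded as printed)
t_hightemp <- 7.704   # t value for hightemp
df_resid   <- 87      # residual degrees of freedom

# Two-sided p-value from the t distribution
p_hightemp <- 2 * pt(-abs(t_hightemp), df = df_resid)

p_hightemp < 0.05   # TRUE means significant at the 5% level
```

The same calculation with the intercept's t value (-0.571) gives the p-value used in Q4.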

Q3 Interpret the coefficient of hightemp?

The coefficient of ‘hightemp’ is 6.1177. Holding precip constant, an increase of 1 degree Fahrenheit in ‘hightemp’ is associated with an increase of about 6.12 trail users.
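
To make the "holding precip constant" reading concrete, the sketch below compares the predicted volume for two days that differ only by 1 degree in hightemp. The coefficients are copied from the summary output; `predict_volume` and the example temperatures (70 and 71) are hypothetical choices for illustration, not part of the original analysis.

```r
# Coefficients copied from the summary output above
b0       <- -31.5197    # intercept
b_temp   <- 6.1177      # hightemp
b_precip <- -153.2608   # precip

# Hypothetical helper: predicted trail volume from the fitted equation
predict_volume <- function(hightemp, precip) {
  b0 + b_temp * hightemp + b_precip * precip
}

# Two days that differ only by 1 degree in hightemp; precip held at 0
diff_1deg <- predict_volume(71, 0) - predict_volume(70, 0)
diff_1deg
```

The difference equals the hightemp coefficient regardless of the precip value chosen, as long as it is the same on both days.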

Q4 Is the intercept statistically significant at 5%?

No. The p-value for the intercept (0.56973) is greater than 0.05.

Q5 Interpret the intercept?

The intercept is -31.5197. This is the predicted volume of trail users when both hightemp and precip are zero. The value has no practical meaning, however, because a negative number of people is impossible.

Q6 Interpret the reported residual standard error.

On average, the actual number of trail users differs from the number predicted by the model by about 96.68.

Q7 Interpret the reported adjusted R squared.

The reported adjusted R squared of the model is 0.4247. This means that 42.47% of the variability in the volume of trail users can be explained by the model, after adjusting for the number of predictors.
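
The adjusted R squared can be cross-checked against the multiple R squared using the standard adjustment formula. The sketch below assumes n = 90 observations, inferred from the 87 residual degrees of freedom plus the 3 estimated coefficients. Because the input is the rounded value printed in the summary, the result should match the "Adjusted R-squared" entry only up to rounding.

```r
r2 <- 0.4377   # Multiple R-squared as printed in the summary
n  <- 90       # observations: 87 residual df + 3 estimated coefficients
p  <- 2        # predictors: hightemp and precip

# Adjusted R-squared penalizes R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
round(adj_r2, 3)
```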

Q8 Hide the messages, but display the code and its results on the webpage.

Hint: Use message, echo and results in the chunk options. Refer to the RMarkdown Reference Guide.
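
One possible way to apply these options to every chunk is to set them once in a setup chunk, as sketched below. This assumes the document is knitted with knitr, and the values shown are one reading of the question rather than the only valid answer. The same options can instead be written in an individual chunk header, for example `{r, message=FALSE, echo=TRUE, results='markup'}`.

```r
# Place in a setup chunk near the top of the .Rmd file
knitr::opts_chunk$set(
  message = FALSE,     # hide messages such as package startup notes
  echo    = TRUE,      # display the code
  results = "markup"   # display the results
)
```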

Q9 Display the title and your name correctly at the top of the webpage.

Q10 Use the correct slug.