Regression Analysis

Q1 Build a regression model to predict the volume of trail users using hightemp and precip.
Q2 Is the coefficient of hightemp statistically significant at 5%?
Q3 Interpret the coefficient of hightemp.
Q4 Is the intercept statistically significant at 5%?
Q5 Interpret the intercept.
Q6 Interpret the reported residual standard error.
Q7 Interpret the reported adjusted R squared.
Q8 Hide the messages, but display the code and its results on the webpage.
Q9 Display the title and your name correctly at the top of the webpage.
Q10 Use the correct slug.

library(tidyverse)
library(scales)
options(scipen=999)

data(SaratogaHouses, package="mosaicData")
houses_lm <- lm(price ~ lotSize + age + landValue +
                  livingArea + bedrooms + bathrooms +
                  waterfront, 
                data = SaratogaHouses)

# View summary of model 1
summary(houses_lm)

## 
## Call:
## lm(formula = price ~ lotSize + age + landValue + livingArea + 
##     bedrooms + bathrooms + waterfront, data = SaratogaHouses)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -220208  -35416   -5443   27570  464320 
## 
## Coefficients:
##                   Estimate    Std. Error t value             Pr(>|t|)    
## (Intercept)   139878.80484   16472.92736   8.491 < 0.0000000000000002 ***
## lotSize         7500.79232    2075.13554   3.615             0.000309 ***
## age             -136.04011      54.15794  -2.512             0.012099 *  
## landValue          0.90931       0.04583  19.841 < 0.0000000000000002 ***
## livingArea        75.17866       4.15811  18.080 < 0.0000000000000002 ***
## bedrooms       -5766.75988    2388.43256  -2.414             0.015863 *  
## bathrooms      24547.10644    3332.26775   7.366    0.000000000000271 ***
## waterfrontNo -120726.62066   15600.82783  -7.738    0.000000000000017 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 59370 on 1720 degrees of freedom
## Multiple R-squared:  0.6378, Adjusted R-squared:  0.6363 
## F-statistic: 432.6 on 7 and 1720 DF,  p-value: < 0.00000000000000022

Interpretation

significance of coefficients: summary(houses_lm) reports each coefficient and its significance under Coefficients. The number of stars at the end of a line indicates how strong the evidence is against a true coefficient of zero. Three stars (***) at the end of the Intercept line, for example, mean that the p-value is below 0.001, so the intercept is significant at the 0.1% significance level: if the true value were zero, an estimate this far from zero would be very unlikely. On the other hand, the variable age has only one star. Its p-value (0.0121) lies between 0.01 and 0.05, so age is significant at the 5% level but not at the 1% level. A variable with no star or dot would have a p-value above 0.1, meaning the data do not provide enough evidence that the variable helps explain home prices once the other variables are accounted for.
coefficient of living area: The coefficient of living area is 75.18. It means that an increase of one square foot of living area is associated with a home price increase of about $75, holding the other variables constant. When interpreting coefficients, make sure to check the units of the variables in the data.
intercept: The intercept is 139878.80. A technical interpretation would be that a home would cost about $139K if all predictors were zero (e.g., living area = 0). Of course, living area can't be zero. Often, the intercept has no meaningful interpretation.
residual standard error: The typical difference between the actual home price and the home price predicted by the model is $59,370. In other words, the model's estimated home price misses the actual home price by about $59,370 on average.
Adjusted R-squared: The reported adjusted R^2 of the model is 0.6363. It means that about 63.63% of the variability in home price can be explained by the model, after adjusting for the number of predictors.
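
The numbers quoted in the interpretation above can also be pulled out of the summary object instead of being read off the printout. The following is a minimal sketch, assuming the mosaicData package is installed; the names houses_summary and coef_table are introduced here for illustration only.

```r
# Refit the example model (assumes the mosaicData package is installed)
data(SaratogaHouses, package = "mosaicData")
houses_lm <- lm(price ~ lotSize + age + landValue +
                  livingArea + bedrooms + bathrooms +
                  waterfront,
                data = SaratogaHouses)

# summary() returns a list; coef() on it gives the coefficient table as a matrix
houses_summary <- summary(houses_lm)
coef_table <- coef(houses_summary)

# Estimate for living area and p-value for age
coef_table["livingArea", "Estimate"]
coef_table["age", "Pr(>|t|)"]

# Residual standard error and adjusted R-squared
houses_summary$sigma
houses_summary$adj.r.squared
```

Indexing the table by row and column name avoids copying long decimals by hand.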

Q1 Build a regression model to predict the volume of trail users using `hightemp` and `precip`.

Hint: The variables are available in the RailTrail data set from the mosaicData package.

data(RailTrail, package="mosaicData")
Trail_Users <- lm(volume ~ hightemp + precip,
                  data = RailTrail)
# View summary of the trail users model
summary(Trail_Users)

## 
## Call:
## lm(formula = volume ~ hightemp + precip, data = RailTrail)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -271.311  -56.545    5.915   48.962  296.453 
## 
## Coefficients:
##              Estimate Std. Error t value        Pr(>|t|)    
## (Intercept)  -31.5197    55.2383  -0.571         0.56973    
## hightemp       6.1177     0.7941   7.704 0.0000000000197 ***
## precip      -153.2608    39.3071  -3.899         0.00019 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 96.68 on 87 degrees of freedom
## Multiple R-squared:  0.4377, Adjusted R-squared:  0.4247 
## F-statistic: 33.85 on 2 and 87 DF,  p-value: 0.00000000001334

Q2 Is the coefficient of `hightemp` statistically significant at 5%?

Yes. The coefficient of `hightemp` is statistically significant at the 5% level: its p-value is 0.0000000000197 (three stars), which is far below 0.05.
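
The comparison against the 5% level can also be done in code rather than by eye. This is a small sketch that assumes mosaicData is installed; p_hightemp is a name chosen here for illustration.

```r
# Refit the trail model (assumes the mosaicData package is installed)
data(RailTrail, package = "mosaicData")
Trail_Users <- lm(volume ~ hightemp + precip, data = RailTrail)

# Pull the p-value for hightemp from the coefficient table
p_hightemp <- coef(summary(Trail_Users))["hightemp", "Pr(>|t|)"]
p_hightemp

# TRUE means the coefficient is significant at the 5% level
p_hightemp < 0.05
```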

Q3 Interpret the coefficient of `hightemp`.

The coefficient of `hightemp` is 6.1177. It means that a one-degree increase in the daily high temperature is associated with about 6 more trail users that day, holding precipitation constant.
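
One way to check the meaning of a slope is to predict volume for two days that differ only in `hightemp`. The sketch below assumes mosaicData is installed; the two days (highs of 70 and 80 with no precipitation) are hypothetical values picked for illustration, not rows from the data.

```r
# Refit the trail model (assumes the mosaicData package is installed)
data(RailTrail, package = "mosaicData")
Trail_Users <- lm(volume ~ hightemp + precip, data = RailTrail)

# Two hypothetical days that differ only by 10 degrees of high temperature
days <- data.frame(hightemp = c(70, 80), precip = c(0, 0))
predicted <- predict(Trail_Users, newdata = days)

# The gap between the two predictions is 10 times the hightemp coefficient
gap <- unname(diff(predicted))
gap
10 * unname(coef(Trail_Users)["hightemp"])
```

Because `precip` is the same on both days, the whole gap comes from the temperature difference, which is what "holding the other variable constant" means.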

Q4 Is the intercept statistically significant at 5%?

No. The intercept's p-value is 0.56973, which is greater than 0.05, so the intercept is not statistically significant at the 5% level.

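Q5 Interpret the intercept.

The intercept is -31.5197. Technically, the model predicts about -32 trail users on a day when both `hightemp` and `precip` are zero. That prediction is not meaningful, because there cannot be a negative number of trail users, so the intercept has no practical interpretation here.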
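
The intercept is simply the model's prediction when every predictor is zero, which can be confirmed with predict(). This sketch assumes mosaicData is installed; zero_day is a hypothetical data frame created for the check. Printing the observed range of `hightemp` shows whether a zero-degree day is anywhere near the data the model was fitted on.

```r
# Refit the trail model (assumes the mosaicData package is installed)
data(RailTrail, package = "mosaicData")
Trail_Users <- lm(volume ~ hightemp + precip, data = RailTrail)

# Hypothetical day with both predictors set to zero
zero_day <- data.frame(hightemp = 0, precip = 0)
pred_zero <- unname(predict(Trail_Users, newdata = zero_day))
pred_zero

# Same value as the fitted intercept
unname(coef(Trail_Users)["(Intercept)"])

# Observed range of high temperatures in the data
range(RailTrail$hightemp)
```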

Q6 Interpret the reported residual standard error.

The reported residual standard error is 96.68. It means that the model's predicted number of trail users typically misses the actual number by about 97 users.
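
To see where that number comes from, the residual standard error can be recomputed from the residuals: the square root of the residual sum of squares divided by the residual degrees of freedom. A minimal sketch, assuming mosaicData is installed; rse_by_hand is an illustrative name.

```r
# Refit the trail model (assumes the mosaicData package is installed)
data(RailTrail, package = "mosaicData")
Trail_Users <- lm(volume ~ hightemp + precip, data = RailTrail)

# Residual standard error: sqrt(sum of squared residuals / residual df)
rse_by_hand <- sqrt(sum(residuals(Trail_Users)^2) / df.residual(Trail_Users))
rse_by_hand

# Same quantity as reported by summary(), via the sigma() helper
sigma(Trail_Users)
```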

Q7 Interpret the reported adjusted R squared.

The reported adjusted R^2 is 0.4247. This means that about 42.47% of the variability in the number of trail users can be explained by the model, after adjusting for the number of predictors.
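
The adjustment itself is a short formula: adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors. The sketch below applies it to the trail model, assuming mosaicData is installed; fit, n, p and adj_by_hand are names introduced for illustration.

```r
# Refit the trail model (assumes the mosaicData package is installed)
data(RailTrail, package = "mosaicData")
Trail_Users <- lm(volume ~ hightemp + precip, data = RailTrail)
fit <- summary(Trail_Users)

n <- nobs(Trail_Users)              # number of observations used in the fit
p <- length(coef(Trail_Users)) - 1  # number of predictors, excluding the intercept

# Adjusted R-squared penalizes R-squared for the number of predictors
adj_by_hand <- 1 - (1 - fit$r.squared) * (n - 1) / (n - p - 1)
adj_by_hand

# Value reported by summary()
fit$adj.r.squared
```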

Q8 Hide the messages, but display the code and its results on the webpage.

Hint: Use message, echo and results in the chunk options. Refer to the RMarkdown Reference Guide.
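
As one possible way to follow the hint, the same three options can be set once for every chunk from a setup chunk near the top of the document. This is a sketch, assuming the knitr package is installed; the per-chunk alternative is to put the same options in each chunk header, for example `{r message=FALSE, echo=TRUE, results="markup"}`.

```r
# Apply these options to all later chunks in the document
knitr::opts_chunk$set(
  message = FALSE,     # hide messages, such as package startup notes
  echo    = TRUE,      # show the code
  results = "markup"   # show the printed results
)
```

Warnings are controlled separately by the `warning` option, so they still appear unless that is also set to FALSE.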

Regression Analysis

Mia LeBoeuf
