library(tidyverse)
## ── Attaching packages ──────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.2.1 ✔ purrr 0.3.2
## ✔ tibble 2.1.3 ✔ dplyr 0.8.3
## ✔ tidyr 0.8.3 ✔ stringr 1.4.0
## ✔ readr 1.3.1 ✔ forcats 0.4.0
## ── Conflicts ─────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(scales)
##
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
##
## discard
## The following object is masked from 'package:readr':
##
## col_factor
options(scipen=999)
data(SaratogaHouses, package="mosaicData")
houses_lm <- lm(price ~ lotSize + age + landValue +
                  livingArea + bedrooms + bathrooms +
                  waterfront,
                data = SaratogaHouses)
# View summary of model 1
summary(houses_lm)
##
## Call:
## lm(formula = price ~ lotSize + age + landValue + livingArea +
## bedrooms + bathrooms + waterfront, data = SaratogaHouses)
##
## Residuals:
## Min 1Q Median 3Q Max
## -220208 -35416 -5443 27570 464320
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 139878.80484 16472.92736 8.491 < 0.0000000000000002 ***
## lotSize 7500.79232 2075.13554 3.615 0.000309 ***
## age -136.04011 54.15794 -2.512 0.012099 *
## landValue 0.90931 0.04583 19.841 < 0.0000000000000002 ***
## livingArea 75.17866 4.15811 18.080 < 0.0000000000000002 ***
## bedrooms -5766.75988 2388.43256 -2.414 0.015863 *
## bathrooms 24547.10644 3332.26775 7.366 0.000000000000271 ***
## waterfrontNo -120726.62066 15600.82783 -7.738 0.000000000000017 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 59370 on 1720 degrees of freedom
## Multiple R-squared: 0.6378, Adjusted R-squared: 0.6363
## F-statistic: 432.6 on 7 and 1720 DF, p-value: < 0.00000000000000022
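The coefficient table printed above can also be pulled out of the fitted model as numbers, which makes it easier to check an interpretation. The sketch below refits the same model and assumes the mosaicData package is installed. The names `coef_table`, `ci_95`, `base_house`, `bigger_house`, and `price_change` are illustrative only. It extracts the estimates and p-values, computes 95% confidence intervals, and checks that adding one square foot of living area to an otherwise identical house shifts the predicted price by the `livingArea` coefficient.

```r
# Assumes the mosaicData package is installed, as in the model fit above.
data(SaratogaHouses, package = "mosaicData")
houses_lm <- lm(price ~ lotSize + age + landValue +
                  livingArea + bedrooms + bathrooms + waterfront,
                data = SaratogaHouses)

# Coefficient table as a numeric matrix with columns
# Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- coef(summary(houses_lm))

# 95% confidence interval for each coefficient
ci_95 <- confint(houses_lm, level = 0.95)

# Take one house from the data and add a single square foot of living area,
# leaving every other variable unchanged (illustrative names).
base_house <- SaratogaHouses[1, ]
bigger_house <- base_house
bigger_house$livingArea <- base_house$livingArea + 1

# The change in predicted price equals the livingArea coefficient.
price_change <- unname(predict(houses_lm, bigger_house) -
                         predict(houses_lm, base_house))

print(round(coef_table, 4))
print(round(ci_95, 2))
print(price_change)
```

Because the model is linear, `price_change` does not depend on which house is used as the starting point.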
Interpretation
The Intercept line, for example, has three stars, which indicates that the coefficient is significant at the 0.1% significance level (a very low p-value). It means that if the true intercept were zero, an estimate this far from zero would be expected less than 0.1% of the time. On the other hand, the variable age has only one star. It means that the coefficient of age is significant at the 5% level but not at the 1% level, so the evidence that age helps explain home prices is weaker. If a variable had no star, its coefficient would not be significant at the 5% level. In other words, the data would not give enough evidence that changes in that variable are associated with changes in home prices.

The coefficient of living area is 75.18. It means that an increase of one square foot of living area is associated with a home price increase of about $75, holding the other variables constant. When interpreting coefficients, make sure to check the units of the variables in the data.

The intercept is the predicted home price when the explanatory variables are all zero (for example, living area = 0). Of course, living area can't be zero. Often, the intercept has no meaningful interpretation.

## Q1 Build a regression model to predict volume using hightemp and precip.

Hint: The variables are available in the RailTrail data set from the mosaicData package.
data(RailTrail, package="mosaicData")
trailusers_lm <- lm(volume ~ hightemp + precip,
                    data = RailTrail)
# View summary of the trail users model
summary(trailusers_lm)
##
## Call:
## lm(formula = volume ~ hightemp + precip, data = RailTrail)
##
## Residuals:
## Min 1Q Median 3Q Max
## -271.311 -56.545 5.915 48.962 296.453
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -31.5197 55.2383 -0.571 0.56973
## hightemp 6.1177 0.7941 7.704 0.0000000000197 ***
## precip -153.2608 39.3071 -3.899 0.00019 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 96.68 on 87 degrees of freedom
## Multiple R-squared: 0.4377, Adjusted R-squared: 0.4247
## F-statistic: 33.85 on 2 and 87 DF, p-value: 0.00000000001334
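The p-values in the coefficient table above can be compared against a 5% threshold in code rather than by reading the stars. This is a minimal sketch that refits the same model and assumes the mosaicData package is installed. The names `p_values`, `alpha`, and `significant` are illustrative.

```r
# Assumes the mosaicData package is installed, as in the model fit above.
data(RailTrail, package = "mosaicData")
trailusers_lm <- lm(volume ~ hightemp + precip, data = RailTrail)

# Pull the p-value column out of the coefficient table
p_values <- coef(summary(trailusers_lm))[, "Pr(>|t|)"]

# Flag each coefficient as significant or not at the 5% level
alpha <- 0.05
significant <- p_values < alpha

print(data.frame(p_value = signif(p_values, 3),
                 significant_at_5pct = significant))
```

A row flagged `FALSE` corresponds to a line with no star (or only a dot) in the printed summary.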
## Q2 Is the coefficient of hightemp statistically significant at 5%?

Yes, the coefficient of hightemp is statistically significant at 5% because its p-value is 0.0000000000197, which is less than 0.05.

## Q3 Interpret the coefficient of hightemp.

The coefficient of hightemp is 6.12. It means that a one-degree increase in the daily high temperature is associated with about 6 more trail users, holding precip constant. The three stars at the end of the row indicate that the coefficient is significant at the 0.1% level.

## Q4 Is the intercept statistically significant at 5%?

No, the intercept is not statistically significant at 5% because its p-value is 0.56973, which is greater than 0.05. The intercept row has no stars.

## Q5 Interpret the intercept.

The intercept is -31.52. It is the predicted number of trail users when hightemp and precip are both zero. A negative number of trail users is impossible, so the intercept has no practical meaning here, and it is not statistically different from zero.

## Q6 Interpret the reported residual standard error.

The reported residual standard error is 96.68. It is the typical size of the difference between the predicted number of trail users and the actual number of trail users.

## Q7 Interpret the reported adjusted R squared.

The reported adjusted R^2 is 0.4247. This means that about 42.47% of the variability in trail users can be explained by the model, after adjusting for the number of explanatory variables.

## Q8 Hide the messages, but display the code and its results on the webpage.

Hint: Use message, echo and results in the chunk options. Refer to the RMarkdown Reference Guide.
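The residual standard error and adjusted R squared from Q6 and Q7 can be reproduced by hand from the residuals, which may help make their meaning concrete. The following is a sketch under the assumption that the mosaicData package is installed. The names `res`, `n`, `p`, `rss`, `tss`, `rse`, `r2`, and `adj_r2` are illustrative.

```r
# Assumes the mosaicData package is installed, as in the model fit above.
data(RailTrail, package = "mosaicData")
trailusers_lm <- lm(volume ~ hightemp + precip, data = RailTrail)

res <- residuals(trailusers_lm)          # actual minus predicted trail users
n <- length(res)                         # number of observations
p <- length(coef(trailusers_lm)) - 1     # number of explanatory variables

# Residual standard error: square root of the residual sum of squares
# divided by the residual degrees of freedom (n - p - 1)
rss <- sum(res^2)
rse <- sqrt(rss / (n - p - 1))

# R squared, then the adjustment for the number of explanatory variables
tss <- sum((RailTrail$volume - mean(RailTrail$volume))^2)
r2 <- 1 - rss / tss
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(c(rse = rse, r2 = r2, adj_r2 = adj_r2))
```

The values should agree with `sigma(trailusers_lm)` and `summary(trailusers_lm)$adj.r.squared`.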