R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

library(tidyverse)
## Warning: package 'ggplot2' was built under R version 4.3.2
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(broom)
library(ggpubr)
library(ggrepel)
library(dplyr)


homes <- read.csv("D:/DataSet/Homes.csv")

Build Linear Model

I will build a linear model to predict home price from the number of bedrooms, the number of bathrooms, and square footage.

# Build linear model
lm_model <- lm(price ~ beds + bath + sqft, data = homes)

summary(lm_model)
## 
## Call:
## lm(formula = price ~ beds + bath + sqft, data = homes)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -8861628  -603743   -40335   456976 12242389 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -848739.0   174017.2  -4.877 1.46e-06 ***
## beds        -959036.4   121405.1  -7.899 1.87e-14 ***
## bath         777490.8   173156.3   4.490 8.89e-06 ***
## sqft           2268.5      163.6  13.865  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1863000 on 488 degrees of freedom
## Multiple R-squared:  0.5673, Adjusted R-squared:  0.5646 
## F-statistic: 213.2 on 3 and 488 DF,  p-value: < 2.2e-16

The R-squared value is 0.5673, meaning this model explains about 57% of the variance in home price. The p-values for all predictors are very low (all below 0.001), indicating they are statistically significant.
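To make the R-squared figure concrete, the sketch below recomputes it by hand as 1 - RSS/TSS and compares it with the value reported by `summary()`. Homes.csv is not available here, so it assumes R's built-in `mtcars` data as a stand-in; the object names (`fit`, `r2_manual`, and so on) are illustrative only.

```r
# Stand-in data: mtcars ships with R (an assumption; the report itself uses Homes.csv)
fit <- lm(mpg ~ wt + hp + disp, data = mtcars)

# R-squared as reported by summary()
r2_reported <- summary(fit)$r.squared

# R-squared by hand: 1 - residual sum of squares / total sum of squares
rss <- sum(residuals(fit)^2)
tss <- sum((mtcars$mpg - mean(mtcars$mpg))^2)
r2_manual <- 1 - rss / tss

# The two values should agree up to floating-point error
print(c(reported = r2_reported, manual = r2_manual))
```

The same two lines applied to `lm_model` and `homes$price` should reproduce the 0.5673 shown in the summary output.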

Model Diagnostics

Now I will check for issues with the model assumptions using base R diagnostic plots and the broom package.

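# Check for non-linearity
# plot() draws the diagnostic plots as a side effect and returns NULL,
# so there is nothing useful to assign
plot(lm_model)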

# Check for equal variance 
# Extract model data as dataframe
model_data <- augment(lm_model) 

# Plot residuals vs fitted 
ggplot(model_data, aes(x = .fitted, y = .std.resid)) +
  geom_point() +
  geom_smooth(se = FALSE)
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

The residuals appear randomly scattered around 0, so the assumption of equal variance looks valid.
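A visual check like this can be backed by a formal test. The sketch below computes the studentized (Koenker) form of the Breusch-Pagan test by hand: regress the squared residuals on the same predictors and compare n times the auxiliary R-squared with a chi-squared distribution. It assumes the built-in `mtcars` data as a stand-in for Homes.csv, and the object names are illustrative; the `lmtest` package offers `bptest()` for the same purpose.

```r
# Stand-in model on mtcars (an assumption; the report's model uses Homes.csv)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Auxiliary regression: squared residuals on the same predictors
aux_data <- data.frame(res_sq = residuals(fit)^2, mtcars[, c("wt", "hp")])
aux <- lm(res_sq ~ wt + hp, data = aux_data)

# Test statistic: n * R-squared of the auxiliary regression,
# chi-squared with as many degrees of freedom as there are predictors
n <- nrow(aux_data)
k <- 2
bp_stat <- n * summary(aux)$r.squared
bp_p <- pchisq(bp_stat, df = k, lower.tail = FALSE)

# A small p-value would suggest the residual variance is not constant
print(c(statistic = bp_stat, p_value = bp_p))
```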

The residuals appear normally distributed. Overall, the diagnostics indicate this linear model is reasonable.
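Normality of the residuals can also be checked numerically. The sketch below runs a Shapiro-Wilk test on the standardized residuals and computes the correlation between theoretical and sample quantiles, which is what a Q-Q plot displays. It assumes the built-in `mtcars` data as a stand-in for Homes.csv; the object names are illustrative.

```r
# Stand-in model on mtcars (an assumption; the report's model uses Homes.csv)
fit <- lm(mpg ~ wt + hp, data = mtcars)
std_res <- rstandard(fit)

# Shapiro-Wilk test: a small p-value suggests a departure from normality
sw <- shapiro.test(std_res)
print(sw$p.value)

# Q-Q points without drawing: theoretical (x) vs. sample (y) quantiles.
# A correlation close to 1 means the points lie near a straight line.
qq <- qqnorm(std_res, plot.it = FALSE)
qq_cor <- cor(qq$x, qq$y)
print(qq_cor)
```

With residuals as heavy-tailed as the summary output suggests (a maximum of about 12.2 million against a residual standard error of about 1.86 million), running the same check on `lm_model` seems worthwhile before relying on the normality assumption.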

Interpret Coefficients

The coefficients can be interpreted as follows:

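For each additional bedroom, price decreases by about $959,036 on average, holding other variables constant. For each additional bathroom, price increases by about $777,491 on average, holding other variables constant. For each additional square foot, price increases by about $2,268.54 on average, holding other variables constant. The negative bedroom coefficient is counterintuitive; one plausible reading is that, with square footage and bathrooms held fixed, an extra bedroom means smaller rooms, though correlation among the predictors may also play a part.

So overall, this model gives useful insights into how home characteristics influence price. More data on neighborhood, age, etc. could likely improve the model further.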
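As a worked example of the "holding other variables constant" reading, the sketch below plugs the estimates from the summary output into the fitted equation for a hypothetical home (3 bedrooms, 2 bathrooms, 2,000 square feet), then adds one bedroom and compares. The home's characteristics and the object names are assumptions made for illustration; only the coefficient values come from the model output.

```r
# Coefficient estimates copied from the summary(lm_model) output
coefs <- c(intercept = -848739.0, beds = -959036.4, bath = 777490.8, sqft = 2268.5)

# Hypothetical home (illustrative values, not taken from the data)
home <- c(intercept = 1, beds = 3, bath = 2, sqft = 2000)

# Predicted price = intercept + sum of coefficient * value
predicted <- sum(coefs * home)
print(predicted)  # about 2366133

# Same home with one more bedroom, everything else unchanged
one_more_bed <- home
one_more_bed["beds"] <- 4
price_change <- sum(coefs * one_more_bed) - predicted
print(price_change)  # equals the beds coefficient, about -959036
```

The difference between the two predictions is exactly the bedroom coefficient, which is what the interpretation states.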

# Code for coefficient interpretation
b_beds <- coef(lm_model)["beds"]
b_bath <- coef(lm_model)["bath"]
b_sqft <- coef(lm_model)["sqft"]

# Build the sentence, choosing "increases" or "decreases" from the coefficient's sign
describe_effect <- function(unit, b, digits = 0) {
  direction <- if (b >= 0) "increases" else "decreases"
  paste0("For each additional ", unit, ", price ", direction, " by $",
         round(abs(b), digits), " on average holding other variables constant.")
}

print(describe_effect("bedroom", b_beds))
## [1] "For each additional bedroom, price decreases by $959036 on average holding other variables constant."
print(describe_effect("bathroom", b_bath))
## [1] "For each additional bathroom, price increases by $777491 on average holding other variables constant."
print(describe_effect("square foot", b_sqft, digits = 2))
## [1] "For each additional square foot, price increases by $2268.54 on average holding other variables constant."

Overall, the model provides some useful insights into the factors that influence home prices in this data set. Additional data could likely improve the model.