Introduction:

My goal for this regression project is to predict the sale price of residential properties.

Thesis:

I believe that home type, neighborhood, sale year, and number of bedrooms will significantly influence the sale price of a home.

Data Description:

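The Ames data set, as loaded here, has 2,930 rows and 79 columns (46 character and 33 numeric). It records the sale price of each home along with characteristics that may affect that price.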

Method:

I fit a multiple linear regression model to predict sale_price from four predictors: home type (an indicator for single-family homes), neighborhood, sale year, and number of bedrooms above ground (capped at 6).
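Written out, the model has roughly the following form. This is a sketch that assumes R's default treatment coding, in which neighborhood enters as K - 1 indicator variables measured against a reference neighborhood. The symbols (the betas, the gammas, K, and the error term) are notation introduced here for illustration, not something taken from the analysis itself.

```latex
\text{sale\_price}_i = \beta_0
  + \beta_1 \, \text{one\_family\_house}_i
  + \beta_2 \, \text{year\_sold}_i
  + \beta_3 \, \text{bedroom\_capped}_i
  + \sum_{k=2}^{K} \gamma_k \, \mathbf{1}[\text{neighborhood}_i = k]
  + \varepsilon_i
```

Under this reading, each gamma is the expected price difference between neighborhood k and the reference neighborhood, holding the other predictors fixed.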

Key Findings:

library(tidyverse)  # provides read_csv(), the dplyr verbs, and ggplot2

ames <- read_csv("ames.csv")
## Rows: 2930 Columns: 79
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (46): MSSubClass, MSZoning, Street, Alley, LotShape, LandContour, Utilit...
## dbl (33): LotFrontage, LotArea, YearBuilt, YearRemodAdd, MasVnrArea, BsmtFin...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Standardize column names to snake_case
ames <- ames %>%
  janitor::clean_names()

# Indicator for single-family homes; bedroom counts above 6 are capped at 6
ames <- ames %>%
  mutate(
    one_family_house = if_else(bldg_type == "OneFam", 1, 0),
    bedroom_capped = pmin(bedroom_abv_gr, 6)
  )



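# Fit the multiple linear regression
m <- lm(sale_price ~ one_family_house + year_sold + neighborhood + bedroom_capped, data = ames)

# In-sample fit: RMSE (in the units of sale_price) and R-squared
predicted <- predict(m, ames)
rmse <- sqrt(mean((predicted - ames$sale_price)^2, na.rm = TRUE))  # skip rows with missing values
rsq <- summary(m)$r.squared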
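The RMSE above is computed on the same rows used to fit the model, so it may understate the error on new homes. Below is a minimal sketch of a held-out comparison. It uses simulated toy data rather than the Ames file, so the numbers it prints say nothing about the real model. The names toy, rmse_of, rmse_train, and rmse_test are hypothetical and introduced only for this illustration.

```r
# Simulated stand-in data (NOT the Ames data; coefficients are made up)
set.seed(42)
n <- 200
toy <- data.frame(
  bedrooms   = sample(1:6, n, replace = TRUE),
  one_family = rbinom(n, 1, 0.8)
)
toy$price <- 100000 + 20000 * toy$bedrooms + 30000 * toy$one_family +
  rnorm(n, sd = 15000)

# Hold out 20% of the rows for testing
test_idx <- sample(seq_len(n), size = n / 5)
train <- toy[-test_idx, ]
test  <- toy[test_idx, ]

# Fit on the training rows only
fit <- lm(price ~ bedrooms + one_family, data = train)

# Hypothetical helper: root mean squared error, ignoring missing values
rmse_of <- function(actual, predicted) {
  sqrt(mean((predicted - actual)^2, na.rm = TRUE))
}

rmse_train <- rmse_of(train$price, predict(fit, train))
rmse_test  <- rmse_of(test$price, predict(fit, test))
cat("train RMSE:", round(rmse_train), " test RMSE:", round(rmse_test), "\n")
```

The same pattern would apply to the model here by splitting ames before calling lm(). One caveat: predict() stops with an error if the test rows contain a neighborhood that never appears in the training rows.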


# bedroom_capped has too few distinct values for the default GAM smoother,
# so draw a linear trend instead
ggplot(ames, aes(x = bedroom_capped, y = sale_price)) +
  geom_jitter() +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(title = "Relationship between Sale Price and Number of Bedrooms",
       x = "Bedrooms above ground (capped at 6)", y = "Sale price")
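The thesis claims that each predictor significantly influences sale price. One common way to check this for a categorical predictor such as neighborhood is a partial F-test, which compares the model with and without that predictor. The sketch below uses simulated toy data, not the Ames file, and the names toy, shift, full, reduced, f_test, and p_value are hypothetical.

```r
# Simulated stand-in data (NOT the Ames data; effects are made up)
set.seed(1)
n <- 300
toy <- data.frame(
  neighborhood = sample(c("A", "B", "C"), n, replace = TRUE),
  bedrooms     = sample(1:6, n, replace = TRUE)
)
shift <- c(A = 0, B = 25000, C = 60000)  # assumed neighborhood effects
toy$price <- 120000 + 15000 * toy$bedrooms +
  unname(shift[toy$neighborhood]) + rnorm(n, sd = 20000)

# Models with and without the neighborhood term
full    <- lm(price ~ bedrooms + neighborhood, data = toy)
reduced <- lm(price ~ bedrooms, data = toy)

# Partial F-test: does adding neighborhood improve the fit?
f_test  <- anova(reduced, full)
p_value <- f_test[["Pr(>F)"]][2]
print(f_test)
```

For the fitted model m, calling drop1(m, test = "F") should give the analogous test for each term in turn, and summary(m) reports per-coefficient t-tests.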