The goal of this project was to create a “heatmap” of price per square foot living space in the King County area and create a predictive multiple regression model for explaining price based off of a number of different variables. The data for these sales comes from the official public records of home sales in the King County area.

Here is a greatly detailed list of King County Housing Sales variables and what they represent: http://your.kingcounty.gov/assessor/eRealProperty/ResGlossaryOfTerms.html

OR for quick reference you can here are the variables used by their column name with a brief description:

~ Initial Variables


id - Unique ID for each home sold
date - Date of the home sale
price - Price of each home sold
bedrooms - Number of bedrooms
bathrooms - Number of bathrooms, where .5 accounts for a room with a toilet but no shower
sqft_living - Square footage of the apartments interior living space
sqft_lot - Square footage of the land space
floors - Number of floors
waterfront - A dummy variable for whether the apartment was overlooking the waterfront or not 1’s represent a waterfront property, 0’s represent a non-waterfront property
view - An index from 0 to 4 of how good the view of the property was, 0 - lowest, 4 - highest
condition - An index from 1 to 5 on the condition of the apartment, 1 - lowest, 4 - highest
grade - An index from 1 to 13, where 1-3 falls short of building construction and design, 7 has an average level of construction and design, and 11-13 have a high quality level of construction and design.
sqft_above - The square footage of the interior housing space that is above ground level
sqft_basement - The square footage of the interior housing space that is below ground level
yr_built - The year the house was initially built
yr_renovated - The year of the house’s last renovation
zipcode - What zipcode area the house is in
lat - Lattitude
long - Longitude
sqft_living15 - The square footage of interior housing living space for the nearest 15 neighbors
sqft_lot15 - The square footage of the land lots of the nearest 15 neighbors

~ Variables I created


Bedr_bathr_ratio - # of bedrooms / # bathrooms
log_sqft_living - Logarithm base 10 of square foot living space
log_bedbathratio - Logarithm base 10 of the bedr_bathr_ratio
rennovate_index - If the house was renovated : The difference between the year 2015 and the year the house was last rennovated. If it was never renovated : The difference between the year 2015 and the year it was built
log_sqftliving15 - Logarithm base 10 of the square foot living spaces of the 15 nearest neighbors
left_date - Date without the useless T00000 appendage
log_total_sqft - Logarithm base 10 of the sum of sqft living and sqft lot
Geo_dist_index - to show how sales price is generally much higher when its within Seattle city limits and lower when its not, based off lattitude and longitude from just South of South Lake Union.

~

First the heatmap:

The purpose of the “heatmap” is to show “hotspots” where the price per square foot of living is quite high based off of its geographical longitude and lattitude. This way we can look at what areas are more and less affordable to live in.

Here are the libraries you should load up and the assigned csv file itself:

library(ggplot2)
library(maps)
## 
##  # ATTENTION: maps v3.0 has an updated 'world' map.        #
##  # Many country borders and names have changed since 1990. #
##  # Type '?world' or 'news(package="maps")'. See README_v3. #
library(rbokeh)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
hspricing <- read.csv("KC_home_sales.csv", header = TRUE)

And here we assign price_per_sqft_living as a new column in our data frame. Then create the heatmap using longitude and lattitude for looking at price_per_sqft_living.

hspricing$price_per_sqft_living <- (hspricing$price/hspricing$sqft_living)

price_per_sqft_living_lat_long_map <- gmap(lat = mean(hspricing$lat), lng = mean(hspricing$long), 
                                           zoom = 11,
                                           width = 680, height = 600) %>%
  ly_points(hspricing$long, hspricing$lat, data = hspricing, 
            color = hspricing$price_per_sqft_living, alpha(1)
  )

The darker shaded areas are the more expensive square foot values, the lighter shaded areas are the less expensive square foot values.

It’s noteworthy that Bellevue has the largest cluster of expensive square foot living space prices and there are some smaller clusters of expensive square foot living space prices in Upper Queen Anne, Vashon Island, and Kirkland near the water front.

Next, the model for housing prices is going to be created.

Before it’s created we should look through the normal quantile plots to assess the normality of our independent variables. Normal quantile plots should be roughly linear if these variables are normally distributed. It plots the expected vs. actual values for a given independent variable.

Here’s the code:

par(mfrow=c(2,3))
for (i in 3:8){
  qqnorm(hspricing[,i], xlab=colnames(hspricing[i]), col="blue")
}

par(mfrow=c(2,3))
for (i in 9:14){
  qqnorm(hspricing[,i], xlab=colnames(hspricing[i]), col="green")
}

par(mfrow=c(2,3))
for (i in 15:20){
  qqnorm(hspricing[,i], xlab=colnames(hspricing[i]), col="black")
}

Then a multiple regression model was formed to model housing prices. The model shows housing prices as explained by log_sqft_living, Left_date, log_bedbathratio, (sqft_above/sqft_basement), grade, waterfront, log_sqftliving15, geo_dist_index, and rennovate_index.

fit <- lm(hspricing$price~hspricing$log_sqft_living+ hspricing$Left_date +
            hspricing$log_bedbathratio +(hspricing$sqft_above/hspricing$sqft_basement) +
            hspricing$grade+hspricing$waterfront+hspricing$log_sqftliving15 +
            hspricing$Geo_dist_index+ hspricing$rennovate_index)

Which out of many models that I explored, tended to have the highest corelation coefficient of .853

## 
## Call:
## lm(formula = hspricing$price ~ hspricing$log_sqft_living + hspricing$Left_date + 
##     hspricing$log_bedbathratio + (hspricing$sqft_above/hspricing$sqft_basement) + 
##     hspricing$grade + hspricing$waterfront + hspricing$log_sqftliving15 + 
##     hspricing$Geo_dist_index + hspricing$rennovate_index)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1878526   -99998   -12014    81868  2690036 
## 
## Coefficients:
##                                                Estimate Std. Error t value
## (Intercept)                                  -5.285e+07  5.957e+06  -8.872
## hspricing$log_sqft_living                    -4.307e+05  1.997e+04 -21.562
## hspricing$Left_date                           2.667e+00  2.957e-01   9.020
## hspricing$log_bedbathratio-0.026328939       -8.699e+05  1.012e+05  -8.596
## hspricing$log_bedbathratio-0.034762106       -1.060e+06  9.759e+04 -10.857
## hspricing$log_bedbathratio-0.041392685       -3.500e+05  1.360e+05  -2.573
## hspricing$log_bedbathratio-0.051152522       -1.055e+06  9.783e+04 -10.782
## hspricing$log_bedbathratio-0.057991947       -5.496e+06  2.280e+05 -24.101
## hspricing$log_bedbathratio-0.06069784        -5.237e+05  1.678e+05  -3.122
## hspricing$log_bedbathratio-0.06694679        -1.050e+06  9.789e+04 -10.728
## hspricing$log_bedbathratio-0.074633618       -7.444e+05  1.208e+05  -6.164
## hspricing$log_bedbathratio-0.079181246       -8.096e+05  1.363e+05  -5.941
## hspricing$log_bedbathratio-0.096910013       -1.044e+06  9.745e+04 -10.711
## hspricing$log_bedbathratio-0.111150452        2.376e+06  2.159e+05  11.005
## hspricing$log_bedbathratio-0.113943352       -9.325e+05  2.153e+05  -4.332
## hspricing$log_bedbathratio-0.118099312       -9.234e+05  1.291e+05  -7.151
## hspricing$log_bedbathratio-0.124938737       -8.389e+05  1.111e+05  -7.549
## hspricing$log_bedbathratio-0.130333768       -1.088e+06  2.181e+05  -4.987
## hspricing$log_bedbathratio-0.138302698       -1.003e+06  1.038e+05  -9.665
## hspricing$log_bedbathratio-0.151267675       -7.100e+05  1.243e+05  -5.710
## hspricing$log_bedbathratio-0.157607853       -1.397e+06  2.152e+05  -6.492
## hspricing$log_bedbathratio-0.176091259       -1.047e+06  1.027e+05 -10.193
## hspricing$log_bedbathratio-0.210853365       -1.146e+06  1.182e+05  -9.699
## hspricing$log_bedbathratio-0.243038049       -9.633e+05  1.295e+05  -7.440
## hspricing$log_bedbathratio-0.301029996       -1.151e+06  1.246e+05  -9.238
## hspricing$log_bedbathratio-0.352182518       -1.098e+06  1.364e+05  -8.048
## hspricing$log_bedbathratio-0.397940009       -9.940e+05  1.669e+05  -5.955
## hspricing$log_bedbathratio#DIV/0!            -9.591e+05  1.127e+05  -8.507
## hspricing$log_bedbathratio#NUM!              -1.043e+06  1.246e+05  -8.370
## hspricing$log_bedbathratio0                  -1.049e+06  9.691e+04 -10.826
## hspricing$log_bedbathratio0.015794267        -3.063e+06  2.157e+05 -14.202
## hspricing$log_bedbathratio0.022276395        -7.866e+05  1.123e+05  -7.003
## hspricing$log_bedbathratio0.028028724        -8.954e+05  9.885e+04  -9.058
## hspricing$log_bedbathratio0.037788561        -1.067e+06  9.734e+04 -10.964
## hspricing$log_bedbathratio0.045757491        -9.888e+05  1.016e+05  -9.732
## hspricing$log_bedbathratio0.057991947        -1.050e+06  9.683e+04 -10.839
## hspricing$log_bedbathratio0.070581074        -1.058e+06  1.037e+05 -10.201
## hspricing$log_bedbathratio0.079181246        -1.082e+06  9.674e+04 -11.187
## hspricing$log_bedbathratio0.085430195        -1.418e+06  2.152e+05  -6.590
## hspricing$log_bedbathratio0.09017663         -9.755e+05  9.722e+04 -10.033
## hspricing$log_bedbathratio0.096910013        -9.707e+05  1.003e+05  -9.681
## hspricing$log_bedbathratio0.101457641        -6.898e+05  1.470e+05  -4.694
## hspricing$log_bedbathratio0.104735351        -2.005e+06  2.153e+05  -9.313
## hspricing$log_bedbathratio0.124938737        -1.073e+06  9.676e+04 -11.085
## hspricing$log_bedbathratio0.14976232         -1.260e+06  1.179e+05 -10.688
## hspricing$log_bedbathratio0.15490196         -1.169e+06  9.756e+04 -11.982
## hspricing$log_bedbathratio0.162727297        -1.088e+06  9.693e+04 -11.224
## hspricing$log_bedbathratio0.16840443         -9.241e+05  1.666e+05  -5.545
## hspricing$log_bedbathratio0.176091259        -1.056e+06  9.689e+04 -10.894
## hspricing$log_bedbathratio0.187086643        -1.058e+06  9.793e+04 -10.807
## hspricing$log_bedbathratio0.191885526        -9.309e+05  1.470e+05  -6.333
## hspricing$log_bedbathratio0.204119983        -1.118e+06  9.672e+04 -11.555
## hspricing$log_bedbathratio0.21670911         -6.403e+05  1.667e+05  -3.841
## hspricing$log_bedbathratio0.22184875         -1.081e+06  9.777e+04 -11.059
## hspricing$log_bedbathratio0.234083206        -1.068e+06  9.683e+04 -11.029
## hspricing$log_bedbathratio0.243038049        -1.146e+06  1.293e+05  -8.864
## hspricing$log_bedbathratio0.249877473        -1.088e+06  9.695e+04 -11.227
## hspricing$log_bedbathratio0.259637311        -1.103e+06  9.752e+04 -11.314
## hspricing$log_bedbathratio0.266267889        -1.240e+06  1.113e+05 -11.141
## hspricing$log_bedbathratio0.271066772        -1.319e+06  1.667e+05  -7.912
## hspricing$log_bedbathratio0.279840697        -1.122e+06  2.151e+05  -5.216
## hspricing$log_bedbathratio0.301029996        -1.100e+06  9.676e+04 -11.372
## hspricing$log_bedbathratio0.329058719        -1.109e+06  2.152e+05  -5.154
## hspricing$log_bedbathratio0.333214679        -1.686e+06  2.152e+05  -7.834
## hspricing$log_bedbathratio0.338818557        -1.123e+06  1.026e+05 -10.949
## hspricing$log_bedbathratio0.346787486        -1.082e+06  9.830e+04 -11.010
## hspricing$log_bedbathratio0.352182518        -1.358e+06  2.152e+05  -6.313
## hspricing$log_bedbathratio0.359021943        -1.088e+06  9.701e+04 -11.211
## hspricing$log_bedbathratio0.367976785        -8.286e+05  1.471e+05  -5.632
## hspricing$log_bedbathratio0.380211242        -1.185e+06  1.023e+05 -11.583
## hspricing$log_bedbathratio0.391206626        -1.639e+06  2.151e+05  -7.617
## hspricing$log_bedbathratio0.397940009        -1.122e+06  9.846e+04 -11.395
## hspricing$log_bedbathratio0.405765346        -1.177e+06  1.473e+05  -7.990
## hspricing$log_bedbathratio0.425968732        -1.108e+06  9.736e+04 -11.378
## hspricing$log_bedbathratio0.447158031        -1.166e+06  1.669e+05  -6.990
## hspricing$log_bedbathratio0.455931956        -1.120e+06  9.816e+04 -11.414
## hspricing$log_bedbathratio0.463757293        -9.866e+05  1.472e+05  -6.701
## hspricing$log_bedbathratio0.477121255        -1.097e+06  9.684e+04 -11.329
## hspricing$log_bedbathratio0.492915522        -1.116e+06  1.472e+05  -7.580
## hspricing$log_bedbathratio0.505149978        -9.925e+05  2.153e+05  -4.609
## hspricing$log_bedbathratio0.522878745        -1.123e+06  1.006e+05 -11.168
## hspricing$log_bedbathratio0.535113202        -1.139e+06  1.080e+05 -10.539
## hspricing$log_bedbathratio0.56427143         -1.204e+06  2.152e+05  -5.593
## hspricing$log_bedbathratio0.602059991        -1.115e+06  9.729e+04 -11.457
## hspricing$log_bedbathratio0.669006781        -1.270e+06  1.668e+05  -7.610
## hspricing$log_bedbathratio0.698970004        -1.117e+06  1.010e+05 -11.058
## hspricing$log_bedbathratio0.726998728        -1.463e+06  1.669e+05  -8.767
## hspricing$log_bedbathratio0.77815125         -9.953e+05  1.246e+05  -7.987
## hspricing$log_bedbathratio0.84509804         -1.307e+06  2.153e+05  -6.072
## hspricing$log_bedbathratio0.903089987        -1.188e+06  2.153e+05  -5.519
## hspricing$log_bedbathratio1.275475891        -8.886e+05  2.153e+05  -4.127
## hspricing$sqft_above                          2.183e+02  3.989e+00  54.723
## hspricing$grade                               9.138e+04  2.061e+03  44.335
## hspricing$waterfront                          6.875e+05  1.549e+04  44.372
## hspricing$log_sqftliving15                    3.353e+05  1.507e+04  22.249
## hspricing$Geo_dist_index                     -5.887e+05  9.541e+03 -61.705
## hspricing$rennovate_index                     1.489e+03  6.372e+01  23.374
## hspricing$sqft_above:hspricing$sqft_basement  1.161e-01  1.998e-03  58.104
##                                              Pr(>|t|)    
## (Intercept)                                   < 2e-16 ***
## hspricing$log_sqft_living                     < 2e-16 ***
## hspricing$Left_date                           < 2e-16 ***
## hspricing$log_bedbathratio-0.026328939        < 2e-16 ***
## hspricing$log_bedbathratio-0.034762106        < 2e-16 ***
## hspricing$log_bedbathratio-0.041392685       0.010093 *  
## hspricing$log_bedbathratio-0.051152522        < 2e-16 ***
## hspricing$log_bedbathratio-0.057991947        < 2e-16 ***
## hspricing$log_bedbathratio-0.06069784        0.001802 ** 
## hspricing$log_bedbathratio-0.06694679         < 2e-16 ***
## hspricing$log_bedbathratio-0.074633618       7.20e-10 ***
## hspricing$log_bedbathratio-0.079181246       2.88e-09 ***
## hspricing$log_bedbathratio-0.096910013        < 2e-16 ***
## hspricing$log_bedbathratio-0.111150452        < 2e-16 ***
## hspricing$log_bedbathratio-0.113943352       1.48e-05 ***
## hspricing$log_bedbathratio-0.118099312       8.92e-13 ***
## hspricing$log_bedbathratio-0.124938737       4.54e-14 ***
## hspricing$log_bedbathratio-0.130333768       6.17e-07 ***
## hspricing$log_bedbathratio-0.138302698        < 2e-16 ***
## hspricing$log_bedbathratio-0.151267675       1.14e-08 ***
## hspricing$log_bedbathratio-0.157607853       8.66e-11 ***
## hspricing$log_bedbathratio-0.176091259        < 2e-16 ***
## hspricing$log_bedbathratio-0.210853365        < 2e-16 ***
## hspricing$log_bedbathratio-0.243038049       1.05e-13 ***
## hspricing$log_bedbathratio-0.301029996        < 2e-16 ***
## hspricing$log_bedbathratio-0.352182518       8.83e-16 ***
## hspricing$log_bedbathratio-0.397940009       2.64e-09 ***
## hspricing$log_bedbathratio#DIV/0!             < 2e-16 ***
## hspricing$log_bedbathratio#NUM!               < 2e-16 ***
## hspricing$log_bedbathratio0                   < 2e-16 ***
## hspricing$log_bedbathratio0.015794267         < 2e-16 ***
## hspricing$log_bedbathratio0.022276395        2.58e-12 ***
## hspricing$log_bedbathratio0.028028724         < 2e-16 ***
## hspricing$log_bedbathratio0.037788561         < 2e-16 ***
## hspricing$log_bedbathratio0.045757491         < 2e-16 ***
## hspricing$log_bedbathratio0.057991947         < 2e-16 ***
## hspricing$log_bedbathratio0.070581074         < 2e-16 ***
## hspricing$log_bedbathratio0.079181246         < 2e-16 ***
## hspricing$log_bedbathratio0.085430195        4.51e-11 ***
## hspricing$log_bedbathratio0.09017663          < 2e-16 ***
## hspricing$log_bedbathratio0.096910013         < 2e-16 ***
## hspricing$log_bedbathratio0.101457641        2.70e-06 ***
## hspricing$log_bedbathratio0.104735351         < 2e-16 ***
## hspricing$log_bedbathratio0.124938737         < 2e-16 ***
## hspricing$log_bedbathratio0.14976232          < 2e-16 ***
## hspricing$log_bedbathratio0.15490196          < 2e-16 ***
## hspricing$log_bedbathratio0.162727297         < 2e-16 ***
## hspricing$log_bedbathratio0.16840443         2.97e-08 ***
## hspricing$log_bedbathratio0.176091259         < 2e-16 ***
## hspricing$log_bedbathratio0.187086643         < 2e-16 ***
## hspricing$log_bedbathratio0.191885526        2.46e-10 ***
## hspricing$log_bedbathratio0.204119983         < 2e-16 ***
## hspricing$log_bedbathratio0.21670911         0.000123 ***
## hspricing$log_bedbathratio0.22184875          < 2e-16 ***
## hspricing$log_bedbathratio0.234083206         < 2e-16 ***
## hspricing$log_bedbathratio0.243038049         < 2e-16 ***
## hspricing$log_bedbathratio0.249877473         < 2e-16 ***
## hspricing$log_bedbathratio0.259637311         < 2e-16 ***
## hspricing$log_bedbathratio0.266267889         < 2e-16 ***
## hspricing$log_bedbathratio0.271066772        2.64e-15 ***
## hspricing$log_bedbathratio0.279840697        1.84e-07 ***
## hspricing$log_bedbathratio0.301029996         < 2e-16 ***
## hspricing$log_bedbathratio0.329058719        2.58e-07 ***
## hspricing$log_bedbathratio0.333214679        4.95e-15 ***
## hspricing$log_bedbathratio0.338818557         < 2e-16 ***
## hspricing$log_bedbathratio0.346787486         < 2e-16 ***
## hspricing$log_bedbathratio0.352182518        2.78e-10 ***
## hspricing$log_bedbathratio0.359021943         < 2e-16 ***
## hspricing$log_bedbathratio0.367976785        1.81e-08 ***
## hspricing$log_bedbathratio0.380211242         < 2e-16 ***
## hspricing$log_bedbathratio0.391206626        2.70e-14 ***
## hspricing$log_bedbathratio0.397940009         < 2e-16 ***
## hspricing$log_bedbathratio0.405765346        1.42e-15 ***
## hspricing$log_bedbathratio0.425968732         < 2e-16 ***
## hspricing$log_bedbathratio0.447158031        2.84e-12 ***
## hspricing$log_bedbathratio0.455931956         < 2e-16 ***
## hspricing$log_bedbathratio0.463757293        2.12e-11 ***
## hspricing$log_bedbathratio0.477121255         < 2e-16 ***
## hspricing$log_bedbathratio0.492915522        3.59e-14 ***
## hspricing$log_bedbathratio0.505149978        4.08e-06 ***
## hspricing$log_bedbathratio0.522878745         < 2e-16 ***
## hspricing$log_bedbathratio0.535113202         < 2e-16 ***
## hspricing$log_bedbathratio0.56427143         2.27e-08 ***
## hspricing$log_bedbathratio0.602059991         < 2e-16 ***
## hspricing$log_bedbathratio0.669006781        2.84e-14 ***
## hspricing$log_bedbathratio0.698970004         < 2e-16 ***
## hspricing$log_bedbathratio0.726998728         < 2e-16 ***
## hspricing$log_bedbathratio0.77815125         1.45e-15 ***
## hspricing$log_bedbathratio0.84509804         1.28e-09 ***
## hspricing$log_bedbathratio0.903089987        3.44e-08 ***
## hspricing$log_bedbathratio1.275475891        3.68e-05 ***
## hspricing$sqft_above                          < 2e-16 ***
## hspricing$grade                               < 2e-16 ***
## hspricing$waterfront                          < 2e-16 ***
## hspricing$log_sqftliving15                    < 2e-16 ***
## hspricing$Geo_dist_index                      < 2e-16 ***
## hspricing$rennovate_index                     < 2e-16 ***
## hspricing$sqft_above:hspricing$sqft_basement  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 192300 on 21515 degrees of freedom
## Multiple R-squared:  0.7272, Adjusted R-squared:  0.7259 
## F-statistic: 591.2 on 97 and 21515 DF,  p-value: < 2.2e-16