Packages

library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
library(ggplot2)

Dataset

## # A tibble: 8,602 x 9
##    city     year month sales   volume median listings inventory  date
##    <chr>   <int> <int> <dbl>    <dbl>  <dbl>    <dbl>     <dbl> <dbl>
##  1 Abilene  2000     1    72  5380000  71400      701       6.3 2000 
##  2 Abilene  2000     2    98  6505000  58700      746       6.6 2000.
##  3 Abilene  2000     3   130  9285000  58100      784       6.8 2000.
##  4 Abilene  2000     4    98  9730000  68600      785       6.9 2000.
##  5 Abilene  2000     5   141 10590000  67300      794       6.8 2000.
##  6 Abilene  2000     6   156 13910000  66900      780       6.6 2000.
##  7 Abilene  2000     7   152 12635000  73500      742       6.2 2000.
##  8 Abilene  2000     8   131 10710000  75000      765       6.4 2001.
##  9 Abilene  2000     9   104  7615000  64500      771       6.5 2001.
## 10 Abilene  2000    10   101  7040000  59300      764       6.6 2001.
## # ... with 8,592 more rows
##      city                year          month            sales       
##  Length:8602        Min.   :2000   Min.   : 1.000   Min.   :   6.0  
##  Class :character   1st Qu.:2003   1st Qu.: 3.000   1st Qu.:  86.0  
##  Mode  :character   Median :2007   Median : 6.000   Median : 169.0  
##                     Mean   :2007   Mean   : 6.406   Mean   : 549.6  
##                     3rd Qu.:2011   3rd Qu.: 9.000   3rd Qu.: 467.0  
##                     Max.   :2015   Max.   :12.000   Max.   :8945.0  
##                                                     NA's   :568     
##      volume              median          listings       inventory     
##  Min.   :8.350e+05   Min.   : 50000   Min.   :    0   Min.   : 0.000  
##  1st Qu.:1.084e+07   1st Qu.:100000   1st Qu.:  682   1st Qu.: 4.900  
##  Median :2.299e+07   Median :123800   Median : 1283   Median : 6.200  
##  Mean   :1.069e+08   Mean   :128131   Mean   : 3217   Mean   : 7.175  
##  3rd Qu.:7.512e+07   3rd Qu.:150000   3rd Qu.: 2954   3rd Qu.: 8.150  
##  Max.   :2.568e+09   Max.   :304200   Max.   :43107   Max.   :55.900  
##  NA's   :568         NA's   :616      NA's   :1424    NA's   :1467    
##       date     
##  Min.   :2000  
##  1st Qu.:2004  
##  Median :2008  
##  Mean   :2008  
##  3rd Qu.:2012  
##  Max.   :2016  
## 

I will be using the Texas housing dataset referenced in last week's slides, which was published by the TAMU Real Estate Center. The dataset has 9 variables: the city, the year and month, the number of sales, the total dollar volume of sales, the median sale price, the number of active listings, the months of inventory, and a date field that combines year and month.
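The code that produced the two printouts above is not shown. A minimal sketch, assuming `df` is simply a copy of the `txhousing` dataset that ships with ggplot2 (its dimensions match the 8,602 x 9 tibble printed above), might look like this:

```r
library(ggplot2)  # provides the txhousing dataset

# Assumption: df is a plain copy of ggplot2's built-in txhousing data.
df <- txhousing

df           # tibble preview of the first 10 rows
summary(df)  # per-column summaries, including NA counts
```

If the data were loaded from another source, only the assignment to `df` would change.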

##  [1] 2000.000 2000.083 2000.167 2000.250 2000.333 2000.417 2000.500 2000.583
##  [9] 2000.667 2000.750

Our date variable is expressed as a decimal year. We can convert it to a date-time so that it is easier for the reader to interpret.

df$date <- date_decimal(df$date, tz = "UTC")

This conversion makes the dates easier to read in the summaries and plots that follow.
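To make the conversion more concrete, here is a small sketch of what the decimal values appear to encode and where `date_decimal()` places them. It assumes `df` is a copy of ggplot2's `txhousing`. The column name `date_first` is introduced only for illustration. Because `date_decimal()` spreads the fraction over the actual length of the year, a value such as 2015 + 1/12 lands late on January 31 rather than on February 1, which is why the summary further down shows a first quartile of 2015-01-31 10:00:00. If first-of-month dates are preferred, `make_date()` is one alternative.

```r
library(ggplot2)    # txhousing dataset
library(lubridate)  # date_decimal(), make_date()

df <- txhousing  # assumption: same data as used in this report

# The decimal dates appear to follow year + (month - 1) / 12.
head(df$date, 3)
all.equal(df$date, df$year + (df$month - 1) / 12, tolerance = 1e-6)

# date_decimal() maps the fractional part onto the length of the year,
# so this value falls on 2015-01-31, not 2015-02-01.
jan_end <- date_decimal(2015 + 1 / 12, tz = "UTC")
jan_end

# Alternative (not the approach used above): build first-of-month dates
# directly from the year and month columns.
df$date_first <- make_date(df$year, df$month, 1)
range(df$date_first)
```

Either representation works for plotting. The `make_date()` version just avoids the fractional-day offsets.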

##      city                year          month       sales       
##  Length:322         Min.   :2015   Min.   :1   Min.   :  16.0  
##  Class :character   1st Qu.:2015   1st Qu.:2   1st Qu.:  91.0  
##  Mode  :character   Median :2015   Median :4   Median : 194.5  
##                     Mean   :2015   Mean   :4   Mean   : 658.6  
##                     3rd Qu.:2015   3rd Qu.:6   3rd Qu.: 523.0  
##                     Max.   :2015   Max.   :7   Max.   :8945.0  
##                                                NA's   :6       
##      volume              median          listings         inventory     
##  Min.   :2.516e+06   Min.   : 78800   Min.   :   83.0   Min.   : 0.800  
##  1st Qu.:1.533e+07   1st Qu.:135650   1st Qu.:  529.0   1st Qu.: 2.800  
##  Median :3.228e+07   Median :156750   Median :  947.5   Median : 4.350  
##  Mean   :1.713e+08   Mean   :168665   Mean   : 2055.4   Mean   : 5.553  
##  3rd Qu.:1.113e+08   3rd Qu.:198425   3rd Qu.: 1953.5   3rd Qu.: 7.300  
##  Max.   :2.568e+09   Max.   :304200   Max.   :23875.0   Max.   :20.100  
##  NA's   :6           NA's   :6        NA's   :8         NA's   :8       
##       date                    
##  Min.   :2015-01-01 00:00:00  
##  1st Qu.:2015-01-31 10:00:00  
##  Median :2015-04-02 06:00:00  
##  Mean   :2015-04-02 06:00:00  
##  3rd Qu.:2015-06-02 02:00:00  
##  Max.   :2015-07-02 12:00:00  
## 
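The summary above covers only 2015 (322 rows, months 1 through 7), but the filtering step is not shown. A minimal sketch, assuming the subset is the `df2` object used in the plot below and that it was created by keeping the rows where `year` is 2015:

```r
library(ggplot2)    # txhousing dataset
library(lubridate)  # date_decimal()

df <- txhousing  # assumption: same data as used in this report
df$date <- date_decimal(df$date, tz = "UTC")

# Assumption: df2 holds only the 2015 observations.
df2 <- df[df$year == 2015, ]

summary(df2)
```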

Plotting with ggplot2

The graphic below plots the median sale price against the total number of listings, with one panel per month. This graphic could be improved by color coding each point by the region its city belongs to, as there are too many individual cities to meaningfully discern their impact on the plotted variables.

ggplot(data = df2, aes(listings, median)) +
  geom_point() +
  facet_wrap(~month) +
  ggtitle("Listings vs. Median Value by Month",
          "Real Estate Data from 2015 only, from the TAMU Real Estate Center")
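The region color coding suggested above could be sketched roughly as follows. The dataset has no region column, so the `region_lookup` vector here is hypothetical: the region names and city assignments are illustrative only, and any city not listed falls into "Other". A real version would need a complete, sourced mapping of cities to regions.

```r
library(ggplot2)

# Assumption: df2 is the 2015 subset of txhousing.
df2 <- subset(txhousing, year == 2015)

# Hypothetical lookup table; these region labels are illustrative only.
region_lookup <- c(
  "Abilene"     = "West",
  "El Paso"     = "West",
  "Austin"      = "Central",
  "San Antonio" = "Central",
  "Dallas"      = "North",
  "Houston"     = "Gulf Coast"
)

# Map each city to its region; unlisted cities become "Other".
df2$region <- unname(region_lookup[df2$city])
df2$region[is.na(df2$region)] <- "Other"

p <- ggplot(data = df2, aes(listings, median, color = region)) +
  geom_point(na.rm = TRUE) +  # drop rows with missing values silently
  facet_wrap(~month) +
  labs(title = "Listings vs. Median Value by Month",
       color = "Region (hypothetical)")
p
```

Coloring by region keeps the legend to a handful of entries, whereas mapping color to `city` would produce dozens.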