library(magrittr)
library(tidyverse)
library(lubridate)

Data

Data Query Summary

Home sale prices, listing prices, and the listing and sale dates. Data were collected from Zillow by manual data entry, using the following query filters: price between 350k and 550k, single-family or multifamily homes, and located in the neighborhoods where we are interested in buying.

Data Import and Cleanup

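prices <- read_tsv("price_data_1.0.txt")

# Drop duplicate rows, parse the date columns, and derive the analysis columns
prices %<>%
  distinct() %>%
  mutate(date_listed = mdy(date_listed),
         date_sold = mdy(date_sold),
         days_listed = date_sold - date_listed,     # time on market (difftime, in days)
         sold_listed = sale_price / listing_price)  # > 1 means sold above asking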
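To make the cleanup step concrete, here is a minimal sketch run on a toy table. The three rows are hypothetical and are not taken from price_data_1.0.txt; the column names match the ones used above, and the dates are assumed to be month/day/year strings, since that is what mdy() parses. Unlike the chunk above, the sketch converts days_listed to a plain number with as.numeric(), which is an optional choice made here for easier summarising.

```r
library(dplyr)
library(lubridate)

# Hypothetical rows for illustration only; the third row duplicates the second.
toy <- tibble(
  date_listed   = c("03/01/2021", "03/15/2021", "03/15/2021"),
  date_sold     = c("04/10/2021", "04/01/2021", "04/01/2021"),
  listing_price = c(400000, 500000, 500000),
  sale_price    = c(420000, 490000, 490000)
)

toy_clean <- toy %>%
  distinct() %>%                        # removes the duplicated row
  mutate(date_listed = mdy(date_listed),
         date_sold   = mdy(date_sold),
         # as.numeric() turns the difftime into a plain number of days
         days_listed = as.numeric(date_sold - date_listed, units = "days"),
         sold_listed = sale_price / listing_price)

toy_clean
```

With these made-up rows, two listings remain after distinct(): one on the market for 40 days that sold at a ratio of 1.05, and one on the market for 17 days that sold at a ratio of 0.98.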

Results

Price vs Asking

What is the difference between the listed price and the sale price and are there any patterns in the data?

First, let’s make a simple histogram of the sold:listed price ratio, with the mean marked by a dashed red line.

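ggplot(data = prices) +
  geom_histogram(aes(x = sold_listed), bins = 30) +
  # Dashed red line marks the mean ratio; na.rm guards against missing prices
  geom_vline(aes(xintercept = mean(sold_listed, na.rm = TRUE)), color = "red", linetype = 2) +
  theme_bw() +
  xlab("Sold:Listed Price Ratio")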
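A few summary numbers can complement the histogram. The sketch below is one possible way to compute them, shown on hypothetical ratios rather than the real data; the names toy and ratio_summary are made up for illustration, and on the real data toy would be replaced with prices.

```r
library(dplyr)

# Hypothetical ratios for illustration only; not taken from the Zillow data.
toy <- tibble(sold_listed = c(0.95, 1.00, 1.02, 1.05, 1.08))

ratio_summary <- toy %>%
  summarise(n = n(),
            mean_ratio = mean(sold_listed, na.rm = TRUE),
            median_ratio = median(sold_listed, na.rm = TRUE),
            # fraction of homes that sold above their listing price
            share_over_asking = mean(sold_listed > 1, na.rm = TRUE))

ratio_summary
```

For these five made-up values, the mean and median ratio are both 1.02 and 60% of the homes sold above asking.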

ggplot(data = prices, aes(sale_price, sold_listed)) +
  geom_point() +
  geom_smooth(method = "lm") +
  theme_classic() +
  xlab("Sale Price") + ylab("Sold:Listed Price Ratio")
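The trend line drawn by geom_smooth(method = "lm") is an ordinary least-squares fit of y on x. To read off the slope rather than judge it by eye, the same model can be fit directly with lm(). This is a minimal sketch on hypothetical data; toy, fit, and slope_per_100k are illustrative names, and on the real data toy would be replaced with prices.

```r
# Hypothetical data for illustration only; not taken from the Zillow data.
toy <- data.frame(
  sale_price  = c(360000, 400000, 450000, 500000, 540000),
  sold_listed = c(0.97, 1.00, 1.01, 1.04, 1.05)
)

# Same linear model that geom_smooth(method = "lm") fits by default
fit <- lm(sold_listed ~ sale_price, data = toy)

# Rescale the slope: change in sold:listed ratio per $100k of sale price
slope_per_100k <- coef(fit)[["sale_price"]] * 1e5

summary(fit)$coefficients
slope_per_100k
```

A positive slope would mean that more expensive homes tend to sell further above their listing price. With only a handful of manually entered sales, the standard error in the coefficient table is worth checking before reading much into the trend.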