library(magrittr)   # provides the %<>% assignment pipe
library(tidyverse)  # readr, dplyr, ggplot2
library(lubridate)  # mdy() date parsing
Data Query Summary
Home sale and listing prices, and dates. Data were collected from Zillow by manual data entry, using these query filters: 350k - 550k, single family or multifamily, and within the neighborhoods we are interested in buying in.
Data Import and Cleanup
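prices <- read_tsv("price_data_1.0.txt")
# Drop duplicate rows, parse the month/day/year date strings, and derive
# time on market and the sold:listed price ratio.
prices %<>%
  distinct() %>%
  mutate(date_listed = mdy(date_listed),
         date_sold = mdy(date_sold),
         days_listed = (date_sold - date_listed),
         sold_listed = sale_price / listing_price)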
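To make the derived columns concrete, here is a minimal sketch of the same mutate step on two made-up rows. The data frame `toy` and its values are hypothetical and are not taken from price_data_1.0.txt. It also converts the date difference to a plain number of days with as.numeric(), which is an optional variation on the step above, not what the original code does.

```r
library(dplyr)
library(lubridate)

# Hypothetical example rows (not real listings)
toy <- tibble(
  date_listed   = c("03/01/2021", "03/15/2021"),
  date_sold     = c("04/10/2021", "04/01/2021"),
  listing_price = c(400000, 500000),
  sale_price    = c(420000, 490000)
)

toy_clean <- toy %>%
  mutate(date_listed = mdy(date_listed),
         date_sold   = mdy(date_sold),
         # difftime converted to a plain numeric count of days
         days_listed = as.numeric(date_sold - date_listed, units = "days"),
         # values above 1 mean the home sold over its listing price
         sold_listed = sale_price / listing_price)

toy_clean
```

A ratio above 1 means the home sold for more than its listing price, and a ratio below 1 means it sold for less.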
What is the difference between the listed price and the sale price, and are there any patterns in the data?
First, let’s make a simple histogram of the sold:listed price ratio, with the mean marked by a dashed red line:
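ggplot(data = prices) +
  geom_histogram(aes(x = sold_listed)) +
  # compute the mean once (ignoring missing ratios) instead of drawing one line per row
  geom_vline(xintercept = mean(prices$sold_listed, na.rm = TRUE), color = "red", linetype = 2) +
  theme_bw() +
  xlab("Sold:Listed Price Ratio")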
ggplot(data = prices, aes(sale_price, sold_listed)) +
  geom_point() +
  geom_smooth(method = "lm") +
  theme_classic() +
  ylab("Sold:Listed Price Ratio") +
  xlab("Sale Price")
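The trend line in the scatterplot is a simple linear regression of the ratio on sale price, so the slope can also be read off directly with lm(). The sketch below is one way to do that. The helper name `fit_ratio_trend` and the data frame `toy` are hypothetical, and the numbers are invented so that the fit is easy to check by hand.

```r
# Hypothetical helper: fit the same straight line that
# geom_smooth(method = "lm") draws for sold_listed vs. sale_price.
fit_ratio_trend <- function(df) {
  lm(sold_listed ~ sale_price, data = df)
}

# Made-up rows (not real listings), chosen to lie exactly on a line
toy <- data.frame(
  sale_price  = c(350000, 400000, 450000, 500000),
  sold_listed = c(0.98, 1.00, 1.02, 1.04)
)

fit <- fit_ratio_trend(toy)

# Slope: change in the sold:listed ratio per extra dollar of sale price
slope <- unname(coef(fit)["sale_price"])
slope * 50000  # change in ratio per additional $50k of sale price
```

With the real data, calling `fit_ratio_trend(prices)` and then `summary()` on the result would also give the standard error and p-value for the slope.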