The data set contains 2,928 observations of 8 variables related to MLS housing listings across 4 ZIP Codes. For the analysis, we will assume the data were not generated/collected via a random process. Prior to evaluating the data, I closely reviewed the data and didn’t see any obvious errors (NA’s, Infinite numbers, divergent trends, etc). In preparing the data for analysis, I made 10 transformations.
1. day — Day of week (Factor with 7 levels)
2. month — Month of year (Factor with 12 levels)
3. qtr - Quarter of year (Factor with 4 levels)
Note: Quarter is done evenly with 732 observaations in each quarter beginning 1/7/17, NOT purely by date
4. mls_rs_listing - Reduced mls_listings by 10% based on St. Louis FED data of New to Resale ratio (Integer)
5. pct_contracts_total - MLS Contracts as a percent of MLS Listings (Numeric)
6. pct_contracts_rs - MLS Contracts as a percent of estimated MLS Resale Listings (Numeric)
7. od_pct_contracts - OD Contracts as a percent of OD Listings (Numeric)
8. od_visits_listing - Number of total OD visits divided by number of OD Listings (Numeric)
9. od_contracts_visit - Number of OD contracts per number of OD home visits (Numeric)
10. od_mls_listing_ratio - Ratio of OD listings to MLS listings, expressed as decimal (Numeric)
We’ve given you disguised data for the MLS (market) and Opendoor. The spreadsheet includes counts of active listings, resale contracts, and related data. How enthusiastic or worried are you about Opendoor’s resale performance in 2017?
To evaluate Opendoor’s performance, we first need to establish how Opendoor’s core business model operates. Opendoor is a two-sided marketplace. It’s supply side allows consumers to sell their existing home for a ~6% fee without the hassle/delay of working with a traditional agent. Opendoor typically will make modest improvements (to boost home value) before listing the purchased home for resale. It’s demand side allows consumers/realtors to view Opendoor home listings in an app/browser and purchase homes they are interested in. For this question, we will ignore any of the potential business line extensions (financing, services, etc) and focus only on the marketplace dynamics.
We will evaluate Opendoor’s 2017 performance at both aggregate and cohort levels.
Listing Growth: Across all ZIP codes and price bands, Opendoor listings contracted 45% from 430 listings on 1/7/17 to 235 listings on 1/7/18. Over the same one year period, MLS listings contracted just 22% from 4,364 to 3,385. While this relative decline is unlikely a sign of overperformance, it’s not conclusively underperformance. The decline could also be attributable to a strategic decision.
Visitor Growth: Across all ZIP codes and price bands, Opendoor home visits dropped 25% from 2,556 visits on 1/7/17 to 1905 listings on 1/7/18. Over the same period, average visitors/listing increased from 5.9 to 8.1 (37%). Part of this was likely due to a reduction in OD housing listings, concentrating potential buyers in fewer homes. However, it’s possible this indicates Opendoor is selecting more attractive offerings.
Contract Conversion: Across all ZIP codes and price bands, Opendoor’s median contract conversion rate is 2.94% versus the overall MLS rate of 3.22%. This 8% delta is fairly small, especially considering the 11.5x difference in sample size (OD contracts: 3,668 vs. MLS contracts: 42,364).
Splitting the data by ZIP Code (4 levels), Price Level (2 levels), and Quarter (4 levels) reveals interesting trends. This split implies 32 distinct cohorts.
Listing Growth: Between 1/7/17 and 1/7/18, Opendoor experienced significant listing growth (~100%) in listings <$200K. This trend holds across all ZIP Codes. In contrast, overall MLS listing count decreased considerably (~45%) over the same period for listings <$200k (again, similar across all ZIP codes). Conversely, for listings $200K+, the opposite trend holds. Opendoor’s aggregate $200+ listing count plummeted 78% over the year. Meanwhile, MLS listings grew modestly for the first three quarters of the year, before dropping off in Q4. Overall, there was a 10% reduction in MLS listing volume from the start of the year to the end. Opendoor’s listings decline accelerated in Q4 too, implying this trend was likely market driven. Finally, the ratio of Opendoor listings to total MLS listings as a percent varied dramatically across the two price levels. In 2017, Opendoor experienced 4x relative growth in <$200K vs. 3x relative contraction in $200K+ (again, trend holds across all ZIP codes). By the end of the year, Opendoor comprised 20% of the <$250K market. The 5 graphs below help visualize these trends. The graphs do not appear to show significant seasonality, except for the dip around the holidays. All 4 listing charts show an additional dip around Thanksgiving that lasts through New Years, before turning sharply up again. This seems normal and reflects real world dynamics.
ggp.un +
geom_jitter(aes(color=zip_code,y=mls_listings))+
labs(x = "Time", y = "Listing Count", title = "MLS Listing Count: <$200K (Zip Codes A,B,C,D)")+
theme(axis.text.x=element_blank())
ggp.un +
geom_jitter(aes(color=zip_code,y=od_listings))+
labs(x = "Time", y = "Listing Count", title = "OD Listing Count: <$200K (Zip Codes A,B,C,D)")+
theme(axis.text.x=element_blank())
ggp.ov +
geom_jitter(aes(color=zip_code,y=mls_listings))+
labs(x = "Time", y = "Listing Count", title = "MLS Listing Count: $200K+ (Zip Codes A,B,C,D)")+
theme(axis.text.x=element_blank())
ggp.ov +
geom_jitter(aes(color=zip_code,y=od_listings))+
labs(x = "Time", y = "Listing Count", title = "OD Listing Count: $200K+ (Zip Codes A,B,C,D)")+
theme(axis.text.x=element_blank())
ggp <- ggplot(data.frame(data), aes(date,100*od_mls_listing_ratio))
ggp +
geom_point(aes(color = price_band))+
labs(x = "Time", y = "Percent", title = "OD:MLS Listing Ratio (All Zip Codes)")+
theme(axis.text.x=element_blank())
Visitor Growth: Visitor growth per listing trends were consistent across all ZIP codes. At an aggregate level, visitors per listing grew 37% over the year. However, the trends varied substantially between the two price levels. Visitors per <$200K listing fell 25% over the course of the year. Meanwhile, for listings $200K+, visits per listing grew over 60% during the year (from 5.2 to 8.5). In aggregate, the total number of daily visits divided by the total number of daily listings were 9.6 and 6.4, respectively, for <$200K and $200K+. Over the course of the year, it took, in aggregate and on average, 187 visits to generate a contract for a <$200K listing and 298 visits to generate a contract for a $200K+ listing. The two graphs below help visualize these trends. Furthermore, comparing these graphs to the listing graphs above shows a high degree of (inverse) correlation between visitors and listings. Given there is likely a reasonably stable supply of potential home buyers in the area, it’s likely that as the number of homes decreases the number of visits increases as home buyers are either spread less thin or have less options to filter on. Conversely, with many offerings, the visits likely get spread across a larger base.
ggp.un +
geom_jitter(aes(color=zip_code,y=od_visits_listing))+
labs(x = "Time", y = "Count", title = "Visits Per Listing: <$200K (Zip Codes A,B,C,D)")+
theme(axis.text.x=element_blank())
ggp.ov +
geom_jitter(aes(color=zip_code,y=od_visits_listing))+
labs(x = "Time", y = "Count", title = "Visits Per Listing: $200K+ (Zip Codes A,B,C,D)")+
theme(axis.text.x=element_blank())
Contract Conversion: Looking at aggregate performance across each of the 4 quarters, Opendoor’s median contract conversion rate performance was on par with the overall MLS market. Opendoor lagged by 1% point in Q1, exceed by 0.25% point in Q2, and was virtually identical in Q3 and Q4. Across all 32 cohorts, Opendoor outperformed MLS in contract conversion rate 19 times (out of 32 possible). On an annual basis, among the 8 cohorts (4 ZIPs * 2 Prices), Opendoor outperformed 4 times and underperformed 4 times. Opendoor was more likely to outperform in the <$200K price level than in the $200K+ one (3 of it’s 4 outperforms were in <$200K). Opendoor is decidely better than MLS average in ZIP Code A and decidely worse in ZIP Code B. Opendoor tends to outperofrm in ZIP C and D, except for Q1 in <$200K and in Q4 for $200K+, when data are too thin to properly assess.
The two graphs below show the difference in OD vs. MLS contract conversion rate. The black line is 0, so anything above shows Opendoor converting better than MLS. The left scale is percentage points.
## <$200K subsection
ggp.cohorts +
geom_point(aes(y=100*A..200K), color = "#F8766D")+
geom_line(aes(y=100*A..200K), color = "#F8766D")+
geom_point(aes(y=100*B..200K), color = "#7CAE00")+
geom_line(aes(y=100*B..200K), color = "#7CAE00")+
geom_point(aes(y=100*C..200K), color = "#00CFB4")+
geom_line(aes(y=100*C..200K), color = "#00CFB4")+
geom_point(aes(y=100*D..200K), color = "#C77CFF")+
geom_line(aes(y=100*D..200K), color = "#C77CFF")+
geom_line(aes(y=0), color="Black")+
theme(legend.position = "right")+
labs(x = "Quarter", y = "Difference: Percentage Points", title = "Difference in OD:MLS Contract/Listing Win Rate: <$200K, By Quarter")
## $200K+ subsection
ggp.cohorts +
geom_point(aes(y=100*A.200K.), color = "#F8766D")+
geom_line(aes(y=100*A.200K.), color = "#F8766D")+
geom_point(aes(y=100*B.200K.), color = "#7CAE00")+
geom_line(aes(y=100*B.200K.), color = "#7CAE00")+
geom_point(aes(y=100*C.200K.), color = "#00CFB4")+
geom_line(aes(y=100*C.200K.), color = "#00CFB4")+
geom_point(aes(y=100*D.200K.), color = "#C77CFF")+
geom_line(aes(y=100*D.200K.), color = "#C77CFF")+
geom_line(aes(y=0), color = "Black")+
labs(x = "Quarter", y = "Difference: Percentage Points", title = "Difference in OD:MLS Contract/Listing Win Rate: $200K+, By Quarter")
At an aggregate level, it’s inconclusive if Opendoor is under or overperforming. Listings are down 45% and visits are down 25%, yet visits/listing are up 37% and average conversion rate difference is only marginally lower (8%). If forced to make a conclusion on this data alone, I’d lean toward underperformance.
Looking at the cohort level data provides greater clarity. Opendoor overperformed in the <$200K category, growing listings nearly 2x over the year while only letting visitors/listing slide 25%. While it’s contract conversion rate steadily decreased over the year (potentially due to increasing supply), it still maintained a higher median conversion rate than the overall MLS market rate. Opendoor’s performance in $200K+ underperformed slightly over the year. Listings shrunk 78% and median contract conversion rate only increased in ZIPs A and B. The median contract conversion rate for ZIPs C and D fell to 0% as homes virtually stopped selling (just 13 contracts between the 2 ZIPs in Q4.) Note: I purposefully used median for contract conversion rate for just this reason. Using average provides “better” results, but on such thin data, it feels misleading. The reality is on most days in Q4 Opendoor is not selling $200K+ homes in ZIPs C and D.
Based on what you learned in Question 1, what do you hypothesize is driving any over/underperformance trends? For each hypothesis, what additional data would you request and what analyses would you run to validate?
Hypothesis #1: Opendoor is bidding differently on <$200K and $200K+ Homes: An intriguing data point is the ratio of $200K+ listings to <$200K listings. Opendoor begins 2017 with a ratio of 4:1 (4 $200K : 1 <$200K) while general MLS listings are at 2:1. During Q1, Opendoor accelerates to a peak of 8:1, before precipitously declining to a ratio of 1:2 at the end of the year (a factor of 16x)! In comparison, MLS listings grow from 2:1 at the beginning of the sample set to a peak of 4:1 at mid-year, before settling at 3.25:1. Again, given the sample sizes are highly skewed (11.5:1), it’s not surprising to see a larger swing in Opendoor’s range. However, such a reversal likely implies a trend worth examining. My hypothesis is that Opendoor sees an economic opportunity in homes <$200K. Homes <$200k comprise ~30% of the overall market (similar across all ZIPs), naturally implying the median home price is above $250K. Perhaps Opendoor has developed a strategy for increasing the value of homes <$250K more substantially than the median home. If this is the case, Opendoor may be focusing on the low end of the market where it sees a greater opportunity for gross profit.
To confirm this hypothesis, I’d request the following data:
- Median home price across all ZIPs
- Actual pricing data for all Opendoor home purchases
- Actual pricing data for all Opendoor home sales
- Actual home data (Beds, baths, sq. ft, year built, etc)
- Actual cost data for anything involved in the remodel/sale (to establish Gross Profit)
- Actual visitor data at housing unit level, not just daily level
- Actual time on market for each listing (both OD and MLS)
- Overall Rate of Opendoor offer acceptance/decline for homes <$250K vs. homes $250K+
- Delta of Internal Fair Market Valuation metrics and advertised offer to potential home sellers
- Estimates on adjacent home price // historical data on adjacent home sales
- Aggregate Opendoor financial data for the 4 ZIP Codes
From here, I’d run an analysis to see if Opendoor is in fact creating more gross profit/home in the <$250K bracket. I’d construct unit economics on an individual housing level and then calculate IRR based on timing information related to the sale. I’d also be curious to understand the bid acceptance rate for these cheaper homes – potentially there is more customer appetite to sell at a discount. So If my prediction is right, the unit economics will be higher, and the time on market will be lower for these improved Opendoor listings. I’d be curious to check if the price of surrounding homes was predictive on eventual sale price after remodel. Perhaps light facade work on a worn house on a nice street/between nicer homes has the ability to create excess gross profit (or not!?). Either way, this data seems to imply a systematic strategy at the low end.
par(mfrow=c(1,2))
## Opendoor
ratio_200K.od <- ov.data$od_listings/un.data$od_listings
plot(ratio_200K.od,main = "OD List: Ratio $200K+/<$200K ", xlab = "Time", ylab = "Ratio", col = "dodgerblue2")
## MLS
ratio_200K.mls <- ov.data$mls_listings/un.data$mls_listings
plot(ratio_200K.mls,main = "MLS List: Ratio $200K+/<$200K ", xlab = "Time", ylab = "Ratio", col = "grey75")
Hypothesis #2: Opendoor is preferred among younger buyers & first time buyers/sellers: The transition from higher priced inventory to lower priced inventory may be reflective of Opendoor user preferences. As a tech product, potentially Opendoor is more popular and trusted among a younger audience. Generally speaking, younger buyers are earlier in their career with less amassed savings/wealth. If Opendoor is indeed popular with first time buyers/sellers, then it would makes sense to cater to the lower end of the market. These users may also have a lower CAC and higher potential LTV, as they will be more likely to engage in multiple future sales/purchases as their needs evolve.
To confirm this hypothesis, I’d request the following data:
- Historical demographic data on all Opendoor sales
- Current demographic data on current Opendoor listings
- Demographic data on Opendoor mobile/web users
- Customer Acquisition Cost data
- Engagement and retention data w/ cohort analysis by demographic
- Types of homes saved/viewed in app by demographic
- Reported income and net worth by sellers and buyers
- User growth data as a function of home listings & home prices
From here, I’d run an analysis to see if Opendoor is in fact preferred by younger/lower wealth people. To do this, I’d look at engagement (digital and physical) relative to age/wealth and tie it to purchase/sale data history. I’d be especially curious to establish patterns of intent. For example, how many home visits does the average person make before purchasing a house. What patterns of behaviors in app (saving listings?) signal the high likelihood of placing an offer?
It’s also worth noting this strategy may help Opendoor increase it’s brand awareness at lower total cost (by increasing inventory in lower priced homes). If the current focus is user growth and expansion, then increasing inventory in lower cost homes will likely allow Opendoor to grow more cost effectively.
Hypothesis #3i/ii: ZIP Codes A and B have very different i) Demographics -OR- ii) Opendoor Marketing Strategies: ZIP Codes A and B are the two largest groups, yet have the most deviation in contract conversion performance. Luckily, they are the largest groups, making the samples more stable. Let’s quickly recap cohort performance for each ZIP code by quarter and price band (8 cohorts for each ZIP). In ZIP Code A, OD contract conversion rate exceeds MLS conversion rate in 7 of 8 cohorts. In contrast, in ZIP Code B, OD underperforms MLS in 5 of 8 cohorts. For the 3 it outperforms, the margin is quite small. ZIP Code A is Opendoor’s strongest performer and ZIP Code B is it’s weakest. My hypothesis is that either these ZIP codes have i) very different demographics OR ii) Opendoor has used two very different marketing strategies in the ZIP Codes.
To confirm this hypothesis, I’d request the following data:
- Demographic data on each ZIP Code (age, race, political, income, main language, etc.)
- Current demographic data on current Opendoor listings in each ZIP
- Demographic data on Opendoor mobile/web users in each ZIP
- Customer Acquisition Cost data for each ZIP
- Engagement and retention data w/ cohort analysis in each ZIP
- Marketing budget and marketing strategy for each ZIP
- Number of licensed brokers/real estate agents for each ZIP
From here, I’d first compare the marketing budget and strategy. Did Opendoor run the same strategy with proportionally similar budgets -or- is Opendoor A/B testing different marketing strategies these markets? If the marketing strategies were materially different, I’d then examine the impact of each strategy to the extent it causes under/over performance. If the marketing strategies were materially the same, then I’d turn to investigate the demographics of the two ZIP Codes and the performance of the marketing strategy in the two ZIP Codes. If the demographics and the marketing were nearly identical, I’d get on a plane and go figure out what is going on via boots on the ground.
Chose one of your top hypotheses from Question 2 and assume you’ve validated that it’s the driver of the performance trend. Propose a plan of action to either course correct (in the instance of underperformance) or drive continued growth (if you’ve identified positive performance).
Hypothesis #1: Opendoor is bidding differently on <$200K and $200K+ Homes: (Restated from Q2) An intriguing data point is the ratio of $200K+ listings to <$200K listings. Opendoor begins 2017 with a ratio of 4:1 (4 $200K : 1 <$200K) while general MLS listings are at 2:1. During Q1, Opendoor accelerates to a peak of 8:1, before precipitously declining to a ratio of 1:2 at the end of the year (a factor of 16x)! In comparison, MLS listings grow from 2:1 at the beginning of the sample set to a peak of 4:1 at mid-year, before settling at 3.25:1. Again, given the sample sizes are highly skewed (11.5:1), it’s not surprising to see a larger swing in Opendoor’s range. However, such a reversal likely implies a trend worth examining. My hypothesis is that Opendoor sees an economic opportunity in homes <$200K. Homes <$200k comprise ~30% of the overall market (similar across all ZIPs), naturally implying the median home price is above $250K. Perhaps Opendoor has developed a strategy for increasing the value of homes <$250K more substantially than the median home. If this is the case, Opendoor may be focusing on the low end of the market where it sees a greater opportunity for gross profit.
par(mfrow=c(1,2))
## Opendoor
ratio_200K.od <- ov.data$od_listings/un.data$od_listings
plot(ratio_200K.od,main = "OD List: Ratio $200K+/<$200K ", xlab = "Time", ylab = "Ratio", col = "dodgerblue2")
## MLS
ratio_200K.mls <- ov.data$mls_listings/un.data$mls_listings
plot(ratio_200K.mls,main = "MLS List: Ratio $200K+/<$200K ", xlab = "Time", ylab = "Ratio", col = "grey75")
Expanded Strategy: Gross Profit Arbitrage <$200K: At this point we’ve validated that there is a gross profit arbitrage at <$200K, which is why Opendoor is so focused on purchasing homes in this segment. Here is the strategy I’d use to operationalize this profit opportunity.
Step I: Define characteristics most predictive of housing price (hypothetical list below):
- Previous sale/purchase price
- Estimated surrounding home value/historical prices
- Sub-divided ZIP codes for more accurate K-Nearest Neighbor approach/zoning
- Housing details: # of beds/baths, special attributes (gym/office/garage type), Sq Ft, Year Built
- Access to high speed internet / multiple cable providers
- Proximity to amenities (grocery, restaurants, parks, nature)
- Commute/Walk score
Step II: Add human inspection by local operator (Photos to assess health of paint, roof, grass, landscape, etc.)
Step III: Understand highest ROI home improvements that can be deployed cheaply/quickly and create immediate return (hypothetical list below):
- Paint
- Counter tops
- Zero maintenance landscaping
- Appliances and light fixtures
Step IV: Build model to predict price delta of current condition and Opendoor improved condition
Step V: Create aggressive mail/digital campaign customized to houses with highest potential pricing delta. Use local operational team to help distribute and build awareness. Create unsolicited offers if legally allowed in area.
Step VI: Operationalize all procedures for fixing houses.
Step VII: Create best practices around listing and open houses. A/B test with marketing team to boost conversion and try to create multiple bids on any listed house.
## OPENDOOR CITY DATA PROJECT (3/18/19)
## rm(list=ls()) ## clear Workspace
## data prep
## load data
setwd("/Users/jeffreyellington/Dropbox/Code/OpenDoor/")
data <- read.csv(file = "CityOpsData.csv") ## transformations done in excel
## first glance at data
dim(data) ## 18 columns --> added 10 columns in excel
str(data) ## data types loaded correctly
summary(data) ## no obvious errors
sum(is.na(data)) ## no N/A
## data cleaning
## check day / month / qtr
table(table(data$date)) ## 366 days
table(data$date) ## 01/07 appears in both 2017 and 2018
table(data$month) ## ok
table(data$qtr) ## ok
## check zip code
table(data$zip_code) ## ok
## cohort analysis
## create 2 cohorts (2 Price Levels)
un.data <- subset(data, data$price_band == "<$200K")
ov.data <- subset(data, data$price_band == "$200K+")
## create 4 cohorts (4 Zip Codes)
a.data <- subset(data, data$zip_code == "A")
b.data <- subset(data, data$zip_code == "B")
c.data <- subset(data, data$zip_code == "C")
d.data <- subset(data, data$zip_code == "D")
## create 8 cohorts (4 ZipCodes * 2 Price Levels)
a.un.data <- subset(data, data$zip_code == "A" & data$price_band == "<$200K")
a.ov.data <- subset(data, data$zip_code == "A" & data$price_band == "$200K+")
b.un.data <- subset(data, data$zip_code == "B" & data$price_band == "<$200K")
b.ov.data <- subset(data, data$zip_code == "B" & data$price_band == "$200K+")
c.un.data <- subset(data, data$zip_code == "C" & data$price_band == "<$200K")
c.ov.data <- subset(data, data$zip_code == "C" & data$price_band == "$200K+")
d.un.data <- subset(data, data$zip_code == "D" & data$price_band == "<$200K")
d.ov.data <- subset(data, data$zip_code == "D" & data$price_band == "$200K+")
366*8 ## dimensions correct
## Plot to see if any systematic data error
par(mar=c(4,4,4,4))
par(mfrow=c(2,2)) ## Plots for MLS Listings by Zipcode for <$200k
plot(a.un.data$mls_listings)
plot(b.un.data$mls_listings)
plot(c.un.data$mls_listings)
plot(d.un.data$mls_listings)
par(mfrow=c(2,2)) ## Plots for OD Listings by Zipcode for <$200k
plot(a.un.data$od_listings)
plot(b.un.data$od_listings)
plot(c.un.data$od_listings)
plot(d.un.data$od_listings)
par(mfrow=c(2,2)) ## Plots for MLS Listings by Zipcode for $200k+
plot(a.ov.data$mls_listings)
plot(b.ov.data$mls_listings)
plot(c.ov.data$mls_listings)
plot(d.ov.data$mls_listings)
par(mfrow=c(2,2)) ## Plots for OD Listings by Zipcode for $200k+
plot(a.ov.data$od_listings)
plot(b.ov.data$od_listings)
plot(c.ov.data$od_listings)
plot(d.ov.data$od_listings)
## data appears highly correlated across zipcodes for MLS and OD data in each price bracket
## load ggplot2 for graphing
## install.packages("ggplot2")
library(ggplot2)
## Aggregate stats
## 1 Year Listing Growth
Day1.OD <- sum(data$od_listings[data$date == "2017/01/07" & data$zip_code == c("A", "B", "C", "D")])
Day366.OD <- sum(data$od_listings[data$date == "2018/01/07" & data$zip_code == c("A", "B", "C", "D")])
Day1.MLS <-sum(data$mls_listings[data$date == "2017/01/07" & data$zip_code == c("A", "B", "C", "D")])
Day366.MLS <-sum(data$mls_listings[data$date == "2018/01/07" & data$zip_code == c("A", "B", "C", "D")])
Day1.OD
Day366.OD
Day1.MLS
Day366.MLS
(Day366.OD - Day1.OD)/Day1.OD
(Day366.MLS - Day1.MLS)/Day1.MLS
## 1 Year Visitor Growth
Day1.OD.hv <- sum(data$od_home_visits[data$date == "2017/01/07" & data$zip_code == c("A", "B", "C", "D")])
Day366.OD.hv <- sum(data$od_home_visits[data$date == "2018/01/07" & data$zip_code == c("A", "B", "C", "D")])
Day1.OD.hv
Day366.OD.hv
(Day366.OD.hv - Day1.OD.hv)/Day1.OD.hv
Day1.OD.hv/Day1.OD
Day366.OD.hv/Day366.OD
## comparing listing growth by price band and zip code
## <$200K
data.frame(un.data)
ggp.un <- ggplot(data.frame(un.data), aes(date))
## plotting MLS,OD listings
ggp.un +
geom_jitter(aes(color=zip_code,y=mls_listings))+
labs(x = "Time", y = "Listing Count", title = "MLS Listing Count: <$200K (Zip Codes A,B,C,D)")+
theme(axis.text.x=element_blank())
ggp.un +
geom_jitter(aes(color=zip_code,y=od_listings))+
labs(x = "Time", y = "Listing Count", title = "OD Listing Count: <$200K (Zip Codes A,B,C,D)")+
theme(axis.text.x=element_blank())
## $200K+
data.frame(ov.data)
ggp.ov <- ggplot(data.frame(ov.data), aes(date))
## plotting MLS,OD listings
ggp.ov +
geom_jitter(aes(color=zip_code,y=mls_listings))+
labs(x = "Time", y = "Listing Count", title = "MLS Listing Count: $200K+ (Zip Codes A,B,C,D)")+
theme(axis.text.x=element_blank())
ggp.ov +
geom_jitter(aes(color=zip_code,y=od_listings))+
labs(x = "Time", y = "Listing Count", title = "OD Listing Count: $200K+ (Zip Codes A,B,C,D)")+
theme(axis.text.x=element_blank())
## $200K+::<$200K Ratio
par(mfrow=c(1,2))
## Opendoor
ratio_200K.od <- ov.data$od_listings/un.data$od_listings
plot(ratio_200K.od,main = "OD Listings: Ratio of $200K+/<$200K ", xlab = "Time", ylab = "Ratio", col = "dodgerblue2")
## MLS
ratio_200K.mls <- ov.data$mls_listings/un.data$mls_listings
plot(ratio_200K.mls,main = "MLS Listings: Ratio of $200K+/<$200K ", xlab = "Time", ylab = "Ratio", col = "grey75")
## plotting OD to MLS listing ratio
## full data
data.frame(data)
ggp <- ggplot(data.frame(data), aes(date,100*od_mls_listing_ratio))
ggp +
geom_point(aes(color = price_band))+
labs(x = "Time", y = "Percent", title = "OD:MLS Listing Ratio (All Zip Codes)")+
theme(axis.text.x=element_blank())
## comparing visit data by price band and zip code
## <$200K
data.frame(un.data)
ggp.un <- ggplot(data.frame(un.data), aes(date))
## plotting visits as a function of listings
ggp.un +
geom_jitter(aes(color=zip_code,y=od_visits_listing))+
labs(x = "Time", y = "Count", title = "Visits Per Listing: <$200K (Zip Codes A,B,C,D)")+
theme(axis.text.x=element_blank())
## plotting contracts as a function of visits
ggp.un +
geom_jitter(aes(color=zip_code,y=100*od_contracts_visit))+
labs(x = "Time", y = "Percent", title = "Contracts Per Visit: <$200K (Zip Codes A,B,C,D)")+
theme(axis.text.x=element_blank())
## $200K+
data.frame(ov.data)
ggp.ov <- ggplot(data.frame(ov.data), aes(date))
## plotting visits as a function of listings
ggp.ov +
geom_jitter(aes(color=zip_code,y=od_visits_listing))+
labs(x = "Time", y = "Count", title = "Visits Per Listing: $200K+ (Zip Codes A,B,C,D)")+
theme(axis.text.x=element_blank())
## plotting contracts as a function of visits
ggp.ov +
geom_jitter(aes(color=zip_code,y=100*od_contracts_visit))+
labs(x = "Time", y = "Percent", title = "Contracts Per Visit: $200K+ (Zip Codes A,B,C,D)")+
theme(axis.text.x=element_blank())
## evaluating contracts per listing by cohort
## OD average by cohort
a.u.od <- median(a.un.data$od_contracts/a.un.data$od_listings)
a.o.od <- median(a.ov.data$od_contracts/a.ov.data$od_listings)
b.u.od <- median(b.un.data$od_contracts/b.un.data$od_listings)
b.o.od <- median(b.ov.data$od_contracts/b.ov.data$od_listings)
c.u.od <- median(c.un.data$od_contracts/c.un.data$od_listings)
c.o.od <- median(c.ov.data$od_contracts/c.ov.data$od_listings)
d.u.od <- median(d.un.data$od_contracts/d.un.data$od_listings)
d.o.od <- median(d.ov.data$od_contracts/d.ov.data$od_listings)
## OD average by cohort
a.u.mls <- median(a.un.data$mls_contracts/a.un.data$mls_listings)
a.o.mls <- median(a.ov.data$mls_contracts/a.ov.data$mls_listings)
b.u.mls <- median(b.un.data$mls_contracts/b.un.data$mls_listings)
b.o.mls <- median(b.ov.data$mls_contracts/b.ov.data$mls_listings)
c.u.mls <- median(c.un.data$mls_contracts/c.un.data$mls_listings)
c.o.mls <- median(c.ov.data$mls_contracts/c.ov.data$mls_listings)
d.u.mls <- median(d.un.data$mls_contracts/d.un.data$mls_listings)
d.o.mls <- median(d.ov.data$mls_contracts/d.ov.data$mls_listings)
## create lists and dataframes
od.medians <- 100*c(a.u.od, a.o.od, b.u.od, b.o.od, c.u.od, c.o.od, d.u.od, d.o.od)
mls.medians <- 100*c(a.u.mls, a.o.mls, b.u.mls, b.o.mls, c.u.mls, c.o.mls, d.u.mls, d.o.mls)
diff.medians <- (od.medians-mls.medians)
cohort.names <- c("A <$200K", "A $200K+","B <$200K", "B $200K+","C <$200K", "C $200K+","D <$200K", "D $200K+")
cohort.df <- data.frame(Cohort = cohort.names, ODmedian = od.medians, MLSmedian = mls.medians, Diff=diff.medians)
cohort.df
## adding time series information by quarter
qtr.data <- read.csv(file = "CityCohorts.csv") ## read in Pivot Table from excel, ignore error message
qtr.data
## 32 data points: 4 Quarters * 2 Price Levels * 4 Zip Codes
## QTR data is result of OD median contract win rate/MLS median contract win rate (so, it reflects the difference in % points)
qtr.data$QTR <- c(1:4)
qtr.data$QTR <- as.numeric(qtr.data$QTR)
data.frame(qtr.data)
str(qtr.data)
ggp.cohorts <- ggplot(data.frame(qtr.data), aes(x=QTR))
## <$200K subsection
ggp.cohorts +
geom_point(aes(y=100*A..200K), color = "#F8766D")+
geom_line(aes(y=100*A..200K), color = "#F8766D")+
geom_point(aes(y=100*B..200K), color = "#7CAE00")+
geom_line(aes(y=100*B..200K), color = "#7CAE00")+
geom_point(aes(y=100*C..200K), color = "#00CFB4")+
geom_line(aes(y=100*C..200K), color = "#00CFB4")+
geom_point(aes(y=100*D..200K), color = "#C77CFF")+
geom_line(aes(y=100*D..200K), color = "#C77CFF")+
geom_line(aes(y=0), color="Black")+
theme(legend.position = "right")+
labs(x = "Quarter", y = "Difference: Percentage Points", title = "Difference in OD:MLS Contract/Listing Win Rate: <$200K, By Quarter")
## $200K+ subsection
ggp.cohorts +
geom_point(aes(y=100*A.200K.), color = "#F8766D")+
geom_line(aes(y=100*A.200K.), color = "#F8766D")+
geom_point(aes(y=100*B.200K.), color = "#7CAE00")+
geom_line(aes(y=100*B.200K.), color = "#7CAE00")+
geom_point(aes(y=100*C.200K.), color = "#00CFB4")+
geom_line(aes(y=100*C.200K.), color = "#00CFB4")+
geom_point(aes(y=100*D.200K.), color = "#C77CFF")+
geom_line(aes(y=100*D.200K.), color = "#C77CFF")+
geom_line(aes(y=0), color = "Black")+
labs(x = "Quarter", y = "Difference: Percentage Points", title = "Difference in OD:MLS Contract/Listing Win Rate: $200K+, By Quarter")