2/5/2021

Agenda

  • Data Gathering
  • Airbnb and Zillow Data Quality
  • EDA of Final Data
  • Analysis & Metadata
  • Final Data Visuals
  • ROI Zip Codes
  • Popular Zip Codes
  • Conclusion
  • Future Research

Data Gathering with S3

ABNB = read.csv('https://dataprojects1.s3.amazonaws.com/listingsCap1.csv') 
Zillow_cost = read.csv('https://dataprojects1.s3.amazonaws.com/Zip_Zhvi_2B.csv') 
  • S3 enables us to read the data directly from the cloud
  • Access can be controlled centrally, and the files can be read from any machine

Data Quality

Airbnb & Zillow Data Quality

Zillow

dim(Zillow_cost) 
[1] 8946  262
  • Missing values are present before 2010
  • Data types are sensible
  • The data shows property costs by zip code

Zillow Costs Through Time

- Property Costs have been increasing steadily

Final Zillow

zipcode  Median_cost  Cities
10003    2147000      New York
10011    2480400      New York
10013    3316500      New York
10014    2491600      New York
10021    1815600      New York
10022    2031600      New York
  • We grouped by zip code and filtered for NY
  • We use median costs to limit the influence of outliers
  • The Zillow data is now tidy and clean
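
The filtering and median step described above could look roughly like the following base-R sketch. It runs on a made-up three-row table: the column names (`RegionName`, `City`, `X2017.04` and so on), the dollar figures, and the choice to take the median across the monthly columns are all assumptions for illustration, not the code behind these slides. Each zip code is a single row here, so the grouping is implicit.

```r
# Toy stand-in for the Zillow table: one row per zip code, one column per month.
# All names and values are illustrative assumptions.
zillow_toy <- data.frame(
  RegionName = c(10003, 10011, 60614),
  City       = c("New York", "New York", "Chicago"),
  X2017.04   = c(2100000, 2450000, 400000),
  X2017.05   = c(2150000, 2480000, 410000),
  X2017.06   = c(2200000, 2500000, 405000),
  stringsAsFactors = FALSE
)

# Keep New York rows only.
ny <- zillow_toy[zillow_toy$City == "New York", ]

# Median across the monthly price columns, row by row.
month_cols <- grep("^X[0-9]{4}\\.[0-9]{2}$", names(ny), value = TRUE)
zillow_final_toy <- data.frame(
  zipcode     = ny$RegionName,
  Median_cost = apply(ny[, month_cols], 1, median),
  Cities      = ny$City,
  stringsAsFactors = FALSE
)
print(zillow_final_toy)
```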

Airbnb Data Quality

dim(ABNB)
[1] 48895   106
  • Numerous columns are irrelevant to the study
  • Missing values are present
  • Several relevant columns have the wrong data type
  • Symbols in numeric fields hinder calculations

Airbnb Data Quality (2)

Missing Values

Final Airbnb data

zip    price  state  p_type       Neighbourhood  Cl_fee  Beds  Reviews  Rating
10029  190    NY     Apartment    Manhattan      NA      2     0        NA
11221  115    NY     Townhouse    Brooklyn       85      2     11       94
11206  228    NY     Loft         Brooklyn       128     2     82       94
10001  375    NY     Apartment    Manhattan      120     2     5        100
10162  250    NY     Apartment    Manhattan      200     2     66       93
11215  225    NY     Condominium  Brooklyn       NA      2     4        100
  • Data types were corrected to match what each variable represents
  • Only the variables relevant to the study were kept
  • Symbols were removed to ease future calculations
  • The Airbnb data is now clean and ready to visualize
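
As a minimal sketch of the symbol removal and type conversion described above, the snippet below strips currency symbols and thousands separators before converting text to numbers. The column names `price` and `cleaning_fee`, the helper `to_number`, and the sample values are assumptions for illustration.

```r
# Toy rows shaped like raw listings; names and values are illustrative assumptions.
abnb_toy <- data.frame(
  zipcode      = c("10029", "11221", "10001"),
  price        = c("$190.00", "$1,150.00", "$375.00"),
  cleaning_fee = c("", "$85.00", "$120.00"),
  stringsAsFactors = FALSE
)

# Strip "$" and "," so the text can be parsed as a number.
# Empty strings become NA, which marks the value as missing.
to_number <- function(x) as.numeric(gsub("[$,]", "", x))

abnb_toy$price        <- to_number(abnb_toy$price)
abnb_toy$cleaning_fee <- to_number(abnb_toy$cleaning_fee)
str(abnb_toy)
```

Keeping `zipcode` as text avoids losing leading zeros and makes it a safe join key.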

Neighbourhood Allocation

Property Allocation

Data Merge & EDA

Combined Data

Combined_data1 = ABNB_2 %>% inner_join(Zillow_final, by = 'zipcode')
  • The tables were inner joined to keep only zip codes that have both a property cost and an Airbnb price
  • This is not our final data, yet it already supports insightful visuals
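
To illustrate what the inner join keeps and drops, here is a small sketch on toy tables. It uses base-R `merge()`, whose default `all = FALSE` behaves as an inner join, in place of the `inner_join` call on the slide; the values are made up and only the key name `zipcode` comes from the slide.

```r
# Toy tables; values are illustrative assumptions.
abnb_toy <- data.frame(zipcode = c("10003", "10003", "11221", "99999"),
                       price   = c(200, 250, 115, 80),
                       stringsAsFactors = FALSE)
zillow_toy <- data.frame(zipcode     = c("10003", "10011"),
                         Median_cost = c(2147000, 2480400),
                         stringsAsFactors = FALSE)

# merge() defaults to all = FALSE, i.e. an inner join on the `by` column.
combined_toy <- merge(abnb_toy, zillow_toy, by = "zipcode")
print(combined_toy)

# Zip codes present in only one of the two tables are dropped.
dropped <- setdiff(union(abnb_toy$zipcode, zillow_toy$zipcode),
                   combined_toy$zipcode)
print(dropped)
```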

EDA of the Combined Data

dim(Combined_data1)
[1] 1563   15
  • We imputed missing values in Cleaning_fee using the KNN algorithm
  • The Combined data is cleaned, and ready to be analyzed
[1] FALSE
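
The slides do not say which package performed the KNN imputation, so the following is a hand-rolled base-R sketch of the idea rather than the method actually used. The function `knn_impute`, the choice of `price` and `Beds` as predictors, `k = 2`, and the toy values are all assumptions.

```r
# Toy listings; Cl_fee has gaps. All numbers are illustrative assumptions.
listings <- data.frame(
  price  = c(100, 110, 120, 300, 310, 105, 305),
  Beds   = c(1, 2, 2, 3, 3, 1, 3),
  Cl_fee = c(50, 60, 55, 150, 160, NA, NA)
)

knn_impute <- function(df, target, predictors, k = 2) {
  # Standardize predictors so no single column dominates the distance.
  # A constant column would scale to NaN; treat it as contributing 0.
  x <- scale(as.matrix(df[, predictors, drop = FALSE]))
  x[is.nan(x)] <- 0
  known   <- which(!is.na(df[[target]]))
  missing <- which(is.na(df[[target]]))
  for (i in missing) {
    # Euclidean distance from row i to every row with a known target value.
    d <- sqrt(rowSums(sweep(x[known, , drop = FALSE], 2, x[i, ])^2))
    nearest <- known[order(d)[seq_len(min(k, length(known)))]]
    # Fill the gap with the mean of the k nearest known values.
    df[[target]][i] <- mean(df[[target]][nearest])
  }
  df
}

imputed <- knn_impute(listings, "Cl_fee", c("price", "Beds"), k = 2)
print(imputed)
any(is.na(imputed$Cl_fee))
```

The last line is the kind of check that confirms no missing values remain after imputation.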

Visualizations on the Combined Data

  • Outliers are present in zip codes 10003 and 11217, and in a few others
  • To avoid biased results, we will use median prices for the analysis
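
A quick way to see why the median is preferred here: with a single extreme listing, the mean moves far from the typical price while the median barely changes. The prices below are invented for illustration.

```r
# Illustrative nightly prices for one zip code, with one extreme listing.
prices <- c(150, 160, 175, 180, 200, 5000)

mean(prices)    # 977.5, pulled upward by the single 5000 listing
median(prices)  # 177.5, stays near the typical listing
```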

Airbnb Price by Neighborhood

  • Manhattan has the highest prices, surpassing Brooklyn by 44%
  • Queens and Staten Island have considerably lower prices

Property Count by Neighbourhood

  • Brooklyn and Manhattan have the most properties

Prices and Costs

Prices and Costs (2)

  • There is a positive correlation between price and cost, yet it is weak (0.24)
  • Some zip codes have comparatively low property costs yet charge high prices
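
A correlation like the one quoted above can be computed with `cor()`. The sketch below uses invented values and assumed column names (`price`, `Median_cost`); it shows the call, not the reported 0.24.

```r
# Toy price/cost pairs; values are illustrative assumptions.
combined_toy <- data.frame(
  price       = c(120, 200, 95, 310, 150, NA),
  Median_cost = c(450000, 2100000, 330000, 1800000, 2500000, 380000)
)

# Pearson correlation, skipping rows where either value is missing.
r <- cor(combined_toy$price, combined_toy$Median_cost, use = "complete.obs")
print(r)
```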

Analysis & Metadata

Analysis

The purpose of the study is to find zip codes with attractive ROIs. To do so, we created new variables on our cleaned and merged data that can help the client make an educated financial decision.

  • These new variables measure profitability, ROI and break-even point (occupancy is not modeled; a flat 75% rate is assumed)
  • For more details on the metadata, see the code

Metadata

  • Annual Revenue - Total sales produced by each property in a year
  • Profit - Annual revenue minus the 14% Airbnb fee charged to the host (ignoring other costs)
  • Break even in Years - How many years will it take for a property to break even?
  • Profit in 5 years - How much profit will each property make in 5 years?
  • ROI in 5 years - What would be the return on investment in 5 years?
  • Profit in 10 years - How much profit will each property make in 10 years?
  • ROI in 10 years - What would be the return on investment in 10 years?
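
The sketch below shows one set of formulas that reproduces the first row of the Final Data table (zip code 10003). The formulas are inferred from the table values and the 14% fee stated above, not copied from the analysis code, so treat them as an assumption. Note that under this reading the "Profit in N years" figures subtract the purchase cost, while the ROI figures divide cumulative profit by the purchase cost.

```r
# Inputs taken from the first row of the Final Data table; fee from the metadata.
zipcode     <- "10003"
median_cost <- 2147000
annual_rev  <- 87417.50
airbnb_fee  <- 0.14

profit          <- annual_rev * (1 - airbnb_fee)   # annual profit after the fee
breakeven_years <- median_cost / profit            # years to recover the cost
profit_in_5     <- 5 * profit - median_cost        # cumulative profit net of cost
roi_in_5        <- 5 * profit / median_cost        # cumulative profit over cost
profit_in_10    <- 10 * profit - median_cost
roi_in_10       <- 10 * profit / median_cost

round(c(Profit = profit, Breakeven_years = breakeven_years,
        Profit_in_5 = profit_in_5, ROI_in_5 = roi_in_5,
        Profit_in_10 = profit_in_10, ROI_in_10 = roi_in_10), 4)
```

The results agree with the table row for 10003 up to the rounding shown there (for example, a break-even of about 28.56 years and a 5-year ROI of about 0.175).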

Final Data

zipcode  Median_cost  Annual_Rev  Profit     Breakeven_years  Profit_in_5  ROI_in_5   Profit_in_10  ROI_in_10
10003    2147000      87417.50    75179.05   28.55852         -1771105     0.1750793  -1395210      0.3501586
10011    2480400      110020.12   94617.31   26.21541         -2007313     0.1907299  -1534227      0.3814599
10013    3316500      118716.25   102095.98  32.48414         -2806020     0.1539213  -2295540      0.3078425
10014    2491600      98066.38    84337.08   29.54407         -2069915     0.1692428  -1648229      0.3384856
10021    1815600      74460.00    64035.60   28.35298         -1495422     0.1763483  -1175244      0.3526966
10022    2031600      108843.00   93604.98   21.70397         -1563575     0.2303726  -1095550      0.4607451
  • Here is a first glance at the final data
  • Isn't it clean?

Final Data Visuals

Break-even in Years by Zip code

According to the payback period, zip codes 10306, 10303, 11234, 10304 and 11434 will be paid off first.

  • Are the lower-cost zip codes paid off first?

Payback Period with Median Prices

  • As we suspected, lower-cost properties are usually paid off first. Does that mean they are the best investments?

Annual Revenue by Zip code

In terms of revenue, zip codes 10022, 10036, 11201, 11215 and 11231 have the highest sales.

  • These codes cover areas such as Midtown, Times Square and Dumbo
  • These locations will maximize revenue, yet their property costs are also high

Expected ROI in 5 Years by Zipcode

Looking at the ROI in 5 years, our best zip codes are:

  • 10306 - 10303 - 11234 - 10304 - 11434
  • Does the outcome change in 10 years?

Expected ROI in 10 Years by Zipcode

- The outcome does not seem to change from the 5 year to 10 year mark
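
This is what one would expect from how the metric appears to be built. In the Final Data table, every ROI_in_10 value is exactly twice the corresponding ROI_in_5 value, which suggests the following relationship (an inference from the table, not a formula stated in the slides):

```latex
\mathrm{ROI}_n = \frac{n \cdot \mathrm{Profit}}{\mathrm{Median\_cost}}
\quad\Longrightarrow\quad
\mathrm{ROI}_{10} = 2\,\mathrm{ROI}_{5}
```

Because each zip code's 10-year figure is the same multiple of its 5-year figure, the ranking cannot change between the two horizons.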

ROI Zip Codes

Leading ROI Zip Codes

According to the analysis, the Zip codes that will maximize the return on investment are:

  • 10306

  • 10303

  • 11234

  • 10304

  • 11434

  • Now, let us learn more about these zip codes

ROI Zip Code Table

zipcode  mean.price  mean.review  median_cost  Revenue   location       review.score
10303    104.00000   18.00000     327700       39766.75  Staten Island  91.75000
10304    93.33333    31.66667     328300       37047.50  Staten Island  91.66667
10306    117.50000   10.50000     352900       52240.62  Staten Island  89.00000
11234    135.11111   34.88889     476900       55315.75  Brooklyn       94.25000
11434    136.87500   37.12500     382300       42294.38  Queens         95.53846
  • Three of our most profitable zip codes are located in Staten Island.
    • This may sound surprising, yet their ratio of nightly price to property cost is the most attractive (ignoring demand).
  • The zip code with the most reviews on average is 11434 (Queens), followed by 11234, located in Brooklyn
  • The average number of reviews in the data is 19.79, and most of these zip codes surpass it
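
As a rough cross-check, the 5-year ROI ranking can be recomputed from the table above. The sketch assumes that the `Revenue` column is annual revenue and that ROI is five years of revenue net of the 14% Airbnb fee, divided by `median_cost`; both are assumptions inferred from the tables rather than taken from the analysis code.

```r
# Values copied from the ROI Zip Code Table.
roi_table <- data.frame(
  zipcode     = c("10303", "10304", "10306", "11234", "11434"),
  median_cost = c(327700, 328300, 352900, 476900, 382300),
  Revenue     = c(39766.75, 37047.50, 52240.62, 55315.75, 42294.38),
  stringsAsFactors = FALSE
)

# Assumed formula: 5 years of revenue net of the 14% fee, over property cost.
roi_table$ROI_in_5 <- 5 * roi_table$Revenue * (1 - 0.14) / roi_table$median_cost

# Highest ROI first.
ranked <- roi_table[order(-roi_table$ROI_in_5), ]
print(ranked[, c("zipcode", "ROI_in_5")])
```

Under these assumptions the order comes out as 10306, 10303, 11234, 10304, 11434, the same order in which the leading ROI zip codes are listed.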

Popular Zip Code Analysis

Popular Zip Codes

Popular Zip Code Table

Popular Zip Codes ROI in 5 years

Conclusion and Future Research

Conclusion

The goal of this study was to clear the noise in the data and produce a list of zip codes that can offer superior ROIs in the NY real estate market. To accomplish this objective, we cleaned and joined our tables, created variables that measure ROI, and analyzed those metrics. As a result, these are our findings:

ROI Zip Codes

  • 10306
  • 10303
  • 11234
  • 10304
  • 11434

Popular Zip Codes

  • 10305
  • 10308
  • 11231
  • 11215

Zip Code Highlights

Leaflet Map

Future Research

  • Explore other potential costs (mortgage, repairs, property manager, taxes)

  • Property appreciation is an important aspect to study.

  • For this study, we did not take occupancy and demand into account. These factors should be explored in the future.

Q&A