Exploring most profitable zipcodes within New York

1.0 Case Overview

1.1 Executive Summary

You are consulting for a real estate company that has a role in purchasing properties to rent out short-term as part of their business model specifically within New York City. The real estate company has already concluded that two-bedroom properties are the most profitable; however, they do not know which zip codes are the best to invest in.

The real estate company has engaged your firm to build out a data product and provide your conclusions to help them understand which zip codes would generate the most profit on short term rentals within New York City.

You will be looking at publicly available data from Zillow and AirBnB:

Cost data: Zillow provides us an estimate of value for two-bedroom properties

Revenue data: AirBnB is the medium through which the investor plans to lease out their investment property. Fortunately for you, we can see how many properties in certain neighbourhoods rent out for in New York City.

Results: Analysing the data under different dimensions of Customer Preference, Market Demand, Customer Feedback/Reviews, Break-Even Time, Three zip codes that are proposed suitable for investment are : 11215, 11217,11231

1.2 Given Assumptions

After meeting with the strategy team, you’ve got an idea of where to start, key concerns, and how you can help this real estate company with the market data while keeping the following assumptions in mind :

  • You can assume an occupancy rate of 75% or you can come up with your model to calculate occupancy;
  • The investor will pay for the property in cash.
  • The time value of money discount rate is 0%.
  • All properties and all square feet within each locale can be assumed to be homogeneous

1.3 Taken Assumptions

Below Assumptions are considered :

  • Actual Rent/Night of property includes rental price in addition to a mandatory Cleaning fee.
  • Cleaning fee and security deposit is considered as 0 where Blank
  • Seasonality has little or no impact on the number of bookings.
  • An increase in the Cost of property post 2017-July had no seasonality effect(Time series has no seasonality).
  • Rent of all properties increases by 10% every year.
  • The number of reviews are indicative of the high occupancy of the property.
  • All reviews are positive.
  • Availablity for the next 365 days: availablity_365 is a status quo as far as occupancy in concerned.
    Less availability indicates High demand.

2.0 Initial Set Up

2.1 Package Loading

Required Packages

  • Tidyverse (dplyr, ggplot2..) - Data Read, Manipulation and visualisation
  • Plotly - Interactive Visualization
  • KableExtra - Styling for table (Styling Data Tables within Markdown)
  • gridExtra- Graphical arrangement
  • forecast- Time series and forecasting
  • ggplot2- Graphical representation
  • stringr- string manipulations
  • corrplot-making correlogram
  • knitrr- Dynamic report generation

2.2 Data Loading

Revenue(Airbnb) and Cost(Zillow) Datasets are loaded into R-Environment with the help of below code..

Please note the below code uses .csv files to load data

costPrice_Zillow<-read.csv("Zip_Zhvi_2bedroom.csv")
rentAirbnbListings<-read.csv("listings.csv")

dim_zlow<-dim(costPrice_Zillow)
dim_abnb<-dim(rentAirbnbListings)
dim_zlow #Zillow data
## [1] 8946  262
dim_abnb #Airbnb data
## [1] 48895   106

Check Dimensions

The dimension of Zillow Cost data is 8946, 262 The dimension of Airbnb Revenue data is 48895, 106

3.0 Data Preparation

Cost and Revenue datasets are scrubbed separately in an attempt to enrich the data quality for Exploratory data analysis.

3.1 Zillow Cost data

kable(head(costPrice_Zillow))  %>% kable_styling(bootstrap_options = c("striped", "hover", "responsive")) %>% scroll_box(width = "100%", height = "250px")
RegionID RegionName City State Metro CountyName SizeRank X1996.04 X1996.05 X1996.06 X1996.07 X1996.08 X1996.09 X1996.10 X1996.11 X1996.12 X1997.01 X1997.02 X1997.03 X1997.04 X1997.05 X1997.06 X1997.07 X1997.08 X1997.09 X1997.10 X1997.11 X1997.12 X1998.01 X1998.02 X1998.03 X1998.04 X1998.05 X1998.06 X1998.07 X1998.08 X1998.09 X1998.10 X1998.11 X1998.12 X1999.01 X1999.02 X1999.03 X1999.04 X1999.05 X1999.06 X1999.07 X1999.08 X1999.09 X1999.10 X1999.11 X1999.12 X2000.01 X2000.02 X2000.03 X2000.04 X2000.05 X2000.06 X2000.07 X2000.08 X2000.09 X2000.10 X2000.11 X2000.12 X2001.01 X2001.02 X2001.03 X2001.04 X2001.05 X2001.06 X2001.07 X2001.08 X2001.09 X2001.10 X2001.11 X2001.12 X2002.01 X2002.02 X2002.03 X2002.04 X2002.05 X2002.06 X2002.07 X2002.08 X2002.09 X2002.10 X2002.11 X2002.12 X2003.01 X2003.02 X2003.03 X2003.04 X2003.05 X2003.06 X2003.07 X2003.08 X2003.09 X2003.10 X2003.11 X2003.12 X2004.01 X2004.02 X2004.03 X2004.04 X2004.05 X2004.06 X2004.07 X2004.08 X2004.09 X2004.10 X2004.11 X2004.12 X2005.01 X2005.02 X2005.03 X2005.04 X2005.05 X2005.06 X2005.07 X2005.08 X2005.09 X2005.10 X2005.11 X2005.12 X2006.01 X2006.02 X2006.03 X2006.04 X2006.05 X2006.06 X2006.07 X2006.08 X2006.09 X2006.10 X2006.11 X2006.12 X2007.01 X2007.02 X2007.03 X2007.04 X2007.05 X2007.06 X2007.07 X2007.08 X2007.09 X2007.10 X2007.11 X2007.12 X2008.01 X2008.02 X2008.03 X2008.04 X2008.05 X2008.06 X2008.07 X2008.08 X2008.09 X2008.10 X2008.11 X2008.12 X2009.01 X2009.02 X2009.03 X2009.04 X2009.05 X2009.06 X2009.07 X2009.08 X2009.09 X2009.10 X2009.11 X2009.12 X2010.01 X2010.02 X2010.03 X2010.04 X2010.05 X2010.06 X2010.07 X2010.08 X2010.09 X2010.10 X2010.11 X2010.12 X2011.01 X2011.02 X2011.03 X2011.04 X2011.05 X2011.06 X2011.07 X2011.08 X2011.09 X2011.10 X2011.11 X2011.12 X2012.01 X2012.02 X2012.03 X2012.04 X2012.05 X2012.06 X2012.07 X2012.08 X2012.09 X2012.10 X2012.11 X2012.12 X2013.01 X2013.02 X2013.03 X2013.04 X2013.05 X2013.06 X2013.07 X2013.08 X2013.09 X2013.10 X2013.11 X2013.12 X2014.01 X2014.02 X2014.03 X2014.04 X2014.05 X2014.06 X2014.07 X2014.08 X2014.09 X2014.10 X2014.11 X2014.12 X2015.01 X2015.02 X2015.03 X2015.04 X2015.05 X2015.06 X2015.07 X2015.08 X2015.09 X2015.10 X2015.11 X2015.12 X2016.01 X2016.02 X2016.03 X2016.04 X2016.05 X2016.06 X2016.07 X2016.08 X2016.09 X2016.10 X2016.11 X2016.12 X2017.01 X2017.02 X2017.03 X2017.04 X2017.05 X2017.06
61639 10025 New York NY New York New York 1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 798600 798800 801500 804600 814900 828300 835700 849300 858100 854000 834800 821700 830300 853700 868300 875200 882200 892400 905000 924000 934400 932100 927500 923600 907900 890900 883400 896100 923900 952900 964600 972500 973800 973400 966500 966800 967100 974800 976800 976100 973700 974500 973200 966400 950400 933300 920900 909400 891400 873300 858800 850200 842800 834000 828800 821400 813900 813300 821500 831700 845100 854500 858900 859200 863500 876000 886100 890000 894200 901800 909500 913300 907400 900000 897700 896300 892300 890400 888600 891700 899500 904400 908200 914000 915100 912300 914000 921100 923300 917300 915000 922800 929100 937700 955700 974200 995500 1019500 1035100 1054900 1079900 1092600 1103500 1118800 1139300 1154600 1144100 1120300 1125500 1136000 1135100 1130000 1138200 1153700 1174800 1185400 1188400 1189700 1193700 1199900 1201400 1202600 1214200 1235200 1258000 1287700 1307200 1313900 1317100 1327400 1338800 1350400 1356600 1358500 1364000 1373300 1382600 1374400 1364100 1366300 1354800 1327500 1317300 1333700 1352100 1390000 1431000
84654 60657 Chicago IL Chicago Cook 2 167700 166400 166700 167200 166900 166900 168000 170100 171700 173000 174600 177600 180100 182300 184400 186300 187600 189400 190300 189700 189800 191900 194500 195500 196000 196900 198900 201400 204600 207900 211800 214600 216000 217500 220200 222800 226200 229600 232400 234400 236300 238300 241800 246100 249500 251300 253200 255700 259200 263100 266600 269500 272800 275500 278800 283400 288600 291300 292400 294600 297100 298200 299800 302000 304200 307900 311000 311400 311000 311700 312300 312000 311800 312600 313000 314400 317300 319700 320500 321000 321600 323800 326100 329000 332200 334700 336000 337300 337500 337100 334900 333100 332800 333400 335100 337800 339400 340700 343200 345000 345200 344600 344500 346200 349800 353400 355100 356300 358300 359500 359200 358500 359100 361400 364700 367500 369400 369800 369600 368700 368100 369000 370300 371700 374200 375700 376400 378200 379800 380100 380400 381200 382700 382700 380200 377800 376300 375600 376000 376200 375800 376300 377200 376800 373700 370100 368700 368600 366600 362200 358600 355300 352300 350900 350100 347900 345400 343400 342800 342600 341100 339900 338900 338200 335200 329800 325500 323600 323400 325000 325800 323200 320100 318600 317400 315700 315000 315300 315600 313900 309800 305700 301800 299500 299900 301100 300300 298900 298500 298500 297000 296800 298700 299600 300700 303900 306800 307500 308500 310000 310800 311200 313000 315800 319000 323400 327500 330000 331800 334500 336000 335700 335400 336300 338800 342400 344400 344000 343900 345100 346100 346900 348000 349700 351200 351700 350700 350400 352000 354300 355900 356500 355200 353800 353700 354600 356200 357800 358200 358500 360300 362400 363700 365200 367100 368600 370200 372300 375300 378700 381400 381800 382100 383300 385100
61637 10023 New York NY New York New York 3 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 1526800 1424500 1346600 1331300 1322500 1289300 1265400 1249700 1241100 1232700 1225500 1228200 1252600 1266100 1288700 1308100 1333000 1356400 1362000 1353600 1364000 1373900 1389600 1401600 1404100 1415800 1432400 1455400 1474200 1462300 1438300 1435500 1427800 1411200 1407400 1419700 1457400 1500800 1524900 1537800 1558700 1586100 1602300 1621100 1639300 1657400 1657400 1656100 1649400 1643400 1632400 1618200 1588300 1543600 1500800 1464200 1426100 1387300 1362600 1351700 1344300 1331800 1334800 1314200 1271900 1252300 1262300 1279200 1309000 1335300 1353800 1366400 1372100 1381300 1385000 1388100 1399100 1399800 1389300 1384700 1380900 1367900 1365400 1375100 1380400 1377000 1375100 1379000 1395200 1414500 1419000 1403100 1383200 1376700 1378200 1378700 1375900 1366700 1365500 1382200 1404700 1428000 1445700 1452900 1460100 1484400 1508400 1522800 1538300 1568600 1597400 1622900 1654300 1684600 1713000 1728800 1736100 1745900 1753800 1736600 1730400 1734500 1728700 1720800 1717700 1700100 1680400 1676400 1685600 1708100 1730400 1751800 1778300 1810400 1831600 1844400 1861600 1889600 1901500 1895300 1890200 1898400 1924500 1967300 1993500 1980700 1960900 1951300 1937800 1929800 1955000 2022400 2095000 2142300
84616 60614 Chicago IL Chicago Cook 4 195800 193500 192600 192300 192600 193600 195500 197600 199400 201300 203600 206500 209200 211100 212600 214400 215600 216500 217900 220100 222200 223900 225400 227700 230100 231700 232700 233700 234700 235600 236800 238800 240800 242400 243800 246400 250200 254300 257600 261100 264800 267900 270700 272800 274400 276200 278600 280100 283100 287700 293600 298500 302700 305000 306800 309400 313100 314900 316200 318200 320600 322900 325500 328400 330700 332800 334400 335900 337400 339700 342300 343800 343400 342300 341800 341700 342400 344300 346900 348900 350200 351700 353500 355700 358000 361600 364000 365500 366400 367000 365200 363100 362000 362500 364800 368200 370800 371400 371200 371800 374200 377800 380100 381700 384400 386600 385700 385400 388500 392200 394500 394900 394800 395500 400000 404300 407600 409500 410400 408700 406400 403800 402400 402300 403000 403100 403600 404500 406100 407700 408700 409600 409800 408700 408600 411000 412200 411400 410200 408400 406600 406500 406400 404600 402900 400600 397500 392600 387500 383700 381000 378900 377700 377400 376700 374000 369600 366200 364500 364100 364300 363600 361900 361900 356400 347100 342400 344000 345200 346900 346100 342600 340100 339900 338600 335500 333900 335000 336000 334200 330300 327000 326000 326100 326700 326300 324400 322700 323200 322800 320700 319500 320100 320500 321800 323600 324300 324100 324700 326000 327600 329800 332600 336800 342300 348100 353600 358900 361900 363900 366200 368300 369800 371400 372400 373200 373800 374800 376200 376800 376300 374900 373800 373900 374700 375300 375000 374700 376300 378100 378000 377700 378300 380000 383100 385900 388100 389700 391800 393400 394700 394900 395700 396400 397500 398900 401200 403200 405700 408300 408800 408000 410100 412200 412200
93144 79936 El Paso TX El Paso El Paso 5 59100 60500 60900 60800 60300 60400 61200 61700 61000 60100 59300 59000 58700 58400 58000 57800 57900 57800 57800 58100 58400 58700 59200 59400 58700 58100 57900 57900 57700 57600 57500 57800 58000 58000 57900 57800 57800 58000 58500 58700 59000 59200 59300 59500 59900 60300 60400 60300 60300 60000 59500 59400 59800 59900 59700 59500 59400 59500 59700 59700 59200 58700 58400 57800 57000 56700 56800 56600 56500 56500 56600 56700 56600 56700 57000 57300 57300 57300 57100 56900 56900 57000 56700 56700 57000 57400 57700 58000 58300 58600 58800 59100 59500 60200 61100 61700 62400 63100 63600 63800 64400 64900 65100 65200 65300 65600 66300 67100 67900 68800 69600 70600 71500 72100 72600 73400 74200 74500 75000 75700 76400 77100 77700 78100 78700 79600 80600 81500 82300 83200 83900 84400 84300 84400 85200 86100 86800 87400 87700 87800 88300 88800 88500 87900 87500 86600 85500 84600 84200 83900 83600 83400 83500 84100 84600 84600 84700 84600 84600 85100 85500 85800 86600 87600 86500 84200 83300 83800 83800 83600 83600 83400 82900 82400 82200 82100 82300 82800 82900 82600 82700 83100 83300 83100 82800 82500 82300 82200 82300 82200 81900 81700 82100 82600 82900 83000 83000 82900 82100 81200 80800 80700 81200 81800 81800 81400 81400 81500 81900 82000 81900 81900 82100 82100 81500 80800 80300 80100 80100 80700 81200 81700 81900 81700 81500 81700 81700 80900 81000 81500 81400 80500 80000 80100 80500 80800 81400 82300 82600 82600 82500 82500 82600 82700 82600 82400 82300 82400 82300 82500 83200 83900 84100 83900 83700
84640 60640 Chicago IL Chicago Cook 6 123300 122600 122000 121500 120900 120600 120900 121300 121600 122100 122900 124200 125300 126100 126700 127900 129300 130400 131300 131700 132300 133500 134500 134800 135200 135500 136300 137700 139600 141600 143400 144500 145600 147400 149200 149900 150500 151700 152800 153900 156400 159400 161800 163800 165700 167100 168400 169600 171000 172400 174800 177900 180400 182300 184600 187700 190700 193100 196100 200000 202900 205500 207600 208600 209200 210000 210200 211300 213000 214100 215200 217600 220200 222400 224600 227000 228600 230100 232400 234300 235300 236200 237000 237900 239200 241300 243600 244200 243800 244500 245500 245700 246400 247500 248000 249000 251100 252900 253700 256100 259400 261300 262200 263800 265300 267600 270500 272800 273700 273600 273200 273300 273500 273800 274300 274400 274400 275300 276600 278200 280600 283100 284400 284200 283000 282000 282400 283700 285200 286700 286700 284900 283300 282700 282100 280800 280300 280200 280100 279900 281200 282300 282100 279300 275800 273100 272600 271800 270300 268600 266900 264800 262900 260700 258500 256600 255000 252900 250600 248900 248900 250100 250300 250400 248000 243800 240200 239700 239000 238200 236800 235400 233300 231300 230300 228200 225200 223100 222300 220700 218300 215900 213200 210900 209600 208200 205900 204000 203200 202500 200800 199400 199900 201900 204500 207000 208100 207100 205300 204700 204700 205800 208600 211800 213500 213800 215100 218400 221500 223900 226100 227900 229100 230200 230600 230400 230000 229000 227600 226100 225700 226200 226500 226500 227100 227800 229400 231800 234100 235400 235100 233900 233700 235300 237200 238500 239300 239600 239500 240200 242700 244900 247700 249500 248800 247000 247300 248700 250800 252800 253800 253800 253400 254100 255100

Please note Only 6 rows are shown above with head function

Structure of Zillow data

#Check the structure
str(costPrice_Zillow)
## 'data.frame':    8946 obs. of  262 variables:
##  $ RegionID  : int  61639 84654 61637 84616 93144 84640 97564 91982 71831 84646 ...
##  $ RegionName: int  10025 60657 10023 60614 79936 60640 94109 77494 32162 60647 ...
##  $ City      : Factor w/ 4684 levels "Aberdeen","Abilene",..: 2702 649 2702 649 1119 649 3479 672 3912 649 ...
##  $ State     : Factor w/ 48 levels "AK","AL","AR",..: 34 15 34 15 42 15 5 42 10 15 ...
##  $ Metro     : Factor w/ 467 levels "","Aberdeen",..: 292 76 292 76 125 76 368 188 415 76 ...
##  $ CountyName: Factor w/ 722 levels "Ada","Adams",..: 460 158 460 158 212 158 574 231 634 158 ...
##  $ SizeRank  : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ X1996.04  : int  NA 167700 NA 195800 59100 123300 336800 117300 72600 129600 ...
##  $ X1996.05  : int  NA 166400 NA 193500 60500 122600 337400 117900 74800 129800 ...
##  $ X1996.06  : int  NA 166700 NA 192600 60900 122000 337700 118800 77300 129900 ...
##  $ X1996.07  : int  NA 167200 NA 192300 60800 121500 338200 121100 80000 130600 ...
##  $ X1996.08  : int  NA 166900 NA 192600 60300 120900 339700 123200 81700 131100 ...
##  $ X1996.09  : int  NA 166900 NA 193600 60400 120600 340400 124600 82600 131200 ...
##  $ X1996.10  : int  NA 168000 NA 195500 61200 120900 339700 125800 83500 131000 ...
##  $ X1996.11  : int  NA 170100 NA 197600 61700 121300 338900 126400 84700 131400 ...
##  $ X1996.12  : int  NA 171700 NA 199400 61000 121600 338300 125600 85100 132000 ...
##  $ X1997.01  : int  NA 173000 NA 201300 60100 122100 340000 125100 84200 132100 ...
##  $ X1997.02  : int  NA 174600 NA 203600 59300 122900 345800 127000 82200 132200 ...
##  $ X1997.03  : int  NA 177600 NA 206500 59000 124200 350600 129400 79900 133100 ...
##  $ X1997.04  : int  NA 180100 NA 209200 58700 125300 351300 130400 78400 133500 ...
##  $ X1997.05  : int  NA 182300 NA 211100 58400 126100 352600 130400 77500 133300 ...
##  $ X1997.06  : int  NA 184400 NA 212600 58000 126700 357300 131200 76800 134100 ...
##  $ X1997.07  : int  NA 186300 NA 214400 57800 127900 363900 130600 77400 136100 ...
##  $ X1997.08  : int  NA 187600 NA 215600 57900 129300 370600 129100 78600 138700 ...
##  $ X1997.09  : int  NA 189400 NA 216500 57800 130400 374300 127600 79800 141300 ...
##  $ X1997.10  : int  NA 190300 NA 217900 57800 131300 375600 125800 80700 143100 ...
##  $ X1997.11  : int  NA 189700 NA 220100 58100 131700 381000 124500 81300 144100 ...
##  $ X1997.12  : int  NA 189800 NA 222200 58400 132300 389500 124600 81500 144800 ...
##  $ X1998.01  : int  NA 191900 NA 223900 58700 133500 395400 125100 81900 145300 ...
##  $ X1998.02  : int  NA 194500 NA 225400 59200 134500 400800 125500 82300 145800 ...
##  $ X1998.03  : int  NA 195500 NA 227700 59400 134800 409000 126000 82200 146100 ...
##  $ X1998.04  : int  NA 196000 NA 230100 58700 135200 414600 127200 81700 146500 ...
##  $ X1998.05  : int  NA 196900 NA 231700 58100 135500 418500 128700 81300 147500 ...
##  $ X1998.06  : int  NA 198900 NA 232700 57900 136300 424200 128300 81000 148500 ...
##  $ X1998.07  : int  NA 201400 NA 233700 57900 137700 430500 126300 80500 149200 ...
##  $ X1998.08  : int  NA 204600 NA 234700 57700 139600 437500 126000 80300 150000 ...
##  $ X1998.09  : int  NA 207900 NA 235600 57600 141600 448900 126500 80100 150500 ...
##  $ X1998.10  : int  NA 211800 NA 236800 57500 143400 460300 126300 79900 151100 ...
##  $ X1998.11  : int  NA 214600 NA 238800 57800 144500 461900 127400 80600 152000 ...
##  $ X1998.12  : int  NA 216000 NA 240800 58000 145600 461900 130600 82300 153700 ...
##  $ X1999.01  : int  NA 217500 NA 242400 58000 147400 466200 131600 83800 156300 ...
##  $ X1999.02  : int  NA 220200 NA 243800 57900 149200 470500 130400 84700 159200 ...
##  $ X1999.03  : int  NA 222800 NA 246400 57800 149900 470500 129900 85600 161000 ...
##  $ X1999.04  : int  NA 226200 NA 250200 57800 150500 473100 131900 86900 162300 ...
##  $ X1999.05  : int  NA 229600 NA 254300 58000 151700 480000 134600 88300 163500 ...
##  $ X1999.06  : int  NA 232400 NA 257600 58500 152800 487800 137400 88900 164700 ...
##  $ X1999.07  : int  NA 234400 NA 261100 58700 153900 496000 140600 89300 165700 ...
##  $ X1999.08  : int  NA 236300 NA 264800 59000 156400 506900 141800 91000 166800 ...
##  $ X1999.09  : int  NA 238300 NA 267900 59200 159400 516300 141200 93400 169600 ...
##  $ X1999.10  : int  NA 241800 NA 270700 59300 161800 522900 141800 95100 173700 ...
##  $ X1999.11  : int  NA 246100 NA 272800 59500 163800 533800 142800 95600 177200 ...
##  $ X1999.12  : int  NA 249500 NA 274400 59900 165700 549400 141700 95000 179100 ...
##  $ X2000.01  : int  NA 251300 NA 276200 60300 167100 564600 141200 95500 180900 ...
##  $ X2000.02  : int  NA 253200 NA 278600 60400 168400 577300 143000 98700 183600 ...
##  $ X2000.03  : int  NA 255700 NA 280100 60300 169600 592200 145600 101800 187200 ...
##  $ X2000.04  : int  NA 259200 NA 283100 60300 171000 610400 145500 102500 190700 ...
##  $ X2000.05  : int  NA 263100 NA 287700 60000 172400 626700 145400 102300 193500 ...
##  $ X2000.06  : int  NA 266600 NA 293600 59500 174800 640600 145000 103300 195700 ...
##  $ X2000.07  : int  NA 269500 NA 298500 59400 177900 654400 143000 103300 196500 ...
##  $ X2000.08  : int  NA 272800 NA 302700 59800 180400 663500 140800 101900 197800 ...
##  $ X2000.09  : int  NA 275500 NA 305000 59900 182300 664700 140000 101800 200200 ...
##  $ X2000.10  : int  NA 278800 NA 306800 59700 184600 670100 142200 102700 202700 ...
##  $ X2000.11  : int  NA 283400 NA 309400 59500 187700 679000 145400 102500 204800 ...
##  $ X2000.12  : int  NA 288600 NA 313100 59400 190700 676500 146400 103100 208000 ...
##  $ X2001.01  : int  NA 291300 NA 314900 59500 193100 663100 145900 105000 211800 ...
##  $ X2001.02  : int  NA 292400 NA 316200 59700 196100 656800 145800 105300 214400 ...
##  $ X2001.03  : int  NA 294600 NA 318200 59700 200000 658700 142400 104500 215700 ...
##  $ X2001.04  : int  NA 297100 NA 320600 59200 202900 661300 138200 104300 218100 ...
##  $ X2001.05  : int  NA 298200 NA 322900 58700 205500 661100 135200 103900 221100 ...
##  $ X2001.06  : int  NA 299800 NA 325500 58400 207600 658700 133600 103000 222100 ...
##  $ X2001.07  : int  NA 302000 NA 328400 57800 208600 653900 133500 103000 222100 ...
##  $ X2001.08  : int  NA 304200 NA 330700 57000 209200 646800 135400 103500 222900 ...
##  $ X2001.09  : int  NA 307900 NA 332800 56700 210000 641400 136800 103500 224500 ...
##  $ X2001.10  : int  NA 311000 NA 334400 56800 210200 636300 136500 103700 226700 ...
##  $ X2001.11  : int  NA 311400 NA 335900 56600 211300 630300 135300 104000 229700 ...
##  $ X2001.12  : int  NA 311000 NA 337400 56500 213000 628500 134600 104000 231500 ...
##  $ X2002.01  : int  NA 311700 NA 339700 56500 214100 633500 135400 104700 232200 ...
##  $ X2002.02  : int  NA 312300 NA 342300 56600 215200 637300 137700 106800 233100 ...
##  $ X2002.03  : int  NA 312000 NA 343800 56700 217600 640700 139500 109700 234300 ...
##  $ X2002.04  : int  NA 311800 NA 343400 56600 220200 647900 139900 112100 235100 ...
##  $ X2002.05  : int  NA 312600 NA 342300 56700 222400 655500 140600 114300 235500 ...
##  $ X2002.06  : int  NA 313000 NA 341800 57000 224600 662500 141600 116600 236400 ...
##  $ X2002.07  : int  NA 314400 NA 341700 57300 227000 670800 141200 118000 238100 ...
##  $ X2002.08  : int  NA 317300 NA 342400 57300 228600 677500 140600 117800 239300 ...
##  $ X2002.09  : int  NA 319700 NA 344300 57300 230100 687900 141900 117800 240400 ...
##  $ X2002.10  : int  NA 320500 NA 346900 57100 232400 698400 142700 118600 242100 ...
##  $ X2002.11  : int  NA 321000 NA 348900 56900 234300 701000 141700 119900 243700 ...
##  $ X2002.12  : int  NA 321600 NA 350200 56900 235300 698900 141700 121300 244900 ...
##  $ X2003.01  : int  NA 323800 NA 351700 57000 236200 697600 143800 121900 247000 ...
##  $ X2003.02  : int  NA 326100 NA 353500 56700 237000 691800 144100 121100 249500 ...
##  $ X2003.03  : int  NA 329000 NA 355700 56700 237900 684600 143000 120400 252100 ...
##  $ X2003.04  : int  NA 332200 NA 358000 57000 239200 681100 142200 120600 254300 ...
##  $ X2003.05  : int  NA 334700 NA 361600 57400 241300 685100 141400 121200 257200 ...
##  $ X2003.06  : int  NA 336000 NA 364000 57700 243600 692200 141000 121600 261300 ...
##  $ X2003.07  : int  NA 337300 NA 365500 58000 244200 697500 141400 122400 265400 ...
##  $ X2003.08  : int  NA 337500 NA 366400 58300 243800 704200 140800 123900 268400 ...
##  $ X2003.09  : int  NA 337100 NA 367000 58600 244500 710500 139400 125300 269600 ...
##  $ X2003.10  : int  NA 334900 NA 365200 58800 245500 717200 139400 126200 268700 ...
##  $ X2003.11  : int  NA 333100 NA 363100 59100 245700 732300 140200 127600 268100 ...
##   [list output truncated]

Check Missing values in Columns

#Check Missing values in each column
colSums(is.na(costPrice_Zillow))
##   RegionID RegionName       City      State      Metro CountyName 
##          0          0          0          0          0          0 
##   SizeRank   X1996.04   X1996.05   X1996.06   X1996.07   X1996.08 
##          0       2662       2582       2582       2577       2576 
##   X1996.09   X1996.10   X1996.11   X1996.12   X1997.01   X1997.02 
##       2576       2576       2566       2566       2542       2113 
##   X1997.03   X1997.04   X1997.05   X1997.06   X1997.07   X1997.08 
##       2093       2093       2093       2091       2091       1994 
##   X1997.09   X1997.10   X1997.11   X1997.12   X1998.01   X1998.02 
##       1991       1991       1988       1984       1967       1822 
##   X1998.03   X1998.04   X1998.05   X1998.06   X1998.07   X1998.08 
##       1821       1865       1973       1961       1776       1742 
##   X1998.09   X1998.10   X1998.11   X1998.12   X1999.01   X1999.02 
##       1742       1730       1709       1707       1706       1689 
##   X1999.03   X1999.04   X1999.05   X1999.06   X1999.07   X1999.08 
##       1688       1688       1685       1675       1675       1652 
##   X1999.09   X1999.10   X1999.11   X1999.12   X2000.01   X2000.02 
##       1652       1652       1652       1652       1652       1633 
##   X2000.03   X2000.04   X2000.05   X2000.06   X2000.07   X2000.08 
##       1633       1631       1631       1631       1626       1598 
##   X2000.09   X2000.10   X2000.11   X2000.12   X2001.01   X2001.02 
##       1598       1598       1598       1598       1679       1580 
##   X2001.03   X2001.04   X2001.05   X2001.06   X2001.07   X2001.08 
##       1553       1552       1552       1551       1551       1546 
##   X2001.09   X2001.10   X2001.11   X2001.12   X2002.01   X2002.02 
##       1546       1546       1545       1545       1545       1538 
##   X2002.03   X2002.04   X2002.05   X2002.06   X2002.07   X2002.08 
##       1538       1537       1537       1537       1537       1524 
##   X2002.09   X2002.10   X2002.11   X2002.12   X2003.01   X2003.02 
##       1522       1521       1521       1521       1520       1499 
##   X2003.03   X2003.04   X2003.05   X2003.06   X2003.07   X2003.08 
##       1498       1498       1498       1498       1498       1485 
##   X2003.09   X2003.10   X2003.11   X2003.12   X2004.01   X2004.02 
##       1484       1483       1482       1480       1479       1447 
##   X2004.03   X2004.04   X2004.05   X2004.06   X2004.07   X2004.08 
##       1428       1428       1428       1426       1426       1403 
##   X2004.09   X2004.10   X2004.11   X2004.12   X2005.01   X2005.02 
##       1398       1392       1389       1389       1388       1321 
##   X2005.03   X2005.04   X2005.05   X2005.06   X2005.07   X2005.08 
##       1305       1303       1303       1303       1303       1288 
##   X2005.09   X2005.10   X2005.11   X2005.12   X2006.01   X2006.02 
##       1288       1286       1285       1285       1284       1260 
##   X2006.03   X2006.04   X2006.05   X2006.06   X2006.07   X2006.08 
##       1260       1259       1259       1259       1259       1250 
##   X2006.09   X2006.10   X2006.11   X2006.12   X2007.01   X2007.02 
##       1249       1248       1248       1248       1247       1242 
##   X2007.03   X2007.04   X2007.05   X2007.06   X2007.07   X2007.08 
##       1240       1239       1239       1238       1238       1226 
##   X2007.09   X2007.10   X2007.11   X2007.12   X2008.01   X2008.02 
##       1226       1226       1226       1226       1226       1225 
##   X2008.03   X2008.04   X2008.05   X2008.06   X2008.07   X2008.08 
##       1225       1225       1225       1225       1225       1194 
##   X2008.09   X2008.10   X2008.11   X2008.12   X2009.01   X2009.02 
##       1194       1194       1194       1194       1193       1190 
##   X2009.03   X2009.04   X2009.05   X2009.06   X2009.07   X2009.08 
##       1184       1178       1178       1146       1146       1095 
##   X2009.09   X2009.10   X2009.11   X2009.12   X2010.01   X2010.02 
##       1095       1095       1094       1091       1091       1082 
##   X2010.03   X2010.04   X2010.05   X2010.06   X2010.07   X2010.08 
##        971        959        944        923        903        177 
##   X2010.09   X2010.10   X2010.11   X2010.12   X2011.01   X2011.02 
##        177        174        174        174        163        151 
##   X2011.03   X2011.04   X2011.05   X2011.06   X2011.07   X2011.08 
##        149        149        149        147        134        121 
##   X2011.09   X2011.10   X2011.11   X2011.12   X2012.01   X2012.02 
##        120        120        120        118        108         93 
##   X2012.03   X2012.04   X2012.05   X2012.06   X2012.07   X2012.08 
##         91         91         91         87         84         67 
##   X2012.09   X2012.10   X2012.11   X2012.12   X2013.01   X2013.02 
##         66         65         65         65         64         31 
##   X2013.03   X2013.04   X2013.05   X2013.06   X2013.07   X2013.08 
##         26         21         21         20         20          0 
##   X2013.09   X2013.10   X2013.11   X2013.12   X2014.01   X2014.02 
##          0          0          0          0          0          0 
##   X2014.03   X2014.04   X2014.05   X2014.06   X2014.07   X2014.08 
##          0          0          0          0          0          0 
##   X2014.09   X2014.10   X2014.11   X2014.12   X2015.01   X2015.02 
##          0          0          0          0          0          0 
##   X2015.03   X2015.04   X2015.05   X2015.06   X2015.07   X2015.08 
##          0          0          0          0          0          0 
##   X2015.09   X2015.10   X2015.11   X2015.12   X2016.01   X2016.02 
##          0          0          0          0          0         18 
##   X2016.03   X2016.04   X2016.05   X2016.06   X2016.07   X2016.08 
##         18         18         18          0          0          0 
##   X2016.09   X2016.10   X2016.11   X2016.12   X2017.01   X2017.02 
##          0          0          0          3          0          0 
##   X2017.03   X2017.04   X2017.05   X2017.06 
##          0          0          0          0

Observations

  • The dimension of Zillow Cost data is 8946, 262.
  • Data only available untill July 2017.Hence, forecast of Cost price for Jan 2020 will bring accuracy to the analysis.
  • Table structure: data is horizontal form, which is an indicator or messy format data. Converting this to long-form will be helpful but the horizontal form will be helpful in forecast function.
  • Check column names for redundancy: Many columns are redundant such as RegionID, State, Metro, CountyName.
  • Missing Values: Cost price is missing until a certain timeline, this can be the time from when this zipcode was included in the zillow database.

Below methodology is used to scrub the Zillow data

Scrubbing Methodology

  • Check for NA Values in columns
  • Current price Jan 2020 for each zipcode, hence we decide to use the horizontal format of data for ease of working in Time series pattern.
  • Filter the Zillow file for City=Newyork but the names seem to have many levels for each city. Hence it will better to Inner Join the listings file for each zipcode to Zillow file. JOining is handled post scrubbing phase.
  • Zillow file only has 2 bedroom listings.
  • Remove ‘X’ from numeric price column names.
  • Column name ‘RegionName’ is changed ‘Zipcode’

Below is the scrubbing function that is used for Zillow cost data

#Scrubbing & Forecast function

zillow_scrub <- function(ZillowData,cityName){ 
  
  ZillowData$City<-str_trim(ZillowData$City)
  Zlow_cty<-ZillowData%>% filter(str_detect(tolower(ZillowData$City), cityName))
  Zlow_cty <- Zlow_cty[,c(2,3,7:ncol(Zlow_cty))]#RegionName  City SizeRank
  
  colnames(Zlow_cty)<-str_replace(colnames(Zlow_cty),"X","")
  Zlow_cty$currentPrice <- 0 
 
  for(i in 1:nrow(Zlow_cty)){
    start_pt<-sum(is.na(Zlow_cty[i,-(1:3)])) +1+3 #start point in full dataset
    end_pt<-ncol(Zlow_cty)-1#end point in full dataset - Current price
    n<-end_pt-start_pt+1
    
    strdt<-colnames(Zlow_cty[i,c(start_pt:end_pt)])[1]
    yr<-as.numeric(substr(strdt,0,4))
    mn<-as.numeric(substr(strdt,6,7))
    
    #Convert monthly cost data into time series data 
    tmp =  ts(as.vector(t(Zlow_cty[,c(start_pt:end_pt)])[,i]),start = c(yr,mn),frequency = 12) # Convert the      
    # Use forecast package auto ARIMA function for prediction
    fit<-auto.arima(tmp)
    #########################################################################
    predval<-data.frame(forecast(fit,h=31))["Jan 2020","Point.Forecast"]
    Zlow_cty$currentPrice[i] <- predval 

  }
Zlow_cty<-rename(Zlow_cty,zipcode=RegionName)
Zlow_cty<-Zlow_cty[,c('zipcode','currentPrice')]
return(Zlow_cty) # return the filtered data containing only relevant columns
}

cityName<-"new york"
#Function call
zlow_scbd<-zillow_scrub(costPrice_Zillow,cityName)

dimPostScrubZlw<-dim(zlow_scbd)# Dimensions of Zillow data post scrubbing
dimPostScrubZlw
## [1] 26  2

Dimensions of Zillow data post scrubbing: 26, 2

Zillow data post scrubbing

kable(head(zlow_scbd))  %>% kable_styling(bootstrap_options = c("striped", "hover", "responsive")) %>% scroll_box(width = "100%", height = "250px")
zipcode currentPrice
10025 1717145
10023 3009982
10128 3093119
10011 2897002
10003 2336587
11201 1628300

Please note only 6 rows have been shown above with head function We are only interested in Cost price by Zipcode, hence we are pulling only relevant columns

Result

  • Current price for current month Jan 2020 is predicted.

3.2 Airbnb Revenue data: Listing

kable(head(rentAirbnbListings))  %>% kable_styling(bootstrap_options = c("striped", "hover", "responsive")) %>% scroll_box(width = "100%", height = "250px")
id listing_url scrape_id last_scraped name summary space description experiences_offered neighborhood_overview notes transit access interaction house_rules thumbnail_url medium_url picture_url xl_picture_url host_id host_url host_name host_since host_location host_about host_response_time host_response_rate host_acceptance_rate host_is_superhost host_thumbnail_url host_picture_url host_neighbourhood host_listings_count host_total_listings_count host_verifications host_has_profile_pic host_identity_verified street neighbourhood neighbourhood_cleansed neighbourhood_group_cleansed city state zipcode market smart_location country_code country latitude longitude is_location_exact property_type room_type accommodates bathrooms bedrooms beds bed_type amenities square_feet price weekly_price monthly_price security_deposit cleaning_fee guests_included extra_people minimum_nights maximum_nights minimum_minimum_nights maximum_minimum_nights minimum_maximum_nights maximum_maximum_nights minimum_nights_avg_ntm maximum_nights_avg_ntm calendar_updated has_availability availability_30 availability_60 availability_90 availability_365 calendar_last_scraped number_of_reviews number_of_reviews_ltm first_review last_review review_scores_rating review_scores_accuracy review_scores_cleanliness review_scores_checkin review_scores_communication review_scores_location review_scores_value requires_license license jurisdiction_names instant_bookable is_business_travel_ready cancellation_policy require_guest_profile_picture require_guest_phone_verification calculated_host_listings_count calculated_host_listings_count_entire_homes calculated_host_listings_count_private_rooms calculated_host_listings_count_shared_rooms reviews_per_month
2539 https://www.airbnb.com/rooms/2539 2.019071e+13 2019-07-09 Clean & quiet apt home by the park Renovated apt home in elevator building. Spacious, renovated, and clean apt home, one block to F train, 25 minutes to lower Manhatten Renovated apt home in elevator building. Spacious, renovated, and clean apt home, one block to F train, 25 minutes to lower Manhatten Close to Prospect Park and Historic Ditmas Park Very close to F and G trains and Express bus into NY. The B and Q are closeby also. If this room is unavailable on your desired dates, check out our other rooms, such as: https://www.airbnb.com/rooms/10267242 none Close to Prospect Park and Historic Ditmas Park If this room is unavailable on your desired dates, check out our other rooms, such as: https://www.airbnb.com/rooms/10267242 Very close to F and G trains and Express bus into NY. The B and Q are closeby also. -The security and comfort of all our guests is important to us! Therefore, no one is permitted to check in without first emailing a clear government issued photo ID, acceptable to manager. Instructions will be provided in the house manual. -No eating, drinking or storage of food in the room. -No smoking. -Illicit drug use is strictly forbidden. -Quiet hours after 10pm and before 8am, This is respectful for our neighbors and other guests. -Please clean up after yourself when using the kitchen and bath. -Please lock the doors and close the windows when exiting the home. -Please indicate and pay for the correct number of guests. Failure to do so will cancel your reservation with no refund due. -Please remove your shoes when entering the apt home. Thanks for choosing our home! NA NA https://a0.muscache.com/im/pictures/3949d073-a02e-4ebc-aa9c-ac74f00eaa1f.jpg?aki_policy=large NA 2787 https://www.airbnb.com/users/show/2787 John 2008-09-07 New York, New York, United States Educated professional living in Brooklyn. I love meeting new people, running, hiking, fine foods, traveling, etc. One of my favorite trips was spending New Year’s Eve in London on the Thames River. Big Ben, spectacular fireworks and light show; and fun times with a good crowd of international tourists. A most memorable night and trip! Also, I generally approach life with a positive attitude. I look forward to meeting you. within an hour 100% N/A f https://a0.muscache.com/im/pictures/8674565a-758d-476b-a580-4d99ea9baab9.jpg?aki_policy=profile_small https://a0.muscache.com/im/pictures/8674565a-758d-476b-a580-4d99ea9baab9.jpg?aki_policy=profile_x_medium Gravesend 6 6 [‘email’, ‘phone’, ‘reviews’, ‘kba’] t t Brooklyn , NY, United States Brooklyn Kensington Brooklyn Brooklyn NY 11218 New York Brooklyn , NY US United States 40.64749 -73.97237 f Apartment Private room 2 1 1 1 Real Bed {TV,“Cable TV”,Internet,Wifi,“Wheelchair accessible”,Kitchen,“Free parking on premises”,Elevator,“Free street parking”,“Buzzer/wireless intercom”,Heating,“Suitable for events”,Washer,Dryer,“Smoke detector”,“Carbon monoxide detector”,“First aid kit”,“Safety card”,“Fire extinguisher”,Essentials,Shampoo,“24-hour check-in”,Hangers,“Hair dryer”,Iron,“Laptop friendly workspace”,“translation missing: en.hosting_amenity_49”,“translation missing: en.hosting_amenity_50”,“Self check-in”,Keypad,“Outlet covers”,“Hot water”,“Bed linens”,“Extra pillows and blankets”,Microwave,“Coffee maker”,Refrigerator,“Dishes and silverware”,“Cooking basics”,Oven,Stove,“Luggage dropoff allowed”,“Long term stays allowed”,“Cleaning before checkout”} NA $149.00 $299.00 $999.00 $100.00 $25.00 1 $35.00 1 730 1 1 730 730 1 730 3 weeks ago t 30 60 90 365 2019-07-09 9 2 2015-12-04 2018-10-19 98 10 10 10 10 10 10 f f f moderate f f 6 0 5 1 0.21
2595 https://www.airbnb.com/rooms/2595 2.019071e+13 2019-07-09 Skylit Midtown Castle Find your romantic getaway to this beautiful, spacious skylit studio in the heart of Midtown, Manhattan. STUNNING SKYLIT STUDIO / 1 BED + SINGLE / FULL BATH / FULL KITCHEN / FIREPLACE / CENTRALLY LOCATED / WiFi + APPLE TV / SHEETS + TOWELS
  • Spacious (500+ft²), immaculate and nicely furnished & designed studio. - Tuck yourself into the ultra comfortable bed under the skylight. Fall in love with a myriad of bright lights in the city night sky. - Single-sized bed/convertible floor mattress with luxury bedding (available upon request). - Gorgeous pyramid skylight with amazing diffused natural light, stunning architectural details, soaring high vaulted ceilings, exposed brick, wood burning fireplace, floor seating area with natural zafu cushions, modern style mixed with eclectic art & antique treasures, large full bath, newly renovated kitchen, air conditioning/heat, high speed WiFi Internet, and Apple TV. - Centrally located in the heart of Midtown Manhattan just a few blocks from all subway connections in the very desirable Midtown location a few minutes walk to Times Square, the Theater District, Bryant Park and Herald Square. - The Midtown Castle is a uniquely charming Dutch Colonial survivor from the 1890s. - This is
Find your romantic getaway to this beautiful, spacious skylit studio in the heart of Midtown, Manhattan. STUNNING SKYLIT STUDIO / 1 BED + SINGLE / FULL BATH / FULL KITCHEN / FIREPLACE / CENTRALLY LOCATED / WiFi + APPLE TV / SHEETS + TOWELS - Spacious (500+ft²), immaculate and nicely furnished & designed studio. - Tuck yourself into the ultra comfortable bed under the skylight. Fall in love with a myriad of bright lights in the city night sky. - Single-sized bed/convertible floor mattress with luxury bedding (available upon request). - Gorgeous pyramid skylight with amazing diffused natural light, stunning architectural details, soaring high vaulted ceilings, exposed brick, wood burning fireplace, floor seating area with natural zafu cushions, modern style mixed with eclectic art & antique treasures, large full bath, newly renovated kitchen, air conditioning/heat, high speed WiFi Internet, and Apple TV. - Centrally located in the heart of Midtown Manhattan just a few blocks from all s none Centrally located in the heart of Manhattan just a few blocks from all subway connections in the very desirable Midtown location a few minutes walk to Times Square, the Theater District, Bryant Park and Herald Square. Apartment is located on 37th Street between 5th & 6th Avenue, just a few blocks from all subway connections. Closest Subways (in order of proximity to apartment (Website hidden by Airbnb) W: 34th Street & 6th Avenu (Website hidden by Airbnb) 3: 34th Street & 7th Avenue 7: 42nd & 5th Avenu (Website hidden by Airbnb) S: 42nd Street between Park & Lexington Avenue (Website hidden by Airbnb) E: 34th Street and 8th Avenue If coming by car, there is a parking garage on the block and free street parking. Guests have full access to the kitchen, bathroom and living spaces. The closets are private/off limits. I am a Sound Therapy Practitioner and Kundalini Yoga & Meditation teacher. I work with energy and sound for relaxation and healing, using Symphonic gong, singing bowls, tuning forks, drums, voice and other instruments. Sound relaxation sessions and/or personalized Kundalini Yoga sessions are available in the space upon request. Individual, couples or group sessions available. Licensed acupuncture and massage also available upon request. Please inquire. I welcome my guests at the apartment for check-in, or alternatively, a self check-in can be arranged. Once you are settled in, I am just a phone call, text or email away, should you have any questions, concerns or issues during your stay. My desire is that you have a smooth arrival and amazing stay here. Make yourself at home, respect the space and the neighbors. No pets, no smoking and no unauthorized guests. NA NA https://a0.muscache.com/im/pictures/f0813a11-40b2-489e-8217-89a2e1637830.jpg?aki_policy=large NA 2845 https://www.airbnb.com/users/show/2845 Jennifer 2008-09-09 New York, New York, United States A New Yorker since 2000! My passion is creating beautiful, unique spaces where unforgettable memories are made. It’s my pleasure to host people from around the world and meet new faces. Welcome travelers! I am a Sound Therapy Practitioner and Kundalini Yoga & Meditation teacher. I work with energy and sound for relaxation and healing, using Symphonic gong, singing bowls, tuning forks, drums, voice and other instruments. within a few hours 87% N/A f https://a0.muscache.com/im/users/2845/profile_pic/1259095067/original.jpg?aki_policy=profile_small https://a0.muscache.com/im/users/2845/profile_pic/1259095067/original.jpg?aki_policy=profile_x_medium Midtown 5 5 [‘email’, ‘phone’, ‘reviews’, ‘kba’, ‘work_email’] t t New York, NY, United States Manhattan Midtown Manhattan New York NY 10018 New York New York, NY US United States 40.75362 -73.98377 f Apartment Entire home/apt 2 1 0 1 Real Bed {TV,Wifi,“Air conditioning”,Kitchen,“Paid parking off premises”,“Free street parking”,“Indoor fireplace”,Heating,“Family/kid friendly”,“Smoke detector”,“Carbon monoxide detector”,“Fire extinguisher”,Essentials,Shampoo,“Lock on bedroom door”,Hangers,“Hair dryer”,Iron,“Laptop friendly workspace”,“Self check-in”,Keypad,“Private living room”,Bathtub,“Hot water”,“Bed linens”,“Extra pillows and blankets”,“Ethernet connection”,“Coffee maker”,Refrigerator,“Dishes and silverware”,“Cooking basics”,Oven,Stove,“Luggage dropoff allowed”,“Long term stays allowed”,“Cleaning before checkout”,“Wide entrance for guests”,“Flat path to guest entrance”,“Well-lit path to entrance”,“No stairs or steps to enter”} NA $225.00 $1,995.00 $350.00 $100.00 2 $0.00 1 1125 1 1 1125 1125 1 1125 4 days ago t 25 55 80 355 2019-07-09 45 11 2009-11-21 2019-05-21 95 10 9 10 10 10 9 f f f strict_14_with_grace_period t t 2 1 0 1 0.38
3647 https://www.airbnb.com/rooms/3647 2.019071e+13 2019-07-08 THE VILLAGE OF HARLEM….NEW YORK ! WELCOME TO OUR INTERNATIONAL URBAN COMMUNITY This Spacious 1 bedroom is with Plenty of Windows with a View……. Sleeps…..Four Adults…..two in the Livingrm. with (2) Sofa-beds. (Website hidden by Airbnb) two in the Bedrm.on a very Comfortable Queen Size Bed… A Complete Bathrm…..With Shower and Bathtub……. Fully Equipped with Linens & Towels…….. Spacious Living Room……Flat ScreenTelevision…..DVD Player with Movies available for your viewing during your stay………………………………………………………………….. Dining Area…..for Morning Coffee or Tea…………………………………………….. The Kitchen Area is Modern with Granite Counter Top… includes the use of a Coffee Maker…Microwave to Heat up a Carry Out/In Meal…. Not suited for a Gourmet Cook…or Top Chef……Sorry!!!! . This Flat is located in HISTORIC HARLEM…. near the Appollo Theater and The Museum Mile…on Fifth Avenue. Sylvia’s World Famous Resturant…loca WELCOME TO OUR INTERNATIONAL URBAN COMMUNITY This Spacious 1 bedroom is with Plenty of Windows with a View……. Sleeps…..Four Adults…..two in the Livingrm. with (2) Sofa-beds. (Website hidden by Airbnb) two in the Bedrm.on a very Comfortable Queen Size Bed… A Complete Bathrm…..With Shower and Bathtub……. Fully Equipped with Linens & Towels…….. Spacious Living Room……Flat ScreenTelevision…..DVD Player with Movies available for your viewing during your stay………………………………………………………………….. Dining Area…..for Morning Coffee or Tea…………………………………………….. The Kitchen Area is Modern with Granite Counter Top… includes the use of a Coffee Maker…Microwave to Heat up a Carry Out/In Meal…. Not suited for a Gourmet Cook…or Top Chef……Sorry!!!! . This Flat is located in HISTORIC HARLEM…. near the Appollo Theater and The Museum Mile…on Fifth Avenue. Sylvia’s World Famous Resturant…loca none Upon arrival please have a legibile copy of your Passport and / or State Photo. ID. as well as your confirmation letter. Please NO SMOKING …LOUD TALKING or PARTIES of any kind. Security Deposit and Cleaning Fees in CASH at time of arrival. Security deposit will be refunded within 72hrs pending no damages. .Cleaning fees are non-refundable. At Check Out… Please dispose of all trash/garbage in the bins located outside behind the entrance stairs. Please place all dirty linens and towels in the dirty laundry bin. Please don’t leave any dishes in the sink and dispose of all Plastic Dishes/Plastic Culinary. If you need a late check-out please contact the Emergency Contact Telephone Numbers that are listed in the Flat. PLEASE LEAVE ALL KEYS IN THE FLAT and BE CERTAIN THE DOORS ARE LOCKED. WE HOPE YOU ENJOYED YOUR STAY and will write positve comments and will visit again. NA NA https://a0.muscache.com/im/pictures/838341/9b3c66f3_original.jpg?aki_policy=large NA 4632 https://www.airbnb.com/users/show/4632 Elisabeth 2008-11-25 New York, New York, United States Make Up Artist National/ (Website hidden by Airbnb) Production. I m curently working with a Production Company in WDC and Coordinated a “Day Of Service” for the Presidential Innaugration 2013. I can’t live without Starbucks and Oprah’s Chai Tea Latte…and my circle of of great friends world wide. “BLESS ME INTO USEFULLNESS” is my daily prayer. I within a day 100% N/A f https://a0.muscache.com/im/users/4632/profile_pic/1328402497/original.jpg?aki_policy=profile_small https://a0.muscache.com/im/users/4632/profile_pic/1328402497/original.jpg?aki_policy=profile_x_medium Harlem 1 1 [‘email’, ‘phone’, ‘google’, ‘reviews’, ‘jumio’, ‘government_id’] t t New York, NY, United States Harlem Harlem Manhattan New York NY 10027 New York New York, NY US United States 40.80902 -73.94190 t Apartment Private room 2 1 1 1 Pull-out Sofa {“Cable TV”,Internet,Wifi,“Air conditioning”,Kitchen,“Buzzer/wireless intercom”,Heating,“Smoke detector”,“Carbon monoxide detector”,“translation missing: en.hosting_amenity_49”,“translation missing: en.hosting_amenity_50”} NA $150.00 $200.00 $75.00 2 $20.00 3 7 3 3 7 7 3 7 34 months ago t 30 60 90 365 2019-07-08 0 0 NA NA NA NA NA NA NA f f f strict_14_with_grace_period t t 1 0 1 0 NA
3831 https://www.airbnb.com/rooms/3831 2.019071e+13 2019-07-09 Cozy Entire Floor of Brownstone Urban retreat: enjoy 500 s.f. floor in 1899 brownstone, with wood and ceramic flooring throughout (completed Aug. 2015 through Sept. 2015), roomy bdrm, & upgraded kitchen & bathroom (completed Oct. 2015). It’s sunny and loaded with everything you need! Greetings! We own a double-duplex brownstone in Clinton Hill on Gates near Classon Avenue - (7 blocks to C train, 5 blocks to G train, minutes to all), in which we host on the entire top floor of the upper duplex. This is more of an efficiency set-up: it is the top floor on a two-family, double duplex brownstone. The top floor for our guests consists of a sizable bedroom, full bath and eat-in kitchen for your exclusive use. Our family occupies the floors below. You go through a common hallway and staircase, to get to the top floor (2 easy flights up from the main entrance), but not through any rooms, so it is a fairly private set-up. - Clinton Hill, Gates Avenue near Classon Ave. (1 mi. or less to Williamsburg, Park Slope, Prospect Heights, downtown, Ft. Greene, Bed-Stuy, Bushwick; 20 mins to Manhattan) - includes FiOS, heat (or A/C), hot water, and electricity all included - furnished with two twin beds (convertible into a king bed), one rollaway twin bed and one inflatable Urban retreat: enjoy 500 s.f. floor in 1899 brownstone, with wood and ceramic flooring throughout (completed Aug. 2015 through Sept. 2015), roomy bdrm, & upgraded kitchen & bathroom (completed Oct. 2015). It’s sunny and loaded with everything you need! Greetings! We own a double-duplex brownstone in Clinton Hill on Gates near Classon Avenue - (7 blocks to C train, 5 blocks to G train, minutes to all), in which we host on the entire top floor of the upper duplex. This is more of an efficiency set-up: it is the top floor on a two-family, double duplex brownstone. The top floor for our guests consists of a sizable bedroom, full bath and eat-in kitchen for your exclusive use. Our family occupies the floors below. You go through a common hallway and staircase, to get to the top floor (2 easy flights up from the main entrance), but not through any rooms, so it is a fairly private set-up. - Clinton Hill, Gates Avenue near Classon Ave. (1 mi. or less to Williamsburg, Park Slope, Pro none Just the right mix of urban center and local neighborhood; close to all but enough quiet for a calming walk. B52 bus for a 10-minute ride to downtown Brooklyn is a few yards away on the corner; G train/Classon Avenue is 5 blocks away; C train is about 6 blocks to either the Clinton/Washington stop or Franklin Avenue stop. There is on-street parking, alternate side is twice per week on the immediate block but only once per week on Classon. From LaGuardia Airport, a taxi will cost $30-$35, but there is also a bus that will put you at the Jackson Heights subway station, and from there it’s about 5 stops to catch the G train, which stops 5 blocks away. From JFK, the taxi is closer to $40, but the AirTran can get you conveniently to the A/C line and the C train is about 6 blocks from here. From JFK via subway/metro/train: From JFK take the AirTrain to Howard Beach to catch the A train toward Brooklyn/Manhattan. Take the A train to Utica Avenue and go across that same platform to catch the C local train (you could also transfer at Nostrand but you would have to carry luggage downstairs to cat You will have exclusive use of and access to: a sizable private room as described in “The Space” section, furnished with two twin beds (which we will combine into one king bed upon request) and optional rollaway twin and/or inflatable beds, and other small furnishings; full private bath and private eat-in kitchen both renovated in Fall 2015; sizable dining table in sun-filled kitchen area doubles as a great desk space; alcove perfect for vertical bike storage. Upon request you may also have some use of the livingroom on the floor just below. We’ll be around, but since you have the top floor to yourself, most of the interaction is on the way in or out - we’re open to socializing and did so frequently with our last long-term guests, so it’s really up to you Smoking - outside please; pets allowed but please contact me first for arrangements NA NA https://a0.muscache.com/im/pictures/e49999c2-9fd5-4ad5-b7cc-224deac989aa.jpg?aki_policy=large NA 4869 https://www.airbnb.com/users/show/4869 LisaRoxanne 2008-12-07 New York, New York, United States Laid-back bi-coastal actor/professor/attorney. within a few hours 93% N/A f https://a0.muscache.com/im/users/4869/profile_pic/1371927771/original.jpg?aki_policy=profile_small https://a0.muscache.com/im/users/4869/profile_pic/1371927771/original.jpg?aki_policy=profile_x_medium Clinton Hill 1 1 [‘email’, ‘phone’, ‘reviews’, ‘kba’] t t Brooklyn, NY, United States Brooklyn Clinton Hill Brooklyn Brooklyn NY 11238 New York Brooklyn, NY US United States 40.68514 -73.95976 t Guest suite Entire home/apt 3 1 1 4 Real Bed {TV,“Cable TV”,Internet,Wifi,“Air conditioning”,Kitchen,“Pets allowed”,“Free street parking”,Heating,“Family/kid friendly”,“Smoke detector”,“Carbon monoxide detector”,“Fire extinguisher”,Essentials,Shampoo,“Lock on bedroom door”,“24-hour check-in”,Hangers,“Hair dryer”,Iron,“Laptop friendly workspace”,“Self check-in”,Lockbox,Bathtub,“High chair”,“Stair gates”,“Children’s books and toys”,“Pack ’n Play/travel crib”,“Hot water”,“Luggage dropoff allowed”,“Long term stays allowed”} 500 $89.00 $575.00 $2,100.00 $500.00 1 $0.00 1 730 1 1 730 730 1 730 today t 0 0 3 194 2019-07-09 270 69 2014-09-30 2019-07-05 90 10 9 10 10 10 9 f f f moderate f f 1 1 0 0 4.64
5022 https://www.airbnb.com/rooms/5022 2.019071e+13 2019-07-08 Entire Apt: Spacious Studio/Loft by central park Loft apartment with high ceiling and wood flooring located 10 minutes away from Central Park in Harlem - 1 block away from 6 train and 3 blocks from 2 & 3 line. This is in a recently renovated building which includes elevator, trash shoot. marble entrance and laundromat in the basement. The apartment is a spacious loft studio. The seating area and sleeping area is divided by a bookcase. There is a long hallway entrance where the bathroom and closet for your clothes is situated. The apartment is in mint condition, the walls have been freshly painted a few months ago. Supermarket, and 24 hour convenience store less than 1 block away. 1 block away from Hot Yoga Studio and NY Sports club facility. Perfect for anyone wanting to stay in Manhattan but get more space. 10 minutes away from midtown and 15 minutes away from downtown. The neighborhood is lively and diverse. You will need to travel at least 10 blocks to find cafe’s, restaurants etc.. There are a few restaurants on 100 street on Loft apartment with high ceiling and wood flooring located 10 minutes away from Central Park in Harlem - 1 block away from 6 train and 3 blocks from 2 & 3 line. This is in a recently renovated building which includes elevator, trash shoot. marble entrance and laundromat in the basement. The apartment is a spacious loft studio. The seating area and sleeping area is divided by a bookcase. There is a long hallway entrance where the bathroom and closet for your clothes is situated. The apartment is in mint condition, the walls have been freshly painted a few months ago. Supermarket, and 24 hour convenience store less than 1 block away. 1 block away from Hot Yoga Studio and NY Sports club facility. Perfect for anyone wanting to stay in Manhattan but get more space. 10 minutes away from midtown and 15 minutes away from downtown. The neighborhood is lively and diverse. You will need to travel at least 10 blocks to find cafe’s, restaurants etc.. There are a few restaurants on 100 street on none Please be considerate when staying in the apartment. This is a low key building and it’s important guest are respectful. You can come and go as you please I just ask that you keep a low profile. 1) Please be respectful of neighbors - no loud music after 10pm and keep a low profile 2) Do not open the door for anyone 3) Please keep the apt clean 4) No access to mailbox - please forward personal mail to job or school NA NA https://a0.muscache.com/im/pictures/feb453bd-fdec-405c-8bfa-3f6963d827e9.jpg?aki_policy=large NA 7192 https://www.airbnb.com/users/show/7192 Laura 2009-01-29 Miami, Florida, United States I have been a NYer for almost 10 years. I came to NY to study and never left. I work in the advertising industry and love to eat peanut butter & jelly sandwiches. N/A N/A N/A f https://a0.muscache.com/im/users/7192/profile_pic/1325651676/original.jpg?aki_policy=profile_small https://a0.muscache.com/im/users/7192/profile_pic/1325651676/original.jpg?aki_policy=profile_x_medium East Harlem 1 1 [‘email’, ‘phone’, ‘facebook’, ‘reviews’, ‘kba’] t t New York, NY, United States East Harlem East Harlem Manhattan New York NY 10029 New York New York, NY US United States 40.79851 -73.94399 t Apartment Entire home/apt 1 1 NA 1 Real Bed {Internet,Wifi,“Air conditioning”,Kitchen,Elevator,“Free street parking”,“Buzzer/wireless intercom”,Heating,Washer,Dryer,“Smoke detector”,“Carbon monoxide detector”,Essentials,Shampoo,“Hair dryer”,“Hot water”,“Host greets you”} NA $80.00 $600.00 $1,600.00 $100.00 $80.00 1 $20.00 10 120 10 10 120 120 10 120 3 months ago t 0 0 0 0 2019-07-08 9 4 2012-03-20 2018-11-19 93 10 9 10 10 9 10 f f f strict_14_with_grace_period t t 1 1 0 0 0.10
5099 https://www.airbnb.com/rooms/5099 2.019071e+13 2019-07-08 Large Cozy 1 BR Apartment In Midtown East My large 1 bedroom apartment is true New York City living. The apt is in midtown on the east side and centrally located, just a 10-minute walk from Grand Central Station, Empire State Building, Times Square. The kitchen and living room are large and bright with Apple TV. I have a new Queen Bed that sleeps 2 people, and a Queen Aero Bed that can sleep 2 people in the living room. The apartment is located on the 5th floor of a walk up - no elevator (lift). I have a large 1 bedroom apartment centrally located in Midtown East. A 10 minute walk from Grand Central Station, Times Square, Empire State Building and all major subway and bus lines. The apartment is located on the 5th floor of a pre-war walk up building-no elevator/lift. The apartment is bright with has high ceilings and flow through rooms. A spacious, cozy living room with Netflix and Apple TV. A large bright kitchen to sit and enjoy coffee or tea. The bedroom is spacious with a comfortable queen size bed that sleeps 2. I have a comfortable queen size aero bed that fits in the living room and sleeps 2. It can be tucked away for living space and opened when ready for bed. I’d be happy to give you tips and advice on the best ways to experience the most of NYC. The apartment’s location is great for sightseeing. ** Check out my listing guidebook ** If you would like to stay local in the area, there is a very long & famous strip of bars and restaurants along 3rd Avenue, which My large 1 bedroom apartment is true New York City living. The apt is in midtown on the east side and centrally located, just a 10-minute walk from Grand Central Station, Empire State Building, Times Square. The kitchen and living room are large and bright with Apple TV. I have a new Queen Bed that sleeps 2 people, and a Queen Aero Bed that can sleep 2 people in the living room. The apartment is located on the 5th floor of a walk up - no elevator (lift). I have a large 1 bedroom apartment centrally located in Midtown East. A 10 minute walk from Grand Central Station, Times Square, Empire State Building and all major subway and bus lines. The apartment is located on the 5th floor of a pre-war walk up building-no elevator/lift. The apartment is bright with has high ceilings and flow through rooms. A spacious, cozy living room with Netflix and Apple TV. A large bright kitchen to sit and enjoy coffee or tea. The bedroom is spacious with a comfortable queen size bed that sleeps 2. I none My neighborhood in Midtown East is called Murray Hill. The area is very centrally located with easy access to explore . The apartment is about 5 blocks (7 minute walk) to the United Nations and Grand Central Station the main and most historic train station. Grand Central will give you access to every train in the city. The apartment is also very close to main attractions, It’s about a 10 minute walk to both the Empire State Building and Times Square. There’s a great shopping area with dozens of stores including H&M, Zara, The Gap, BeBe and the world famous Macy’s department store. These shops are a 10 minute walk up East 34th Street from 5th avenue and 8th avenue. If you would like to stay local in the area, there is a very long & famous strip of bars and restaurants along 3rd avenue, which is just around the corner from the apartment. It’s commonly known as the 3rd avenue strip. Read My Full Listing For All Information. New York City really is the city that doesn’t sleep. There’s a constant flow of people, bikes and cars. The city can be noisy at times, if you’re a light sleeper, ear plugs would help. Check out my local guide book for things to do. From the apartment is a 10 minute walk to Grand Central Station on East 42nd Street, a 10 minute walk to the Empire State Building on East 34th Street and 5th Avenue, a 10 minute walk to Times Square on West 42 Street and about 20 minutes walk to Central Park on 59th Street. Depending on how long you are staying, I would recommend a 7 day unlimited metro card. This allows you to travel unlimited all day and night on any train or bus in and outside of the city. Grand Central Station is the main NYC train station. You can find any train connection and get anywhere in Manhattan from Grand Central Station. You can get to Brooklyn, Queens, The Bronx and Staten Island from Grand Central Station The M15 bus is around the corner on 2nd avenue. This bus will take you from uptown Harlem to the East Village and South Street Seaport. It will also take you to the Staten Island ferry and Statue of Liberty and everywhere in between along the east side. The M101 bus is just up the street on 3rd aven I will meet you upon arrival. I usually check in with guests via text or email. I’m available by text, email or phone call with any questions, suggestions or to help out. • Check-in time is 2PM. • Check-out time is 12 PM. Please be respectful of the space and leave the apartment in the condition you were welcomed into. NA NA https://a0.muscache.com/im/pictures/0790b1a5-8981-41cc-a370-fa2b982a8803.jpg?aki_policy=large NA 7322 https://www.airbnb.com/users/show/7322 Chris 2009-02-02 New York, New York, United States I’m an artist, writer, traveler, and a native new yorker. Welcome to my city. within an hour 100% N/A f https://a0.muscache.com/im/pictures/user/26745d24-d818-4bf5-8f9e-26b097121ba7.jpg?aki_policy=profile_small https://a0.muscache.com/im/pictures/user/26745d24-d818-4bf5-8f9e-26b097121ba7.jpg?aki_policy=profile_x_medium Flatiron District 1 1 [‘email’, ‘phone’, ‘reviews’, ‘jumio’, ‘government_id’] t f New York, NY, United States Midtown East Murray Hill Manhattan New York NY 10016 New York New York, NY US United States 40.74767 -73.97500 f Apartment Entire home/apt 2 1 1 1 Real Bed {TV,“Cable TV”,Internet,Wifi,Kitchen,“Buzzer/wireless intercom”,Heating,“Smoke detector”,“Carbon monoxide detector”,“Fire extinguisher”,Essentials,Shampoo,Hangers,“Hair dryer”,Iron,“Laptop friendly workspace”,“translation missing: en.hosting_amenity_49”,“translation missing: en.hosting_amenity_50”,“Hot water”,“Bed linens”,“Extra pillows and blankets”,“Host greets you”} NA $200.00 $300.00 $125.00 2 $100.00 3 21 3 3 21 21 3 21 3 weeks ago t 23 48 48 129 2019-07-08 74 9 2009-04-20 2019-06-22 89 10 9 10 10 9 9 f f f strict_14_with_grace_period t t 1 1 0 0 0.59

Please note only 6 rows have been shown below with head function

Dimensions of Listing file

#Check dimensions
dim(rentAirbnbListings)
## [1] 48895   106

Structure of Listing file

#Check the structure
str(rentAirbnbListings)
## 'data.frame':    48895 obs. of  106 variables:
##  $ id                                          : int  2539 2595 3647 3831 5022 5099 5121 5178 5203 5238 ...
##  $ listing_url                                 : Factor w/ 48895 levels "https://www.airbnb.com/rooms/10000070",..: 21404 21961 39206 39474 41416 41514 41543 41592 41625 41660 ...
##  $ scrape_id                                   : num  2.02e+13 2.02e+13 2.02e+13 2.02e+13 2.02e+13 ...
##  $ last_scraped                                : Factor w/ 2 levels "2019-07-08","2019-07-09": 2 2 1 2 1 1 2 1 1 1 ...
##  $ name                                        : Factor w/ 47906 levels "","'Fan'tastic",..: 12661 38172 45171 15702 19366 25001 8337 25048 15597 17682 ...
##  $ summary                                     : Factor w/ 43807 levels "","----This a listing for our photo studio.  The space does not offer overnight stays---- With an abundance of Sou"| __truncated__,..: 29689 13422 1 40376 1 22465 1 27616 26184 19463 ...
##  $ space                                       : Factor w/ 32171 levels "","---------------------------------------------------------------------------------------------------------------"| __truncated__,..: 18395 201 30500 8115 12328 9504 8648 31798 18480 16185 ...
##  $ description                                 : Factor w/ 46250 levels "","----------------------- THE APARTMENT ----------------------- **SUMMER ALERT:  Apartment is FULLY AIRCONDITIONE"| __truncated__,..: 31270 14116 44645 42618 21635 23689 16093 29099 27589 20535 ...
##  $ experiences_offered                         : Factor w/ 1 level "none": 1 1 1 1 1 1 1 1 1 1 ...
##  $ neighborhood_overview                       : Factor w/ 26905 levels "","''Stuyvesant hight'' is a trendy and safe neighborhood, their is plenty of coffee shops, restaurants and stores"| __truncated__,..: 4516 3890 1 10157 1 12131 1 21046 13722 8179 ...
##  $ notes                                       : Factor w/ 17018 levels "","'Minibar' is the name of the best app for beer, wine, and liquor delivery. 'Seamless' is the main food delivery"| __truncated__,..: 5563 1 1 1 1 10205 1 10272 2164 14126 ...
##  $ transit                                     : Factor w/ 28041 levels "","'very very close to L express to manhattan! JMZ on broadway one block away>  sometimes you hear the train runni"| __truncated__,..: 24517 4488 1 4789 1 8116 1 1 20931 7719 ...
##  $ access                                      : Factor w/ 23435 levels "","---------------------------------------------------------------------------------------------------------------"| __truncated__,..: 1 8448 1 22621 1 10294 1 2632 8780 8179 ...
##  $ interaction                                 : Factor w/ 24970 levels "","---------------------------------------------------------------------------------------------- ----------------"| __truncated__,..: 1 6204 1 19711 1 10508 1 1 16482 19266 ...
##  $ house_rules                                 : Factor w/ 25255 levels "","'No Shoes' in the apartment.",..: 6531 12547 23842 21886 17807 7595 1 15619 16771 15412 ...
##  $ thumbnail_url                               : logi  NA NA NA NA NA NA ...
##  $ medium_url                                  : logi  NA NA NA NA NA NA ...
##  $ picture_url                                 : Factor w/ 48272 levels "https://a0.muscache.com/im/pictures/00027b83-c89a-4a76-ac36-de9ab57e878e.jpg?aki_policy=large",..: 11889 45757 26944 43799 48081 1209 6720 3977 2921 17476 ...
##  $ xl_picture_url                              : logi  NA NA NA NA NA NA ...
##  $ host_id                                     : int  2787 2845 4632 4869 7192 7322 7356 8967 7490 7549 ...
##  $ host_url                                    : Factor w/ 37457 levels "https://www.airbnb.com/users/show/1000014",..: 19513 19756 26531 27259 32826 33085 33159 35905 33430 33539 ...
##  $ host_name                                   : Factor w/ 11453 levels "","'Cil","-TheQueensCornerLot",..: 5051 4846 2962 6264 5982 1970 3601 9699 6935 1264 ...
##  $ host_since                                  : Factor w/ 3587 levels "","2008-08-22",..: 4 5 14 15 19 20 21 31 22 24 ...
##  $ host_location                               : Factor w/ 1571 levels "","  Brooklyn, NY  ",..: 982 982 982 982 886 982 982 982 982 982 ...
##  $ host_about                                  : Factor w/ 22144 levels "","'Eric is a director-actor-writer-producer-teacher of theatre; he also writes songs, makes bread, and advocates "| __truncated__,..: 3047 751 17091 16419 13912 9517 107 15173 21518 14390 ...
##  $ host_response_time                          : Factor w/ 6 levels "","a few days or more",..: 6 5 4 5 3 6 3 5 3 5 ...
##  $ host_response_rate                          : Factor w/ 87 levels "","0%","10%",..: 4 74 4 80 87 4 87 77 87 4 ...
##  $ host_acceptance_rate                        : Factor w/ 2 levels "","N/A": 2 2 2 2 2 2 2 2 2 2 ...
##  $ host_is_superhost                           : Factor w/ 3 levels "","f","t": 2 2 2 2 2 2 2 2 2 3 ...
##  $ host_thumbnail_url                          : Factor w/ 37360 levels "","https://a0.muscache.com/defaults/user_pic-50x50.png?v=3",..: 6697 31587 34728 34835 36082 14713 5717 36888 36252 21825 ...
##  $ host_picture_url                            : Factor w/ 37360 levels "","https://a0.muscache.com/defaults/user_pic-225x225.png?v=3",..: 6697 31587 34728 34835 36082 14713 5717 36888 36252 21825 ...
##  $ host_neighbourhood                          : Factor w/ 416 levels "","Ã\230sterbro",..: 156 241 167 75 113 129 27 169 375 218 ...
##  $ host_listings_count                         : int  6 5 1 1 1 1 1 1 1 4 ...
##  $ host_total_listings_count                   : int  6 5 1 1 1 1 1 1 1 4 ...
##  $ host_verifications                          : Factor w/ 536 levels "['email', 'facebook', 'google', 'reviews', 'offline_government_id', 'selfie', 'government_id', 'identity_manual']",..: 351 350 182 351 118 317 138 155 33 301 ...
##  $ host_has_profile_pic                        : Factor w/ 3 levels "","f","t": 3 3 3 3 3 3 3 3 3 3 ...
##  $ host_identity_verified                      : Factor w/ 3 levels "","f","t": 3 3 3 3 3 2 2 2 3 3 ...
##  $ street                                      : Factor w/ 315 levels " Astoria, NY, United States",..: 46 203 203 64 203 203 64 203 203 203 ...
##  $ neighbourhood                               : Factor w/ 195 levels "","Allerton",..: 20 106 83 20 53 117 20 106 178 105 ...
##  $ neighbourhood_cleansed                      : Factor w/ 221 levels "Allerton","Arden Heights",..: 109 128 95 42 62 138 14 96 203 36 ...
##  $ neighbourhood_group_cleansed                : Factor w/ 5 levels "Bronx","Brooklyn",..: 2 3 3 2 3 3 2 3 3 3 ...
##  $ city                                        : Factor w/ 306 levels ""," Astoria",..: 49 187 187 46 187 187 46 187 187 187 ...
##  $ state                                       : Factor w/ 8 levels "","CA","MP","New York ",..: 8 8 8 8 8 8 8 8 8 8 ...
##  $ zipcode                                     : Factor w/ 200 levels "","07093","07302",..: 123 20 29 142 31 18 121 21 27 5 ...
##  $ market                                      : Factor w/ 18 levels "","Adirondacks",..: 15 15 15 15 15 15 15 15 15 15 ...
##  $ smart_location                              : Factor w/ 317 levels " Astoria, NY",..: 46 205 205 62 205 205 62 205 205 205 ...
##  $ country_code                                : Factor w/ 1 level "US": 1 1 1 1 1 1 1 1 1 1 ...
##  $ country                                     : Factor w/ 1 level "United States": 1 1 1 1 1 1 1 1 1 1 ...
##  $ latitude                                    : num  40.6 40.8 40.8 40.7 40.8 ...
##  $ longitude                                   : num  -74 -74 -73.9 -74 -73.9 ...
##  $ is_location_exact                           : Factor w/ 2 levels "f","t": 1 1 2 2 2 1 1 2 2 2 ...
##  $ property_type                               : Factor w/ 36 levels "Aparthotel","Apartment",..: 2 2 2 19 2 2 2 2 2 2 ...
##  $ room_type                                   : Factor w/ 3 levels "Entire home/apt",..: 2 1 2 1 1 1 2 2 2 1 ...
##  $ accommodates                                : int  2 2 2 3 1 2 2 2 1 3 ...
##  $ bathrooms                                   : num  1 1 1 1 1 1 NA 1 1 1 ...
##  $ bedrooms                                    : int  1 0 1 1 NA 1 1 1 1 1 ...
##  $ beds                                        : int  1 1 1 4 1 1 1 1 1 2 ...
##  $ bed_type                                    : Factor w/ 5 levels "Airbed","Couch",..: 5 5 4 5 5 5 3 5 5 5 ...
##  $ amenities                                   : Factor w/ 44292 levels "{\"Air conditioning\",\"Carbon monoxide detector\",\"First aid kit\",\"Lock on bedroom door\",\"translation mis"| __truncated__,..: 9863 23810 339 7790 3243 9905 38040 20063 920 11136 ...
##  $ square_feet                                 : int  NA NA NA 500 NA NA NA NA NA NA ...
##  $ price                                       : Factor w/ 674 levels "$0.00","$1,000.00",..: 111 216 113 642 619 189 551 612 612 113 ...
##  $ weekly_price                                : Factor w/ 599 levels "","$1,000.00",..: 260 162 1 426 440 1 1 366 1 1 ...
##  $ monthly_price                               : Factor w/ 681 levels "","$1,000.00",..: 681 1 1 233 131 1 1 1 1 1 ...
##  $ security_deposit                            : Factor w/ 225 levels "","$0.00","$1,000.00",..: 32 148 95 178 32 137 169 1 1 2 ...
##  $ cleaning_fee                                : Factor w/ 207 levels "","$0.00","$1,000.00",..: 91 5 182 1 188 26 2 43 1 128 ...
##  $ guests_included                             : int  1 2 2 1 1 2 1 1 1 2 ...
##  $ extra_people                                : Factor w/ 104 levels "$0.00","$10.00",..: 51 1 29 1 29 3 45 8 1 2 ...
##  $ minimum_nights                              : int  1 1 3 1 10 3 45 2 2 1 ...
##  $ maximum_nights                              : int  730 1125 7 730 120 21 730 14 14 1125 ...
##  $ minimum_minimum_nights                      : int  1 1 3 1 10 3 45 2 2 1 ...
##  $ maximum_minimum_nights                      : int  1 1 3 1 10 3 45 2 2 1 ...
##  $ minimum_maximum_nights                      : int  730 1125 7 730 120 21 730 14 14 1125 ...
##  $ maximum_maximum_nights                      : int  730 1125 7 730 120 21 730 14 14 1125 ...
##  $ minimum_nights_avg_ntm                      : num  1 1 3 1 10 3 45 2 2 1 ...
##  $ maximum_nights_avg_ntm                      : num  730 1125 7 730 120 ...
##  $ calendar_updated                            : Factor w/ 92 levels "1 week ago","10 months ago",..: 27 38 32 91 26 27 10 39 19 91 ...
##  $ has_availability                            : Factor w/ 1 level "t": 1 1 1 1 1 1 1 1 1 1 ...
##  $ availability_30                             : int  30 25 30 0 0 23 0 3 0 0 ...
##  $ availability_60                             : int  60 55 60 0 0 48 0 10 0 0 ...
##  $ availability_90                             : int  90 80 90 3 0 48 0 12 0 0 ...
##  $ availability_365                            : int  365 355 365 194 0 129 0 220 0 188 ...
##  $ calendar_last_scraped                       : Factor w/ 2 levels "2019-07-08","2019-07-09": 2 2 1 2 1 1 2 1 1 1 ...
##  $ number_of_reviews                           : int  9 45 0 270 9 74 49 430 118 160 ...
##  $ number_of_reviews_ltm                       : int  2 11 0 69 4 9 0 43 0 12 ...
##  $ first_review                                : Factor w/ 2898 levels "","2009-03-12",..: 1587 23 1 1162 357 3 9 5 15 14 ...
##  $ last_review                                 : Factor w/ 1765 levels "","2011-03-28",..: 1503 1717 1 1762 1534 1749 1124 1751 1048 1736 ...
##  $ review_scores_rating                        : int  98 95 NA 90 93 89 90 84 98 94 ...
##  $ review_scores_accuracy                      : int  10 10 NA 10 10 10 8 9 10 10 ...
##  $ review_scores_cleanliness                   : int  10 9 NA 9 9 9 8 7 10 9 ...
##  $ review_scores_checkin                       : int  10 10 NA 10 10 10 10 10 10 10 ...
##  $ review_scores_communication                 : int  10 10 NA 10 10 10 10 9 10 10 ...
##  $ review_scores_location                      : int  10 10 NA 10 9 9 9 10 10 9 ...
##  $ review_scores_value                         : int  10 9 NA 9 10 9 9 9 10 9 ...
##  $ requires_license                            : Factor w/ 1 level "f": 1 1 1 1 1 1 1 1 1 1 ...
##  $ license                                     : Factor w/ 10 levels "","11-3461723",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ jurisdiction_names                          : Factor w/ 6 levels "","{\"Connecticut State\"}",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ instant_bookable                            : Factor w/ 2 levels "f","t": 1 1 1 1 1 1 1 1 1 1 ...
##  $ is_business_travel_ready                    : Factor w/ 1 level "f": 1 1 1 1 1 1 1 1 1 1 ...
##  $ cancellation_policy                         : Factor w/ 6 levels "flexible","moderate",..: 2 4 4 2 4 4 4 4 1 4 ...
##   [list output truncated]

Missing Values by Attribute

#Check Missing values in each column
colSums(is.na(rentAirbnbListings))
##                                           id 
##                                            0 
##                                  listing_url 
##                                            0 
##                                    scrape_id 
##                                            0 
##                                 last_scraped 
##                                            0 
##                                         name 
##                                            0 
##                                      summary 
##                                            0 
##                                        space 
##                                            0 
##                                  description 
##                                            0 
##                          experiences_offered 
##                                            0 
##                        neighborhood_overview 
##                                            0 
##                                        notes 
##                                            0 
##                                      transit 
##                                            0 
##                                       access 
##                                            0 
##                                  interaction 
##                                            0 
##                                  house_rules 
##                                            0 
##                                thumbnail_url 
##                                        48895 
##                                   medium_url 
##                                        48895 
##                                  picture_url 
##                                            0 
##                               xl_picture_url 
##                                        48895 
##                                      host_id 
##                                            0 
##                                     host_url 
##                                            0 
##                                    host_name 
##                                            0 
##                                   host_since 
##                                            0 
##                                host_location 
##                                            0 
##                                   host_about 
##                                            0 
##                           host_response_time 
##                                            0 
##                           host_response_rate 
##                                            0 
##                         host_acceptance_rate 
##                                            0 
##                            host_is_superhost 
##                                            0 
##                           host_thumbnail_url 
##                                            0 
##                             host_picture_url 
##                                            0 
##                           host_neighbourhood 
##                                            0 
##                          host_listings_count 
##                                           21 
##                    host_total_listings_count 
##                                           21 
##                           host_verifications 
##                                            0 
##                         host_has_profile_pic 
##                                            0 
##                       host_identity_verified 
##                                            0 
##                                       street 
##                                            0 
##                                neighbourhood 
##                                            0 
##                       neighbourhood_cleansed 
##                                            0 
##                 neighbourhood_group_cleansed 
##                                            0 
##                                         city 
##                                            0 
##                                        state 
##                                            0 
##                                      zipcode 
##                                            0 
##                                       market 
##                                            0 
##                               smart_location 
##                                            0 
##                                 country_code 
##                                            0 
##                                      country 
##                                            0 
##                                     latitude 
##                                            0 
##                                    longitude 
##                                            0 
##                            is_location_exact 
##                                            0 
##                                property_type 
##                                            0 
##                                    room_type 
##                                            0 
##                                 accommodates 
##                                            0 
##                                    bathrooms 
##                                           56 
##                                     bedrooms 
##                                           22 
##                                         beds 
##                                           40 
##                                     bed_type 
##                                            0 
##                                    amenities 
##                                            0 
##                                  square_feet 
##                                        48487 
##                                        price 
##                                            0 
##                                 weekly_price 
##                                            0 
##                                monthly_price 
##                                            0 
##                             security_deposit 
##                                            0 
##                                 cleaning_fee 
##                                            0 
##                              guests_included 
##                                            0 
##                                 extra_people 
##                                            0 
##                               minimum_nights 
##                                            0 
##                               maximum_nights 
##                                            0 
##                       minimum_minimum_nights 
##                                            0 
##                       maximum_minimum_nights 
##                                            0 
##                       minimum_maximum_nights 
##                                            0 
##                       maximum_maximum_nights 
##                                            0 
##                       minimum_nights_avg_ntm 
##                                            0 
##                       maximum_nights_avg_ntm 
##                                            0 
##                             calendar_updated 
##                                            0 
##                             has_availability 
##                                            0 
##                              availability_30 
##                                            0 
##                              availability_60 
##                                            0 
##                              availability_90 
##                                            0 
##                             availability_365 
##                                            0 
##                        calendar_last_scraped 
##                                            0 
##                            number_of_reviews 
##                                            0 
##                        number_of_reviews_ltm 
##                                            0 
##                                 first_review 
##                                            0 
##                                  last_review 
##                                            0 
##                         review_scores_rating 
##                                        11022 
##                       review_scores_accuracy 
##                                        11060 
##                    review_scores_cleanliness 
##                                        11043 
##                        review_scores_checkin 
##                                        11078 
##                  review_scores_communication 
##                                        11055 
##                       review_scores_location 
##                                        11082 
##                          review_scores_value 
##                                        11080 
##                             requires_license 
##                                            0 
##                                      license 
##                                            0 
##                           jurisdiction_names 
##                                            0 
##                             instant_bookable 
##                                            0 
##                     is_business_travel_ready 
##                                            0 
##                          cancellation_policy 
##                                            0 
##                require_guest_profile_picture 
##                                            0 
##             require_guest_phone_verification 
##                                            0 
##               calculated_host_listings_count 
##                                            0 
##  calculated_host_listings_count_entire_homes 
##                                            0 
## calculated_host_listings_count_private_rooms 
##                                            0 
##  calculated_host_listings_count_shared_rooms 
##                                            0 
##                            reviews_per_month 
##                                        10052

Observations

  • The dimension of Airbnb Cost data is 48895, 106.
  • Table structure: Data is in the Long form, which is convenient for our EDA. Money columns need to be changed to numeric.
  • Slicing Relevant columns: There are 106 columns for various dimensions, only pick the relevant columns for our further analysis will be required.
  • Filter Criteria: filter 2 bedroom listings from bedroom column.
  • Missing Values: Many(282804) NA values can be observed in the dataset.

Scrubbing Methodology

  • Check for NA Values in columns
  • Filter Criteria: Filter out only 2 bedroom listings.
  • Relevent columnns: Only column that are relevent will be sliced for further analysis
  • Property data: zipcode,id, last_scraped, neighbourhood_group_cleansed, latitude,longitude,property_type,room_type,accommodates, bathrooms,bedrooms,beds,security_deposit, price, cleaning_fee
  • Demand/Avaliablity data: availability_30,availability_60,availability_90,availability_365
  • Review Data :number_of_reviews,first_review,last_review,review_scores_rating,reviews_per_month
  • Change money columns to numeric
  • Change first_review, last_review columns to date from factors

Below is the scrubbing function that is used for Listings data

listings_scrub <- function(rentAirbnbListings,NoOfBedrooms){ 
  
  revenue_filter <- c('zipcode','id', 'last_scraped', 'neighbourhood_group_cleansed',
                         'latitude','longitude','property_type','room_type','accommodates',
                         'bathrooms','bedrooms','beds','security_deposit', 'price', 'cleaning_fee',
                         'availability_30','availability_60','availability_90','availability_365',
                         'number_of_reviews','first_review','last_review','review_scores_rating',
                         'reviews_per_month')

rentAirbnbListings_filt <-
  rentAirbnbListings %>% # read house rent data
  .[, colnames(.) %in% revenue_filter]  # filter by subset of variables


convert_price_to_numeric <- function(data, var)
{
  index <- which(colnames(data) == var) # indexes of price columns
  df1 <- str_sub(data[,index], 2) # remove '$' character
  df2 <- gsub(",","",df1) %>% # substitute ',' with blank
    as.numeric() # convert to numeric
  return(df2)
  }
  
#Assumption: Cleaning fee and security deposit is considered as 0 where Blank
rentAirbnbListings_filt$price <- convert_price_to_numeric(rentAirbnbListings_filt,'price')
rentAirbnbListings_filt$security_deposit <- convert_price_to_numeric(rentAirbnbListings_filt,'security_deposit')
rentAirbnbListings_filt$cleaning_fee <- convert_price_to_numeric(rentAirbnbListings_filt, 'cleaning_fee')

#Calculate total rent amount (rental/night + cleaning fee)
rentAirbnbListings_filt<-rentAirbnbListings_filt%>%
  mutate( totalRent=(price+ ifelse(is.na(cleaning_fee),as.numeric(0),cleaning_fee)))
 
#convert factor to date
rentAirbnbListings_filt$first_review<-as.Date(as.character(rentAirbnbListings_filt$first_review, format = "%m/%d/%Y"))
rentAirbnbListings_filt$last_review<-as.Date(as.character(rentAirbnbListings_filt$last_review, format = "%m/%d/%Y"))

rentAirbnbListings_filt<-rentAirbnbListings_filt%>%filter(bedrooms==NoOfBedrooms)
return(rentAirbnbListings_filt) 
}
NoOfBedrooms<-2
listings_scbd<-listings_scrub(rentAirbnbListings,NoOfBedrooms)
dimListPostScrb<-dim(listings_scbd)# Dimension post scrubbing
dimListPostScrb
## [1] 6497   25

Dimension post scrubbing: 6497, 25

Airbnb data post scrubbing

kable(head(listings_scbd))  %>% kable_styling(bootstrap_options = c("striped", "hover", "responsive")) %>% scroll_box(width = "100%", height = "250px")
id last_scraped neighbourhood_group_cleansed zipcode latitude longitude property_type room_type accommodates bathrooms bedrooms beds price security_deposit cleaning_fee availability_30 availability_60 availability_90 availability_365 number_of_reviews first_review last_review review_scores_rating reviews_per_month totalRent
7750 2019-07-08 Manhattan 10029 40.79685 -73.94872 Apartment Entire home/apt 4 1 2 2 190 NA NA 4 14 14 249 0 NA NA NA NA 190
13050 2019-07-09 Brooklyn 11221 40.68554 -73.94090 Townhouse Entire home/apt 5 1 2 2 115 150 85 0 0 0 0 11 2010-04-18 2017-01-01 94 0.10 200
14290 2019-07-08 Brooklyn 11206 40.70420 -73.93560 Loft Entire home/apt 2 1 2 0 228 350 128 14 20 50 140 82 2009-12-01 2019-05-17 94 0.70 356
15396 2019-07-08 Manhattan 10001 40.74623 -73.99530 Apartment Entire home/apt 4 2 2 2 375 1000 120 30 60 90 180 5 2016-03-23 2018-11-03 100 0.12 495
15711 2019-07-08 Manhattan 10162 40.77065 -73.95269 Apartment Entire home/apt 6 1 2 4 250 500 200 0 12 28 231 66 2010-01-06 2019-03-30 93 0.57 450
16458 2019-07-08 Brooklyn 11215 40.67343 -73.98338 Condominium Entire home/apt 4 2 2 1 225 NA NA 0 0 0 0 4 2017-07-02 2017-09-24 100 0.16 225

Please note only 6 rows have been shown above with head function

3.3 Data Merge

Merge (Inner join) Zillow cost and Airbnb rent data based on Zipcode

RentCostMerged<-merge(listings_scbd,zlow_scbd,by.x="zipcode",by.y="zipcode")#merge the data

NoOfRentCost<-dim(RentCostMerged)#calculate dimensions
rowsRentCost<-nrow(RentCostMerged)#No of rows
colsRentCost<-ncol(RentCostMerged)#No of cols
NAvaluesRentCost<-sum(is.na(RentCostMerged))# No of missing data
NoOfRentCost #Dimension post Merge
## [1] 1565   26

Dimension post Merge: 1565, 26

Merged Airbnb Rental & Zillow Cost data

kable(head(RentCostMerged))  %>% kable_styling(bootstrap_options = c("striped", "hover", "responsive")) %>% scroll_box(width = "100%", height = "250px")
zipcode id last_scraped neighbourhood_group_cleansed latitude longitude property_type room_type accommodates bathrooms bedrooms beds price security_deposit cleaning_fee availability_30 availability_60 availability_90 availability_365 number_of_reviews first_review last_review review_scores_rating reviews_per_month totalRent currentPrice
10003 19288937 2019-07-08 Manhattan 40.72784 -73.98883 Apartment Entire home/apt 4 1.0 2 2 250 500 110 1 2 10 176 99 2017-07-02 2019-06-21 97 4.03 360 2336587
10003 13866112 2019-07-08 Manhattan 40.72394 -73.98872 Apartment Entire home/apt 2 1.5 2 1 165 250 85 0 0 0 0 3 2016-07-31 2017-12-09 100 0.08 250 2336587
10003 31835260 2019-07-08 Manhattan 40.73782 -73.98547 Apartment Entire home/apt 5 1.0 2 2 218 0 120 0 0 0 0 6 2019-03-10 2019-05-12 93 1.49 338 2336587
10003 5207973 2019-07-08 Manhattan 40.72950 -73.98623 Apartment Entire home/apt 4 1.0 2 2 230 500 60 0 0 0 0 1 2015-03-31 2015-03-31 NA 0.02 290 2336587
10003 17843998 2019-07-08 Manhattan 40.72571 -73.98789 Apartment Entire home/apt 4 1.0 2 2 250 200 75 2 5 16 264 8 2017-06-14 2019-06-02 88 0.32 325 2336587
10003 4126452 2019-07-08 Manhattan 40.73327 -73.98503 Apartment Entire home/apt 3 1.0 2 2 200 300 125 0 0 0 157 13 2014-10-06 2015-10-19 80 0.22 325 2336587

Please note only 6 rows have been shown above with head function

Observation

  • The dimension of the final dataset is 1565, 26. These observations would be used for Exploratory Data Analysis.
  • Rows have reduced from 48,895 and 8946 rows in Airbnb Listings & Zillow cost files to 1565 rows in the final dataset.
  • Columns have also reduced from 106 & 262 in Airbnb Listings & Zillow cost files, to 26 columns in the final dataset.
  • A total of 2123 NA values are observed in the final RentCostMerged dataset.

We need to further do a data quality check for NA and Abnormal Values in the merged data. We will handle the data quality checks in the Exploratory Data Analysis phase.

4.0 Exploratory Data Analysis

Exploratory Data Analysis (EDA) is usually the first step of the data analysis phase. In statistics, this is an approach to analyze data sets to summarize their main characteristics.

Below we use statistical & Visualization techniques to draw some key insights from the data

4.1 Data Quality Check

Data Quality check is an important phase of EDA. As the quality of data at hand and heavily influence the accuracy of data analysis, this is taken as the primary step to start with…

4.1.1 Summary Statistics

summary(RentCostMerged)
##     zipcode          id               last_scraped 
##  11215  :189   Min.   :   16458   2019-07-08:1090  
##  10036  :147   1st Qu.: 7664343   2019-07-09: 475  
##  10003  :136   Median :17586014                    
##  10025  :124   Mean   :17837441                    
##  11217  :124   3rd Qu.:28501387                    
##  10011  :106   Max.   :36477307                    
##  (Other):739                                       
##  neighbourhood_group_cleansed    latitude       longitude     
##  Bronx        :   0           Min.   :40.52   Min.   :-74.21  
##  Brooklyn     : 501           1st Qu.:40.68   1st Qu.:-74.00  
##  Manhattan    :1022           Median :40.73   Median :-73.99  
##  Queens       :  16           Mean   :40.73   Mean   :-73.98  
##  Staten Island:  26           3rd Qu.:40.76   3rd Qu.:-73.97  
##                               Max.   :40.81   Max.   :-73.76  
##                                                               
##             property_type            room_type     accommodates   
##  Apartment         :1290   Entire home/apt:1465   Min.   : 1.000  
##  Condominium       :  68   Private room   : 100   1st Qu.: 4.000  
##  House             :  57   Shared room    :   0   Median : 4.000  
##  Loft              :  55                          Mean   : 4.458  
##  Townhouse         :  44                          3rd Qu.: 5.000  
##  Serviced apartment:  35                          Max.   :16.000  
##  (Other)           :  16                                          
##    bathrooms        bedrooms      beds           price       
##  Min.   :0.000   Min.   :2   Min.   :0.000   Min.   :  50.0  
##  1st Qu.:1.000   1st Qu.:2   1st Qu.:2.000   1st Qu.: 165.0  
##  Median :1.000   Median :2   Median :2.000   Median : 228.0  
##  Mean   :1.299   Mean   :2   Mean   :2.381   Mean   : 284.6  
##  3rd Qu.:2.000   3rd Qu.:2   3rd Qu.:3.000   3rd Qu.: 320.0  
##  Max.   :3.500   Max.   :2   Max.   :6.000   Max.   :4000.0  
##  NA's   :3                                                   
##  security_deposit  cleaning_fee   availability_30  availability_60
##  Min.   :   0.0   Min.   :  0.0   Min.   : 0.000   Min.   : 0.00  
##  1st Qu.:   0.0   1st Qu.: 75.0   1st Qu.: 0.000   1st Qu.: 0.00  
##  Median : 250.0   Median :100.0   Median : 0.000   Median : 7.00  
##  Mean   : 503.8   Mean   :109.9   Mean   : 5.894   Mean   :15.87  
##  3rd Qu.: 500.0   3rd Qu.:150.0   3rd Qu.: 8.000   3rd Qu.:27.00  
##  Max.   :5000.0   Max.   :482.0   Max.   :30.000   Max.   :60.00  
##  NA's   :420      NA's   :212                                     
##  availability_90 availability_365 number_of_reviews  first_review       
##  Min.   : 0.00   Min.   :  0.0    Min.   :  0.00    Min.   :2010-05-20  
##  1st Qu.: 0.00   1st Qu.:  0.0    1st Qu.:  1.00    1st Qu.:2015-10-13  
##  Median :12.00   Median : 59.0    Median :  4.00    Median :2017-05-21  
##  Mean   :25.79   Mean   :120.7    Mean   : 19.78    Mean   :2017-01-13  
##  3rd Qu.:49.00   3rd Qu.:247.0    3rd Qu.: 17.00    3rd Qu.:2018-09-02  
##  Max.   :90.00   Max.   :365.0    Max.   :403.00    Max.   :2019-07-07  
##                                                     NA's   :367         
##   last_review         review_scores_rating reviews_per_month
##  Min.   :2012-10-02   Min.   : 20.00       Min.   : 0.010   
##  1st Qu.:2018-05-04   1st Qu.: 92.00       1st Qu.: 0.160   
##  Median :2019-04-29   Median : 96.00       Median : 0.470   
##  Mean   :2018-08-31   Mean   : 94.15       Mean   : 1.187   
##  3rd Qu.:2019-06-23   3rd Qu.:100.00       3rd Qu.: 1.700   
##  Max.   :2019-07-08   Max.   :100.00       Max.   :12.050   
##  NA's   :367          NA's   :387          NA's   :367      
##    totalRent       currentPrice    
##  Min.   :  50.0   Min.   : 413626  
##  1st Qu.: 240.0   1st Qu.:1628300  
##  Median : 325.0   Median :1910420  
##  Mean   : 379.6   Mean   :2156909  
##  3rd Qu.: 450.0   3rd Qu.:2897002  
##  Max.   :4040.0   Max.   :3713170  
## 

This gives a quick overview of data structure

Observation

  • Missing values are present in few columns, namely security_deposit,Cleaning_fee,reviews_per_month,last_review,first_review,review_scores_rating,bathrooms.
  • The rest structure looks fine to proceed.

4.1.2 Missing Values

Often, data can have missing values due to a variety of reasons, for example with listings data, some observations may not have been provided. It is important for us to analyze our data, and get a sense of what the missing values are so we can decide how we want to handle missing values for our Analysis.

MissingValues<-as.vector(sapply(RentCostMerged, function(x) sum(is.na(x))))
df_MissingValues<-data.frame(as.character(colnames(RentCostMerged)),MissingValues)

#Coulmns that have Missing values
df_MissingValues<-
  df_MissingValues%>%
  rename(Attributes="as.character.colnames.RentCostMerged..")%>%
  filter(MissingValues!=0)%>%
  arrange(desc(MissingValues))
df_MissingValues
##             Attributes MissingValues
## 1     security_deposit           420
## 2 review_scores_rating           387
## 3         first_review           367
## 4          last_review           367
## 5    reviews_per_month           367
## 6         cleaning_fee           212
## 7            bathrooms             3

Plot indicating the amount of missing values in Merged dataset

mv<-ggplot(df_MissingValues, aes(x=Attributes, y=MissingValues)) + 
  geom_bar(stat = "identity",fill="#3990E5", alpha=.6, width=.5)+
  geom_text(aes(label=MissingValues), hjust=-1.6,  size=3.5)+
  coord_flip()+
  theme(text = element_text(size=10))+
  ggtitle("Missing Values")
ggplotly(mv)

Observation

  • Security Deposit: 420 properties have Missing Security deposit. This can be because the owners did not demand this amount from the tenant, hence the column has blanks. We assume that blank/NA indicates ‘0’ amount.
  • Cleaning_fee: 212 properties do not have any cleaning fee included. We assume that blank/NA indicates ‘0’ amount.
  • Review per month: Missing values in this column indicate that 367 properties did not have any review ever. We assume that blank/NA indicates ‘0’ Reviews given.
  • last_review/first_review: 367 properties have no first or last review. These properties have never been reviewed, last and first have an equal number of blanks.
  • Review score rating: 387 Missing values in this column indicate that 20 properties(387-367=20) did not have any rating yet but did have review/reviews. We assume that blank/NA indicates ‘0’ ratings given.

Missing values in these columns make sense as some are “fee” columns that may have not been demanded by the owner or Review columns which may be blank because the listing never got any review yet. It is safe to assume the missing values as 0 here.

4.1.3 Total properties by Neighbourhood

Below plot depicts the number of total properties segregated by Neighbourhood

p1<-ggplot(RentCostMerged, aes(neighbourhood_group_cleansed)) + 
  geom_bar(fill='#3990E5' ,alpha=0.5) +
  labs(x = "Neighbourhood", 
       y = "Number of Properties",
       title = "Total properties",
       subtitle = "Segrated by Neighbourhood") + 
  geom_text(stat='count', aes(label=..count..),vjust=-0.6)
ggplotly(p1)

Observation

  • Manhattan has a maximum number of properties: 1022
  • Brooklyn has close to half as many as in Manhattan : 500
  • Queens & Staten Island have very few as compared to Manhattan & Brooklyn : 16 & 26 respectively.

Below plot depicts the number of total properties segregated by Neighbourhood & zipcode

p2<-ggplot(RentCostMerged, aes(zipcode, fill = neighbourhood_group_cleansed)) + 
  geom_bar(alpha=0.5) + 
  geom_text(stat='count', aes(label=..count..), vjust= -0.3) + 
  guides(fill = guide_legend(title = "Neighbourhood"))+
  theme(
    plot.background = element_blank(),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    panel.border = element_blank(),
    axis.text.x = element_text(angle = 90, hjust = 1),
    plot.caption = element_text(hjust = 0, face = "italic"),
    legend.position="bottom"
    ) +
    labs(x = "Zipcode", 
         y = "Number of Properties",
         title = "Segrated by Neighbourhood & zipcode",
         caption = "Data source: Airbnb & Zillow")  
p2  

Observation

  • A property in zipcode 10013 has been incorrectly labeled as under Brooklyn. Removing this abnormal property will be better for analysis.
  • Some zip codes have a-few listing only, as compared to others. Keeping these zip codes in our analysis can result in skewed output drastically. Hence we need to further analyze this..

A Deeper analysis of Number of listings in each zipcode

B1<-RentCostMerged %>% 
  group_by(zipcode) %>% 
  summarise(ListingCount=n())

#  25TH Percentile of total listings by zipcode
per25<-quantile(B1$ListingCount,.25)
per25  #  25TH Percentile of total listings by zipcode
##  25% 
## 7.75

Observation

  • 25th percentile= 7.75 indicates there 25% zipcodes have less than 8 listings.
  • A boxplot can help us understand the distribution of total listings in each zipcode.
B2<-RentCostMerged %>% 
  group_by(zipcode) %>% 
  summarise(ListingCount=n())%>% 
  filter(ListingCount>=10)

  B11<-ggplot(B1,aes(x="",y=ListingCount))+
  geom_boxplot(
    # custom boxes
        color="blue",
        fill="blue",
        alpha=0.2,
        # Notch?
        notch=TRUE,
        notchwidth = 0.5,
        # custom outliers
        outlier.colour="red",
        outlier.fill="red",
        outlier.size=3
  )+
    labs(y = "Listings", 
         x = "Zipcode",
         title = "Pre removal",
         subtitle = "of zips with <10 listings"
        )
  
  B22<-ggplot(B2,aes(x="",y=ListingCount))+
  geom_boxplot(
    # custom boxes
        color="darkgreen",
        fill="green",
        alpha=0.3,
        # Notch?
        notch=TRUE,
        notchwidth = 0.5,
        
        # custom outliers
        outlier.colour="red",
        outlier.fill="red",
        outlier.size=3
  )+
    labs(y = "Listings ", 
         x = "Zipcode",
         title = "Post removal",
         subtitle = "of zips with <10 listings"
         )
  
 
grid.arrange(B11,B22, ncol = 2, nrow = 1)
## notch went outside hinges. Try setting notch=FALSE.

Observation

25 percentile of zipcodes have 8 or less number of 2 bedroom properties listed in airbnb. Revenue analysis can get skewed when a zipcode has very few listings. For now, zipcodes with less than 10 two-bedroom airbnb listings are removed from the analysis

Correction

Based on above analysis corrections are made below

RentCostMerged<-subset(RentCostMerged,id!='22216259')

ReleventZips_num <-
  RentCostMerged %>% 
  group_by(zipcode) %>% 
  summarise(ListingCount=n())%>%
  filter(ListingCount>=10)%>%
  arrange(desc(ListingCount))

RentCostMerged_final<-merge(RentCostMerged,ReleventZips_num,by.x="zipcode",by.y="zipcode")

#Updating the Review columns to 0 where there were "no review per month" & rating was never given 
RentCostMerged_final$reviews_per_month<-ifelse(is.na(RentCostMerged_final$reviews_per_month)==TRUE,0,RentCostMerged_final$reviews_per_month)
RentCostMerged_final$review_scores_rating<-ifelse(is.na(RentCostMerged_final$review_scores_rating)==TRUE,as.numeric(0),RentCostMerged_final$review_scores_rating)

nrowfnl<-nrow(RentCostMerged_final)
ncolfnl<-ncol(RentCostMerged_final)
nrowfnl#Number of rows post correction
## [1] 1541
ncolfnl#Number of columns post correction
## [1] 27

We finally have 1541 properties to analyze based on 27 attributes

4.2 Data Analysis

Analysis based on visualization

EDA through Visualizations is helpful in understanding the hidden patterns in data that the human eye may miss to notice in tabular format.

Here, we used some statistical Visualization techniques to translate the elusive patterns and correlations in data.

What Zipcodes are better to invest ?

The analysis to understand what will be better than the other is based on the question “what is desired out of the investment”. For instance, If an investor wants a shorter break-even time in investment then he has to select based on that dimension.

Here, the analysis is based on 4 such dimensions, namely:

  • Customer Preference
  • Market Demand
  • Customer Feedback/Reviews
  • Break-Even Time

A deep dive Analysis on these individual dimensions in performed below..

4.2.1 Customer Preferance

There can be a plethora of reasons behind a prominent customer propensity for a locality such as Environment and social influence, Commercial implication, Proximity, etc. Hence, it goes without saying that a property that is located in an area that is more customer-preferred will flourish more and hence will be in demand and earn better returns on Investment.

Revisiting the same corrected graph

ReleventZips_num <-
  RentCostMerged_final %>% 
  group_by(zipcode,neighbourhood_group_cleansed) %>% 
  summarise(ListingCount=n())%>%
  arrange(desc(ListingCount))

p2<-ggplot(RentCostMerged_final, aes(zipcode, fill = neighbourhood_group_cleansed)) + 
  geom_bar(alpha=0.5) + 
  geom_text(stat='count', aes(label=..count..), vjust= -0.3) + 
  geom_hline(aes(yintercept = median(ReleventZips_num$ListingCount)),color="grey31",linetype="dashed", size=1)+
  guides(fill = guide_legend(title = "Neighbourhood"))+
  theme(
    plot.background = element_blank(),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    panel.border = element_blank(),
    axis.text.x = element_text(angle = 90, hjust = 1),
    plot.subtitle = element_text(hjust = 0.5,size = 12),# Center subtitle
    plot.caption = element_text(hjust = 0, face = "italic"),
    legend.position="bottom"
    ) +
    labs(x = "Zipcode", 
         y = "Number of Properties",
         title = "Segrated by zipcode",
         caption = "Data source: Airbnb & Zillow")  
p2  

All above the median reference line are better Zipcodes for investment

Observation

It can be observed that most Airbnb rental properties are in the areas of Manhattan followed by Brooklyn and then Queens & Staten Island.

Hence, Making an Investment in a location that is mostly “preferred by customer” seems reasonable. There are high chances that investment will flourish as it looks like the best market for short-term rental investment considering the highest Number of Listings in the city!

Recommended Zipcodes based on Customer Preferance:

#Zipcode recommended:
aboverMedian_byPreferance<-subset(ReleventZips_num,ListingCount>=median(ReleventZips_num$ListingCount))
aboverMedian_byPreferance
## # A tibble: 9 x 3
## # Groups:   zipcode [9]
##   zipcode neighbourhood_group_cleansed ListingCount
##   <fct>   <fct>                               <int>
## 1 11215   Brooklyn                              189
## 2 10036   Manhattan                             147
## 3 10003   Manhattan                             136
## 4 10025   Manhattan                             124
## 5 11217   Brooklyn                              124
## 6 10011   Manhattan                             106
## 7 10013   Manhattan                             104
## 8 11231   Brooklyn                               93
## 9 10014   Manhattan                              90

4.2.2 Demand

Assumption: Avalilablity is inversely propotional to Demand

Demand can be an important parameter to evaluate before making any investment. High demand can be because of many factors such as location, sight & view, proximity, etc. Investing in such an area would result in heavy-booking of property which will improve the occupancy rate and in turn Revenue.

Here, to analyze the demand of properties in various zip codes availability is parameter is analyzed. This is based on the fact that a good apartment that is in high demand will have low Availablity

Correlation analysis of various availablity columns

C<-cor(RentCostMerged_final[,c(16,17,18,19,25,26)])
corrplot(C,  method = "square",type="lower")

Observation

  • We observe a strong correlation between availability parameters. This proves that any property that is not available(in high demand) in the next 30 days will possibly not be available in the next 365 days as well.
  • We also observe a weak correlation between availability and rent. It can be inferred that high availability indicates low demand and in turn low rent.
  • Rent and price are highly correlated.

For further analysis we take availablity_365 as an indicator of availablity

Below Visualization indicates the Zipcodes vs Demand

ReleventZips_bydmnd<-
  RentCostMerged_final %>% 
  group_by(neighbourhood_group_cleansed,zipcode) %>% 
  summarise(demand=100-(100*(median(availability_365)))/365) %>%
  arrange(desc(demand))

  p3<-ggplot(ReleventZips_bydmnd,aes(x =zipcode,y= demand, fill = neighbourhood_group_cleansed )) + 
  geom_bar(position = 'dodge',stat = "identity",alpha=0.6) + 
  geom_text(aes(label=round(demand,2)),position=position_dodge(width=0.9), vjust=-0.25,size=3)+
  geom_hline(aes(yintercept = median(ReleventZips_bydmnd$demand)),color="grey31",linetype="dashed", size=1)+
  scale_y_continuous(labels = scales::comma) +  
  scale_colour_brewer(palette = "Pastel2") + 
 theme(
    plot.background = element_blank(),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    panel.border = element_blank(),
    axis.text.x = element_text(angle = 90, hjust = 1),
    legend.position="bottom",
    plot.caption = element_text(hjust = 0, face = "italic")
    ) + 
  guides(fill = guide_legend(title = "Neighbourhood")) +
  
  labs(y = "Percentage Demand '%' ", 
       x = "Zipcode",
       title = "Percentage Demand of Zipcodes",
       subtitle = "Median values of each zipcode in used for calculation",
       caption = "Data source: Airbnb & Zillow")
 
p3

All above the median reference line are better Zipcodes for investment

Observation

Close to half of zipodes of manhattan and all zipcodes from Brooklyn are in high demand, Brooklyn seems to be performing better here.

Recommended Zipcodes based on Occupancy/Demand

#Zipcode recommended:
aboverMedian_byDemand<-subset(ReleventZips_bydmnd,demand>=median(ReleventZips_bydmnd$demand))
aboverMedian_byDemand
## # A tibble: 9 x 3
## # Groups:   neighbourhood_group_cleansed [2]
##   neighbourhood_group_cleansed zipcode demand
##   <fct>                        <fct>    <dbl>
## 1 Brooklyn                     11217     97.4
## 2 Manhattan                    10025     97.4
## 3 Manhattan                    10014     95.9
## 4 Brooklyn                     11231     94.0
## 5 Brooklyn                     11201     93.4
## 6 Manhattan                    10011     93.2
## 7 Manhattan                    10021     91.9
## 8 Manhattan                    10003     91.0
## 9 Brooklyn                     11215     90.1

4.2.3 Customer Feedback or Reviews

Reviews not only have the power to influence consumer decisions but can strengthen a company’s credibility. Reviews have the power to gain customer trust, and they encourage people to interact with the company. Customer interaction ultimately leads to improved profits for businesses. Good reviews compel more customers to select a property and bad reviews deter them. Hence, this can be an important dimension to analyze.

Correlation analysis of various availablity columns

R<-cor(RentCostMerged_final[,c(19,20,23,24,25,26)])
corrplot(R,  method = "square",type="lower")

Observation

  • It is surprising to notice that number_of reviews,review_scores_rating & reviews_per_month have a weak association with Rent. This indicates good reviews of a locality do indicate higher rental income. But, it goes without saying that reviews do impact a huge customer populace. We can also observe the positive correlation between number_of_reviews & availablity_365. This indicates that a zipcode with better Reviews has better odds to be demand than the one that does not…

Below scatter plot show the better performing Zipcode in Review dimension

#relevent zips
ReleventZips_Rvu<-RentCostMerged_final %>% 
group_by(neighbourhood_group_cleansed,zipcode) %>% 
 summarise(RvperMonth_avg=mean(number_of_reviews),RvScoreRating_med=(median(review_scores_rating))/10)%>%   arrange(desc(RvScoreRating_med),desc(RvperMonth_avg))

m<-round(median(ReleventZips_Rvu$RvperMonth_avg),2)
r<-median(ReleventZips_Rvu$RvScoreRating_med)

numDayperReview <- 
  RentCostMerged_final %>% 
  group_by(neighbourhood_group_cleansed,zipcode) %>% 
  summarise(RvperMonth_avg=mean(number_of_reviews),RvScoreRating_med=(median(review_scores_rating))/10)%>% 
  ggplot(aes(x = RvperMonth_avg, y= RvScoreRating_med,colour = zipcode)) + 
  geom_point() + 
  geom_vline(xintercept = m,linetype="dashed", color = "gray")+
  geom_hline(yintercept = r,linetype="dashed", color = "gray")+
  #geom_point(data=ReleventZips_Rvu_sel, aes(x=RvperMonth_avg,y=RvScoreRating_med))+ #for highlighting
    theme(
    plot.background = element_blank(),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    panel.border = element_blank(),
    axis.text.x = element_text(angle = 90, hjust = 1),
    plot.title = element_text(hjust = 0.5, size = 14),# Center title position and size
    plot.subtitle = element_text(hjust = 0.5,size = 12),# Center subtitle
    plot.caption = element_text(hjust = 0, face = "italic")
       ) + 
labs(x="Total Number Of Reviews", 
     y="Review Score Rating(10)",
     title = "Zipcode Review Analysis",
     subtitle = "Zipcodes in first quadrant are out performing",
     caption = "Data source: Airbnb & Zillow"
    )

ggplotly(numDayperReview)

Intepretation of Plot

  • They grey vertical & horizontal axis indicates the median review per month and median Review score rating.
  • Zipcodes that have above-median Review score rating and reviews per month are considered better based on Review analysis.

Observation

  • 6 Zipcodes perform better than median reference values

Recommended Zipcodes based on Review Dimension

#Zips recommended
aboverMedian_byReview<-subset(ReleventZips_Rvu,RvScoreRating_med>=r & RvperMonth_avg>=m)

aboverMedian_byReview
## # A tibble: 6 x 4
## # Groups:   neighbourhood_group_cleansed [3]
##   neighbourhood_group_cleansed zipcode RvperMonth_avg RvScoreRating_med
##   <fct>                        <fct>            <dbl>             <dbl>
## 1 Brooklyn                     11215             30.1               9.7
## 2 Brooklyn                     11201             19.4               9.7
## 3 Queens                       11434             37.1               9.6
## 4 Brooklyn                     11231             26.4               9.6
## 5 Staten Island                10305             25.7               9.6
## 6 Brooklyn                     11217             22.5               9.6

4.3.4 Break even Analysis

Assumption

  • Rent increases by 10% every year
  • Occupancy rate for all property is calculated as 75%

An utmost important dimension in any investment is it’s break-even time. The breakeven point in this analysis is referred to the period property would take to earn its investment amount, the period when rent earned over years = investment amount

A property that can reach this point earlier starts to earn profit earlier. Hence, selecting the zipcodes that have smaller breakeven time would be a sensible investment.

#Code to calculate break even time based on current price and rent
brkEvenCalculate <- function(RentCostMerged_final){ 

breakeven_time <- c()
for(i in 1:nrow(RentCostMerged_final))
{
  #initialize to 0 default values
  TmeYr <- 0 
  totRevenue <- 0 
  aprx_Prcntg_Inc_Rnt_Prc_PrYr<-.10 #Assumption
  occpncyRate<-.75 #Asumption
  rentPrice <- RentCostMerged_final$totalRent[i] # rent price of property
  costPrice  <-  RentCostMerged_final$currentPrice[i]# cost price of property
  
  # while cost of buying the house is greater than the total revenue generated through rent
  while(costPrice>totRevenue)
  {
    # increase year by 1
    TmeYr <- TmeYr + 1 
    
    # increase rent price for next year by a fixed percentage
    rentPrice <- rentPrice * (1+aprx_Prcntg_Inc_Rnt_Prc_PrYr) 
    
    # calculating revenue till the current year
    totRevenue <- totRevenue + rentPrice*365*occpncyRate
  }
  
  # decrease the year by 1 as break-even point has reached before the end of this year
  breakeven_year <- TmeYr - 1 
  # total revenue needed this year to break-even
  rvnue_extra <- totRevenue - costPrice
  
  # total time needed to break-even in this year
  breakeven_year_extra_time <- rvnue_extra/(rentPrice*365*occpncyRate)
  
  # total breakeven time is summation of previous years + time taken in the current year
  breakeven <- breakeven_year + breakeven_year_extra_time
  
  # storing breakeven time for each property
  RentCostMerged_final$breakevenTime[i] <- round(breakeven,2)
}
return(RentCostMerged_final)
}

RentCostMerged_final<-brkEvenCalculate(RentCostMerged_final)

Cost Rent Density Analysis

A density anlaysis of cost price and Rental income can give a clear picture of better prospects in investment.

pp_c<-ggplot(RentCostMerged_final, 
           aes(x=eval(parse(text='currentPrice')), 
               fill=eval(parse(text='neighbourhood_group_cleansed')))) +
  geom_density(alpha=0.4) +
  geom_vline(aes(xintercept=median(eval(parse(text='currentPrice'))))
             , color="black", linetype="dashed", size=.5) +
  guides(fill=guide_legend(title='Neighbourhood')) + 
  scale_x_continuous(limits = c(0, 5000000))+
  ggtitle("Cost vs Rent analysis")+
  theme(
    plot.background = element_blank(),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    panel.border = element_blank(),
    axis.text.x = element_text(angle = 90, hjust = 1),
    plot.caption = element_text(hjust = 0, face = "italic")
    
    ) + 
  labs(x="Cost of Properties", 
       y="Density",
       subtitle = "Cost Density plot",
       title = "House Price vs Property Rent Analysis"
       )

pp_r<-ggplot(RentCostMerged_final, 
           aes(x=eval(parse(text='totalRent')), 
               fill=eval(parse(text='neighbourhood_group_cleansed')))) +
  geom_density(alpha=0.4) +
  geom_vline(aes(xintercept=median(eval(parse(text='totalRent'))))
             , color="black", linetype="dashed", size=.5) +
  scale_x_continuous(limits = c(0, 1000)) +
  guides(fill=guide_legend(title='Neighbourhood'))+
  theme(
    plot.background = element_blank(),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    panel.border = element_blank(),
    axis.text.x = element_text(angle = 90, hjust = 1),
    plot.caption = element_text(hjust = 0, face = "italic"),
    
  ) + 
  labs(x="Rent/Night of Properties", 
       y="Density",
       title = "",
       subtitle = "Rent Density Plot"
       #caption = "Data source:Zillow"
  )

grid.arrange(pp_c, pp_r)

Observation

  • The cost price of properties is highest in the neighbourhood Manhattan followed by Brooklyn . Also, Manhattan has the highest spread and surprising cost price of Brooklyn is very short spread. Staten Island & Queens have an overlapping cost price

  • The rentals/per Night of properties is closely overlapping. Manhattan has the longest spread of rental amount, whereas Brooklyn is concentrated in a shorter range. Staten Island and Queens have overlapping cost price.

  • Manhattan: The cost of properties at Manhattan is highest as compared to other neighbourhoods, we further notice that half of the properties at Manhattan have rentals higher than median overall rent. This indicates that breakeven time will be considerably higher.

  • Brooklyn: The cost of properties is higher but less than that of Manhattan, whereas the rent is quite close to that of Manhattan, a few properties of Manahattan have higher rental value than of Brooklyn. Most properties have lower than median overall rent.

  • Staten Island and Queens have overlapping costs and rent(Range is similar). The rent and cost both are below median values.

This analysis indicates that Brooklyn can be a better choice of investment, We can verify this through below boxplot analysis of break-even time for zipcodes

Break Even Time analysis

  ggplot(RentCostMerged_final, aes(x = as.factor(zipcode), y = breakevenTime, fill = neighbourhood_group_cleansed)) + 
  geom_boxplot(alpha=0.5,
        # custom outliers
        outlier.colour="red",
        outlier.fill="red",
        outlier.size=1
        ) +
  labs(x = "Zipcode", y = "Breakeven time") + 
  scale_y_continuous(breaks=seq(0, 40, 8)) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  guides(fill=guide_legend(title='Neighbourhood'))+
  theme(
    plot.background = element_blank(),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    panel.border = element_blank(),
    axis.text.x = element_text(angle = 90, hjust = 1),
    plot.caption = element_text(hjust = 0, face = "italic"),
    legend.position="bottom",
  ) + 
  labs(x="Zipcode", 
       y='Breakeven Time (years)',
       title = "Break even Time Analysis",
       subtitle = "",
       caption = "Data source:Airbnb & Zillow",
       fill='Neighbourhood'
  )

Observation Looking at the boxplots segregated by neighbourhood, zipcode we can infer that :

  • Manhattan takes a lead in all neighbourhood with a considerably higher breakeven period.
  • Brooklyn has a shorter break-even period than Manhattan but longer than Staten Island and Queens.

Based on the breakeven analysis the zipcodes that have shorter than median can be considered as good investment locations.

Recommended Zipcodes based on breakeven analysis:

#all relevent zips as per break even analysis
ReleventZips_BEA<-
  RentCostMerged_final%>%
  group_by(neighbourhood_group_cleansed,zipcode) %>% 
  summarise(breakEven_med=median(breakevenTime)) %>% 
  arrange(breakEven_med)

#zips above median
aboveMedian_byBEA<-subset(ReleventZips_BEA,breakEven_med<=median(ReleventZips_BEA$breakEven_med))
aboveMedian_byBEA
## # A tibble: 9 x 3
## # Groups:   neighbourhood_group_cleansed [4]
##   neighbourhood_group_cleansed zipcode breakEven_med
##   <fct>                        <fct>           <dbl>
## 1 Queens                       11434            7.32
## 2 Staten Island                10305            8.92
## 3 Manhattan                    10022            9.3 
## 4 Manhattan                    10036            9.43
## 5 Brooklyn                     11231           10.2 
## 6 Brooklyn                     11215           10.4 
## 7 Manhattan                    10025           10.7 
## 8 Brooklyn                     11201           10.7 
## 9 Brooklyn                     11217           11.6

5.0 Conclusion

Below table indicates Recommended zipcodes based on 4 important dimensions.

Median value is considered as reference in each category

Interpretation of above table:

  • The zip code that are present in all 4 columns are the ones that are better performing zips in all 4 dimensions.
  • Ranking as per individual dimension analysis is maintained.For instance, 11215 performed in the preference category as compared to 11217.

Conclusion

  • There are 3 zipcode that are present in all 4 dimensions, namely: 11215, 11217,11231 ,all these are from Brooklyn. These 3 are recommended Zipcodes for Investment
  • There is 1 zipcode that is present in 3 dimensions(Demand, Review, Breakeven), namely:11201, from Brooklyn.
  • Many zips are present in 2 dimensions as it can be noticed from the table.

Future Steps

  • Speaking to the investor to understand the type of investment and dimension of interest and help choose the zip codes further.
  • With stated Assumptions and Data, the analysis is made on some assumed Dimensions of interest.
  • Deeper understanding of each Attribute and significance will facilitate in making better decision models.
  • Text and Sentimental analytics on Review column can help understand if the review is positive or Negative. This can bring deeper insight into criteria that attract the customer population.
  • Introducing Weather data, Holiday Season & Special Day Celebrations Data can improve the scalability of the analysis.