Executive Summary

Problem Statement

As a part of this assignment, a data product was built to assist a real estate company to understand which zip codes are profitable for short term rentals within New York City. For this purpose, publicly available data from Zillow and AirBnB were used.

  1. Cost Data - Estimate of value for two-bedroom properties provided by Zillow.
  2. Revenue Data - AirBnB Data Set with relevant short term rental information.

Assumptions Made

  1. The investor will pay for the property in cash (i.e. no mortgage/interest rate will need to be accounted for).
  2. The time value of money discount rate is 0% (i.e. $1 today is worth the same 100 years from now).
  3. All properties and all square feet within each locale can be assumed to be homogeneous (i.e. a 1000 square foot property in a locale such as Bronx or Manhattan generates twice the revenue and costs twice as much as any other 500 square foot property within that same locale).
  4. Weather/Seasonality has little or no impact on number of bookings.
  5. If the property type == Private Room, it is multipled by number of bedrooms to account for overall price. Correction applied is returned to original price column.
  6. Number of reviews are identitive of popularity among consumers and the reviews are positive.
  7. Availablity for next 365 days is a status quo as far as occupancy in concerned. Lesser than availablity , higher the occupancy.

Data Cleaning and EDA

  1. Revenue and Cost Data have large number of columns - 95 and 262 resppectively. They were filtered out based on predefined assumptions, data quality requirements and keeping the end goal in mind.

  2. Revenue and Cost data were joined based on zipcodes and final data set contains 22 zipcodes to choose from, expanding over 4 major neighbourhoods - Manhattan, Queens, Brooklyn and Staten Island.

  3. Extensive exploratory data analysis was performed to evaluate problem on 5 key metrics : Cost to buy property, Property daily Rental Price, Number of Choices available, Popularity and occupancy.

Key Findings

  1. Majority of zipcodes in Manhattan and Brooklyn have higher property cost, some of them don’t. Since these zipcodes are also popular in terms of tourism, number of available choices and prebookings(higher occupancy) - they make some excellent choice for investment.

  2. Zipcodes in Manhattan and Brooklyn have also seen major property price increase in the last 5-10 years. Staten Island and Queens have been stagnant for over a decade. If the trends were to continue, Manhattan and Brooklyn also make an excellent choice as the re-sale value can be expected to be higher and will protect the investement if things went south in the years to come.

  3. Staten Island and Queens Neighbourhood have limited choices but they cost lower, are highly popular and return higher rental prices. These zipcodes were explicitly identified.

  4. Price and Availablity are not inveresely proportional. Consumers are not hestitant to pay higher price ahead of time. The company can capitalize this behavior into their pricing scheme.

Results

  1. Top 3 Zipcodes based on the key metrics are: 11231,11201 and 11217. These zipcodes cost lesser, derive higher daily rental value, provide higher number of choices, are popular among current consumers in terms of number of reviews and occupancy.

Initial Set Up : Config File

Config file and Packages are heart and soul of this program. They have to be set up first to ensure rest of program runs.

How is it achieved?

  1. Get a list of required packages/libraries from config file.
  2. Check for existing packages, otherwise install package/libraries listed.
  3. After installation, Load Package/libraries for use.

Config File

Configuration File

Configuration file is used to configure parameters and initial settings for majority of computer applications/programs. Instruction to use config file can be found in readme file which is provided along with this document. As a part of this exercise: directory path, libraries and global objects such as City Name, Start Year and End year are separately intialized in the config file.

# Config 
source("config.R")

Package Loading

Required Packages

  • Tidyverse (dplyr, ggplot2..) - Data Read, Manipulation and visualisation
  • Data Explorer - EDA & Generate Reports
  • Caret - Pre Processing, Feature Selection
  • Plotly - Interactive Visualization
  • KableExtra - Styling for Kable (Interactive Data Tables within Markdown)
  • MatrixStats - For functions operating on rows and columns of matrices.
# Package Loading 
## Check for package versions and update them accordingly
newPackages <- packages[!(packages %in% installed.packages()[,"Package"])]
if(length(newPackages)) install.packages(newPackages, dependencies = T)

inst = lapply(packages, require,character.only = TRUE)
cat("library loaded succesfully!")

Data Loading

Revenue(Airbnb) and Cost(Zillow) Datasets are loaded into R-Enviroment for Exploratory data analysis. If new markets or cities have to be explored at a later stage, new files must be loaded into the same directory and the code will automatically scale to account for it.

How is it Achieved ?

  1. Check for path name defined in Config file to locate for Revenue (Airbnb) and cost(Zillow) datasets.
  2. Look for “CSV” pattern match at the end of each file name.
  3. Load dataset using Rbind - Only datasets with common column names are merged rowwise.

Revenue

Dimension of Revenue Data (Number of Rows, Number of Columns):

revenue <-
  list.files(path = revenue,
             pattern = "*.csv",
             full.names = T) %>%
  map(read_csv) %>%
  reduce(rbind)

dim(revenue)
## [1] 40753    95

Cost

Dimension of Cost Data (Number of Rows, Number of Columns):

cost <- 
  list.files(path = cost,
           pattern = "*.csv",
           full.names = T) %>%
  map(read_csv) %>%
  reduce(rbind)

print(dim(cost))
## [1] 8946  262

Data Preparation

Cost and Revenue datasets are handled separately in an attempt to enrich the data quality for exploratory data analysis.

How is it Achieved ?

  1. Account for common data quality issues: Missing Values, Duplicated Rows etc and make relevant changes.
  2. Filter Zero Variance/imbalanced columns, high missing value columns, columns not associated to the problem statement etc.
  3. Analyze any redudant columns and aggregate based on reasoning/assumptions.
  4. Produce clean cost and revenue data for joining and further exploration.

Cost Data

Majority of columns in Cost Dataset account for Median Price for 2-bedroom homes. Data is captured between year 1996 and 2017 and spread monthly.

kable(head(cost))  %>% kable_styling(bootstrap_options = c("striped", "hover", "responsive")) %>% scroll_box(width = "100%", height = "250px")
RegionID RegionName City State Metro CountyName SizeRank 1996-04 1996-05 1996-06 1996-07 1996-08 1996-09 1996-10 1996-11 1996-12 1997-01 1997-02 1997-03 1997-04 1997-05 1997-06 1997-07 1997-08 1997-09 1997-10 1997-11 1997-12 1998-01 1998-02 1998-03 1998-04 1998-05 1998-06 1998-07 1998-08 1998-09 1998-10 1998-11 1998-12 1999-01 1999-02 1999-03 1999-04 1999-05 1999-06 1999-07 1999-08 1999-09 1999-10 1999-11 1999-12 2000-01 2000-02 2000-03 2000-04 2000-05 2000-06 2000-07 2000-08 2000-09 2000-10 2000-11 2000-12 2001-01 2001-02 2001-03 2001-04 2001-05 2001-06 2001-07 2001-08 2001-09 2001-10 2001-11 2001-12 2002-01 2002-02 2002-03 2002-04 2002-05 2002-06 2002-07 2002-08 2002-09 2002-10 2002-11 2002-12 2003-01 2003-02 2003-03 2003-04 2003-05 2003-06 2003-07 2003-08 2003-09 2003-10 2003-11 2003-12 2004-01 2004-02 2004-03 2004-04 2004-05 2004-06 2004-07 2004-08 2004-09 2004-10 2004-11 2004-12 2005-01 2005-02 2005-03 2005-04 2005-05 2005-06 2005-07 2005-08 2005-09 2005-10 2005-11 2005-12 2006-01 2006-02 2006-03 2006-04 2006-05 2006-06 2006-07 2006-08 2006-09 2006-10 2006-11 2006-12 2007-01 2007-02 2007-03 2007-04 2007-05 2007-06 2007-07 2007-08 2007-09 2007-10 2007-11 2007-12 2008-01 2008-02 2008-03 2008-04 2008-05 2008-06 2008-07 2008-08 2008-09 2008-10 2008-11 2008-12 2009-01 2009-02 2009-03 2009-04 2009-05 2009-06 2009-07 2009-08 2009-09 2009-10 2009-11 2009-12 2010-01 2010-02 2010-03 2010-04 2010-05 2010-06 2010-07 2010-08 2010-09 2010-10 2010-11 2010-12 2011-01 2011-02 2011-03 2011-04 2011-05 2011-06 2011-07 2011-08 2011-09 2011-10 2011-11 2011-12 2012-01 2012-02 2012-03 2012-04 2012-05 2012-06 2012-07 2012-08 2012-09 2012-10 2012-11 2012-12 2013-01 2013-02 2013-03 2013-04 2013-05 2013-06 2013-07 2013-08 2013-09 2013-10 2013-11 2013-12 2014-01 2014-02 2014-03 2014-04 2014-05 2014-06 2014-07 2014-08 2014-09 2014-10 2014-11 2014-12 2015-01 2015-02 2015-03 2015-04 2015-05 2015-06 2015-07 2015-08 2015-09 2015-10 2015-11 2015-12 2016-01 2016-02 2016-03 2016-04 2016-05 2016-06 2016-07 2016-08 2016-09 2016-10 2016-11 2016-12 2017-01 2017-02 2017-03 2017-04 2017-05 2017-06
61639 10025 New York NY New York New York 1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 798600 798800 801500 804600 814900 828300 835700 849300 858100 854000 834800 821700 830300 853700 868300 875200 882200 892400 905000 924000 934400 932100 927500 923600 907900 890900 883400 896100 923900 952900 964600 972500 973800 973400 966500 966800 967100 974800 976800 976100 973700 974500 973200 966400 950400 933300 920900 909400 891400 873300 858800 850200 842800 834000 828800 821400 813900 813300 821500 831700 845100 854500 858900 859200 863500 876000 886100 890000 894200 901800 909500 913300 907400 900000 897700 896300 892300 890400 888600 891700 899500 904400 908200 914000 915100 912300 914000 921100 923300 917300 915000 922800 929100 937700 955700 974200 995500 1019500 1035100 1054900 1079900 1092600 1103500 1118800 1139300 1154600 1144100 1120300 1125500 1136000 1135100 1130000 1138200 1153700 1174800 1185400 1188400 1189700 1193700 1199900 1201400 1202600 1214200 1235200 1258000 1287700 1307200 1313900 1317100 1327400 1338800 1350400 1356600 1358500 1364000 1373300 1382600 1374400 1364100 1366300 1354800 1327500 1317300 1333700 1352100 1390000 1431000
84654 60657 Chicago IL Chicago Cook 2 167700 166400 166700 167200 166900 166900 168000 170100 171700 173000 174600 177600 180100 182300 184400 186300 187600 189400 190300 189700 189800 191900 194500 195500 196000 196900 198900 201400 204600 207900 211800 214600 216000 217500 220200 222800 226200 229600 232400 234400 236300 238300 241800 246100 249500 251300 253200 255700 259200 263100 266600 269500 272800 275500 278800 283400 288600 291300 292400 294600 297100 298200 299800 302000 304200 307900 311000 311400 311000 311700 312300 312000 311800 312600 313000 314400 317300 319700 320500 321000 321600 323800 326100 329000 332200 334700 336000 337300 337500 337100 334900 333100 332800 333400 335100 337800 339400 340700 343200 345000 345200 344600 344500 346200 349800 353400 355100 356300 358300 359500 359200 358500 359100 361400 364700 367500 369400 369800 369600 368700 368100 369000 370300 371700 374200 375700 376400 378200 379800 380100 380400 381200 382700 382700 380200 377800 376300 375600 376000 376200 375800 376300 377200 376800 373700 370100 368700 368600 366600 362200 358600 355300 352300 350900 350100 347900 345400 343400 342800 342600 341100 339900 338900 338200 335200 329800 325500 323600 323400 325000 325800 323200 320100 318600 317400 315700 315000 315300 315600 313900 309800 305700 301800 299500 299900 301100 300300 298900 298500 298500 297000 296800 298700 299600 300700 303900 306800 307500 308500 310000 310800 311200 313000 315800 319000 323400 327500 330000 331800 334500 336000 335700 335400 336300 338800 342400 344400 344000 343900 345100 346100 346900 348000 349700 351200 351700 350700 350400 352000 354300 355900 356500 355200 353800 353700 354600 356200 357800 358200 358500 360300 362400 363700 365200 367100 368600 370200 372300 375300 378700 381400 381800 382100 383300 385100
61637 10023 New York NY New York New York 3 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 1526800 1424500 1346600 1331300 1322500 1289300 1265400 1249700 1241100 1232700 1225500 1228200 1252600 1266100 1288700 1308100 1333000 1356400 1362000 1353600 1364000 1373900 1389600 1401600 1404100 1415800 1432400 1455400 1474200 1462300 1438300 1435500 1427800 1411200 1407400 1419700 1457400 1500800 1524900 1537800 1558700 1586100 1602300 1621100 1639300 1657400 1657400 1656100 1649400 1643400 1632400 1618200 1588300 1543600 1500800 1464200 1426100 1387300 1362600 1351700 1344300 1331800 1334800 1314200 1271900 1252300 1262300 1279200 1309000 1335300 1353800 1366400 1372100 1381300 1385000 1388100 1399100 1399800 1389300 1384700 1380900 1367900 1365400 1375100 1380400 1377000 1375100 1379000 1395200 1414500 1419000 1403100 1383200 1376700 1378200 1378700 1375900 1366700 1365500 1382200 1404700 1428000 1445700 1452900 1460100 1484400 1508400 1522800 1538300 1568600 1597400 1622900 1654300 1684600 1713000 1728800 1736100 1745900 1753800 1736600 1730400 1734500 1728700 1720800 1717700 1700100 1680400 1676400 1685600 1708100 1730400 1751800 1778300 1810400 1831600 1844400 1861600 1889600 1901500 1895300 1890200 1898400 1924500 1967300 1993500 1980700 1960900 1951300 1937800 1929800 1955000 2022400 2095000 2142300
84616 60614 Chicago IL Chicago Cook 4 195800 193500 192600 192300 192600 193600 195500 197600 199400 201300 203600 206500 209200 211100 212600 214400 215600 216500 217900 220100 222200 223900 225400 227700 230100 231700 232700 233700 234700 235600 236800 238800 240800 242400 243800 246400 250200 254300 257600 261100 264800 267900 270700 272800 274400 276200 278600 280100 283100 287700 293600 298500 302700 305000 306800 309400 313100 314900 316200 318200 320600 322900 325500 328400 330700 332800 334400 335900 337400 339700 342300 343800 343400 342300 341800 341700 342400 344300 346900 348900 350200 351700 353500 355700 358000 361600 364000 365500 366400 367000 365200 363100 362000 362500 364800 368200 370800 371400 371200 371800 374200 377800 380100 381700 384400 386600 385700 385400 388500 392200 394500 394900 394800 395500 400000 404300 407600 409500 410400 408700 406400 403800 402400 402300 403000 403100 403600 404500 406100 407700 408700 409600 409800 408700 408600 411000 412200 411400 410200 408400 406600 406500 406400 404600 402900 400600 397500 392600 387500 383700 381000 378900 377700 377400 376700 374000 369600 366200 364500 364100 364300 363600 361900 361900 356400 347100 342400 344000 345200 346900 346100 342600 340100 339900 338600 335500 333900 335000 336000 334200 330300 327000 326000 326100 326700 326300 324400 322700 323200 322800 320700 319500 320100 320500 321800 323600 324300 324100 324700 326000 327600 329800 332600 336800 342300 348100 353600 358900 361900 363900 366200 368300 369800 371400 372400 373200 373800 374800 376200 376800 376300 374900 373800 373900 374700 375300 375000 374700 376300 378100 378000 377700 378300 380000 383100 385900 388100 389700 391800 393400 394700 394900 395700 396400 397500 398900 401200 403200 405700 408300 408800 408000 410100 412200 412200
93144 79936 El Paso TX El Paso El Paso 5 59100 60500 60900 60800 60300 60400 61200 61700 61000 60100 59300 59000 58700 58400 58000 57800 57900 57800 57800 58100 58400 58700 59200 59400 58700 58100 57900 57900 57700 57600 57500 57800 58000 58000 57900 57800 57800 58000 58500 58700 59000 59200 59300 59500 59900 60300 60400 60300 60300 60000 59500 59400 59800 59900 59700 59500 59400 59500 59700 59700 59200 58700 58400 57800 57000 56700 56800 56600 56500 56500 56600 56700 56600 56700 57000 57300 57300 57300 57100 56900 56900 57000 56700 56700 57000 57400 57700 58000 58300 58600 58800 59100 59500 60200 61100 61700 62400 63100 63600 63800 64400 64900 65100 65200 65300 65600 66300 67100 67900 68800 69600 70600 71500 72100 72600 73400 74200 74500 75000 75700 76400 77100 77700 78100 78700 79600 80600 81500 82300 83200 83900 84400 84300 84400 85200 86100 86800 87400 87700 87800 88300 88800 88500 87900 87500 86600 85500 84600 84200 83900 83600 83400 83500 84100 84600 84600 84700 84600 84600 85100 85500 85800 86600 87600 86500 84200 83300 83800 83800 83600 83600 83400 82900 82400 82200 82100 82300 82800 82900 82600 82700 83100 83300 83100 82800 82500 82300 82200 82300 82200 81900 81700 82100 82600 82900 83000 83000 82900 82100 81200 80800 80700 81200 81800 81800 81400 81400 81500 81900 82000 81900 81900 82100 82100 81500 80800 80300 80100 80100 80700 81200 81700 81900 81700 81500 81700 81700 80900 81000 81500 81400 80500 80000 80100 80500 80800 81400 82300 82600 82600 82500 82500 82600 82700 82600 82400 82300 82400 82300 82500 83200 83900 84100 83900 83700
84640 60640 Chicago IL Chicago Cook 6 123300 122600 122000 121500 120900 120600 120900 121300 121600 122100 122900 124200 125300 126100 126700 127900 129300 130400 131300 131700 132300 133500 134500 134800 135200 135500 136300 137700 139600 141600 143400 144500 145600 147400 149200 149900 150500 151700 152800 153900 156400 159400 161800 163800 165700 167100 168400 169600 171000 172400 174800 177900 180400 182300 184600 187700 190700 193100 196100 200000 202900 205500 207600 208600 209200 210000 210200 211300 213000 214100 215200 217600 220200 222400 224600 227000 228600 230100 232400 234300 235300 236200 237000 237900 239200 241300 243600 244200 243800 244500 245500 245700 246400 247500 248000 249000 251100 252900 253700 256100 259400 261300 262200 263800 265300 267600 270500 272800 273700 273600 273200 273300 273500 273800 274300 274400 274400 275300 276600 278200 280600 283100 284400 284200 283000 282000 282400 283700 285200 286700 286700 284900 283300 282700 282100 280800 280300 280200 280100 279900 281200 282300 282100 279300 275800 273100 272600 271800 270300 268600 266900 264800 262900 260700 258500 256600 255000 252900 250600 248900 248900 250100 250300 250400 248000 243800 240200 239700 239000 238200 236800 235400 233300 231300 230300 228200 225200 223100 222300 220700 218300 215900 213200 210900 209600 208200 205900 204000 203200 202500 200800 199400 199900 201900 204500 207000 208100 207100 205300 204700 204700 205800 208600 211800 213500 213800 215100 218400 221500 223900 226100 227900 229100 230200 230600 230400 230000 229000 227600 226100 225700 226200 226500 226500 227100 227800 229400 231800 234100 235400 235100 233900 233700 235300 237200 238500 239300 239600 239500 240200 242700 244900 247700 249500 248800 247000 247300 248700 250800 252800 253800 253800 253400 254100 255100

Data Quality Check

Missing Values

Median Price for early years (1996-2013) has plenty of Nulls as shown in the table below. This is not consistent across all Regionnames. Steps are taken in the following section to filter columns with higher percentage of Nulls/NA.

numMissingVal <-sapply(cost, function(x) sum(length(which(is.na(x)))))  
kable(as.data.frame(numMissingVal)) %>% kable_styling(bootstrap_options = c("striped", "hover", "responsive")) %>% scroll_box(width = "100%", height = "250px")
numMissingVal
RegionID 0
RegionName 0
City 0
State 0
Metro 250
CountyName 0
SizeRank 0
1996-04 2662
1996-05 2582
1996-06 2582
1996-07 2577
1996-08 2576
1996-09 2576
1996-10 2576
1996-11 2566
1996-12 2566
1997-01 2542
1997-02 2113
1997-03 2093
1997-04 2093
1997-05 2093
1997-06 2091
1997-07 2091
1997-08 1994
1997-09 1991
1997-10 1991
1997-11 1988
1997-12 1984
1998-01 1967
1998-02 1822
1998-03 1821
1998-04 1865
1998-05 1973
1998-06 1961
1998-07 1776
1998-08 1742
1998-09 1742
1998-10 1730
1998-11 1709
1998-12 1707
1999-01 1706
1999-02 1689
1999-03 1688
1999-04 1688
1999-05 1685
1999-06 1675
1999-07 1675
1999-08 1652
1999-09 1652
1999-10 1652
1999-11 1652
1999-12 1652
2000-01 1652
2000-02 1633
2000-03 1633
2000-04 1631
2000-05 1631
2000-06 1631
2000-07 1626
2000-08 1598
2000-09 1598
2000-10 1598
2000-11 1598
2000-12 1598
2001-01 1679
2001-02 1580
2001-03 1553
2001-04 1552
2001-05 1552
2001-06 1551
2001-07 1551
2001-08 1546
2001-09 1546
2001-10 1546
2001-11 1545
2001-12 1545
2002-01 1545
2002-02 1538
2002-03 1538
2002-04 1537
2002-05 1537
2002-06 1537
2002-07 1537
2002-08 1524
2002-09 1522
2002-10 1521
2002-11 1521
2002-12 1521
2003-01 1520
2003-02 1499
2003-03 1498
2003-04 1498
2003-05 1498
2003-06 1498
2003-07 1498
2003-08 1485
2003-09 1484
2003-10 1483
2003-11 1482
2003-12 1480
2004-01 1479
2004-02 1447
2004-03 1428
2004-04 1428
2004-05 1428
2004-06 1426
2004-07 1426
2004-08 1403
2004-09 1398
2004-10 1392
2004-11 1389
2004-12 1389
2005-01 1388
2005-02 1321
2005-03 1305
2005-04 1303
2005-05 1303
2005-06 1303
2005-07 1303
2005-08 1288
2005-09 1288
2005-10 1286
2005-11 1285
2005-12 1285
2006-01 1284
2006-02 1260
2006-03 1260
2006-04 1259
2006-05 1259
2006-06 1259
2006-07 1259
2006-08 1250
2006-09 1249
2006-10 1248
2006-11 1248
2006-12 1248
2007-01 1247
2007-02 1242
2007-03 1240
2007-04 1239
2007-05 1239
2007-06 1238
2007-07 1238
2007-08 1226
2007-09 1226
2007-10 1226
2007-11 1226
2007-12 1226
2008-01 1226
2008-02 1225
2008-03 1225
2008-04 1225
2008-05 1225
2008-06 1225
2008-07 1225
2008-08 1194
2008-09 1194
2008-10 1194
2008-11 1194
2008-12 1194
2009-01 1193
2009-02 1190
2009-03 1184
2009-04 1178
2009-05 1178
2009-06 1146
2009-07 1146
2009-08 1095
2009-09 1095
2009-10 1095
2009-11 1094
2009-12 1091
2010-01 1091
2010-02 1082
2010-03 971
2010-04 959
2010-05 944
2010-06 923
2010-07 903
2010-08 177
2010-09 177
2010-10 174
2010-11 174
2010-12 174
2011-01 163
2011-02 151
2011-03 149
2011-04 149
2011-05 149
2011-06 147
2011-07 134
2011-08 121
2011-09 120
2011-10 120
2011-11 120
2011-12 118
2012-01 108
2012-02 93
2012-03 91
2012-04 91
2012-05 91
2012-06 87
2012-07 84
2012-08 67
2012-09 66
2012-10 65
2012-11 65
2012-12 65
2013-01 64
2013-02 31
2013-03 26
2013-04 21
2013-05 21
2013-06 20
2013-07 20
2013-08 0
2013-09 0
2013-10 0
2013-11 0
2013-12 0
2014-01 0
2014-02 0
2014-03 0
2014-04 0
2014-05 0
2014-06 0
2014-07 0
2014-08 0
2014-09 0
2014-10 0
2014-11 0
2014-12 0
2015-01 0
2015-02 0
2015-03 0
2015-04 0
2015-05 0
2015-06 0
2015-07 0
2015-08 0
2015-09 0
2015-10 0
2015-11 0
2015-12 0
2016-01 0
2016-02 18
2016-03 18
2016-04 18
2016-05 18
2016-06 0
2016-07 0
2016-08 0
2016-09 0
2016-10 0
2016-11 0
2016-12 3
2017-01 0
2017-02 0
2017-03 0
2017-04 0
2017-05 0
2017-06 0

Negative or Zero Valued Columns

There are no Negative or Zero Values in all the non character columns(Int/Numeric) as shown in the following table.

costCharCol <- colnames(cost %>% ungroup() %>% select_if(is.character))

costZeroNeg <-
  sapply(cost[, !(names(cost) %in% costCharCol)], function(x)
  count(x <= 0, na.rm = TRUE))
kable(as.data.frame(costZeroNeg)) %>% kable_styling(bootstrap_options = c("striped", "hover", "responsive")) %>% scroll_box(width = "100%", height = "250px")
costZeroNeg
RegionID 0
SizeRank 0
1996-04 0
1996-05 0
1996-06 0
1996-07 0
1996-08 0
1996-09 0
1996-10 0
1996-11 0
1996-12 0
1997-01 0
1997-02 0
1997-03 0
1997-04 0
1997-05 0
1997-06 0
1997-07 0
1997-08 0
1997-09 0
1997-10 0
1997-11 0
1997-12 0
1998-01 0
1998-02 0
1998-03 0
1998-04 0
1998-05 0
1998-06 0
1998-07 0
1998-08 0
1998-09 0
1998-10 0
1998-11 0
1998-12 0
1999-01 0
1999-02 0
1999-03 0
1999-04 0
1999-05 0
1999-06 0
1999-07 0
1999-08 0
1999-09 0
1999-10 0
1999-11 0
1999-12 0
2000-01 0
2000-02 0
2000-03 0
2000-04 0
2000-05 0
2000-06 0
2000-07 0
2000-08 0
2000-09 0
2000-10 0
2000-11 0
2000-12 0
2001-01 0
2001-02 0
2001-03 0
2001-04 0
2001-05 0
2001-06 0
2001-07 0
2001-08 0
2001-09 0
2001-10 0
2001-11 0
2001-12 0
2002-01 0
2002-02 0
2002-03 0
2002-04 0
2002-05 0
2002-06 0
2002-07 0
2002-08 0
2002-09 0
2002-10 0
2002-11 0
2002-12 0
2003-01 0
2003-02 0
2003-03 0
2003-04 0
2003-05 0
2003-06 0
2003-07 0
2003-08 0
2003-09 0
2003-10 0
2003-11 0
2003-12 0
2004-01 0
2004-02 0
2004-03 0
2004-04 0
2004-05 0
2004-06 0
2004-07 0
2004-08 0
2004-09 0
2004-10 0
2004-11 0
2004-12 0
2005-01 0
2005-02 0
2005-03 0
2005-04 0
2005-05 0
2005-06 0
2005-07 0
2005-08 0
2005-09 0
2005-10 0
2005-11 0
2005-12 0
2006-01 0
2006-02 0
2006-03 0
2006-04 0
2006-05 0
2006-06 0
2006-07 0
2006-08 0
2006-09 0
2006-10 0
2006-11 0
2006-12 0
2007-01 0
2007-02 0
2007-03 0
2007-04 0
2007-05 0
2007-06 0
2007-07 0
2007-08 0
2007-09 0
2007-10 0
2007-11 0
2007-12 0
2008-01 0
2008-02 0
2008-03 0
2008-04 0
2008-05 0
2008-06 0
2008-07 0
2008-08 0
2008-09 0
2008-10 0
2008-11 0
2008-12 0
2009-01 0
2009-02 0
2009-03 0
2009-04 0
2009-05 0
2009-06 0
2009-07 0
2009-08 0
2009-09 0
2009-10 0
2009-11 0
2009-12 0
2010-01 0
2010-02 0
2010-03 0
2010-04 0
2010-05 0
2010-06 0
2010-07 0
2010-08 0
2010-09 0
2010-10 0
2010-11 0
2010-12 0
2011-01 0
2011-02 0
2011-03 0
2011-04 0
2011-05 0
2011-06 0
2011-07 0
2011-08 0
2011-09 0
2011-10 0
2011-11 0
2011-12 0
2012-01 0
2012-02 0
2012-03 0
2012-04 0
2012-05 0
2012-06 0
2012-07 0
2012-08 0
2012-09 0
2012-10 0
2012-11 0
2012-12 0
2013-01 0
2013-02 0
2013-03 0
2013-04 0
2013-05 0
2013-06 0
2013-07 0
2013-08 0
2013-09 0
2013-10 0
2013-11 0
2013-12 0
2014-01 0
2014-02 0
2014-03 0
2014-04 0
2014-05 0
2014-06 0
2014-07 0
2014-08 0
2014-09 0
2014-10 0
2014-11 0
2014-12 0
2015-01 0
2015-02 0
2015-03 0
2015-04 0
2015-05 0
2015-06 0
2015-07 0
2015-08 0
2015-09 0
2015-10 0
2015-11 0
2015-12 0
2016-01 0
2016-02 0
2016-03 0
2016-04 0
2016-05 0
2016-06 0
2016-07 0
2016-08 0
2016-09 0
2016-10 0
2016-11 0
2016-12 0
2017-01 0
2017-02 0
2017-03 0
2017-04 0
2017-05 0
2017-06 0

Duplicated Rows

No Duplicated rows were found in the dataset.

cost[which(duplicated(cost) ==T),] 

Data Filtering

In order to reduce number of columns to human-interpretable level, cost price across every year between 1996-2017 is calculated and tabulated in a different table for analysis. Median is chosen as opposed to Mean to guard against large outlier values (2008 Recession/Housing Price crash) - which may lead to flawed assumptions.

How is it Achieved?

  1. Account for Median Cost price across Individual years starting 1996 to 2017 tabulated into different table.
  2. Analyze the median Cost price trends( 2 and 5 Years) across different regionNames for plausible pattern.
  3. Make assumptions based on these trends to calculate a singular Median Cost value across various regionNames.
  4. Join with original Cost.

Trivia
CityName, StartYear and endYear are derived from the values defined in config file. This scalable feature can account for other cities and years for newer dataset based on requirement.

#Selecting the city in the dataset
cost <- cost %>% filter(City == cityName)

#Creating a list of the years
yearList <- as.character(as.list(startYear:endYear))

medianYears <- data.frame()
for (i in yearList) {
  year <- i 
  medianYearCol <- cost %>% select(starts_with(i), RegionName) %>% mutate(Median = rowMedians(as.matrix(select(., -matches("RegionName")))))
  medianYearCol <- medianYearCol %>% select(RegionName,Median)
  medianYearCol$year<- rep(year, nrow(medianYearCol))
  medianYears<-rbind(medianYears,medianYearCol)
}

Median Price Change Over 5 Years

Upper half(Orange) of line graph highlights regionNames with significant increase in the median cost price. With RegionName = 10003, 10014 & 10011 median cost price increasing by 0.5 Mn in a matter of 5 years (2013-2017).

Lower half of the plot sees little or no increase. In further sections, each zipcode will further analyzed to reason out these trends.

medianPriceAll <-
  medianYears %>% filter(year > (endYear - 5)) %>% ggplot(aes(
  x = year,
  y = Median,
  group = RegionName,
  colour = RegionName
  )) + geom_line() + geom_point() + scale_y_continuous(labels = scales::comma)
ggplotly(medianPriceAll)

Median Price Change Over 2 Years

Growth/Increase in median cost price measured for recent 2 consecutive years (2016,2017) shares a different story. The market has been dry with little or no fluctuation. Median Cost for regionName - 10003 which had an upward trend for over 4-5 years (2013-2016) has dropped by 0.1 mn. Rest of the regionNames have little or no increase.

medianPriceAll <-
  medianYears %>% filter(year > (endYear - 2)) %>% ggplot(aes(
  x = year,
  y = Median,
  group = RegionName,
  colour = RegionName
  )) + geom_line() + geom_point() + scale_y_continuous(labels = scales::comma)
ggplotly(medianPriceAll)

Clean Cost Data

From the above analysis, it is safe to assume that Median Cost Price for Homes have rather been stagnant for the past 2 years. To reduce the complexicity of data in hand, median cost price for year (2017) is chosen to be the actual price of the individual 2 bedroom properties across every regionName(zipcode). Decisions made here on will be based on this assumption.

# Choosing the Last One Year Median Price 
currentMedianPrice <- medianYears %>% filter(year == endYear)


cost <-
  cost %>% select(RegionName, CountyName, SizeRank) %>% inner_join(currentMedianPrice, by = "RegionName")
cost <- rename(cost, cost = Median)

kable(head(cost))  %>% kable_styling(bootstrap_options = c("striped", "hover", "responsive")) %>% scroll_box(width = "100%", height = "250px")
RegionName CountyName SizeRank cost year
10025 New York 1 1342900 2017
10023 New York 3 1988700 2017
10128 New York 14 1622500 2017
10011 New York 15 2354000 2017
10003 New York 21 2005500 2017
11201 Kings 32 1400200 2017

Revenue Data

Revenue data contains a mix of information including details about the properties like address, zipcode, bedrooms, bathrooms to information about host, daily/weekly and monthly price details for stay.

kable(head(revenue))  %>% kable_styling(bootstrap_options = c("striped", "hover", "responsive")) %>% scroll_box(width = "100%", height = "250px")
id listing_url scrape_id last_scraped name summary space description experiences_offered neighborhood_overview notes transit access interaction house_rules thumbnail_url medium_url picture_url xl_picture_url host_id host_url host_name host_since host_location host_about host_response_time host_response_rate host_acceptance_rate host_is_superhost host_thumbnail_url host_picture_url host_neighbourhood host_listings_count host_total_listings_count host_verifications host_has_profile_pic host_identity_verified street neighbourhood neighbourhood_cleansed neighbourhood_group_cleansed city state zipcode market smart_location country_code country latitude longitude is_location_exact property_type room_type accommodates bathrooms bedrooms beds bed_type amenities square_feet price weekly_price monthly_price security_deposit cleaning_fee guests_included extra_people minimum_nights maximum_nights calendar_updated has_availability availability_30 availability_60 availability_90 availability_365 calendar_last_scraped number_of_reviews first_review last_review review_scores_rating review_scores_accuracy review_scores_cleanliness review_scores_checkin review_scores_communication review_scores_location review_scores_value requires_license license jurisdiction_names instant_bookable cancellation_policy require_guest_profile_picture require_guest_phone_verification calculated_host_listings_count reviews_per_month
7949480 https://www.airbnb.com/rooms/7949480 2.01705e+13 2017-05-03 City Island Sanctuary relaxing BR & Bath w Parking Come relax on City Island in our quiet guest room w/adjacent private bath. Enjoy a Continental Breakfast served by Didier at your table downstairs. Walk to restaurants, shops, the park or the beach. Park one car on our private driveway. If you are looking to explore Manhattan you can take the 29 bus to the nearest subway number 6. Or you can take the BXM8 express bus at the Pelham Bay Subway station which goes directly into Manhattan. There are also green taxis at the subway. On parle français et anglais, (lire Français ci-dessous). I am a native New Yorker and Didier grew up in Biarritz, France. Our three friendly cats do not go into the room. The bathroom is yours during your entire stay (we have our own en suite). We will provide a Continental breakfast with fresh bread butter & jams, fresh squeezed orange juice or fresh fruit, tea or coffee served by my French husband who is a talented cook! You might even get a special treat of canelés de Bordeaux, banana bread or home made scones! City Island Sanctuary relaxing BR & Bath w Parking Pièce calme, ensoleillée et à proximité d’une salle de bain privée. Marchez jusqu, au parc ou à la plage. On peut aller à pied à quelques restaurants, une petite supermarché, petits magasins et le City Island Nautical Museum. Prenez le metro ou le bus express à Manhattan (une heure et demi à centre-ville. Les chats sont en résidence, mais ne vont pas dans la chambre. Cable TV avec TV5 Monde, lecteur DVD, Wi-fi . Petit Come relax on City Island in our quiet guest room w/adjacent private bath. Enjoy a Continental Breakfast served by Didier at your table downstairs. Walk to restaurants, shops, the park or the beach. Park one car on our private driveway. If you are looking to explore Manhattan you can take the 29 bus to the nearest subway number 6. Or you can take the BXM8 express bus at the Pelham Bay Subway station which goes directly into Manhattan. There are also green taxis at the subway. On parle français et anglais, (lire Français ci-dessous). I am a native New Yorker and Didier grew up in Biarritz, France. Our three friendly cats do not go into the room. The bathroom is yours during your entire stay (we have our own en suite). We will provide a Continental breakfast with fresh bread butter & jams, fresh squeezed orange juice or fresh fruit, tea or coffee served by my French husband who is a talented cook! You might even get a special treat of canelés de Bordeaux, banana bread or home mad none City Island is a unique sanctuary in New York City’s the Bronx. It boasts many wonderful seafood restaurants, two great pubs and the only true French bistro in the borough. We are walking distance to the largest park in the City of New York, Pelham Bay Park and also to lovely Orchard Beach, the Riviera of the Bronx! We are walkable to Pelham Park which is the largest park in NYC. It has picnic areas, tennis courts, horseback riding, Orchard Beach and a lovely rocky coast. Check out their website for all there is to do! On the island, you can walk to everything. A local bus will take you to shopping at the Mall at Bay Plaza near Coop City. If you want to go to Manhattan you take the local 29 bus to the subway No. 6 to Manhattan. The local bus does not run after midnight but there are green taxis at the subway station. There is an express bus early in the morning which goes directly from the island to Manhattan. It will take at least one hour to get to midtown by subway and a bit longer on the express bus depending upon traffic. Feel free to enjoy our large backyard in season, our downstairs large recreation area with TV, WIFI. Walk to the local beaches for the view or a dip. Please, no food or drinks except water in the guest bedroom. We speak fluent French and English. We will be in residence during your stay to help you with recommendations for a real New York experience or a unique City Island French experience. But, we do respect your privacy! No extra guests or visitors who are not in the reservation. Maximum 2 registered guests only. No late check ins or check outs. Please, no food or drinks in the guest bedroom except water. No smoking on the property. It’s a residential neighborhood so we keep it quiet after 10. Please only inquire about booking for yourself, we will not accept 3rd party inquiries. https://a0.muscache.com/im/pictures/001d1cb0-6ef2-4074-bd39-e486ee58d00f.jpg?aki_policy=small https://a0.muscache.com/im/pictures/001d1cb0-6ef2-4074-bd39-e486ee58d00f.jpg?aki_policy=medium https://a0.muscache.com/im/pictures/001d1cb0-6ef2-4074-bd39-e486ee58d00f.jpg?aki_policy=large https://a0.muscache.com/im/pictures/001d1cb0-6ef2-4074-bd39-e486ee58d00f.jpg?aki_policy=x_large 119445 https://www.airbnb.com/users/show/119445 Linda & Didier 2010-05-06 New York, New York, United States I am a designer, former instructor at a design school and my husband is a talented French cook (he is from France). He is also an artist, writer and a Zen practitioner. We are sociable but respectful of your privacy.
I am a native New Yorker having relocated back to NYC from So Cal because I Love New York! We are the only French/American Airbnb in the Bronx!
within an hour 100% N/A t https://a0.muscache.com/im/pictures/2de30fe4-bebe-4145-a86a-b8a8a970078d.jpg?aki_policy=profile_small https://a0.muscache.com/im/pictures/2de30fe4-bebe-4145-a86a-b8a8a970078d.jpg?aki_policy=profile_x_medium City Island 1 1 [‘email’, ‘phone’, ‘reviews’, ‘jumio’] t t City Island, Bronx, NY 10464, United States City Island City Island Bronx Bronx NY 10464 New York Bronx, NY US United States 40.85205 -73.78868 t House Private room 2 1 1 1 Real Bed {“Cable TV”,“Wireless Internet”,“Air conditioning”,“Free parking on premises”,Breakfast,“Pets live on this property”,Cat(s),“Indoor fireplace”,Heating,“Smoke detector”,“Carbon monoxide detector”,“Fire extinguisher”,Essentials,Shampoo,Hangers,“Hair dryer”,Iron,“translation missing: en.hosting_amenity_49”,“translation missing: en.hosting_amenity_50”} NA $99.00 NA NA $100.00 NA 1 $20.00 1 7 yesterday NA 24 54 80 170 2017-05-03 25 2016-01-18 2017-04-23 100 10 10 10 10 10 10 f NA NA f moderate t t 1 1.59
16042478 https://www.airbnb.com/rooms/16042478 2.01705e+13 2017-05-04 WATERFRONT STUDIO APARTMENT My place is close to Sea Shore. You’ll love my place because of the coziness, the location, the views, and the people. My place is good for couples, solo adventurers, business travelers, and families (with kids). (URL HIDDEN) My place is close to Sea Shore. You’ll love my place because of the coziness, the location, the views, and the people. My place is good for couples, solo adventurers, business travelers, and families (with kids). (URL HIDDEN) public transport from Grand Central, NYC or from JFK, Laguadia is about 30 minutes away. (URL HIDDEN) Fine sea food restaurant, bars and night clubs all within walking distance, Sailing, fishing, canoeing, marinas and just 30 minutes away from central NYC or major airports like JFK or LaGuardia airports. Enjoy NYC none (URL HIDDEN) Fine sea food restaurant, bars and night clubs all within walking distance, Sailing, fishing, canoeing, marinas and just 30 minutes away from central NYC or major airports like JFK or LaGuardia airports. Enjoy NYC NA public transport from Grand Central, NYC or from JFK, Laguadia is about 30 minutes away. NA NA https://a0.muscache.com/im/pictures/bb5bc3c4-3eb2-4b90-a2c8-28401c6eabcc.jpg?aki_policy=small https://a0.muscache.com/im/pictures/bb5bc3c4-3eb2-4b90-a2c8-28401c6eabcc.jpg?aki_policy=medium https://a0.muscache.com/im/pictures/bb5bc3c4-3eb2-4b90-a2c8-28401c6eabcc.jpg?aki_policy=large https://a0.muscache.com/im/pictures/bb5bc3c4-3eb2-4b90-a2c8-28401c6eabcc.jpg?aki_policy=x_large 9117975 https://www.airbnb.com/users/show/9117975 Collins 2013-09-29 New York, New York, United States I am married with 3 children and 2 grandkid. Family and community is the most important thing besides God. I am a Tax consultant and a I have travelled to a few countries including Germany, France, Holland, Netherlands, England in Europe. I have also travel extensively within the Caribbean and Mexico so I appreciate the importance of comfortable and affordable lodging when one is away. a few days or more 0% N/A f https://a0.muscache.com/im/users/9117975/profile_pic/1423794446/original.jpg?aki_policy=profile_small https://a0.muscache.com/im/users/9117975/profile_pic/1423794446/original.jpg?aki_policy=profile_x_medium City Island 1 1 [‘phone’, ‘facebook’] t f City Island, Bronx, NY 10464, United States City Island City Island Bronx Bronx NY 10464 New York Bronx, NY US United States 40.85349 -73.78861 t Apartment Private room 4 1 1 1 Real Bed {TV,Internet,“Wireless Internet”,“Air conditioning”,Kitchen,Heating,“Family/kid friendly”,“Smoke detector”,“Carbon monoxide detector”,“First aid kit”,“Fire extinguisher”,Essentials,Shampoo,Hangers,“Hair dryer”,Iron} NA $200.00 NA NA NA NA 1 $0.00 7 29 6 months ago NA 30 60 90 180 2017-05-04 0 NA NA NA NA NA NA NA NA NA f NA NA t flexible f f 1 NA
1886820 https://www.airbnb.com/rooms/1886820 2.01705e+13 2017-05-04 Quaint City Island Community. Quiet island boating town on Long Island Sound. 10 miles from Manhattan. Public trans. Walk to shops, restaurants, bars,parks. Near Botanical gardens, Bronx Zoo and Fordham University. Water view. off street parking. 3BR, 3 Full Baths. Washer/dryer. full kitchen.Fully furnished comfortable home with deck/backyard. local caretaker. Master bed with queen bed, full bath and office. Mid bedroom has queen bed. Small bed has single. TV room/den queen sleeper sofa. Galley kitchen off dining room. Piano. Winter water view. Large back deck with seating. Small front porch. Great for family of 4. Will fit 7. Additional$25.00 per person per night over 4 people. Quiet island boating town on Long Island Sound. 10 miles from Manhattan. Public trans. Walk to shops, restaurants, bars,parks. Near Botanical gardens, Bronx Zoo and Fordham University. Water view. off street parking. 3BR, 3 Full Baths. Washer/dryer. full kitchen.Fully furnished comfortable home with deck/backyard. local caretaker. Master bed with queen bed, full bath and office. Mid bedroom has queen bed. Small bed has single. TV room/den queen sleeper sofa. Galley kitchen off dining room. Piano. Winter water view. Large back deck with seating. Small front porch. Great for family of 4. Will fit 7. Additional$25.00 per person per night over 4 people. 2 story 1800 sq ft house all to yourselves. Local caretaker available during stay. We will be available by phone Small New England type town in the middle of the big city. Small Island in the Long Island Sound. Local marinas, seafood restaurants, yacht clubs, supermarket, library, shops. Parking available in driveway. Bus stop none Small New England type town in the middle of the big city. Small Island in the Long Island Sound. Local marinas, seafood restaurants, yacht clubs, supermarket, library, shops. NA Parking available in driveway. Bus stop around the corner. Near Orchard Beach, Bronx Zoo, World Famous Botanical Gardens, Fordham University and of course, Manhattan 2 story 1800 sq ft house all to yourselves. Local caretaker available during stay. We will be available by phone No smoking. No pets. This is our private home. We expect that you would treat it with care and enjoy it as we do. NA NA https://a0.muscache.com/im/pictures/26266928/14c688d9_original.jpg?aki_policy=large NA 9815788 https://www.airbnb.com/users/show/9815788 Steve 2013-11-04 US NA N/A N/A N/A f https://a0.muscache.com/im/users/9815788/profile_pic/1383845917/original.jpg?aki_policy=profile_small https://a0.muscache.com/im/users/9815788/profile_pic/1383845917/original.jpg?aki_policy=profile_x_medium City Island 1 1 [‘email’, ‘phone’] t f City Island, Bronx, NY 10464, United States City Island City Island Bronx Bronx NY 10464 New York Bronx, NY US United States 40.84114 -73.78305 t House Entire home/apt 4 3 3 3 Real Bed {TV,“Cable TV”,Internet,“Wireless Internet”,“Air conditioning”,Kitchen,“Free parking on premises”,Heating,Washer,Dryer} NA $300.00 NA NA $800.00 $100.00 4 $25.00 7 90 11 months ago NA 30 60 90 365 2017-05-04 0 NA NA NA NA NA NA NA NA NA f NA NA f strict f f 1 NA
6627449 https://www.airbnb.com/rooms/6627449 2.01705e+13 2017-05-05 Large 1 BDRM in Great location This ground floor apartment is light and airy with a fully equipped kitchen. Located in the heart of City Island w/ shops, restaurants and transportation nearby. If needed there is extra sleeping space on the modular sofa. We are close to fishing, boating, biking, horseback riding or hiking. We are near Orchard Beach and Pelham Bay Park. This is a very convenient area. This ground floor apartment is light and airy with a fully equipped kitchen. Located in the heart of City Island w/ shops, restaurants and transportation nearby. If needed there is extra sleeping space on the modular sofa. We are close to fishing, boating, biking, horseback riding or hiking. We are near Orchard Beach and Pelham Bay Park. This is a very convenient area. The entire apartment is available as I am only there part of the time. There is a patio in back off the living room. You are welcome to help yourself to coffee or tea. My daughter and her family live on the Island and will be there to greet you and help you out with any information you might like. City Island is a unique and a hidden gem of New York City. Many New Yorkers do not even know it exists. It is known for its seafood and sailing, with many seafood restaurants in various price ranges. We are close to grocery stores, library, City Island Diner, antique stores and art galleries. It is a very safe and walk none City Island is a unique and a hidden gem of New York City. Many New Yorkers do not even know it exists. It is known for its seafood and sailing, with many seafood restaurants in various price ranges. We are close to grocery stores, library, City Island Diner, antique stores and art galleries. It is a very safe and walkable neighborhood. City Island is only 11/2 miles long and (3) blocks wide at its widest, with 2 seafood restaurants at the end with outdoor do it yourself seating There is off street parking in the apartment complex. I have a reserved space which you would be able to use. We are (2) blocks from the BX29 bus which you can take to the Pelham Bay subway station. ( a 10 minute ride) From there you can transfer to the #6 train to Manhattan; about 45 minutes. The entire apartment is available as I am only there part of the time. There is a patio in back off the living room. You are welcome to help yourself to coffee or tea. My daughter and her family live on the Island and will be there to greet you and help you out with any information you might like. There is no smoking and no pets. Otherwise use the place as you would your own. https://a0.muscache.com/im/pictures/83539171/13fe7e85_original.jpg?aki_policy=small https://a0.muscache.com/im/pictures/83539171/13fe7e85_original.jpg?aki_policy=medium https://a0.muscache.com/im/pictures/83539171/13fe7e85_original.jpg?aki_policy=large https://a0.muscache.com/im/pictures/83539171/13fe7e85_original.jpg?aki_policy=x_large 13886510 https://www.airbnb.com/users/show/13886510 Arlene 2014-04-04 Los Angeles, California, United States NA within a few hours 100% N/A f https://a0.muscache.com/im/users/13886510/profile_pic/1433129533/original.jpg?aki_policy=profile_small https://a0.muscache.com/im/users/13886510/profile_pic/1433129533/original.jpg?aki_policy=profile_x_medium City Island 1 1 [‘email’, ‘phone’, ‘reviews’, ‘kba’] t t City Island, City Island, NY 10464, United States City Island City Island Bronx City Island NY 10464 New York City Island, NY US United States 40.84977 -73.78661 t Apartment Entire home/apt 3 1 1 1 Real Bed {TV,Internet,“Wireless Internet”,“Air conditioning”,Kitchen,“Free parking on premises”,“Buzzer/wireless intercom”,Heating,“Family/kid friendly”,Washer,Dryer,“Smoke detector”,“Carbon monoxide detector”,Essentials,Shampoo,Hangers,“Hair dryer”,Iron} NA $125.00 $775.00 NA NA $75.00 1 $0.00 3 21 2 weeks ago NA 8 30 60 335 2017-05-05 12 2015-07-04 2016-10-24 93 10 10 10 10 10 10 f NA NA f strict f f 1 0.54
5557381 https://www.airbnb.com/rooms/5557381 2.01705e+13 2017-05-04 Quaint City Island Home Located in an old sea-shanty town, our home has a countryside feel, only an hour away from Manhattan’s excitement! With a bed big enough for two it’s ideal for business-(wo)men or out-of-towners who don’t want the hustle/bustle to follow them home. You won’t find a place so close to the city (NYC of course) with such a big back yard! There’s lots to do. We are only a train ride away from Manhattan in all it’s glory - we live a bus ride or a 10 min car ride from the Pelham Bay station on the 6 line, to Grand Central it’s less than an hour and a half by public transport, less than an hour if you drive to the train - and there’s plenty to do on City Island as well (or ‘the island’ as we locals call it). There are oodles of seafood restaurants withing walking distance (including a wonderful diner just a block away that has the best breakfast), swimming in Eastchester Bay during the summer, bike trails, Pelham Bay Park and Orchard beach! If you’re looking for some combination of the NYC life with outdoor adventure, this is the place for you! DISCLAIMER: If you are looking for a place from which to explore Manhattans night life, this is probably not the place for you. The public bus that runs from the end of the 6 train onto City Isl Located in an old sea-shanty town, our home has a countryside feel, only an hour away from Manhattan’s excitement! With a bed big enough for two it’s ideal for business-(wo)men or out-of-towners who don’t want the hustle/bustle to follow them home. You won’t find a place so close to the city (NYC of course) with such a big back yard! There’s lots to do. We are only a train ride away from Manhattan in all it’s glory - we live a bus ride or a 10 min car ride from the Pelham Bay station on the 6 line, to Grand Central it’s less than an hour and a half by public transport, less than an hour if you drive to the train - and there’s plenty to do on City Island as well (or ‘the island’ as we locals call it). There are oodles of seafood restaurants withing walking distance (including a wonderful diner just a block away that has the best breakfast), swimming in Eastchester Bay during the summer, bike trails, Pelham Bay Park and Orchard beach! If you’re looking for some combination of the NYC lif none City Island is unique in two ways. First, you get the small community feel, juxtaposed with NEW YORK CITY. Weird right? A small community, surrounded by water and parkland, so close to such a city! The second reason I love my neighborhood is it’s history. City Island used to be a oyster/fishing village (there are still places to rent boats and fishing equipment for those interested), and many inhabitants were boat builders ((website hidden)). City Island even has it’s own Nautical Museum, if you need a weekend afternoon activity. I am going to put this in a few places (also above in the overall description) as this has been a little bit of an issue for past guests: If you are looking for a place from which to explore Manhattans night life, this is probably not the place for you. The public bus that runs from the end of the 6 train onto City Island stops running a little before midnight. However, if you only want to do day trips, explore the Bronx (or Queens), and come back to a quiet house with no traffic sounds, this is the perfect location. The easiest way to get to Manhattan quickly is to drive to the Hunts Point station on the 6 train (this gets you to midtown in a little under an hour). If driving isn’t your thing, there is a public bus (the Bx29) that runs from City Island to the Pelham Bay station (the last station on the 6 line in the Bronx), this option will take you a little less than an hour and a half to get to midtown manhattan. Fare is now $2.75 per ride, and comes with a free transfer. Parking is available on the street in front of the property (we do have a driveway in a pinch, but it’s a little inconvenient). The house is just a place to sleep (and cook and shower). We have a great big back yard (by city standards) with a picnic table for outdoor diners. There is also a garage space with weights and some boxing equipment for the exercise-inclined. I live in the back house (there are two houses on the property, my father lives in the front house) full time. The house is generally empty during weekdays, but the living room, kitchen and bathroom are shared spaces, so interaction will be inevitable. However, I’m pretty laid back. As long as you do your dishes! ;) Only three actual rules: 1. Don’t smoke in the house. There is a beautiful back yard for that. 2. Strip the bed before you check out and leave the sheets in a pile on the bed so I can just grab them and throw them in the wash. 3. Take out the trash on our way out as you leave. The garbage is located to the left of the stair as you’re exiting the front door. Other than that, I’m a very reasonable person. I’ll try to respect your space and clean up after myself while you’re here, and I expect you to do the same. Nobody gets everything exactly right the first time and 2nd chances make the world go round, but that doesn’t mean we shouldn’t do our best. I love pets. Furry, scaly, feathery, doesn’t matter. Pets are not only allowed, they’re encouraged! I don’t have a pet at the moment, but that very well may change! https://a0.muscache.com/im/pictures/85805287/7c6b38ae_original.jpg?aki_policy=small https://a0.muscache.com/im/pictures/85805287/7c6b38ae_original.jpg?aki_policy=medium https://a0.muscache.com/im/pictures/85805287/7c6b38ae_original.jpg?aki_policy=large https://a0.muscache.com/im/pictures/85805287/7c6b38ae_original.jpg?aki_policy=x_large 28811542 https://www.airbnb.com/users/show/28811542 Phoebe 2015-03-05 New York, New York, United States

My name is Phoebe. Both my parents are biologists, and it looks like it’s contagious. My favorite mode of travel is my own two feet (sometimes with the help of a canoe). I have many passions, but two of them are learning about the natural world, and showing people how wonderful it is! I’ve lead canoe trips in northern Ontario, trail crews in Colorado and Virginia, and I’m always looking for the next adventure.

I’ve been a part of the Air BnB community since 2015, and I love meeting new people and exchanging stories. I keep my plate very full, so I do not cook for guests, and sometimes my house gets a little messy, but I make sure to keep common areas clear and clean for your use. I have refurbished the house mostly on my own (with help from friends and family for the heavy lifting of course), and I’m welcome to suggestions for updates!

(you’ll see french on the list of languages spoken, I speak only a little french because I went to school on Montrèal)
within an hour 100% N/A f https://a0.muscache.com/im/pictures/43cb9adc-f198-47cb-ba44-8a737440e825.jpg?aki_policy=profile_small https://a0.muscache.com/im/pictures/43cb9adc-f198-47cb-ba44-8a737440e825.jpg?aki_policy=profile_x_medium City Island 1 1 [‘email’, ‘phone’, ‘reviews’, ‘jumio’] t t City Island, Bronx, NY 10464, United States City Island City Island Bronx Bronx NY 10464 New York Bronx, NY US United States 40.85002 -73.78933 t House Private room 4 1 1 1 Real Bed {Internet,“Wireless Internet”,“Air conditioning”,Kitchen,“Free parking on premises”,“Pets allowed”,Gym,Heating,“Family/kid friendly”,“Smoke detector”,“Carbon monoxide detector”,“First aid kit”,Essentials,“Laptop friendly workspace”,“translation missing: en.hosting_amenity_50”} NA $69.00 $350.00 $1,200.00 NA $17.00 2 $15.00 3 1125 yesterday NA 17 47 77 352 2017-05-04 86 2015-05-25 2017-04-22 97 10 10 10 10 10 10 f NA NA t moderate f f 1 3.63
9147025 https://www.airbnb.com/rooms/9147025 2.01705e+13 2017-05-04 Cozy City Island Cottage City Island is a small community at the edge of New York City just beyond Pelham Bay Park in the Bronx and surrounded by the waters of the Long Island Sound and Eastchester Bay. Enjoy the water views and seafood restaurants or nearby Orchard Beach. This cozy little cottage is an open floor plan with a queen size murphy bed. There is brand new bathroom, eat in kitchen, large fenced in yard, perfect for BBQ’s or sitting under the full grown apple tree. The living room has lots of windows and a partial view of the sound and Manhattan. There are two bikes available for use. The location is walking distance to many of the islands best eateries, bars, Nautical Museum, and historic City Island Cemetery . Beautiful Pelham Park is close by, where you can enjoy the boardwalk at Orchard Beach or hike and bike the many trails. Fishing charters also available. Enjoyable get away any time of the year. You will forget you are even in the Bronx, and be transported to New England. City Island is a hidden gem. City Island is a small community at the edge of New York City just beyond Pelham Bay Park in the Bronx and surrounded by the waters of the Long Island Sound and Eastchester Bay. Enjoy the water views and seafood restaurants or nearby Orchard Beach. This cozy little cottage is an open floor plan with a queen size murphy bed. There is brand new bathroom, eat in kitchen, large fenced in yard, perfect for BBQ’s or sitting under the full grown apple tree. The living room has lots of windows and a partial view of the sound and Manhattan. There are two bikes available for use. The location is walking distance to many of the islands best eateries, bars, Nautical Museum, and historic City Island Cemetery . Beautiful Pelham Park is close by, where you can enjoy the boardwalk at Orchard Beach or hike and bike the many trails. Fishing charters also available. Enjoyable get away any time of the year. You will forget you are even in the Bronx, and be transported to New England. City Island is a hi none City Island is famous for it’s seafood restaurants both large and small, casual and fancy. The Crab Shanty, Sammy’s, The Lobster Box, Portofino’s and the Seashore, which also has a large bar with live music on Fridays and boasts a happy hour with drink specials. There are also several other restaurants to choose from that have a variety of offerings. A City Island local favorite is The Black Whale, which also has a sunday bunch, Arties; italain food, Don Coqui, Puerto Rician food, Bistro SK, french, and Ohana with table side hibachi. End your night at Paddy’s On The Island, which is always filled with colorful locals and sometimes live music. Please note that it is a 15 minute bus ride to the NYC subway system. Although it is not required to enjoy your stay, having a car is helpful. The small beach is at the end of the block, 3 houses down. The house is NOT on the water. You can see the water from the windows in the living/bed room. City Island is a walkable Island. Everything is very close. It is helpful to have a car if you want to venture further off the Island. The city subway system is a bus ride away. The MTA bus service takes approximately 15 minutes from City Island to the Pelham Bay 6 train subway stop. The subway ride into Manhattan from there is approximately 1 hour. There is also a MTA Express bus into Manhattan that runs on a limited schedule from City Island Ave. stops to stops in Manhattan. Please consult the MTA website before booking or email me if you have questions. This is a whole private house on a charming quiet residential street. You will have use of the large fenced in back yard. Private driveway for one car. I live close by and know the area well if you have any issues or questions. Friendly and helpful neighbors. Quiet street. No Smoking https://a0.muscache.com/im/pictures/abbac002-c8a5-44da-9405-4897e73667c6.jpg?aki_policy=small https://a0.muscache.com/im/pictures/abbac002-c8a5-44da-9405-4897e73667c6.jpg?aki_policy=medium https://a0.muscache.com/im/pictures/abbac002-c8a5-44da-9405-4897e73667c6.jpg?aki_policy=large https://a0.muscache.com/im/pictures/abbac002-c8a5-44da-9405-4897e73667c6.jpg?aki_policy=x_large 403032 https://www.airbnb.com/users/show/403032 Diane 2011-02-21 New York, New York, United States NA within an hour 100% N/A t https://a0.muscache.com/im/pictures/dc275952-e2d7-463e-93a5-994fa8c31e04.jpg?aki_policy=profile_small https://a0.muscache.com/im/pictures/dc275952-e2d7-463e-93a5-994fa8c31e04.jpg?aki_policy=profile_x_medium NA 1 1 [‘email’, ‘phone’, ‘facebook’, ‘reviews’] t f Bronx, NY 10464, United States NA City Island Bronx Bronx NY 10464 New York Bronx, NY US United States 40.84487 -73.78954 f House Entire home/apt 2 1 0 1 Real Bed {TV,“Wireless Internet”,“Air conditioning”,Kitchen,“Free parking on premises”,Heating,“Family/kid friendly”,Washer,Dryer,“Smoke detector”,“Carbon monoxide detector”,“Safety card”,Essentials,Shampoo,“24-hour check-in”,Hangers,“Hair dryer”,Iron,“Laptop friendly workspace”,“Self Check-In”,Lockbox} NA $125.00 $550.00 NA $500.00 $35.00 1 $0.00 2 28 a week ago NA 23 33 51 129 2017-05-03 41 2015-12-26 2017-04-27 97 10 10 10 10 10 10 f NA NA f moderate f f 1 2.48

Data Quality Check

Missing Values

Majority of description columns and host related information are blank. Steps are taken in the following section to filter columns with higher percentage of Nulls/NA.

numMissingVal <-sapply(revenue, function(x) sum(length(which(is.na(x)))))  
kable(as.data.frame(numMissingVal)) %>% kable_styling(bootstrap_options = c("striped", "hover", "responsive")) %>% scroll_box(width = "100%", height = "250px")
numMissingVal
id 0
listing_url 0
scrape_id 0
last_scraped 0
name 35
summary 1614
space 12872
description 18
experiences_offered 0
neighborhood_overview 17124
notes 26142
transit 15292
access 17171
interaction 18274
house_rules 14833
thumbnail_url 8616
medium_url 8616
picture_url 0
xl_picture_url 8616
host_id 0
host_url 0
host_name 251
host_since 251
host_location 416
host_about 15716
host_response_time 251
host_response_rate 251
host_acceptance_rate 251
host_is_superhost 251
host_thumbnail_url 251
host_picture_url 251
host_neighbourhood 5194
host_listings_count 251
host_total_listings_count 251
host_verifications 0
host_has_profile_pic 251
host_identity_verified 251
street 0
neighbourhood 6005
neighbourhood_cleansed 0
neighbourhood_group_cleansed 0
city 44
state 0
zipcode 611
market 146
smart_location 0
country_code 0
country 0
latitude 0
longitude 0
is_location_exact 0
property_type 0
room_type 0
accommodates 0
bathrooms 159
bedrooms 69
beds 79
bed_type 0
amenities 0
square_feet 40186
price 0
weekly_price 31745
monthly_price 33341
security_deposit 22960
cleaning_fee 12692
guests_included 0
extra_people 0
minimum_nights 0
maximum_nights 0
calendar_updated 0
has_availability 40753
availability_30 0
availability_60 0
availability_90 0
availability_365 0
calendar_last_scraped 0
number_of_reviews 0
first_review 9299
last_review 9185
review_scores_rating 9924
review_scores_accuracy 10017
review_scores_cleanliness 9985
review_scores_checkin 10036
review_scores_communication 9986
review_scores_location 10037
review_scores_value 10039
requires_license 0
license 40753
jurisdiction_names 40745
instant_bookable 0
cancellation_policy 0
require_guest_profile_picture 0
require_guest_phone_verification 0
calculated_host_listings_count 0
reviews_per_month 9299

Account for Missing Zipcodes

Zipcode column contains 611 missing values. Ignoring these values can prove detrimental to the analysis. Zipcodes are imputed by selecting a non-NA value from Neighbourhood Group Cleansed.

Total Number of NA values in Zipcode Column after Imputation:

revenue <-
  revenue %>% group_by(neighbourhood_group_cleansed) %>% fill(zipcode) %>% ungroup()

sum(is.array(revenue$zipcode))
## [1] 0

Dollar-ed Price Columns

‘$’ Value prefix of every price row prevents numeric manipulation. It is thus removed from three columns: Price, Weekly Price & Monthly Price.

revenue$price <- as.numeric(gsub('[$,]','', revenue$price))
revenue$weekly_price <- as.numeric(gsub('[$,]','', revenue$weekly_price))
revenue$monthly_price <- as.numeric(gsub('[$,]','', revenue$monthly_price))
head(revenue[c("price","weekly_price","monthly_price")])
## # A tibble: 6 x 3
##   price weekly_price monthly_price
##   <dbl>        <dbl>         <dbl>
## 1    99           NA            NA
## 2   200           NA            NA
## 3   300           NA            NA
## 4   125          775            NA
## 5    69          350          1200
## 6   125          550            NA

Negative or Zero Valued Columns

None of the price columns: Price, weekly price & Monthly price have zero or negative value.

revCharCol <- colnames(revenue %>% ungroup() %>% select_if(is.character))


revZeroNeg <-
  sapply(revenue[,!(names(revenue) %in% revCharCol)], function(x)
  count(x <= 0, na.rm = TRUE))
kable(as.data.frame(revZeroNeg)) %>% kable_styling(bootstrap_options = c("striped", "hover", "responsive")) %>% scroll_box(width = "100%", height = "250px")
revZeroNeg
id 0
scrape_id 0
last_scraped 0
host_id 0
host_since 0
host_listings_count 45
host_total_listings_count 45
latitude 0
longitude 40753
accommodates 0
bathrooms 104
bedrooms 3525
beds 0
square_feet 54
price 0
weekly_price 0
monthly_price 0
guests_included 0
minimum_nights 0
maximum_nights 0
availability_30 18167
availability_60 15308
availability_90 14210
availability_365 12168
calendar_last_scraped 0
number_of_reviews 9181
first_review 0
last_review 0
review_scores_rating 0
review_scores_accuracy 0
review_scores_cleanliness 0
review_scores_checkin 0
review_scores_communication 0
review_scores_location 0
review_scores_value 0
calculated_host_listings_count 0
reviews_per_month 0

Duplicated Rows

No Duplicated rows were found in the dataset.

revenue[which(duplicated(revenue) ==T),] 

Data Filtering

Inorder to remove columns that add little or no value to the analysis, some of the smart data munging techniques are incorporated. These include removing columns based on pattern matching with their names, imbalanced columns, character columns with 100 % variance etc.

How is it achieved?

  1. Account for Zero Variance, Imbalanced, high NA valued and 100 percent variance character columns.
  2. Use associated methods to remove these columns from revenue data. Worst case - Manually remove columns keeping the final outcome in mind.
  3. Produce clean revenue data for further analysis.

Zero Variance Cols

Imbalanced/Zero Variance columns add no value to the analysis. These columns are removed using nearzeroVar method from Caret package.

Columns Removed:

zvdf <- nearZeroVar(revenue, saveMetrics = TRUE)
ZVnames=rownames(subset(zvdf, nzv== TRUE))
revenue <- revenue[ , !(names(revenue) %in% ZVnames)]

printlis(ZVnames)
## scrape_id 
## experiences_offered 
## host_acceptance_rate 
## host_has_profile_pic 
## state 
## market 
## country_code 
## country 
## bed_type 
## has_availability 
## requires_license 
## license 
## require_guest_profile_picture 
## require_guest_phone_verification

Pattern Matching

Column names starting with “require”, “host”, “calendar” and ending with “url” and “nights” are irrelevant information when the property is invested in, by the real estate company. These columns are removed.

Columns Removed:

pattern <-
  colnames(
  revenue %>% select(
  starts_with("require"),
  starts_with("host"),
  starts_with("calendar"),
  ends_with("url"),
  ends_with("nights")
  )
  )
revenue <- revenue[,!(names(revenue) %in% pattern)]

printlis(pattern)
## host_id 
## host_url 
## host_name 
## host_since 
## host_location 
## host_about 
## host_response_time 
## host_response_rate 
## host_is_superhost 
## host_thumbnail_url 
## host_picture_url 
## host_neighbourhood 
## host_listings_count 
## host_total_listings_count 
## host_verifications 
## host_identity_verified 
## calendar_updated 
## calendar_last_scraped 
## listing_url 
## thumbnail_url 
## medium_url 
## picture_url 
## xl_picture_url 
## minimum_nights 
## maximum_nights

Based on Unique Values - Character Columns

Character columns with near 100% variance (Every Row is different) are removed as they provide no group level information that can be used on a larger population/scale.

These columns include textual columns describing the home, host, amenties etc. For lack of conclusion from other variables, these columns can be revisited for sentiment analysis.

Columns Removed:

uniquedf <-
  revenue %>% select_if(is.character) %>% summarise_all(funs(n_distinct(.))) 

uniquedf <-
  uniquedf %>% gather(key = var_name, value = value, 1:ncol(uniquedf))

uniquedf$percentUnique <- round(uniquedf$value/nrow(revenue),2)
uniqueval <- uniquedf %>% filter(percentUnique > 0.2) %>% pull(var_name)
revenue <- revenue[,!(names(revenue) %in% uniqueval)]


printlis(uniqueval)
## name 
## summary 
## space 
## description 
## neighborhood_overview 
## notes 
## transit 
## access 
## interaction 
## house_rules 
## amenities

Based on NA Values

Columns with over 50% NA values are removed without mercy.

Columns Removed:

nadf <- revenue %>% summarise_all(funs(sum(is.na(.))))
nadf <- nadf %>% gather(key = var_name, value = value, 1:ncol(nadf))
nadf$numNa <- round(nadf$value/nrow(revenue),2)
naval <- nadf %>% filter(numNa > 0.5) %>% pull(var_name)
revenue <- revenue[,!(names(revenue) %in% naval)]

printlis(naval)
## square_feet 
## weekly_price 
## monthly_price 
## security_deposit 
## jurisdiction_names

Manual Removal

For lack of better pattern, some of the columns are removed manually.

Columns Removed:

dropCol <-
  c(
  'id',
  'street',
  'city',
  'CountyName',
  'cleaning_fee',
  'guests_included',
  'extra_people',
  'review_scores_communication',
  'review_scores_checkin',
  'review_scores_cleanliness',
  'review_scores_value',
  'reviews_per_month',
  'instant_bookable',
  'cancellation_policy',
  'smart_location',
  'is_location_exact',
  'calculated_host_listings_count'
  )

revenue <- revenue[,!(names(revenue) %in% dropCol)]

printlis(dropCol)
## id 
## street 
## city 
## CountyName 
## cleaning_fee 
## guests_included 
## extra_people 
## review_scores_communication 
## review_scores_checkin 
## review_scores_cleanliness 
## review_scores_value 
## reviews_per_month 
## instant_bookable 
## cancellation_policy 
## smart_location 
## is_location_exact 
## calculated_host_listings_count

Clean Revenue Data

kable(head(revenue))  %>% kable_styling(bootstrap_options = c("striped", "hover", "responsive")) %>% scroll_box(width = "100%", height = "250px")
last_scraped neighbourhood neighbourhood_cleansed neighbourhood_group_cleansed zipcode latitude longitude property_type room_type accommodates bathrooms bedrooms beds price availability_30 availability_60 availability_90 availability_365 number_of_reviews first_review last_review review_scores_rating review_scores_accuracy review_scores_location
2017-05-03 City Island City Island Bronx 10464 40.85205 -73.78868 House Private room 2 1 1 1 99 24 54 80 170 25 2016-01-18 2017-04-23 100 10 10
2017-05-04 City Island City Island Bronx 10464 40.85349 -73.78861 Apartment Private room 4 1 1 1 200 30 60 90 180 0 NA NA NA NA NA
2017-05-04 City Island City Island Bronx 10464 40.84114 -73.78305 House Entire home/apt 4 3 3 3 300 30 60 90 365 0 NA NA NA NA NA
2017-05-05 City Island City Island Bronx 10464 40.84977 -73.78661 Apartment Entire home/apt 3 1 1 1 125 8 30 60 335 12 2015-07-04 2016-10-24 93 10 10
2017-05-04 City Island City Island Bronx 10464 40.85002 -73.78933 House Private room 4 1 1 1 69 17 47 77 352 86 2015-05-25 2017-04-22 97 10 10
2017-05-04 NA City Island Bronx 10464 40.84487 -73.78954 House Entire home/apt 2 1 0 1 125 23 33 51 129 41 2015-12-26 2017-04-27 97 10 10

Data Join

Revenue and Cost Data are merged based on common regionName/Zipcode. The real estate company has already chosen 2 Bed room homes as profitable venture.

Data is further refined to account for only 2 bed room homes. Dimension of the final data set:

final <- merge(revenue, cost, by.x = "zipcode",by.y = "RegionName")
final <- subset(final,final$bedrooms == '2')
dim(final)
## [1] 1258   28

Exploratory Data Analysis

Deep Dive: Understanding Data

We recognize that only 22 zipcodes/RegionName made it to EDA phase. Over 1258 properties to choose from in 4 distinct Neighbourhood - Manhattan, Queens,Brooklyn and Staten Island.

kable(head(final))  %>% kable_styling(bootstrap_options = c("striped", "hover", "responsive")) %>% scroll_box(width = "100%", height = "250px")
zipcode last_scraped neighbourhood neighbourhood_cleansed neighbourhood_group_cleansed latitude longitude property_type room_type accommodates bathrooms bedrooms beds price availability_30 availability_60 availability_90 availability_365 number_of_reviews first_review last_review review_scores_rating review_scores_accuracy review_scores_location CountyName SizeRank cost year
2 10003 2017-05-03 East Village East Village Manhattan 40.73133 -73.98774 Apartment Private room 4 1.0 2 2 113 0 0 0 0 306 2012-09-27 2017-02-24 96 10 10 New York 21 2005500 2017
11 10003 2017-05-04 East Village East Village Manhattan 40.73323 -73.98859 Apartment Entire home/apt 4 1.0 2 4 3750 0 0 0 0 0 NA NA NA NA NA New York 21 2005500 2017
12 10003 2017-05-04 East Village East Village Manhattan 40.72808 -73.98763 Apartment Entire home/apt 2 1.0 2 2 200 0 0 0 0 3 2016-10-03 2017-01-01 100 10 10 New York 21 2005500 2017
15 10003 2017-05-04 Gramercy Park Gramercy Manhattan 40.73797 -73.98778 Apartment Entire home/apt 3 2.0 2 2 500 4 18 48 79 1 2016-08-20 2016-08-20 100 10 10 New York 21 2005500 2017
16 10003 2017-05-04 East Village East Village Manhattan 40.72779 -73.98991 Apartment Entire home/apt 4 1.5 2 2 200 19 49 79 79 1 2017-03-26 2017-03-26 60 10 8 New York 21 2005500 2017
30 10003 2017-05-03 NA East Village Manhattan 40.72819 -73.98691 Apartment Entire home/apt 2 1.0 2 2 199 21 51 81 356 31 2015-12-06 2017-04-09 92 9 10 New York 21 2005500 2017

Dimension of Data

Starting with 95 columns in Revenue Data and 262 columns in Zillow (total of 357 columns) are reduced to 28 columns in the final data set.

dim(final)
## [1] 1258   28

Data Type Handling

Convert ‘Character’ columns to Factors to ease the process of building discrete charts.

final <- final %>% mutate_if(sapply(final,is.character),as.factor)
str(final)
## 'data.frame':    1258 obs. of  28 variables:
##  $ zipcode                     : Factor w/ 22 levels "10003","10011",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ last_scraped                : Date, format: "2017-05-03" "2017-05-04" ...
##  $ neighbourhood               : Factor w/ 49 levels "Annadale","Boerum Hill",..: 13 13 13 17 13 NA 13 13 44 13 ...
##  $ neighbourhood_cleansed      : Factor w/ 51 levels "Arrochar","Boerum Hill",..: 16 16 16 20 16 16 16 16 5 16 ...
##  $ neighbourhood_group_cleansed: Factor w/ 4 levels "Brooklyn","Manhattan",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ latitude                    : num  40.7 40.7 40.7 40.7 40.7 ...
##  $ longitude                   : num  -74 -74 -74 -74 -74 ...
##  $ property_type               : Factor w/ 6 levels "Apartment","Condominium",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ room_type                   : Factor w/ 2 levels "Entire home/apt",..: 2 1 1 1 1 1 1 1 1 1 ...
##  $ accommodates                : int  4 4 2 3 4 2 4 5 4 4 ...
##  $ bathrooms                   : num  1 1 1 2 1.5 1 1 1 3 1 ...
##  $ bedrooms                    : int  2 2 2 2 2 2 2 2 2 2 ...
##  $ beds                        : int  2 4 2 2 2 2 2 2 2 2 ...
##  $ price                       : num  113 3750 200 500 200 199 200 300 800 180 ...
##  $ availability_30             : int  0 0 0 4 19 21 0 4 0 0 ...
##  $ availability_60             : int  0 0 0 18 49 51 0 26 0 0 ...
##  $ availability_90             : int  0 0 0 48 79 81 0 56 0 0 ...
##  $ availability_365            : int  0 0 0 79 79 356 0 146 0 0 ...
##  $ number_of_reviews           : int  306 0 3 1 1 31 4 32 0 2 ...
##  $ first_review                : Date, format: "2012-09-27" NA ...
##  $ last_review                 : Date, format: "2017-02-24" NA ...
##  $ review_scores_rating        : int  96 NA 100 100 60 92 80 86 NA 90 ...
##  $ review_scores_accuracy      : int  10 NA 10 10 10 9 9 9 NA 10 ...
##  $ review_scores_location      : int  10 NA 10 10 8 10 10 10 NA 10 ...
##  $ CountyName                  : Factor w/ 4 levels "Kings","New York",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ SizeRank                    : int  21 21 21 21 21 21 21 21 21 21 ...
##  $ cost                        : num  2005500 2005500 2005500 2005500 2005500 ...
##  $ year                        : Factor w/ 1 level "2017": 1 1 1 1 1 1 1 1 1 1 ...

Corrected Price by Room Type

Price of the daily rental in Revenue data is reflective of the space that is offered and not the entire property itself. The price must be specifically corrected to account for entire property to account the benefit.

Assumption Made: If the property type == Private Room, it is multipled by number of bedrooms to account for overall price. Correction applied is returned to original price column.

final <- final %>% mutate(price = if_else(room_type == "Private room",
                          price * bedrooms,
                          price)) 

Missing Values

Nothing too alarming, major chunk of columns look okay to proceed with the analysis.

plot_missing(final)

Deep Dive: Analysis and Visualization

In order to smoothen the process of choosing the zipcode - analysis going forward accounts for a list of top 10 zipcodes defined by a condtion specific to that analysis.

The list is finally compared to choose the top zipcodes.

Zipcodes are evaluated on the following conditions

  1. Choice
  2. Cost
  3. Revenue
  4. Popularity
  5. Occupancy

Note: Few charts delve outside of zipcodes to understand the data on a broader sense.

Number of properties by neighbourhood

Manhattan hosts highest number of properties followed by Brooklyn. Queens and Staten Island are outliers as far number is concerned.

ggplot(final, aes(neighbourhood_group_cleansed)) + geom_bar() + labs(x = "Neighbourhood", y = "Number of Properties") + geom_text(stat='count', aes(label=..count..), vjust= -0.3)  + theme_classic()

Choice - Number of properties by Zipcode

Higher the number of properties , higher the number of choices reassuring higher airbnb/rental activity in the area. Zipcodes 10003 10011,10013,10014,10025,10036,11201,11215,11217,11231 have higher volume of properties and they make the top 10 based on choice.

ggplot(final, aes(zipcode, fill = neighbourhood_group_cleansed)) + geom_bar() + labs(x = "Zipcode", y = "Number of Properties") + geom_text(stat='count', aes(label=..count..), vjust= -0.3) + theme(axis.text.x = element_text(angle = 90, hjust = 1))  + theme_bw() + theme(plot.background = element_blank(),panel.grid.major = element_blank(),panel.grid.minor = element_blank(),panel.border = element_blank(),axis.text.x = element_text(angle = 90, hjust = 1))

Cost - Property cost by Zipcode

One of the primary constraint in making the choice is the cost, lower the cost - company can save a lot more and reach ROI faster .

Zipcodes in Manhattan walk away with highest property prices (My God! They are expensive) - the beast out scales rest of the neighbourhoods. Brooklyn comes second. Zipcodes in staten island and queens have comparable prices and they are far lower.

Zipcodes : 10025,10304,10305,10306,10308,11215,11217,11231,11234,11434 have lower property cost and they make the top 10 in this section.

medianCost1 <- final %>% select(zipcode,neighbourhood_group_cleansed, cost) %>% filter(neighbourhood_group_cleansed == c("Manhattan","Brooklyn"))  %>% group_by(neighbourhood_group_cleansed,zipcode) %>% summarise_all(funs(median)) %>% ggplot(aes(x = zipcode, y = cost, fill =neighbourhood_group_cleansed )) + geom_bar(stat = "identity") +scale_y_continuous(labels = scales::comma) + labs(y = "Cost", x = "Zipcode") + theme_bw() + theme(plot.background = element_blank(),panel.grid.major = element_blank(),panel.grid.minor = element_blank(),panel.border = element_blank(),axis.text.x = element_text(angle = 90, hjust = 1)) + guides(fill = guide_legend(title = "Neighbourhood"))

medianCost2 <- final %>% select(zipcode,neighbourhood_group_cleansed, cost) %>% filter(neighbourhood_group_cleansed == c("Staten Island","Queens")) %>% group_by(neighbourhood_group_cleansed,zipcode) %>% summarise_all(funs(median)) %>% ggplot(aes(x = zipcode, y = cost, fill =neighbourhood_group_cleansed )) + geom_bar(stat = "identity") +scale_y_continuous(labels = scales::comma) + labs(x = "Cost", y = "Zipcode") + theme_bw() + theme(plot.background = element_blank(),panel.grid.major = element_blank(),panel.grid.minor = element_blank(),panel.border = element_blank(),axis.text.x = element_text(angle = 90, hjust = 1)) + guides(fill = guide_legend(title = "Neighbourhood"))

cowplot::plot_grid(medianCost1, medianCost2, labels = "AUTO", ncol = 1, align = 'v')

Property type Vs Cost

In order to develop deeper understanding with the data (Just to make sure), property cost is measured across different property types: Apartment, House, Loft etc. There seems to be no clear pattern to claim a certain property type out prices another.

propTypeCost <- ggplot(final, aes(x=property_type, y=cost, fill = zipcode)) + scale_y_continuous(labels = scales::comma) + geom_point(aes(col = zipcode, size=cost)) + 
geom_smooth(method="loess", se=F) + labs( x="Property type", y="Cost") + theme(axis.text.x = element_text(angle = 90, hjust = 1))
ggplotly(propTypeCost)

Nightly Price by Neighbourhood

Beyond investment, company would look foward to make quick bucks. Higher the rental price, faster is the ROI and profits.

A general look at spread of price/Night reveals

  1. Manhattan has a wider spread with price ranging to ($50-$1000)/Night.
  2. Brooklyn has almost a perfect bell curve showing characterstics of normal real world behaviour.
  3. Staten Island and Queens have lower price and narrower distribution and this is because of limited sample size.
pricebynbghrhood <- ggplot(final,aes(x=price, fill=neighbourhood_group_cleansed)) + geom_density(alpha = 0.3) + scale_x_continuous(limits = quantile(final$price, c(0, 0.99))) + labs(x = "Price/Night", y = "Density") + guides(fill = guide_legend(title = "Neighbourhood")) 
ggplotly(pricebynbghrhood)

Revenue - Nightly Price per property by Zipcode

Going back to listing zipcodes ,prices/Night have too many close contenders.

Looking at top 10 zipcodes which fetch high price/Night are: 10011,10013,10014,10022,10023,10028,10036,11201,11217,11231

pricebyZipcode <- ggplot(final, aes(x = zipcode,y = price, fill = neighbourhood_group_cleansed)) + geom_boxplot() + scale_y_continuous(limits = quantile(final$price, c(0, 0.99))) + labs(x = "Zipcode", y = "Price/Night") + theme(axis.text.x = element_text(angle = 90, hjust = 1)) + guides(fill = guide_legend(title = "Neighbourhood")) 
ggplotly(pricebyZipcode)

Pair Wise Correlation

If a property is generously available for booking in the 30 days, it is most likely to be generously available during the rest of the year as well. There is strong positive correlation between (Availablity 30, Availablity 60,Availablity 90 & Availablity 365). The opposite is true. Booked out apartments tend to remain popular across the year.

Interestingly, Number of reviews also drive earlier bookings, which is clearly seen by it’s positive correlation with column: Availablity 365.

corCols <- c("availability_30","availability_60","availability_90","availability_365","number_of_reviews","price","cost")
plot_correlation(final[corCols], use ="pairwise.complete.obs")

Occupancy - Zipcode Vs Availability

Lower the availablity , higher the occupancy - indicating fast-filling properties.

Looking at Availablity for next 365 days, it is surprising to find a lot of properties booked ahead of time. The company can capitalize on this behavior and vary the price accordindingly to generate more revenue/profits.

Zipcodes that make top 10 from this list: 10021,10023,10025,10028,10128,10304,10312,11201,11215,11231

final %>% group_by(neighbourhood_group_cleansed,zipcode) %>% summarise_all(funs(mean)) %>% ggplot(aes(x =zipcode,y= availability_365, fill = neighbourhood_group_cleansed )) + geom_bar(stat = "identity") + scale_y_continuous(labels = scales::comma) +  scale_colour_brewer(palette = "Pastel2") + labs( y = "Availability", x = "Zipcode") + theme_bw() + theme(plot.background = element_blank(),panel.grid.major = element_blank(),panel.grid.minor = element_blank(),panel.border = element_blank(),axis.text.x = element_text(angle = 90, hjust = 1)) + guides(fill = guide_legend(title = "Neighbourhood")) 

Availability Vs Price

Stepping aside to understand availablity in terms of price, low prices don’t exactly drive faster bookings. Higher rental properties have minimal availablity (compared to next 365 days). It is satisfying to know that people are ready to pay higher ahead of time.

availablityPrice <- final %>% group_by(neighbourhood_group_cleansed,zipcode) %>% summarise_all(funs(mean)) %>% ggplot(aes(x=availability_365, y=price)) + scale_colour_brewer(palette = "Set1") + geom_point(aes(col=neighbourhood_group_cleansed, size=price)) +
labs(x="Availability", y="Price", shape="Price", colour= "Neighbourhood")

ggplotly(availablityPrice)

Review scores rating Vs Price

Are reviews driving the price ? Not likely. Trends are all over the place. Majority of the properties in are highly rated (10).

final$rating <- round(final$review_scores_rating/10,0)
final$rating <- as.factor(final$rating )

reviewPrice <- ggplot(data = subset(final, !is.na(rating)), aes(x=rating, y=price, colour = zipcode)) + geom_point(aes( size=price)) + scale_y_continuous(limits = quantile(final$price, c(0, 0.99))) + labs(x="Rating", y="Price")

ggplotly(reviewPrice)

Popularity - Number of Weeks Vs Number of Reviews by Zipcode

Popularity is measured in terms of number of reviews and it is a proponent of time. Higher reviews equate to popular properties and popular neighborhood/zipcode.

Top 10 zipcodes from the popular section are 10003,10014,10023,10025 10304,10306,10308,11201,11217,11231

numDayperReview <- final %>% mutate(diff = round(difftime(last_scraped, first_review, units = "weeks"),0)) %>% drop_na(diff) %>% group_by(neighbourhood_group_cleansed,zipcode) %>% summarise_all(funs(mean)) %>% ggplot(aes(x = number_of_reviews, y= diff, colour = zipcode)) + geom_point() + labs(x="Number Of Reviews", y="Number Of weeks")

ggplotly(numDayperReview)

Conclusion and Future Steps:

Evaluation

The goal of the analysis was to find zipcodes on 5 key metics.

  1. Choice - Have multiple homes to choose from
  2. Cost - Property Costed Less
  3. Revenue - Nightly Priced Higher
  4. Popularity - Were Popular among current Airbnb users
  5. Occupany - Have proven Occupancy/Early Bookings based on next 365 days

Top 10 zipcodes by individual Metrics

Top 10 zipcodes are selected for each of these metrics based on the analysis and the final table looks like one below. Zipcodes that managed to make it to 3 or more metrics are highlighted. Individual zipcodes are given different colors to create a visual bifercation.

Top zipcodes fitting three or more metrics

The list is further reduced to top 8 zipcodes based on number of matches.

Final Conclusion

  1. Zipcode 11231 from Brooklyn is the top zipcode. It fits all the 5 key metrics perfectly.
  2. Zipcode 11201 fit all metrics except Cost. Within the zipcode there are properties that cost much lower , given the number of choice is high - the company can explictly look at low end properties.
  3. Zipcode 11217 fit all metrics except occpancy. Since it is a 365 days availablity cycle that it is measured against. Customers are highly likely to book and the zipcode might make its way into top 10 as we go forward.
  4. Zipcodes 10014,10023,10304,11215,10025 fit 3 key metrics at various degrees. The company can look outside of Brooklyn and look at Manhattan and Staten Island to grab some of the properties there. This would give the company a certain coverage across the New York City.

Future Steps

  1. New York hosts 176 zipcodes, data can be further enriched to account for rest of the zipcodes. This would give the company

  2. Cost and price comparision would yield one-dimensional results. Introduce Math and calculate ROI using individual metrics defined. Since various factor define profitablity.
  3. Text Analytics / Sentiment Analytics on ignored Description columns from Revenue Data (Airbnb). This would open insights about other metrics that drive customer to book an airbnb property for rental such as Access to Public Transport, parking space etc.

  4. Introduce Seasonality and Weather data to understand trends occupancy throughout the year. Can be further extended to statistical models to estimate occupancy.

  5. Due to time constraints - some of the coding practices such as memory management, variable nomenclature and other markdown specific functionalities were ignored. This would be automatic first step in the future scope of work.