For this purpose, publicly available data from Zillow and AirBnB were used.
For this purpose of getting the top zipcodes there are some key metrics that I’ll base my analysis on. Some of these metrics are assumptions whereas the others are researched based on information on property evaluation.
Reviews: Number of reviews on a property depict a lot about the property. More number of reviews mean the property has been booked more often as compared to properties with fewer number of reviews. A zipcode that comprises of properties with more reviews has high engagemnent with users and more probable to be booked in future.
Occupancy Rate: It is the percentage of occupancy of the property in a given month/year. Greather the availability , lesser is the occupancy. More occupancy will A property that is occupied 365 days in a year has a occupancy rate of 100% whereas a property.
Annual Return: It is the revenue generated from the property on a yearly basis. For the purpose of this analysis, the annual return is calculated by multiplying the price per night, yearly occupancy rate and 365 days. (price/night* yearly_occupancy * 365)
Rent Ratio: It’s the annual rent collected from the property divided by the price of the property. The higher the rent ratio the better. It means either that the property is bought at a cheap rate or the rent is higher. In either of these cases, the ivestor is at gain.
Break Even Time: Break-even time represents the amount of time it takes for an investment to make back its original cost. It is caluclated here by dividing the original cost of the proerty by annual rent generated by the property. But since the rent for the property will not be same every year, an increase of 10% in the rent genrated yearly is considered. So lesser is the break even time, the better it is for the investor since after the break event time, most of the revenue will be a profit for the investor and properties with high break even time will take longer to even out the expenes.
Most of zipcodes in Manhattan and Brooklyn have higher property cost. Since these zipcodes are also popular in terms of tourism they make some excellent choice for investment.
Price and Availablity are not inveresely proportional. Customers are not hestitant to pay higher price ahead of time. The company can capitalize this behavior into their pricing scheme.
Staten Island and Queens Neighbourhood have limited choices but it also gets fewer guests. In case the company wants to diversify, they should pick the top zips from different neighbourhoods to minimize risk.
Loading the packages and installing them in case they are not present already.
## Check for package versions and update them accordingly
##https://gist.github.com/benmarwick/5054846
list.of.packages <- c("tidyverse", "DataExplorer","corrplot","caret","nnet","readr","tidyr","ggplot2","plotly","kableExtra") #
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)
lapply(new.packages, require, character.only=T)
library(naniar)
library(tidyverse)
library(DataExplorer)
library(corrplot)
library(caret)
library(nnet)
library(readr)
library(tidyr)
library(ggplot2)
library(plotly)
library(kableExtra)| RegionID | RegionName | City | State | Metro | CountyName | SizeRank | 1996-04 | 1996-05 | 1996-06 | 1996-07 | 1996-08 | 1996-09 | 1996-10 | 1996-11 | 1996-12 | 1997-01 | 1997-02 | 1997-03 | 1997-04 | 1997-05 | 1997-06 | 1997-07 | 1997-08 | 1997-09 | 1997-10 | 1997-11 | 1997-12 | 1998-01 | 1998-02 | 1998-03 | 1998-04 | 1998-05 | 1998-06 | 1998-07 | 1998-08 | 1998-09 | 1998-10 | 1998-11 | 1998-12 | 1999-01 | 1999-02 | 1999-03 | 1999-04 | 1999-05 | 1999-06 | 1999-07 | 1999-08 | 1999-09 | 1999-10 | 1999-11 | 1999-12 | 2000-01 | 2000-02 | 2000-03 | 2000-04 | 2000-05 | 2000-06 | 2000-07 | 2000-08 | 2000-09 | 2000-10 | 2000-11 | 2000-12 | 2001-01 | 2001-02 | 2001-03 | 2001-04 | 2001-05 | 2001-06 | 2001-07 | 2001-08 | 2001-09 | 2001-10 | 2001-11 | 2001-12 | 2002-01 | 2002-02 | 2002-03 | 2002-04 | 2002-05 | 2002-06 | 2002-07 | 2002-08 | 2002-09 | 2002-10 | 2002-11 | 2002-12 | 2003-01 | 2003-02 | 2003-03 | 2003-04 | 2003-05 | 2003-06 | 2003-07 | 2003-08 | 2003-09 | 2003-10 | 2003-11 | 2003-12 | 2004-01 | 2004-02 | 2004-03 | 2004-04 | 2004-05 | 2004-06 | 2004-07 | 2004-08 | 2004-09 | 2004-10 | 2004-11 | 2004-12 | 2005-01 | 2005-02 | 2005-03 | 2005-04 | 2005-05 | 2005-06 | 2005-07 | 2005-08 | 2005-09 | 2005-10 | 2005-11 | 2005-12 | 2006-01 | 2006-02 | 2006-03 | 2006-04 | 2006-05 | 2006-06 | 2006-07 | 2006-08 | 2006-09 | 2006-10 | 2006-11 | 2006-12 | 2007-01 | 2007-02 | 2007-03 | 2007-04 | 2007-05 | 2007-06 | 2007-07 | 2007-08 | 2007-09 | 2007-10 | 2007-11 | 2007-12 | 2008-01 | 2008-02 | 2008-03 | 2008-04 | 2008-05 | 2008-06 | 2008-07 | 2008-08 | 2008-09 | 2008-10 | 2008-11 | 2008-12 | 2009-01 | 2009-02 | 2009-03 | 2009-04 | 2009-05 | 2009-06 | 2009-07 | 2009-08 | 2009-09 | 2009-10 | 2009-11 | 2009-12 | 2010-01 | 2010-02 | 2010-03 | 2010-04 | 2010-05 | 2010-06 | 2010-07 | 2010-08 | 2010-09 | 2010-10 | 2010-11 | 2010-12 | 2011-01 | 2011-02 | 2011-03 | 2011-04 | 2011-05 | 2011-06 | 2011-07 | 2011-08 | 2011-09 | 2011-10 | 2011-11 | 2011-12 | 2012-01 | 2012-02 | 2012-03 | 2012-04 | 2012-05 | 2012-06 | 2012-07 | 2012-08 | 2012-09 | 2012-10 | 2012-11 | 2012-12 | 2013-01 | 2013-02 | 2013-03 | 2013-04 | 2013-05 | 2013-06 | 2013-07 | 2013-08 | 2013-09 | 2013-10 | 2013-11 | 2013-12 | 2014-01 | 2014-02 | 2014-03 | 2014-04 | 2014-05 | 2014-06 | 2014-07 | 2014-08 | 2014-09 | 2014-10 | 2014-11 | 2014-12 | 2015-01 | 2015-02 | 2015-03 | 2015-04 | 2015-05 | 2015-06 | 2015-07 | 2015-08 | 2015-09 | 2015-10 | 2015-11 | 2015-12 | 2016-01 | 2016-02 | 2016-03 | 2016-04 | 2016-05 | 2016-06 | 2016-07 | 2016-08 | 2016-09 | 2016-10 | 2016-11 | 2016-12 | 2017-01 | 2017-02 | 2017-03 | 2017-04 | 2017-05 | 2017-06 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 61639 | 10025 | New York | NY | New York | New York | 1 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 798600 | 798800 | 801500 | 804600 | 814900 | 828300 | 835700 | 849300 | 858100 | 854000 | 834800 | 821700 | 830300 | 853700 | 868300 | 875200 | 882200 | 892400 | 905000 | 924000 | 934400 | 932100 | 927500 | 923600 | 907900 | 890900 | 883400 | 896100 | 923900 | 952900 | 964600 | 972500 | 973800 | 973400 | 966500 | 966800 | 967100 | 974800 | 976800 | 976100 | 973700 | 974500 | 973200 | 966400 | 950400 | 933300 | 920900 | 909400 | 891400 | 873300 | 858800 | 850200 | 842800 | 834000 | 828800 | 821400 | 813900 | 813300 | 821500 | 831700 | 845100 | 854500 | 858900 | 859200 | 863500 | 876000 | 886100 | 890000 | 894200 | 901800 | 909500 | 913300 | 907400 | 900000 | 897700 | 896300 | 892300 | 890400 | 888600 | 891700 | 899500 | 904400 | 908200 | 914000 | 915100 | 912300 | 914000 | 921100 | 923300 | 917300 | 915000 | 922800 | 929100 | 937700 | 955700 | 974200 | 995500 | 1019500 | 1035100 | 1054900 | 1079900 | 1092600 | 1103500 | 1118800 | 1139300 | 1154600 | 1144100 | 1120300 | 1125500 | 1136000 | 1135100 | 1130000 | 1138200 | 1153700 | 1174800 | 1185400 | 1188400 | 1189700 | 1193700 | 1199900 | 1201400 | 1202600 | 1214200 | 1235200 | 1258000 | 1287700 | 1307200 | 1313900 | 1317100 | 1327400 | 1338800 | 1350400 | 1356600 | 1358500 | 1364000 | 1373300 | 1382600 | 1374400 | 1364100 | 1366300 | 1354800 | 1327500 | 1317300 | 1333700 | 1352100 | 1390000 | 1431000 |
| 84654 | 60657 | Chicago | IL | Chicago | Cook | 2 | 167700 | 166400 | 166700 | 167200 | 166900 | 166900 | 168000 | 170100 | 171700 | 173000 | 174600 | 177600 | 180100 | 182300 | 184400 | 186300 | 187600 | 189400 | 190300 | 189700 | 189800 | 191900 | 194500 | 195500 | 196000 | 196900 | 198900 | 201400 | 204600 | 207900 | 211800 | 214600 | 216000 | 217500 | 220200 | 222800 | 226200 | 229600 | 232400 | 234400 | 236300 | 238300 | 241800 | 246100 | 249500 | 251300 | 253200 | 255700 | 259200 | 263100 | 266600 | 269500 | 272800 | 275500 | 278800 | 283400 | 288600 | 291300 | 292400 | 294600 | 297100 | 298200 | 299800 | 302000 | 304200 | 307900 | 311000 | 311400 | 311000 | 311700 | 312300 | 312000 | 311800 | 312600 | 313000 | 314400 | 317300 | 319700 | 320500 | 321000 | 321600 | 323800 | 326100 | 329000 | 332200 | 334700 | 336000 | 337300 | 337500 | 337100 | 334900 | 333100 | 332800 | 333400 | 335100 | 337800 | 339400 | 340700 | 343200 | 345000 | 345200 | 344600 | 344500 | 346200 | 349800 | 353400 | 355100 | 356300 | 358300 | 359500 | 359200 | 358500 | 359100 | 361400 | 364700 | 367500 | 369400 | 369800 | 369600 | 368700 | 368100 | 369000 | 370300 | 371700 | 374200 | 375700 | 376400 | 378200 | 379800 | 380100 | 380400 | 381200 | 382700 | 382700 | 380200 | 377800 | 376300 | 375600 | 376000 | 376200 | 375800 | 376300 | 377200 | 376800 | 373700 | 370100 | 368700 | 368600 | 366600 | 362200 | 358600 | 355300 | 352300 | 350900 | 350100 | 347900 | 345400 | 343400 | 342800 | 342600 | 341100 | 339900 | 338900 | 338200 | 335200 | 329800 | 325500 | 323600 | 323400 | 325000 | 325800 | 323200 | 320100 | 318600 | 317400 | 315700 | 315000 | 315300 | 315600 | 313900 | 309800 | 305700 | 301800 | 299500 | 299900 | 301100 | 300300 | 298900 | 298500 | 298500 | 297000 | 296800 | 298700 | 299600 | 300700 | 303900 | 306800 | 307500 | 308500 | 310000 | 310800 | 311200 | 313000 | 315800 | 319000 | 323400 | 327500 | 330000 | 331800 | 334500 | 336000 | 335700 | 335400 | 336300 | 338800 | 342400 | 344400 | 344000 | 343900 | 345100 | 346100 | 346900 | 348000 | 349700 | 351200 | 351700 | 350700 | 350400 | 352000 | 354300 | 355900 | 356500 | 355200 | 353800 | 353700 | 354600 | 356200 | 357800 | 358200 | 358500 | 360300 | 362400 | 363700 | 365200 | 367100 | 368600 | 370200 | 372300 | 375300 | 378700 | 381400 | 381800 | 382100 | 383300 | 385100 |
| 61637 | 10023 | New York | NY | New York | New York | 3 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 1526800 | 1424500 | 1346600 | 1331300 | 1322500 | 1289300 | 1265400 | 1249700 | 1241100 | 1232700 | 1225500 | 1228200 | 1252600 | 1266100 | 1288700 | 1308100 | 1333000 | 1356400 | 1362000 | 1353600 | 1364000 | 1373900 | 1389600 | 1401600 | 1404100 | 1415800 | 1432400 | 1455400 | 1474200 | 1462300 | 1438300 | 1435500 | 1427800 | 1411200 | 1407400 | 1419700 | 1457400 | 1500800 | 1524900 | 1537800 | 1558700 | 1586100 | 1602300 | 1621100 | 1639300 | 1657400 | 1657400 | 1656100 | 1649400 | 1643400 | 1632400 | 1618200 | 1588300 | 1543600 | 1500800 | 1464200 | 1426100 | 1387300 | 1362600 | 1351700 | 1344300 | 1331800 | 1334800 | 1314200 | 1271900 | 1252300 | 1262300 | 1279200 | 1309000 | 1335300 | 1353800 | 1366400 | 1372100 | 1381300 | 1385000 | 1388100 | 1399100 | 1399800 | 1389300 | 1384700 | 1380900 | 1367900 | 1365400 | 1375100 | 1380400 | 1377000 | 1375100 | 1379000 | 1395200 | 1414500 | 1419000 | 1403100 | 1383200 | 1376700 | 1378200 | 1378700 | 1375900 | 1366700 | 1365500 | 1382200 | 1404700 | 1428000 | 1445700 | 1452900 | 1460100 | 1484400 | 1508400 | 1522800 | 1538300 | 1568600 | 1597400 | 1622900 | 1654300 | 1684600 | 1713000 | 1728800 | 1736100 | 1745900 | 1753800 | 1736600 | 1730400 | 1734500 | 1728700 | 1720800 | 1717700 | 1700100 | 1680400 | 1676400 | 1685600 | 1708100 | 1730400 | 1751800 | 1778300 | 1810400 | 1831600 | 1844400 | 1861600 | 1889600 | 1901500 | 1895300 | 1890200 | 1898400 | 1924500 | 1967300 | 1993500 | 1980700 | 1960900 | 1951300 | 1937800 | 1929800 | 1955000 | 2022400 | 2095000 | 2142300 |
| 84616 | 60614 | Chicago | IL | Chicago | Cook | 4 | 195800 | 193500 | 192600 | 192300 | 192600 | 193600 | 195500 | 197600 | 199400 | 201300 | 203600 | 206500 | 209200 | 211100 | 212600 | 214400 | 215600 | 216500 | 217900 | 220100 | 222200 | 223900 | 225400 | 227700 | 230100 | 231700 | 232700 | 233700 | 234700 | 235600 | 236800 | 238800 | 240800 | 242400 | 243800 | 246400 | 250200 | 254300 | 257600 | 261100 | 264800 | 267900 | 270700 | 272800 | 274400 | 276200 | 278600 | 280100 | 283100 | 287700 | 293600 | 298500 | 302700 | 305000 | 306800 | 309400 | 313100 | 314900 | 316200 | 318200 | 320600 | 322900 | 325500 | 328400 | 330700 | 332800 | 334400 | 335900 | 337400 | 339700 | 342300 | 343800 | 343400 | 342300 | 341800 | 341700 | 342400 | 344300 | 346900 | 348900 | 350200 | 351700 | 353500 | 355700 | 358000 | 361600 | 364000 | 365500 | 366400 | 367000 | 365200 | 363100 | 362000 | 362500 | 364800 | 368200 | 370800 | 371400 | 371200 | 371800 | 374200 | 377800 | 380100 | 381700 | 384400 | 386600 | 385700 | 385400 | 388500 | 392200 | 394500 | 394900 | 394800 | 395500 | 400000 | 404300 | 407600 | 409500 | 410400 | 408700 | 406400 | 403800 | 402400 | 402300 | 403000 | 403100 | 403600 | 404500 | 406100 | 407700 | 408700 | 409600 | 409800 | 408700 | 408600 | 411000 | 412200 | 411400 | 410200 | 408400 | 406600 | 406500 | 406400 | 404600 | 402900 | 400600 | 397500 | 392600 | 387500 | 383700 | 381000 | 378900 | 377700 | 377400 | 376700 | 374000 | 369600 | 366200 | 364500 | 364100 | 364300 | 363600 | 361900 | 361900 | 356400 | 347100 | 342400 | 344000 | 345200 | 346900 | 346100 | 342600 | 340100 | 339900 | 338600 | 335500 | 333900 | 335000 | 336000 | 334200 | 330300 | 327000 | 326000 | 326100 | 326700 | 326300 | 324400 | 322700 | 323200 | 322800 | 320700 | 319500 | 320100 | 320500 | 321800 | 323600 | 324300 | 324100 | 324700 | 326000 | 327600 | 329800 | 332600 | 336800 | 342300 | 348100 | 353600 | 358900 | 361900 | 363900 | 366200 | 368300 | 369800 | 371400 | 372400 | 373200 | 373800 | 374800 | 376200 | 376800 | 376300 | 374900 | 373800 | 373900 | 374700 | 375300 | 375000 | 374700 | 376300 | 378100 | 378000 | 377700 | 378300 | 380000 | 383100 | 385900 | 388100 | 389700 | 391800 | 393400 | 394700 | 394900 | 395700 | 396400 | 397500 | 398900 | 401200 | 403200 | 405700 | 408300 | 408800 | 408000 | 410100 | 412200 | 412200 |
| 93144 | 79936 | El Paso | TX | El Paso | El Paso | 5 | 59100 | 60500 | 60900 | 60800 | 60300 | 60400 | 61200 | 61700 | 61000 | 60100 | 59300 | 59000 | 58700 | 58400 | 58000 | 57800 | 57900 | 57800 | 57800 | 58100 | 58400 | 58700 | 59200 | 59400 | 58700 | 58100 | 57900 | 57900 | 57700 | 57600 | 57500 | 57800 | 58000 | 58000 | 57900 | 57800 | 57800 | 58000 | 58500 | 58700 | 59000 | 59200 | 59300 | 59500 | 59900 | 60300 | 60400 | 60300 | 60300 | 60000 | 59500 | 59400 | 59800 | 59900 | 59700 | 59500 | 59400 | 59500 | 59700 | 59700 | 59200 | 58700 | 58400 | 57800 | 57000 | 56700 | 56800 | 56600 | 56500 | 56500 | 56600 | 56700 | 56600 | 56700 | 57000 | 57300 | 57300 | 57300 | 57100 | 56900 | 56900 | 57000 | 56700 | 56700 | 57000 | 57400 | 57700 | 58000 | 58300 | 58600 | 58800 | 59100 | 59500 | 60200 | 61100 | 61700 | 62400 | 63100 | 63600 | 63800 | 64400 | 64900 | 65100 | 65200 | 65300 | 65600 | 66300 | 67100 | 67900 | 68800 | 69600 | 70600 | 71500 | 72100 | 72600 | 73400 | 74200 | 74500 | 75000 | 75700 | 76400 | 77100 | 77700 | 78100 | 78700 | 79600 | 80600 | 81500 | 82300 | 83200 | 83900 | 84400 | 84300 | 84400 | 85200 | 86100 | 86800 | 87400 | 87700 | 87800 | 88300 | 88800 | 88500 | 87900 | 87500 | 86600 | 85500 | 84600 | 84200 | 83900 | 83600 | 83400 | 83500 | 84100 | 84600 | 84600 | 84700 | 84600 | 84600 | 85100 | 85500 | 85800 | 86600 | 87600 | 86500 | 84200 | 83300 | 83800 | 83800 | 83600 | 83600 | 83400 | 82900 | 82400 | 82200 | 82100 | 82300 | 82800 | 82900 | 82600 | 82700 | 83100 | 83300 | 83100 | 82800 | 82500 | 82300 | 82200 | 82300 | 82200 | 81900 | 81700 | 82100 | 82600 | 82900 | 83000 | 83000 | 82900 | 82100 | 81200 | 80800 | 80700 | 81200 | 81800 | 81800 | 81400 | 81400 | 81500 | 81900 | 82000 | 81900 | 81900 | 82100 | 82100 | 81500 | 80800 | 80300 | 80100 | 80100 | 80700 | 81200 | 81700 | 81900 | 81700 | 81500 | 81700 | 81700 | 80900 | 81000 | 81500 | 81400 | 80500 | 80000 | 80100 | 80500 | 80800 | 81400 | 82300 | 82600 | 82600 | 82500 | 82500 | 82600 | 82700 | 82600 | 82400 | 82300 | 82400 | 82300 | 82500 | 83200 | 83900 | 84100 | 83900 | 83700 |
| 84640 | 60640 | Chicago | IL | Chicago | Cook | 6 | 123300 | 122600 | 122000 | 121500 | 120900 | 120600 | 120900 | 121300 | 121600 | 122100 | 122900 | 124200 | 125300 | 126100 | 126700 | 127900 | 129300 | 130400 | 131300 | 131700 | 132300 | 133500 | 134500 | 134800 | 135200 | 135500 | 136300 | 137700 | 139600 | 141600 | 143400 | 144500 | 145600 | 147400 | 149200 | 149900 | 150500 | 151700 | 152800 | 153900 | 156400 | 159400 | 161800 | 163800 | 165700 | 167100 | 168400 | 169600 | 171000 | 172400 | 174800 | 177900 | 180400 | 182300 | 184600 | 187700 | 190700 | 193100 | 196100 | 200000 | 202900 | 205500 | 207600 | 208600 | 209200 | 210000 | 210200 | 211300 | 213000 | 214100 | 215200 | 217600 | 220200 | 222400 | 224600 | 227000 | 228600 | 230100 | 232400 | 234300 | 235300 | 236200 | 237000 | 237900 | 239200 | 241300 | 243600 | 244200 | 243800 | 244500 | 245500 | 245700 | 246400 | 247500 | 248000 | 249000 | 251100 | 252900 | 253700 | 256100 | 259400 | 261300 | 262200 | 263800 | 265300 | 267600 | 270500 | 272800 | 273700 | 273600 | 273200 | 273300 | 273500 | 273800 | 274300 | 274400 | 274400 | 275300 | 276600 | 278200 | 280600 | 283100 | 284400 | 284200 | 283000 | 282000 | 282400 | 283700 | 285200 | 286700 | 286700 | 284900 | 283300 | 282700 | 282100 | 280800 | 280300 | 280200 | 280100 | 279900 | 281200 | 282300 | 282100 | 279300 | 275800 | 273100 | 272600 | 271800 | 270300 | 268600 | 266900 | 264800 | 262900 | 260700 | 258500 | 256600 | 255000 | 252900 | 250600 | 248900 | 248900 | 250100 | 250300 | 250400 | 248000 | 243800 | 240200 | 239700 | 239000 | 238200 | 236800 | 235400 | 233300 | 231300 | 230300 | 228200 | 225200 | 223100 | 222300 | 220700 | 218300 | 215900 | 213200 | 210900 | 209600 | 208200 | 205900 | 204000 | 203200 | 202500 | 200800 | 199400 | 199900 | 201900 | 204500 | 207000 | 208100 | 207100 | 205300 | 204700 | 204700 | 205800 | 208600 | 211800 | 213500 | 213800 | 215100 | 218400 | 221500 | 223900 | 226100 | 227900 | 229100 | 230200 | 230600 | 230400 | 230000 | 229000 | 227600 | 226100 | 225700 | 226200 | 226500 | 226500 | 227100 | 227800 | 229400 | 231800 | 234100 | 235400 | 235100 | 233900 | 233700 | 235300 | 237200 | 238500 | 239300 | 239600 | 239500 | 240200 | 242700 | 244900 | 247700 | 249500 | 248800 | 247000 | 247300 | 248700 | 250800 | 252800 | 253800 | 253800 | 253400 | 254100 | 255100 |
#selecting the desired city
filter_city <- 'New York'
#selecting the desired number of bedrooms
filter_bedroom <- 2
start_date <- '2014'
end_date <- '2017'
head(zillow)Filtering only the rows that have City as New York. Checking the column years for null values and dropping the columns with missing values
zillow<- zillow %>% filter(City==filter_city) %>% select(RegionID,RegionName,City,State,Metro,CountyName,SizeRank,'1996-04':'2017-06')
# Filtering zillow data based on parameter filter_city
head(zillow)
colSums(is.na(zillow))
cond1 <- sapply(zillow, function(col) sum(is.na(col)) == 0)
zillow<-zillow[,cond1,drop=T]Removing the year 2007 columns since it is starting with mid year
We have data for every months from 2008 to 2017. It means there are (12*10) columns of price data for the past 10 years, which is cumbersome to process and analyse. Median is chosen as opposed to Mean to guard against large outlier values which may lead to bias in analysis.
Process: 1) Calculate median price of each individual zipcodes across the years from 2008 to 2017. 2) Asign the median price calculated to the respective zipcodes.
zillow<-gather(zillow,key='year',value='price','2008-01':'2017-06')
zillow$year<-substr(zillow$year,1,4)
zillow<-zillow[,-1]
zillow2<-zillow %>% group_by(RegionName,year) %>% mutate(median_price = median(price)) %>% ungroup()
zillow2<-zillow2[,-3]
zillow2 %>% arrange(RegionName)## # A tibble: 2,850 x 3
## RegionName year median_price
## <chr> <chr> <dbl>
## 1 10003 2008 1532950
## 2 10003 2008 1532950
## 3 10003 2008 1532950
## 4 10003 2008 1532950
## 5 10003 2008 1532950
## 6 10003 2008 1532950
## 7 10003 2008 1532950
## 8 10003 2008 1532950
## 9 10003 2008 1532950
## 10 10003 2008 1532950
## # … with 2,840 more rows
We now have median price for every zipcode for every year from 2008 to 2017 i.e for 25 zipcodes and 10 years.Plotting the distribuition of median price over the past 10 years (2008-2017) to see the trend in the real estate market.
zillow2$RegionName <- as.factor(zillow2$RegionName)
zipcodemedianprice<-ggplot(zillow2,aes(
x = year,
y = median_price,
group = RegionName,
colour = RegionName
)) + geom_line() + geom_point() + scale_y_continuous(labels = scales::comma)+theme(legend.title = element_blank(),panel.grid.major = element_blank(),panel.grid.minor = element_blank(),panel.background = element_blank(),axis.line = element_line(colour = "black"))
ggplotly(zipcodemedianprice)As seen from the plot above, the prices of houses are fluctuating for some zipcodes. While for other zipcodes it has been consistent moslty. So it’s better not consider these years for the analysis and start with 2014 to get less variability in prices. Dropped the years from 2008 to 2013.
The dataset has latest prices for year 2017, but for the analysis in 2019 I’ll try to predict the median for the year 2019. The approach is to calculate the growth for every consecutive year and then take an average of it. And this average growth per year is used to calculate the prices for year 2019.
The columns year1,year2 and year3 shows the relative increase in the prices of house from the previous year. The avg_growth column is the average of year1,year2 and year3. 2018 and 2019 shows the predicted prices for year 2018 and 2019.
zillow2<- zillow2 %>% mutate(year1 = `2015`/`2014`) %>%
mutate(year2 = `2016`/`2015`) %>%
mutate(year3 = `2017`/`2016`) %>%
mutate(avg_growth = (year1+year2+year3)/3) %>%
mutate(`2018` = avg_growth * `2017`) %>%
mutate(`2019` = avg_growth * `2018`)
zillow2<-zillow2[,-c(2:10)]
zillow2<-gather(zillow2,key='year',value='price', '2019')
head(zillow2)## # A tibble: 6 x 3
## RegionName year price
## <fct> <chr> <dbl>
## 1 10003 2019 2208878.
## 2 10011 2019 2601297.
## 3 10013 2019 3320756.
## 4 10014 2019 2776339.
## 5 10021 2019 1905416.
## 6 10022 2019 1988830.
Removing the columns we don’t need and renaming the column 2019 to price, which is the predicted price for year 2019. This is the final cleansed cost data that is used further in this analysis.
Plotting median price(predicted) of properties in 2019 by zipcodes.
medianpriceplot<-ggplot(zillow2 %>% arrange(desc(price)) %>% mutate(zipcode=factor(RegionName, levels=RegionName)),aes(x = as.factor(RegionName),y = price)) + geom_col(fill="#339999") + scale_y_continuous(labels = scales::comma) +labs(x = "Zipcodes", y = "Price of properties")+theme(axis.text.x = element_text(angle = 90, hjust = 1),legend.title = element_blank(),panel.grid.major = element_blank(),panel.grid.minor = element_blank(),panel.background = element_blank(),axis.line = element_line(colour = "black"))
ggplotly(medianpriceplot)| id | listing_url | scrape_id | last_scraped | name | summary | space | description | experiences_offered | neighborhood_overview | notes | transit | access | interaction | house_rules | thumbnail_url | medium_url | picture_url | xl_picture_url | host_id | host_url | host_name | host_since | host_location | host_about | host_response_time | host_response_rate | host_acceptance_rate | host_is_superhost | host_thumbnail_url | host_picture_url | host_neighbourhood | host_listings_count | host_total_listings_count | host_verifications | host_has_profile_pic | host_identity_verified | street | neighbourhood | neighbourhood_cleansed | neighbourhood_group_cleansed | city | state | zipcode | market | smart_location | country_code | country | latitude | longitude | is_location_exact | property_type | room_type | accommodates | bathrooms | bedrooms | beds | bed_type | amenities | square_feet | price | weekly_price | monthly_price | security_deposit | cleaning_fee | guests_included | extra_people | minimum_nights | maximum_nights | minimum_minimum_nights | maximum_minimum_nights | minimum_maximum_nights | maximum_maximum_nights | minimum_nights_avg_ntm | maximum_nights_avg_ntm | calendar_updated | has_availability | availability_30 | availability_60 | availability_90 | availability_365 | calendar_last_scraped | number_of_reviews | number_of_reviews_ltm | first_review | last_review | review_scores_rating | review_scores_accuracy | review_scores_cleanliness | review_scores_checkin | review_scores_communication | review_scores_location | review_scores_value | requires_license | license | jurisdiction_names | instant_bookable | is_business_travel_ready | cancellation_policy | require_guest_profile_picture | require_guest_phone_verification | calculated_host_listings_count | calculated_host_listings_count_entire_homes | calculated_host_listings_count_private_rooms | calculated_host_listings_count_shared_rooms | reviews_per_month |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2539 | https://www.airbnb.com/rooms/2539 | 2.019071e+13 | 2019-07-09 | Clean & quiet apt home by the park | Renovated apt home in elevator building. | Spacious, renovated, and clean apt home, one block to F train, 25 minutes to lower Manhatten | Renovated apt home in elevator building. Spacious, renovated, and clean apt home, one block to F train, 25 minutes to lower Manhatten Close to Prospect Park and Historic Ditmas Park Very close to F and G trains and Express bus into NY. The B and Q are closeby also. If this room is unavailable on your desired dates, check out our other rooms, such as: https://www.airbnb.com/rooms/10267242 | none | Close to Prospect Park and Historic Ditmas Park | If this room is unavailable on your desired dates, check out our other rooms, such as: https://www.airbnb.com/rooms/10267242 | Very close to F and G trains and Express bus into NY. The B and Q are closeby also. | NA | NA | -The security and comfort of all our guests is important to us! Therefore, no one is permitted to check in without first emailing a clear government issued photo ID, acceptable to manager. Instructions will be provided in the house manual. -No eating, drinking or storage of food in the room. -No smoking. -Illicit drug use is strictly forbidden. -Quiet hours after 10pm and before 8am, This is respectful for our neighbors and other guests. -Please clean up after yourself when using the kitchen and bath. -Please lock the doors and close the windows when exiting the home. -Please indicate and pay for the correct number of guests. Failure to do so will cancel your reservation with no refund due. -Please remove your shoes when entering the apt home. Thanks for choosing our home! | NA | NA | https://a0.muscache.com/im/pictures/3949d073-a02e-4ebc-aa9c-ac74f00eaa1f.jpg?aki_policy=large | NA | 2787 | https://www.airbnb.com/users/show/2787 | John | 2008-09-07 | New York, New York, United States | Educated professional living in Brooklyn. I love meeting new people, running, hiking, fine foods, traveling, etc. One of my favorite trips was spending New Year’s Eve in London on the Thames River. Big Ben, spectacular fireworks and light show; and fun times with a good crowd of international tourists. A most memorable night and trip! Also, I generally approach life with a positive attitude. I look forward to meeting you. | within an hour | 100% | N/A | FALSE | https://a0.muscache.com/im/pictures/8674565a-758d-476b-a580-4d99ea9baab9.jpg?aki_policy=profile_small | https://a0.muscache.com/im/pictures/8674565a-758d-476b-a580-4d99ea9baab9.jpg?aki_policy=profile_x_medium | Gravesend | 6 | 6 | [‘email’, ‘phone’, ‘reviews’, ‘kba’] | TRUE | TRUE | Brooklyn , NY, United States | Brooklyn | Kensington | Brooklyn | Brooklyn | NY | 11218 | New York | Brooklyn , NY | US | United States | 40.64749 | -73.97237 | FALSE | Apartment | Private room | 2 | 1 | 1 | 1 | Real Bed | {TV,“Cable TV”,Internet,Wifi,“Wheelchair accessible”,Kitchen,“Free parking on premises”,Elevator,“Free street parking”,“Buzzer/wireless intercom”,Heating,“Suitable for events”,Washer,Dryer,“Smoke detector”,“Carbon monoxide detector”,“First aid kit”,“Safety card”,“Fire extinguisher”,Essentials,Shampoo,“24-hour check-in”,Hangers,“Hair dryer”,Iron,“Laptop friendly workspace”,“translation missing: en.hosting_amenity_49”,“translation missing: en.hosting_amenity_50”,“Self check-in”,Keypad,“Outlet covers”,“Hot water”,“Bed linens”,“Extra pillows and blankets”,Microwave,“Coffee maker”,Refrigerator,“Dishes and silverware”,“Cooking basics”,Oven,Stove,“Luggage dropoff allowed”,“Long term stays allowed”,“Cleaning before checkout”} | NA | $149.00 | $299.00 | $999.00 | $100.00 | $25.00 | 1 | $35.00 | 1 | 730 | 1 | 1 | 730 | 730 | 1 | 730 | 3 weeks ago | TRUE | 30 | 60 | 90 | 365 | 2019-07-09 | 9 | 2 | 2015-12-04 | 2018-10-19 | 98 | 10 | 10 | 10 | 10 | 10 | 10 | FALSE | NA | NA | FALSE | FALSE | moderate | FALSE | FALSE | 6 | 0 | 5 | 1 | 0.21 |
| 2595 | https://www.airbnb.com/rooms/2595 | 2.019071e+13 | 2019-07-09 | Skylit Midtown Castle | Find your romantic getaway to this beautiful, spacious skylit studio in the heart of Midtown, Manhattan. STUNNING SKYLIT STUDIO / 1 BED + SINGLE / FULL BATH / FULL KITCHEN / FIREPLACE / CENTRALLY LOCATED / WiFi + APPLE TV / SHEETS + TOWELS |
|
Find your romantic getaway to this beautiful, spacious skylit studio in the heart of Midtown, Manhattan. STUNNING SKYLIT STUDIO / 1 BED + SINGLE / FULL BATH / FULL KITCHEN / FIREPLACE / CENTRALLY LOCATED / WiFi + APPLE TV / SHEETS + TOWELS - Spacious (500+ft²), immaculate and nicely furnished & designed studio. - Tuck yourself into the ultra comfortable bed under the skylight. Fall in love with a myriad of bright lights in the city night sky. - Single-sized bed/convertible floor mattress with luxury bedding (available upon request). - Gorgeous pyramid skylight with amazing diffused natural light, stunning architectural details, soaring high vaulted ceilings, exposed brick, wood burning fireplace, floor seating area with natural zafu cushions, modern style mixed with eclectic art & antique treasures, large full bath, newly renovated kitchen, air conditioning/heat, high speed WiFi Internet, and Apple TV. - Centrally located in the heart of Midtown Manhattan just a few blocks from all s | none | Centrally located in the heart of Manhattan just a few blocks from all subway connections in the very desirable Midtown location a few minutes walk to Times Square, the Theater District, Bryant Park and Herald Square. | NA | Apartment is located on 37th Street between 5th & 6th Avenue, just a few blocks from all subway connections. Closest Subways (in order of proximity to apartment (Website hidden by Airbnb) W: 34th Street & 6th Avenu (Website hidden by Airbnb) 3: 34th Street & 7th Avenue 7: 42nd & 5th Avenu (Website hidden by Airbnb) S: 42nd Street between Park & Lexington Avenue (Website hidden by Airbnb) E: 34th Street and 8th Avenue If coming by car, there is a parking garage on the block and free street parking. | Guests have full access to the kitchen, bathroom and living spaces. The closets are private/off limits. | I am a Sound Therapy Practitioner and Kundalini Yoga & Meditation teacher. I work with energy and sound for relaxation and healing, using Symphonic gong, singing bowls, tuning forks, drums, voice and other instruments. Sound relaxation sessions and/or personalized Kundalini Yoga sessions are available in the space upon request. Individual, couples or group sessions available. Licensed acupuncture and massage also available upon request. Please inquire. I welcome my guests at the apartment for check-in, or alternatively, a self check-in can be arranged. Once you are settled in, I am just a phone call, text or email away, should you have any questions, concerns or issues during your stay. My desire is that you have a smooth arrival and amazing stay here. | Make yourself at home, respect the space and the neighbors. No pets, no smoking and no unauthorized guests. | NA | NA | https://a0.muscache.com/im/pictures/f0813a11-40b2-489e-8217-89a2e1637830.jpg?aki_policy=large | NA | 2845 | https://www.airbnb.com/users/show/2845 | Jennifer | 2008-09-09 | New York, New York, United States | A New Yorker since 2000! My passion is creating beautiful, unique spaces where unforgettable memories are made. It’s my pleasure to host people from around the world and meet new faces. Welcome travelers! I am a Sound Therapy Practitioner and Kundalini Yoga & Meditation teacher. I work with energy and sound for relaxation and healing, using Symphonic gong, singing bowls, tuning forks, drums, voice and other instruments. | within a few hours | 87% | N/A | FALSE | https://a0.muscache.com/im/users/2845/profile_pic/1259095067/original.jpg?aki_policy=profile_small | https://a0.muscache.com/im/users/2845/profile_pic/1259095067/original.jpg?aki_policy=profile_x_medium | Midtown | 5 | 5 | [‘email’, ‘phone’, ‘reviews’, ‘kba’, ‘work_email’] | TRUE | TRUE | New York, NY, United States | Manhattan | Midtown | Manhattan | New York | NY | 10018 | New York | New York, NY | US | United States | 40.75362 | -73.98377 | FALSE | Apartment | Entire home/apt | 2 | 1 | 0 | 1 | Real Bed | {TV,Wifi,“Air conditioning”,Kitchen,“Paid parking off premises”,“Free street parking”,“Indoor fireplace”,Heating,“Family/kid friendly”,“Smoke detector”,“Carbon monoxide detector”,“Fire extinguisher”,Essentials,Shampoo,“Lock on bedroom door”,Hangers,“Hair dryer”,Iron,“Laptop friendly workspace”,“Self check-in”,Keypad,“Private living room”,Bathtub,“Hot water”,“Bed linens”,“Extra pillows and blankets”,“Ethernet connection”,“Coffee maker”,Refrigerator,“Dishes and silverware”,“Cooking basics”,Oven,Stove,“Luggage dropoff allowed”,“Long term stays allowed”,“Cleaning before checkout”,“Wide entrance for guests”,“Flat path to guest entrance”,“Well-lit path to entrance”,“No stairs or steps to enter”} | NA | $225.00 | $1,995.00 | NA | $350.00 | $100.00 | 2 | $0.00 | 1 | 1125 | 1 | 1 | 1125 | 1125 | 1 | 1125 | 4 days ago | TRUE | 25 | 55 | 80 | 355 | 2019-07-09 | 45 | 11 | 2009-11-21 | 2019-05-21 | 95 | 10 | 9 | 10 | 10 | 10 | 9 | FALSE | NA | NA | FALSE | FALSE | strict_14_with_grace_period | TRUE | TRUE | 2 | 1 | 0 | 1 | 0.38 |
| 3647 | https://www.airbnb.com/rooms/3647 | 2.019071e+13 | 2019-07-08 | THE VILLAGE OF HARLEM….NEW YORK ! | NA | WELCOME TO OUR INTERNATIONAL URBAN COMMUNITY This Spacious 1 bedroom is with Plenty of Windows with a View……. Sleeps…..Four Adults…..two in the Livingrm. with (2) Sofa-beds. (Website hidden by Airbnb) two in the Bedrm.on a very Comfortable Queen Size Bed… A Complete Bathrm…..With Shower and Bathtub……. Fully Equipped with Linens & Towels…….. Spacious Living Room……Flat ScreenTelevision…..DVD Player with Movies available for your viewing during your stay………………………………………………………………….. Dining Area…..for Morning Coffee or Tea…………………………………………….. The Kitchen Area is Modern with Granite Counter Top… includes the use of a Coffee Maker…Microwave to Heat up a Carry Out/In Meal…. Not suited for a Gourmet Cook…or Top Chef……Sorry!!!! . This Flat is located in HISTORIC HARLEM…. near the Appollo Theater and The Museum Mile…on Fifth Avenue. Sylvia’s World Famous Resturant…loca | WELCOME TO OUR INTERNATIONAL URBAN COMMUNITY This Spacious 1 bedroom is with Plenty of Windows with a View……. Sleeps…..Four Adults…..two in the Livingrm. with (2) Sofa-beds. (Website hidden by Airbnb) two in the Bedrm.on a very Comfortable Queen Size Bed… A Complete Bathrm…..With Shower and Bathtub……. Fully Equipped with Linens & Towels…….. Spacious Living Room……Flat ScreenTelevision…..DVD Player with Movies available for your viewing during your stay………………………………………………………………….. Dining Area…..for Morning Coffee or Tea…………………………………………….. The Kitchen Area is Modern with Granite Counter Top… includes the use of a Coffee Maker…Microwave to Heat up a Carry Out/In Meal…. Not suited for a Gourmet Cook…or Top Chef……Sorry!!!! . This Flat is located in HISTORIC HARLEM…. near the Appollo Theater and The Museum Mile…on Fifth Avenue. Sylvia’s World Famous Resturant…loca | none | NA | NA | NA | NA | NA | Upon arrival please have a legibile copy of your Passport and / or State Photo. ID. as well as your confirmation letter. Please NO SMOKING …LOUD TALKING or PARTIES of any kind. Security Deposit and Cleaning Fees in CASH at time of arrival. Security deposit will be refunded within 72hrs pending no damages. .Cleaning fees are non-refundable. At Check Out… Please dispose of all trash/garbage in the bins located outside behind the entrance stairs. Please place all dirty linens and towels in the dirty laundry bin. Please don’t leave any dishes in the sink and dispose of all Plastic Dishes/Plastic Culinary. If you need a late check-out please contact the Emergency Contact Telephone Numbers that are listed in the Flat. PLEASE LEAVE ALL KEYS IN THE FLAT and BE CERTAIN THE DOORS ARE LOCKED. WE HOPE YOU ENJOYED YOUR STAY and will write positve comments and will visit again. | NA | NA | https://a0.muscache.com/im/pictures/838341/9b3c66f3_original.jpg?aki_policy=large | NA | 4632 | https://www.airbnb.com/users/show/4632 | Elisabeth | 2008-11-25 | New York, New York, United States | Make Up Artist National/ (Website hidden by Airbnb) Production. I m curently working with a Production Company in WDC and Coordinated a “Day Of Service” for the Presidential Innaugration 2013. I can’t live without Starbucks and Oprah’s Chai Tea Latte…and my circle of of great friends world wide. “BLESS ME INTO USEFULLNESS” is my daily prayer. I | within a day | 100% | N/A | FALSE | https://a0.muscache.com/im/users/4632/profile_pic/1328402497/original.jpg?aki_policy=profile_small | https://a0.muscache.com/im/users/4632/profile_pic/1328402497/original.jpg?aki_policy=profile_x_medium | Harlem | 1 | 1 | [‘email’, ‘phone’, ‘google’, ‘reviews’, ‘jumio’, ‘government_id’] | TRUE | TRUE | New York, NY, United States | Harlem | Harlem | Manhattan | New York | NY | 10027 | New York | New York, NY | US | United States | 40.80902 | -73.94190 | TRUE | Apartment | Private room | 2 | 1 | 1 | 1 | Pull-out Sofa | {“Cable TV”,Internet,Wifi,“Air conditioning”,Kitchen,“Buzzer/wireless intercom”,Heating,“Smoke detector”,“Carbon monoxide detector”,“translation missing: en.hosting_amenity_49”,“translation missing: en.hosting_amenity_50”} | NA | $150.00 | NA | NA | $200.00 | $75.00 | 2 | $20.00 | 3 | 7 | 3 | 3 | 7 | 7 | 3 | 7 | 34 months ago | TRUE | 30 | 60 | 90 | 365 | 2019-07-08 | 0 | 0 | NA | NA | NA | NA | NA | NA | NA | NA | NA | FALSE | NA | NA | FALSE | FALSE | strict_14_with_grace_period | TRUE | TRUE | 1 | 0 | 1 | 0 | NA |
| 3831 | https://www.airbnb.com/rooms/3831 | 2.019071e+13 | 2019-07-09 | Cozy Entire Floor of Brownstone | Urban retreat: enjoy 500 s.f. floor in 1899 brownstone, with wood and ceramic flooring throughout (completed Aug. 2015 through Sept. 2015), roomy bdrm, & upgraded kitchen & bathroom (completed Oct. 2015). It’s sunny and loaded with everything you need! | Greetings! We own a double-duplex brownstone in Clinton Hill on Gates near Classon Avenue - (7 blocks to C train, 5 blocks to G train, minutes to all), in which we host on the entire top floor of the upper duplex. This is more of an efficiency set-up: it is the top floor on a two-family, double duplex brownstone. The top floor for our guests consists of a sizable bedroom, full bath and eat-in kitchen for your exclusive use. Our family occupies the floors below. You go through a common hallway and staircase, to get to the top floor (2 easy flights up from the main entrance), but not through any rooms, so it is a fairly private set-up. - Clinton Hill, Gates Avenue near Classon Ave. (1 mi. or less to Williamsburg, Park Slope, Prospect Heights, downtown, Ft. Greene, Bed-Stuy, Bushwick; 20 mins to Manhattan) - includes FiOS, heat (or A/C), hot water, and electricity all included - furnished with two twin beds (convertible into a king bed), one rollaway twin bed and one inflatable | Urban retreat: enjoy 500 s.f. floor in 1899 brownstone, with wood and ceramic flooring throughout (completed Aug. 2015 through Sept. 2015), roomy bdrm, & upgraded kitchen & bathroom (completed Oct. 2015). It’s sunny and loaded with everything you need! Greetings! We own a double-duplex brownstone in Clinton Hill on Gates near Classon Avenue - (7 blocks to C train, 5 blocks to G train, minutes to all), in which we host on the entire top floor of the upper duplex. This is more of an efficiency set-up: it is the top floor on a two-family, double duplex brownstone. The top floor for our guests consists of a sizable bedroom, full bath and eat-in kitchen for your exclusive use. Our family occupies the floors below. You go through a common hallway and staircase, to get to the top floor (2 easy flights up from the main entrance), but not through any rooms, so it is a fairly private set-up. - Clinton Hill, Gates Avenue near Classon Ave. (1 mi. or less to Williamsburg, Park Slope, Pro | none | Just the right mix of urban center and local neighborhood; close to all but enough quiet for a calming walk. | NA | B52 bus for a 10-minute ride to downtown Brooklyn is a few yards away on the corner; G train/Classon Avenue is 5 blocks away; C train is about 6 blocks to either the Clinton/Washington stop or Franklin Avenue stop. There is on-street parking, alternate side is twice per week on the immediate block but only once per week on Classon. From LaGuardia Airport, a taxi will cost $30-$35, but there is also a bus that will put you at the Jackson Heights subway station, and from there it’s about 5 stops to catch the G train, which stops 5 blocks away. From JFK, the taxi is closer to $40, but the AirTran can get you conveniently to the A/C line and the C train is about 6 blocks from here. From JFK via subway/metro/train: From JFK take the AirTrain to Howard Beach to catch the A train toward Brooklyn/Manhattan. Take the A train to Utica Avenue and go across that same platform to catch the C local train (you could also transfer at Nostrand but you would have to carry luggage downstairs to cat | You will have exclusive use of and access to: a sizable private room as described in “The Space” section, furnished with two twin beds (which we will combine into one king bed upon request) and optional rollaway twin and/or inflatable beds, and other small furnishings; full private bath and private eat-in kitchen both renovated in Fall 2015; sizable dining table in sun-filled kitchen area doubles as a great desk space; alcove perfect for vertical bike storage. Upon request you may also have some use of the livingroom on the floor just below. | We’ll be around, but since you have the top floor to yourself, most of the interaction is on the way in or out - we’re open to socializing and did so frequently with our last long-term guests, so it’s really up to you | Smoking - outside please; pets allowed but please contact me first for arrangements | NA | NA | https://a0.muscache.com/im/pictures/e49999c2-9fd5-4ad5-b7cc-224deac989aa.jpg?aki_policy=large | NA | 4869 | https://www.airbnb.com/users/show/4869 | LisaRoxanne | 2008-12-07 | New York, New York, United States | Laid-back bi-coastal actor/professor/attorney. | within a few hours | 93% | N/A | FALSE | https://a0.muscache.com/im/users/4869/profile_pic/1371927771/original.jpg?aki_policy=profile_small | https://a0.muscache.com/im/users/4869/profile_pic/1371927771/original.jpg?aki_policy=profile_x_medium | Clinton Hill | 1 | 1 | [‘email’, ‘phone’, ‘reviews’, ‘kba’] | TRUE | TRUE | Brooklyn, NY, United States | Brooklyn | Clinton Hill | Brooklyn | Brooklyn | NY | 11238 | New York | Brooklyn, NY | US | United States | 40.68514 | -73.95976 | TRUE | Guest suite | Entire home/apt | 3 | 1 | 1 | 4 | Real Bed | {TV,“Cable TV”,Internet,Wifi,“Air conditioning”,Kitchen,“Pets allowed”,“Free street parking”,Heating,“Family/kid friendly”,“Smoke detector”,“Carbon monoxide detector”,“Fire extinguisher”,Essentials,Shampoo,“Lock on bedroom door”,“24-hour check-in”,Hangers,“Hair dryer”,Iron,“Laptop friendly workspace”,“Self check-in”,Lockbox,Bathtub,“High chair”,“Stair gates”,“Children’s books and toys”,“Pack ’n Play/travel crib”,“Hot water”,“Luggage dropoff allowed”,“Long term stays allowed”} | 500 | $89.00 | $575.00 | $2,100.00 | $500.00 | NA | 1 | $0.00 | 1 | 730 | 1 | 1 | 730 | 730 | 1 | 730 | today | TRUE | 0 | 0 | 3 | 194 | 2019-07-09 | 270 | 69 | 2014-09-30 | 2019-07-05 | 90 | 10 | 9 | 10 | 10 | 10 | 9 | FALSE | NA | NA | FALSE | FALSE | moderate | FALSE | FALSE | 1 | 1 | 0 | 0 | 4.64 |
| 5022 | https://www.airbnb.com/rooms/5022 | 2.019071e+13 | 2019-07-08 | Entire Apt: Spacious Studio/Loft by central park | NA | Loft apartment with high ceiling and wood flooring located 10 minutes away from Central Park in Harlem - 1 block away from 6 train and 3 blocks from 2 & 3 line. This is in a recently renovated building which includes elevator, trash shoot. marble entrance and laundromat in the basement. The apartment is a spacious loft studio. The seating area and sleeping area is divided by a bookcase. There is a long hallway entrance where the bathroom and closet for your clothes is situated. The apartment is in mint condition, the walls have been freshly painted a few months ago. Supermarket, and 24 hour convenience store less than 1 block away. 1 block away from Hot Yoga Studio and NY Sports club facility. Perfect for anyone wanting to stay in Manhattan but get more space. 10 minutes away from midtown and 15 minutes away from downtown. The neighborhood is lively and diverse. You will need to travel at least 10 blocks to find cafe’s, restaurants etc.. There are a few restaurants on 100 street on | Loft apartment with high ceiling and wood flooring located 10 minutes away from Central Park in Harlem - 1 block away from 6 train and 3 blocks from 2 & 3 line. This is in a recently renovated building which includes elevator, trash shoot. marble entrance and laundromat in the basement. The apartment is a spacious loft studio. The seating area and sleeping area is divided by a bookcase. There is a long hallway entrance where the bathroom and closet for your clothes is situated. The apartment is in mint condition, the walls have been freshly painted a few months ago. Supermarket, and 24 hour convenience store less than 1 block away. 1 block away from Hot Yoga Studio and NY Sports club facility. Perfect for anyone wanting to stay in Manhattan but get more space. 10 minutes away from midtown and 15 minutes away from downtown. The neighborhood is lively and diverse. You will need to travel at least 10 blocks to find cafe’s, restaurants etc.. There are a few restaurants on 100 street on | none | NA | NA | NA | NA | NA | Please be considerate when staying in the apartment. This is a low key building and it’s important guest are respectful. You can come and go as you please I just ask that you keep a low profile. 1) Please be respectful of neighbors - no loud music after 10pm and keep a low profile 2) Do not open the door for anyone 3) Please keep the apt clean 4) No access to mailbox - please forward personal mail to job or school | NA | NA | https://a0.muscache.com/im/pictures/feb453bd-fdec-405c-8bfa-3f6963d827e9.jpg?aki_policy=large | NA | 7192 | https://www.airbnb.com/users/show/7192 | Laura | 2009-01-29 | Miami, Florida, United States | I have been a NYer for almost 10 years. I came to NY to study and never left. I work in the advertising industry and love to eat peanut butter & jelly sandwiches. | N/A | N/A | N/A | FALSE | https://a0.muscache.com/im/users/7192/profile_pic/1325651676/original.jpg?aki_policy=profile_small | https://a0.muscache.com/im/users/7192/profile_pic/1325651676/original.jpg?aki_policy=profile_x_medium | East Harlem | 1 | 1 | [‘email’, ‘phone’, ‘facebook’, ‘reviews’, ‘kba’] | TRUE | TRUE | New York, NY, United States | East Harlem | East Harlem | Manhattan | New York | NY | 10029 | New York | New York, NY | US | United States | 40.79851 | -73.94399 | TRUE | Apartment | Entire home/apt | 1 | 1 | NA | 1 | Real Bed | {Internet,Wifi,“Air conditioning”,Kitchen,Elevator,“Free street parking”,“Buzzer/wireless intercom”,Heating,Washer,Dryer,“Smoke detector”,“Carbon monoxide detector”,Essentials,Shampoo,“Hair dryer”,“Hot water”,“Host greets you”} | NA | $80.00 | $600.00 | $1,600.00 | $100.00 | $80.00 | 1 | $20.00 | 10 | 120 | 10 | 10 | 120 | 120 | 10 | 120 | 3 months ago | TRUE | 0 | 0 | 0 | 0 | 2019-07-08 | 9 | 4 | 2012-03-20 | 2018-11-19 | 93 | 10 | 9 | 10 | 10 | 9 | 10 | FALSE | NA | NA | FALSE | FALSE | strict_14_with_grace_period | TRUE | TRUE | 1 | 1 | 0 | 0 | 0.10 |
| 5099 | https://www.airbnb.com/rooms/5099 | 2.019071e+13 | 2019-07-08 | Large Cozy 1 BR Apartment In Midtown East | My large 1 bedroom apartment is true New York City living. The apt is in midtown on the east side and centrally located, just a 10-minute walk from Grand Central Station, Empire State Building, Times Square. The kitchen and living room are large and bright with Apple TV. I have a new Queen Bed that sleeps 2 people, and a Queen Aero Bed that can sleep 2 people in the living room. The apartment is located on the 5th floor of a walk up - no elevator (lift). | I have a large 1 bedroom apartment centrally located in Midtown East. A 10 minute walk from Grand Central Station, Times Square, Empire State Building and all major subway and bus lines. The apartment is located on the 5th floor of a pre-war walk up building-no elevator/lift. The apartment is bright with has high ceilings and flow through rooms. A spacious, cozy living room with Netflix and Apple TV. A large bright kitchen to sit and enjoy coffee or tea. The bedroom is spacious with a comfortable queen size bed that sleeps 2. I have a comfortable queen size aero bed that fits in the living room and sleeps 2. It can be tucked away for living space and opened when ready for bed. I’d be happy to give you tips and advice on the best ways to experience the most of NYC. The apartment’s location is great for sightseeing. ** Check out my listing guidebook ** If you would like to stay local in the area, there is a very long & famous strip of bars and restaurants along 3rd Avenue, which | My large 1 bedroom apartment is true New York City living. The apt is in midtown on the east side and centrally located, just a 10-minute walk from Grand Central Station, Empire State Building, Times Square. The kitchen and living room are large and bright with Apple TV. I have a new Queen Bed that sleeps 2 people, and a Queen Aero Bed that can sleep 2 people in the living room. The apartment is located on the 5th floor of a walk up - no elevator (lift). I have a large 1 bedroom apartment centrally located in Midtown East. A 10 minute walk from Grand Central Station, Times Square, Empire State Building and all major subway and bus lines. The apartment is located on the 5th floor of a pre-war walk up building-no elevator/lift. The apartment is bright with has high ceilings and flow through rooms. A spacious, cozy living room with Netflix and Apple TV. A large bright kitchen to sit and enjoy coffee or tea. The bedroom is spacious with a comfortable queen size bed that sleeps 2. I | none | My neighborhood in Midtown East is called Murray Hill. The area is very centrally located with easy access to explore . The apartment is about 5 blocks (7 minute walk) to the United Nations and Grand Central Station the main and most historic train station. Grand Central will give you access to every train in the city. The apartment is also very close to main attractions, It’s about a 10 minute walk to both the Empire State Building and Times Square. There’s a great shopping area with dozens of stores including H&M, Zara, The Gap, BeBe and the world famous Macy’s department store. These shops are a 10 minute walk up East 34th Street from 5th avenue and 8th avenue. If you would like to stay local in the area, there is a very long & famous strip of bars and restaurants along 3rd avenue, which is just around the corner from the apartment. It’s commonly known as the 3rd avenue strip. | Read My Full Listing For All Information. New York City really is the city that doesn’t sleep. There’s a constant flow of people, bikes and cars. The city can be noisy at times, if you’re a light sleeper, ear plugs would help. Check out my local guide book for things to do. | From the apartment is a 10 minute walk to Grand Central Station on East 42nd Street, a 10 minute walk to the Empire State Building on East 34th Street and 5th Avenue, a 10 minute walk to Times Square on West 42 Street and about 20 minutes walk to Central Park on 59th Street. Depending on how long you are staying, I would recommend a 7 day unlimited metro card. This allows you to travel unlimited all day and night on any train or bus in and outside of the city. Grand Central Station is the main NYC train station. You can find any train connection and get anywhere in Manhattan from Grand Central Station. You can get to Brooklyn, Queens, The Bronx and Staten Island from Grand Central Station The M15 bus is around the corner on 2nd avenue. This bus will take you from uptown Harlem to the East Village and South Street Seaport. It will also take you to the Staten Island ferry and Statue of Liberty and everywhere in between along the east side. The M101 bus is just up the street on 3rd aven | I will meet you upon arrival. | I usually check in with guests via text or email. I’m available by text, email or phone call with any questions, suggestions or to help out. | • Check-in time is 2PM. • Check-out time is 12 PM. Please be respectful of the space and leave the apartment in the condition you were welcomed into. | NA | NA | https://a0.muscache.com/im/pictures/0790b1a5-8981-41cc-a370-fa2b982a8803.jpg?aki_policy=large | NA | 7322 | https://www.airbnb.com/users/show/7322 | Chris | 2009-02-02 | New York, New York, United States | I’m an artist, writer, traveler, and a native new yorker. Welcome to my city. | within an hour | 100% | N/A | FALSE | https://a0.muscache.com/im/pictures/user/26745d24-d818-4bf5-8f9e-26b097121ba7.jpg?aki_policy=profile_small | https://a0.muscache.com/im/pictures/user/26745d24-d818-4bf5-8f9e-26b097121ba7.jpg?aki_policy=profile_x_medium | Flatiron District | 1 | 1 | [‘email’, ‘phone’, ‘reviews’, ‘jumio’, ‘government_id’] | TRUE | FALSE | New York, NY, United States | Midtown East | Murray Hill | Manhattan | New York | NY | 10016 | New York | New York, NY | US | United States | 40.74767 | -73.97500 | FALSE | Apartment | Entire home/apt | 2 | 1 | 1 | 1 | Real Bed | {TV,“Cable TV”,Internet,Wifi,Kitchen,“Buzzer/wireless intercom”,Heating,“Smoke detector”,“Carbon monoxide detector”,“Fire extinguisher”,Essentials,Shampoo,Hangers,“Hair dryer”,Iron,“Laptop friendly workspace”,“translation missing: en.hosting_amenity_49”,“translation missing: en.hosting_amenity_50”,“Hot water”,“Bed linens”,“Extra pillows and blankets”,“Host greets you”} | NA | $200.00 | NA | NA | $300.00 | $125.00 | 2 | $100.00 | 3 | 21 | 3 | 3 | 21 | 21 | 3 | 21 | 3 weeks ago | TRUE | 23 | 48 | 48 | 129 | 2019-07-08 | 74 | 9 | 2009-04-20 | 2019-06-22 | 89 | 10 | 9 | 10 | 10 | 9 | 9 | FALSE | NA | NA | FALSE | FALSE | strict_14_with_grace_period | TRUE | TRUE | 1 | 1 | 0 | 0 | 0.59 |
Revenue is a large dataset with 48000 rows and 106 columns. Before checking for missing values Removing the columns in the airbnb dataset that has information related to the hosts, url and other data points not related to our analysis.
Removing column names specific to hosts and some other columns not necessary for the analysis.
pattern <-
colnames(airbnb %>% dplyr::select(starts_with("require"),starts_with("host"),starts_with("calendar"),
ends_with("url"),ends_with("nights")))
airbnb <- airbnb[,!(names(airbnb) %in% pattern)]
dropCol <-
c(
'scrape_id',
'name',
'summary',
'description',
'neighborhood_overview',
'notes',
'transit',
'access',
'interaction',
'house_rules',
'street',
'market',
'smart_location',
'country_code',
'CountyName',
'bed_type',
'amenities',
'guests_included',
'extra_people',
'review_scores_communication',
'review_scores_checkin',
'review_scores_cleanliness',
'review_scores_value',
'reviews_per_month',
'instant_bookable',
'is_location_exact'
)
airbnb <- airbnb[,!(names(airbnb) %in% dropCol)]Checking for missing values and dropping the columns with more than 50% missing values and some other not so useful columns
colSums(is.na(airbnb))*100/nrow(airbnb)
airbnb <- airbnb[,-c(3,4,5,6,9,11,22,23,26,27,28,34,35,36,38,39,40,41,42,43,44,45,46,47)]Dropping the columns with more than 50% missing values and some other not so useful columns
Merging the cost(zillow) datset with revenue(airbnb) dataset on zipcodes and checking for duplicates and missing values in our final dataset.
final_data <- merge(airbnb, zillow2, by.x = "zipcode",by.y = "RegionName")
final_data <- final_data %>% filter(bedrooms==filter_bedroom)
final_data<-final_data[!duplicated(final_data[1:25]),]
final_data<-final_data %>%
rename(
cost = price.x,
revenue = price.y
)Creating a copy for the final data. In case of any discrepencies later during the analysis, there’s no need to read and load the data again. Final data can be loaded and continued.
Creating a function to clean the price columns. This fucntion removes any spaces and/or $ signs then converts type to numeric from character.
clean_price <- function(df, col)
{
data<-df[,which(colnames(df)==col)]
clean_data <- str_replace_all(data, fixed("$"), "")
clean_data <-as.numeric(str_replace_all(clean_data, fixed(","), ""))
return(clean_data)
}
airbnb$cost <-clean_price(airbnb,"cost")This is the final cleansed dataset that will be used henceforth. Here’s a summary of the dataset.
## zipcode id last_scraped
## Min. :10003 Min. : 16458 Min. :2019-07-08
## 1st Qu.:10014 1st Qu.: 7664343 1st Qu.:2019-07-08
## Median :10028 Median :17586014 Median :2019-07-08
## Mean :10426 Mean :17837441 Mean :2019-07-08
## 3rd Qu.:11215 3rd Qu.:28501387 3rd Qu.:2019-07-09
## Max. :11434 Max. :36477307 Max. :2019-07-09
##
## neighbourhood_group_cleansed city latitude
## Length:1565 Length:1565 Min. :40.52
## Class :character Class :character 1st Qu.:40.68
## Mode :character Mode :character Median :40.73
## Mean :40.73
## 3rd Qu.:40.76
## Max. :40.81
##
## longitude property_type room_type accommodates
## Min. :-74.21 Length:1565 Length:1565 Min. : 1.000
## 1st Qu.:-74.00 Class :character Class :character 1st Qu.: 4.000
## Median :-73.99 Mode :character Mode :character Median : 4.000
## Mean :-73.98 Mean : 4.458
## 3rd Qu.:-73.97 3rd Qu.: 5.000
## Max. :-73.76 Max. :16.000
##
## bathrooms bedrooms beds square_feet cost
## Min. :0.000 Min. :2 Min. :0.000 Min. : 0.0 Min. : 50.0
## 1st Qu.:1.000 1st Qu.:2 1st Qu.:2.000 1st Qu.: 650.0 1st Qu.: 165.0
## Median :1.000 Median :2 Median :2.000 Median :1000.0 Median : 228.0
## Mean :1.299 Mean :2 Mean :2.381 Mean : 902.3 Mean : 284.6
## 3rd Qu.:2.000 3rd Qu.:2 3rd Qu.:3.000 3rd Qu.:1125.0 3rd Qu.: 320.0
## Max. :3.500 Max. :2 Max. :6.000 Max. :1600.0 Max. :4000.0
## NA's :3 NA's :1538
## security_deposit cleaning_fee availability_30 availability_60
## Length:1565 Length:1565 Min. : 0.000 Min. : 0.00
## Class :character Class :character 1st Qu.: 0.000 1st Qu.: 0.00
## Mode :character Mode :character Median : 0.000 Median : 7.00
## Mean : 5.894 Mean :15.87
## 3rd Qu.: 8.000 3rd Qu.:27.00
## Max. :30.000 Max. :60.00
##
## availability_90 availability_365 number_of_reviews review_scores_rating
## Min. : 0.00 Min. : 0.0 Min. : 0.00 Min. : 20.00
## 1st Qu.: 0.00 1st Qu.: 0.0 1st Qu.: 1.00 1st Qu.: 92.00
## Median :12.00 Median : 59.0 Median : 4.00 Median : 96.00
## Mean :25.79 Mean :120.7 Mean : 19.78 Mean : 94.15
## 3rd Qu.:49.00 3rd Qu.:247.0 3rd Qu.: 17.00 3rd Qu.:100.00
## Max. :90.00 Max. :365.0 Max. :403.00 Max. :100.00
## NA's :387
## year revenue
## Length:1565 Min. : 370485
## Class :character 1st Qu.:1487596
## Mode :character Median :1770053
## Mean :1896679
## 3rd Qu.:2208878
## Max. :3320756
##
| zipcode | id | last_scraped | neighbourhood_group_cleansed | city | latitude | longitude | property_type | room_type | accommodates | bathrooms | bedrooms | beds | square_feet | cost | security_deposit | cleaning_fee | availability_30 | availability_60 | availability_90 | availability_365 | number_of_reviews | review_scores_rating | year | revenue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10003 | 9905142 | 2019-07-08 | Manhattan | New York | 40.73036 | -73.98673 | Apartment | Entire home/apt | 2 | 1.5 | 2 | 2 | NA | 250 | $200.00 | $95.00 | 0 | 13 | 13 | 13 | 20 | 93 | 2019 | 2208878 |
| 10003 | 14759888 | 2019-07-08 | Manhattan | New York | 40.73029 | -73.98725 | House | Private room | 2 | 1.0 | 2 | 2 | NA | 95 | NA | $25.00 | 0 | 0 | 0 | 0 | 9 | 91 | 2019 | 2208878 |
| 10003 | 30014439 | 2019-07-08 | Manhattan | New York | 40.73015 | -73.98447 | Apartment | Entire home/apt | 4 | 1.0 | 2 | 3 | NA | 350 | $500.00 | $80.00 | 5 | 5 | 5 | 5 | 3 | 80 | 2019 | 2208878 |
| 10003 | 34642119 | 2019-07-08 | Manhattan | New York | 40.72877 | -73.98848 | Apartment | Entire home/apt | 5 | 1.0 | 2 | 2 | NA | 200 | $150.00 | $100.00 | 2 | 9 | 23 | 69 | 13 | 92 | 2019 | 2208878 |
| 10003 | 20028846 | 2019-07-08 | Manhattan | New York | 40.73726 | -73.99280 | Apartment | Entire home/apt | 5 | 1.0 | 2 | 2 | NA | 230 | $0.00 | $60.00 | 5 | 5 | 5 | 5 | 5 | 100 | 2019 | 2208878 |
| 10003 | 21783251 | 2019-07-08 | Manhattan | New York | 40.72909 | -73.99125 | Apartment | Entire home/apt | 4 | 1.0 | 2 | 2 | NA | 200 | $175.00 | $50.00 | 3 | 7 | 7 | 194 | 81 | 95 | 2019 | 2208878 |
missingvariables<-gg_miss_var(airbnb, show_pct = TRUE) + labs(x = "Variables", y = "Percentage of missing values")
ggplotly(missingvariables)Most of the property in our analysis will deal with Apartment. It does makes sense though looking at properties in New York City, the apartment count would be a lot larger than other house types.
countbyneighbourhood<-ggplot(airbnb,aes(fct_infreq(neighbourhood_group_cleansed)))+geom_bar(fill="#339999")+labs(x = "Neighbourhood", y = "Number of Properties") +theme(legend.title = element_blank(),panel.grid.major = element_blank(),panel.grid.minor = element_blank(),panel.background = element_blank(),axis.line = element_line(colour = "black"))
ggplotly(countbyneighbourhood)by_neighbourhood <- airbnb %>% group_by(neighbourhood_group_cleansed) %>%summarize(mean_price_x = mean(cost)) %>% ungroup()
by_neighbourhoodplot<-ggplot(by_neighbourhood ,aes(x=neighbourhood_group_cleansed,y=mean_price_x))+geom_col(fill="#339999") + labs(x = "Neighbourhood", y = "Price per night of Properties")+scale_y_continuous(labels = scales::comma)+theme(legend.title = element_blank(),panel.grid.major = element_blank(),panel.grid.minor = element_blank(),panel.background = element_blank(),axis.line = element_line(colour = "black"))
ggplotly(by_neighbourhoodplot)numberbyZipcode<-ggplot(airbnb,aes(fct_infreq(as.factor(zipcode))))+geom_bar(fill="#339999")+labs(x = "zipcodes", y = "Number of Properties")+theme(axis.text.x = element_text(angle = 90, hjust = 1),legend.title = element_blank(),panel.grid.major = element_blank(),panel.grid.minor = element_blank(),panel.background = element_blank(),axis.line = element_line(colour = "black"))
ggplotly(numberbyZipcode)Some zipcodes have just 1, 2 or 3 properties, which is a small number for analysis. So these zipcodes would not be conisdered for the analysis. Dropping the zipcodes with less than 10 properties
Plotting the number of properties by their type. We see that mostly the property in our datset are Apartments.
Checking the price distribution of properties according to property types
pricebyproperty<-ggplot(airbnb, aes(x=property_type, y=cost, fill = as.factor(zipcode),color=as.factor(zipcode))) + scale_y_continuous(labels = scales::comma) + geom_point(aes(size=cost)) +
geom_smooth(method="loess", se=F) + labs( x="Property type", y="Price per night") + theme(axis.text.x = element_text(angle = 90, hjust = 1),legend.title = element_blank(),panel.grid.major = element_blank(),panel.grid.minor = element_blank(),panel.background = element_blank(),axis.line = element_line(colour = "black"))
ggplotly(pricebyproperty)by_neighbourhood_zip_price <- airbnb %>% group_by(neighbourhood_group_cleansed,zipcode) %>%summarize(mean_price_x = mean(cost)) %>% ungroup()
ggplot(airbnb ,aes(x=as.factor(zipcode),y=cost,fill = as.factor(neighbourhood_group_cleansed)))+geom_boxplot() + labs(x = "Zipcodes", y = "Price Per Night") + scale_y_continuous(limits = quantile(airbnb$cost, c(0, 0.99))) +theme(axis.text.x = element_text(angle = 90, hjust = 1))+theme(axis.text.x = element_text(angle = 90, hjust = 1),legend.title = element_blank(),panel.grid.major = element_blank(),panel.grid.minor = element_blank(),panel.background = element_blank(),axis.line = element_line(colour = "black"))It is noticeable that there are a lot of outliers in the price per night.Specially the zipcodes 10013 and 10003 have the costilest staying prices.
zipcodes_price <- airbnb %>% group_by(zipcode) %>%summarize(mean_price_x = mean(cost)) %>% ungroup()
byzipcodes_price<-ggplot(zipcodes_price %>% arrange(desc(mean_price_x)) %>% mutate(zipcode=factor(zipcode, levels=zipcode)),aes(x=as.factor(zipcode),y=mean_price_x))+geom_col(fill="#339999") + labs(x = "Zipcodes", y = "Mean price per night")+scale_y_continuous(labels = scales::comma)+theme(axis.text.x = element_text(angle = 90, hjust = 1),legend.title = element_blank(),panel.grid.major = element_blank(),panel.grid.minor = element_blank(),panel.background = element_blank(),axis.line = element_line(colour = "black"))
ggplotly(byzipcodes_price)This column that has the count of reviews on a property is already present in our dataset.
zipcodes_review <- airbnb %>% group_by(zipcode) %>%summarize(average_review_count = mean(number_of_reviews)) %>% ungroup()
byzipcodes_review<-ggplot(zipcodes_review %>% arrange(desc(average_review_count)) %>% mutate(zipcode=factor(zipcode, levels=zipcode)) ,aes(x=as.factor(zipcode),y=average_review_count))+geom_col(fill="#339999") + labs(x = "Zipcodes", y = "Average number of reviews")+scale_y_continuous(labels = scales::comma)+theme(axis.text.x = element_text(angle = 90, hjust = 1),legend.title = element_blank(),panel.grid.major = element_blank(),panel.grid.minor = element_blank(),panel.background = element_blank(),axis.line = element_line(colour = "black"))
ggplotly(byzipcodes_review)Availabiity of a property should be balanced. Less availability means the property is in high demand and is good for the property owner but higher availability means the property is not being booked and is a negative sign. Higher availability or lower occupancy can be attributed to many reasons, such as 1) High prices 2) Lesser/No tourist attractions 3) High crime rates
We have 30, 60, 90 and 365 days availability. All of these can be used for analysis. It’s important to check how these variables corelate to each other.
corCols <- airbnb[,c("availability_30","availability_60","availability_90","availability_365","number_of_reviews","cost","revenue")]
res<-cor(corCols)
corrplot(res, type = "upper", order = "hclust",
tl.col = "black", tl.srt = 45)For the purpose of this study, yearly avaialbility (availability_365) is a better choice as it shows the bigger picture and being highly correlated with other variables, it will complement our analysis.
Occupancy is the opposite of availability. Higher the occupancy the better it is for the property owners and the investors. Higher occupancy in properties is a good sign and properties in these areas will bring in more profits.
Calculating the occupancy rate explanation: Yearly occupancy rate is calculated by subtracting the number of available days by 365 and dividing it by 365. e.g. A property available for 100 days, means it is occupied for 265 days. so the occupancy percentage is 265*100/365= 72.6%.
Properites with 0% occupancy rates have been assigned extremely small values to avoid interference in calculating annual return.
airbnb<-airbnb %>% group_by(zipcode) %>%
mutate(occupancy_rt_30=(30-ifelse(availability_30==30,1,availability_30))*100/30,
occupancy_rt_60=(60-ifelse(availability_60==60,1,availability_60))*100/60, occupancy_rt_90=(90-ifelse(availability_90==90,1,availability_90))*100/90, occupancy_rt_year=(365-ifelse(availability_365==365,1,availability_365))*100/365) %>%
ungroup()OccupancyPrice <- airbnb %>% group_by(zipcode) %>% summarise(mean_price=mean(cost),mean_occupancy=mean(occupancy_rt_year))
byOccupancyPrice<-ggplot(OccupancyPrice,aes(x=mean_occupancy, y=mean_price)) + geom_point(aes(col=as.factor(zipcode), size=mean_price)) +labs(x="Occupancy", y="Price per night", shape="Price", colour= "Neighbourhood")+theme(axis.text.x = element_text(angle = 90, hjust = 1),legend.title = element_blank(),panel.grid.major = element_blank(),panel.grid.minor = element_blank(),panel.background = element_blank(),axis.line = element_line(colour = "black"))
ggplotly(byOccupancyPrice)The highly occupied zipcodes are 11217, 11201, 10021, 11231, 10014 and 10025.
zipcodes_occ <- airbnb %>% group_by(zipcode) %>%summarize(mean_occupancy_rate = mean(occupancy_rt_year)) %>% ungroup()
byzipcodes_occ<-ggplot(zipcodes_occ %>% arrange(desc(mean_occupancy_rate)) %>% mutate(zipcode=factor(zipcode, levels=zipcode)),aes(x=as.factor(zipcode),y=mean_occupancy_rate))+geom_col(fill="#339999") + labs(x = "Zipcodes", y = "Occupancy percentage")+scale_y_continuous(labels = scales::comma)+theme(axis.text.x = element_text(angle = 90, hjust = 1),legend.title = element_blank(),panel.grid.major = element_blank(),panel.grid.minor = element_blank(),panel.background = element_blank(),axis.line = element_line(colour = "black"))
ggplotly(byzipcodes_occ)Zipcodes 11217,11201,11434, 10021 and 11231 have the highest yearly occupancy percentages.
Rent ratio is the measure of price of the property divided by the gross annual rent collected. Higher the rent ratio the better it is. A higher rent ratio suggests 2 things. 1) Either the property was bought at a cheaper price. 2) Or, the rent of the property is higher In either of these cases the investor is at profit. It is an important factor to be kept in mind while buying a property.
zipcodes_rent_ratio <- airbnb %>% group_by(zipcode) %>%summarize(mean_rent_ratio = round(mean(rent_ratio)),0) %>% ungroup()
by_zipcodes_rent_ratio <-ggplot(zipcodes_rent_ratio %>% arrange(desc(mean_rent_ratio)) %>% mutate(zipcode=factor(zipcode, levels=zipcode)),aes(x=as.factor(zipcode),y=mean_rent_ratio))+geom_col(fill="#339999") + labs(x = "Zipcodes", y = "rent ratio")+scale_y_continuous(labels = scales::comma)+theme(axis.text.x = element_text(angle = 90, hjust = 1),legend.title = element_blank(),panel.grid.major = element_blank(),panel.grid.minor = element_blank(),panel.background = element_blank(),axis.line = element_line(colour = "black"))
ggplotly(by_zipcodes_rent_ratio)Annual Return is the revenue generated from the property on a yearly basis. For the purpose of this analysis, the annual return is calculated by multiplying the price per night, yearly occupancy rate and 365 days. (price/night x yearly_occupancy x 365). Higher the annual return, higher is the profit.
zipcodes_return <- airbnb %>% group_by(zipcode) %>%summarize(mean_return = mean(annual_return)) %>% ungroup()
by_zipcodes_return<-ggplot(zipcodes_return %>% arrange(desc(mean_return)) %>% mutate(zipcode=factor(zipcode, levels=zipcode)),aes(x=as.factor(zipcode),y=mean_return))+geom_col(fill="#339999") + labs(x = "Zipcodes", y = "Average annual return / Return on investment")+scale_y_continuous(labels = scales::comma)+ theme(axis.text.x = element_text(angle = 90, hjust = 1),legend.title = element_blank(),panel.grid.major = element_blank(),panel.grid.minor = element_blank(),panel.background = element_blank(),axis.line = element_line(colour = "black"))
ggplotly(by_zipcodes_return)Break-even time represents the amount of time it takes for an investment to make back its original cost. It is caluclated here by dividing the original cost of the proerty by annual rent generated by the property. But since the rent for the property will not be same every year, an increase of 10% in the rent genrated yearly is considered.
Explanation: A property that was bought for 100,000 and generates an annual return(in 2019) of $20,000. The annual return will increase by 10% every year i.e in 2020 it will be 22,000 and 24,200 in 2021. Basically we multiply the price of current year by 1.1 to calculate the annual return next year.
The break even time for this property will be the year when total annual return collected will be equal to the original cost of property. i.e (100000/20000+22000+24200+26620+29282) ~ approx 4.5 years. So lesser is the break even time, the better it is for the investor. After the break event time, most of the revenue will be a profit for the investor.
get_break_even_time <- function(df)
{
break_even_time<-c()
for (i in 1:nrow(df))
{
year<-0
a_return<-airbnb$annual_return[i]
property_price<-airbnb$revenue[i]
while (property_price>a_return)
{
year=year+1
a_return=a_return*1.1
}
break_even_time[i]<-year
}
df$break_even_time<-break_even_time
return(df)
}
airbnb<-get_break_even_time(airbnb)Breakeventime <- airbnb %>% group_by(zipcode) %>% summarise(mean_break_even_time=mean(break_even_time))
byBreakeventime<-ggplot(Breakeventime %>% arrange(mean_break_even_time) %>%
mutate(zipcode=factor(zipcode, levels=zipcode)),aes(x=as.factor(zipcode), y=mean_break_even_time)) + geom_bar(stat = 'Identity',fill="#339999") +labs(x="Zipcodes", y="Break even time in years", shape="Price", colour= "Neighbourhood")+theme(axis.text.x = element_text(angle = 90, hjust = 1),legend.title = element_blank(),panel.grid.major = element_blank(),panel.grid.minor = element_blank(),panel.background = element_blank(),axis.line = element_line(colour = "black"))
ggplotly(byBreakeventime)Final data set with added columns
| zipcode | id | last_scraped | neighbourhood_group_cleansed | city | latitude | longitude | property_type | room_type | accommodates | bathrooms | bedrooms | beds | square_feet | cost | security_deposit | cleaning_fee | availability_30 | availability_60 | availability_90 | availability_365 | number_of_reviews | review_scores_rating | year | revenue | occupancy_rt_30 | occupancy_rt_60 | occupancy_rt_90 | occupancy_rt_year | rent_ratio | annual_return | break_even_time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10003 | 9905142 | 2019-07-08 | Manhattan | New York | 40.73036 | -73.98673 | Apartment | Entire home/apt | 2 | 1.5 | 2 | 2 | NA | 250 | $200.00 | $95.00 | 0 | 13 | 13 | 13 | 20 | 93 | 2019 | 2208878 | 100.00000 | 78.33333 | 85.55556 | 96.43836 | 24.54309 | 88000 | 34 |
| 10003 | 14759888 | 2019-07-08 | Manhattan | New York | 40.73029 | -73.98725 | House | Private room | 2 | 1.0 | 2 | 2 | NA | 95 | NA | $25.00 | 0 | 0 | 0 | 0 | 9 | 91 | 2019 | 2208878 | 100.00000 | 100.00000 | 100.00000 | 100.00000 | 64.58708 | 34675 | 44 |
| 10003 | 30014439 | 2019-07-08 | Manhattan | New York | 40.73015 | -73.98447 | Apartment | Entire home/apt | 4 | 1.0 | 2 | 3 | NA | 350 | $500.00 | $80.00 | 5 | 5 | 5 | 5 | 3 | 80 | 2019 | 2208878 | 83.33333 | 91.66667 | 94.44444 | 98.63014 | 17.53078 | 126000 | 31 |
| 10003 | 34642119 | 2019-07-08 | Manhattan | New York | 40.72877 | -73.98848 | Apartment | Entire home/apt | 5 | 1.0 | 2 | 2 | NA | 200 | $150.00 | $100.00 | 2 | 9 | 23 | 69 | 13 | 92 | 2019 | 2208878 | 93.33333 | 85.00000 | 74.44444 | 81.09589 | 30.67886 | 59200 | 38 |
| 10003 | 20028846 | 2019-07-08 | Manhattan | New York | 40.73726 | -73.99280 | Apartment | Entire home/apt | 5 | 1.0 | 2 | 2 | NA | 230 | $0.00 | $60.00 | 5 | 5 | 5 | 5 | 5 | 100 | 2019 | 2208878 | 83.33333 | 91.66667 | 94.44444 | 98.63014 | 26.67727 | 82800 | 35 |
| 10003 | 21783251 | 2019-07-08 | Manhattan | New York | 40.72909 | -73.99125 | Apartment | Entire home/apt | 4 | 1.0 | 2 | 2 | NA | 200 | $175.00 | $50.00 | 3 | 7 | 7 | 194 | 81 | 95 | 2019 | 2208878 | 90.00000 | 88.33333 | 92.22222 | 46.84932 | 30.67886 | 34200 | 44 |
Zipcodes 11434,10305,11201,11217 and 10025 have the lowest break even times.
The zipcodes were analysed on the key metrics as described above and the top 5 zipcodes suggested by each of these metrics are as follows:
| Reviews | Occupancy | Rent Ratio | Annual Return | Break even time |
|---|---|---|---|---|
| 11434 | 11217 | 10013 | 10013 | 11434 |
| 11215 | 11201 | 10014 | 10011 | 10305 |
| 11231 | 11434 | 10021 | 10014 | 11201 |
| 10305 | 10021 | 10128 | 10003 | 11217 |
| 10003 | 11231 | 10003 | 10036 | 10025 |
We’ll take the zipcodes that managed to make it to 2 or more metrics.
| Top Zipcodes | |
|---|---|
| 11434 | |
| 10003 | |
| 10013 | |
| 11201 | |
| 11217 & 10014 |
Preference has been given to those zipcodes which appear in 3 metrics. Rent ratio and Annual return has been preffered over occupancy percentage. So above mentioned are the possible suggested zipcodes for investments. However, it is necessary to discuss with the firm as to what types of risk they are willing to take.