Introduction

Since 2008, guests and hosts have used AirBnB to expand their travel and lodging options and to experience the world in a more personalized way. This data set describes listing activity and metrics in New York City, NY, from 2014 to 2019.

(Kaggle.com)

Dataset

This data set includes the information needed to learn more about hosts and geographical availability, along with the metrics needed to make predictions and draw conclusions.

Analyses Overview

Using this data set, I created visualizations that show the following:

  1. Potential booking values for each borough,
  2. Potential booking values for each room type per borough,
  3. Borough that received the most reviews per month from 2014 to 2019,
  4. Percent of each room type advertised for AirBnB NYC by borough, and the
  5. Locations of the 6 most expensive AirBnBs in NYC.

The formulas and assumptions behind each chart are stated under its tab.

Analysis and Findings

From these charts, I draw the following conclusions:

  1. Manhattan is predicted to have the highest potential booking value based on the advertised room types and minimum nights reserved.
  2. The “Entire Home/Apt” room type has the highest potential booking value for NYC.
  3. Bronx AirBnB listings received the most reviews in June 2019.

Tab 1

DF <- read.csv("C:/Users/jessica.wade/Desktop/DATA SCIENCE - Loyola/DS_736 Data Visualization/R/R_datafiles/AB_NYC_2019.csv")

# Load plyr before dplyr so that dplyr's verbs (summarise, mutate, arrange, ...)
# are not masked by plyr's functions of the same names
library(plyr)         # round_any() for the y-axis limit in Tab 2
library(dplyr)        # data manipulation verbs and the %>% pipe
library(scales)       # comma() for axis and label formatting
library(lubridate)    # ymd() and year() for parsing last_review
library(ggplot2)      # charts
library(RColorBrewer) # palettes used by scale_fill_brewer()
library(ggthemes)     # additional ggplot2 themes
library(leaflet)      # interactive map in Tab 6

colnames(DF)
##  [1] "id"                             "name"                          
##  [3] "host_id"                        "host_name"                     
##  [5] "neighbourhood_group"            "neighbourhood"                 
##  [7] "latitude"                       "longitude"                     
##  [9] "room_type"                      "price"                         
## [11] "minimum_nights"                 "number_of_reviews"             
## [13] "last_review"                    "reviews_per_month"             
## [15] "calculated_host_listings_count" "availability_365"
head(DF)
##     id                                             name host_id   host_name
## 1 2539               Clean & quiet apt home by the park    2787        John
## 2 2595                            Skylit Midtown Castle    2845    Jennifer
## 3 3647              THE VILLAGE OF HARLEM....NEW YORK !    4632   Elisabeth
## 4 3831                  Cozy Entire Floor of Brownstone    4869 LisaRoxanne
## 5 5022 Entire Apt: Spacious Studio/Loft by central park    7192       Laura
## 6 5099        Large Cozy 1 BR Apartment In Midtown East    7322       Chris
##   neighbourhood_group neighbourhood latitude longitude       room_type price
## 1            Brooklyn    Kensington 40.64749 -73.97237    Private room   149
## 2           Manhattan       Midtown 40.75362 -73.98377 Entire home/apt   225
## 3           Manhattan        Harlem 40.80902 -73.94190    Private room   150
## 4            Brooklyn  Clinton Hill 40.68514 -73.95976 Entire home/apt    89
## 5           Manhattan   East Harlem 40.79851 -73.94399 Entire home/apt    80
## 6           Manhattan   Murray Hill 40.74767 -73.97500 Entire home/apt   200
##   minimum_nights number_of_reviews last_review reviews_per_month
## 1              1                 9  2018-10-19              0.21
## 2              1                45  2019-05-21              0.38
## 3              3                 0                            NA
## 4              1               270  2019-07-05              4.64
## 5             10                 9  2018-11-19              0.10
## 6              3                74  2019-06-22              0.59
##   calculated_host_listings_count availability_365
## 1                              6              365
## 2                              2              355
## 3                              1              365
## 4                              1              194
## 5                              1                0
## 6                              1              129
tail(DF)
##             id                                              name   host_id
## 48890 36484363                                QUIT PRIVATE HOUSE 107716952
## 48891 36484665   Charming one bedroom - newly renovated rowhouse   8232441
## 48892 36485057     Affordable room in Bushwick/East Williamsburg   6570630
## 48893 36485431           Sunny Studio at Historical Neighborhood  23492952
## 48894 36485609              43rd St. Time Square-cozy single bed  30985759
## 48895 36487245 Trendy duplex in the very heart of Hell's Kitchen  68119814
##           host_name neighbourhood_group      neighbourhood latitude longitude
## 48890       Michael              Queens            Jamaica 40.69137 -73.80844
## 48891       Sabrina            Brooklyn Bedford-Stuyvesant 40.67853 -73.94995
## 48892       Marisol            Brooklyn           Bushwick 40.70184 -73.93317
## 48893 Ilgar & Aysel           Manhattan             Harlem 40.81475 -73.94867
## 48894           Taz           Manhattan     Hell's Kitchen 40.75751 -73.99112
## 48895    Christophe           Manhattan     Hell's Kitchen 40.76404 -73.98933
##             room_type price minimum_nights number_of_reviews last_review
## 48890    Private room    65              1                 0            
## 48891    Private room    70              2                 0            
## 48892    Private room    40              4                 0            
## 48893 Entire home/apt   115             10                 0            
## 48894     Shared room    55              1                 0            
## 48895    Private room    90              7                 0            
##       reviews_per_month calculated_host_listings_count availability_365
## 48890                NA                              2              163
## 48891                NA                              2                9
## 48892                NA                              2               36
## 48893                NA                              1               27
## 48894                NA                              6                2
## 48895                NA                              1               23
dim(DF)
## [1] 48895    16
str(DF)
## 'data.frame':    48895 obs. of  16 variables:
##  $ id                            : int  2539 2595 3647 3831 5022 5099 5121 5178 5203 5238 ...
##  $ name                          : chr  "Clean & quiet apt home by the park" "Skylit Midtown Castle" "THE VILLAGE OF HARLEM....NEW YORK !" "Cozy Entire Floor of Brownstone" ...
##  $ host_id                       : int  2787 2845 4632 4869 7192 7322 7356 8967 7490 7549 ...
##  $ host_name                     : chr  "John" "Jennifer" "Elisabeth" "LisaRoxanne" ...
##  $ neighbourhood_group           : chr  "Brooklyn" "Manhattan" "Manhattan" "Brooklyn" ...
##  $ neighbourhood                 : chr  "Kensington" "Midtown" "Harlem" "Clinton Hill" ...
##  $ latitude                      : num  40.6 40.8 40.8 40.7 40.8 ...
##  $ longitude                     : num  -74 -74 -73.9 -74 -73.9 ...
##  $ room_type                     : chr  "Private room" "Entire home/apt" "Private room" "Entire home/apt" ...
##  $ price                         : int  149 225 150 89 80 200 60 79 79 150 ...
##  $ minimum_nights                : int  1 1 3 1 10 3 45 2 2 1 ...
##  $ number_of_reviews             : int  9 45 0 270 9 74 49 430 118 160 ...
##  $ last_review                   : chr  "2018-10-19" "2019-05-21" "" "2019-07-05" ...
##  $ reviews_per_month             : num  0.21 0.38 NA 4.64 0.1 0.59 0.4 3.47 0.99 1.33 ...
##  $ calculated_host_listings_count: int  6 2 1 1 1 1 1 1 1 4 ...
##  $ availability_365              : int  365 355 365 194 0 129 0 220 0 188 ...
summary(DF)
##        id               name              host_id           host_name        
##  Min.   :    2539   Length:48895       Min.   :     2438   Length:48895      
##  1st Qu.: 9471945   Class :character   1st Qu.:  7822033   Class :character  
##  Median :19677284   Mode  :character   Median : 30793816   Mode  :character  
##  Mean   :19017143                      Mean   : 67620011                     
##  3rd Qu.:29152178                      3rd Qu.:107434423                     
##  Max.   :36487245                      Max.   :274321313                     
##                                                                              
##  neighbourhood_group neighbourhood         latitude       longitude     
##  Length:48895        Length:48895       Min.   :40.50   Min.   :-74.24  
##  Class :character    Class :character   1st Qu.:40.69   1st Qu.:-73.98  
##  Mode  :character    Mode  :character   Median :40.72   Median :-73.96  
##                                         Mean   :40.73   Mean   :-73.95  
##                                         3rd Qu.:40.76   3rd Qu.:-73.94  
##                                         Max.   :40.91   Max.   :-73.71  
##                                                                         
##   room_type             price         minimum_nights    number_of_reviews
##  Length:48895       Min.   :    0.0   Min.   :   1.00   Min.   :  0.00   
##  Class :character   1st Qu.:   69.0   1st Qu.:   1.00   1st Qu.:  1.00   
##  Mode  :character   Median :  106.0   Median :   3.00   Median :  5.00   
##                     Mean   :  152.7   Mean   :   7.03   Mean   : 23.27   
##                     3rd Qu.:  175.0   3rd Qu.:   5.00   3rd Qu.: 24.00   
##                     Max.   :10000.0   Max.   :1250.00   Max.   :629.00   
##                                                                          
##  last_review        reviews_per_month calculated_host_listings_count
##  Length:48895       Min.   : 0.010    Min.   :  1.000               
##  Class :character   1st Qu.: 0.190    1st Qu.:  1.000               
##  Mode  :character   Median : 0.720    Median :  1.000               
##                     Mean   : 1.373    Mean   :  7.144               
##                     3rd Qu.: 2.020    3rd Qu.:  2.000               
##                     Max.   :58.500    Max.   :327.000               
##                     NA's   :10052                                   
##  availability_365
##  Min.   :  0.0   
##  1st Qu.:  0.0   
##  Median : 45.0   
##  Mean   :112.8   
##  3rd Qu.:227.0   
##  Max.   :365.0   
## 

Tab 2

Which Borough has the highest potential booking value?

This chart displays which borough has the highest potential booking value. Each borough is listed on the x-axis and potential booking value is on the y-axis.

Manhattan has the highest potential booking value, at over $4 million. Because it is unclear whether the price listed for each property is a nightly rate or a price for the advertised minimum number of nights, we aggregated the data to get the total price per borough, along with the number of listings per borough (the length of the price column).

Using the derived data, we treat this total as the potential booking value advertised in each borough.

agg_DF <- DF %>%
  select(neighbourhood_group, price) %>%
  group_by(neighbourhood_group) %>%
  dplyr::summarise(tot=sum(price), n=length(price), .groups='keep') %>%
  data.frame()


max_y_axis <- round_any(max(agg_DF$tot), 6000000, ceiling)


ggplot(agg_DF, aes(x=reorder(neighbourhood_group,-tot), y=tot)) + 
  geom_bar(stat = "identity") +
  labs(title = "NYC AirBnB: Potential Booking Value Per Borough", 
       x = "The 5 Boroughs", y="Potential Booking Value") +
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5)) + 
  scale_y_continuous(labels = comma, limits = c(0, max_y_axis)) +
  # Inherit x and y from ggplot() so each label sits above its own (reordered) bar
  geom_text(aes(label = scales::comma(tot)), vjust = -0.7)
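
Because it is unclear whether `price` is a nightly rate, a small base-R sketch can show how much the borough ranking depends on that choice. The data frame below is hypothetical toy data, not values from AB_NYC_2019.csv. Interpretation A mirrors the sum of `price` used in this tab; interpretation B, which multiplies `price` by `minimum_nights`, is an assumption added only for comparison.

```r
# Hypothetical toy listings (made-up values, not from AB_NYC_2019.csv)
listings <- data.frame(
  neighbourhood_group = c("Manhattan", "Manhattan", "Brooklyn", "Brooklyn", "Bronx"),
  price               = c(200, 150, 100, 80, 60),
  minimum_nights      = c(1, 3, 2, 1, 5),
  stringsAsFactors    = FALSE
)

# Interpretation A (as in this tab): each price is one listing's booking value,
# so the borough total is the sum of price
value_a <- aggregate(price ~ neighbourhood_group, data = listings, FUN = sum)

# Interpretation B (assumption, not used in the report): price is per night,
# so one minimum-length stay is worth price * minimum_nights
listings$min_stay_value <- listings$price * listings$minimum_nights
value_b <- aggregate(min_stay_value ~ neighbourhood_group, data = listings, FUN = sum)

# Listing count per borough, like n = length(price) in agg_DF
counts <- aggregate(price ~ neighbourhood_group, data = listings, FUN = length)
names(counts)[2] <- "n"

# Highest total first, as reorder(neighbourhood_group, -tot) does in the chart
value_a <- value_a[order(-value_a$price), ]
value_b <- value_b[order(-value_b$min_stay_value), ]

print(value_a)
print(value_b)
print(counts)
```

In this toy example the Bronx overtakes Brooklyn under interpretation B, so the borough ranking can shift depending on which reading of `price` is used.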

Tab 3

What is the Potential Booking Value for each room type per Borough?

This stacked bar chart displays the total potential booking value for each room type in each borough. By summing the price for each room type within each borough, we can see which room type has the highest potential booking value in each borough.

We conclude that the “Entire home/apt” room type has the highest potential booking value overall and in Manhattan.

agg_DF1 <- DF %>%
  select(neighbourhood_group, price, room_type) %>%
  group_by(neighbourhood_group, room_type) %>%
  dplyr::summarise(tot=sum(price), n=length(price), .groups='keep') %>%
  data.frame()


ggplot(agg_DF1, aes(x=reorder(neighbourhood_group,-tot), y=tot, 
                   fill = room_type)) + 
  geom_bar(stat = "identity") +
  labs(title = "NYC AirBnB: Projected Cost of Room Type Per Borough", 
       x = "The 5 Boroughs", y="Room Type Totals") +
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5)) + 
  scale_fill_brewer(palette = "Paired", 
                    guide = guide_legend(title = "Room Type")) + 
  scale_y_continuous(labels = comma) +
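  # Label only segments above 100,000; x, y, and fill are inherited from ggplot()
  # so the labels stack in the same order as the bar segments
  geom_text(aes(label = ifelse(tot > 100000, scales::comma(tot), "")),
            position = position_stack(vjust = 0.5))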
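
The per-room-type totals in this tab can be cross-checked with a base-R sketch. The listings below are hypothetical; `aggregate()` sums `price` for each borough and room type pair, as `agg_DF1` does, and the `split()`/`which.max()` step (an addition, not part of the original analysis) picks the room type with the largest total in each borough.

```r
# Hypothetical toy listings (made-up values, not from AB_NYC_2019.csv)
listings <- data.frame(
  neighbourhood_group = c("Manhattan", "Manhattan", "Manhattan",
                          "Brooklyn", "Brooklyn", "Brooklyn"),
  room_type = c("Entire home/apt", "Entire home/apt", "Private room",
                "Private room", "Private room", "Entire home/apt"),
  price = c(250, 200, 90, 70, 60, 120),
  stringsAsFactors = FALSE
)

# Total price for every borough x room type pair (same grouping as agg_DF1)
totals <- aggregate(price ~ neighbourhood_group + room_type,
                    data = listings, FUN = sum)

# For each borough, keep the room type whose total is largest
top_by_borough <- do.call(rbind, lapply(
  split(totals, totals$neighbourhood_group),
  function(d) d[which.max(d$price), ]
))

# NYC-wide total per room type, ignoring borough
overall <- aggregate(price ~ room_type, data = listings, FUN = sum)

print(top_by_borough)
print(overall)
```

In this toy data Brooklyn's largest total comes from private rooms even though entire homes lead overall, which is why the per-borough check is worth doing separately from the NYC-wide one.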

Tab 4

Which Borough Received the Most Reviews Per Month from 2014 to 2019?

This trellis chart shows the review count for each month and borough from 2014 to 2019, with one panel per year. The review count is on the y-axis and the months are on the x-axis. Each listing is counted once, in the month of its most recent review (last_review), so the review count is the number of listings last reviewed in that month rather than the total number of reviews written.

By omitting NAs (Not Available) and grouping the data by borough, year, and month of the last review, we can see that 2019 was a very active year for NYC AirBnB, and June 2019 had the highest review count.

noNA <- na.omit(DF)

months_noNA <- noNA %>%
  select(neighbourhood_group, reviews_per_month, last_review) %>% 
  mutate(months = months(ymd(last_review), abbreviate = TRUE), 
         year = year(ymd(last_review))) %>%
  group_by(neighbourhood_group, year, months) %>%
  dplyr::summarise(n = length(reviews_per_month), .groups = 'keep') %>%
  data.frame()

# Keep 2014-2019 only; year() returns a number, so compare against numbers
months_noNA <- months_noNA[months_noNA$year %in% 2014:2019, ]

months_noNA$year <- factor(months_noNA$year)

# Order the months chronologically instead of alphabetically
mymonths_noNA <- c('Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 
                   'Aug', 'Sep', 'Oct', 'Nov', 'Dec')

months_noNA$months <- factor(months_noNA$months, levels = mymonths_noNA)

ggplot(months_noNA, aes(x = months, y = n, 
                        fill = neighbourhood_group)) + 
  geom_bar(stat = "identity") + 
  theme_light() + 
  theme(plot.title = element_text(hjust = 0.5), text = element_text(size = 10),
        axis.text.x = element_text(angle = 90, hjust = 1)) +
  scale_y_continuous(labels = comma) + 
  labs(title = "Highest Reviews Per Month Per Borough", x = "Months",
       y = "Review Count", fill = "Borough") + 
  scale_fill_brewer(palette = "Paired") + 
  facet_wrap(~year, ncol = 3, nrow = 2)
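
A base-R sketch of the month and year derivation used in this tab, on hypothetical toy rows. It shows that each listing contributes one count, placed in the month of its `last_review`. It uses `format()` and the built-in `month.abb` instead of `months()` and lubridate, an assumption chosen so the month labels do not depend on the system locale.

```r
# Hypothetical toy listings (made-up values, not from AB_NYC_2019.csv)
listings <- data.frame(
  neighbourhood_group = c("Bronx", "Bronx", "Queens", "Queens", "Queens"),
  last_review = c("2019-06-02", "2019-06-20", "2019-06-15", "2018-12-01", ""),
  reviews_per_month = c(0.5, 1.2, 3.0, 0.1, NA),
  stringsAsFactors = FALSE
)

# Drop never-reviewed listings (NA reviews_per_month, empty last_review),
# as na.omit() does in this tab
reviewed <- listings[!is.na(listings$reviews_per_month), ]

dates <- as.Date(reviewed$last_review)
reviewed$year <- as.integer(format(dates, "%Y"))
# month.abb holds English month abbreviations in calendar order,
# so the factor levels run Jan..Dec regardless of locale
reviewed$month <- factor(month.abb[as.integer(format(dates, "%m"))],
                         levels = month.abb)

# One count per listing, placed in the month of its most recent review
counts <- aggregate(reviews_per_month ~ neighbourhood_group + year + month,
                    data = reviewed, FUN = length)
names(counts)[names(counts) == "reviews_per_month"] <- "n"
counts <- counts[counts$year %in% 2014:2019, ]

print(counts)
```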

Tab 5

What is the Percent of Each Room Type Advertised for AirBnB NYC, by Borough?

This trellis chart displays the percent of each room type advertised in each borough. By counting the listings of each room type and showing each count as a percentage in a pie chart, we can see that “Shared rooms” are rarely advertised for NYC AirBnBs.

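# Count listings per borough and room type, then convert each count to a
# percentage of that borough's total so the labels match the pie slices
room_p <- DF %>%
  group_by(neighbourhood_group, room_type) %>%
  dplyr::summarise(n = n(), .groups = 'drop') %>%
  group_by(neighbourhood_group) %>%
  dplyr::mutate(percent_n = round(100 * n / sum(n), 1)) %>%
  ungroup() %>%
  data.frame()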


ggplot(data = room_p, aes(x = "", y = n, fill=room_type)) + 
  geom_bar(stat = "identity", position = "fill") + 
  coord_polar(theta = "y", start = 0) +
  labs(fill = "Room Type", x = NULL, y = NULL,
       title = "Pie Chart:\nPercent of Room Type Advertised by Borough Throughout NYC") + 
  theme_light() + 
  theme(plot.title = element_text(hjust = 0.5),
        axis.text = element_blank(), 
        axis.ticks = element_blank(), panel.grid = element_blank()) +
  scale_fill_brewer(palette = "Reds") + 
  geom_text(aes(x = 1.7,label = paste0(percent_n, "%")),
                size = 4, 
                position = position_fill(vjust = 0.5)) +
  facet_wrap(~neighbourhood_group, nrow = 2, ncol = 3)
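
The percentage labels can be computed either against each borough's own total or against all NYC listings, and the two give different numbers. This hypothetical base-R sketch computes both with `table()` and `ave()`; the within-borough share is the one that matches the slice sizes drawn by `position = "fill"`.

```r
# Hypothetical toy listings (made-up values, not from AB_NYC_2019.csv)
listings <- data.frame(
  neighbourhood_group = c(rep("Brooklyn", 4), rep("Staten Island", 2)),
  room_type = c("Private room", "Private room", "Entire home/apt", "Shared room",
                "Entire home/apt", "Private room"),
  stringsAsFactors = FALSE
)

# Count listings for each borough x room type pair
counts <- as.data.frame(table(listings$neighbourhood_group, listings$room_type),
                        stringsAsFactors = FALSE)
names(counts) <- c("neighbourhood_group", "room_type", "n")
counts <- counts[counts$n > 0, ]  # table() also lists empty pairs

# Share within each borough: divide by that borough's total
borough_total <- ave(counts$n, counts$neighbourhood_group, FUN = sum)
counts$percent_within <- round(100 * counts$n / borough_total, 1)

# Share of all listings: divide by the grand total instead
counts$percent_overall <- round(100 * counts$n / sum(counts$n), 1)

print(counts)
```

Here Brooklyn's private rooms are 50% of Brooklyn's listings but 33.3% of all listings, so a label should use the same denominator as the slice it annotates.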

Tab 6

Map of the 6 Most Expensive AirBnBs Advertised in NYC (Expense Is Measured by Potential Booking Value).

Hover over a pin to see the borough of the AirBnB property, and click it to see its potential booking value. The data was filtered to display only the 6 most expensive properties, based on their potential booking value.

# Sort listings by price, highest first, and keep the top 6;
# the dplyr:: prefixes avoid plyr's versions of arrange() and desc()
map <- DF %>%
  select(latitude, longitude, price, neighbourhood_group) %>%
  dplyr::arrange(dplyr::desc(price)) %>%
  head(6)


NYmap <- leaflet() %>%
  addTiles() %>%
  addMarkers(lng = map$longitude, lat = map$latitude,
             popup = paste0("$", scales::comma(map$price)),  # shown on click
             label = map$neighbourhood_group)               # shown on hover
NYmap
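
Selecting the most expensive listings only requires sorting by `price` and keeping the first rows. The base-R sketch below uses hypothetical coordinates and prices, and `top_n_by_price()` is a hypothetical helper name; it returns the rows that would be passed to `addMarkers()` in this tab, with popup text formatted by `formatC()`.

```r
# Hypothetical toy listings (made-up coordinates and prices)
listings <- data.frame(
  neighbourhood_group = c("Manhattan", "Brooklyn", "Queens", "Manhattan",
                          "Bronx", "Brooklyn", "Staten Island", "Manhattan"),
  latitude  = c(40.76, 40.68, 40.72, 40.78, 40.85, 40.70, 40.58, 40.73),
  longitude = c(-73.98, -73.95, -73.80, -73.96, -73.88, -73.93, -74.10, -74.00),
  price     = c(10000, 9999, 250, 8000, 120, 7500, 5000, 6500),
  stringsAsFactors = FALSE
)

# Hypothetical helper: sort by price, highest first, and keep the first n rows.
# Sorting on the negated price leaves tied prices in their original row order.
top_n_by_price <- function(df, n = 6) {
  ranked <- df[order(-df$price), ]
  head(ranked, n)
}

top6 <- top_n_by_price(listings)

# Popup text such as "$10,000"; formatC() formats each value without padding
popups <- paste0("$", formatC(top6$price, format = "d", big.mark = ","))

print(top6)
print(popups)
```

Because tied prices keep their original row order, which tied listing makes the cut at the sixth position depends on how the data was ordered beforehand.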

Conclusions

General takeaways from my output:

  1. Manhattan AirBnBs have the highest potential booking value of the five boroughs. This might suggest that most tourists prefer to reserve an AirBnB in this borough.
  2. Brooklyn has more “Private room” listings than any other borough. This might suggest that the borough attracts smaller groups of people who need lodging.
  3. Tourists and other visitors appear to have been drawn to NYC most in June 2019, based on the number of AirBnB listings whose most recent review was in that month. A significant event may have been taking place in the city at that time, or people may simply have preferred to be in NYC then.
  4. “Shared rooms” are rarely advertised in NYC. This could mean they are not in demand.

Footnotes

Airbnb listings are categorized into the following home types:

  1. Entire place: Guests have the whole place to themselves. This usually includes a bedroom, a bathroom, and a kitchen. Hosts should note in the description if they’ll be on the property (e.g., “Host occupies first floor of the home”).
  2. Private room: Guests have their own private room for sleeping. Other areas could be shared.
  3. Shared room: Guests sleep in a bedroom or a common area that could be shared with others.

(https://www.airbnb.com/help/article/317/what-do-the-different-home-types-mean)