Since 2008, guests and hosts have used AirBnB to expand on traveling and lodging possibilities that present a more unique, personalized way of experiencing the world. This data set describes the listing activity and metrics in New York City, NY from 2014-2019.
(Kaggle.com)
This data set includes the information needed to learn more about hosts and geographical availability, along with the metrics needed to make predictions and draw conclusions.
Using this data set, I’ve used visualizations to show the following:
Formulae and assumptions are stated under each tab for the chart displayed.
I am able to make the following assumptions:
DF <- read.csv("C:/Users/jessica.wade/Desktop/DATA SCIENCE - Loyola/DS_736 Data Visualization/R/R_datafiles/AB_NYC_2019.csv")
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(scales)
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(ggplot2)
library(RColorBrewer)
library(ggthemes)
library(plyr)
## ------------------------------------------------------------------------------
## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)
## ------------------------------------------------------------------------------
##
## Attaching package: 'plyr'
## The following objects are masked from 'package:dplyr':
##
## arrange, count, desc, failwith, id, mutate, rename, summarise,
## summarize
library(leaflet)
colnames(DF)
## [1] "id" "name"
## [3] "host_id" "host_name"
## [5] "neighbourhood_group" "neighbourhood"
## [7] "latitude" "longitude"
## [9] "room_type" "price"
## [11] "minimum_nights" "number_of_reviews"
## [13] "last_review" "reviews_per_month"
## [15] "calculated_host_listings_count" "availability_365"
head(DF)
## id name host_id host_name
## 1 2539 Clean & quiet apt home by the park 2787 John
## 2 2595 Skylit Midtown Castle 2845 Jennifer
## 3 3647 THE VILLAGE OF HARLEM....NEW YORK ! 4632 Elisabeth
## 4 3831 Cozy Entire Floor of Brownstone 4869 LisaRoxanne
## 5 5022 Entire Apt: Spacious Studio/Loft by central park 7192 Laura
## 6 5099 Large Cozy 1 BR Apartment In Midtown East 7322 Chris
## neighbourhood_group neighbourhood latitude longitude room_type price
## 1 Brooklyn Kensington 40.64749 -73.97237 Private room 149
## 2 Manhattan Midtown 40.75362 -73.98377 Entire home/apt 225
## 3 Manhattan Harlem 40.80902 -73.94190 Private room 150
## 4 Brooklyn Clinton Hill 40.68514 -73.95976 Entire home/apt 89
## 5 Manhattan East Harlem 40.79851 -73.94399 Entire home/apt 80
## 6 Manhattan Murray Hill 40.74767 -73.97500 Entire home/apt 200
## minimum_nights number_of_reviews last_review reviews_per_month
## 1 1 9 2018-10-19 0.21
## 2 1 45 2019-05-21 0.38
## 3 3 0 NA
## 4 1 270 2019-07-05 4.64
## 5 10 9 2018-11-19 0.10
## 6 3 74 2019-06-22 0.59
## calculated_host_listings_count availability_365
## 1 6 365
## 2 2 355
## 3 1 365
## 4 1 194
## 5 1 0
## 6 1 129
tail(DF)
## id name host_id
## 48890 36484363 QUIT PRIVATE HOUSE 107716952
## 48891 36484665 Charming one bedroom - newly renovated rowhouse 8232441
## 48892 36485057 Affordable room in Bushwick/East Williamsburg 6570630
## 48893 36485431 Sunny Studio at Historical Neighborhood 23492952
## 48894 36485609 43rd St. Time Square-cozy single bed 30985759
## 48895 36487245 Trendy duplex in the very heart of Hell's Kitchen 68119814
## host_name neighbourhood_group neighbourhood latitude longitude
## 48890 Michael Queens Jamaica 40.69137 -73.80844
## 48891 Sabrina Brooklyn Bedford-Stuyvesant 40.67853 -73.94995
## 48892 Marisol Brooklyn Bushwick 40.70184 -73.93317
## 48893 Ilgar & Aysel Manhattan Harlem 40.81475 -73.94867
## 48894 Taz Manhattan Hell's Kitchen 40.75751 -73.99112
## 48895 Christophe Manhattan Hell's Kitchen 40.76404 -73.98933
## room_type price minimum_nights number_of_reviews last_review
## 48890 Private room 65 1 0
## 48891 Private room 70 2 0
## 48892 Private room 40 4 0
## 48893 Entire home/apt 115 10 0
## 48894 Shared room 55 1 0
## 48895 Private room 90 7 0
## reviews_per_month calculated_host_listings_count availability_365
## 48890 NA 2 163
## 48891 NA 2 9
## 48892 NA 2 36
## 48893 NA 1 27
## 48894 NA 6 2
## 48895 NA 1 23
dim(DF)
## [1] 48895 16
str(DF)
## 'data.frame': 48895 obs. of 16 variables:
## $ id : int 2539 2595 3647 3831 5022 5099 5121 5178 5203 5238 ...
## $ name : chr "Clean & quiet apt home by the park" "Skylit Midtown Castle" "THE VILLAGE OF HARLEM....NEW YORK !" "Cozy Entire Floor of Brownstone" ...
## $ host_id : int 2787 2845 4632 4869 7192 7322 7356 8967 7490 7549 ...
## $ host_name : chr "John" "Jennifer" "Elisabeth" "LisaRoxanne" ...
## $ neighbourhood_group : chr "Brooklyn" "Manhattan" "Manhattan" "Brooklyn" ...
## $ neighbourhood : chr "Kensington" "Midtown" "Harlem" "Clinton Hill" ...
## $ latitude : num 40.6 40.8 40.8 40.7 40.8 ...
## $ longitude : num -74 -74 -73.9 -74 -73.9 ...
## $ room_type : chr "Private room" "Entire home/apt" "Private room" "Entire home/apt" ...
## $ price : int 149 225 150 89 80 200 60 79 79 150 ...
## $ minimum_nights : int 1 1 3 1 10 3 45 2 2 1 ...
## $ number_of_reviews : int 9 45 0 270 9 74 49 430 118 160 ...
## $ last_review : chr "2018-10-19" "2019-05-21" "" "2019-07-05" ...
## $ reviews_per_month : num 0.21 0.38 NA 4.64 0.1 0.59 0.4 3.47 0.99 1.33 ...
## $ calculated_host_listings_count: int 6 2 1 1 1 1 1 1 1 4 ...
## $ availability_365 : int 365 355 365 194 0 129 0 220 0 188 ...
summary(DF)
## id name host_id host_name
## Min. : 2539 Length:48895 Min. : 2438 Length:48895
## 1st Qu.: 9471945 Class :character 1st Qu.: 7822033 Class :character
## Median :19677284 Mode :character Median : 30793816 Mode :character
## Mean :19017143 Mean : 67620011
## 3rd Qu.:29152178 3rd Qu.:107434423
## Max. :36487245 Max. :274321313
##
## neighbourhood_group neighbourhood latitude longitude
## Length:48895 Length:48895 Min. :40.50 Min. :-74.24
## Class :character Class :character 1st Qu.:40.69 1st Qu.:-73.98
## Mode :character Mode :character Median :40.72 Median :-73.96
## Mean :40.73 Mean :-73.95
## 3rd Qu.:40.76 3rd Qu.:-73.94
## Max. :40.91 Max. :-73.71
##
## room_type price minimum_nights number_of_reviews
## Length:48895 Min. : 0.0 Min. : 1.00 Min. : 0.00
## Class :character 1st Qu.: 69.0 1st Qu.: 1.00 1st Qu.: 1.00
## Mode :character Median : 106.0 Median : 3.00 Median : 5.00
## Mean : 152.7 Mean : 7.03 Mean : 23.27
## 3rd Qu.: 175.0 3rd Qu.: 5.00 3rd Qu.: 24.00
## Max. :10000.0 Max. :1250.00 Max. :629.00
##
## last_review reviews_per_month calculated_host_listings_count
## Length:48895 Min. : 0.010 Min. : 1.000
## Class :character 1st Qu.: 0.190 1st Qu.: 1.000
## Mode :character Median : 0.720 Median : 1.000
## Mean : 1.373 Mean : 7.144
## 3rd Qu.: 2.020 3rd Qu.: 2.000
## Max. :58.500 Max. :327.000
## NA's :10052
## availability_365
## Min. : 0.0
## 1st Qu.: 0.0
## Median : 45.0
## Mean :112.8
## 3rd Qu.:227.0
## Max. :365.0
##
Which Borough has the highest potential booking value?
This chart displays which borough has the highest potential booking value. Each borough is listed on the x-axis and potential booking value is on the y-axis.
Manhattan has the highest potential booking value, at over 4 million. It is unclear whether the price listed for each property is the price per night or is based on the minimum nights advertised, so we summed the listed price per borough (tot) and counted the number of listings per borough (n).
Using the derived data, we assume that the summed price is the potential booking value advertised for the properties in each borough.
agg_DF <- DF %>%
select(neighbourhood_group, price) %>%
group_by(neighbourhood_group) %>%
dplyr::summarise(tot=sum(price), n=length(price), .groups='keep') %>%
data.frame()
max_y_axis <- round_any(max(agg_DF$tot), 6000000, ceiling)
ggplot(agg_DF, aes(x=reorder(neighbourhood_group,-tot), y=tot)) +
geom_bar(stat = "identity") +
labs(title = "NYC AirBnB: Potential Booking Value Per Borough",
x = "The 5 Boroughs", y="Potential Booking Value") +
theme_light() +
theme(plot.title = element_text(hjust = 0.5)) +
scale_fill_brewer(palette = "Paired") +
scale_y_continuous(labels = comma, limits = c(0, max_y_axis)) +
geom_text(data=agg_DF,
aes(x=neighbourhood_group, y=tot,
label = scales::comma(tot), vjust=-0.7))
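The aggregation behind the chart can be checked on a small example. The sketch below uses base R only and a made-up toy data frame (the values are not from the data set). It computes the same two quantities as agg_DF, plus one alternative reading of the price column: if price is per night, the minimum booking would cost price times minimum_nights. That alternative is an assumption, not something the data set confirms.

```r
# Toy listings; the values are invented for illustration only.
toy <- data.frame(
  neighbourhood_group = c("Manhattan", "Manhattan", "Brooklyn", "Queens"),
  price               = c(225, 150, 89, 65),
  minimum_nights      = c(1, 3, 1, 2)
)

# Sum of listed prices per borough (the role of `tot` in agg_DF).
tot_by_borough <- tapply(toy$price, toy$neighbourhood_group, sum)

# Number of listings per borough (the role of `n` in agg_DF).
n_by_borough <- tapply(toy$price, toy$neighbourhood_group, length)

# Alternative reading (an assumption): price is per night, so the
# cheapest possible booking costs price * minimum_nights.
min_stay_by_borough <- tapply(toy$price * toy$minimum_nights,
                              toy$neighbourhood_group, sum)

print(tot_by_borough)
print(n_by_borough)
print(min_stay_by_borough)
```

If the two totals rank the boroughs differently on the real data, the conclusion depends on which reading of price is correct.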
What is the Potential Booking Value for each room type per Borough?
The stacked bar chart displays total potential booking value per room type per borough. Manipulating the data to derive the total sum of price for each room type, per borough, we can assume which room type has the highest potential booking value for the borough.
Based on these totals, the “Entire home/apt” room type has the highest potential booking value overall and within the Manhattan borough.
agg_DF1 <- DF %>%
select(neighbourhood_group, price, room_type) %>%
group_by(neighbourhood_group, room_type) %>%
dplyr::summarise(tot=sum(price), n=length(price), .groups='keep') %>%
data.frame()
ggplot(agg_DF1, aes(x=reorder(neighbourhood_group,-tot), y=tot,
fill = room_type)) +
geom_bar(stat = "identity") +
labs(title = "NYC AirBnB: Projected Cost of Room Type Per Borough",
x = "The 5 Boroughs", y="Room Type Totals") +
theme_light() +
theme(plot.title = element_text(hjust = 0.5)) +
scale_fill_brewer(palette = "Paired",
guide = guide_legend(title = "Room Type")) +
scale_y_continuous(labels = comma) +
geom_text(data=agg_DF1,
aes(x=neighbourhood_group, y=tot,
label=ifelse(tot>100000,scales::comma(tot), "")),
position = position_stack(vjust = 0.5))
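To confirm which room type leads in each borough without reading it off the chart, the grouped totals can be ranked directly. This is a minimal sketch in base R on made-up toy values. The column names mirror the data set, while the helper names (agg, top) are hypothetical.

```r
# Toy listings; the values are invented for illustration only.
toy <- data.frame(
  neighbourhood_group = c("Manhattan", "Manhattan", "Manhattan",
                          "Brooklyn", "Brooklyn"),
  room_type = c("Entire home/apt", "Private room", "Entire home/apt",
                "Private room", "Entire home/apt"),
  price = c(225, 150, 200, 149, 89)
)

# Total price per borough and room type (same grouping as agg_DF1).
agg <- aggregate(price ~ neighbourhood_group + room_type,
                 data = toy, FUN = sum)
names(agg)[names(agg) == "price"] <- "tot"

# Share of each room type within its own borough.
agg$share <- agg$tot / ave(agg$tot, agg$neighbourhood_group, FUN = sum)

# Keep the row with the largest total for each borough.
top <- do.call(rbind, lapply(split(agg, agg$neighbourhood_group),
                             function(d) d[which.max(d$tot), ]))
print(top)
```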
Which Borough Received the Most Reviews Per Month from 2014 to 2019?
The Trellis chart shows, for each year from 2014 to 2019, the review count for each month, broken down by borough. “Review Count” is on the y-axis and the months are on the x-axis.
The data was prepared by omitting rows with NA (not available) values and grouping by borough, year, and month of the last review, so each bar counts the listings whose most recent review fell in that month. From this we can assume that 2019 was a very active year for NYC AirBnB, with June receiving the most reviews.
noNA <- na.omit(DF)
months_noNA <- noNA %>%
select(neighbourhood_group, reviews_per_month, last_review) %>%
mutate(months = months(ymd(last_review), abbreviate = TRUE),
year = year(ymd(last_review))) %>%
group_by(neighbourhood_group, year, months) %>%
dplyr::summarise(n = length(reviews_per_month), .groups = 'keep') %>%
data.frame()
months_noNA <- months_noNA[months_noNA$year %in% c("2014", "2015", "2016",
"2017", "2018", "2019"),]
months_noNA$year <- factor(months_noNA$year)
mymonths_noNA <- c('Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul',
'Aug', 'Sep', 'Oct', 'Nov', 'Dec')
months_ordernoNA <- factor(months_noNA$months, levels = mymonths_noNA)
ggplot(months_noNA, aes(x = months_ordernoNA, y = n,
fill = neighbourhood_group)) +
geom_bar(stat = "identity") +
theme_light() +
theme(plot.title = element_text(hjust = 0.5), text = element_text(size = 10),
axis.text.x = element_text(angle = 90, hjust=1)) +
scale_y_continuous(labels = comma) +
labs(title = "Highest Reviews Per Month Per Borough", x = "Months",
y = "Review Count", fill = "Borough") +
scale_fill_brewer(palette = "Paired") +
facet_wrap(~year, ncol = 3, nrow = 3)
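The month and year extraction can be sketched in base R on a few sample dates. Two choices here differ from the code above and are assumptions made for the sketch: empty last_review strings are dropped explicitly with nzchar(), and month names come from the built-in month.abb constant so the result does not depend on the system locale.

```r
# Sample last_review values; "" marks a listing with no review.
last_review <- c("2018-10-19", "2019-05-21", "", "2019-07-05",
                 "2019-06-22", "2019-06-30")

# Drop listings without a review before parsing the dates.
has_review <- nzchar(last_review)
d <- as.Date(last_review[has_review])

# Month as an ordered factor (Jan..Dec), independent of locale.
review_month <- factor(month.abb[as.integer(format(d, "%m"))],
                       levels = month.abb)
review_year <- as.integer(format(d, "%Y"))

# Number of listings whose last review fell in each year and month.
counts <- table(year = review_year, month = review_month)
print(counts)
```

Because each listing contributes only its most recent review date, these counts describe when listings were last reviewed rather than the total number of reviews written in a month.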
What is the Percent of Each Room Type Advertised for AirBnB NYC, by Borough?
This Trellis chart displays one pie chart per borough, showing the share of each room type advertised. Each slice is sized by the room type's share of listings within the borough, while the label (percent_n) gives that borough and room type combination's percent of all NYC listings. From this we can assume that “Shared room” listings are not advertised much for NYC AirBnBs.
room_p <- DF %>%
group_by(room_type, neighbourhood_group) %>%
dplyr::summarise(n=length(room_type), .groups = 'keep') %>%
data.frame() %>%
mutate(percent_n = round(100*n/sum(n),1)) %>%
data.frame()
ggplot(data = room_p, aes(x = "", y = n, fill=room_type)) +
geom_bar(stat = "identity", position = "fill") +
coord_polar(theta = "y", start = 0) +
labs(fill = "Room Type", x = NULL, y = NULL, title = "Pie Chart:
Percent of Room Type Advertised by Borough Throughout NYC") +
theme_light() +
theme(plot.title = element_text(hjust = 0.5),
axis.text = element_blank(),
axis.ticks = element_blank(), panel.grid = element_blank()) +
scale_fill_brewer(palette = "Reds") +
geom_text(aes(x = 1.7,label = paste0(percent_n, "%")),
size = 4,
position = position_fill(vjust = 0.5)) +
facet_wrap(~neighbourhood_group, nrow = 2, ncol = 3)
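The code above computes percent_n against the total number of listings across all boroughs, so the labels in each pie do not sum to 100. If labels that sum to 100 within each borough are wanted instead, the denominator has to be the borough total. The sketch below shows both calculations side by side in base R. The counts are made up and the column names pct_citywide and pct_in_borough are hypothetical.

```r
# Toy listing counts per borough and room type (invented values).
room_n <- data.frame(
  neighbourhood_group = c("Bronx", "Bronx", "Bronx",
                          "Queens", "Queens", "Queens"),
  room_type = rep(c("Entire home/apt", "Private room", "Shared room"), 2),
  n = c(30, 60, 10, 40, 55, 5)
)

# Percent of all listings (what percent_n computes in the chart code).
room_n$pct_citywide <- round(100 * room_n$n / sum(room_n$n), 1)

# Percent within each borough: divide by the borough's own total.
borough_total <- ave(room_n$n, room_n$neighbourhood_group, FUN = sum)
room_n$pct_in_borough <- round(100 * room_n$n / borough_total, 1)

print(room_n)
```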
Map of the Top 6 Most Expensive AirBnBs Advertised in NYC (expense is measured by the potential booking value).
Hover over a pin to see the borough, and select it to view the potential booking value of the AirBnB property. The data was manipulated to display only the 6 most expensive properties, based on their potential booking value.
long_lat <- DF %>%
select(latitude, longitude, price, neighbourhood_group) %>%
group_by(price, neighbourhood_group,latitude, longitude) %>%
dplyr::summarise(n = length(neighbourhood_group), .groups = 'keep') %>%
data.frame()
map <- tail(long_lat)
NYmap <- leaflet() %>%
addTiles %>%
addMarkers(lng = map$longitude, lat = map$latitude,
popup = paste("$", map$price),
label = map$neighbourhood_group)
NYmap
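The selection above relies on the grouped summary being sorted by price in ascending order, so that tail() returns the highest-priced rows. A more explicit alternative is to sort by price in descending order and take the first six rows. The sketch below uses base R and made-up values. The column names mirror the data set, and top6 is a hypothetical name.

```r
# Toy listings; prices and coordinates are invented for illustration.
toy <- data.frame(
  price = c(150, 10000, 89, 9999, 225, 8500, 60, 7500),
  neighbourhood_group = c("Brooklyn", "Manhattan", "Brooklyn", "Queens",
                          "Manhattan", "Manhattan", "Bronx", "Brooklyn"),
  latitude  = c(40.64, 40.77, 40.68, 40.76, 40.75, 40.72, 40.85, 40.69),
  longitude = c(-73.97, -73.98, -73.95, -73.92, -73.98, -74.00, -73.88, -73.96)
)

# Sort by price, highest first, and keep the six most expensive rows.
top6 <- head(toy[order(toy$price, decreasing = TRUE), ], 6)
print(top6)
```

The resulting data frame can be passed to addMarkers() in the same way as map above.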
General takeaways from my output:
Airbnb listings are categorized into the following home types:
(https://www.airbnb.com/help/article/317/what-do-the-different-home-types-mean)