In this project, I will explore the Airbnb listings dataset for New York City in 2019. The data set, obtained from AIRBNB NYC, contains 48,895 observations of 16 variables, including both quantitative and categorical variables. The data set was collected by Airbnb, but the data collection method is not mentioned; I think it was likely scraped from the Airbnb website.
## Purpose

This data set offers insights into the pricing dynamics of Airbnb listings in New York City. By analyzing factors such as neighborhood, room type, and minimum nights required, I would like to discover trends that affect pricing. I chose this topic because I would like to visit NYC someday, or even move there.
### Loading ggplot2, dplyr, leaflet, tidyr, and plotly
library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(leaflet)
## Warning: package 'leaflet' was built under R version 4.3.3
library(tidyr)
## Warning: package 'tidyr' was built under R version 4.3.3
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
### Using readr to read the data set
NYCAB <- readr::read_csv("AB_NYC_2019.csv")
## Rows: 48895 Columns: 16
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (6): name, host_name, neighbourhood_group, neighbourhood, room_type, la...
## dbl (10): id, host_id, latitude, longitude, price, minimum_nights, number_of...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(NYCAB)
## # A tibble: 6 × 16
## id name host_id host_name neighbourhood_group neighbourhood latitude
## <dbl> <chr> <dbl> <chr> <chr> <chr> <dbl>
## 1 2539 Clean & qu… 2787 John Brooklyn Kensington 40.6
## 2 2595 Skylit Mid… 2845 Jennifer Manhattan Midtown 40.8
## 3 3647 THE VILLAG… 4632 Elisabeth Manhattan Harlem 40.8
## 4 3831 Cozy Entir… 4869 LisaRoxa… Brooklyn Clinton Hill 40.7
## 5 5022 Entire Apt… 7192 Laura Manhattan East Harlem 40.8
## 6 5099 Large Cozy… 7322 Chris Manhattan Murray Hill 40.7
## # ℹ 9 more variables: longitude <dbl>, room_type <chr>, price <dbl>,
## # minimum_nights <dbl>, number_of_reviews <dbl>, last_review <chr>,
## # reviews_per_month <dbl>, calculated_host_listings_count <dbl>,
## # availability_365 <dbl>
str(NYCAB)
## spc_tbl_ [48,895 × 16] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ id : num [1:48895] 2539 2595 3647 3831 5022 ...
## $ name : chr [1:48895] "Clean & quiet apt home by the park" "Skylit Midtown Castle" "THE VILLAGE OF HARLEM....NEW YORK !" "Cozy Entire Floor of Brownstone" ...
## $ host_id : num [1:48895] 2787 2845 4632 4869 7192 ...
## $ host_name : chr [1:48895] "John" "Jennifer" "Elisabeth" "LisaRoxanne" ...
## $ neighbourhood_group : chr [1:48895] "Brooklyn" "Manhattan" "Manhattan" "Brooklyn" ...
## $ neighbourhood : chr [1:48895] "Kensington" "Midtown" "Harlem" "Clinton Hill" ...
## $ latitude : num [1:48895] 40.6 40.8 40.8 40.7 40.8 ...
## $ longitude : num [1:48895] -74 -74 -73.9 -74 -73.9 ...
## $ room_type : chr [1:48895] "Private room" "Entire home/apt" "Private room" "Entire home/apt" ...
## $ price : num [1:48895] 149 225 150 89 80 200 60 79 79 150 ...
## $ minimum_nights : num [1:48895] 1 1 3 1 10 3 45 2 2 1 ...
## $ number_of_reviews : num [1:48895] 9 45 0 270 9 74 49 430 118 160 ...
## $ last_review : chr [1:48895] "10/19/2018" "5/21/2019" NA "7/5/2019" ...
## $ reviews_per_month : num [1:48895] 0.21 0.38 NA 4.64 0.1 0.59 0.4 3.47 0.99 1.33 ...
## $ calculated_host_listings_count: num [1:48895] 6 2 1 1 1 1 1 1 1 4 ...
## $ availability_365 : num [1:48895] 365 355 365 194 0 129 0 220 0 188 ...
## - attr(*, "spec")=
## .. cols(
## .. id = col_double(),
## .. name = col_character(),
## .. host_id = col_double(),
## .. host_name = col_character(),
## .. neighbourhood_group = col_character(),
## .. neighbourhood = col_character(),
## .. latitude = col_double(),
## .. longitude = col_double(),
## .. room_type = col_character(),
## .. price = col_double(),
## .. minimum_nights = col_double(),
## .. number_of_reviews = col_double(),
## .. last_review = col_character(),
## .. reviews_per_month = col_double(),
## .. calculated_host_listings_count = col_double(),
## .. availability_365 = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
summary(NYCAB)
## id name host_id host_name
## Min. : 2539 Length:48895 Min. : 2438 Length:48895
## 1st Qu.: 9471945 Class :character 1st Qu.: 7822033 Class :character
## Median :19677284 Mode :character Median : 30793816 Mode :character
## Mean :19017143 Mean : 67620011
## 3rd Qu.:29152178 3rd Qu.:107434423
## Max. :36487245 Max. :274321313
##
## neighbourhood_group neighbourhood latitude longitude
## Length:48895 Length:48895 Min. :40.50 Min. :-74.24
## Class :character Class :character 1st Qu.:40.69 1st Qu.:-73.98
## Mode :character Mode :character Median :40.72 Median :-73.96
## Mean :40.73 Mean :-73.95
## 3rd Qu.:40.76 3rd Qu.:-73.94
## Max. :40.91 Max. :-73.71
##
## room_type price minimum_nights number_of_reviews
## Length:48895 Min. : 0.0 Min. : 1.00 Min. : 0.00
## Class :character 1st Qu.: 69.0 1st Qu.: 1.00 1st Qu.: 1.00
## Mode :character Median : 106.0 Median : 3.00 Median : 5.00
## Mean : 152.7 Mean : 7.03 Mean : 23.27
## 3rd Qu.: 175.0 3rd Qu.: 5.00 3rd Qu.: 24.00
## Max. :10000.0 Max. :1250.00 Max. :629.00
##
## last_review reviews_per_month calculated_host_listings_count
## Length:48895 Min. : 0.010 Min. : 1.000
## Class :character 1st Qu.: 0.190 1st Qu.: 1.000
## Mode :character Median : 0.720 Median : 1.000
## Mean : 1.373 Mean : 7.144
## 3rd Qu.: 2.020 3rd Qu.: 2.000
## Max. :58.500 Max. :327.000
## NA's :10052
## availability_365
## Min. : 0.0
## 1st Qu.: 0.0
## Median : 45.0
## Mean :112.8
## 3rd Qu.:227.0
## Max. :365.0
##
### Using sum and is.na to count the missing values (NAs)
sum(is.na(NYCAB))
## [1] 20141
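The single total above does not show which columns the NAs come from. As a minimal sketch, using a small hypothetical data frame (`toy`) in place of the real file, `colSums(is.na(...))` counts the missing values per column; on the real data the equivalent call would be `colSums(is.na(NYCAB))`.

```r
# Hypothetical toy data standing in for NYCAB (not the real listings)
toy <- data.frame(
  price             = c(149, 225, 150, 89),
  number_of_reviews = c(9, 45, 0, 270),
  reviews_per_month = c(0.21, 0.38, NA, 4.64),
  last_review       = c("10/19/2018", "5/21/2019", NA, "7/5/2019")
)

# is.na() on a data frame returns a logical matrix;
# colSums() counts the TRUE values in each column
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# The per-column counts add up to the grand total used above
total_na <- sum(na_per_column)
```

In this toy example, the only missing values belong to the listing with zero reviews, which is the same pattern visible in row 3 of the `str()` output above.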
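### Omitting rows with missing data
### Note: the result is printed but not assigned, so the analyses below still use all 48,895 rows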
na.omit(NYCAB)
## # A tibble: 38,821 × 16
## id name host_id host_name neighbourhood_group neighbourhood latitude
## <dbl> <chr> <dbl> <chr> <chr> <chr> <dbl>
## 1 2539 Clean & q… 2787 John Brooklyn Kensington 40.6
## 2 2595 Skylit Mi… 2845 Jennifer Manhattan Midtown 40.8
## 3 3831 Cozy Enti… 4869 LisaRoxa… Brooklyn Clinton Hill 40.7
## 4 5022 Entire Ap… 7192 Laura Manhattan East Harlem 40.8
## 5 5099 Large Coz… 7322 Chris Manhattan Murray Hill 40.7
## 6 5121 BlissArts… 7356 Garon Brooklyn Bedford-Stuy… 40.7
## 7 5178 Large Fur… 8967 Shunichi Manhattan Hell's Kitch… 40.8
## 8 5203 Cozy Clea… 7490 MaryEllen Manhattan Upper West S… 40.8
## 9 5238 Cute & Co… 7549 Ben Manhattan Chinatown 40.7
## 10 5295 Beautiful… 7702 Lena Manhattan Upper West S… 40.8
## # ℹ 38,811 more rows
## # ℹ 9 more variables: longitude <dbl>, room_type <chr>, price <dbl>,
## # minimum_nights <dbl>, number_of_reviews <dbl>, last_review <chr>,
## # reviews_per_month <dbl>, calculated_host_listings_count <dbl>,
## # availability_365 <dbl>
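Because `na.omit()` drops a row when *any* column is missing, it would also discard listings that are missing only review information. The minimal sketch below, on hypothetical toy data, compares that with dropping rows only where the analysed column (`price`) is missing; either result has to be assigned to a variable for later steps to use it.

```r
# Hypothetical toy data (not the real listings)
toy <- data.frame(
  price             = c(149, 225, 150, NA),
  reviews_per_month = c(0.21, NA, NA, 4.64)
)

# na.omit() removes every row that has an NA in any column
all_complete <- na.omit(toy)

# Targeted alternative: drop a row only if price itself is missing
price_complete <- toy[!is.na(toy$price), ]

nrow(all_complete)    # 1: only the first row has no NA at all
nrow(price_complete)  # 3: only the row with a missing price is dropped
```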
### Using lm to perform regression analysis
### The predictor is minimum_nights and the response is price
lm_model <- lm(price ~ minimum_nights, data = NYCAB)
summary(lm_model)
##
## Call:
## lm(formula = price ~ minimum_nights, data = NYCAB)
##
## Residuals:
## Min 1Q Median 3Q Max
## -595.6 -84.2 -46.7 24.8 9848.3
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 149.1978 1.1470 130.070 <2e-16 ***
## minimum_nights 0.5011 0.0529 9.472 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 239.9 on 48893 degrees of freedom
## Multiple R-squared: 0.001832, Adjusted R-squared: 0.001811
## F-statistic: 89.73 on 1 and 48893 DF, p-value: < 2.2e-16
Equation for the model (y = mx + b): $\widehat{\mathrm{price}} = 0.5011 \times \mathrm{minimum\_nights} + 149.20$
P-values:
Both the intercept and the coefficient for minimum_nights are statistically significant (p < 0.001), suggesting that the number of minimum nights is associated with price. With nearly 49,000 observations, however, even a very small effect can be statistically significant.
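The t value in the coefficient table is the estimate divided by its standard error, and the p-value comes from a t distribution with the residual degrees of freedom (48,893). This short check recomputes both from the rounded numbers printed above, so the t value matches the table only to about two decimal places.

```r
# Values copied from the summary(lm_model) output above
estimate  <- 0.5011
std_error <- 0.0529
df_resid  <- 48893

# t statistic: how many standard errors the estimate lies from zero
t_value <- estimate / std_error

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_value), df = df_resid)

round(t_value, 2)  # about 9.47
p_value < 0.001
```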
Adjusted R-squared:
The adjusted R-squared value is 0.001811, meaning that the number of minimum nights explains only about 0.18% of the variance in price. This value is very low, suggesting that the model is a poor fit for the data.
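R-squared is 1 - SS_res / SS_tot, the share of the total variation in the response that the fitted line accounts for; with a single predictor it also equals the squared correlation between predictor and response. The sketch below uses hypothetical toy numbers, not the Airbnb data, to show that the three calculations agree.

```r
# Hypothetical toy data: a weak relationship plus noise
min_nights <- c(1, 2, 3, 5, 7, 10, 14, 30)
price      <- c(120, 300, 90, 150, 80, 200, 110, 160)

fit <- lm(price ~ min_nights)

# R-squared from its definition: 1 - residual SS / total SS
ss_res    <- sum(residuals(fit)^2)
ss_tot    <- sum((price - mean(price))^2)
r2_manual <- 1 - ss_res / ss_tot

# The same value as reported by summary(), and as the squared correlation
r2_summary <- summary(fit)$r.squared
r2_cor     <- cor(min_nights, price)^2
```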
Analysis:
The coefficient estimate for minimum_nights suggests that, on average, each additional required minimum night is associated with a price increase of about $0.50. However, the low adjusted R-squared indicates that the model explains very little of the variability in price, so other factors not included in the model (such as neighbourhood group and room type, explored below) likely influence price much more. The extreme values in the data (a maximum price of 10,000 and minimum stays of up to 1,250 nights, per the summary above) may also strongly affect this fit. Overall, while the number of minimum nights has a statistically significant effect on price, the model does a poor job of explaining it.
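Plugging values into the fitted equation shows how small the effect is in practice. This sketch uses only the two coefficients printed above; calling `predict(lm_model, newdata = data.frame(minimum_nights = c(1, 7, 30)))` on the real model should give essentially the same numbers.

```r
# Coefficients copied from the summary(lm_model) output above
intercept <- 149.1978
slope     <- 0.5011

# Predicted nightly price for a few minimum-night requirements
nights    <- c(1, 7, 30)
predicted <- intercept + slope * nights
round(predicted, 2)
```

Raising the minimum stay from 1 to 30 nights changes the predicted price by only 29 × 0.5011 ≈ 14.53 dollars, which is small next to the residual standard error of 239.9.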
### Grouping by neighbourhood_group and taking the average price
NYCAB_avg_price <- NYCAB |>
group_by(neighbourhood_group) |>
summarise(avg_price = mean(price))
### Setting custom colors
custom_colors <- c("#FF6F61", "#6B5B95", "#88B04B", "#F7CAC9", "#9A46B2")
### Making a bar chart using plotly
### Adding tooltips
bar_chart <- plot_ly(data = NYCAB_avg_price,
x = ~neighbourhood_group,
y = ~avg_price,
type = "bar",
text = ~paste("Neighbourhood Group: ", neighbourhood_group, "<br>Average Price: $", round(avg_price, 2)),
hoverinfo = "text",
marker = list(color = custom_colors)) |>
layout(title = "Average Price by Neighbourhood Group",
xaxis = list(title = "Neighbourhood Group"),
yaxis = list(title = "Average Price"),
hoverlabel = list(bgcolor = "white", font = list(family = "Arial", size = 12)),
margin = list(b = 80), # Extra bottom margin so the source annotation fits
annotations = list(
list(
x = 0.5,
y = -0.25,
xref = "paper",
yref = "paper",
text = "Source: NYC Airbnb 2019",
showarrow = FALSE,
font = list(family = "Arial", size = 10)
)))
bar_chart
The bar chart shows the average price of Airbnb listings in each neighbourhood group (borough) of NYC, with a custom color for each bar. The chart reveals substantial variation in average price between neighbourhood groups, providing valuable information on pricing dynamics for potential Airbnb hosts and travelers.
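Prices are strongly right-skewed (the summary above shows a median of 106, a mean of 152.7, and a maximum of 10,000), so a group mean can be pulled up by a few extreme listings. As a sketch on hypothetical toy data, base R's `tapply()` computes both the mean and the median per neighbourhood group; it mirrors the `group_by()` + `summarise()` step above rather than replacing it.

```r
# Hypothetical toy listings (not the real data)
neighbourhood_group <- c("Manhattan", "Manhattan", "Manhattan",
                         "Brooklyn", "Brooklyn", "Brooklyn")
price <- c(150, 200, 2000, 80, 100, 120)

# Mean and median price per group, like group_by() + summarise()
avg_price    <- tapply(price, neighbourhood_group, mean)
median_price <- tapply(price, neighbourhood_group, median)

data.frame(avg_price, median_price)
```

Here the single 2,000 listing lifts the Manhattan mean far above its median, while Brooklyn's mean and median coincide.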
### Filtering out the top 1% of prices to limit the effect of extreme outliers
NYCAB_filtered <- NYCAB |>
filter(price <= quantile(price, 0.99)) # Removing top 1% of prices
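`quantile(price, 0.99)` returns the 99th-percentile price, and the filter keeps only listings at or below it. A minimal sketch on a hypothetical price vector, with one extreme listing added, shows the rule dropping the outlier:

```r
# Hypothetical prices: 99 ordinary listings plus one extreme outlier
price <- c(seq(50, 295, length.out = 99), 10000)

cutoff <- quantile(price, 0.99)   # 99th-percentile price
kept   <- price[price <= cutoff]  # same rule as the filter() above

length(kept)
max(kept)
```

On the real data the same rule removes roughly the top 1% of listings; ties at the cutoff can leave slightly more than 99% of rows.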
### Making a histogram using plotly
### Adding tooltips
histogram <- plot_ly(data = NYCAB_filtered,
x = ~price,
color = ~room_type,
type = "histogram",
text = ~paste("Room Type: ", room_type, "<br>Price: $", price),
hoverinfo = "text") |>
layout(title = "Distribution of Prices by Room Type",
xaxis = list(title = "Price"),
yaxis = list(title = "Frequency"),
hoverlabel = list(bgcolor = "white", font = list(family = "Arial", size = 12)),
font = list(family = "Arial", size = 12),
margin = list(b = 80), # Extra bottom margin so the source annotation fits
annotations = list(
list(
x = 0.5,
y = -0.25,
xref = "paper",
yref = "paper",
text = "Source: NYC Airbnb 2019",
showarrow = FALSE,
font = list(family = "Arial", size = 10)
)),
barmode = "overlay",
legend = list(title = list(text = "Room Type"))) # plotly expects the legend title as a list with a text field
histogram
The histogram shows the distribution of Airbnb listing prices (excluding the top 1%) by room type. Each room type has its own color, so the price distributions of entire homes/apartments, private rooms, and shared rooms can be compared directly. This visualization helps viewers understand the range and frequency of prices for each room type in New York City's Airbnb market.
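A histogram counts how many prices fall into each bin; plotly chooses the bins automatically. The sketch below reproduces that counting step for two room types with base R's `hist(..., plot = FALSE)`, using hypothetical toy prices and fixed 50-dollar bins chosen only for illustration.

```r
# Hypothetical toy prices for two room types
room_type <- c(rep("Entire home/apt", 5), rep("Private room", 5))
price     <- c(120, 180, 210, 260, 340, 45, 60, 75, 90, 130)

breaks <- seq(0, 350, by = 50)  # bins [0,50], (50,100], ..., (300,350]

# Bin counts per room type, computed without drawing a plot
counts <- sapply(split(price, room_type),
                 function(p) hist(p, breaks = breaks, plot = FALSE)$counts)
counts
```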
### Grouping by room type and neighbourhood group and taking the average price
avg_prices <- NYCAB_filtered |>
group_by(room_type, neighbourhood_group) |>
summarise(avg_price = mean(price), .groups = "drop") # Return an ungrouped tibble; also silences the grouping message
### Making a heat map with plotly
### Adding tooltips
heatmap <- plot_ly() |>
add_heatmap(data = avg_prices,
x = ~neighbourhood_group,
y = ~room_type,
z = ~avg_price,
colorscale = "Viridis",
text = ~paste("Room Type: ", room_type, "<br>Neighbourhood Group: ", neighbourhood_group, "<br>Average Price: $", round(avg_price, 2)),
hoverinfo = "text",
colorbar = list(title = "Average Price ($)"))|>
layout(title = "Average Price by Room Type and Neighbourhood Group",
xaxis = list(title = "Neighbourhood Group"),
yaxis = list(title = "Room Type"),
hoverlabel = list(bgcolor = "white", font = list(family = "Arial", size = 12)),
font = list(family = "Arial", size = 12),
annotations = list( ### Adding the source
list(
x = 0.5,
y = -0.25,
xref = "paper",
yref = "paper",
text = "Source: NYC Airbnb 2019",
showarrow = FALSE,
font = list(family = "Arial", size = 10)
)))
heatmap
The heatmap visualizes the average price of Airbnb listings for each combination of room type and neighbourhood group in NYC (excluding the top 1% of prices). The color of each cell represents the average price; with the Viridis color scale, brighter (yellow) cells indicate higher prices and darker (purple) cells indicate lower prices. This visualization allows viewers to see how pricing varies with both room type and neighbourhood group, offering valuable information for potential Airbnb hosts and travelers.
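Each cell of the heatmap is one mean price for a (room type, neighbourhood group) pair. As a sketch on hypothetical toy data, `tapply()` with two grouping variables builds that grid as a matrix; a combination with no listings comes out as `NA`.

```r
# Hypothetical toy listings (not the real data)
room_type <- c("Entire home/apt", "Entire home/apt", "Private room",
               "Private room", "Shared room")
neighbourhood_group <- c("Manhattan", "Brooklyn", "Manhattan",
                         "Brooklyn", "Manhattan")
price <- c(220, 160, 100, 70, 60)

# Rows = room type, columns = neighbourhood group, cells = mean price
avg_grid <- tapply(price, list(room_type, neighbourhood_group), mean)
avg_grid
```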
library(leaflet)
library(htmltools)
### Converting room_type to a factor
NYCAB$room_type <- factor(NYCAB$room_type)
### Making a custom palette (one color per room type)
room_colors <- c("pink", "black", "purple")
### Making pop up
popup_content <- paste0(
"<b>Price: </b>", "$",NYCAB$price, "<br>",
"<b>Neighborhood: </b>", NYCAB$neighbourhood, "<br>",
"<b>Minimum Nights: </b>", NYCAB$minimum_nights, "<br>",
"<b>Neighborhood Group: </b>", NYCAB$neighbourhood_group, "<br>"
)
### Making a leaflet map of NYC
### Adding a marker for each Airbnb listing
map <- leaflet() |>
setView(lng = -73.935242, lat = 40.730610, zoom = 10) |>
addProviderTiles("Esri.WorldStreetMap") |>
addCircleMarkers(
data = NYCAB,
lng = ~longitude,
lat = ~latitude,
radius = 2,
color = ~colorFactor(palette = room_colors, levels = c("Entire home/apt", "Private room", "Shared room"))(room_type),
labelOptions = labelOptions( # Only takes effect if a `label` argument is also supplied
style = list("font-weight" = "normal", padding = "3px 8px"),
textsize = "15px",
direction = "auto"
), ### adding the pop up
popup = popup_content
) |>
addLegend(
position = "bottomright",
colors = room_colors,
labels = c("Entire home/apt", "Private room", "Shared room"),
title = "Room Type"
)
map
A significant drop in New York City's population during the COVID-19 pandemic coincided with record-breaking rent hikes in Manhattan and Brooklyn, according to The City. Between June 2020 and June 2022, roughly 400,000 people left the city. This ran opposite to the nationwide trend over the same period, when rents either stayed flat or fell. Landlords were able to raise prices despite a declining population largely because of what one housing expert called "a perfect storm": market churn and changes in rent laws emboldened property owners and kept tenants from finding affordable places to live, while short-term rentals took up units that could have been used for permanent housing. The Real Estate Board of New York (REBNY) also blamed a shortage of supply for exacerbating what it called a crisis; according to REBNY, residential construction dropped off precipitously after the 421-a tax abatement program expired. A spokesman for Mayor Bill de Blasio disputed this claim but did not provide any evidence or numbers refuting it when asked by The City.

The visualizations show that some areas of NYC are more expensive than others, even for smaller room types such as private and shared rooms. The final visualization is a map of NYC Airbnb listings colored by room type, with a pop-up showing each listing's price, minimum nights, and neighborhood. This can tell us in which NYC neighborhoods it is more affordable to stay. However, the visualizations would be more helpful with greater context. For instance, what effect did particular policy changes or demographic shifts have on these numbers? Including information about rental vacancy rates or housing construction trends might also offer better insight into what drives New York City's rental market. It would also have been better to include data from 2019 through 2022, covering the height of the pandemic.
In general, while these graphics capture how prices vary across NYC well enough, they lack depth because they do not explore the surrounding variables that could have made them more explanatory and relevant to their context. One thing I wanted to do but did not have enough time for was to group NYC by neighbourhood, color each neighbourhood by its most prevalent room type, and show the average price of each room type.
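The grouping described above could start from two summaries per neighbourhood: the most common room type and the average price of each room type. A minimal base-R sketch on hypothetical toy data (the neighbourhood names are borrowed from the dataset, but the rows are made up):

```r
# Hypothetical toy listings (not the real data)
neighbourhood <- c("Harlem", "Harlem", "Harlem", "Midtown", "Midtown")
room_type <- c("Private room", "Private room", "Entire home/apt",
               "Entire home/apt", "Entire home/apt")
price <- c(70, 80, 150, 250, 300)

# Count listings of each room type within each neighbourhood
type_counts <- table(neighbourhood, room_type)

# Most prevalent room type per neighbourhood (the first one wins on ties)
most_common <- colnames(type_counts)[apply(type_counts, 1, which.max)]
names(most_common) <- rownames(type_counts)

# Average price for each neighbourhood / room type pair (NA if none)
avg_by_type <- tapply(price, list(neighbourhood, room_type), mean)

most_common
avg_by_type
```

Drawing these results as filled neighbourhood areas on the leaflet map would additionally require neighbourhood boundary shapes, which are not part of this data set.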
Reference: https://www.thecity.nyc/2023/08/04/why-is-nyc-rent-so-high/
ChatGPT was used to fix errors.