This study uses the NYC Airbnb Open Data for 2019, which contains information on Airbnb listings throughout New York City. The dataset supports a thorough examination of the Airbnb market because it includes key features such as price, number of reviews, minimum nights, and location.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
data <- read_delim("./AB_NYC_2019.csv", delim = ",")
## Rows: 48895 Columns: 16
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): name, host_name, neighbourhood_group, neighbourhood, room_type
## dbl (10): id, host_id, latitude, longitude, price, minimum_nights, number_o...
## date (1): last_review
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(data)
## # A tibble: 6 × 16
##      id name        host_id host_name neighbourhood_group neighbourhood latitude
##   <dbl> <chr>         <dbl> <chr>     <chr>               <chr>            <dbl>
## 1  2539 Clean & qu…    2787 John      Brooklyn            Kensington        40.6
## 2  2595 Skylit Mid…    2845 Jennifer  Manhattan           Midtown           40.8
## 3  3647 THE VILLAG…    4632 Elisabeth Manhattan           Harlem            40.8
## 4  3831 Cozy Entir…    4869 LisaRoxa… Brooklyn            Clinton Hill      40.7
## 5  5022 Entire Apt…    7192 Laura     Manhattan           East Harlem       40.8
## 6  5099 Large Cozy…    7322 Chris     Manhattan           Murray Hill       40.7
## # ℹ 9 more variables: longitude <dbl>, room_type <chr>, price <dbl>,
## #   minimum_nights <dbl>, number_of_reviews <dbl>, last_review <date>,
## #   reviews_per_month <dbl>, calculated_host_listings_count <dbl>,
## #   availability_365 <dbl>
The primary goal of this project is to examine the pricing dynamics
of NYC Airbnb listings, with particular focus on how minimum nights and
guest reviews affect the price. Furthermore, the project aims to
identify insights that may benefit hosts and potential guests alike, as
well as evaluate perceived value using metrics such as price per
review.
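The price-per-review metric mentioned above could be computed along the following lines. This is only a sketch: the helper name `add_price_per_review` is hypothetical, and returning `NA` for listings with zero reviews (instead of dividing by zero) is an assumption, not something the project has settled on.

```r
library(dplyr)

# Hypothetical helper: adds a price_per_review column to a listings table.
# Assumption: listings with no reviews get NA rather than Inf.
add_price_per_review <- function(listings) {
  listings %>%
    mutate(
      price_per_review = if_else(
        number_of_reviews > 0,
        price / number_of_reviews,
        NA_real_
      )
    )
}
```

With the data loaded earlier, `add_price_per_review(data)` would return the same table with one extra column.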
Price Distribution by Neighborhood
A visual representation of the price distribution across New York City neighborhoods would show how location influences pricing. It may identify areas with notably higher or lower prices, which helps guests choose accommodation and helps hosts set competitive prices.
Reason for Investigation: Understanding neighborhood price dynamics is essential for both guests and hosts, as it affects budgeting and pricing strategies.
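A numeric summary can complement the visual comparison. The sketch below ranks neighbourhoods by median price; the helper name `summarise_price_by_neighbourhood` and the `min_listings` threshold of 10 are illustrative assumptions, chosen only so that neighbourhoods with very few listings do not dominate the ranking.

```r
library(dplyr)

# Hypothetical helper: listing count and median price per neighbourhood.
# Assumption: neighbourhoods with fewer than `min_listings` listings are dropped.
summarise_price_by_neighbourhood <- function(listings, min_listings = 10) {
  listings %>%
    group_by(neighbourhood_group, neighbourhood) %>%
    summarise(
      n_listings = n(),
      median_price = median(price, na.rm = TRUE),
      .groups = "drop"
    ) %>%
    filter(n_listings >= min_listings) %>%
    arrange(desc(median_price))
}
```

Calling `summarise_price_by_neighbourhood(data)` would list the most expensive neighbourhoods first. The median is used here instead of the mean because it is less sensitive to a handful of very expensive listings.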
Room Type Distribution:
A bar chart of room types (e.g., entire home/apartment, private room, shared room) shows which types dominate the NYC Airbnb market.
ggplot(data, aes(x = room_type)) +
  geom_bar(fill = "skyblue") +
  theme_minimal() +
  labs(title = "Distribution of Room Types in NYC", x = "Room Type", y = "Count")
Reason for Investigation: Knowing which room types are most common may reveal the preferences of hosts and guests. It could also indicate how different accommodation types are priced.
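The counts behind the bar chart can also be expressed as shares of all listings. The following is a minimal sketch; `room_type_shares` is a hypothetical helper name, not part of the original analysis.

```r
library(dplyr)

# Hypothetical helper: number and share of listings per room type,
# sorted from most to least common.
room_type_shares <- function(listings) {
  listings %>%
    count(room_type, name = "n_listings") %>%
    mutate(share = n_listings / sum(n_listings)) %>%
    arrange(desc(share))
}
```

`room_type_shares(data)` would then give one row per room type, with the `share` column summing to 1.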
To Do List:
Conduct further investigations on the relationship between various attributes (such as amenities and property type) and pricing.
Explore seasonal pricing and evaluate trends to determine how they impact demand and average pricing.
Investigate how the price per review varies with specific attributes provided by listings.
Create more data visualizations that illustrate significant trends and findings.
Hypothesis 1: Listings in popular tourist neighborhoods (e.g., Manhattan) have higher average prices than listings in less popular regions (e.g., Staten Island).
ggplot(data, aes(x = neighbourhood_group, y = price)) +
  geom_boxplot() +
  labs(title = "Price Distribution by Neighborhood Group",
       x = "Neighborhood Group",
       y = "Price") +
  theme_minimal()
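The boxplot gives a visual impression; a simple test can back it up. The sketch below compares two boroughs with a one-sided Wilcoxon rank-sum test (`wilcox.test` from base R). This is a swapped-in technique, not one the report prescribes: it tests whether prices in one group tend to be higher rather than comparing means directly, and it is chosen on the assumption that listing prices are skewed. The helper name `compare_borough_prices` and its default arguments are hypothetical.

```r
library(dplyr)

# Hypothetical helper: compares prices in two boroughs.
# Assumption: a one-sided Wilcoxon rank-sum test is an acceptable stand-in
# for "higher average price" when prices are skewed.
compare_borough_prices <- function(listings,
                                   popular = "Manhattan",
                                   other = "Staten Island") {
  popular_prices <- listings %>%
    filter(neighbourhood_group == popular) %>%
    pull(price)
  other_prices <- listings %>%
    filter(neighbourhood_group == other) %>%
    pull(price)

  # exact = FALSE uses the normal approximation, which also tolerates ties
  test <- wilcox.test(popular_prices, other_prices,
                      alternative = "greater", exact = FALSE)

  tibble(
    mean_popular = mean(popular_prices),
    mean_other = mean(other_prices),
    p_value = test$p.value
  )
}
```

`compare_borough_prices(data)` would return both means alongside the p-value; a small p-value would be consistent with Hypothesis 1 but would not by itself establish why prices differ.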
Hypothesis 2: Across all boroughs, entire home/apartment listings are priced higher than private and shared rooms.
This hypothesis argues that guests are willing to pay more for
private accommodation compared to shared living spaces.
Visualization: A violin plot can show the distribution of prices across room types.
ggplot(data, aes(x = room_type, y = price)) +
  geom_violin(fill = "lightgreen") +
  theme_minimal() +
  labs(title = "Price Distribution by Room Type", x = "Room Type", y = "Price")
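Because Hypothesis 2 is stated "across all boroughs", a per-borough breakdown may be more informative than the pooled violin plot. The sketch below computes the median price for each borough and room type and flags the most expensive room type within each borough. The helper name `median_price_by_room_type` is hypothetical, and using the median instead of the mean is an assumption.

```r
library(dplyr)

# Hypothetical helper: median price per borough and room type, with a flag
# marking the room type that has the highest median in each borough.
median_price_by_room_type <- function(listings) {
  listings %>%
    group_by(neighbourhood_group, room_type) %>%
    summarise(median_price = median(price, na.rm = TRUE), .groups = "drop") %>%
    group_by(neighbourhood_group) %>%
    mutate(is_most_expensive = median_price == max(median_price)) %>%
    ungroup()
}
```

If Hypothesis 2 holds, filtering the result of `median_price_by_room_type(data)` to `is_most_expensive` should leave only entire home/apartment rows, one per borough.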