library(tidyverse)
Airbnb listings in NYC
1 Introduction
This analysis explores Airbnb listings in NYC, identifying revenue drivers and investment opportunities.
2 Load Data
airbnb_data <- read_csv("/Users/aribarazzaq/Desktop/Data 608 knowledge and visual analytics /Story 4/nyc_airbnb_listings.csv")
3 Data Cleaning
# Convert price, revenue, and occupancy to numeric
airbnb_data <- airbnb_data %>%
  mutate(price = as.numeric(price),
         revenue = as.numeric(revenue),
         occupancy = as.numeric(occupancy))
# Remove rows where revenue is NA
airbnb_data <- drop_na(airbnb_data, revenue)
# Remove high-revenue outliers using the IQR method (upper fence only: Q3 + 1.5 * IQR)
q1 <- quantile(airbnb_data$revenue, 0.25, na.rm = TRUE)
q3 <- quantile(airbnb_data$revenue, 0.75, na.rm = TRUE)
iqr <- q3 - q1
airbnb_data <- airbnb_data %>%
  filter(revenue <= (q3 + 1.5 * iqr))
# Handle missing values by replacing them with the column median
airbnb_data <- airbnb_data %>%
  mutate(
    price = ifelse(is.na(price), median(price, na.rm = TRUE), price),
    review_scores_rating = ifelse(is.na(review_scores_rating), median(review_scores_rating, na.rm = TRUE), review_scores_rating),
    occupancy = ifelse(is.na(occupancy), median(occupancy, na.rm = TRUE), occupancy)
  )
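You can try the two cleaning rules on a small vector before running them on the full dataset. The sketch below uses base R, so it runs without tidyverse. The toy values are made up for illustration and do not come from the NYC listings. The helper name `impute_median` is hypothetical. The sketch applies the same upper fence (Q3 + 1.5 * IQR) used above and then fills missing values with the median of the observed ones.

```r
# Toy revenue values (hypothetical, not from the NYC dataset)
revenue <- c(40, 55, 60, 70, 80, 95, 400)

# Upper IQR fence, as in the cleaning step: keep values <= Q3 + 1.5 * IQR
q1 <- quantile(revenue, 0.25, names = FALSE)
q3 <- quantile(revenue, 0.75, names = FALSE)
iqr <- q3 - q1
upper_fence <- q3 + 1.5 * iqr
revenue_trimmed <- revenue[revenue <= upper_fence]

# Hypothetical helper: replace NA entries with the median of the non-missing values
impute_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}

# Toy prices with two missing entries (hypothetical)
price <- c(100, NA, 150, 300, NA)
price_imputed <- impute_median(price)

print(upper_fence)      # 132.5
print(revenue_trimmed)  # 400 is dropped
print(price_imputed)    # NAs become 150, the median of 100, 150, 300
```

This sketch applies only the upper fence, matching the cleaning code. A symmetric rule would also drop values below Q1 - 1.5 * IQR. In the tidyverse pipeline, you could apply the same helper to several columns at once with `mutate(across(c(price, review_scores_rating, occupancy), impute_median))`.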
4 Summary Statistics
summary_stats <- airbnb_data %>%
  summarise(
    avg_price = mean(price, na.rm = TRUE),
    avg_occupancy = mean(occupancy, na.rm = TRUE),
    avg_revenue = mean(revenue, na.rm = TRUE),
    avg_reviews = mean(number_of_reviews, na.rm = TRUE)
  )
print(summary_stats)
# A tibble: 1 × 4
avg_price avg_occupancy avg_revenue avg_reviews
<dbl> <dbl> <dbl> <dbl>
1 185. 0.504 77.5 33.7
Insight: On average, listings are priced at about $185 per night, have a 50.4% occupancy rate, and earn about $78 in daily revenue. The average number of reviews is about 34, which suggests moderately active listings.
5 Revenue Analysis by Borough
# Calculate average revenue by borough
borough_revenue <- airbnb_data %>%
  group_by(neighbourhood_group) %>%
  summarise(avg_revenue = mean(revenue, na.rm = TRUE)) %>%
  arrange(desc(avg_revenue))
# Plot revenue by borough
ggplot(borough_revenue, aes(x = reorder(neighbourhood_group, -avg_revenue), y = avg_revenue)) +
geom_col(fill = "steelblue") +
labs(title = "Average Airbnb Revenue by Borough", x = "Borough", y = "Revenue") +
theme_minimal()
Insight: Manhattan has the highest average revenue, followed by Brooklyn. The Bronx has the lowest average revenue, which may reflect lower tourist activity or demand.
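A borough with few listings can have an unstable average, so it can help to report the number of listings next to each mean. This base R sketch uses hypothetical toy data (three boroughs, made-up revenue values). It calls `tapply()` to compute the mean and count per group, then sorts by mean revenue, as the `arrange(desc(avg_revenue))` step does. In the tidyverse pipeline, adding `n_listings = n()` inside `summarise()` would give the same count.

```r
# Hypothetical toy listings (not real NYC values)
listings <- data.frame(
  neighbourhood_group = c("Manhattan", "Manhattan", "Brooklyn", "Brooklyn", "Bronx"),
  revenue = c(120, 100, 90, 70, 40)
)

# Mean revenue and listing count per borough (tapply returns groups in sorted order)
avg_revenue <- tapply(listings$revenue, listings$neighbourhood_group, mean)
n_listings <- tapply(listings$revenue, listings$neighbourhood_group, length)

borough_summary <- data.frame(
  borough = names(avg_revenue),
  avg_revenue = as.numeric(avg_revenue),
  n_listings = as.integer(n_listings)
)

# Sort from highest to lowest average revenue
borough_summary <- borough_summary[order(-borough_summary$avg_revenue), ]
print(borough_summary)
```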
6 Price vs. Occupancy Rate
ggplot(airbnb_data, aes(x = price, y = occupancy)) +
geom_point(alpha = 0.5) +
labs(title = "Price vs. Occupancy Rate (Cleaned)", x = "Price ($)", y = "Occupancy Rate") +
xlim(0, 1000) + ylim(0, 1) +
theme_minimal()
Insight: There is no strong linear relationship between price and occupancy. Both high- and low-priced listings can reach high occupancy, which suggests that pricing alone does not determine a listing's success. Note that the plot only shows listings priced up to $1,000 (`xlim(0, 1000)`).
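The "no strong relationship" reading comes from the scatter plot, and a rank correlation puts a number on it. The sketch below computes Spearman's rho with base R's `cor(..., method = "spearman")`. Spearman is used here as an assumption, not as part of the original analysis: nightly prices are typically right-skewed, and Spearman depends only on ranks. The toy vectors are hypothetical.

```r
# Hypothetical toy data: nightly price and occupancy rate for five listings
price <- c(50, 100, 150, 200, 250)
occupancy <- c(0.9, 0.3, 0.8, 0.4, 0.6)

# Spearman's rho: Pearson correlation computed on the ranks of each variable
rho <- cor(price, occupancy, method = "spearman")
print(rho)  # -0.3, a weak negative association
```

On the full data, this would be `cor(airbnb_data$price, airbnb_data$occupancy, method = "spearman")`. The plot is limited by `xlim(0, 1000)`, but the correlation includes every listing.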
7 Superhost Impact on Revenue
superhost_analysis <- airbnb_data %>%
  group_by(host_is_superhost) %>%
  summarise(avg_revenue = mean(revenue, na.rm = TRUE),
            avg_occupancy = mean(occupancy, na.rm = TRUE))
print(superhost_analysis)
# A tibble: 3 × 3
host_is_superhost avg_revenue avg_occupancy
<lgl> <dbl> <dbl>
1 FALSE 71.4 0.462
2 TRUE 91.8 0.606
3 NA 90.9 0.540
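Insight: Superhosts earn more on average ($91.8) than non-superhosts ($71.4) and have higher occupancy rates (60.6% vs. 46.2%). Listings with missing superhost status (NA) earn nearly as much as superhosts ($90.9), and their occupancy (54.0%) falls between the other two groups. Hosting quality may contribute to higher revenue, although this comparison alone does not establish cause and effect.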
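The superhost gap is a difference between group means, and a two-sample test is one way to check whether it is larger than sampling noise. The sketch below runs base R's Welch `t.test()` on hypothetical toy data, as an illustration rather than part of the original analysis. The formula interface drops rows where `host_is_superhost` is `NA`, so such a test would exclude the third group in the table above.

```r
# Hypothetical toy data: superhost flag (with one missing value) and daily revenue
toy <- data.frame(
  host_is_superhost = c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, NA),
  revenue = c(60, 75, 70, 80, 95, 88, 100, 84, 90)
)

# Welch two-sample t-test (var.equal = FALSE is the default);
# the default na.action removes the row with NA in host_is_superhost
result <- t.test(revenue ~ host_is_superhost, data = toy)

print(result$estimate)  # group means: 71.25 (FALSE) and 91.75 (TRUE)
print(result$p.value)
```

A small p-value would suggest the difference is unlikely to be chance alone. It still would not show that superhost status causes higher revenue.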
8 Revenue by Room Type
room_type_revenue <- airbnb_data %>%
  group_by(room_type) %>%
  summarise(avg_revenue = mean(revenue, na.rm = TRUE))
ggplot(room_type_revenue, aes(x = reorder(room_type, -avg_revenue), y = avg_revenue, fill = room_type)) +
geom_col(show.legend = FALSE) +
labs(title = "Average Revenue by Room Type", x = "Room Type", y = "Revenue") +
theme_minimal()
Insight: Hotel rooms and entire homes/apartments earn more on average than private and shared rooms. Hosts could consider offering entire homes to increase revenue.
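A few very high earners can pull up a group mean, even after the upper-fence trimming in the cleaning step. Comparing each room type's mean with its median is a quick check on whether outliers drive the ranking. The sketch below uses base R and hypothetical toy data. In the dplyr pipeline, adding `median_revenue = median(revenue, na.rm = TRUE)` to `summarise()` would do the same.

```r
# Hypothetical toy data: two room types, one with a single high earner
toy <- data.frame(
  room_type = c("Entire home/apt", "Entire home/apt", "Entire home/apt",
                "Private room", "Private room", "Private room"),
  revenue = c(80, 90, 250, 50, 60, 70)
)

# Mean and median revenue per room type
room_mean <- tapply(toy$revenue, toy$room_type, mean)
room_median <- tapply(toy$revenue, toy$room_type, median)

print(room_mean)    # Entire home/apt 140, Private room 60
print(room_median)  # Entire home/apt 90, Private room 60
```

Here the single 250 raises the entire-home mean well above its median, while the private-room mean and median agree.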
9 Guest Ratings vs. Revenue
ggplot(airbnb_data, aes(x = review_scores_rating, y = revenue)) +
geom_point(alpha = 0.5) +
labs(title = "Guest Ratings vs. Revenue (Cleaned)", x = "Review Scores", y = "Revenue") +
theme_minimal()
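Insight: Listings with higher guest ratings tend to earn more revenue. This suggests that guest experience matters for earnings, so hosts may benefit from focusing on communication, cleanliness, and listing accuracy to earn better reviews. Because missing ratings were replaced with the median during cleaning, some listings are stacked at that single rating value in the plot.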
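Missing ratings were filled with the median during cleaning, so the imputed rows share one rating value and can blur the relationship in the plot. One option is to keep a logical flag recording which ratings were observed, then compute a rank correlation on those rows only. This sketch assumes the `NA` pattern is captured before imputation. The flag name `rating_observed` and the toy values are hypothetical.

```r
# Hypothetical toy data; NA marks listings without a rating
rating <- c(4.2, NA, 4.6, 4.9, NA, 4.0)
revenue <- c(50, 70, 85, 110, 65, 40)

# Record which ratings were actually observed before any imputation
rating_observed <- !is.na(rating)

# Spearman correlation on observed ratings only
rho_observed <- cor(rating[rating_observed], revenue[rating_observed], method = "spearman")
print(rho_observed)  # 1: in this toy data, revenue rises with every step up in rating
```

In the main pipeline, you could add the flag with `mutate(rating_observed = !is.na(review_scores_rating))` before the median imputation, then filter the plot data with `filter(rating_observed)`.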