Airbnb listings in NYC

Author

ARIBA MANDAVIA

Published

March 23, 2025

1 Introduction

This analysis explores Airbnb listings in NYC, identifying revenue drivers and investment opportunities.

library(tidyverse)

2 Load Data

# Read the listings file (assumes nyc_airbnb_listings.csv is in the working directory;
# adjust the path to point at your local copy)
airbnb_data <- read_csv("nyc_airbnb_listings.csv")

3 Data Cleaning

# Convert price, revenue, and occupancy to numeric
airbnb_data <- airbnb_data %>%
  mutate(price = as.numeric(price),
         revenue = as.numeric(revenue),
         occupancy = as.numeric(occupancy))
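`as.numeric()` only succeeds when a column already holds plain numbers. If the raw file stored prices as formatted text such as "$1,250.00" (an assumption; the source does not show the raw format), every value would become `NA` with only a coercion warning. A minimal base-R sketch of a safer conversion, using a hypothetical helper `parse_price()`:

```r
# Hypothetical helper: strip "$" and thousands separators before converting.
# Assumes prices arrive as character strings like "$1,250.00".
parse_price <- function(x) {
  as.numeric(gsub("[$,]", "", x))
}

raw_prices <- c("$1,250.00", "$85.00", "$0.99")

# Plain as.numeric() cannot read the formatted strings and returns NA
naive <- suppressWarnings(as.numeric(raw_prices))
print(naive)

# Removing the symbols first gives usable numbers
cleaned <- parse_price(raw_prices)
print(cleaned)
```

readr's `parse_number()`, loaded with the tidyverse, performs similar cleanup on character input.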

# Remove rows where revenue is NA
airbnb_data <- drop_na(airbnb_data, revenue)

# Remove high-revenue outliers: drop rows above the upper IQR fence (Q3 + 1.5 * IQR)
q1 <- quantile(airbnb_data$revenue, 0.25, na.rm = TRUE)
q3 <- quantile(airbnb_data$revenue, 0.75, na.rm = TRUE)
iqr <- q3 - q1
airbnb_data <- airbnb_data %>%
  filter(revenue <= (q3 + 1.5 * iqr))
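The filter keeps only values at or below the upper fence Q3 + 1.5 × IQR. A small worked example on a made-up revenue vector (the values are illustrative, not from the dataset) shows the arithmetic with R's default quantile definition:

```r
# Illustrative revenue values (hypothetical), with one extreme listing
revenue <- c(10, 20, 30, 40, 50, 400)

q1 <- quantile(revenue, 0.25)   # 22.5 with R's default (type 7) quantiles
q3 <- quantile(revenue, 0.75)   # 47.5
iqr <- q3 - q1                  # 25, the same value IQR(revenue) returns
upper_fence <- q3 + 1.5 * iqr   # 47.5 + 37.5 = 85

# Keep only values at or below the fence; 400 is dropped
kept <- revenue[revenue <= upper_fence]
print(unname(upper_fence))
print(kept)
```

Because the analysis applies only the upper fence, unusually low revenues are kept. In this example the lower fence, Q1 - 1.5 × IQR = -15, would not remove anything anyway.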

# Impute remaining missing values with each column's median
airbnb_data <- airbnb_data %>%
  mutate(
    price = ifelse(is.na(price), median(price, na.rm = TRUE), price),
    review_scores_rating = ifelse(is.na(review_scores_rating), median(review_scores_rating, na.rm = TRUE), review_scores_rating),
    occupancy = ifelse(is.na(occupancy), median(occupancy, na.rm = TRUE), occupancy)
  )
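Median imputation fills gaps but makes imputed listings indistinguishable from genuine ones, which can create an artificial cluster at the median in later plots. A minimal base-R sketch that imputes and also records which values were filled in; the helper `impute_median()` and the flag column are hypothetical additions, not part of the original analysis:

```r
# Hypothetical helper: replace NA with the median of the observed values
impute_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}

ratings <- c(4.2, NA, 4.8, 4.5, NA)

listings <- data.frame(
  rating_imputed = is.na(ratings),   # flag missing rows before filling them
  rating = impute_median(ratings)    # NA values become the median, 4.5
)
print(listings)
```

In the tidyverse pipeline the same flag could be added with `mutate()` and `is.na()` before the imputation step.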

4 Summary Statistics

summary_stats <- airbnb_data %>%
  summarise(
    avg_price = mean(price, na.rm = TRUE),
    avg_occupancy = mean(occupancy, na.rm = TRUE),
    avg_revenue = mean(revenue, na.rm = TRUE),
    avg_reviews = mean(number_of_reviews, na.rm = TRUE)
  )

print(summary_stats)
# A tibble: 1 × 4
  avg_price avg_occupancy avg_revenue avg_reviews
      <dbl>         <dbl>       <dbl>       <dbl>
1      185.         0.504        77.5        33.7

Insight: After cleaning, listings are priced at about $185 per night on average, with a 50.4% occupancy rate and about $77.50 in daily revenue. The average number of reviews is about 34, which suggests moderately active listings.

5 Revenue Analysis by Borough

# Calculate average revenue by borough
borough_revenue <- airbnb_data %>%
  group_by(neighbourhood_group) %>%
  summarise(avg_revenue = mean(revenue, na.rm = TRUE)) %>%
  arrange(desc(avg_revenue))

# Plot revenue by borough
ggplot(borough_revenue, aes(x = reorder(neighbourhood_group, -avg_revenue), y = avg_revenue)) +
  geom_col(fill = "steelblue") +
  labs(title = "Average Airbnb Revenue by Borough", x = "Borough", y = "Average revenue ($)") +
  theme_minimal()

Insight: Manhattan has the highest average revenue of any borough, followed by Brooklyn, while the Bronx has the lowest, possibly reflecting lower tourist demand.
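A ranking by mean revenue does not show how many listings stand behind each average, and a borough with few listings can rank high or low by chance. A minimal base-R sketch that reports the listing count next to the mean, using a small hypothetical data frame (the values are invented for illustration):

```r
# Hypothetical listings; in the analysis this would be airbnb_data
listings <- data.frame(
  neighbourhood_group = c("Manhattan", "Manhattan", "Brooklyn", "Brooklyn", "Bronx"),
  revenue = c(120, 100, 80, 70, 40)
)

# Mean revenue and number of listings per borough
avg_revenue <- tapply(listings$revenue, listings$neighbourhood_group, mean)
n_listings <- table(listings$neighbourhood_group)

borough_summary <- data.frame(
  borough = names(avg_revenue),
  avg_revenue = as.numeric(avg_revenue),
  n = as.integer(n_listings[names(avg_revenue)])
)

# Sort from highest to lowest average revenue
borough_summary <- borough_summary[order(-borough_summary$avg_revenue), ]
print(borough_summary)
```

With dplyr, adding `n = n()` inside the existing `summarise()` call gives the same information.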

6 Price vs. Occupancy Rate

ggplot(airbnb_data, aes(x = price, y = occupancy)) +
  geom_point(alpha = 0.5) +
  labs(title = "Price vs. Occupancy Rate (Cleaned)", x = "Price ($)", y = "Occupancy Rate") +
  # Zoom to $0-$1,000 and 0-1 without dropping points the way xlim()/ylim() would
  coord_cartesian(xlim = c(0, 1000), ylim = c(0, 1)) +
  theme_minimal()

Insight: The scatter plot shows no strong linear relationship between price and occupancy. Both high- and low-priced listings can achieve high occupancy, which suggests that price alone does not determine a listing's success.
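The visual impression can be checked numerically. Pearson's coefficient measures linear association, while Spearman's rank coefficient captures any monotonic trend; values near zero for both would support the claim. A minimal sketch with base R's `cor()`, wrapped in a hypothetical helper and shown on made-up values:

```r
# Hypothetical helper: Pearson and Spearman correlation between price and occupancy
price_occupancy_cor <- function(df) {
  ok <- complete.cases(df$price, df$occupancy)   # ignore rows with missing values
  c(
    pearson  = cor(df$price[ok], df$occupancy[ok], method = "pearson"),
    spearman = cor(df$price[ok], df$occupancy[ok], method = "spearman")
  )
}

# Illustrative data (not from the dataset)
toy <- data.frame(
  price = c(80, 150, 220, 400, 900),
  occupancy = c(0.70, 0.40, 0.65, 0.30, 0.55)
)
res <- price_occupancy_cor(toy)
print(round(res, 2))
```

On the full data, `price_occupancy_cor(airbnb_data)` would return both coefficients; `cor.test()` adds a p-value if one is needed.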

7 Superhost Impact on Revenue

superhost_analysis <- airbnb_data %>%
  group_by(host_is_superhost) %>%
  summarise(avg_revenue = mean(revenue, na.rm = TRUE),
            avg_occupancy = mean(occupancy, na.rm = TRUE))

print(superhost_analysis)
# A tibble: 3 × 3
  host_is_superhost avg_revenue avg_occupancy
  <lgl>                   <dbl>         <dbl>
1 FALSE                    71.4         0.462
2 TRUE                     91.8         0.606
3 NA                       90.9         0.540

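Insight: Superhosts earn more on average ($91.80) than non-superhosts ($71.40) and have higher occupancy rates (60.6% vs. 46.2%). Listings with unknown superhost status (NA) fall in between ($90.90, 54.0%). Hosting quality may contribute to the higher revenue, although this comparison alone does not establish cause and effect.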
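To judge whether the revenue gap between superhosts and other hosts is larger than random variation, a Welch two-sample t-test is one option. A minimal sketch with base R's `t.test()` on hypothetical values; rows with unknown superhost status (NA) are dropped by the formula interface, and a significant result would still not show that superhost status causes higher revenue:

```r
# Illustrative listings (hypothetical values), including one with unknown status
listings <- data.frame(
  revenue = c(60, 70, 80, 75, 65, 90, 95, 85, 100, 88, 92),
  host_is_superhost = c(rep(FALSE, 5), rep(TRUE, 5), NA)
)

# Welch's t-test: unequal variances is the default (var.equal = FALSE)
res <- t.test(revenue ~ host_is_superhost, data = listings)

print(res$estimate)   # group means: 70 for FALSE, 91.6 for TRUE
print(res$p.value)
```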

8 Revenue by Room Type

room_type_revenue <- airbnb_data %>%
  group_by(room_type) %>%
  summarise(avg_revenue = mean(revenue, na.rm = TRUE))

ggplot(room_type_revenue, aes(x = reorder(room_type, -avg_revenue), y = avg_revenue, fill = room_type)) +
  geom_col(show.legend = FALSE) +  # axis labels already name each room type
  labs(title = "Average Revenue by Room Type", x = "Room Type", y = "Average revenue ($)") +
  theme_minimal()

Insight: Hotel rooms and entire homes/apartments earn more on average than private and shared rooms. Hosts seeking higher revenue may consider offering entire homes, keeping in mind that average revenue does not reflect operating costs.

9 Guest Ratings vs. Revenue

ggplot(airbnb_data, aes(x = review_scores_rating, y = revenue)) +
  geom_point(alpha = 0.5) +
  labs(title = "Guest Ratings vs. Revenue (Cleaned)", x = "Review Scores", y = "Revenue") +
  theme_minimal()

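Insight: Listings with higher guest ratings tend to earn more revenue. This pattern suggests that guest experience matters for earnings, so hosts may benefit from focusing on communication, cleanliness, and listing accuracy to earn better reviews. Note that missing ratings were replaced with the median during cleaning, so those listings all share a single rating value in the plot.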
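A scatter plot with many overlapping points can make a trend hard to judge. One way to check it is to group listings into rating quartiles and compare mean revenue per group. A minimal base-R sketch on made-up values; the helper `revenue_by_rating_band()` is hypothetical, and `unique()` guards against duplicate quartile breaks, which are likely here because many ratings were set to the median:

```r
# Hypothetical helper: mean revenue within rating quartile bands
revenue_by_rating_band <- function(rating, revenue) {
  breaks <- unique(quantile(rating, probs = seq(0, 1, 0.25), na.rm = TRUE))
  band <- cut(rating, breaks = breaks, include.lowest = TRUE)
  tapply(revenue, band, mean)
}

# Illustrative values (not from the dataset)
rating  <- c(3.5, 4.0, 4.2, 4.5, 4.7, 4.8, 4.9, 5.0)
revenue <- c(40, 50, 55, 70, 80, 85, 90, 100)

res <- revenue_by_rating_band(rating, revenue)
print(res)
```

Applied to the full data, `revenue_by_rating_band(airbnb_data$review_scores_rating, airbnb_data$revenue)` would give one mean per rating band.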