Introduction

This project analyzes the New York City 2019 Airbnb market using a dataset of roughly 49,000 listings (48,895 rows before cleaning) across all five boroughs. The goal is to understand pricing behavior, geographic differences, and listing characteristics in order to answer both analytical and business-oriented questions. The analysis progresses from exploratory data analysis to statistical testing and concludes with an investment recommendation.

The core questions guiding this project are:

-How are Airbnb prices and minimum stay requirements distributed in NYC?

-How do listings and prices differ across boroughs and neighborhoods?

-Is the price difference between Williamsburg and Bedford-Stuyvesant statistically significant?

-Based on the data, which market should be prioritized for investment?

knitr::opts_chunk$set(
  echo = TRUE,
  message = FALSE,
  warning = FALSE
)

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.1     ✔ stringr   1.5.2
## ✔ ggplot2   4.0.0     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(ggplot2)
library(dplyr)
library(gridExtra)

## 
## Attaching package: 'gridExtra'
## 
## The following object is masked from 'package:dplyr':
## 
##     combine

library(leaflet)
library(viridis)

## Loading required package: viridisLite

library(tidyr)
library(scales)

## 
## Attaching package: 'scales'
## 
## The following object is masked from 'package:viridis':
## 
##     viridis_pal
## 
## The following object is masked from 'package:purrr':
## 
##     discard
## 
## The following object is masked from 'package:readr':
## 
##     col_factor

# COLOR PALETTE

# Core colors
primary_blue    <- "#4C72B0"   # main quantitative color
secondary_red   <- "#C44E52"   # emphasis, means, highlights
accent_green    <- "#55A868"   # secondary distributions
accent_orange   <- "#DD8452"   # contrasts, secondary emphasis
accent_purple   <- "#8172B3"   # categorical comparisons
accent_teal     <- "#64B5CD"   # alternative category color

# Neutrals
neutral_dark    <- "gray30"
neutral_medium  <- "gray45"
neutral_light   <- "gray80"

# ===== CUSTOM THEME =====
custom_theme <- theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 16),
    plot.subtitle = element_text(color = neutral_medium),
    axis.title = element_text(face = "bold"),
    axis.text = element_text(color = neutral_dark),
    panel.grid.minor = element_blank(),
    legend.title = element_text(face = "bold"),
    legend.position = "right"
  )

airbnb <- read.csv("AB_NYC_2019.csv", stringsAsFactors = FALSE)
str(airbnb)

## 'data.frame':    48895 obs. of  16 variables:
##  $ id                            : int  2539 2595 3647 3831 5022 5099 5121 5178 5203 5238 ...
##  $ name                          : chr  "Clean & quiet apt home by the park" "Skylit Midtown Castle" "THE VILLAGE OF HARLEM....NEW YORK !" "Cozy Entire Floor of Brownstone" ...
##  $ host_id                       : int  2787 2845 4632 4869 7192 7322 7356 8967 7490 7549 ...
##  $ host_name                     : chr  "John" "Jennifer" "Elisabeth" "LisaRoxanne" ...
##  $ neighbourhood_group           : chr  "Brooklyn" "Manhattan" "Manhattan" "Brooklyn" ...
##  $ neighbourhood                 : chr  "Kensington" "Midtown" "Harlem" "Clinton Hill" ...
##  $ latitude                      : num  40.6 40.8 40.8 40.7 40.8 ...
##  $ longitude                     : num  -74 -74 -73.9 -74 -73.9 ...
##  $ room_type                     : chr  "Private room" "Entire home/apt" "Private room" "Entire home/apt" ...
##  $ price                         : int  149 225 150 89 80 200 60 79 79 150 ...
##  $ minimum_nights                : int  1 1 3 1 10 3 45 2 2 1 ...
##  $ number_of_reviews             : int  9 45 0 270 9 74 49 430 118 160 ...
##  $ last_review                   : chr  "2018-10-19" "2019-05-21" "" "2019-07-05" ...
##  $ reviews_per_month             : num  0.21 0.38 NA 4.64 0.1 0.59 0.4 3.47 0.99 1.33 ...
##  $ calculated_host_listings_count: int  6 2 1 1 1 1 1 1 1 4 ...
##  $ availability_365              : int  365 355 365 194 0 129 0 220 0 188 ...

airbnb_clean <- airbnb %>%
  filter(!is.na(price), price > 0) %>%
  filter(price <= quantile(price, 0.95, na.rm = TRUE)) %>%
  drop_na(minimum_nights)

summary(airbnb_clean)

##        id               name              host_id           host_name        
##  Min.   :    2539   Length:46443       Min.   :     2438   Length:46443      
##  1st Qu.: 9445547   Class :character   1st Qu.:  7719674   Class :character  
##  Median :19545851   Mode  :character   Median : 30345458   Mode  :character  
##  Mean   :18919407                      Mean   : 66462228                     
##  3rd Qu.:28939701                      3rd Qu.:105655639                     
##  Max.   :36487245                      Max.   :274321313                     
##                                                                              
##  neighbourhood_group neighbourhood         latitude       longitude     
##  Length:46443        Length:46443       Min.   :40.50   Min.   :-74.24  
##  Class :character    Class :character   1st Qu.:40.69   1st Qu.:-73.98  
##  Mode  :character    Mode  :character   Median :40.72   Median :-73.95  
##                                         Mean   :40.73   Mean   :-73.95  
##                                         3rd Qu.:40.76   3rd Qu.:-73.93  
##                                         Max.   :40.91   Max.   :-73.71  
##                                                                         
##   room_type             price       minimum_nights     number_of_reviews
##  Length:46443       Min.   : 10.0   Min.   :   1.000   Min.   :  0.00   
##  Class :character   1st Qu.: 65.0   1st Qu.:   1.000   1st Qu.:  1.00   
##  Mode  :character   Median :100.0   Median :   2.000   Median :  5.00   
##                     Mean   :122.6   Mean   :   6.944   Mean   : 23.82   
##                     3rd Qu.:160.0   3rd Qu.:   5.000   3rd Qu.: 24.00   
##                     Max.   :355.0   Max.   :1250.000   Max.   :629.00   
##                                                                         
##  last_review        reviews_per_month calculated_host_listings_count
##  Length:46443       Min.   : 0.010    Min.   :  1.000               
##  Class :character   1st Qu.: 0.190    1st Qu.:  1.000               
##  Mode  :character   Median : 0.710    Median :  1.000               
##                     Mean   : 1.377    Mean   :  6.687               
##                     3rd Qu.: 2.020    3rd Qu.:  2.000               
##                     Max.   :58.500    Max.   :327.000               
##                     NA's   :9186                                    
##  availability_365
##  Min.   :  0.0   
##  1st Qu.:  0.0   
##  Median : 40.0   
##  Mean   :109.7   
##  3rd Qu.:218.0   
##  Max.   :365.0   
##

#Part 1: Exploratory Analysis

#Price Distribution
mean_price <- mean(airbnb_clean$price)
median_price <- median(airbnb_clean$price)

ggplot(airbnb_clean, aes(x = price)) +

  # Histogram
  geom_histogram(
    aes(y = after_stat(density)),
    binwidth = 25,
    fill = primary_blue,
    color = "white",
    alpha = 0.85
  ) +

  # Density curve
  geom_density(
    color = accent_orange,
    linewidth = 1.2
  ) +

  # Median line
  geom_vline(
    xintercept = median_price,
    linewidth = 1.2,
    color = secondary_red
  ) +

  # Mean line
  geom_vline(
    xintercept = mean_price,
    linetype = "dashed",
    linewidth = 1,
    color = neutral_medium
  ) +

  labs(
    title = "Distribution of Airbnb Prices in NYC",
    subtitle = paste(
      "Median = $", round(median_price),
      " | Mean = $", round(mean_price),
      sep = ""
    ),
    x = "Nightly Price (USD)",
    y = "Density"
  ) +

  scale_x_continuous(labels = dollar_format()) +
  custom_theme

This chart shows the distribution of Airbnb prices in New York City after removing the top 5% of prices (the cleaned data tops out at $355 per night). Even with that cap, the data shows a classic right-skewed pattern: most listings are concentrated at lower price points, while a smaller number of expensive properties pull the average upward. The median price is $100, but the mean is $123, and this gap reflects the influence of those expensive listings. Approximately 75% of listings cost $160 or less per night (the third quartile), indicating that budget and mid-range options make up the vast majority of the market.
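The skew and quartile statements above can be checked with two short calculations. The sketch below uses a small made-up price vector (`toy_prices`) and a hypothetical helper, `share_at_or_below()`; neither comes from the original analysis.

```r
# Illustrative prices only; these values are NOT taken from AB_NYC_2019.csv
toy_prices <- c(40, 55, 65, 80, 100, 120, 160, 200, 300, 350)

# Hypothetical helper: share of listings priced at or below a threshold
share_at_or_below <- function(x, threshold) {
  mean(x <= threshold)
}

# A positive mean-minus-median gap is a simple sign of right skew
skew_gap <- mean(toy_prices) - median(toy_prices)

# Share of the toy listings priced at $160 or less
share_160 <- share_at_or_below(toy_prices, 160)

skew_gap
share_160
```

Applied to the real data, `share_at_or_below(airbnb_clean$price, 160)` should return at least 0.75, since $160 is the third quartile in the summary output.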

#Geographic Distribution of Airbnb Prices
set.seed(123)

map_data <- airbnb_clean %>%
  filter(!is.na(latitude), !is.na(longitude)) %>%
  sample_n(min(1500, nrow(.)))

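# Create color bins based on price.
# Prices were capped at the 95th percentile during cleaning (max $355),
# so the breaks stay within that range and give one bin per palette color.
price_bins <- c(0, 75, 125, 200, 275, max(map_data$price, na.rm = TRUE))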

price_pal <- colorBin(
  palette = c(
    accent_green,
    primary_blue,
    accent_teal,
    accent_orange,
    secondary_red
  ),
  domain = map_data$price,
  bins = price_bins
)

# Build leaflet map
leaflet(map_data) %>%
  addProviderTiles("CartoDB.Positron") %>%
  addCircleMarkers(
    lng = ~longitude,
    lat = ~latitude,
    radius = 4,
    fillColor = ~price_pal(price),
    fillOpacity = 0.8,
    stroke = FALSE,
    popup = ~paste0(
      "<strong>", name, "</strong><br>",
      "Price: $", price, "<br>",
      "Room type: ", room_type, "<br>",
      "Neighborhood: ", neighbourhood
    )
  ) %>%
  addLegend(
    pal = price_pal,
    values = ~price,
    title = "Nightly Price (USD)",
    position = "bottomright"
  )

This interactive map plots a random sample of 1,500 listings to show how Airbnb prices are distributed geographically across New York City. Higher-priced listings cluster heavily in Manhattan and parts of Brooklyn, while lower-priced listings are more common in the other boroughs. The map gives a first visual indication of the geographic segmentation examined in the borough comparisons in Part 2.
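The map colors depend on assigning each price to an interval. Base R's `cut()` illustrates the same binning idea without needing leaflet. This is only a sketch: the prices and break points are illustrative, and it assumes right-closed intervals, which may differ from the convention `colorBin()` uses.

```r
# Illustrative prices only; not drawn from the dataset
toy_prices <- c(45, 75, 110, 199, 250, 355)

# Assumed break points for illustration; the top break is the sample maximum
breaks <- c(0, 75, 125, 200, 275, max(toy_prices))

# Integer bin index for each price: (0,75], (75,125], ..., (275,355],
# with include.lowest = TRUE so a price equal to the lowest break is kept
bin_index <- cut(toy_prices, breaks = breaks, labels = FALSE, include.lowest = TRUE)

bin_index
```

Five intervals pair naturally with a five-color palette, so each color corresponds to exactly one price band.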

#Minimum Nights Requirement
min_nights_95 <- quantile(airbnb_clean$minimum_nights, 0.95)

airbnb_min <- airbnb_clean %>%
  filter(minimum_nights <= min_nights_95)

ggplot(airbnb_min, aes(x = minimum_nights)) +

  geom_histogram(
    aes(y = after_stat(density)),
    binwidth = 2,
    fill = accent_green,
    color = "white",
    alpha = 0.85
  ) +

  geom_density(
    color = accent_orange,
    linewidth = 1.2
  ) +

  labs(
    title = "Distribution of Minimum Nights Requirement",
    subtitle = "Most listings favor short-term stays, with a secondary monthly segment",
    x = "Minimum Nights",
    y = "Density"
  ) +

  custom_theme

This chart shows the range of minimum stay requirements for NYC Airbnb listings, with the top 5% of values removed for readability. The data has a clear split in strategy. Most listings require very short stays, with a median of just 2 nights. This peak caters to tourists and weekend travelers. However, there is a second, smaller peak at the one-month mark (30 nights). This represents hosts targeting longer-term renters, which can provide more stability and may relate to local rental regulations. The gap between the low median (2 nights) and the higher mean in the cleaned data (about 7 nights) is driven largely by this second group of monthly listings, along with a handful of extreme values (the maximum is 1,250 nights).
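A small numeric example shows how a monthly cluster separates the mean from the median. The values in `toy_min_nights` are invented for illustration and are not taken from the dataset.

```r
# Illustrative minimum-night values: mostly short stays plus a 30-night group
toy_min_nights <- c(1, 1, 1, 2, 2, 2, 3, 30, 30, 30)

median_nights <- median(toy_min_nights)      # stays inside the short-stay cluster
mean_nights <- mean(toy_min_nights)          # pulled upward by the 30-night group
share_monthly <- mean(toy_min_nights >= 30)  # fraction of listings at 30+ nights

median_nights
mean_nights
share_monthly
```

The same three lines, run on `airbnb_clean$minimum_nights`, would show how large the monthly segment is in the real data.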

#Price Outliers

Q1 <- quantile(airbnb_clean$price, 0.25)
Q3 <- quantile(airbnb_clean$price, 0.75)
IQR_price <- Q3 - Q1

lower_bound <- Q1 - 1.5 * IQR_price
upper_bound <- Q3 + 1.5 * IQR_price

price_outliers <- airbnb_clean %>%
  filter(price < lower_bound | price > upper_bound)

price_outliers %>%
  count(room_type, sort = TRUE)

##         room_type   n
## 1 Entire home/apt 830
## 2    Private room  73
## 3     Shared room   5

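Most outliers (830 of 908, about 91%) are entire home/apartment listings.
Because the lower fence is negative, every outlier lies above the upper fence, in the top of the capped price range.
These are not errors, but premium properties.
Airbnb clearly serves both budget and high-end market segments.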
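The fences behind this outlier count can be reproduced by hand from the quartiles printed in the summary output (Q1 = $65, Q3 = $160). A minimal sketch of that arithmetic:

```r
# Quartiles copied from the summary(airbnb_clean) output above
q1 <- 65
q3 <- 160

iqr_price <- q3 - q1                   # interquartile range
lower_fence <- q1 - 1.5 * iqr_price    # below zero, so no low-price outliers are possible
upper_fence <- q3 + 1.5 * iqr_price    # listings priced above this are flagged

c(iqr = iqr_price, lower = lower_fence, upper = upper_fence)
```

With an upper fence of $302.50 and a cleaned maximum of $355, the flagged listings all fall in a fairly narrow band at the top of the trimmed distribution.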

# Correlation Between Price and Other Variables

numeric_vars <- airbnb_clean %>%
  select(
    price,
    minimum_nights,
    number_of_reviews,
    reviews_per_month,
    calculated_host_listings_count,
    availability_365
  )

cor_long <- cor(numeric_vars, use = "complete.obs") %>%
  as.data.frame() %>%
  rownames_to_column("var1") %>%
  pivot_longer(-var1, names_to = "var2", values_to = "correlation")

ggplot(cor_long, aes(var1, var2, fill = correlation)) +

  geom_tile(color = "white") +

  geom_text(
    aes(label = round(correlation, 2)),
    size = 4,
    fontface = "bold"
  ) +

  scale_fill_gradient2(
    low = "#4575b4",
    mid = "white",
    high = "#d73027",
    midpoint = 0,
    limits = c(-1, 1),
    name = "Correlation"
  ) +

  labs(
    title = "Correlation Heatmap of Airbnb Variables",
    subtitle = "Price shows weak correlation with minimum nights"
  ) +

  custom_theme +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

The correlation heatmap shows how the numeric listing variables relate to one another. Because the matrix is computed with use = "complete.obs", listings with a missing reviews_per_month value (9,186 in the cleaned data) are excluded from every cell. Price shows almost no linear relationship with minimum stay requirements (r = 0.03), suggesting that hosts do not systematically tie price to stay length. Properties with more reviews tend to charge slightly less, which may reflect established listings competing on price. The strongest relationship appears between total reviews and monthly reviews (r = 0.55), as expected for two measures of the same activity. Hosts with multiple listings charge modestly more (r = 0.11), while availability shows positive relationships with both reviews and host listing counts. All of these correlations are weak to moderate, so they describe associations rather than causes.
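The choice of `use = "complete.obs"` matters because a missing value in any one column removes that row from the whole matrix. The sketch below contrasts it with `use = "pairwise.complete.obs"` on a made-up data frame (`toy`), whose values are illustrative only.

```r
# Illustrative data only; one listing has no reviews_per_month value
toy <- data.frame(
  price             = c(50, 80, 120, 200, 90, 150),
  number_of_reviews = c(40, 25, 10, 0, 30, 5),
  reviews_per_month = c(2.0, 1.5, 0.6, NA, 1.8, 0.3)
)

# Rows that survive under "complete.obs"
n_complete <- sum(complete.cases(toy))

# Every cell uses only the fully observed rows
cor_complete <- cor(toy, use = "complete.obs")

# Each cell uses all rows observed for that particular pair of variables
cor_pairwise <- cor(toy, use = "pairwise.complete.obs")

n_complete
round(cor_complete, 2)
round(cor_pairwise, 2)
```

In the pairwise version, the price and number_of_reviews cell uses all six rows, so it can differ from the complete-case value. Which option is preferable depends on whether listings without reviews should count toward the price correlations.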

Part 2: Borough Comparisons

#Number of Airbnb Listings by Borough
borough_counts <- airbnb_clean %>%
  group_by(neighbourhood_group) %>%
  summarise(listings = n()) %>%
  arrange(desc(listings))

ggplot(borough_counts,
       aes(x = reorder(neighbourhood_group, listings),
           y = listings,
           fill = neighbourhood_group)) +

  geom_col(width = 0.7, show.legend = FALSE) +

  geom_text(
    aes(label = scales::comma(listings)),
    hjust = -0.15,
    fontface = "bold",
    size = 4
  ) +

  coord_flip(clip = "off") +

  scale_fill_manual(values = c(
    "Manhattan" = primary_blue,
    "Brooklyn" = accent_green,
    "Queens" = accent_teal,
    "Bronx" = accent_orange,
    "Staten Island" = accent_purple
  )) +

  scale_y_continuous(
    labels = scales::comma,
    expand = expansion(mult = c(0, 0.25))  # extra space for labels
  ) +

  labs(
    title = "Number of Airbnb Listings by Borough",
    x = "Borough",
    y = "Number of Listings"
  ) +

  custom_theme +

  theme(
    plot.margin = margin(10, 120, 10, 10)  
  )

This visualization compares the total number of Airbnb listings across New York City boroughs. Manhattan and Brooklyn clearly dominate the market, accounting for the majority of listings, while Queens, the Bronx, and Staten Island have significantly smaller Airbnb markets.

#Market-share-donut
market_share <- airbnb_clean %>%
  count(neighbourhood_group) %>%
  mutate(
    share = n / sum(n),
    pct_label = scales::percent(share, accuracy = 1)
  )

p_donut <- ggplot(market_share, aes(x = 2, y = share, fill = neighbourhood_group)) +
  geom_col(width = 1, color = "white") +

  geom_text(
    data = dplyr::filter(market_share, share >= 0.06),
    aes(label = pct_label),
    position = position_stack(vjust = 0.5),
    color = "white",
    fontface = "bold",
    size = 4
  ) +

  coord_polar(theta = "y") +
  xlim(0.5, 2.5) +

  scale_fill_manual(values = c(
    "Manhattan" = primary_blue,
    "Brooklyn" = accent_green,
    "Queens" = accent_teal,
    "Bronx" = accent_orange,
    "Staten Island" = accent_purple
  )) +

  labs(
    title = "Share of Airbnb Listings by Borough",
    subtitle = "Labels shown for larger shares; smaller slices shown in legend",
    fill = "Borough"
  ) +

  theme_void() +
  theme(
    plot.title = element_text(face = "bold", size = 16),
    plot.subtitle = element_text(color = neutral_medium),
    legend.title = element_text(face = "bold")
  )

print(p_donut)

This visualization presents the proportional distribution of Airbnb listings by borough. By focusing on market share rather than raw counts, it emphasizes the dominance of Manhattan and Brooklyn and the relatively small contribution of the Bronx and Staten Island.
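The share calculation and the label threshold used in the donut chart can be illustrated with a few lines of base R. The counts below are invented for the example; only the 0.06 cutoff mirrors the plotting code.

```r
# Illustrative listing counts only; not the real borough totals
toy_counts <- c(Manhattan = 45, Brooklyn = 40, Queens = 11,
                Bronx = 3, "Staten Island" = 1)

# Convert counts to shares of the total
toy_shares <- toy_counts / sum(toy_counts)

# Only slices at or above 6% receive an in-chart label
labelled <- names(toy_shares)[toy_shares >= 0.06]

round(toy_shares, 2)
labelled
```

Filtering the labels this way keeps text off slices that are too thin to hold it, which is why the smallest boroughs appear only in the legend.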

#Borough-price-fingerprint
price_cap <- quantile(airbnb_clean$price, 0.98)

density_data <- airbnb_clean %>%
  filter(price <= price_cap)

ggplot(density_data, aes(x = price, fill = neighbourhood_group)) +
  geom_density(alpha = 0.55, linewidth = 0.9, show.legend = FALSE) +
  facet_wrap(~ neighbourhood_group, ncol = 2, scales = "free_y") +
  scale_fill_manual(values = c(
    "Manhattan" = secondary_red,
    "Brooklyn" = primary_blue,
    "Queens" = accent_teal,
    "Bronx" = accent_orange,
    "Staten Island" = accent_purple
  )) +
  labs(
    title = "Price Distribution 'Fingerprint' by Borough",
    subtitle = "Density curves show how each borough's market is shaped differently",
    x = "Nightly Price (USD)",
    y = "Density"
  ) +
  scale_x_continuous(labels = scales::dollar_format()) +
  custom_theme

This chart shows each borough’s price distribution as a density curve. It makes it easy to see which boroughs have higher typical prices and which have more variability. Manhattan’s curve shifts higher, while outer boroughs concentrate at lower price ranges.

#Room Type Composition by Borough
room_type_borough <- airbnb_clean %>%
  group_by(neighbourhood_group, room_type) %>%
  summarise(count = n(), .groups = "drop") %>%
  group_by(neighbourhood_group) %>%
  mutate(
    percentage = count / sum(count),
    label = scales::percent(percentage, accuracy = 1)
  )

ggplot(room_type_borough,
       aes(x = neighbourhood_group,
           y = percentage,
           fill = room_type)) +

  geom_col(width = 0.7) +

  geom_text(
    aes(label = label),
    position = position_stack(vjust = 0.5),
    color = "white",
    fontface = "bold",
    size = 3.5
  ) +

  scale_fill_manual(values = c(
    "Entire home/apt" = primary_blue,
    "Private room"    = accent_green,
    "Shared room"     = accent_orange
  )) +

  labs(
    title = "Room Type Composition by Borough",
    x = "Borough",
    y = "Percentage of Listings",
    fill = "Room Type"
  ) +

  scale_y_continuous(labels = scales::percent_format()) +
  custom_theme

The chart shows that the type of Airbnb listing varies significantly by borough. Manhattan’s listings are predominantly entire homes or apartments (58%), which supports its higher average price. Brooklyn has a nearly even split between entire homes and private rooms, appealing to a wider range of budgets. In Queens and the Bronx, the majority of listings are private rooms, which helps explain their lower overall prices. Staten Island, despite its smaller market, has a relatively high share of entire homes.

Part 3: Statistical Analysis: Comparing Williamsburg and Bedford-Stuyvesant

Research Question: Is the difference in average Airbnb prices between Williamsburg and Bedford-Stuyvesant statistically significant?

Null Hypothesis (H₀): There is no difference in mean nightly prices between Williamsburg and Bedford-Stuyvesant.

Alternative Hypothesis (H₁): The mean nightly price in Williamsburg is higher than in Bedford-Stuyvesant.

neighborhood_data <- airbnb_clean %>%
  filter(neighbourhood %in% c("Williamsburg", "Bedford-Stuyvesant"))

williamsburg_prices <- neighborhood_data %>%
  filter(neighbourhood == "Williamsburg") %>%
  pull(price)

bedstuy_prices <- neighborhood_data %>%
  filter(neighbourhood == "Bedford-Stuyvesant") %>%
  pull(price)

# Sample sizes
length(williamsburg_prices)

## [1] 3771

length(bedstuy_prices)

## [1] 3647

# Descriptive-stats
neighborhood_summary <- neighborhood_data %>%
  group_by(neighbourhood) %>%
  summarise(
    listings = n(),
    mean_price = mean(price),
    median_price = median(price),
    sd_price = sd(price)
  )

neighborhood_summary

## # A tibble: 2 × 5
##   neighbourhood      listings mean_price median_price sd_price
##   <chr>                 <int>      <dbl>        <int>    <dbl>
## 1 Bedford-Stuyvesant     3647       94.9           79     55.4
## 2 Williamsburg           3771      127.           100     70.4

Interpretation

-Williamsburg has a noticeably higher mean and median price.

-Standard deviation is larger in Williamsburg, suggesting greater price dispersion.

#Two-Sample t-Test
t_test_result <- t.test(
  williamsburg_prices,
  bedstuy_prices,
  alternative = "greater"
)

t_test_result

## 
##  Welch Two Sample t-test
## 
## data:  williamsburg_prices and bedstuy_prices
## t = 21.596, df = 7122, p-value < 2.2e-16
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  29.28861      Inf
## sample estimates:
## mean of x mean of y 
##  126.6272   94.9235

Interpretation

The p-value is less than 0.001, indicating a statistically significant difference.
We reject the null hypothesis.
Williamsburg listings are significantly more expensive than those in Bedford-Stuyvesant.
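The Welch statistic can be approximately reconstructed from the summary statistics printed earlier. This sketch uses the rounded standard deviations from the table, so the results should match the `t.test()` output only to within rounding.

```r
# Summary statistics copied from the output above (SDs are rounded to one decimal)
mean_w <- 126.6272; sd_w <- 70.4; n_w <- 3771   # Williamsburg
mean_b <- 94.9235;  sd_b <- 55.4; n_b <- 3647   # Bedford-Stuyvesant

# Estimated variance of each sample mean
var_mean_w <- sd_w^2 / n_w
var_mean_b <- sd_b^2 / n_b

# Welch t statistic: difference in means over its standard error
t_stat <- (mean_w - mean_b) / sqrt(var_mean_w + var_mean_b)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (var_mean_w + var_mean_b)^2 /
  (var_mean_w^2 / (n_w - 1) + var_mean_b^2 / (n_b - 1))

# One-sided p-value for the alternative "Williamsburg mean is greater"
p_one_sided <- pt(t_stat, df = df_welch, lower.tail = FALSE)

c(t = t_stat, df = df_welch)
p_one_sided
```

The Welch form is used because it does not assume equal variances, which is appropriate here given the different standard deviations in the two neighborhoods.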

#Effect Size (Practical Significance)
# Cohen's d
mean_diff <- mean(williamsburg_prices) - mean(bedstuy_prices)
pooled_sd <- sqrt(
  ((sd(williamsburg_prices)^2 + sd(bedstuy_prices)^2) / 2)
)

cohens_d <- mean_diff / pooled_sd
cohens_d

## [1] 0.5005628

Interpretation

Cohen’s d is approximately 0.50, which is conventionally considered a medium effect.
This indicates the price difference is not only statistically significant, but also economically meaningful to travelers and investors.
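Cohen's d can also be recomputed from the summary statistics. The analysis pools the two standard deviations with equal weights; the sketch below compares that with the sample-size-weighted pooled SD, again using the rounded SDs from the table.

```r
# Summary statistics copied from the output above (SDs are rounded to one decimal)
mean_w <- 126.6272; sd_w <- 70.4; n_w <- 3771   # Williamsburg
mean_b <- 94.9235;  sd_b <- 55.4; n_b <- 3647   # Bedford-Stuyvesant

mean_diff <- mean_w - mean_b

# Equal-weight pooling, as in the code above
pooled_sd_simple <- sqrt((sd_w^2 + sd_b^2) / 2)

# Pooling weighted by degrees of freedom
pooled_sd_weighted <- sqrt(
  ((n_w - 1) * sd_w^2 + (n_b - 1) * sd_b^2) / (n_w + n_b - 2)
)

d_simple <- mean_diff / pooled_sd_simple
d_weighted <- mean_diff / pooled_sd_weighted

round(c(simple = d_simple, weighted = d_weighted), 3)
```

Because the two samples are nearly the same size, both versions land very close to 0.50, so the choice of pooling formula does not change the interpretation.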

#Part 3 Summary: Key Findings

-Williamsburg listings are, on average, about $32 more expensive per night.

-The price difference is highly statistically significant (p < 0.001).

-The effect size (Cohen’s d of about 0.50) indicates a medium difference that is practically meaningful, not just a result of large sample size.

-This statistical evidence supports the idea that Williamsburg operates as a premium sub-market within Brooklyn.

#Part 4: Business recommendation

# Compare Williamsburg vs Bedford-Stuyvesant on business-relevant metrics
biz_metrics <- neighborhood_data %>%
  group_by(neighbourhood) %>%
  summarise(
    listings = n(),
    avg_price = mean(price),
    median_price = median(price),
    pct_entire_home = mean(room_type == "Entire home/apt") * 100,
    pct_private_room = mean(room_type == "Private room") * 100,
    avg_min_nights = mean(minimum_nights),
    avg_reviews = mean(number_of_reviews),
    avg_availability = mean(availability_365),
    avg_host_listings = mean(calculated_host_listings_count),
    .groups = "drop"
  ) %>%
  mutate(
    avg_price = round(avg_price, 2),
    median_price = round(median_price, 2),
    pct_entire_home = round(pct_entire_home, 1),
    pct_private_room = round(pct_private_room, 1),
    avg_min_nights = round(avg_min_nights, 1),
    avg_reviews = round(avg_reviews, 1),
    avg_availability = round(avg_availability, 1),
    avg_host_listings = round(avg_host_listings, 2)
  )

biz_metrics

## # A tibble: 2 × 10
##   neighbourhood listings avg_price median_price pct_entire_home pct_private_room
##   <chr>            <int>     <dbl>        <dbl>           <dbl>            <dbl>
## 1 Bedford-Stuy…     3647      94.9           79            42.3             55.4
## 2 Williamsburg      3771     127.           100            46.6             52.5
## # ℹ 4 more variables: avg_min_nights <dbl>, avg_reviews <dbl>,
## #   avg_availability <dbl>, avg_host_listings <dbl>

This table summarizes key business-related metrics for Williamsburg and Bedford-Stuyvesant, including prices, room types, and host characteristics. It helps explain why Williamsburg listings tend to earn more by showing differences in market structure, not just price alone.

biz_price_plot <- biz_metrics %>%
  select(neighbourhood, avg_price)

ggplot(biz_price_plot, aes(x = neighbourhood, y = avg_price, fill = neighbourhood)) +
  geom_col(width = 0.65, alpha = 0.9) +
  geom_text(aes(label = dollar(avg_price)), vjust = -0.4, fontface = "bold") +
  scale_fill_manual(values = c(
    "Williamsburg" = accent_purple,
    "Bedford-Stuyvesant" = accent_teal
  )) +
  labs(
    title = "Average Nightly Price: Williamsburg vs Bedford-Stuyvesant",
    subtitle = "Williamsburg commands a consistent price premium",
    x = NULL,
    y = "Average Price (USD)"
  ) +
  scale_y_continuous(labels = dollar_format()) +
  custom_theme +
  theme(legend.position = "none")

This chart visually compares the average nightly prices in Williamsburg and Bedford-Stuyvesant. It provides a quick confirmation that Williamsburg commands higher prices on average, with a mean of about $127 compared with about $95.

# Revenue scenario assumptions 
nights_per_week <- 7
occupancy_rate <- 0.65   # 65% occupancy assumption for annual scenario
weeks_per_year <- 52

mean_w <- mean(williamsburg_prices)
mean_b <- mean(bedstuy_prices)
price_diff <- mean_w - mean_b

weekly_diff <- price_diff * nights_per_week
annual_diff <- price_diff * nights_per_week * weeks_per_year * occupancy_rate

revenue_summary <- tibble(
  Metric = c("Mean price (Williamsburg)", "Mean price (Bedford-Stuyvesant)",
             "Mean price difference", "Extra revenue per week (7 nights)",
             "Extra revenue per year (65% occupancy)"),
  Value = c(
    dollar(mean_w),
    dollar(mean_b),
    dollar(price_diff),
    dollar(weekly_diff),
    dollar(annual_diff)
  )
)

revenue_summary

## # A tibble: 5 × 2
##   Metric                                 Value    
##   <chr>                                  <chr>    
## 1 Mean price (Williamsburg)              $126.63  
## 2 Mean price (Bedford-Stuyvesant)        $94.92   
## 3 Mean price difference                  $31.70   
## 4 Extra revenue per week (7 nights)      $221.93  
## 5 Extra revenue per year (65% occupancy) $7,501.08

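This calculation translates the mean price difference between the two neighborhoods into estimated weekly and annual revenue. The weekly figure assumes all 7 nights are booked, while the annual figure applies a 65% occupancy rate over 52 weeks; both assume the same occupancy in each neighborhood. Under those assumptions, a nightly premium of about $32 adds roughly $7,500 in yearly income.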
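Since the 65% occupancy rate is an assumption, it is worth seeing how the annual premium moves when that rate changes. The sketch below wraps the same arithmetic in a hypothetical helper, `annual_premium()`, and evaluates it over a few illustrative occupancy rates.

```r
# Hypothetical helper: extra annual revenue from a nightly price premium
annual_premium <- function(price_diff, occupancy, nights_per_year = 7 * 52) {
  price_diff * nights_per_year * occupancy
}

# Mean price difference from the t-test output above
price_diff <- 126.6272 - 94.9235

# Illustrative occupancy scenarios; 0.65 is the rate assumed in the analysis
occupancy_grid <- c(low = 0.50, base = 0.65, high = 0.80)

premiums <- annual_premium(price_diff, occupancy_grid)
round(premiums, 2)
```

The premium scales linearly with occupancy, so any difference in booking rates between the two neighborhoods would change the comparison directly.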

#Conclusion

The findings indicate that Airbnb listings are highly concentrated in Manhattan and Brooklyn, with the highest prices clustered in Manhattan and parts of Brooklyn. A statistically significant price difference was observed between Williamsburg and Bedford-Stuyvesant, with Williamsburg commanding nightly rates about $32 higher on average. From a business standpoint, this suggests stronger short-term revenue potential in Williamsburg. Overall, the results emphasize the importance of geographic location in determining Airbnb market outcomes.

ANLC 320 Airbnb NYC

2025-12-13
