Story 4: Airbnb Revenue Analysis

The goal of this assignment is to tell a visual story about the factors that differentiate profitable Airbnbs from unprofitable ones. You should imagine that you are either an analyst at Airbnb looking for ways to help hosts become more profitable or to make suggestions about who should become a host, or a real estate investor looking to acquire (or help manage) short-term rental properties.

Based on your analysis, you can phrase your story in different ways (these are just suggestions):

• You could identify listings which are underperforming and the factors that they could change to improve their revenue.

• You could identify neighborhoods where new airbnbs are likely to be successful and the characteristics that you would look for in a property.

Introduction

The code below analyzes New York City Airbnb listings. It loads and cleans the data, which includes resolving inconsistent column names and calculating estimated monthly revenue, and then merges the result with neighborhood boundary data. It then generates visualizations that explore the relationships among price, occupancy, revenue, and location.

The visualizations are a map of average revenue, a scatter plot of price against occupancy rate, and a boxplot of revenue by room type. Finally, the code ranks areas by average revenue. Because the grouping column is taken from the borough field, the map and the ranking are at borough level, so the ranking contains the five boroughs rather than ten neighborhoods.

# Load necessary libraries
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.4.3
## Warning: package 'ggplot2' was built under R version 4.4.3
## Warning: package 'tidyr' was built under R version 4.4.2
## Warning: package 'readr' was built under R version 4.4.2
## Warning: package 'purrr' was built under R version 4.4.3
## Warning: package 'dplyr' was built under R version 4.4.2
## Warning: package 'stringr' was built under R version 4.4.2
## Warning: package 'lubridate' was built under R version 4.4.3
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.4     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(sf)
## Warning: package 'sf' was built under R version 4.4.3
## Linking to GEOS 3.13.0, GDAL 3.10.1, PROJ 9.5.1; sf_use_s2() is TRUE
library(ggplot2)
library(viridis)
## Loading required package: viridisLite
# File paths
airbnb_file <- "C:/Users/Dell/Downloads/nyc_airbnb_listings.csv"
geojson_file <- "C:/Users/Dell/Downloads/neighbourhoods.geojson"

# Step 1: Load and Inspect the Data
# Load datasets
airbnb_data <- read_csv(airbnb_file)
## Rows: 37765 Columns: 60
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (14): host_location, host_response_time, host_response_rate, host_accep...
## dbl  (37): ...1, id, host_id, host_listings_count, host_total_listings_count...
## lgl   (6): host_is_superhost, host_has_profile_pic, host_identity_verified, ...
## date  (3): host_since, first_review, last_review
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
neighborhoods <- st_read(geojson_file)
## Reading layer `neighbourhoods' from data source 
##   `C:\Users\Dell\Downloads\neighbourhoods.geojson' using driver `GeoJSON'
## Simple feature collection with 233 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -74.25559 ymin: 40.49613 xmax: -73.70782 ymax: 40.91553
## Geodetic CRS:  WGS 84
# Inspect the datasets
print("Airbnb Data Summary:")
## [1] "Airbnb Data Summary:"
summary(airbnb_data)
##       ...1             id               host_id            host_since        
##  Min.   :    0   Min.   :2.595e+03   Min.   :     1678   Min.   :2008-08-11  
##  1st Qu.: 9441   1st Qu.:2.055e+07   1st Qu.: 16627758   1st Qu.:2014-06-18  
##  Median :18882   Median :4.826e+07   Median : 82189528   Median :2016-07-10  
##  Mean   :18882   Mean   :3.653e+17   Mean   :165968419   Mean   :2017-02-08  
##  3rd Qu.:28323   3rd Qu.:8.276e+17   3rd Qu.:303156931   3rd Qu.:2019-10-18  
##  Max.   :37764   Max.   :1.193e+18   Max.   :586917430   Max.   :2024-07-01  
##                                                          NA's   :5           
##  host_location      host_response_time host_response_rate host_acceptance_rate
##  Length:37765       Length:37765       Length:37765       Length:37765        
##  Class :character   Class :character   Class :character   Class :character    
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character    
##                                                                               
##                                                                               
##                                                                               
##                                                                               
##  host_is_superhost host_neighbourhood host_listings_count
##  Mode :logical     Length:37765       Min.   :   1.0     
##  FALSE:30186       Class :character   1st Qu.:   1.0     
##  TRUE :7181        Mode  :character   Median :   2.0     
##  NA's :398                            Mean   : 205.6     
##                                       3rd Qu.:   9.0     
##                                       Max.   :4641.0     
##                                       NA's   :5          
##  host_total_listings_count host_verifications host_has_profile_pic
##  Min.   :   1.0            Length:37765       Mode :logical       
##  1st Qu.:   1.0            Class :character   FALSE:688           
##  Median :   3.0            Mode  :character   TRUE :37072         
##  Mean   : 284.5                               NA's :5             
##  3rd Qu.:  13.0                                                   
##  Max.   :9013.0                                                   
##  NA's   :5                                                        
##  host_identity_verified neighborhood       neighbourhood_group_cleansed
##  Mode :logical          Length:37765       Length:37765                
##  FALSE:4298             Class :character   Class :character            
##  TRUE :33462            Mode  :character   Mode  :character            
##  NA's :5                                                               
##                                                                        
##                                                                        
##                                                                        
##     latitude       longitude      property_type       room_type        
##  Min.   :40.50   Min.   :-74.25   Length:37765       Length:37765      
##  1st Qu.:40.69   1st Qu.:-73.98   Class :character   Class :character  
##  Median :40.73   Median :-73.95   Mode  :character   Mode  :character  
##  Mean   :40.73   Mean   :-73.95                                        
##  3rd Qu.:40.76   3rd Qu.:-73.93                                        
##  Max.   :40.91   Max.   :-73.71                                        
##                                                                        
##   accommodates      bathrooms         bedrooms          beds       
##  Min.   : 1.000   Min.   : 0.000   Min.   : 0.00   Min.   : 0.000  
##  1st Qu.: 2.000   1st Qu.: 1.000   1st Qu.: 1.00   1st Qu.: 1.000  
##  Median : 2.000   Median : 1.000   Median : 1.00   Median : 1.000  
##  Mean   : 2.762   Mean   : 1.189   Mean   : 1.17   Mean   : 1.637  
##  3rd Qu.: 4.000   3rd Qu.: 1.000   3rd Qu.: 1.00   3rd Qu.: 2.000  
##  Max.   :16.000   Max.   :15.500   Max.   :16.00   Max.   :42.000  
##                   NA's   :14593                    NA's   :14777   
##   amenities             price          minimum_nights    maximum_nights     
##  Length:37765       Min.   :     8.0   Min.   :   1.00   Min.   :1.000e+00  
##  Class :character   1st Qu.:    86.0   1st Qu.:  30.00   1st Qu.:1.500e+02  
##  Mode  :character   Median :   150.0   Median :  30.00   Median :3.650e+02  
##                     Mean   :   221.1   Mean   :  29.16   Mean   :5.798e+04  
##                     3rd Qu.:   250.0   3rd Qu.:  30.00   3rd Qu.:1.125e+03  
##                     Max.   :100000.0   Max.   :1250.00   Max.   :2.147e+09  
##                     NA's   :14721                                           
##  minimum_minimum_nights maximum_minimum_nights minimum_maximum_nights
##  Min.   :   1.00        Min.   :   1.00        Min.   :1.000e+00     
##  1st Qu.:  30.00        1st Qu.:  30.00        1st Qu.:3.610e+02     
##  Median :  30.00        Median :  30.00        Median :7.300e+02     
##  Mean   :  28.74        Mean   :  37.22        Mean   :5.807e+04     
##  3rd Qu.:  30.00        3rd Qu.:  30.00        3rd Qu.:1.125e+03     
##  Max.   :1250.00        Max.   :1250.00        Max.   :2.147e+09     
##  NA's   :1              NA's   :1              NA's   :1             
##  maximum_maximum_nights minimum_nights_avg_ntm maximum_nights_avg_ntm
##  Min.   :1.000e+00      Min.   :   1.00        Min.   :1.000e+00     
##  1st Qu.:3.650e+02      1st Qu.:  30.00        1st Qu.:3.650e+02     
##  Median :1.125e+03      Median :  30.00        Median :9.350e+02     
##  Mean   :1.650e+06      Mean   :  29.75        Mean   :1.075e+06     
##  3rd Qu.:1.125e+03      3rd Qu.:  30.00        3rd Qu.:1.125e+03     
##  Max.   :2.147e+09      Max.   :1250.00        Max.   :2.147e+09     
##  NA's   :1              NA's   :1              NA's   :1             
##  calendar_updated has_availability number_of_reviews number_of_reviews_ltm
##  Mode:logical     Mode :logical    Min.   :   0.0    Min.   :   0.000     
##  NA's:37765       FALSE:289        1st Qu.:   0.0    1st Qu.:   0.000     
##                   TRUE :32071      Median :   3.0    Median :   0.000     
##                   NA's :5405       Mean   :  24.9    Mean   :   3.993     
##                                    3rd Qu.:  21.0    3rd Qu.:   3.000     
##                                    Max.   :1915.0    Max.   :1568.000     
##                                                                           
##   first_review         last_review         review_scores_rating
##  Min.   :2009-05-25   Min.   :2011-05-12   Min.   :0.000       
##  1st Qu.:2017-07-26   1st Qu.:2020-03-13   1st Qu.:4.650       
##  Median :2020-10-10   Median :2023-09-03   Median :4.850       
##  Mean   :2020-01-30   Mean   :2022-04-22   Mean   :4.727       
##  3rd Qu.:2022-11-06   3rd Qu.:2024-04-26   3rd Qu.:5.000       
##  Max.   :2024-07-04   Max.   :2024-07-05   Max.   :5.000       
##  NA's   :11751        NA's   :11751        NA's   :11751       
##  review_scores_accuracy review_scores_cleanliness review_scores_checkin
##  Min.   :0.000          Min.   :0.000             Min.   :0.000        
##  1st Qu.:4.710          1st Qu.:4.530             1st Qu.:4.810        
##  Median :4.900          Median :4.810             Median :4.950        
##  Mean   :4.769          Mean   :4.657             Mean   :4.834        
##  3rd Qu.:5.000          3rd Qu.:5.000             3rd Qu.:5.000        
##  Max.   :5.000          Max.   :5.000             Max.   :5.000        
##  NA's   :11768          NA's   :11758             NA's   :11772        
##  review_scores_communication review_scores_location review_scores_value
##  Min.   :0.000               Min.   :0.000          Min.   :0.000      
##  1st Qu.:4.820               1st Qu.:4.650          1st Qu.:4.540      
##  Median :4.960               Median :4.850          Median :4.770      
##  Mean   :4.829               Mean   :4.744          Mean   :4.649      
##  3rd Qu.:5.000               3rd Qu.:5.000          3rd Qu.:4.940      
##  Max.   :5.000               Max.   :5.000          Max.   :5.000      
##  NA's   :11763               NA's   :11775          NA's   :11774      
##    license          instant_bookable calculated_host_listings_count
##  Length:37765       Mode :logical    Min.   :  1.00                
##  Class :character   FALSE:30249      1st Qu.:  1.00                
##  Mode  :character   TRUE :7516       Median :  2.00                
##                                      Mean   : 53.76                
##                                      3rd Qu.:  8.00                
##                                      Max.   :842.00                
##                                                                    
##  calculated_host_listings_count_entire_homes
##  Min.   :  0.00                             
##  1st Qu.:  0.00                             
##  Median :  1.00                             
##  Mean   : 28.18                             
##  3rd Qu.:  2.00                             
##  Max.   :842.00                             
##                                             
##  calculated_host_listings_count_private_rooms
##  Min.   :  0.00                              
##  1st Qu.:  0.00                              
##  Median :  1.00                              
##  Mean   : 23.49                              
##  3rd Qu.:  2.00                              
##  Max.   :691.00                              
##                                              
##  calculated_host_listings_count_shared_rooms reviews_per_month
##  Min.   : 0.00000                            Min.   :  0.010  
##  1st Qu.: 0.00000                            1st Qu.:  0.090  
##  Median : 0.00000                            Median :  0.320  
##  Mean   : 0.07774                            Mean   :  0.906  
##  3rd Qu.: 0.00000                            3rd Qu.:  1.130  
##  Max.   :13.00000                            Max.   :103.530  
##                                              NA's   :11751    
##  price_per_accommodates neighbourhood_group occupancy_level      occupancy     
##  Min.   :     1.60      Length:37765        Length:37765       Min.   :0.0000  
##  1st Qu.:    40.83      Class :character    Class :character   1st Qu.:0.4139  
##  Median :    60.00      Mode  :character    Mode  :character   Median :0.7939  
##  Mean   :    84.55                                             Mean   :0.6744  
##  3rd Qu.:    94.00                                             3rd Qu.:1.0000  
##  Max.   :100000.00                                             Max.   :1.0000  
##  NA's   :14721                                                                 
##     revenue        
##  Min.   :    0.00  
##  1st Qu.:   24.00  
##  Median :   65.81  
##  Mean   :  102.08  
##  3rd Qu.:  131.55  
##  Max.   :22592.59  
##  NA's   :14721
print("Neighborhoods GeoJSON Structure:")
## [1] "Neighborhoods GeoJSON Structure:"
summary(neighborhoods)
##  neighbourhood      neighbourhood_group          geometry  
##  Length:233         Length:233          MULTIPOLYGON :233  
##  Class :character   Class :character    epsg:4326    :  0  
##  Mode  :character   Mode  :character    +proj=long...:  0
# Step 2: Clean Airbnb Data

# Standardize two column names by position. The file mixes the
# "neighborhood" and "neighbourhood" spellings (see summary() above).
colnames(airbnb_data)[16] <- "host_neighborhood"  # Column 16 is the listing's "neighborhood"; renamed to free the name
colnames(airbnb_data)[57] <- "neighborhood_group" # Column 57 is "neighbourhood_group" (the borough)

# Rename the columns used in the analysis
airbnb_data <- airbnb_data %>%
  rename(
    occupancy_rate = occupancy,           # Rename "occupancy" to "occupancy_rate"
    neighborhood = neighborhood_group     # Note: "neighborhood" now holds the borough name
  )

# Remove missing or invalid values
airbnb_data <- airbnb_data %>%
  drop_na(price, occupancy_rate, neighborhood) %>% # Remove rows with missing key columns
  filter(price > 0 & occupancy_rate > 0 & occupancy_rate <= 1) # Valid ranges for price and occupancy

# Calculate estimated revenue (price * occupancy_rate * 30 days)
airbnb_data <- airbnb_data %>%
  mutate(estimated_revenue = price * occupancy_rate * 30)

# Inspect cleaned dataset
print("Cleaned Airbnb Data Summary:")
## [1] "Cleaned Airbnb Data Summary:"
#summary(airbnb_data)

# Step 3: Aggregate revenue and merge with the GeoJSON
neighborhood_revenue <- airbnb_data %>%
  group_by(neighborhood) %>%
  summarise(avg_revenue = mean(estimated_revenue, na.rm = TRUE))

# Ensure the correct column names for merging
colnames(neighborhoods)  # Inspect column names in GeoJSON
## [1] "neighbourhood"       "neighbourhood_group" "geometry"
colnames(neighborhood_revenue)  # Inspect column names in revenue data
## [1] "neighborhood" "avg_revenue"
# Align the join key: the GeoJSON borough column is renamed to match the revenue table
neighborhoods <- neighborhoods %>%
  rename(neighborhood = neighbourhood_group)  # Borough names, e.g. "Queens", "Bronx"

# Check for unmatched neighborhood names
unmatched <- setdiff(neighborhood_revenue$neighborhood, neighborhoods$neighborhood)
print(unmatched)  # Print any mismatches
## character(0)
# Perform the join
neighborhoods <- neighborhoods %>%
  left_join(neighborhood_revenue, by = "neighborhood")

# Inspect the merged data
print("Merged Dataset Preview:")
## [1] "Merged Dataset Preview:"
head(neighborhoods)
## Simple feature collection with 6 features and 3 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -73.92828 ymin: 40.61488 xmax: -73.76671 ymax: 40.87344
## Geodetic CRS:  WGS 84
##      neighbourhood neighborhood avg_revenue                       geometry
## 1        Bayswater       Queens    1700.632 MULTIPOLYGON (((-73.76671 4...
## 2         Allerton        Bronx    1401.192 MULTIPOLYGON (((-73.8486 40...
## 3      City Island        Bronx    1401.192 MULTIPOLYGON (((-73.78282 4...
## 4 Ditmars Steinway       Queens    1700.632 MULTIPOLYGON (((-73.9016 40...
## 5       Ozone Park       Queens    1700.632 MULTIPOLYGON (((-73.83754 4...
## 6          Fordham        Bronx    1401.192 MULTIPOLYGON (((-73.88303 4...
# Step 4: Create Visualizations
# Visualization 1: Map of Average Revenue (borough-level averages drawn on neighborhood polygons)
revenue_map <- ggplot(data = neighborhoods) +
  geom_sf(aes(fill = avg_revenue), color = "white", linewidth = 0.2) +
  scale_fill_viridis_c(name = "Avg Revenue (USD)", option = "plasma") +
  labs(title = "Average Airbnb Revenue by Neighborhood",
       subtitle = "New York City",
       caption = "Source: NYC Airbnb") +
  theme_minimal()
print(revenue_map)

# Visualization 2: Scatter Plot of Price vs Occupancy Rate
price_occupancy_plot <- ggplot(data = airbnb_data, aes(x = price, y = occupancy_rate)) +
  geom_point(alpha = 0.5, color = "blue") +
  geom_smooth(method = "lm", color = "red", se = FALSE) +
  labs(title = "Price vs Occupancy Rate",
       x = "Nightly Price (USD)",
       y = "Occupancy Rate",
       caption = "Source: NYC Airbnb") +
  theme_minimal()
print(price_occupancy_plot)
## `geom_smooth()` using formula = 'y ~ x'

# Visualization 3: Revenue Distribution by Room Type
revenue_roomtype_plot <- ggplot(data = airbnb_data, aes(x = room_type, y = estimated_revenue, fill = room_type)) +
  geom_boxplot(alpha = 0.7) +
  scale_fill_viridis_d(name = "Room Type") +
  labs(title = "Revenue Distribution by Room Type",
       x = "Room Type",
       y = "Estimated Revenue (USD)",
       caption = "Source: NYC Airbnb") +
  theme_minimal()
print(revenue_roomtype_plot)

# Step 5: Additional Insights
# Rank areas by average revenue (only five rows come back because `neighborhood` holds borough names)
top_neighborhoods <- neighborhood_revenue %>%
  arrange(desc(avg_revenue)) %>%
  head(10)
print("Top 10 Neighborhoods by Average Revenue:")
## [1] "Top 10 Neighborhoods by Average Revenue:"
print(top_neighborhoods)
## # A tibble: 5 × 2
##   neighborhood  avg_revenue
##   <chr>               <dbl>
## 1 Manhattan           4477.
## 2 Brooklyn            2697.
## 3 Staten Island       1735.
## 4 Queens              1701.
## 5 Bronx               1401.
# Save the visualizations to files
ggsave("C:/Users/Dell/Downloads/revenue_map.png", revenue_map, width = 10, height = 8, dpi = 300)
ggsave("C:/Users/Dell/Downloads/price_occupancy_plot.png", price_occupancy_plot, width = 10, height = 8, dpi = 300)
## `geom_smooth()` using formula = 'y ~ x'
ggsave("C:/Users/Dell/Downloads/revenue_roomtype_plot.png", revenue_roomtype_plot, width = 10, height = 8, dpi = 300)
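
Because `neighborhood` was renamed from the borough column in Step 2, the aggregation above returns one row per borough. The sketch below shows the same revenue formula aggregated at both levels. It is only an illustration: the data frame, the `listing_neighborhood` and `borough` column names, and all values are invented and do not come from the dataset.

```r
# Toy listings; every value here is invented for illustration only.
listings <- data.frame(
  listing_neighborhood = c("SoHo", "SoHo", "Astoria", "Astoria", "Fordham"),
  borough              = c("Manhattan", "Manhattan", "Queens", "Queens", "Bronx"),
  price                = c(300, 200, 100, 120, 80),
  occupancy_rate       = c(0.5, 0.8, 0.9, 0.5, 0.75),
  stringsAsFactors     = FALSE
)

# Same formula as the report: nightly price * occupancy rate * 30 days.
listings$estimated_revenue <- listings$price * listings$occupancy_rate * 30

# Average revenue per borough (the level the report actually computes).
by_borough <- aggregate(estimated_revenue ~ borough, data = listings, FUN = mean)

# Average revenue per neighborhood (finer granularity).
by_neighborhood <- aggregate(estimated_revenue ~ listing_neighborhood,
                             data = listings, FUN = mean)

print(by_borough)
print(by_neighborhood)
```

With the real data, the neighborhood-level table could be joined to the GeoJSON on its `neighbourhood` column instead of the borough column, provided the names in the two sources match.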

The next piece of code loads the GeoJSON data and creates an interactive map with the leaflet library, where each neighborhood is colored by a numeric index derived from its name and popups display the neighborhood name. It then generates a static map with ggplot2 in which specific neighborhoods (“SoHo”, “Chinatown”, “Greenwich Village”) are highlighted in red and the rest are gray, which makes them easy to locate.

The code also creates a map with a light basemap using ggspatial and ggplot2, overlaying the neighborhood polygons to provide geographical context. Finally, it produces a basic static map using ggplot2 with neighborhoods colored using the viridis palette, but without a legend, offering a clear view of neighborhood boundaries. Each map is printed in turn so that the visualization techniques can be compared.

library(geojsonsf)
## Warning: package 'geojsonsf' was built under R version 4.4.3
library(leaflet)
## Warning: package 'leaflet' was built under R version 4.4.3
#install.packages("prettymapr")

# Load GeoJSON file
nyc_geo <- geojson_sf("C:/Users/Dell/Downloads/neighbourhoods.geojson")

# 1. Interactive Leaflet Map
leaflet_map <- leaflet(nyc_geo) %>%
  addPolygons(
    fillColor = ~colorNumeric("viridis", as.numeric(as.factor(neighbourhood)))(as.numeric(as.factor(neighbourhood))),
    fillOpacity = 0.7,
    weight = 1,
    color = "black",
    popup = ~neighbourhood
  )
print(leaflet_map)

# 2. Specific Neighborhood Highlighting 
highlighted_neighborhoods <- c("SoHo", "Chinatown", "Greenwich Village")
nyc_geo$highlight <- ifelse(nyc_geo$neighbourhood %in% highlighted_neighborhoods, "Highlighted", "Other")

highlight_plot <- nyc_geo %>%
  ggplot(aes(fill = highlight)) +
  geom_sf() +
  scale_fill_manual(values = c("Highlighted" = "red", "Other" = "gray"), name = "Neighborhoods") +
  labs(title = "NYC Neighborhoods with Highlights") +
  theme_minimal()
print(highlight_plot)

# 3. Basemap with ggspatial
library(geojsonsf)
library(ggplot2)
library(ggspatial)
## Warning: package 'ggspatial' was built under R version 4.4.3
basemap_plot <- nyc_geo %>%
  ggplot() +
  annotation_map_tile(type = "cartolight") + # Add a light basemap
  geom_sf(aes(fill = neighbourhood), alpha = 0.7) +
  guides(fill = guide_none()) +
  labs(title = "NYC Neighborhoods with Basemap") +
  theme_minimal()
print(basemap_plot)
## Loading required namespace: raster
## Zoom: 9

# 4. Original Plot (with legend removed)
original_plot <- nyc_geo %>%
  ggplot(aes(fill = neighbourhood)) +
  geom_sf() +
  guides(fill = guide_none()) +
  scale_fill_viridis_d(name = "Neighborhoods") +
  labs(
    title = "NYC Neighborhoods",
    subtitle = "Basic Map Visualization",
    caption = "Source: NYC GeoJSON"
  ) +
  theme_minimal()
print(original_plot)
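
The highlighting step relies on exact string matches against `neighbourhood`, so a name that is spelled differently in the GeoJSON would silently stay gray. The sketch below shows one possible check for that. The `available` vector is an invented stand-in for `nyc_geo$neighbourhood`, and the misspelled entry is deliberate.

```r
# Stand-in for nyc_geo$neighbourhood; these values are assumptions for illustration.
available <- c("SoHo", "Chinatown", "Greenwich Village", "Astoria")

# Names we want to highlight; "Soho" is deliberately misspelled.
wanted <- c("SoHo", "Chinatown", "Soho")

# Requested names that do not exist in the data and would never be highlighted.
missing_names <- setdiff(wanted, available)

# Same labelling logic as the report.
highlight <- ifelse(available %in% wanted, "Highlighted", "Other")

print(missing_names)
print(table(highlight))
```

Running a check like this before plotting makes a mismatch visible instead of leaving it to be spotted on the map.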

The next chunk of code analyzes Airbnb listings in New York City, focusing on room types, neighborhood prices, and borough prices. It loads the Airbnb dataset from a CSV file (a preview of the first 10 rows is left commented out) and then generates a bar chart of the number of listings for each room type (“Entire home/apt”, “Private room”, and so on), which shows how listing types are distributed.

Next, the code calculates the average listing price for each neighborhood and draws a horizontal bar chart of the 30 most expensive neighborhoods, ordered by average price. Finally, it calculates the average price for each borough and draws a similar horizontal bar chart. The code keeps up to 10 rows, but New York City has only five boroughs, so all of them appear. The charts are built with ggplot2, and the data is manipulated with dplyr.

library(ggplot2)
library(dplyr)

# Load the Airbnb dataset
airbnb_file <- "C:/Users/Dell/Downloads/nyc_airbnb_listings.csv"
airbnb_data <- read.csv(airbnb_file)

# Display the first 10 rows for verification
#head(airbnb_data, 10)

# Create a bar chart of room_type counts
room_type_counts <- airbnb_data %>%
  group_by(room_type) %>%
  summarise(count = n())

ggplot(room_type_counts, aes(x = room_type, y = count, fill = room_type)) +
  geom_bar(stat = "identity") +
  labs(
    title = "Airbnb Listings by Room Type",
    x = "Room Type",
    y = "Number of Listings",
    fill = "Room Type"
  ) +
  theme_minimal()

# Most Expensive NYC Neighborhoods

# Calculate average price per neighborhood
neighborhood_prices <- airbnb_data %>%
  group_by(neighborhood) %>%  # Use the correct column name: "neighborhood"
  summarise(avg_price = mean(price, na.rm = TRUE)) %>%
  arrange(desc(avg_price)) %>%
  head(30)

# Create bar chart of average prices
ggplot(neighborhood_prices, aes(x = reorder(neighborhood, avg_price), y = avg_price, fill = avg_price)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  labs(
    title = "Most Expensive NYC Neighborhoods (Airbnb)",
    x = "Neighborhood",
    y = "Average Price (USD)",
    fill = "Average Price"
  ) +
  theme_minimal()

# Most Expensive NYC Boroughs
# Calculate average price per borough
borough_prices <- airbnb_data %>%
  group_by(neighbourhood_group_cleansed) %>% # Use the borough column
  summarise(avg_price = mean(price, na.rm = TRUE)) %>%
  arrange(desc(avg_price)) %>%
  head(10)  # NYC has five boroughs, so this keeps all of them

# Create bar chart of average prices by borough
ggplot(borough_prices, aes(x = reorder(neighbourhood_group_cleansed, avg_price), y = avg_price, fill = avg_price)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  labs(
    title = "Most Expensive NYC Boroughs (Airbnb)",
    x = "Borough",
    y = "Average Price (USD)",
    fill = "Average Price"
  ) +
  theme_minimal()
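
The `summary()` output earlier shows a maximum price of 100000 against a median of 150, so a few extreme listings can dominate an average. The sketch below compares the mean and the median on invented prices for two hypothetical areas. It is not a result from the dataset.

```r
# Invented nightly prices for two hypothetical areas; area "B" has one extreme listing.
prices <- data.frame(
  area  = c("A", "A", "A", "B", "B", "B"),
  price = c(150, 200, 250, 100, 120, 100000)
)

# Mean and median price per area.
avg_price <- tapply(prices$price, prices$area, mean)
med_price <- tapply(prices$price, prices$area, median)

print(avg_price)
print(med_price)
```

In this toy case area "B" ranks first by mean but second by median, which is why a median-based ranking (or a price cap applied before averaging) may be worth comparing against the charts above.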

Factors Differentiating Profitable and Unprofitable Airbnbs

This analysis identifies key factors that differentiate profitable from unprofitable Airbnb listings in New York City. It suggests that very high prices are associated with lower occupancy rates, which points to a need for market-aligned pricing and improved listing quality to attract guests. Profitable neighborhoods like SoHo and Greenwich Village, characterized by proximity to attractions and cultural hubs, offer higher revenue potential, favoring properties with a good location, flexible accommodation, and appealing aesthetics.
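
The price and occupancy relationship read off the scatter plot can also be summarized numerically. The sketch below is one possible approach and runs on invented values. The choice of a Spearman correlation and a log-price fit is an assumption made here, not something the analysis above used.

```r
# Invented listings; values are chosen only to illustrate the calculation.
toy <- data.frame(
  price          = c(80, 120, 150, 200, 400, 900),
  occupancy_rate = c(0.95, 0.90, 0.80, 0.70, 0.40, 0.10)
)

# Spearman rank correlation is less sensitive to extreme prices than Pearson.
rho <- cor(toy$price, toy$occupancy_rate, method = "spearman")

# Linear fit of occupancy on log price, since prices are right-skewed.
fit <- lm(occupancy_rate ~ log(price), data = toy)

print(rho)
print(coef(fit))
```

A negative correlation describes an association only. It does not by itself show that lowering a price would raise occupancy for a given listing.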

Furthermore, the analysis reveals that “entire home/apartment” and “hotel room” listings generate the most revenue, indicating a strong market preference for privacy and full amenities. Conversely, shared rooms underperform but could cater to niche markets with unique selling points. The analysis recommends adjusting pricing, enhancing guest experience, focusing on positive reviews, and investing in properties suitable for high-demand room types to maximize profitability.

Strategies for Hosts and Investors

The following strategies offer tailored advice for Airbnb hosts and investors in New York City, drawing on insights from the visualized data. For existing hosts, the focus is on optimizing pricing with dynamic tools to match market demand, enhancing guest experiences with unique local offerings, and improving marketing through compelling visuals and detailed descriptions. These tactics aim to increase occupancy and revenue by making listings more attractive and competitive.

Prospective hosts are advised to target high-demand neighborhoods like SoHo or Greenwich Village, where average prices and demand are higher, and to invest in properties with features that align with guest preferences, such as ample natural light and spacious layouts. Real estate investors are encouraged to acquire properties in areas with established revenue potential and proximity to attractions, with a focus on multi-family units to maximize listing opportunities and overall returns within the lucrative NYC Airbnb market.

Conclusions

Based on the analysis, Airbnb profitability in New York City is significantly influenced by pricing strategies. Listings with rates aligned with market demand tend to achieve higher occupancy, demonstrating the importance of competitive and dynamic pricing. Moreover, underperforming listings can significantly boost their revenue through strategic improvements in amenities, enhanced guest experiences, and targeted marketing efforts. These factors highlight the necessity for hosts to actively manage and optimize their offerings to remain competitive.
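
One way to make "underperforming" concrete is to compare each listing with the median of its own area. The sketch below assumes an invented data frame and a hypothetical threshold of half the group median. Neither comes from the analysis above.

```r
# Invented listings; column names mirror the report, values do not come from the data.
toy <- data.frame(
  id                = 1:6,
  neighborhood      = c("Manhattan", "Manhattan", "Manhattan", "Queens", "Queens", "Queens"),
  estimated_revenue = c(5000, 4000, 1000, 2000, 1800, 600)
)

# Median revenue within each group, repeated for every row of that group.
toy$group_median <- ave(toy$estimated_revenue, toy$neighborhood, FUN = median)

# Hypothetical rule: flag listings earning less than half of their group's median.
toy$underperforming <- toy$estimated_revenue < 0.5 * toy$group_median

print(toy[toy$underperforming, c("id", "neighborhood", "estimated_revenue")])
```

Comparing within an area rather than across the whole city keeps a modest listing in a low-revenue borough from being flagged only because of where it is.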

Neighborhood selection plays a crucial role in Airbnb success. High-revenue areas like SoHo, Chinatown, and Greenwich Village, characterized by their proximity to cultural attractions and strong guest demand, present prime opportunities for hosts and investors. Prioritizing properties in these locations can capitalize on established profitability trends and ensure a steady stream of bookings. The data suggests that understanding the unique appeal of these neighborhoods is essential for maximizing returns.

Room type also impacts revenue generation. Entire homes/apartments and hotel rooms consistently achieve the highest revenue, making them the most lucrative categories for hosts and investors. While shared and private rooms underperform, they can be optimized by targeting niche markets, such as budget-conscious travelers or students, with unique offerings and focused marketing. This highlights the importance of understanding market segmentation and tailoring listings to specific guest needs.

Finally, strategic investment approaches are crucial for both potential hosts and investors. Hosts should focus on dynamic pricing, amenity enhancements, and leveraging unique selling points to attract bookings. Investors should prioritize acquiring properties in high-demand neighborhoods with features like spacious layouts and appealing aesthetics. Both parties should consider innovative approaches, such as offering personalized guest experiences or eco-friendly accommodations, to differentiate their listings and attract a wider range of guests. This comprehensive strategy ensures long-term profitability and success in the competitive NYC Airbnb market.
