Airbnb - Melbourne

Author

Tony Wang

Overview

This analysis explores the Melbourne Airbnb market using a dataset of 25,587 raw listings, of which roughly 15,600 remain after cleaning. It covers room-type mix, geographic density, distance from the Central Business District (CBD), pricing, availability, and booking requirements. After cleaning and validating the spatial data, the report shows where supply is concentrated and how prices and listing requirements shift as one moves from the urban core to the suburbs.

Key Findings:

  • CBD Dominance: Listing counts fall off sharply with distance from the city center; the 0-5km band holds the largest number of listings.

  • Short-term Flexibility: Despite Melbourne’s diverse housing market, the “weekend stay” (a minimum of 1 or 2 nights) remains the dominant requirement set by hosts.

  • Market Maturity: The high density of listings in specific postcodes suggests a mature, highly competitive short-term rental market in the inner city.

Load library
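
The loading code is not shown in the rendered report. Judging only from the functions called in later chunks (read_csv, glimpse, drop_na, ggplot, leaflet, distHaversine), a minimal sketch of this step might look like the following; the package list is an inference, not the author's confirmed setup.

```r
# Packages inferred from the functions used later in this report (an assumption,
# since the original loading code is not shown).
required_pkgs <- c("readr", "dplyr", "tidyr", "ggplot2", "leaflet", "geosphere")

# Check which packages are installed instead of failing half-way through.
is_installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required_pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Install first: ", paste(missing_pkgs, collapse = ", "))
} else {
  # Attach every package so %>%, ggplot(), leaflet() etc. are available.
  invisible(lapply(required_pkgs, library, character.only = TRUE))
}
```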

Load data

df <- readr::read_csv("Airbnb dataset - Melbourne.csv")
head(df)
# A tibble: 6 × 20
     id name        host_id host_name neighbourhood_group neighbourhood latitude
  <dbl> <chr>       <chr>   <chr>     <chr>               <chr>         <chr>   
1 10803 Room in De… 38901   Lindsay   <NA>                Moreland      -37.766…
2 12936 St Kilda 1… 50121   The A2C … <NA>                Port Phillip  -37.859…
3 41836 CLOSE TO C… 182833  Diana     <NA>                Darebin       -37.697…
4 43429 Tranquil J… 189684  Allan     <NA>                Monash        -37.899…
5 44082 Queen Room… 193031  Vicki     <NA>                Frankston     -38.147…
6 47100 Cosy, cute… 212071  Loren     <NA>                Yarra         -37.818…
# ℹ 13 more variables: longitude <chr>, room_type <chr>, price <chr>,
#   minimum_nights <chr>, number_of_reviews <chr>, last_review <chr>,
#   reviews_per_month <dbl>, calculated_host_listings_count <dbl>,
#   availability_365 <dbl>, number_of_reviews_ltm <dbl>, license <dbl>,
#   ...19 <lgl>, ...20 <lgl>
# Observe the structure of the dataframe
glimpse(df)
Rows: 25,587
Columns: 20
$ id                             <dbl> 10803, 12936, 41836, 43429, 44082, 4710…
$ name                           <chr> "Room in Deco Apartment, Brunswick East…
$ host_id                        <chr> "38901", "50121", "182833", "189684", "…
$ host_name                      <chr> "Lindsay", "The A2C Team", "Diana", "Al…
$ neighbourhood_group            <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ neighbourhood                  <chr> "Moreland", "Port Phillip", "Darebin", …
$ latitude                       <chr> "-37.76606", "-37.85999", "-37.69761", …
$ longitude                      <chr> "144.97951", "144.97662", "145.00066", …
$ room_type                      <chr> "Private room", "Entire home/apt", "Pri…
$ price                          <chr> "54", NA, NA, "128", "79", "116", "257"…
$ minimum_nights                 <chr> "5", "3", "7", "2", "5", "4", "2", "3",…
$ number_of_reviews              <chr> "204", "42", "157", "269", "65", "180",…
$ last_review                    <chr> "23/04/2025", "15/03/2020", "22/08/2018…
$ reviews_per_month              <dbl> 1.35, 0.23, 0.89, 1.52, 0.37, 1.00, 2.8…
$ calculated_host_listings_count <dbl> 1, 10, 2, 3, 8, 1, 1, 10, 1, 8, 1, 10, …
$ availability_365               <dbl> 148, 0, 0, 165, 127, 16, 193, 0, 364, 1…
$ number_of_reviews_ltm          <dbl> 14, 0, 0, 10, 6, 1, 52, 0, 23, 4, 12, 0…
$ license                        <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ ...19                          <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ ...20                          <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
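
The glimpse() output shows that latitude, longitude, price, minimum_nights and number_of_reviews were read as character columns. The sketch below, using made-up rows and a hypothetical helper named to_number, shows one way such columns could be converted so that stray text becomes NA rather than silently surviving as a string.

```r
# Toy rows mimicking the character columns seen in glimpse(); values are illustrative.
raw <- data.frame(
  latitude       = c("-37.76606", "-37.85999", "Private room"),
  price          = c("54", NA, "$1,200"),
  minimum_nights = c("5", "3", "2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: strip currency symbols and separators, then convert.
to_number <- function(x) suppressWarnings(as.numeric(gsub("[^0-9.-]", "", x)))

typed <- transform(
  raw,
  latitude       = suppressWarnings(as.numeric(latitude)),  # non-numeric text becomes NA
  price          = to_number(price),
  minimum_nights = as.integer(minimum_nights)
)
str(typed)
```

Converting once, right after loading, avoids repeating as.numeric() in every later chunk.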

Find duplicate records

# Find duplicate records
duplicates <- df %>% 
  filter(duplicated(.))

# Count how many duplicates there are
nrow(duplicates)
[1] 0
# Remove exact duplicates and save it back to 'df'
df <- df %>% 
  distinct()

# Check your new total count
total_rooms_clean <- nrow(df)
print(total_rooms_clean)
[1] 25587

There are no exact duplicate rows; the row count stays at 25,587.

Find records with NA

# Keep only rows where the essential fields are present
df <- df %>% 
  drop_na(price, neighbourhood, latitude, longitude, room_type,
          reviews_per_month, calculated_host_listings_count, availability_365)

# Now check the count again
nrow(df)
[1] 15702

After dropping rows with missing values in these columns, 15,702 of the original 25,587 rows remain.
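
Before dropping rows, it can help to see which columns account for the missing values. The following is a minimal base R sketch on a made-up data frame (the name listings and its values are illustrative); in the report the same calls would be applied to df before drop_na().

```r
# Toy data frame; in the report this would be the raw `df` before drop_na().
listings <- data.frame(
  price             = c("54", NA, NA, "128"),
  room_type         = c("Private room", "Entire home/apt", NA, "Private room"),
  reviews_per_month = c(1.35, 0.23, NA, NA)
)

# Number of missing values in each column, largest first.
na_per_column <- sort(colSums(is.na(listings)), decreasing = TRUE)
print(na_per_column)

# Rows that survive when every column must be present.
rows_kept <- sum(complete.cases(listings))
print(rows_kept)
```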

Rooms analysis

Total rooms

total_rooms <- nrow(df)
print(total_rooms)
[1] 15702

Room types

# Look at the unique room types
unique(df$room_type)
 [1] "Private room"    "Entire home/apt" "93"              "Hotel room"     
 [5] "Shared room"     "194"             "53"              "63"             
 [9] "51"              "147"             "514"             "139"            
[13] "808"             "122"             "80"              "90"             
[17] "107"             "188"             "195"             "73"             
[21] "62"              "56"              "114"             "99"             
[25] "270"             "126"             "54"              "109"            
[29] "174"             "141"             "134"             "50"             
[33] "115"             "138"             "137"             "120"            
[37] "112"             "220"             "117"             "160"            
[41] "618"             "128"             "169"             "102"            
[45] "45"              "60"              "97"              "32"             
[49] "125"             "168"             "118"             "960"            
[53] "113"             "357"             "41"              "47"             
[57] "44"              "39"              "55"              "86"             
[61] "394"             "178"             "260"             "145"            
[65] "43"              "64"              "71"              "156"            
[69] "267"             "111"             "100"             "79"             
[73] "123"             "548"             "88"              "69"             
[77] "105"             "87"              "74"              "94"             
[81] "186"             "61"              "205"             "573"            

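Only four of these values are genuine room types. The rest are numbers, which suggests that some rows are misaligned (for example, a value from another column shifted into room_type); these rows are excluded in the next step.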
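
One way to check that interpretation is to pull out the rows with an unexpected room_type and look at their neighbouring columns. This is a sketch on made-up rows; the shifted record is invented for illustration and is not taken from the dataset.

```r
valid_types <- c("Private room", "Entire home/apt", "Hotel room", "Shared room")

# Toy rows; the second row imitates a shifted record (values are made up).
listings <- data.frame(
  id        = c(1, 2, 3),
  longitude = c("144.97951", "Entire home/apt", "145.00066"),
  room_type = c("Private room", "93", "Shared room"),
  stringsAsFactors = FALSE
)

# Flag rows whose room_type is not one of the four expected labels.
is_valid <- listings$room_type %in% valid_types
print(table(is_valid))

# Inspect the flagged rows to see which neighbouring columns are also off.
suspect_rows <- listings[!is_valid, ]
print(suspect_rows)
```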

Number of rooms by room type

# 1. Define the only valid room types allowed
valid_types <- c("Private room", "Entire home/apt", "Hotel room", "Shared room")

# 2. Filter the dataframe
df_clean <- df %>%
  filter(room_type %in% valid_types)

# 3. Count listings per room type
room_type_counts <- df_clean %>%
  group_by(room_type) %>%
  summarise(num_rooms = n())

print(room_type_counts)
# A tibble: 4 × 2
  room_type       num_rooms
  <chr>               <int>
1 Entire home/apt     12505
2 Hotel room             27
3 Private room         2978
4 Shared room            83
# Plot the rooms by type
ggplot(room_type_counts, aes(x = room_type, y = num_rooms, fill = room_type)) +
  geom_col() +
  geom_text(aes(label = num_rooms), 
            size = 3, vjust = -1, check_overlap = TRUE) +
  labs(
    title = "Number of rooms by room types",
    x = "Room Type",
    y = "Number of rooms"
  )

Insight: Market Composition Dominance. The Melbourne Airbnb market is heavily weighted toward Entire home/apt listings, which account for about 80% of the cleaned supply (12,505 of 15,593 listings). In contrast, Private rooms make up a much smaller segment (2,978 listings, roughly 19%), while Shared rooms (83) and Hotel rooms (27) are nearly negligible.

Strategic Takeaway: This suggests that Melbourne is primarily a “self-contained” travel market. The high volume of entire homes is consistent with travelers seeking privacy and amenities suited to longer stays (like kitchens and laundry), typical of family vacations or business trips. For a new host, the “Private room” category represents a potential niche, while the “Entire home” category is clearly where supply, and therefore competition, is concentrated; listing counts alone do not measure consumer demand.
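
The shares quoted in this insight can be reproduced directly from the room-type counts printed earlier. A short sketch, with the counts typed in by hand from that table:

```r
# Counts copied from the room-type table in this report.
room_type_counts <- c(
  "Entire home/apt" = 12505,
  "Hotel room"      = 27,
  "Private room"    = 2978,
  "Shared room"     = 83
)

# Share of each room type as a percentage of all valid listings.
share_pct <- round(100 * room_type_counts / sum(room_type_counts), 1)
print(share_pct)
```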

Rooms by neighbourhood

# 1. Clean the data types
airbnb_clean <- df %>%
  mutate(
    # Remove any non-numeric characters from price (like $ or ,)
    price_num = as.numeric(gsub("[^0-9.]", "", price)),
    # Convert coordinates to numeric
    lat_num = as.numeric(latitude),
    long_num = as.numeric(longitude)
  ) %>%
  # Filter out rows where coordinates are missing or not numeric
  filter(!is.na(lat_num), !is.na(long_num))

# 2. Aggregate by Neighbourhood
map_summary <- airbnb_clean %>%
  group_by(neighbourhood) %>%
  summarize(
    num_rooms = n(),
    avg_price = mean(price_num, na.rm = TRUE),
    lat = mean(lat_num),
    long = mean(long_num)
  )

# 3. Plot again
ggplot(map_summary, aes(x = long, y = lat)) +
  geom_point(aes(size = num_rooms, color = avg_price), alpha = 0.7) +
  geom_text(aes(label = paste0(neighbourhood, "\n", num_rooms, " rms")), 
            size = 3, vjust = -1, check_overlap = TRUE) +
  scale_color_gradient(low = "blue", high = "red") +
  theme_minimal() +
  labs(title = "Melbourne Airbnb Distribution",
       size = "Room Count",
       color = "Avg Price ($)")

View on the interactive map

# 1. Clean data
map_summary <- df %>%
  mutate(
    price_num = as.numeric(gsub("[^0-9.]", "", price)),
    lat_num = as.numeric(latitude),
    long_num = as.numeric(longitude)
  ) %>%
  group_by(neighbourhood) %>%
  summarize(
    num_rooms = n(),
    avg_price = mean(price_num, na.rm = TRUE),
    lat = mean(lat_num, na.rm = TRUE),
    long = mean(long_num, na.rm = TRUE)
  ) %>%
  filter(!is.na(lat))

# 2. Create the Leaflet Map
leaflet(map_summary) %>%
  addTiles() %>%  # Adds the standard OpenStreetMap background
  addCircleMarkers(
    lng = ~long, lat = ~lat,
    radius = ~sqrt(num_rooms) * 2, # Scale bubble size
    color = ~ifelse(avg_price > 200, "red", "blue"), # Simple color logic
    stroke = FALSE, fillOpacity = 0.6,
    label = ~paste0(neighbourhood, ": ", num_rooms, " rooms, avg $", round(avg_price))
  )

Insight: The spatial visualization suggests a “hub-and-spoke” pattern. While the CBD is the densest cluster, secondary clusters are visible in coastal areas (e.g., around St Kilda in Port Phillip) and inner-northern suburbs (e.g., Fitzroy/Carlton). Because the data is aggregated by neighbourhood, each bubble represents a whole local government area rather than a single suburb. The map identifies “hot zones” where competition is highest, which can help potential investors identify underserved neighborhoods or high-traffic areas.

Distance from CBD

# 1. Convert columns to numeric and compute the distance from the CBD
airbnb_density <- df %>%
  mutate(
    lat_num = as.numeric(latitude),
    long_num = as.numeric(longitude),
    price_num = as.numeric(price),
    mini_nights = as.numeric(minimum_nights)
  ) %>%
  filter(!is.na(lat_num), !is.na(long_num), !is.na(price_num)) %>%
  mutate(
    # Haversine distance in km from the CBD reference point (lon 144.9631, lat -37.8136).
    # distHaversine() accepts a matrix of points, so rowwise() is not needed.
    dist_cbd = distHaversine(cbind(long_num, lat_num), c(144.9631, -37.8136)) / 1000
  )

# 2. Plot: Number of Rooms vs. Distance
ggplot(airbnb_density, aes(x = dist_cbd)) +
  # 1. The Histogram (boundary = 0 aligns the bins to 0-5, 5-10, ...)
  geom_histogram(binwidth = 5, boundary = 0, fill = "steelblue", color = "white") +
  # 2. The Text Labels (same bins as the histogram)
  stat_bin(binwidth = 5, boundary = 0, geom = "text", 
           aes(label = paste0(after_stat(count), 
                              " (", 
                              round(after_stat(count) / sum(after_stat(count)) * 100, 1), 
                              "%)")), 
           vjust = -0.5, 
           size = 3) +
  # 3. The Density Line, scaled by n * binwidth so it matches the bar heights
  geom_density(aes(y = after_stat(density) * nrow(airbnb_density) * 5), color = "red", linewidth = 1) +
  # 4. Labels and Titles
  labs(
    title = "Concentration of Rooms by Distance from Melbourne CBD",
    subtitle = paste("Analysis of", nrow(airbnb_density), "listings"),
    x = "Distance from CBD (km)",
    y = "Number of Rooms (Listings)"
  ) +
  # 5. The Window
  coord_cartesian(xlim = c(0, 40)) + 
  theme_minimal()

Insight: There is a sharp “spatial decay” in listing density as distance from the CBD increases. The 0–5km bracket contains the highest volume of listings, representing the core tourism and business hub. Beyond 15km, the supply drops significantly, indicating that the Melbourne Airbnb market is primarily an urban-centric phenomenon rather than a suburban or rural one. This concentration is consistent with proximity to public transport and city amenities being a major driver of host participation, although the data here does not test that directly.
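
For readers who want to see what the distance step computes, the following is a base R sketch of the haversine formula. The function name haversine_km is hypothetical, and the Earth radius of 6,378,137 m is assumed here to match the default used by geosphere's distHaversine. The example point is the first listing shown in the head(df) output.

```r
# Haversine great-circle distance in kilometres (base R, no packages needed).
# r is the Earth radius in metres; 6378137 is an assumed default.
haversine_km <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a))) / 1000
}

# CBD reference point used in the report, and the first listing from head(df).
cbd <- c(lon = 144.9631, lat = -37.8136)
dist_first <- haversine_km(144.97951, -37.76606, cbd[["lon"]], cbd[["lat"]])
print(round(dist_first, 2))
```

That listing sits a little over 5 km from the reference point, so it would fall in the 5-10km band.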

Price Analysis

Price summary

price_summary <- airbnb_clean %>%
  # Clean price to numeric
  mutate(price_num = as.numeric(gsub("[^0-9.]", "", as.character(price)))) %>%
  filter(!is.na(price_num)) %>%
  summarise(
    Min = min(price_num),
    Q1 = quantile(price_num, 0.25),
    Median = median(price_num),
    Mean = mean(price_num),
    Q3 = quantile(price_num, 0.75),
    Max = max(price_num),
    Total_Listings = n()
  )

print(price_summary)
# A tibble: 1 × 7
    Min    Q1 Median  Mean    Q3    Max Total_Listings
  <dbl> <dbl>  <dbl> <dbl> <dbl>  <dbl>          <int>
1    17   108    153  217.   224 118611          15593

Insight: The median price ($153) is well below the mean ($217), which indicates that the mean is pulled up by a small number of very expensive listings. The maximum ($118,611) is far above the third quartile ($224) and is clearly an outlier.
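
A common way to make "outlier" concrete is Tukey's rule, which flags values above Q3 + 1.5 × IQR. The sketch below applies it to the quartiles from the summary table; using this particular rule is an assumption of the sketch rather than something the report specifies.

```r
# Quartiles and maximum copied from the price summary table in this report.
q1 <- 108
q3 <- 224
max_price <- 118611

# Tukey's rule: values above Q3 + 1.5 * IQR are flagged as potential outliers.
iqr <- q3 - q1
upper_fence <- q3 + 1.5 * iqr
print(upper_fence)

# The maximum lies far beyond the fence, so it would be flagged.
print(max_price > upper_fence)
```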

Plot the price distribution by room types

# This compares prices across the different Room Types
ggplot(airbnb_clean, aes(x = room_type, y = price_num, fill = room_type)) +
  geom_boxplot() +
  coord_cartesian(ylim = c(0, 500)) +
  labs(title = "Price Distribution by Room Type", x = "Room Category")

Insight: Pricing Tiers and Market Volatility. This visualization reveals distinct pricing strata across Melbourne. While Entire homes/apartments have a higher median price (approx. $175–$200), the most striking feature is the high number of outliers (the black dots stretching to $500+). This indicates a luxury segment within the “Entire home” category that significantly exceeds standard market rates.

Key Observations:

  • Hotel Rooms: Though few in number (as seen in the previous plot), they command the highest median price, suggesting these are likely boutique or high-end establishments.

  • Private vs. Shared: Private rooms offer a highly stable, affordable tier (median ~$75), while Shared rooms represent the “budget floor” of the Melbourne market with the lowest price variance.

  • Market Premium: The “Interquartile Range” (the height of the boxes) is much larger for Entire homes than for Private rooms, showing that Entire home pricing is much more sensitive to factors like location, views, and luxury amenities.

Price by distance

# 1. Prepare the plotting groups
# We create 'dist_group' to turn the continuous distance into 5km categories
plot_data <- airbnb_density %>%
  mutate(dist_group = cut(dist_cbd, 
                          breaks = seq(0, 40, by = 5), 
                          include.lowest = TRUE,
                          labels = c("0-5km", "5-10km", "10-15km", "15-20km", 
                                     "20-25km", "25-30km", "30-35km", "35-40km"))) %>%
  filter(!is.na(dist_group))

# 2. Plot Mean Price against these Distance Groups
ggplot(plot_data, aes(x = dist_group, y = price_num)) +
  # Calculate the MEAN price for each bar
  stat_summary(fun = mean, geom = "col", fill = "steelblue", color = "white") +
  
  # Add the Dollar Label on top of each bar
  stat_summary(fun = mean, geom = "text", 
               aes(label = paste0("$", round(after_stat(y), 0))),
               vjust = -0.5, size = 4, fontface = "bold") +
  
  labs(
    title = "Average Airbnb Price by Distance from Melbourne CBD",
    subtitle = "Calculated using mean price per 5km distance bracket",
    x = "Distance from CBD",
    y = "Mean Price ($)"
  ) +
  # Adjust y-axis to give room for labels
  coord_cartesian(ylim = c(0, 500)) + 
  theme_minimal()

Insight: The average price is highest between 5km and 10km from the CBD and generally trends down further out. However, it picks up again between 30km and 40km.
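
Since the bands drive this comparison, it is worth checking how cut() assigns edge values. The sketch below reuses the report's breaks and labels on a few made-up distances.

```r
# Same breaks and labels as the report's dist_group definition.
breaks <- seq(0, 40, by = 5)
labels <- c("0-5km", "5-10km", "10-15km", "15-20km",
            "20-25km", "25-30km", "30-35km", "35-40km")

# Example distances in km (illustrative values, not from the dataset).
example_dist <- c(0, 5, 5.1, 40, 41)
dist_group <- cut(example_dist, breaks = breaks, labels = labels,
                  include.lowest = TRUE)
print(data.frame(example_dist, dist_group))
```

Intervals are closed on the right, so a listing at exactly 5 km lands in "0-5km", and anything beyond 40 km becomes NA and is dropped by the filter that follows.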

Neighbourhoods that are 35km to 40km from CBD

airbnb_density %>%
  filter(dist_cbd > 35 & dist_cbd <= 40) %>%
  group_by(neighbourhood) %>%
  summarise(
    count = n(),
    avg_price = mean(price_num, na.rm = TRUE)
  ) %>%
  arrange(desc(avg_price))
# A tibble: 8 × 3
  neighbourhood count avg_price
  <chr>         <int>     <dbl>
1 Cardinia          1      657 
2 Yarra Ranges    220      358.
3 Whittlesea        4      289 
4 Hume              7      253 
5 Casey            50      213.
6 Melton           43      201.
7 Nillumbik        10      198.
8 Frankston       106      190.

Insight: Luxury and Lifestyle Premiums in the Periphery. While listing volume decreases significantly in the 35km–40km range, pricing remains remarkably high in specific LGAs (Local Government Areas). Cardinia and Yarra Ranges lead the group with average prices of $657 and $358 respectively, although the Cardinia figure rests on a single listing.

Key Observations:

  • The “Lifestyle” Effect: The high average in the Yarra Ranges (despite a large count of 220 listings) suggests a high concentration of premium “lifestyle” properties or holiday rentals (e.g., vineyards, cottages) rather than standard residential apartments.

  • High-Volume Clusters: Frankston (106 listings) and Casey (50 listings) represent the primary suburban hubs at this distance. Their lower average prices (~$190–$213) reflect a more competitive, residential-style market compared to the Yarra Ranges.
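
To see how much each LGA contributes to the band's overall average, the table can be combined into a listing-weighted mean. The inputs below are the rounded figures printed in the table, so the result is approximate.

```r
# Counts and (rounded) average prices copied from the 35-40km table.
lga <- data.frame(
  neighbourhood = c("Cardinia", "Yarra Ranges", "Whittlesea", "Hume",
                    "Casey", "Melton", "Nillumbik", "Frankston"),
  count     = c(1, 220, 4, 7, 50, 43, 10, 106),
  avg_price = c(657, 358, 289, 253, 213, 201, 198, 190)
)

# Listing-weighted mean price for the band (approximate, since inputs are rounded).
band_mean <- weighted.mean(lga$avg_price, lga$count)

# Share of the band's listings contributed by each LGA.
lga$share_pct <- round(100 * lga$count / sum(lga$count), 1)

print(round(band_mean))
print(lga[order(-lga$share_pct), c("neighbourhood", "count", "share_pct")])
```

Yarra Ranges supplies about half of the listings in this band, so it largely determines the band's mean price, while Cardinia's single listing has almost no weight.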

Median vs Mean price

# Using the same plot_data we created earlier
ggplot(plot_data, aes(x = dist_group, y = price_num)) +
  # Change 'fun' to median
  stat_summary(fun = median, geom = "col", fill = "steelblue", color = "white") +
  
  # Update text label to show the Median
  stat_summary(fun = median, geom = "text", 
               aes(label = paste0("$", round(after_stat(y), 0))),
               vjust = -0.5, size = 4, fontface = "bold") +
  
  labs(
    title = "Median Airbnb Price by Distance from Melbourne CBD",
    subtitle = "Median is more robust against luxury outliers",
    x = "Distance from CBD",
    y = "Median Price (AUD)"
  ) +
  coord_cartesian(ylim = c(0, 400)) + 
  theme_minimal()

Insight: The median prices are lower than the mean prices across all the distance bands, which reflects the fact that the mean is distorted by a few high-priced listings. For investors, this suggests that the 20-25km zone is where competition for budget-conscious travelers is strongest, while the 35km+ zone looks more like a specialized, higher-priced market, possibly serving larger groups.

Availability

Distribution of Annual Availability

ggplot(airbnb_density, aes(x = availability_365)) +
  # 1. Histogram: Binning by 30 days (roughly one month)
  geom_histogram(binwidth = 30, fill = "steelblue", color = "white", boundary = 0) +
  
  # 2. Labels: Calculate count and percentage for each 30-day month block
  stat_bin(binwidth = 30, geom = "text", boundary = 0,
           aes(label = paste0(after_stat(count), 
                            " (", 
                            round(after_stat(count) / sum(after_stat(count)) * 100, 1), 
                            "%)")), 
           vjust = -0.5, 
           size = 3) +
  
  # 3. Density Line: Adjusted to match the count scale
  geom_density(aes(y = after_stat(density) * nrow(airbnb_density) * 30), 
               color = "red", linewidth = 1) +
  
  labs(
    title = "Distribution of Annual Availability",
    subtitle = paste("Analysis of", nrow(airbnb_density), "listings in Melbourne"),
    x = "Days Available (out of 365)",
    y = "Number of Listings"
  ) +
  # 4. Zoom: Set X-axis to the full year (0-365 days)
  coord_cartesian(xlim = c(0, 365)) + 
  theme_minimal()

Insight: The Bifurcation of the Melbourne Market. The distribution of annual availability (the number of days in the coming year on which the listing can be booked) reveals two distinct types of host behavior in Melbourne. Rather than a steady average, we see a “bimodal” distribution with significant peaks at both ends of the scale.

Key Observations:

  • The Professional Tier (300+ Days): The largest single group of listings (over 13%) is available for more than 300 days a year. This indicates a high volume of dedicated, full-time short-term rentals that are likely managed by professional agencies or investors rather than residents sharing their spare space.

  • The “Side-Hustle” Tier (60-90 Days): A secondary peak occurs around the 3-month mark. This likely represents residents who list their homes during major Melbourne events (like the Australian Open or Grand Prix) or during their own annual holidays.

  • The Mid-Range Gap: The “valleys” in the data suggest that few hosts are “casual” in the middle of the year. Most are either fully committed to the platform or only active during peak seasonal windows.

Strategic Takeaway: For the Melbourne market, this high concentration of 300+ day availability suggests a mature industry where Airbnb competes directly with the long-term rental market for housing stock.
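
The red line in the availability plot is drawn on the count scale by multiplying the density by the number of listings and the bin width. The base R sketch below shows why that scaling works, using hist() on made-up values; a kernel density curve will not match the bars exactly, but it is placed on the same scale by the same factor.

```r
# Illustrative availability values (days), not taken from the dataset.
availability <- c(0, 12, 45, 60, 75, 88, 150, 310, 330, 364)
binwidth <- 30

# Bin into 30-day blocks, as in the histogram.
h <- hist(availability, breaks = seq(0, 390, by = binwidth), plot = FALSE)

# A density integrates to 1, so density * n * binwidth is on the count scale.
rescaled <- h$density * length(availability) * binwidth
print(cbind(count = h$counts, rescaled = rescaled))
```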

Distribution of Minimum Night Requirements

ggplot(airbnb_density, aes(x = mini_nights)) +
  geom_histogram(binwidth = 1, fill = "steelblue", color = "white", boundary = 0.5) +
  
  # Label only the bars with more than 500 listings
  stat_bin(binwidth = 1, geom = "text", boundary = 0.5,
           aes(label = ifelse(after_stat(count) > 500, after_stat(count), "")), 
           vjust = -0.7, size = 3) +
  
  # Density line scaled to counts (density * n * binwidth, with binwidth = 1)
  geom_density(aes(y = after_stat(density) * nrow(airbnb_density)), 
               color = "darkred", linewidth = 1,
               adjust = 2) +
  
  labs(
    title = "Distribution of Minimum Night Requirements",
    x = "Minimum Nights",
    y = "Number of Listings"
  ) +
  # Zoom in on 0-10 nights, where most listings sit
  coord_cartesian(xlim = c(0, 10)) + 
  theme_minimal()

Insight: High-Velocity Short-Term Market. The Melbourne market is overwhelmingly dominated by high-turnover listings. Over 70% of all properties (approx. 11,400+ listings) require a minimum stay of only 1 or 2 nights. This reflects a market highly optimized for weekend tourism, event-goers (such as those attending the Australian Open or Grand Prix), and short-term business travelers.

Key Observations:

  • The “Weekend Anchor”: The almost identical volume of 1-night (~5,693) and 2-night (~5,721) requirements suggests that hosts are split between maximizing occupancy (1-night) and reducing cleaning/turnover costs (2-nights).

  • The 7-Night Threshold: A minor but visible spike at the 7-night mark likely indicates a specific segment of “weekly stays,” potentially catering to domestic relocations or longer-term project workers.

  • Market Accessibility: The rapid decay of listings requiring more than 3 nights shows that “medium-term” stays (4-10 nights) are a significantly smaller niche in the Greater Melbourne area.

Strategic Takeaway: For travelers, Melbourne offers extreme flexibility. For the city, this high volume of short-stay availability highlights the platform’s role as a direct competitor to traditional hotels rather than long-term residential leasing.

Suburbs where the minimum stay is more than 6 nights

airbnb_density %>%
  # minimum_nights is stored as text, so compare the numeric mini_nights column
  filter(mini_nights > 6) %>%
  group_by(neighbourhood) %>%
  summarise(
    count = n(),
    avg_price = mean(price_num, na.rm = TRUE)
  ) %>%
  arrange(desc(neighbourhood))
# A tibble: 30 × 3
   neighbourhood count avg_price
   <chr>         <int>     <dbl>
 1 Yarra Ranges      2      878.
 2 Yarra            23      214.
 3 Wyndham           7      155.
 4 Whittlesea        7      245.
 5 Whitehorse       10      144.
 6 Stonnington      44      447.
 7 Port Phillip     51      198.
 8 Nillumbik         1      198 
 9 Moreland         21      195.
10 Moonee Valley     9      105 
# ℹ 20 more rows
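
Because minimum_nights was read as a character column (see the glimpse() output), numeric thresholds are safest applied after conversion, for example to the mini_nights column created earlier. The sketch below uses made-up values to show how comparing text with a number differs from a numeric comparison.

```r
# Illustrative values stored as text, as minimum_nights is in the raw data.
min_nights_chr <- c("7", "10", "30", "2")

# Comparing text with a number converts the number to "6" and compares strings,
# so "10" and "30" sort before "6" and are missed.
as_text <- min_nights_chr > 6

# Converting to numeric first gives the intended comparison.
as_number <- as.numeric(min_nights_chr) > 6

print(data.frame(min_nights_chr, as_text, as_number))
```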

Distribution of Host Listings

# 1. Clean the data
airbnb_density <- airbnb_density %>%
  mutate(calculated_host_listings_count = as.numeric(as.character(calculated_host_listings_count))) %>%
  filter(!is.na(calculated_host_listings_count))

# 2. Plot the Distribution
ggplot(airbnb_density, aes(x = calculated_host_listings_count)) +
  # Use binwidth of 1 to see individual listing counts (1, 2, 3...)
  geom_histogram(binwidth = 1, fill = "steelblue", color = "white", boundary = 0.5) +
  
  # 3. Add Labels (Only for bars with > 10 listings to avoid clutter)
  stat_bin(binwidth = 1, geom = "text", boundary = 0.5,
           aes(label = ifelse(after_stat(count) > 10, 
                              paste0(after_stat(count), "\n(", 
                                     round(after_stat(count) / sum(after_stat(count)) * 100, 1), "%)"), 
                              "")), 
           vjust = -0.5, size = 3, fontface = "bold") +
  
  # 4. Add Density Line (Scaled to match count)
  geom_density(aes(y = after_stat(density) * nrow(airbnb_density)), 
               color = "darkred", linewidth = 1,
               adjust = 1.5) +
  
  labs(
    title = "Distribution of Host Listings",
    subtitle = paste("Analysis of", nrow(airbnb_density), "listings in Melbourne"),
    x = "Number of Listings Managed by the Host",
    y = "Frequency (Number of Listings)"
  ) +
  
  # 5. The Zoom: Focus on the 0-15 range where most hosts sit
  coord_cartesian(xlim = c(0, 15)) + 
  
  theme_minimal() +
  theme(
    panel.grid.minor = element_blank(),
    plot.title = element_text(face = "bold", size = 14)
  )

Insight: The Long Tail of Professional Management. This distribution highlights a significant concentration of property ownership within the Melbourne Airbnb ecosystem. While the largest single group consists of “single-listing” hosts (those with just 1 property), a substantial portion of the market is controlled by “Multi-Unit Hosts” who manage dozens of properties simultaneously.

Key Observations:

  • The Individual Host (1 Listing): This remains the largest segment by frequency, representing the traditional “sharing economy” model.

  • The Professional Pivot: As we move along the X-axis, we see a “long tail” of hosts managing 10, 20, or even 30+ listings. These are likely professional real estate management firms or dedicated short-term rental agencies.

  • Market Influence: While individual hosts are more numerous, professional hosts with large portfolios often command a disproportionate share of the total available nights and revenue due to higher availability and optimized pricing strategies.

Strategic Takeaway: For a property manager or investor, this chart shows a highly “institutionalized” market. Entering the Melbourne market today means competing not just with individuals, but with sophisticated operators who benefit from economies of scale in cleaning, maintenance, and guest communication.
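
One caveat when reading the chart: the histogram counts listings, so a host with k listings contributes k rows to the bar at k. To count hosts instead, the data would need one row per host_id. A small sketch on made-up rows illustrates the difference.

```r
# Toy listings table: one row per listing, values are made up.
listings <- data.frame(
  host_id = c("a", "b", "b", "c", "c", "c"),
  calculated_host_listings_count = c(1, 2, 2, 3, 3, 3)
)

# Listing-level view (what the histogram shows): a host with k listings is counted k times.
listings_by_portfolio <- table(listings$calculated_host_listings_count)

# Host-level view: keep one row per host before counting.
hosts <- listings[!duplicated(listings$host_id), ]
hosts_by_portfolio <- table(hosts$calculated_host_listings_count)

print(listings_by_portfolio)
print(hosts_by_portfolio)
```

In the toy data the listing-level counts are 1, 2 and 3, while the host-level counts are 1 each, which shows how large portfolios are amplified in a listing-level histogram.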

Monthly Reviews

# 1. Clean the data
airbnb_density <- airbnb_density %>%
  mutate(reviews_per_month = as.numeric(reviews_per_month)) %>%
  filter(!is.na(reviews_per_month))

# 2. Plot the Distribution
ggplot(airbnb_density, aes(x = reviews_per_month)) +
  # Bins of width 1 aligned to whole numbers (0-1, 1-2, ...)
  geom_histogram(binwidth = 1, fill = "steelblue", color = "white", boundary = 0) +
  # 3. Add Density Line (Scaled to match count)
  geom_density(aes(y = after_stat(density) * nrow(airbnb_density)), 
               color = "red", linewidth = 1) +
  # 4. Labels: count and percentage per bin (same bins as the histogram)
  stat_bin(binwidth = 1, geom = "text", boundary = 0,
           aes(label = paste0(after_stat(count), 
                            " (", 
                            round(after_stat(count) / sum(after_stat(count)) * 100, 1), 
                            "%)")), 
           vjust = -3,
           hjust = 1.25,
           size = 3) +
  
  labs(
    title = "Monthly Review Distribution",
    subtitle = paste("Analysis of", nrow(airbnb_density), "listings in Melbourne"),
    x = "Number of Reviews per Month",
    y = "Frequency (Number of Listings)"
  ) +
  
  # 5. The Zoom: Focus on the 0-10 range where most listings sit
  coord_cartesian(xlim = c(0, 10)) + 
  
  theme_minimal() +
  theme(
    panel.grid.minor = element_blank(),
    plot.title = element_text(face = "bold", size = 14)
  )

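Insight: Most listings receive fewer than 3 reviews per month. Because this figure depends on both how often a listing is booked and how often guests choose to leave a review, it may indicate modest booking turnover, low review rates (with guests tending to review only when strongly satisfied or upset), or both; the data here cannot separate the two.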