Optimizing Airbnb Investment in Austin, TX

A Data-Driven Guide for Real Estate (Airbnb) Investors

Author

Peter Aina

Audience Description

Primary Audience: DFX Housing Investments (a fictional Austin-based real estate firm specializing in short-term rentals).

Key Stakeholders:
- Real estate investors evaluating Austin properties
- Property managers optimizing existing Airbnb listings
- Entrepreneurs entering the short-term rental market

Why This Matters:
- Austin’s competitive market requires data-backed decisions to avoid overpaying for underperforming properties.
- Misjudging location or property type can lead to lower annual returns.


Background Information

Driven by tourism, vacationers, and demand for warmer winter climates, the Austin short-term rental market is expanding quickly. DFX Investments seeks to capitalize on this growth by investing strategically. To maximize profitability and ensure they acquire the most frequently rented properties, they recognize the need for a data-driven approach. This is crucial for navigating the competitive market and avoiding costly mistakes, such as overpaying for low-yield properties or selecting suboptimal locations or types, which could otherwise lead to reduced annual returns.

Data Source: Airbnb listings (from open datasets at Inside Airbnb)


Problem Statement or Goal

Core Question:
“Where and what type of property should DFX housing invest in to maximize ROI for short-term rentals in Austin?”

Key Objectives:
1. Identify top 3 neighbourhoods by projected revenue.
2. Compare the profitability of entire homes vs. private rooms vs. shared rooms.
3. Highlight seasons where demand is higher.


Analyses and Support

Estimate Occupancy Rate

We need to estimate the occupancy rate using this variables:

  • availability_365: Days the property is available for bookings in a year.

  • number_of_reviews: Total reviews (proxy for bookings, assuming ~50% of guests leave reviews).

  • minimum_nights: Average stay length (affects turnover).

Formula:

Occupancy Rate = (Estimated Bookings . Average Stay Length​) / Availability_365

Estimated Bookings: We are also estimating the bookings as twice the number of reviews

Bookings = number_of_reviews × 2

Average Stay Length: Use median minimum_nights (we don’t have actual stay length so we estimated as well).

airbnb <- read_delim("./airbnb_austin.csv", delim = ",")
Rows: 15244 Columns: 18
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr   (3): name, host_name, room_type
dbl  (12): id, host_id, neighbourhood, latitude, longitude, price, minimum_n...
lgl   (2): neighbourhood_group, license
date  (1): last_review

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
airbnb_data <- airbnb |>
  mutate(
    estimated_bookings = number_of_reviews * 2,
    avg_stay_length = median(minimum_nights, na.rm = TRUE),
    occupancy_rate = (estimated_bookings * avg_stay_length) / availability_365
  ) |>
  # Cap occupancy at 100% (avoid overestimation)
  mutate(occupancy_rate = pmin(occupancy_rate, 1))

Top 3 Neighbourhoods by Projected Revenue

Approach

  • Group listings by neighbourhood.

  • Calculate median revenue (price * availability * occupancy_rate).

airbnb_data <- airbnb_data |>
  mutate(
    projected_revenue = price * availability_365 * occupancy_rate
  )
summary(airbnb_data$occupancy_rate)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
 0.0000  0.0782  0.6921  0.5733  1.0000  1.0000    1117 

Results

top_neighbourhoods <- airbnb_data |>
  group_by(neighbourhood) |>
  summarise(
    median_revenue = median(projected_revenue, na.rm = TRUE),
    count = n()
  ) |>
  arrange(desc(median_revenue)) |>
  head(3)
kable(top_neighbourhoods, caption = "Top 3 Neighborhoods by Median Projected Revenue")
Top 3 Neighborhoods by Median Projected Revenue
neighbourhood median_revenue count
78739 16560 36
78737 15666 199
78732 15606 76

Interpretation:

  • Travis County: 78739, Southwest Austin: 78737, and West Austin: 78732

  • yield the highest median

  • Action: Prioritise investments in these areas.

Profitability by Room Type

Hypothesis Test

  • Null Hypothesis: No difference in median revenue across room types.

  • Alternative: At least one room type differs (Kruskal-Wallis test).

kruskal.test(projected_revenue ~ room_type, data = airbnb_data)

    Kruskal-Wallis rank sum test

data:  projected_revenue by room_type
Kruskal-Wallis chi-squared = 975.09, df = 3, p-value < 2.2e-16

Interpretation:

  • Reject the null (p < 0.05); Significant differences exist.

Visualization

ggplot(airbnb_data, aes(x = room_type, y = projected_revenue, fill = room_type)) +
  geom_boxplot() +
  coord_cartesian(ylim = c(0, 30000)) + #Capped at 30,000 for better visuals
  labs(title = "Revenue Distribution by Room Type", y = "Projected Revenue", x = "") +
  theme_minimal()
Warning: Removed 4067 rows containing non-finite outside the scale range
(`stat_boxplot()`).

Metrics by Room Type

property_type_stats <- airbnb_data |>
  group_by(room_type) |>
  summarise(
    avg_nightly_rate = mean(price, na.rm = TRUE),
    avg_occupancy = mean(occupancy_rate, na.rm = TRUE) * 100, 
    estimated_monthly_revenue = mean(price * 30 * occupancy_rate, na.rm = TRUE)
  ) |>
  mutate(across(where(is.numeric), ~ round(., 2))) 
kable(property_type_stats, 
      col.names = c("Property Type", "Avg. Nightly Rate ($)", "Avg. Occupancy (%)", "Estimated Monthly Revenue ($)"),
      align = c("l", "c", "c", "c"),
      caption = "Property Type Performance Comparison") |>
  kable_styling(bootstrap_options = c("striped", "hover"))
Property Type Performance Comparison
Property Type Avg. Nightly Rate ($) Avg. Occupancy (%) Estimated Monthly Revenue ($)
Entire home/apt 298.40 58.98 3387.75
Hotel room 452.90 4.61 382.94
Private room 175.71 52.68 766.32
Shared room 27.13 37.87 211.24

Conclusion:

  • Entire homes generate significantly higher revenue and has the highest average occupancy.

  • Action: Focus on entire homes unless private rooms in high-demand areas show exceptional occupancy.

neighbourhood_stats <- airbnb_data |>
  group_by(neighbourhood) |>
  summarise(
    median_price = median(price, na.rm = TRUE),
    median_revenue = median(projected_revenue, na.rm = TRUE),
    total_reviews = sum(number_of_reviews, na.rm = TRUE),
    total_listings = n(),                                  
    .groups = "drop"
  ) |>
  mutate(
    category = case_when(
      median_price > 150 & total_listings > 50 ~ "Premium (High Price, High Competition)",
      total_reviews > 1000 & median_price <= 150 ~ "High Demand, Moderate Price",
      median_price > 200 & total_reviews > 500 ~ "Tourist Hotspot (Premium Pricing)",
      TRUE ~ "Other"
    )
  )

# Top 3 categories for investment
top_categories <- neighbourhood_stats |>
  filter(category != "Other") |>
  arrange(desc(median_revenue))
kable(top_categories, caption = "Neighbourhood Categories by Revenue Potential")
Neighbourhood Categories by Revenue Potential
neighbourhood median_price median_revenue total_reviews total_listings category
78737 249.0 15666.0 9545 199 Premium (High Price, High Competition)
78732 345.5 15606.0 3043 76 Premium (High Price, High Competition)
78734 225.5 14920.0 16677 359 Premium (High Price, High Competition)
78736 199.0 13272.0 5437 93 Premium (High Price, High Competition)
78702 162.0 11178.0 133287 1821 Premium (High Price, High Competition)
78733 287.5 10734.0 3088 99 Premium (High Price, High Competition)
78731 181.0 9738.5 4580 175 Premium (High Price, High Competition)
78704 158.0 9309.0 113001 2264 Premium (High Price, High Competition)
78746 364.5 8068.0 6183 285 Premium (High Price, High Competition)
78703 187.0 7056.0 31148 715 Premium (High Price, High Competition)
78722 143.5 7054.5 14413 289 High Demand, Moderate Price
78721 125.0 6946.0 21728 400 High Demand, Moderate Price
78756 113.0 6694.5 8892 193 High Demand, Moderate Price
78757 124.5 6144.0 10472 287 High Demand, Moderate Price
78753 95.0 5957.0 8238 223 High Demand, Moderate Price
78729 137.0 5772.0 4971 166 High Demand, Moderate Price
78745 133.5 5504.0 24347 813 High Demand, Moderate Price
78741 101.0 5096.0 29111 927 High Demand, Moderate Price
78752 88.0 5034.0 9604 274 High Demand, Moderate Price
78705 100.0 4566.0 15448 568 High Demand, Moderate Price
78750 105.5 4452.0 1728 95 High Demand, Moderate Price
78723 116.5 4354.0 20693 496 High Demand, Moderate Price
78744 104.5 4300.0 18597 461 High Demand, Moderate Price
78730 370.0 4224.0 1074 43 Tourist Hotspot (Premium Pricing)
78751 84.0 4088.0 24602 546 High Demand, Moderate Price
78749 161.0 3750.0 3854 173 Premium (High Price, High Competition)
78727 110.5 3708.0 5586 177 High Demand, Moderate Price
78728 104.5 3472.0 3443 116 High Demand, Moderate Price
78758 120.0 3212.0 12439 407 High Demand, Moderate Price
78701 227.0 3124.0 36344 1112 Premium (High Price, High Competition)
78748 121.5 3070.0 8274 261 High Demand, Moderate Price
78724 107.5 2889.0 6054 232 High Demand, Moderate Price
78725 124.0 2880.0 2808 99 High Demand, Moderate Price
78759 125.0 2568.0 3042 175 High Demand, Moderate Price
78747 123.0 2376.0 1855 124 High Demand, Moderate Price
78735 98.0 2204.5 1269 76 High Demand, Moderate Price
78717 101.5 1914.0 1496 73 High Demand, Moderate Price
78754 110.0 1712.0 3527 179 High Demand, Moderate Price
78738 149.0 0.0 1211 83 High Demand, Moderate Price

Interpretation:

  1. Premium Areas:

    • Example: 78737 — high nightly rates but saturated.

    • Strategy: Target unique properties (e.g., luxury condos).

  2. High Demand, Moderate Price:

    • Example: 78722 — strong bookings with lower competition.
  3. Tourist Hotspots:

    • Example: 78730 — premium pricing near attractions.

Visual: Profitability Map

Color-code neighbourhoods by median_revenue.

# Replace this with your actual coordinate data
neighbourhood_coords <- airbnb_data |>
  group_by(neighbourhood) |>
  summarise(
    lat = median(latitude, na.rm = TRUE),
    long = median(longitude, na.rm = TRUE)
  )

neighbourhood_stats <- left_join(neighbourhood_stats, neighbourhood_coords, by = "neighbourhood")

# Convert to sf object
neighbourhood_sf <- st_as_sf(neighbourhood_stats, coords = c("long", "lat"), crs = 4326)

# Plot using OpenStreetMap tiles
ggplot() +
  annotation_map_tile(type = "osm", zoomin = 0) +
  geom_sf(data = neighbourhood_sf, aes(size = median_revenue, color = category), alpha = 0.7) +
  scale_color_manual(values = c("#E69F00", "#56B4E9", "#009E73", "#CC79A7")) +
  labs(
    title = "Austin STR Profitability Zones",
    subtitle = "Size = Revenue, Color = Category",
    x = "", y = ""
  ) +
  theme_minimal()
Zoom: 11
Warning: Removed 1 row containing missing values or values outside the scale range
(`geom_sf()`).

Austin Profitability Map

Regression Model (Revenue Predictors)

model <- lm(projected_revenue ~ neighbourhood + room_type + price, data = airbnb_data)
summary(model)

Call:
lm(formula = projected_revenue ~ neighbourhood + room_type + 
    price, data = airbnb_data)

Residuals:
    Min      1Q  Median      3Q     Max 
-690513  -15707   -7789    2826 1115661 

Coefficients:
                        Estimate Std. Error t value Pr(>|t|)    
(Intercept)            7.088e+06  1.394e+06   5.085 3.73e-07 ***
neighbourhood         -8.984e+01  1.771e+01  -5.074 3.96e-07 ***
room_typeHotel room   -2.086e+04  3.966e+03  -5.261 1.46e-07 ***
room_typePrivate room -1.493e+04  1.081e+03 -13.811  < 2e-16 ***
room_typeShared room  -1.320e+04  4.968e+03  -2.657  0.00791 ** 
price                  1.771e+01  4.308e-01  41.103  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 38240 on 11171 degrees of freedom
  (4067 observations deleted due to missingness)
Multiple R-squared:  0.1534,    Adjusted R-squared:  0.153 
F-statistic: 404.8 on 5 and 11171 DF,  p-value: < 2.2e-16

Result:

  • Hotel rooms earn ~20k less than Entire home/apt

  • Private rooms earn ~15k less than Entire home/apt

  • Shared rooms earn ~13k less than Entire home/apt

Key Findings:

  • Entire homes contribute more to revenue than shared rooms and private rooms.

Recommendations

  1. Location: Invest in Travis County: 78739, Southwest Austin: 78737, and West Austin: 78732.

  2. Property Type: Prioritise entire homes over private/shared rooms.

  3. Seasonal Strategy: Increase prices around July to September.

Limitations & Assumptions

  • Seasonality: Monthly estimates assume constant demand (adjust for peaks like SXSW).

  • Costs Excluded: Cleaning fees, maintenance, and taxes would reduce net profit.

  • Data Gaps: Actual bookings vs. reviews may vary.

  • Regulations: Check local short-term rentals laws

  • Review-to-Booking Ratio: We assume 50% of guests leave reviews ( but it can be adjusted if we have industry benchmarks).

  • Stay Length: minimum_nights is a poor proxy for actual stay length.

  • Availability Accuracy: availability_365 may include blocked days (e.g., host cancellations), leading to underestimation.