Why This Matters:
- Austin’s competitive market requires data-backed decisions to avoid overpaying for underperforming properties.
- Misjudging location or property type can lead to lower annual returns.
Background Information
Driven by tourism, vacationers, and demand for warmer winter climates, the Austin short-term rental market is expanding quickly. DFX Investments seeks to capitalize on this growth by investing strategically. To maximize profitability and ensure they acquire the most frequently rented properties, they recognize the need for a data-driven approach. This is crucial for navigating the competitive market and avoiding costly mistakes, such as overpaying for low-yield properties or selecting suboptimal locations or types, which could otherwise lead to reduced annual returns.
Data Source: Airbnb listings (from open datasets at Inside Airbnb)
Problem Statement or Goal
Core Question: “Where and what type of property should DFX housing invest in to maximize ROI for short-term rentals in Austin?”
Key Objectives:
1. Identify top 3 neighbourhoods by projected revenue.
2. Compare the profitability of entire homes vs. private rooms vs. shared rooms.
3. Highlight seasons where demand is higher.
Analyses and Support
Estimate Occupancy Rate
We need to estimate the occupancy rate using these variables:
availability_365: Days the property is available for bookings in a year.
number_of_reviews: Total reviews (proxy for bookings, assuming ~50% of guests leave reviews).
minimum_nights: Minimum stay per booking, used as a proxy for average stay length (affects turnover).
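The report does not show the exact formula behind its occupancy and revenue estimates, so the following is only a minimal sketch of one plausible calculation from the variables above. The toy values, the 365-night cap, and the column names est_bookings, est_nights, and occupancy_rate are assumptions made for illustration; availability_365 is left out because its role in the original calculation is not stated.

```r
# Toy listings; the values are made up for illustration only.
listings <- data.frame(
  price             = c(150, 80, 300),
  number_of_reviews = c(40, 10, 0),
  minimum_nights    = c(2, 1, 3)
)

review_rate <- 0.5  # assumption from the text: ~50% of guests leave a review

# Each review is taken to stand for 1 / review_rate bookings.
listings$est_bookings <- listings$number_of_reviews / review_rate

# Booked nights, using minimum_nights as the stay-length proxy.
# Capped at 365 (an assumption) so no listing exceeds a full year.
listings$est_nights <- pmin(listings$est_bookings * listings$minimum_nights, 365)

listings$occupancy_rate    <- listings$est_nights / 365
listings$projected_revenue <- listings$price * listings$est_nights

listings
```

Because number_of_reviews is a lifetime total rather than a yearly count, a calculation like this tends to overstate annual figures for older listings.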
Rows: 15244 Columns: 18
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): name, host_name, room_type
dbl (12): id, host_id, neighbourhood, latitude, longitude, price, minimum_n...
lgl (2): neighbourhood_group, license
date (1): last_review
kable(top_neighbourhoods, caption ="Top 3 Neighborhoods by Median Projected Revenue")
Top 3 Neighborhoods by Median Projected Revenue
neighbourhood   median_revenue   count
78739           16560            36
78737           15666            199
78732           15606            76
Interpretation:
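Travis County: 78739, Southwest Austin: 78737, and West Austin: 78732 yield the highest median projected revenue. Note that the 78739 figure rests on only 36 listings, so it is less stable than the other two.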
Action: Prioritise investments in these areas.
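The code that builds top_neighbourhoods is not shown in the report. As a rough sketch, a ranking of this kind could be produced as follows; the toy data and the use of base R are assumptions, and the same steps can be written with dplyr's group_by() and summarise() as used elsewhere in this analysis.

```r
# Toy data; zip codes and revenues are made up for illustration.
toy <- data.frame(
  neighbourhood     = c(78701, 78701, 78702, 78702, 78703, 78703, 78704, 78704),
  projected_revenue = c(9000, 11000, 15000, 17000, 4000, 6000, 12000, 14000)
)

# Median revenue and listing count per neighbourhood.
med <- tapply(toy$projected_revenue, toy$neighbourhood, median, na.rm = TRUE)
n   <- tapply(toy$projected_revenue, toy$neighbourhood, length)

neighbourhood_stats <- data.frame(
  neighbourhood  = names(med),
  median_revenue = as.numeric(med),
  count          = as.integer(n)
)

# Sort by median revenue, highest first, and keep the top 3.
ranked <- neighbourhood_stats[order(-neighbourhood_stats$median_revenue), ]
top_neighbourhoods <- head(ranked, 3)
top_neighbourhoods
```

Keeping the count column next to the median makes it easier to spot rankings that depend on a small number of listings.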
Profitability by Room Type
Hypothesis Test
Null Hypothesis: No difference in median revenue across room types.
Alternative: At least one room type differs (Kruskal-Wallis test).
kruskal.test(projected_revenue ~ room_type, data = airbnb_data)
Kruskal-Wallis rank sum test
data: projected_revenue by room_type
Kruskal-Wallis chi-squared = 975.09, df = 3, p-value < 2.2e-16
Interpretation:
Reject the null hypothesis (p < 0.05): median projected revenue differs significantly across room types.
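The Kruskal-Wallis test only indicates that at least one room type differs; it does not say which ones. A possible follow-up, not part of the original analysis, is a set of pairwise Wilcoxon rank-sum tests with a multiple-comparison correction. The sketch below uses simulated data and Holm adjustment, both chosen here purely for illustration.

```r
set.seed(1)

# Simulated revenues for three room types (illustrative only).
toy <- data.frame(
  room_type = rep(c("Entire home/apt", "Private room", "Shared room"), each = 30),
  projected_revenue = c(
    rlnorm(30, meanlog = 10),
    rlnorm(30, meanlog = 9),
    rlnorm(30, meanlog = 8.5)
  )
)

# Pairwise Wilcoxon rank-sum tests with Holm-adjusted p-values.
posthoc <- pairwise.wilcox.test(
  toy$projected_revenue,
  toy$room_type,
  p.adjust.method = "holm"
)

posthoc$p.value  # lower-triangular matrix of adjusted p-values
```

On the real data the same call would take airbnb_data$projected_revenue and airbnb_data$room_type as its first two arguments.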
Visualization
ggplot(airbnb_data, aes(x = room_type, y = projected_revenue, fill = room_type)) +
  geom_boxplot() +
  coord_cartesian(ylim = c(0, 30000)) +  # Capped at 30,000 for better visuals
  labs(title = "Revenue Distribution by Room Type", y = "Projected Revenue", x = "") +
  theme_minimal()
Warning: Removed 4067 rows containing non-finite outside the scale range
(`stat_boxplot()`).
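The warning above is not caused by the 30,000 cap: coord_cartesian() only zooms the view and does not drop data. The 4067 removed rows are listings whose projected_revenue is NA or otherwise non-finite, the same count that the regression later reports as deleted due to missingness. A small sketch of how such rows can be counted before plotting, using a made-up vector:

```r
# Made-up revenue values, including the kinds that ggplot2 drops.
revenue <- c(1200, NA, 5400, Inf, 0, NaN)

# is.finite() is FALSE for NA, NaN, Inf and -Inf.
n_dropped <- sum(!is.finite(revenue))
n_dropped

# Keep only the values that can actually be plotted or modelled.
revenue_clean <- revenue[is.finite(revenue)]
revenue_clean
```

Applied to the real data, sum(!is.finite(airbnb_data$projected_revenue)) would be expected to match the number in the warning.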
Example: 78722 — strong bookings with lower competition.
Tourist Hotspots:
Example: 78730 — premium pricing near attractions.
Visual: Profitability Map
Color-code neighbourhoods by median_revenue.
# Replace this with your actual coordinate data
neighbourhood_coords <- airbnb_data |>
  group_by(neighbourhood) |>
  summarise(
    lat = median(latitude, na.rm = TRUE),
    long = median(longitude, na.rm = TRUE)
  )

neighbourhood_stats <- left_join(neighbourhood_stats, neighbourhood_coords, by = "neighbourhood")

# Convert to sf object
neighbourhood_sf <- st_as_sf(neighbourhood_stats, coords = c("long", "lat"), crs = 4326)

# Plot using OpenStreetMap tiles
ggplot() +
  annotation_map_tile(type = "osm", zoomin = 0) +
  geom_sf(data = neighbourhood_sf, aes(size = median_revenue, color = category), alpha = 0.7) +
  scale_color_manual(values = c("#E69F00", "#56B4E9", "#009E73", "#CC79A7")) +
  labs(
    title = "Austin STR Profitability Zones",
    subtitle = "Size = Revenue, Color = Category",
    x = "", y = ""
  ) +
  theme_minimal()
Zoom: 11
Warning: Removed 1 row containing missing values or values outside the scale range
(`geom_sf()`).
Austin Profitability Map
Seasonal Demand Trends
Approach
Use review dates as a proxy for booking activity.
Aggregate reviews by month.
airbnb_ <- airbnb |>
  mutate(
    last_review_date = as.Date(last_review, format = "%d/%m/%Y"),
    review_month = format(last_review_date, "%m")
  )

# Plot monthly review frequency
monthly_reviews <- airbnb_ |>
  filter(!is.na(review_month)) |>
  group_by(review_month) |>
  summarise(review_count = n())

ggplot(monthly_reviews, aes(x = review_month, y = review_count)) +
  geom_col(fill = "steelblue") +
  labs(title = "Monthly Booking Activity (Reviews as Proxy)", x = "Month", y = "Review Count") +
  theme_minimal()
Interpretation:
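Review activity peaks between July and September. March through May is a second high season and includes SXSW, which is held in March; ACL follows in October, just after the summer peak. Favourable weather likely contributes to demand in these periods. Because last_review records only each listing's most recent review, these counts reflect recent activity rather than a full booking history.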
Action: Adjust pricing dynamically during high-demand periods.
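One simple way to turn monthly review counts into a pricing rule is a demand index relative to the average month. The sketch below is hypothetical: the review counts, the base price of 150, and the limits of -20% and +30% are all made-up values, and the rule itself is an illustration rather than a method used in this analysis.

```r
# Made-up monthly review counts, shaped like the monthly_reviews table above.
monthly_reviews <- data.frame(
  review_month = sprintf("%02d", 1:12),
  review_count = c(40, 45, 90, 85, 80, 60, 110, 120, 115, 70, 50, 35)
)

# Demand index: 1 corresponds to an average month.
monthly_reviews$demand_index <-
  monthly_reviews$review_count / mean(monthly_reviews$review_count)

# Hypothetical rule: scale a base price by the index,
# limited to between 0.8x and 1.3x of the base price.
base_price <- 150
monthly_reviews$multiplier <- pmin(pmax(monthly_reviews$demand_index, 0.8), 1.3)
monthly_reviews$suggested_price <- round(base_price * monthly_reviews$multiplier)

monthly_reviews
```

Clamping the multiplier keeps a single unusually busy or quiet month from producing extreme prices.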
Regression Model (Revenue Predictors)
model <- lm(projected_revenue ~ neighbourhood + room_type + price, data = airbnb_data)
summary(model)
Call:
lm(formula = projected_revenue ~ neighbourhood + room_type +
price, data = airbnb_data)
Residuals:
Min 1Q Median 3Q Max
-690513 -15707 -7789 2826 1115661
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.088e+06 1.394e+06 5.085 3.73e-07 ***
neighbourhood -8.984e+01 1.771e+01 -5.074 3.96e-07 ***
room_typeHotel room -2.086e+04 3.966e+03 -5.261 1.46e-07 ***
room_typePrivate room -1.493e+04 1.081e+03 -13.811 < 2e-16 ***
room_typeShared room -1.320e+04 4.968e+03 -2.657 0.00791 **
price 1.771e+01 4.308e-01 41.103 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 38240 on 11171 degrees of freedom
(4067 observations deleted due to missingness)
Multiple R-squared: 0.1534, Adjusted R-squared: 0.153
F-statistic: 404.8 on 5 and 11171 DF, p-value: < 2.2e-16
Result:
Holding neighbourhood and price constant, relative to Entire home/apt:
Hotel rooms earn ~21k less
Private rooms earn ~15k less
Shared rooms earn ~13k less
Key Findings:
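Entire homes generate more projected revenue than hotel rooms, private rooms, and shared rooms. The model explains only about 15% of the variance in projected revenue (R-squared = 0.153), so the estimates are indicative rather than precise.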
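The regression output shows a single coefficient for neighbourhood, which means the zip code was treated as a number (it is read in as dbl), so the model assumes revenue changes linearly with the zip code value. A possible refinement, not part of the original analysis, is to treat neighbourhood as a categorical variable with factor(). The sketch below uses simulated data to show the difference in model structure.

```r
set.seed(42)

# Simulated listings in three zip codes (illustrative only).
toy <- data.frame(
  neighbourhood = rep(c(78701, 78704, 78739), each = 20),
  price         = runif(60, min = 50, max = 400)
)
toy$projected_revenue <- 50 * toy$price +
  ifelse(toy$neighbourhood == 78704, 5000, 0) +
  rnorm(60, sd = 1000)

# Zip code as a number: one slope for the whole range of codes.
m_numeric <- lm(projected_revenue ~ neighbourhood + price, data = toy)

# Zip code as a category: one coefficient per neighbourhood
# (relative to the first level, 78701).
m_factor <- lm(projected_revenue ~ factor(neighbourhood) + price, data = toy)

length(coef(m_numeric))
length(coef(m_factor))
```

With many zip codes the factor model adds many coefficients, so sparsely populated neighbourhoods may need to be grouped before fitting.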
Recommendations
Location: Invest in Travis County: 78739, Southwest Austin: 78737, and West Austin: 78732.
Property Type: Prioritise entire homes over private/shared rooms.
Seasonal Strategy: Increase prices around July to September.
Limitations & Assumptions
Seasonality: Monthly estimates assume constant demand (adjust for peaks like SXSW).
Costs Excluded: Cleaning fees, maintenance, and taxes would reduce net profit.
Data Gaps: Actual bookings vs. reviews may vary.
Regulations: Check local short-term rental laws.
Review-to-Booking Ratio: We assume 50% of guests leave reviews (this can be adjusted if industry benchmarks are available).
Stay Length: minimum_nights is a poor proxy for actual stay length.
Availability Accuracy: availability_365 may include blocked days (e.g., host cancellations), leading to underestimation.
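To show how much the review-to-booking assumption matters, the sketch below recomputes projected revenue for one hypothetical listing under three review rates. The listing values and the revenue formula (reviews divided by the review rate, times minimum_nights, times price, capped at 365 nights) are illustrative assumptions, since the report does not state its exact calculation.

```r
# One hypothetical listing.
number_of_reviews <- 40
minimum_nights    <- 2
price             <- 150

# Candidate review rates: 50% is the report's assumption,
# 30% and 70% are arbitrary alternatives for comparison.
review_rates <- c(0.3, 0.5, 0.7)

# Estimated booked nights under each rate, capped at a full year.
est_nights <- pmin(number_of_reviews / review_rates * minimum_nights, 365)

sensitivity <- data.frame(
  review_rate       = review_rates,
  est_nights        = est_nights,
  projected_revenue = price * est_nights
)
sensitivity
```

Under this formula a lower assumed review rate implies more bookings per review, so projected revenue rises as the rate falls.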