airbnb <- read_delim("./airbnb_austin.csv", delim = ",")
## Rows: 15244 Columns: 18
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (3): name, host_name, room_type
## dbl (12): id, host_id, neighbourhood, latitude, longitude, price, minimum_n...
## lgl (2): neighbourhood_group, license
## date (1): last_review
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
I’ll enhance the previous model (price ~ stay_length) by
adding two new predictors:
room_type (categorical) – Expected
to significantly impact price.
availability_365 (continuous) –
Could signal demand, potentially affecting price.
I’ll also test an interaction term
(stay_length * room_type) to see if the effect of stay
length differs by room type.
room_type:
ANOVA showed strong evidence that room type affects price.
Expect entire homes to cost more than private rooms.
No serious multicollinearity with stay_length is expected (they
measure different things).
availability_365:
Listings available more days per year may have lower prices (hosts lowering prices to attract bookings).
High-availability listings could signal lower demand or less desirable properties.
# Bin minimum_nights into four stay-length categories
airbnb_group <- airbnb |>
  mutate(
    stay_length = case_when(
      minimum_nights >= 1 & minimum_nights <= 3 ~ "Short Stay",
      minimum_nights >= 4 & minimum_nights <= 7 ~ "Medium Stay",
      minimum_nights >= 8 & minimum_nights <= 30 ~ "Long Stay",
      minimum_nights >= 31 ~ "Extended Stay",
      TRUE ~ NA_character_
    )
  )
# Order the levels so that "Short Stay" is the reference category in lm()
airbnb_data <- airbnb_group |>
  mutate(stay_length = factor(stay_length,
                              levels = c("Short Stay", "Medium Stay", "Long Stay", "Extended Stay")))
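The two-step binning above (case_when, then factor) can also be written in one step with base R's cut(). The sketch below runs on a small made-up vector, not the Austin data, and assumes minimum_nights holds whole numbers of at least 1; for non-integer values the two approaches would differ.

```r
# Toy minimum_nights values (hypothetical, not taken from airbnb_austin.csv)
minimum_nights <- c(1, 3, 4, 7, 8, 30, 31, 365, NA)

# cut() uses right-closed intervals by default: (0,3], (3,7], (7,30], (30,Inf]
# For whole numbers >= 1 these match the case_when() ranges 1-3, 4-7, 8-30, 31+
stay_length <- cut(
  minimum_nights,
  breaks = c(0, 3, 7, 30, Inf),
  labels = c("Short Stay", "Medium Stay", "Long Stay", "Extended Stay")
)

# The result is already a factor with levels in the order given,
# so "Short Stay" is the reference category in a later lm() call
levels(stay_length)
table(stay_length, useNA = "ifany")
```

Either form yields the same ordered levels; case_when is more explicit about the boundaries, while cut() avoids the separate factor() call.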
model_v2 <- lm(price ~ stay_length + room_type + availability_365,
data = airbnb_data)
summary(model_v2)
##
## Call:
## lm(formula = price ~ stay_length + room_type + availability_365,
## data = airbnb_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -401 -192 -108 3 37754
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 186.0677 16.7215 11.127 < 2e-16 ***
## stay_lengthMedium Stay 24.4672 40.4668 0.605 0.545442
## stay_lengthLong Stay -172.4597 23.5977 -7.308 2.89e-13 ***
## stay_lengthExtended Stay -169.6141 48.6802 -3.484 0.000495 ***
## room_typeHotel room 80.7334 85.8393 0.941 0.346972
## room_typePrivate room -99.1842 23.5330 -4.215 2.52e-05 ***
## room_typeShared room -206.0941 109.0684 -1.890 0.058839 .
## availability_365 0.6549 0.0673 9.731 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 834.5 on 11175 degrees of freedom
## (4061 observations deleted due to missingness)
## Multiple R-squared: 0.01624, Adjusted R-squared: 0.01563
## F-statistic: 26.36 on 7 and 11175 DF, p-value: < 2.2e-16
The model explains only 1.6% of price variation (R² = 0.016), meaning it lacks important price drivers.
Despite being statistically significant (F-statistic, p < 2e-16), its practical usefulness is very limited due to the low R².
Each additional day of availability is associated with a $0.65 higher price, holding stay length and room type constant (p < 0.001), which runs against the expectation above that high-availability listings would be cheaper.
Possible explanations:
Premium listings may remain available longer due to higher pricing.
Popular hosts might keep calendars open longer to attract bookings.
Potential data issues, as new listings may default to 365-day availability.
Long & Extended Stays are significantly cheaper than Short Stays (about $172 and $170 less, respectively; p < 0.001).
Medium Stays show no significant price difference from Short Stays (p = 0.55).
Private Rooms: $99 cheaper than Entire Homes (p < 0.001).
Shared Rooms: $206 cheaper, but only marginally significant (p = 0.059).
Hotel Rooms: No significant price difference from Entire Homes (p = 0.35).
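Each coefficient above is a difference from a reference category (Short Stay for stay_length, entire homes for room_type). The sketch below shows, on simulated toy data, how the reference level fixes that interpretation and how confint() puts a range around each estimate. The level label "Entire home/apt" and all prices here are assumptions made for illustration, not values from the Austin data.

```r
set.seed(1)

# Simulated toy data (hypothetical labels and prices)
toy <- data.frame(
  room_type = factor(rep(c("Entire home/apt", "Private room", "Shared room"),
                         each = 20)),
  price = c(rnorm(20, 300, 30), rnorm(20, 200, 30), rnorm(20, 100, 30))
)

# factor() sorts levels alphabetically, so the first level is the reference
levels(toy$room_type)[1]

# Coefficients are differences from "Entire home/apt"
fit_entire <- lm(price ~ room_type, data = toy)

# Same model with "Private room" as the reference instead
toy$room_type2 <- relevel(toy$room_type, ref = "Private room")
fit_private <- lm(price ~ room_type2, data = toy)

coef(fit_entire)
coef(fit_private)

# 95% confidence intervals for the differences from the reference level
confint(fit_entire)
```

Changing the reference level changes the coefficients and their p-values but not the fitted values, so a "non-significant" contrast only describes the comparison with the chosen baseline.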
model_interaction <- lm(price ~ stay_length * room_type, data = airbnb_data)
summary(model_interaction)
##
## Call:
## lm(formula = price ~ stay_length * room_type, data = airbnb_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -310 -196 -123 -18 37822
##
## Coefficients: (5 not defined because of singularities)
## Estimate Std. Error t value
## (Intercept) 321.163 9.499 33.809
## stay_lengthMedium Stay 13.381 45.065 0.297
## stay_lengthLong Stay -161.674 26.539 -6.092
## stay_lengthExtended Stay -140.615 52.261 -2.691
## room_typeHotel room 131.732 86.069 1.531
## room_typePrivate room -104.389 27.874 -3.745
## room_typeShared room -275.726 209.754 -1.315
## stay_lengthMedium Stay:room_typeHotel room NA NA NA
## stay_lengthLong Stay:room_typeHotel room NA NA NA
## stay_lengthExtended Stay:room_typeHotel room NA NA NA
## stay_lengthMedium Stay:room_typePrivate room -47.454 103.886 -0.457
## stay_lengthLong Stay:room_typePrivate room 2.707 59.829 0.045
## stay_lengthExtended Stay:room_typePrivate room -22.903 146.393 -0.156
## stay_lengthMedium Stay:room_typeShared room NA NA NA
## stay_lengthLong Stay:room_typeShared room 136.714 246.123 0.555
## stay_lengthExtended Stay:room_typeShared room NA NA NA
## Pr(>|t|)
## (Intercept) < 2e-16 ***
## stay_lengthMedium Stay 0.766531
## stay_lengthLong Stay 1.15e-09 ***
## stay_lengthExtended Stay 0.007143 **
## room_typeHotel room 0.125912
## room_typePrivate room 0.000181 ***
## room_typeShared room 0.188697
## stay_lengthMedium Stay:room_typeHotel room NA
## stay_lengthLong Stay:room_typeHotel room NA
## stay_lengthExtended Stay:room_typeHotel room NA
## stay_lengthMedium Stay:room_typePrivate room 0.647832
## stay_lengthLong Stay:room_typePrivate room 0.963918
## stay_lengthExtended Stay:room_typePrivate room 0.875680
## stay_lengthMedium Stay:room_typeShared room NA
## stay_lengthLong Stay:room_typeShared room 0.578585
## stay_lengthExtended Stay:room_typeShared room NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 838.2 on 11172 degrees of freedom
## (4061 observations deleted due to missingness)
## Multiple R-squared: 0.007956, Adjusted R-squared: 0.007068
## F-statistic: 8.96 on 10 and 11172 DF, p-value: 7.473e-15
Base Price: $321 for Short Stay in Entire homes
Long/Extended Stays: Cheaper by $162 (p < 0.001) and $141 (p = 0.007), respectively
Private Rooms: $104 cheaper than Entire homes (p<0.001)
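Five interaction terms could not be estimated (NA), because some stay length and room type combinations have no listings
The estimable interactions (e.g., Medium Stay:Private Room) showed no significant price differences (all p > 0.5)
Even the largest interaction estimate (Long Stay:Shared Room, +$137) is far from significant (p = 0.58)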
Very low R² (0.8%) - explains almost none of price variation, and less than model_v2 (1.6%), which also included availability_365
Significant F-statistic (p < 0.001) but poor practical utility
Both room type and stay length show significant main effects (Private rooms about $104 cheaper, Long stays about $162 cheaper)
No evidence that stay-length discounts should vary by room type
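The "not defined because of singularities" note in the output arises when some stay_length and room_type combinations contain no listings: the matching interaction column is all zeros and lm() reports NA. The sketch below reproduces this on a tiny made-up data set (the shortened labels and prices are hypothetical) and shows how a cross-tabulation reveals the empty cells before fitting.

```r
# Toy data (hypothetical): no "Long" stays among Hotel or Private rooms
toy <- data.frame(
  stay_length = factor(
    c("Short", "Short", "Short", "Short", "Long", "Long", "Short", "Short"),
    levels = c("Short", "Long")
  ),
  room_type = factor(
    c("Entire", "Entire", "Private", "Private", "Entire", "Entire", "Hotel", "Hotel")
  ),
  price = c(300, 320, 200, 210, 150, 170, 400, 420)
)

# Cross-tabulate the two factors; zeros mark combinations with no data
cell_counts <- table(toy$stay_length, toy$room_type)
cell_counts

# Interaction terms for empty cells cannot be estimated and come back as NA
fit <- lm(price ~ stay_length * room_type, data = toy)
coef(fit)

# Names of the inestimable coefficients
names(coef(fit))[is.na(coef(fit))]
```

Running table(airbnb_data$stay_length, airbnb_data$room_type) on the real data would show which cells are empty. The NA pattern in the output above suggests that hotel rooms appear only under Short Stay and shared rooms only under Short and Long Stay.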
plot(model_v2)
Residuals vs Fitted: the flat red line suggests no severe non-linearity
Slight fanning at higher fitted prices indicates mild heteroscedasticity
Scale-Location: slight upward trend → higher variance at higher fitted prices
Confirms mild heteroscedasticity
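The visual impression of heteroscedasticity can be complemented with a numeric check. The sketch below is a simplified Breusch-Pagan-style test written in base R: squared residuals are regressed on the fitted values and n times the auxiliary R² is compared with a chi-squared distribution with 1 degree of freedom. The helper bp_test() and the simulated data are assumptions for illustration, not part of the analysis above. A log-price model is included for comparison because log-transforming a right-skewed response is a common remedy.

```r
set.seed(42)

# Simulated data (hypothetical): multiplicative errors, so the spread of
# price grows with its level, similar in spirit to right-skewed price data
n <- 500
availability <- runif(n, 0, 365)
price <- exp(4 + 0.003 * availability + rnorm(n, sd = 0.6))

fit_raw <- lm(price ~ availability)       # price on the original scale
fit_log <- lm(log(price) ~ availability)  # price on the log scale

# Simplified Breusch-Pagan-style test (hypothetical helper):
# regress squared residuals on fitted values, then use n * R^2 ~ chi-squared(1)
bp_test <- function(fit) {
  aux_data <- data.frame(sq_resid = residuals(fit)^2, fitted = fitted(fit))
  aux <- lm(sq_resid ~ fitted, data = aux_data)
  stat <- nrow(aux_data) * summary(aux)$r.squared
  c(statistic = stat,
    p_value = pchisq(stat, df = 1, lower.tail = FALSE))
}

bp_raw <- bp_test(fit_raw)
bp_log <- bp_test(fit_log)
bp_raw
bp_log
```

A small p-value points to non-constant variance. In this simulated setup one would expect weaker evidence of heteroscedasticity on the log scale. Applied to model_v2, the same idea would use bp_test(model_v2), keeping in mind that the maximum residual of 37754 indicates extreme price outliers that are worth inspecting separately.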