library(tidyverse)  # provides readr::read_delim() and dplyr::mutate()
airbnb <- read_delim("./airbnb_austin.csv", delim = ",")
## Rows: 15244 Columns: 18
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (3): name, host_name, room_type
## dbl (12): id, host_id, neighbourhood, latitude, longitude, price, minimum_n...
## lgl (2): neighbourhood_group, license
## date (1): last_review
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
I’ll define a listing as “Frequently Booked” if its number_of_reviews is above the median, treating review count as a proxy for consistent bookings:
# Convert to binary (1 = frequently booked, 0 = not)
median_reviews <- median(airbnb$number_of_reviews, na.rm = TRUE)
airbnb_data <- airbnb |>
  mutate(frequently_booked = as.numeric(number_of_reviews > median_reviews))
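Because the comparison is strict (`>`), listings whose review count equals the median are coded 0, so the two classes need not be equal in size. A minimal sketch on made-up review counts illustrates this and shows that missing counts stay `NA`. The `toy_*` names and values are hypothetical and not taken from the Austin data.

```r
# Hypothetical review counts with ties at the median and one missing value
toy_reviews <- c(0, 0, 1, 2, 2, 2, 5, 9, NA)

# Median of the non-missing values
toy_median <- median(toy_reviews, na.rm = TRUE)

# Strict comparison: ties at the median become 0, NA stays NA
toy_flag <- as.numeric(toy_reviews > toy_median)

# Class balance, including the missing case
table(toy_flag, useNA = "ifany")
```

On the real data, `table(airbnb_data$frequently_booked, useNA = "ifany")` would show the actual class balance.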
room_type (categorical): Entire home (reference level) vs. hotel room, private room, and shared room
price (continuous): Standardized with scale()
availability_365 (continuous): Days available per year
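Written out, a logistic regression on these predictors models the log-odds of a listing being frequently booked as a linear function of them. The symbols $p_i$, $\beta_0,\dots,\beta_5$, and $z(\cdot)$ are notation introduced here for illustration, with entire home as the reference room type:

$$
\log\frac{p_i}{1-p_i} = \beta_0 + \beta_1\,\mathrm{Hotel}_i + \beta_2\,\mathrm{Private}_i + \beta_3\,\mathrm{Shared}_i + \beta_4\,z(\mathrm{price}_i) + \beta_5\,\mathrm{availability\_365}_i
$$

Here $p_i = P(\mathrm{frequently\_booked}_i = 1)$, the room-type terms are 0/1 indicators, and $z(\mathrm{price}_i) = (\mathrm{price}_i - \overline{\mathrm{price}}) / \mathrm{sd}(\mathrm{price})$ is the standardized price.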
model_logit <- glm(frequently_booked ~ room_type + scale(price) +
                     availability_365,
                   data = airbnb_data, family = "binomial")
summary(model_logit)
##
## Call:
## glm(formula = frequently_booked ~ room_type + scale(price) +
## availability_365, family = "binomial", data = airbnb_data)
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max
## -1.5752  -1.3056   0.8798   1.0139   2.8624
##
## Coefficients:
##                         Estimate Std. Error z value Pr(>|z|)
## (Intercept)            0.7904910  0.0419073  18.863  < 2e-16 ***
## room_typeHotel room   -3.1837214  0.4603770  -6.915 4.66e-12 ***
## room_typePrivate room -0.9882397  0.0584118 -16.918  < 2e-16 ***
## room_typeShared room  -0.4852431  0.2603035  -1.864   0.0623 .
## scale(price)          -0.3717990  0.0467163  -7.959 1.74e-15 ***
## availability_365      -0.0015754  0.0001686  -9.346  < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 15228 on 11182 degrees of freedom
## Residual deviance: 14620 on 11177 degrees of freedom
## (4061 observations deleted due to missingness)
## AIC: 14632
##
## Number of Fisher Scoring iterations: 5
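To get a feel for the coefficient scale, the estimates can be turned into predicted probabilities for a concrete listing. The sketch below is only illustrative: it copies three coefficients from the output above, assumes a listing at the mean price (so `scale(price)` is 0), and uses a hypothetical availability of 180 days.

```r
# Coefficients copied from the summary(model_logit) output
b0        <-  0.7904910   # intercept: entire home, mean price, 0 available days
b_private <- -0.9882397   # private room vs. entire home
b_avail   <- -0.0015754   # per available day

avail_days <- 180         # hypothetical availability, chosen for illustration

# Linear predictor (log-odds) at mean price, i.e. scale(price) = 0
eta_entire  <- b0 + b_avail * avail_days
eta_private <- eta_entire + b_private

# plogis() is the inverse logit: 1 / (1 + exp(-eta))
p_entire  <- plogis(eta_entire)
p_private <- plogis(eta_private)
round(c(entire_home = p_entire, private_room = p_private), 3)
```

Under these assumptions the predicted probability of being frequently booked is about 0.62 for an entire home and about 0.38 for a private room. With the fitted model in the session, `predict(model_logit, newdata = ..., type = "response")` would give such probabilities directly.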
Entire homes (reference level): most likely to be frequently booked, holding price and availability constant
Hotel rooms: 95.9% lower odds vs. entire homes (exp(-3.18) ≈ 0.041)
Private rooms: 62.8% lower odds vs. entire homes
Shared rooms: not significantly different from entire homes at the 5% level (p = 0.062)
For every 1 SD increase in price, the odds of frequent booking drop by 31%
Each additional available day reduces the odds by 0.16%. This may indicate that less desirable listings stay available longer, while popular listings get booked early, which reduces their availability
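The percentages above come from exponentiating the log-odds coefficients. A small self-contained sketch follows; the estimates are copied from the summary output, and the vector names are labels chosen here for readability.

```r
# Coefficients copied from the summary(model_logit) output
coefs <- c(
  hotel_room       = -3.1837214,
  private_room     = -0.9882397,
  shared_room      = -0.4852431,
  price_per_sd     = -0.3717990,
  availability_day = -0.0015754
)

odds_ratio <- exp(coefs)               # multiplicative change in the odds
pct_change <- 100 * (odds_ratio - 1)   # percent change in the odds

round(cbind(odds_ratio, pct_change), 3)
```

With the fitted model available, `exp(coef(model_logit))` returns the same odds ratios without retyping the estimates.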
price
confint(model_logit, "scale(price)", level = 0.95)
## Waiting for profiling to be done...
##      2.5 %     97.5 %
## -0.4672775 -0.2845991
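We’re 95% confident that the true coefficient for price (on the log-odds scale) lies between -0.47 and -0.28. Exponentiating the endpoints gives an odds ratio between 0.63 and 0.75, which means that a 1 SD price increase reduces the odds of frequent booking by about 25% to 37%.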
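The interval endpoints are on the log-odds scale, so they have to be exponentiated before they can be read as a percent change in the odds. Worked out for the two bounds reported by `confint()`:

$$
\begin{aligned}
e^{-0.4672775} &\approx 0.627 &&\Rightarrow\quad 1 - 0.627 = 0.373 \quad (\text{about } 37\% \text{ lower odds})\\
e^{-0.2845991} &\approx 0.752 &&\Rightarrow\quad 1 - 0.752 = 0.248 \quad (\text{about } 25\% \text{ lower odds})
\end{aligned}
$$

In R, `exp(confint(model_logit, "scale(price)"))` performs the same conversion in one step.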
All variables except the shared-room indicator are significantly associated with booking frequency (p < 0.001)
Residual deviance (14,620) < Null deviance (15,228) → The model fits better than an intercept-only model (a drop of 608 on 5 degrees of freedom)
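Median deviance residual (0.88) is not near zero, but deviance residuals from a binary outcome are not expected to be symmetric, so this alone is not a warning sign
Max residual (2.86) suggests some under-predicted cases: frequently booked listings to which the model assigned a low probability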
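The deviance and residual figures can be turned into a few concrete fit diagnostics. The sketch below uses only numbers copied from the summary output. It assumes the usual chi-square approximation for the likelihood-ratio test and relies on the fact that, for a 0/1 outcome, the deviance equals minus twice the log-likelihood, so McFadden's pseudo R-squared can be computed from the two deviances.

```r
# Figures copied from the summary(model_logit) output
null_dev  <- 15228; null_df  <- 11182
resid_dev <- 14620; resid_df <- 11177
max_resid <- 2.8624

# Likelihood-ratio test against the intercept-only model
lr_stat <- null_dev - resid_dev            # drop in deviance
lr_df   <- null_df - resid_df              # number of added parameters
lr_p    <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

# McFadden's pseudo R-squared (for a 0/1 outcome, deviance = -2 * log-likelihood)
mcfadden_r2 <- 1 - resid_dev / null_dev

# For an observation with y = 1, the deviance residual is sqrt(-2 * log(p_hat)),
# so the fitted probability behind the largest residual is:
p_hat_at_max <- exp(-max_resid^2 / 2)

c(lr_stat = lr_stat, lr_df = lr_df, lr_p = lr_p,
  mcfadden_r2 = round(mcfadden_r2, 3), p_hat_at_max = round(p_hat_at_max, 3))
```

The improvement over the null model is highly significant, but a pseudo R-squared of roughly 0.04 indicates that the predictors explain only a modest share of the deviance. The largest residual corresponds to a frequently booked listing with a fitted probability of roughly 0.017.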
Largest impact: Hotel rooms (coefficient about 3x that of private rooms on the log-odds scale)
Price matters but less than room type
Convert private rooms to entire homes if possible
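Avoid overpricing (a 1 SD price increase is associated with about 31% lower odds of frequent booking)
Consider limiting calendar availability to signal exclusivity, with the caveat that the availability effect may simply reflect popular listings being booked up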