title: "HW3 - Simple Linear Regression with Punta Cana Data"
subtitle: "Analyzing the Relationship Between Bedrooms and Nightly Rate"
author: "David J. Gibbens"
date: "2026-04-07"
output: ioslides_presentation


Introduction: Why I’m Doing This Analysis

Given the recent volatility in the stock market, I have decided to diversify my long‑term investment strategy by opening a self‑directed Roth IRA to purchase real estate in Punta Cana, Dominican Republic. My goal is to acquire a property that can operate as a profitable Airbnb rental while also appreciating in value over time.

To make informed investment decisions, I need to understand which property features — such as the number of bedrooms — make a listing more attractive to tourists and more lucrative as a short‑term rental. This analysis uses real Punta Cana listing data to explore how bedroom count relates to nightly rental rates, forming the foundation for more advanced modeling in Project 1.

Data Overview

##     bedrooms      ttm_avg_rate    
##  Min.   :1.000   Min.   :  18.30  
##  1st Qu.:1.000   1st Qu.:  63.20  
##  Median :2.000   Median :  99.25  
##  Mean   :2.147   Mean   : 177.45  
##  3rd Qu.:3.000   3rd Qu.: 164.88  
##  Max.   :7.000   Max.   :2738.10

Scatterplot: Bedrooms vs Nightly Rate

Scatterplot with Regression Line


Regression Model Output

## 
## Call:
## lm(formula = ttm_avg_rate ~ bedrooms, data = pdc_small)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -384.74 -102.09  -41.81   98.40 2017.90 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -231.11      40.50  -5.707 4.41e-08 ***
## bedrooms      190.26      16.72  11.377  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 257.9 on 188 degrees of freedom
## Multiple R-squared:  0.4077, Adjusted R-squared:  0.4046 
## F-statistic: 129.4 on 1 and 188 DF,  p-value: < 2.2e-16

Regression Equation

Using the coefficient estimates from the regression output:

  • Intercept (β0) = -231.11
  • Slope (β1) = 190.26

The fitted regression equation is:

\[ \hat{Y} = -231.11 + 190.26 \cdot \text{Bedrooms} \]
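
As a quick worked check, the fitted equation can be evaluated over the observed range of 1 to 7 bedrooms. This is only a minimal sketch: the helper name `predict_rate` is hypothetical, and the coefficients are copied by hand from the regression output rather than extracted from the fitted model object.

```r
# Coefficients copied from the regression output (not extracted from the model object)
b0 <- -231.11  # intercept
b1 <- 190.26   # slope for bedrooms

# Hypothetical helper: predicted TTM average nightly rate for a given bedroom count
predict_rate <- function(bedrooms) b0 + b1 * bedrooms

# Evaluate over the bedroom counts observed in the data (1 through 7)
predictions <- data.frame(bedrooms = 1:7)
predictions$predicted_rate <- predict_rate(predictions$bedrooms)
print(predictions)
```

The prediction for a one-bedroom listing comes out negative (about -40.85), which is not a possible price. That suggests the straight line describes the low end of the bedroom range poorly.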

Interpretation of the Slope

The estimated slope for bedrooms is 190.26.
This means:

For every additional bedroom, the predicted TTM average nightly rate increases by about $190.26 on average.

Because the p-value for the slope is extremely small (\(< 2 \times 10^{-16}\)), the relationship between bedrooms and nightly rate is statistically significant.
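
A confidence interval shows how precisely the slope is estimated, which the p-value alone does not. The sketch below rebuilds an approximate 95% interval from the reported estimate, standard error, and residual degrees of freedom. It assumes the usual linear-model conditions hold, and the variable names are illustrative only.

```r
# Values copied from the regression output
slope    <- 190.26   # estimated coefficient for bedrooms
se_slope <- 16.72    # standard error of the slope
df_resid <- 188      # residual degrees of freedom

# Two-sided 95% critical value from the t distribution
t_crit <- qt(0.975, df = df_resid)

# Confidence interval: estimate +/- critical value * standard error
ci <- slope + c(-1, 1) * t_crit * se_slope
print(round(ci, 2))
```

The interval runs from roughly 157 to 223 dollars per additional bedroom. With the fitted model object available, `confint()` would report the same interval directly.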


Interpretation of the Intercept

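The intercept is -231.11, which represents the predicted nightly rate when the number of bedrooms is zero.

This value is not meaningful in a real‑estate context: a nightly rate cannot be negative, and the smallest listing in the data has one bedroom, so zero bedrooms lies outside the observed range. The intercept is still required mathematically to anchor the regression line.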


Interpretation of R-Squared

The regression model has an R-squared value of 0.4077.

This means:

Approximately 40.8% of the variation in TTM average nightly rate is explained by differences in the number of bedrooms.

This indicates a moderately strong relationship for a single‑predictor model, although roughly 59% of the variation in nightly rate remains unexplained.
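
In a simple linear regression, R-squared equals the square of the sample correlation between the predictor and the response. The correlation implied by the output can therefore be recovered directly, and it is positive because the slope is positive:

\[ r = \sqrt{R^2} = \sqrt{0.4077} \approx 0.64 \]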


Interpretation of Overall Model Significance

The F-statistic is 129.4 with a p-value < 2.2e-16.

This indicates:

The overall regression model is statistically significant: a model that includes bedrooms fits the nightly rates significantly better than a model with only an intercept.
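
With a single predictor, the overall F-test and the slope's t-test are the same test, so the F-statistic can be cross-checked against other numbers in the output. The sketch below does that arithmetic with hand-copied values; the variable names are illustrative.

```r
# Values copied from the regression output
t_slope   <- 11.377   # t value for the bedrooms coefficient
r_squared <- 0.4077   # multiple R-squared
df_resid  <- 188      # residual degrees of freedom

# With one predictor, the F-statistic is the squared slope t value
f_from_t <- t_slope^2

# Equivalent form in terms of R-squared (numerator df is 1)
f_from_r2 <- (r_squared / (1 - r_squared)) * df_resid

print(round(c(from_t = f_from_t, from_r2 = f_from_r2), 1))
```

Both forms give about 129.4, matching the reported F-statistic up to rounding.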


Conclusion

Based on this simple linear regression analysis:

  • The number of bedrooms is a statistically significant predictor of TTM average nightly rate.
  • Each additional bedroom is associated with an estimated $190.26 increase in nightly rate, on average.
  • The model explains approximately 40.8% of the variation in nightly rates.
  • While bedrooms matter, other variables not included in this simple model also influence pricing.

Overall, there is a clear positive relationship between bedroom count and nightly rate, and this model provides a strong first step in understanding pricing dynamics in this market.

Why Include Diagnostic Plots?

To ensure that the simple linear regression model is appropriate for this analysis, it is important to evaluate whether the model assumptions are reasonably met.

  • Residual Plot — checks whether the relationship is approximately linear and whether the residuals show constant variance across fitted values.
  • Fitted vs Actual Plot — visualizes how well the model’s predictions align with the true nightly rates.
  • Purpose — these diagnostics help confirm that the model is reliable and that the conclusions drawn from it are valid.

Including these plots strengthens the credibility of the analysis and provides a more complete understanding of model performance.
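
The two diagnostics described above can be sketched as follows. The Punta Cana data are not reproduced here, so the example runs on a synthetic stand-in data frame: the column names follow the regression call, but every value is simulated and the generating numbers are arbitrary assumptions. It also uses base R graphics rather than the plotting packages used for the slides.

```r
# SYNTHETIC stand-in data (assumption): NOT the Punta Cana listings.
# Column names match the regression call; all values are simulated.
set.seed(42)
n <- 190  # 188 residual degrees of freedom + 2 estimated coefficients
sim <- data.frame(bedrooms = sample(1:7, n, replace = TRUE))
sim$ttm_avg_rate <- 50 + 100 * sim$bedrooms + rnorm(n, sd = 80)  # arbitrary values

# Fit the same model form as in the regression output
fit <- lm(ttm_avg_rate ~ bedrooms, data = sim)

# Residual plot: residuals against fitted values, with a zero reference line
plot(fitted(fit), resid(fit),
     xlab = "Fitted nightly rate", ylab = "Residual",
     main = "Residual Plot")
abline(h = 0, lty = 2)

# Fitted vs actual plot: points near the dashed 45-degree line are well predicted
plot(fitted(fit), sim$ttm_avg_rate,
     xlab = "Fitted nightly rate", ylab = "Actual nightly rate",
     main = "Fitted vs Actual Plot")
abline(a = 0, b = 1, lty = 2)
```

In the residual plot, a funnel shape or a curved pattern around the zero line would point to non-constant variance or non-linearity. Because the model has one discrete predictor, the fitted values fall into one vertical strip per bedroom count.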

Residual Plot

Fitted vs Actual Plot

The fitted vs. actual plot compares predicted nightly rates to the true nightly rates.
Points that fall close to the dashed 45‑degree line indicate accurate predictions.
A clear upward trend supports the conclusion that bedroom count is positively associated with nightly rate.

Interactive Plotly Scatterplot


Thank You

Questions or comments are welcome.