library(tidyverse)
library(ggplot2)
library(DT)

1 Problem / Objective Statement

The core challenge of this project is to determine the most cost-effective entry point into the Holland, MI housing market using a VA loan. Specifically, the analysis seeks to answer whether it is better to purchase a more affordable 1-bathroom property and create value through DIY renovations (adding bathrooms and flex space), or to buy a higher-priced home that already has multiple bathrooms. By evaluating purchase prices, mortgage costs, rental income potential, and renovation value-adds, the objective is to identify a strategy that balances affordability, equity growth, and long-term rental cash flow stability.

2 Obtain the Data

DB1 <- read_csv("realtor.csv")

3 Scrub the Data (Cleaning the Data)

 [1] "BrokerTitle_titleText__RL8Gk"                                  
 [2] "card-anchor href"                                              
 [3] "LazyPicturestyles__StyledLazyPicture-rui__pond98-0 src"        
 [4] "Tagstyles__StyledTag-rui__sc-9ogsft-0"                         
 [5] "VisuallyHiddenstyles__StyledVisuallyHidden-rui__sc-1ohi4lq-0"  
 [6] "SavePropertyIconImage_saveIconImage__Zqz5o src"                
 [7] "VisuallyHiddenstyles__StyledVisuallyHidden-rui__sc-1ohi4lq-0 2"
 [8] "base__StyledType-rui__sc-18muj27-0"                            
 [9] "Price"                                                         
[10] "Bed"                                                           
[11] "Baths"                                                         
[12] "Square_Footage"                                                
[13] "Acre_Lot"                                                      
[14] "zip_code"                                                      
# Clean up price column
DB1$Price <- gsub("\\$|,", "", DB1$Price)  # remove $ and ,
DB1$Price <- as.numeric(DB1$Price)         # convert to numeric

summary(DB1)
     Price              Bed            Baths       Square_Footage 
 Min.   : 189900   Min.   :1.000   Min.   :1.000   Min.   :  531  
 1st Qu.: 364900   1st Qu.:3.000   1st Qu.:2.000   1st Qu.: 1640  
 Median : 444900   Median :4.000   Median :2.500   Median : 2025  
 Mean   : 651828   Mean   :3.697   Mean   :2.442   Mean   : 2403  
 3rd Qu.: 584000   3rd Qu.:4.000   3rd Qu.:2.500   3rd Qu.: 2755  
 Max.   :9999900   Max.   :9.000   Max.   :9.500   Max.   :14415  
    Acre_Lot       zip_code  
 Min.   : 0.000   49419:  6  
 1st Qu.: 0.120   49423:137  
 Median : 0.250   49424:145  
 Mean   : 0.775   49434:  1  
 3rd Qu.: 0.440   49453:  6  
 Max.   :22.270   49464: 42  

3.0.1 Checking for Outliers using the Z Score

The z-score is a way of measuring how far something is from “normal” in terms of standard deviations (think of it like “steps” away from the middle).

A z-score of 0 = right at the average.

A z-score of +1 or −1 = close to average.

A z-score of +5 = way out there (like the mansion).

So when I say: “remove anything with a z-score above 3 or below −3,” I’m just saying:

“Take out the stuff that’s too far from the pack, because it doesn’t represent the normal market.” Which is an outlier.

Why It Matters?

By removing those extreme cases:

Your averages now tell a true story about the homes you’d actually buy.

You don’t get tricked into thinking the market is more expensive (or cheaper) than it really is.

# Create new z-score columns alongside your data
DB1$z_price <- scale(DB1$Price)
DB1$z_bed <- scale(DB1$Bed)
DB1$z_bath <- scale(DB1$Baths)
DB1$z_Square_Footage <- scale(DB1$Square_Footage)
DB1$z_Acre_Lot <- scale(DB1$Acre_Lot)


#-----------------------------------------------------

# variables I want to CHECK for outliers (still keep all other columns)
check_vars <- c("Price",  "Bed", "Baths", "Square_Footage", "Acre_Lot")

DB3 <- DB1 %>%
  # add z-score columns just for the vars you want to check
  mutate(across(all_of(check_vars), ~ as.numeric(scale(.x)), .names = "z_{.col}")) %>%
  # keep rows where ALL checked z-scores are within (-3, 3)
  filter(if_all(paste0("z_", check_vars), ~ .x > -3 & .x < 3)) 
summary(DB3)
     Price              Bed            Baths       Square_Footage
 Min.   : 189900   Min.   :1.000   Min.   :1.000   Min.   : 531  
 1st Qu.: 359900   1st Qu.:3.000   1st Qu.:2.000   1st Qu.:1607  
 Median : 439678   Median :4.000   Median :2.500   Median :1960  
 Mean   : 514442   Mean   :3.622   Mean   :2.325   Mean   :2201  
 3rd Qu.: 559656   3rd Qu.:4.000   3rd Qu.:2.500   3rd Qu.:2619  
 Max.   :2599000   Max.   :6.000   Max.   :5.000   Max.   :5400  
    Acre_Lot       zip_code          z_price.V1        
 Min.   :0.0000   49419:  5   Min.   :-0.522995245337  
 1st Qu.:0.0900   49423:131   1st Qu.:-0.330521144857  
 Median :0.2300   49424:136   Median :-0.240197012204  
 Mean   :0.5158   49434:  1   Mean   :-0.155548873549  
 3rd Qu.:0.4000   49453:  6   3rd Qu.:-0.104357000539  
 Max.   :6.0300   49464: 41   Max.   : 2.204589198580  
          z_bed.V1                  z_bath.V1          z_Square_Footage 
 Min.   :-2.7505313288100   Min.   :-1.3768466771800   Min.   :-1.2948  
 1st Qu.:-0.7110834568420   1st Qu.:-0.4221196602870   1st Qu.:-0.5505  
 Median : 0.3086404791400   Median : 0.0552438481584   Median :-0.3066  
 Mean   :-0.0769426341532   Mean   :-0.1118333797980   Mean   :-0.1393  
 3rd Qu.: 0.3086404791400   3rd Qu.: 0.0552438481584   3rd Qu.: 0.1496  
 Max.   : 2.3480883511000   Max.   : 2.4420613903900   Max.   : 2.0733  
   z_Acre_Lot         z_Price            z_Bed             z_Baths        
 Min.   :-0.3906   Min.   :-0.5230   Min.   :-2.75053   Min.   :-1.37685  
 1st Qu.:-0.3452   1st Qu.:-0.3305   1st Qu.:-0.71108   1st Qu.:-0.42212  
 Median :-0.2747   Median :-0.2402   Median : 0.30864   Median : 0.05524  
 Mean   :-0.1306   Mean   :-0.1555   Mean   :-0.07694   Mean   :-0.11183  
 3rd Qu.:-0.1890   3rd Qu.:-0.1044   3rd Qu.: 0.30864   3rd Qu.: 0.05524  
 Max.   : 2.6483   Max.   : 2.2046   Max.   : 2.34809   Max.   : 2.44206  
DB3_ONE <- select(DB3,Price:zip_code)
# Replace 0 with NA, then NA with mean for multiple columns
cols <- c("Price", "Bed", "Baths", "Square_Footage", "Acre_Lot")  # choose your variables
for (col in cols) {
  DB3[[col]][DB3[[col]] == 0] <- NA
  DB3[[col]][is.na(DB3[[col]])] <- mean(DB3[[col]], na.rm = TRUE)
  
}

DB3_ONE <- DB3_ONE %>%
  filter(Price >= 150000 & Price <= 400000)

To keep my analysis focused on realistic investment opportunities, I filtered the dataset to only include homes priced between $150,000 and $400,000. By focusing on this price band, I can better compare typical homes—rather than luxury properties or distressed sales—that align with the kinds of investments I may be considering.

In this section of my R project, I built the financial backbone of my housing market analysis for Holland, MI. Step by step, here’s what I’m doing:

Filter to Target ZIP Codes → I only keep homes in 49423 and 49424, the areas I’m focusing on for potential investment.

Calculate Mortgage → Using a 30-year VA loan at 5.27%, I model monthly principal & interest for each property.

Add Property Taxes → Since taxes vary by ZIP, I assign 1.30% for 49423 and 1.04% for 49424 (monthly basis).

Add Insurance → Estimate annual homeowner’s insurance at 0.45% of home value.

Market Rent → I set the average Holland rent at $2,265/month, which becomes the baseline for rent comparisons.

GRM (Gross Rent Multiplier) → I calculate GRM = Price ÷ Annual Rent, to measure rent efficiency relative to purchase price.

Total Payment → I sum Mortgage + Taxes + Insurance to see the all-in monthly cost of owning.

One Percent Rule → I create a quick-check variable for the classic 1% rule (does monthly rent ≥ 1% of property value?).

# Filtering to only Holland Zip codes

DB3_ONE <- DB3_ONE %>%
  filter(zip_code %in% c("49423", "49424"))


# Creating new Variables for my analysis

loan_rate <- 0.0527  # 5.27% VA loan
n <- 30 * 12         # 30 years

DB3_ONE <- DB3_ONE %>%
  mutate(
    Mortgage = Price * ((loan_rate/12) * (1 + loan_rate/12)^n) / ((1 + loan_rate/12)^n - 1)
  )


#--------------------------------------------------------------------------------

DB3_ONE <- DB3_ONE %>%
  mutate(
    Taxes = case_when(
      zip_code == 49423 ~ Price * 0.0130 / 12,  # 1.30% annual, monthly
      zip_code == 49424 ~ Price * 0.0104 / 12,  # 1.04% annual, monthly
      TRUE ~ Price * 0.012 / 12            # fallback average (1.2%)
    )
  )

#-------------------------------------------------------------------------------

DB3_ONE <- DB3_ONE %>%
  mutate(
    Insurance = Price * 0.0045 / 12   # 0.45% annual, monthly
  )


# ------------------------------------------------------------------------------

DB3_ONE$Average_Rent <- 2265.32 # Holland Market Average Rent 

#-------------------------------------------------------------------------------


annual_rent <- 2265.32 * 12

DB3_ONE <- DB3_ONE %>%
  mutate(
    GRM = Price / annual_rent)

# ------------------------------------------------------------------------------

DB3_ONE <- DB3_ONE %>%
  mutate(
    Total_Payment = Mortgage + Taxes + Insurance
  )

# -----------------------------------------------------------------------------


DB3_ONE <- DB3_ONE %>%
  mutate(
    One_Percent = Price * .01
  )

4 Explore the Data

summary(DB3_ONE)
     Price             Bed            Baths       Square_Footage
 Min.   :189900   Min.   :1.000   Min.   :1.000   Min.   : 531  
 1st Qu.:295000   1st Qu.:3.000   1st Qu.:1.000   1st Qu.:1365  
 Median :339950   Median :3.000   Median :1.750   Median :1589  
 Mean   :328972   Mean   :3.279   Mean   :1.716   Mean   :1585  
 3rd Qu.:371586   3rd Qu.:4.000   3rd Qu.:2.000   3rd Qu.:1783  
 Max.   :400000   Max.   :6.000   Max.   :3.500   Max.   :2673  
    Acre_Lot       zip_code     Mortgage        Taxes         Insurance     
 Min.   :0.0000   49419: 0   Min.   :1051   Min.   :190.6   Min.   : 71.21  
 1st Qu.:0.1500   49423:56   1st Qu.:1633   1st Qu.:296.3   1st Qu.:110.62  
 Median :0.2300   49424:48   Median :1881   Median :320.9   Median :127.48  
 Mean   :0.3086   49434: 0   Mean   :1821   Mean   :321.3   Mean   :123.36  
 3rd Qu.:0.3100   49453: 0   3rd Qu.:2057   3rd Qu.:346.6   3rd Qu.:139.34  
 Max.   :3.6900   49464: 0   Max.   :2214   Max.   :433.3   Max.   :150.00  
  Average_Rent       GRM         Total_Payment   One_Percent  
 Min.   :2265   Min.   : 6.986   Min.   :1328   Min.   :1899  
 1st Qu.:2265   1st Qu.:10.852   1st Qu.:2055   1st Qu.:2950  
 Median :2265   Median :12.506   Median :2337   Median :3400  
 Mean   :2265   Mean   :12.102   Mean   :2265   Mean   :3290  
 3rd Qu.:2265   3rd Qu.:13.669   3rd Qu.:2538   3rd Qu.:3716  
 Max.   :2265   Max.   :14.715   Max.   :2797   Max.   :4000  

Key Takeaways from Descriptive Statistics

Looking at the homes I selected in the $189,900 to $400,000 price range, the median property is about $339,950, with a typical total monthly payment (mortgage, taxes, and insurance) of around $2,337.

Since the average market rent is $2,265, most properties in this range are close to break-even, so the way I approach value-add is crucial.

The GRM spans from 6.99 to 14.7, with lower GRMs being stronger rental candidates. Right now, the 1% rule doesn’t hold in this market, so I need to rely more on GRM and actual cash flow.

This is where bathrooms come in: buying a 1-bathroom house at the lower end of the price range may allow me to save upfront, and then add a second bathroom and a flex room to boost square footage and functionality.That could push the monthly rent above the market average, which would lower the GRM and make the property more attractive as an investment.

On the other hand, buying a 2-bathroom home with more space already built in comes with a higher upfront mortgage, but might also come with fewer renovation costs and faster rental readiness.

The key for me is to compare these scenarios side by side—average price, mortgage, acres, square footage, and bathrooms—to see whether adding value to a 1-bath home is more cost-effective than buying a 2-bath home outright.

To guide my decision, I will use multiple regression, ANOVA, and other statistical analyses to evaluate how factors like bathrooms, square footage, and lot size influence both property value and rental potential. The key for me is to compare these scenarios side by side—average price, mortgage, acres, square footage, and bathrooms—to see whether adding value to a 1-bath home is more cost-effective than buying a 2-bath home outright.

4.1 Data Visualization

Takeaways from Charts

  • Price by Bathroom Count (Boxplot)

Clear upward trend: 1-bath homes average lower in price (~$286k) than 1.5, 2, or more bath homes (up to ~$400k). The jump in price from 1 bath → 2 baths is significant.

Takeaway: I can enter the market cheaper with 1 bath, but 2-bath homes carry a premium that may already price in the added value.

  • Square Footage vs Price (Colored by Baths + Trendlines)

Square footage is strongly tied to price, but bath count shifts the trend upward. For the same square footage, 2-bath homes often sell higher than 1-bath homes. This suggests bathrooms add independent value beyond just square footage.

Takeaway: Adding a bathroom may “reposition” a house into a higher price bracket.

  • GRM by Bathroom Count (Boxplot)

1-bath homes have lower GRMs (~11 median) than 2-bath homes (~13–14). Lower GRM = better rental value relative to price.

Takeaway: From an investor’s standpoint, 1-bath homes are better value buys. If you can add a bathroom without overspending, you could preserve this advantage while boosting rent.

  • Mean Lot Size by Bathroom Count (Bubble Plot)

1-bath and 1.5-bath homes have larger average lot sizes (~0.33–0.49 acres) compared to 2-bath and above (~0.28 or less).

Takeaway: More land = more flexibility for expansions (bathroom, flex room). This strengthens the case for targeting 1-bath homes.

  • Distribution of Total Payments vs Average Rent (Density Plot)

Red dashed line = average rent ($2,265). 1-bath homes cluster closer to or below this line, while 2-bath homes often push above it.

Takeaway: Owning a 1-bath home is more likely to match or beat rental income, leaving more room for positive cash flow after improvements.

  • Average House Price by Bathroom Count (Bar Chart)

1-bath house average = $286k, significantly lower than 2+ baths (ranging $334k–$400k). This confirms that entering with a 1-bath home saves ~$50k–$100k in upfront purchase price.

Takeaway: If renovation costs (bathroom + flex room) are less than this gap, you come out ahead.

  • Square Footage vs Price by Bathroom Count (Faceted Scatterplots)

Across facets, 1-bath homes show a lower slope but remain consistent with price per sq ft. 2-bath homes and above escalate faster.

Takeaway: Bathroom count interacts with square footage — 2-bath homes are “rewarded” more per additional square foot.

Overall Investment Insights

1-bath homes are cheaper, have lower GRM, and often come with larger lots — making them prime candidates for value-add projects.

2-bath homes are priced higher and already include the rent premium—you pay upfront but avoid renovation.

Given my goal, the numbers favor buying 1-bath homes and adding a bathroom/flex room if renovation costs are below the ~$50k+ premium the market already charges for 2-bath homes.

5 Statistical Test

5.1 T-Test

A t-test is a statistical tool used to determine if there is a significant difference between the means of two groups

5.1.1 Comparing Average Price by ZIP Code

t_test_price <- t.test(Price ~ factor(zip_code), data = DB3_ONE)

t_test_price

    Welch Two Sample t-test

data:  Price by factor(zip_code)
t = -4.3857, df = 100, p-value = 2.864e-05
alternative hypothesis: true difference in means between group 49423 and group 49424 is not equal to 0
95 percent confidence interval:
 -59200.90 -22322.32
sample estimates:
mean in group 49423 mean in group 49424 
           310158.9            350920.5 

Interpretation

49423 (Holland south side) is significantly more affordable on average than 49424 (Holland north side).

The ~$40k gap (midpoint of CI) is meaningful → I could enter the market cheaper in 49423.

The true difference in average prices is very likely between $22,000 and $59,000 cheaper.

This matters for my strategy because if 49423 homes also come with larger lots and cheaper taxes, they could be ideal candidates for 1-bath purchases and renovations.

On the other hand, 49424 may rent higher, so we’d want to compare rent-to-price ratios (GRM) next.

5.1.2 Comparing Average GRM by Zipcode

t_test_grm <- t.test(GRM ~ factor(zip_code), data = DB3_ONE)
t_test_grm

    Welch Two Sample t-test

data:  GRM by factor(zip_code)
t = -4.3857, df = 100, p-value = 2.864e-05
alternative hypothesis: true difference in means between group 49423 and group 49424 is not equal to 0
95 percent confidence interval:
 -2.1777974 -0.8211615
sample estimates:
mean in group 49423 mean in group 49424 
           11.40968            12.90916 

Interpretation

49423 isn’t just cheaper in price — it also gives me better rental efficiency.

A lower GRM means rent covers more of the property price, so I’ll get closer to cash flow or a stronger return.

With mean GRMs:

49423 → 11.4

49424 → 12.9

→ That’s about a 12% efficiency gap in favor of 49423.

5.1.3 Comparing Average Mortgage by ZIP Code**

t_test_Payment <- t.test(Total_Payment ~ factor(zip_code), data = DB3_ONE)
t_test_Payment

    Welch Two Sample t-test

data:  Total_Payment by factor(zip_code)
t = -3.2693, df = 100.78, p-value = 0.001476
alternative hypothesis: true difference in means between group 49423 and group 49424 is not equal to 0
95 percent confidence interval:
 -335.8264  -82.1807
sample estimates:
mean in group 49423 mean in group 49424 
           2168.868            2377.872 

Interpretation

Not only are homes in 49423 cheaper in purchase price, they also come with lower monthly payments.

That means investors in 49423 start with a lower financial burden each month compared to 49424.

5.1.3.1 Investment Takeaway

When I line up the three tests so far:

  • Price: 49423 homes are ~$22k–$59k cheaper.

  • GRM: 49423 has ~1.5 points lower GRM (better rental efficiency).

  • Total Payment: 49423 homes cost ~$82–$336 less per month to carry.

Together, that paints a clear picture: 49423 is the more cost-effective ZIP for my strategy (buying a 1-bath home, adding a bathroom/flex room, and pushing rents higher).

Evidence so far from Analysis

Average rent from actual Holland rental properties: using = $2,265.

Total payments in 49423 average $2,169, which is below average rent → this gives me wiggle room.

GRM in 49423 = 11.4, which is already better than 49424’s 12.9. If you can push rent above average, that GRM drops further, making the deal even stronger.

5.2 ANOVA

ANOVA (Analysis of Variance) is a statistical method used to determine if there are significant differences between the means of three or more groups by comparing the variance between groups.

Key things to know

If significant (p < 0.05), follow up with Tukey’s HSD to see which bathroom groups differ:

5.2.1 ANOVA: Price by Bathrooms

anova_price <- aov(Price ~ factor(Baths), data = DB3_ONE)
summary(anova_price)
              Df    Sum Sq   Mean Sq F value   Pr(>F)    
factor(Baths)  5 8.352e+10 1.670e+10   8.723 7.17e-07 ***
Residuals     98 1.877e+11 1.915e+09                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Pr(>F) = 7.17e-07

Way below 0.05 → statistically significant.

This means average home prices do differ significantly by bathroom count.

TukeyHSD(anova_price)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = Price ~ factor(Baths), data = DB3_ONE)

$`factor(Baths)`
              diff        lwr       upr     p adj
1.5-1    47914.286   12528.73  83299.85 0.0020939
2-1      52276.341   20223.34  84329.35 0.0001036
2.5-1    79679.670   36987.30 122372.04 0.0000061
3-1      72914.286  -56543.75 202372.33 0.5761877
3.5-1   112964.286   19858.72 206069.85 0.0081958
2-1.5     4362.056  -29159.78  37883.89 0.9989637
2.5-1.5  31765.385  -12040.52  75571.28 0.2918911
3-1.5    25000.000 -104829.51 154829.51 0.9933292
3.5-1.5  65050.000  -28571.39 158671.39 0.3388139
2.5-2    27403.329  -13757.50  68564.16 0.3872543
3-2      20637.944 -108323.13 149599.02 0.9972049
3.5-2    60687.944  -31725.38 153101.27 0.4029764
3-2.5    -6765.385 -138773.73 125242.96 0.9999895
3.5-2.5  33284.615  -63335.60 129904.83 0.9164354
3.5-3    40050.000 -115745.41 195845.41 0.9753555

Interpretation

Key Results (focusing on significant ones)

1.5 vs 1 bath → +$47,914 (p = 0.002)

1.5-bath homes are significantly more expensive than 1-bath homes.

2 vs 1 bath → +$52,276 (p < 0.001)

2-bath homes are significantly more expensive than 1-bath homes.

2.5 vs 1 bath → +$79,680 (p < 0.001)

2.5-bath homes are significantly more expensive than 1-bath homes.

3.5 vs 1 bath → +$112,964 (p = 0.008)

3.5-bath homes are significantly more expensive than 1-bath homes.

5.2.2 ANOVA: GRM by Bathrooms

anova_grm <- aov(GRM ~ factor(Baths), data = DB3_ONE)
summary(anova_grm)
              Df Sum Sq Mean Sq F value   Pr(>F)    
factor(Baths)  5    113  22.604   8.723 7.17e-07 ***
Residuals     98    254   2.591                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(anova_grm)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = GRM ~ factor(Baths), data = DB3_ONE)

$`factor(Baths)`
              diff        lwr      upr     p adj
1.5-1    1.7626018  0.4608887 3.064315 0.0020939
2-1      1.9230668  0.7439470 3.102187 0.0001036
2.5-1    2.9311411  1.3606355 4.501647 0.0000061
3-1      2.6822659 -2.0800503 7.444582 0.5761877
3.5-1    4.1555676  0.7305339 7.580601 0.0081958
2-1.5    0.1604650 -1.0726881 1.393618 0.9989637
2.5-1.5  1.1685393 -0.4429292 2.780008 0.2918911
3-1.5    0.9196640 -3.8563173 5.695645 0.9933292
3.5-1.5  2.3929658 -1.0510433 5.836975 0.3388139
2.5-2    1.0080742 -0.5060913 2.522240 0.3872543
3-2      0.7591990 -3.9848356 5.503234 0.9972049
3.5-2    2.2325008 -1.1670676 5.632069 0.4029764
3-2.5   -0.2488752 -5.1050083 4.607258 0.9999895
3.5-2.5  1.2244265 -2.3298989 4.778752 0.9164354
3.5-3    1.4733018 -4.2578758 7.204479 0.9753555

Key Significant Differences

1.5 vs 1 bath → GRM higher by +1.76 (p = 0.002)

2 vs 1 bath → GRM higher by +1.92 (p < 0.001)

2.5 vs 1 bath → GRM higher by +2.93 (p < 0.001)

3.5 vs 1 bath → GRM higher by +4.16 (p = 0.008)

Interpretation: Properties with more bathrooms consistently have higher GRMs compared to 1-bath homes → meaning they’re less efficient rentals relative to price.

6 Model

6.1 Multiple Regression

Multiple regression is a statistical method that estimates how several independent variables (like square footage, bathrooms, and lot size) together explain or predict the value of a dependent variable (like home price).

rent_model <- lm(Price ~ factor(Baths) + Bed  + Square_Footage + Acre_Lot + factor(zip_code),
                 data = DB3_ONE)

summary(rent_model)

Call:
lm(formula = Price ~ factor(Baths) + Bed + Square_Footage + Acre_Lot + 
    factor(zip_code), data = DB3_ONE)

Residuals:
    Min      1Q  Median      3Q     Max 
-106734  -24617    5325   24218   87292 

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)    
(Intercept)           195018.93   22679.95   8.599 1.74e-13 ***
factor(Baths)1.5       18054.02   11579.82   1.559 0.122335    
factor(Baths)2         25969.40   11157.26   2.328 0.022080 *  
factor(Baths)2.5       51853.94   13768.51   3.766 0.000289 ***
factor(Baths)3         11367.15   43092.36   0.264 0.792522    
factor(Baths)3.5       41087.07   31672.07   1.297 0.197715    
Bed                     4909.85    6302.37   0.779 0.437908    
Square_Footage            49.38      12.77   3.868 0.000202 ***
Acre_Lot               21370.49    9693.01   2.205 0.029913 *  
factor(zip_code)49424  26943.37    8114.88   3.320 0.001281 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 37560 on 94 degrees of freedom
Multiple R-squared:  0.511, Adjusted R-squared:  0.4641 
F-statistic: 10.91 on 9 and 94 DF,  p-value: 1.849e-11
#install.packages("car")
#library(car)
#vif(rent_model)

6.1.1 Key Findings from the Coefficients

Key Regression Results

Bathrooms (reference = 1 bath)

1.5 baths: +$18,054 (p = 0.12 → not significant)

2 baths: +$25,969 (p = 0.022 → significant)

2.5 baths: +$51,853 (p < 0.001 → very significant)

3 baths: +$11,367 (p = 0.79 → not significant)

3.5 baths: +$41,087 (p = 0.20 → not significant)

Confirmed: the big jump is 1 → 2 baths (~$26K), then another ~$26K (51,853−25,969) when going 2 → 2.5 baths.

Square Footage

+$49.38 per sq.ft. (p < 0.001, highly significant)

Acre Lot

+$21,370 per acre (p = 0.029, significant)

Zip Code (49424 vs 49423 baseline)

+$26,943 premium (p = 0.001, very significant)

Bedrooms

+$490 per bed (not significant, p = 0.44)

Bedrooms don’t matter much in the dataset compared to bathrooms, sq.ft., and lot size.

Overall, the data suggest the best strategy is to target 1-bath homes in 49423 with decent square footage and lot size, then add a bathroom (and possibly a flex room) to capture both equity growth and stronger rental demand.

7 Example using the numbers from Regression Equation

Baseline Property (Made Up)

1 bath

1,277 sq.ft.

Price = $286,000

Add a Full Bath (50 sq.ft.)

  • Bathroom premium (1 → 2): +$25,969
    • Sq.ft. premium (50 × $49.38): +$2,469
    • Subtotal = +$28,438

Add a Half Bath

  • Bathroom premium (2 → 2.5): +$25,884 (51,853 − 25,969)
    • Subtotal = +$25,884

Add a Flex Room (500 sq.ft.)

  • Sq.ft. premium (500 × $49.38): +$24,690

Total Value Added

  • Full bath (with sq.ft.): +$28,438

  • Half bath: +$25,884

  • Flex room: +$24,690

  • Grand Total = +$79,012

Projected ARV

$286,000 + $79,012 = $365,012

In the example above, the regression told us:

  • Adding a full bathroom typically raises value by about $26K (1 → 2 baths).

  • Each extra square foot adds about $49.

  • And going from 2 baths to 2.5 adds another $26K.

By plugging those numbers into our “made up” property, we can estimate that renovating a $286K 1-bath home could realistically increase its value to around $365K.

The benefit: it gives investors and homeowners a data-driven estimate of how much renovations are worth in the market, instead of relying on gut feeling.

This code below is filtering the dataset to only include 1-bathroom homes in ZIP code 49423, then calculating the average monthly cost of ownership (Total_Payment = mortgage + taxes + insurance) for those properties. By doing this, I can see what the “typical” all-in mortgage payment looks like in 49423. Since earlier analysis showed that 49423 is more cost-effective compared to 49424, this step helps confirm how much lower the average monthly cost is when targeting entry-level 1-bath homes in that ZIP.

# Step 1: Filter to 49423 and 1-bath homes
one_bath_49423 <- DB3_ONE %>%
  filter(zip_code == "49423", Baths == 1)

# Step 2: Get average mortgage of those homes
avg_price_1bath_49423 <- mean(one_bath_49423$Total_Payment, na.rm = TRUE)


avg_price_1bath_49423 # Average Mortgage 
[1] 1991.574

We can use this number later in our analysis.

7.0.1 With my Numbers

  • Price = $286,000

GRM 8:

286,000÷(8×12)=286,000÷96= $2,979/month

GRM 9:

286,000÷(9×12)=286,000÷108=$2,648/month

I will take the Median of these two numbers and find the sweet spot

$2979 + $2648 = $5627/2 = $2813.50/month in rent - is that possible?

Market average rent in Holland: ~$2,265 (lower than this).

upgraded property (extra bath + flex space): positions me above average.

GRM 8–9 rents ($2,648–$2,979) are a stretch above market but not out of line if the house:

  • Has the added bathrooms (big deal for families)

  • Flex space makes it feel larger/modern.

  • Location is desirable.

  • You target the right tenants (families, professionals, etc.).

$2,813/month is possible

  • It’s not the “average” renter number, but it’s plausible with upgrades.

At that rent:

  • Mortgage = ~$1,991 (fixed). Spread = +$822/month before reserves.

8 Summary

By starting with a 1-bath home priced around $286,000 for example, I used GRM benchmarks to identify a target rent of about $2,813/month (the midpoint between GRM 8 and 9). While the current market average rent in Holland is lower (~$2,265), upgrades such as a full bath, half bath, and a flex space can position the property above average and make that target rent realistic.

The advantage of this approach:

Mortgage Stability → The VA loan keeps the monthly payment fixed at ~$1,991, even after renovations. At $2,813 rent, that leaves a spread of +$822/month before reserves.

DIY Cost Control → By completing much of the renovation work myself (with family/friend help), project costs are kept well below the ~$79K in added value estimated by regression. As long as the total DIY investment doesn’t exceed this value, I’m capturing equity at a discount while boosting rent potential.

✅ Conclusion: Buying a lower-cost 1-bath and strategically adding value through affordable DIY upgrades is more cost-effective than purchasing a 2-bath upfront, since the mortgage stays lower, equity increases, and rents can justify the improvements.

9 DIY ROI Scenario

DIY ROI Scenario

🛠️ Renovation Plan (Value-Add)

Full bath (50 sq.ft.) → +$28,438

Half bath (basement) → +$25,884

Flex room (500 sq.ft.) → +$24,690

Total Value Added = ~$79,012

💰 DIY Cost Assumptions Let’s say you can keep costs down with DIY work and family help:

Full bath addition (DIY + subs): ~$12,000

Half bath (basement): ~$5,000

Flex room (500 sq.ft.): ~$8,000

Total DIY Cost = ~$25,000

📊 ROI Calculation

Equity Gained: $79,012

DIY Costs: $25,000

Net Value Created: $54,012

ROI = (54,012/25,000) X 100 = 216%

9.0.1 Conclusion

Not only does the DIY strategy deliver a 216% ROI on renovations, but with rent around $2,813/month, the property can produce steady cash flow on top of added equity.

This is an example—results will vary property by property—but the beauty of the regression model is that we can plug any home’s numbers into the code and instantly update the projections as you can see below:

10 Code Chunks: Quick Property Value-Add Simulator

library(glue)
library(scales)

# ========= MANUAL INPUTS (from Realtor/Zillow) ==========================
price               <- 249000    # purchase price you type in
piti_month          <- 1737    # monthly PITI (typed in manually)
zip_code            <- 49423     # the house's ZIP (used for tax rate rules)
avg_rent            <- 2265.32   # your market rent Holland

# Value-add plan (same as before)
baths_current       <- 1
add_full_bath_sqft  <- 50
add_half_bath       <- TRUE
add_flex_sqft       <- 500

# Premiums from your regression
bath_premium_2_vs_1   <- 25969.40
bath_premium_2_5_vs_2 <- (51853.94 - 25969.40)
value_per_sqft        <- 49.38

# ========= TAX & INSURANCE RULES (recalc to match manual price) ===
# Property taxes: 49423 = 1.30%/yr, 49424 = 1.04%/yr, else 1.20%/yr
tax_rate_ann <- dplyr::case_when(
  zip_code == 49423 ~ 0.0130,
  zip_code == 49424 ~ 0.0104,
  TRUE              ~ 0.0120
)
tax_month <- price * tax_rate_ann / 12

# Insurance: 0.45%/yr (from your code)
ins_rate_ann <- 0.0045
ins_month    <- price * ins_rate_ann / 12

# (Optional) HOA if known, else 0
hoa_month <- 0

# ========= CALCULATIONS ================================================
# Value add
delta_full_bath <- if (baths_current == 1) bath_premium_2_vs_1 else 0   # 1 → 2 baths
delta_half_bath <- if (add_half_bath) bath_premium_2_5_vs_2 else 0      # + half bath
delta_sqft      <- value_per_sqft * (add_full_bath_sqft + add_flex_sqft) # added sqft value

value_add_total <- delta_full_bath + delta_half_bath + delta_sqft
arv_estimate    <- price + value_add_total

# GRM targets (what rent you'd need at GRM 7/8/9)
rent_grm7 <- price / (7 * 12)
rent_grm8 <- price / (8 * 12)
rent_grm9 <- price / (9 * 12)

# --- PITI vs GRM targets ------------------------------------

# coverage ratios (how many times rent covers PITI)
cov7 <- rent_grm7 / piti_month
cov8 <- rent_grm8 / piti_month
cov9 <- rent_grm9 / piti_month


# dollar gaps (rent target minus PITI)
gap7 <- rent_grm7 - piti_month
gap8 <- rent_grm8 - piti_month
gap9 <- rent_grm9 - piti_month

# implied GRM if you rented exactly at your PITI
grm_from_piti <- price / (piti_month * 12)


# ========= PRETTY OUTPUT ===============================================
# sq ft breakdown lines
sqft_lines <- c(
  if (add_full_bath_sqft > 0) glue("• {add_full_bath_sqft} sq ft for the new full bath"),
  if (add_flex_sqft > 0)      glue("• {add_flex_sqft} sq ft for the flex room")
)
sqft_lines <- paste(sqft_lines[nzchar(sqft_lines)], collapse = "\n")

knitr::asis_output(glue("
### Quick Property Value-Add Summary

**Purchase price:**  {dollar(price)}

**PITI:** {dollar(piti_month)}

**ZIP:**  {zip_code}  

**Planned improvements & value add**

- Converting from {baths_current} → 2 baths adds about:  {dollar(delta_full_bath)}.  
{if (add_half_bath) paste0('- Adding a half bath adds about:  ', dollar(delta_half_bath), '.') else ''}  
- Adding a total of {add_full_bath_sqft + add_flex_sqft} sq ft at:  ~{dollar(value_per_sqft)} per sq ft adds  {dollar(delta_sqft)}.  
{sqft_lines}

**Estimated value added (total):**  {dollar(value_add_total)}

**Projected ARV (after renos):**  {dollar(arv_estimate)}  

**Rent targets from GRM (given this price)**

- GRM 7 ⇒ ~{dollar(rent_grm7)} / month  
- GRM 8 ⇒ ~{dollar(rent_grm8)} / month  
- GRM 9 ⇒ ~{dollar(rent_grm9)} / month  

**Your PITI (entered):** {dollar(piti_month)} / month

**How targets compare to your PITI**

- GRM 7: target {dollar(rent_grm7)} → {if (gap7>=0) 'above' else 'below'} PITI by {dollar(abs(gap7))} (coverage: {round(cov7, 2)}×)

- GRM 8: target {dollar(rent_grm8)} → {if (gap8>=0) 'above' else 'below'} PITI by {dollar(abs(gap8))} (coverage: {round(cov8, 2)}×)

- GRM 9: target {dollar(rent_grm9)} → {if (gap9>=0) 'above' else 'below'} PITI by {dollar(abs(gap9))} (coverage: {round(cov9, 2)}×)

**Implied GRM at your PITI:** {round(grm_from_piti, 2)}

"))

10.0.1 Quick Property Value-Add Summary

Purchase price: $249,000

PITI: $1,737

ZIP: 49423

Planned improvements & value add

  • Converting from 1 → 2 baths adds about: $25,969.40.
  • Adding a half bath adds about: $25,884.54.
  • Adding a total of 550 sq ft at: ~$49.38 per sq ft adds $27,159.
    • 50 sq ft for the new full bath • 500 sq ft for the flex room

Estimated value added (total): $79,012.94

Projected ARV (after renos): $328,013

Rent targets from GRM (given this price)

  • GRM 7 ⇒ ~$2,964.29 / month
  • GRM 8 ⇒ ~$2,593.75 / month
  • GRM 9 ⇒ ~$2,305.56 / month

Your PITI (entered): $1,737 / month

How targets compare to your PITI

  • GRM 7: target $2,964.29 → above PITI by $1,227.29 (coverage: 1.71×)

  • GRM 8: target $2,593.75 → above PITI by $856.75 (coverage: 1.49×)

  • GRM 9: target $2,305.56 → above PITI by $568.56 (coverage: 1.33×)

Implied GRM at your PITI: 11.95