library(tidyverse)
library(ggplot2)
library(DT)
The core challenge of this project is to determine the most cost-effective entry point into the Holland, MI housing market using a VA loan. Specifically, the analysis seeks to answer whether it is better to purchase a more affordable 1-bathroom property and create value through DIY renovations (adding bathrooms and flex space), or to buy a higher-priced home that already has multiple bathrooms. By evaluating purchase prices, mortgage costs, rental income potential, and renovation value-adds, the objective is to identify a strategy that balances affordability, equity growth, and long-term rental cash flow stability.
DB1 <- read_csv("realtor.csv")
[1] "BrokerTitle_titleText__RL8Gk"
[2] "card-anchor href"
[3] "LazyPicturestyles__StyledLazyPicture-rui__pond98-0 src"
[4] "Tagstyles__StyledTag-rui__sc-9ogsft-0"
[5] "VisuallyHiddenstyles__StyledVisuallyHidden-rui__sc-1ohi4lq-0"
[6] "SavePropertyIconImage_saveIconImage__Zqz5o src"
[7] "VisuallyHiddenstyles__StyledVisuallyHidden-rui__sc-1ohi4lq-0 2"
[8] "base__StyledType-rui__sc-18muj27-0"
[9] "Price"
[10] "Bed"
[11] "Baths"
[12] "Square_Footage"
[13] "Acre_Lot"
[14] "zip_code"
# Clean up price column
DB1$Price <- gsub("\\$|,", "", DB1$Price) # remove $ and ,
DB1$Price <- as.numeric(DB1$Price) # convert to numeric
summary(DB1)
Price Bed Baths Square_Footage
Min. : 189900 Min. :1.000 Min. :1.000 Min. : 531
1st Qu.: 364900 1st Qu.:3.000 1st Qu.:2.000 1st Qu.: 1640
Median : 444900 Median :4.000 Median :2.500 Median : 2025
Mean : 651828 Mean :3.697 Mean :2.442 Mean : 2403
3rd Qu.: 584000 3rd Qu.:4.000 3rd Qu.:2.500 3rd Qu.: 2755
Max. :9999900 Max. :9.000 Max. :9.500 Max. :14415
Acre_Lot zip_code
Min. : 0.000 49419: 6
1st Qu.: 0.120 49423:137
Median : 0.250 49424:145
Mean : 0.775 49434: 1
3rd Qu.: 0.440 49453: 6
Max. :22.270 49464: 42
The z-score is a way of measuring how far something is from “normal” in terms of standard deviations (think of it like “steps” away from the middle).
A z-score of 0 = right at the average.
A z-score of +1 or −1 = close to average.
A z-score of +5 = way out there (like the mansion).
So when I say: “remove anything with a z-score above 3 or below −3,” I’m just saying:
“Take out the stuff that’s too far from the pack, because it doesn’t represent the normal market.” Which is an outlier.
Why It Matters?
By removing those extreme cases:
Your averages now tell a true story about the homes you’d actually buy.
You don’t get tricked into thinking the market is more expensive (or cheaper) than it really is.
# Create new z-score columns alongside your data
DB1$z_price <- scale(DB1$Price)
DB1$z_bed <- scale(DB1$Bed)
DB1$z_bath <- scale(DB1$Baths)
DB1$z_Square_Footage <- scale(DB1$Square_Footage)
DB1$z_Acre_Lot <- scale(DB1$Acre_Lot)
#-----------------------------------------------------
# variables I want to CHECK for outliers (still keep all other columns)
check_vars <- c("Price", "Bed", "Baths", "Square_Footage", "Acre_Lot")
DB3 <- DB1 %>%
# add z-score columns just for the vars you want to check
mutate(across(all_of(check_vars), ~ as.numeric(scale(.x)), .names = "z_{.col}")) %>%
# keep rows where ALL checked z-scores are within (-3, 3)
filter(if_all(paste0("z_", check_vars), ~ .x > -3 & .x < 3))
summary(DB3)
Price Bed Baths Square_Footage
Min. : 189900 Min. :1.000 Min. :1.000 Min. : 531
1st Qu.: 359900 1st Qu.:3.000 1st Qu.:2.000 1st Qu.:1607
Median : 439678 Median :4.000 Median :2.500 Median :1960
Mean : 514442 Mean :3.622 Mean :2.325 Mean :2201
3rd Qu.: 559656 3rd Qu.:4.000 3rd Qu.:2.500 3rd Qu.:2619
Max. :2599000 Max. :6.000 Max. :5.000 Max. :5400
Acre_Lot zip_code z_price.V1
Min. :0.0000 49419: 5 Min. :-0.522995245337
1st Qu.:0.0900 49423:131 1st Qu.:-0.330521144857
Median :0.2300 49424:136 Median :-0.240197012204
Mean :0.5158 49434: 1 Mean :-0.155548873549
3rd Qu.:0.4000 49453: 6 3rd Qu.:-0.104357000539
Max. :6.0300 49464: 41 Max. : 2.204589198580
z_bed.V1 z_bath.V1 z_Square_Footage
Min. :-2.7505313288100 Min. :-1.3768466771800 Min. :-1.2948
1st Qu.:-0.7110834568420 1st Qu.:-0.4221196602870 1st Qu.:-0.5505
Median : 0.3086404791400 Median : 0.0552438481584 Median :-0.3066
Mean :-0.0769426341532 Mean :-0.1118333797980 Mean :-0.1393
3rd Qu.: 0.3086404791400 3rd Qu.: 0.0552438481584 3rd Qu.: 0.1496
Max. : 2.3480883511000 Max. : 2.4420613903900 Max. : 2.0733
z_Acre_Lot z_Price z_Bed z_Baths
Min. :-0.3906 Min. :-0.5230 Min. :-2.75053 Min. :-1.37685
1st Qu.:-0.3452 1st Qu.:-0.3305 1st Qu.:-0.71108 1st Qu.:-0.42212
Median :-0.2747 Median :-0.2402 Median : 0.30864 Median : 0.05524
Mean :-0.1306 Mean :-0.1555 Mean :-0.07694 Mean :-0.11183
3rd Qu.:-0.1890 3rd Qu.:-0.1044 3rd Qu.: 0.30864 3rd Qu.: 0.05524
Max. : 2.6483 Max. : 2.2046 Max. : 2.34809 Max. : 2.44206
DB3_ONE <- select(DB3,Price:zip_code)
# Replace 0 with NA, then NA with mean for multiple columns
cols <- c("Price", "Bed", "Baths", "Square_Footage", "Acre_Lot") # choose your variables
for (col in cols) {
DB3[[col]][DB3[[col]] == 0] <- NA
DB3[[col]][is.na(DB3[[col]])] <- mean(DB3[[col]], na.rm = TRUE)
}
DB3_ONE <- DB3_ONE %>%
filter(Price >= 150000 & Price <= 400000)
To keep my analysis focused on realistic investment opportunities, I filtered the dataset to only include homes priced between $150,000 and $400,000. By focusing on this price band, I can better compare typical homes—rather than luxury properties or distressed sales—that align with the kinds of investments I may be considering.
In this section of my R project, I built the financial backbone of my housing market analysis for Holland, MI. Step by step, here’s what I’m doing:
Filter to Target ZIP Codes → I only keep homes in 49423 and 49424, the areas I’m focusing on for potential investment.
Calculate Mortgage → Using a 30-year VA loan at 5.27%, I model monthly principal & interest for each property.
Add Property Taxes → Since taxes vary by ZIP, I assign 1.30% for 49423 and 1.04% for 49424 (monthly basis).
Add Insurance → Estimate annual homeowner’s insurance at 0.45% of home value.
Market Rent → I set the average Holland rent at $2,265/month, which becomes the baseline for rent comparisons.
GRM (Gross Rent Multiplier) → I calculate GRM = Price ÷ Annual Rent, to measure rent efficiency relative to purchase price.
Total Payment → I sum Mortgage + Taxes + Insurance to see the all-in monthly cost of owning.
One Percent Rule → I create a quick-check variable for the classic 1% rule (does monthly rent ≥ 1% of property value?).
# Filtering to only Holland Zip codes
DB3_ONE <- DB3_ONE %>%
filter(zip_code %in% c("49423", "49424"))
# Creating new Variables for my analysis
loan_rate <- 0.0527 # 5.27% VA loan
n <- 30 * 12 # 30 years
DB3_ONE <- DB3_ONE %>%
mutate(
Mortgage = Price * ((loan_rate/12) * (1 + loan_rate/12)^n) / ((1 + loan_rate/12)^n - 1)
)
#--------------------------------------------------------------------------------
DB3_ONE <- DB3_ONE %>%
mutate(
Taxes = case_when(
zip_code == 49423 ~ Price * 0.0130 / 12, # 1.30% annual, monthly
zip_code == 49424 ~ Price * 0.0104 / 12, # 1.04% annual, monthly
TRUE ~ Price * 0.012 / 12 # fallback average (1.2%)
)
)
#-------------------------------------------------------------------------------
DB3_ONE <- DB3_ONE %>%
mutate(
Insurance = Price * 0.0045 / 12 # 0.45% annual, monthly
)
# ------------------------------------------------------------------------------
DB3_ONE$Average_Rent <- 2265.32 # Holland Market Average Rent
#-------------------------------------------------------------------------------
annual_rent <- 2265.32 * 12
DB3_ONE <- DB3_ONE %>%
mutate(
GRM = Price / annual_rent)
# ------------------------------------------------------------------------------
DB3_ONE <- DB3_ONE %>%
mutate(
Total_Payment = Mortgage + Taxes + Insurance
)
# -----------------------------------------------------------------------------
DB3_ONE <- DB3_ONE %>%
mutate(
One_Percent = Price * .01
)
summary(DB3_ONE)
Price Bed Baths Square_Footage
Min. :189900 Min. :1.000 Min. :1.000 Min. : 531
1st Qu.:295000 1st Qu.:3.000 1st Qu.:1.000 1st Qu.:1365
Median :339950 Median :3.000 Median :1.750 Median :1589
Mean :328972 Mean :3.279 Mean :1.716 Mean :1585
3rd Qu.:371586 3rd Qu.:4.000 3rd Qu.:2.000 3rd Qu.:1783
Max. :400000 Max. :6.000 Max. :3.500 Max. :2673
Acre_Lot zip_code Mortgage Taxes Insurance
Min. :0.0000 49419: 0 Min. :1051 Min. :190.6 Min. : 71.21
1st Qu.:0.1500 49423:56 1st Qu.:1633 1st Qu.:296.3 1st Qu.:110.62
Median :0.2300 49424:48 Median :1881 Median :320.9 Median :127.48
Mean :0.3086 49434: 0 Mean :1821 Mean :321.3 Mean :123.36
3rd Qu.:0.3100 49453: 0 3rd Qu.:2057 3rd Qu.:346.6 3rd Qu.:139.34
Max. :3.6900 49464: 0 Max. :2214 Max. :433.3 Max. :150.00
Average_Rent GRM Total_Payment One_Percent
Min. :2265 Min. : 6.986 Min. :1328 Min. :1899
1st Qu.:2265 1st Qu.:10.852 1st Qu.:2055 1st Qu.:2950
Median :2265 Median :12.506 Median :2337 Median :3400
Mean :2265 Mean :12.102 Mean :2265 Mean :3290
3rd Qu.:2265 3rd Qu.:13.669 3rd Qu.:2538 3rd Qu.:3716
Max. :2265 Max. :14.715 Max. :2797 Max. :4000
Key Takeaways from Descriptive Statistics
Looking at the homes I selected in the $189,900 to $400,000 price range, the median property is about $339,950, with a typical total monthly payment (mortgage, taxes, and insurance) of around $2,337.
Since the average market rent is $2,265, most properties in this range are close to break-even, so the way I approach value-add is crucial.
The GRM spans from 6.99 to 14.7, with lower GRMs being stronger rental candidates. Right now, the 1% rule doesn’t hold in this market, so I need to rely more on GRM and actual cash flow.
This is where bathrooms come in: buying a 1-bathroom house at the lower end of the price range may allow me to save upfront, and then add a second bathroom and a flex room to boost square footage and functionality.That could push the monthly rent above the market average, which would lower the GRM and make the property more attractive as an investment.
On the other hand, buying a 2-bathroom home with more space already built in comes with a higher upfront mortgage, but might also come with fewer renovation costs and faster rental readiness.
The key for me is to compare these scenarios side by side—average price, mortgage, acres, square footage, and bathrooms—to see whether adding value to a 1-bath home is more cost-effective than buying a 2-bath home outright.
To guide my decision, I will use multiple regression, ANOVA, and other statistical analyses to evaluate how factors like bathrooms, square footage, and lot size influence both property value and rental potential. The key for me is to compare these scenarios side by side—average price, mortgage, acres, square footage, and bathrooms—to see whether adding value to a 1-bath home is more cost-effective than buying a 2-bath home outright.
Takeaways from Charts
Clear upward trend: 1-bath homes average lower in price (~$286k) than 1.5, 2, or more bath homes (up to ~$400k). The jump in price from 1 bath → 2 baths is significant.
Takeaway: I can enter the market cheaper with 1 bath, but 2-bath homes carry a premium that may already price in the added value.
Square footage is strongly tied to price, but bath count shifts the trend upward. For the same square footage, 2-bath homes often sell higher than 1-bath homes. This suggests bathrooms add independent value beyond just square footage.
Takeaway: Adding a bathroom may “reposition” a house into a higher price bracket.
1-bath homes have lower GRMs (~11 median) than 2-bath homes (~13–14). Lower GRM = better rental value relative to price.
Takeaway: From an investor’s standpoint, 1-bath homes are better value buys. If you can add a bathroom without overspending, you could preserve this advantage while boosting rent.
1-bath and 1.5-bath homes have larger average lot sizes (~0.33–0.49 acres) compared to 2-bath and above (~0.28 or less).
Takeaway: More land = more flexibility for expansions (bathroom, flex room). This strengthens the case for targeting 1-bath homes.
Red dashed line = average rent ($2,265). 1-bath homes cluster closer to or below this line, while 2-bath homes often push above it.
Takeaway: Owning a 1-bath home is more likely to match or beat rental income, leaving more room for positive cash flow after improvements.
1-bath house average = $286k, significantly lower than 2+ baths (ranging $334k–$400k). This confirms that entering with a 1-bath home saves ~$50k–$100k in upfront purchase price.
Takeaway: If renovation costs (bathroom + flex room) are less than this gap, you come out ahead.
Across facets, 1-bath homes show a lower slope but remain consistent with price per sq ft. 2-bath homes and above escalate faster.
Takeaway: Bathroom count interacts with square footage — 2-bath homes are “rewarded” more per additional square foot.
Overall Investment Insights
1-bath homes are cheaper, have lower GRM, and often come with larger lots — making them prime candidates for value-add projects.
2-bath homes are priced higher and already include the rent premium—you pay upfront but avoid renovation.
Given my goal, the numbers favor buying 1-bath homes and adding a bathroom/flex room if renovation costs are below the ~$50k+ premium the market already charges for 2-bath homes.
A t-test is a statistical tool used to determine if there is a significant difference between the means of two groups
t_test_price <- t.test(Price ~ factor(zip_code), data = DB3_ONE)
t_test_price
Welch Two Sample t-test
data: Price by factor(zip_code)
t = -4.3857, df = 100, p-value = 2.864e-05
alternative hypothesis: true difference in means between group 49423 and group 49424 is not equal to 0
95 percent confidence interval:
-59200.90 -22322.32
sample estimates:
mean in group 49423 mean in group 49424
310158.9 350920.5
Interpretation
49423 (Holland south side) is significantly more affordable on average than 49424 (Holland north side).
The ~$40k gap (midpoint of CI) is meaningful → I could enter the market cheaper in 49423.
The true difference in average prices is very likely between $22,000 and $59,000 cheaper.
This matters for my strategy because if 49423 homes also come with larger lots and cheaper taxes, they could be ideal candidates for 1-bath purchases and renovations.
On the other hand, 49424 may rent higher, so we’d want to compare rent-to-price ratios (GRM) next.
t_test_grm <- t.test(GRM ~ factor(zip_code), data = DB3_ONE)
t_test_grm
Welch Two Sample t-test
data: GRM by factor(zip_code)
t = -4.3857, df = 100, p-value = 2.864e-05
alternative hypothesis: true difference in means between group 49423 and group 49424 is not equal to 0
95 percent confidence interval:
-2.1777974 -0.8211615
sample estimates:
mean in group 49423 mean in group 49424
11.40968 12.90916
Interpretation
49423 isn’t just cheaper in price — it also gives me better rental efficiency.
A lower GRM means rent covers more of the property price, so I’ll get closer to cash flow or a stronger return.
With mean GRMs:
49423 → 11.4
49424 → 12.9
→ That’s about a 12% efficiency gap in favor of 49423.
t_test_Payment <- t.test(Total_Payment ~ factor(zip_code), data = DB3_ONE)
t_test_Payment
Welch Two Sample t-test
data: Total_Payment by factor(zip_code)
t = -3.2693, df = 100.78, p-value = 0.001476
alternative hypothesis: true difference in means between group 49423 and group 49424 is not equal to 0
95 percent confidence interval:
-335.8264 -82.1807
sample estimates:
mean in group 49423 mean in group 49424
2168.868 2377.872
Interpretation
Not only are homes in 49423 cheaper in purchase price, they also come with lower monthly payments.
That means investors in 49423 start with a lower financial burden each month compared to 49424.
When I line up the three tests so far:
Price: 49423 homes are ~$22k–$59k cheaper.
GRM: 49423 has ~1.5 points lower GRM (better rental efficiency).
Total Payment: 49423 homes cost ~$82–$336 less per month to carry.
Together, that paints a clear picture: 49423 is the more cost-effective ZIP for my strategy (buying a 1-bath home, adding a bathroom/flex room, and pushing rents higher).
Evidence so far from Analysis
Average rent from actual Holland rental properties: using = $2,265.
Total payments in 49423 average $2,169, which is below average rent → this gives me wiggle room.
GRM in 49423 = 11.4, which is already better than 49424’s 12.9. If you can push rent above average, that GRM drops further, making the deal even stronger.
ANOVA (Analysis of Variance) is a statistical method used to determine if there are significant differences between the means of three or more groups by comparing the variance between groups.
Key things to know
If significant (p < 0.05), follow up with Tukey’s HSD to see which bathroom groups differ:
anova_price <- aov(Price ~ factor(Baths), data = DB3_ONE)
summary(anova_price)
Df Sum Sq Mean Sq F value Pr(>F)
factor(Baths) 5 8.352e+10 1.670e+10 8.723 7.17e-07 ***
Residuals 98 1.877e+11 1.915e+09
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Pr(>F) = 7.17e-07
Way below 0.05 → statistically significant.
This means average home prices do differ significantly by bathroom count.
TukeyHSD(anova_price)
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = Price ~ factor(Baths), data = DB3_ONE)
$`factor(Baths)`
diff lwr upr p adj
1.5-1 47914.286 12528.73 83299.85 0.0020939
2-1 52276.341 20223.34 84329.35 0.0001036
2.5-1 79679.670 36987.30 122372.04 0.0000061
3-1 72914.286 -56543.75 202372.33 0.5761877
3.5-1 112964.286 19858.72 206069.85 0.0081958
2-1.5 4362.056 -29159.78 37883.89 0.9989637
2.5-1.5 31765.385 -12040.52 75571.28 0.2918911
3-1.5 25000.000 -104829.51 154829.51 0.9933292
3.5-1.5 65050.000 -28571.39 158671.39 0.3388139
2.5-2 27403.329 -13757.50 68564.16 0.3872543
3-2 20637.944 -108323.13 149599.02 0.9972049
3.5-2 60687.944 -31725.38 153101.27 0.4029764
3-2.5 -6765.385 -138773.73 125242.96 0.9999895
3.5-2.5 33284.615 -63335.60 129904.83 0.9164354
3.5-3 40050.000 -115745.41 195845.41 0.9753555
Interpretation
Key Results (focusing on significant ones)
1.5 vs 1 bath → +$47,914 (p = 0.002)
1.5-bath homes are significantly more expensive than 1-bath homes.
2 vs 1 bath → +$52,276 (p < 0.001)
2-bath homes are significantly more expensive than 1-bath homes.
2.5 vs 1 bath → +$79,680 (p < 0.001)
2.5-bath homes are significantly more expensive than 1-bath homes.
3.5 vs 1 bath → +$112,964 (p = 0.008)
3.5-bath homes are significantly more expensive than 1-bath homes.
anova_grm <- aov(GRM ~ factor(Baths), data = DB3_ONE)
summary(anova_grm)
Df Sum Sq Mean Sq F value Pr(>F)
factor(Baths) 5 113 22.604 8.723 7.17e-07 ***
Residuals 98 254 2.591
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(anova_grm)
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = GRM ~ factor(Baths), data = DB3_ONE)
$`factor(Baths)`
diff lwr upr p adj
1.5-1 1.7626018 0.4608887 3.064315 0.0020939
2-1 1.9230668 0.7439470 3.102187 0.0001036
2.5-1 2.9311411 1.3606355 4.501647 0.0000061
3-1 2.6822659 -2.0800503 7.444582 0.5761877
3.5-1 4.1555676 0.7305339 7.580601 0.0081958
2-1.5 0.1604650 -1.0726881 1.393618 0.9989637
2.5-1.5 1.1685393 -0.4429292 2.780008 0.2918911
3-1.5 0.9196640 -3.8563173 5.695645 0.9933292
3.5-1.5 2.3929658 -1.0510433 5.836975 0.3388139
2.5-2 1.0080742 -0.5060913 2.522240 0.3872543
3-2 0.7591990 -3.9848356 5.503234 0.9972049
3.5-2 2.2325008 -1.1670676 5.632069 0.4029764
3-2.5 -0.2488752 -5.1050083 4.607258 0.9999895
3.5-2.5 1.2244265 -2.3298989 4.778752 0.9164354
3.5-3 1.4733018 -4.2578758 7.204479 0.9753555
Key Significant Differences
1.5 vs 1 bath → GRM higher by +1.76 (p = 0.002)
2 vs 1 bath → GRM higher by +1.92 (p < 0.001)
2.5 vs 1 bath → GRM higher by +2.93 (p < 0.001)
3.5 vs 1 bath → GRM higher by +4.16 (p = 0.008)
Interpretation: Properties with more bathrooms consistently have higher GRMs compared to 1-bath homes → meaning they’re less efficient rentals relative to price.
Multiple regression is a statistical method that estimates how several independent variables (like square footage, bathrooms, and lot size) together explain or predict the value of a dependent variable (like home price).
rent_model <- lm(Price ~ factor(Baths) + Bed + Square_Footage + Acre_Lot + factor(zip_code),
data = DB3_ONE)
summary(rent_model)
Call:
lm(formula = Price ~ factor(Baths) + Bed + Square_Footage + Acre_Lot +
factor(zip_code), data = DB3_ONE)
Residuals:
Min 1Q Median 3Q Max
-106734 -24617 5325 24218 87292
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 195018.93 22679.95 8.599 1.74e-13 ***
factor(Baths)1.5 18054.02 11579.82 1.559 0.122335
factor(Baths)2 25969.40 11157.26 2.328 0.022080 *
factor(Baths)2.5 51853.94 13768.51 3.766 0.000289 ***
factor(Baths)3 11367.15 43092.36 0.264 0.792522
factor(Baths)3.5 41087.07 31672.07 1.297 0.197715
Bed 4909.85 6302.37 0.779 0.437908
Square_Footage 49.38 12.77 3.868 0.000202 ***
Acre_Lot 21370.49 9693.01 2.205 0.029913 *
factor(zip_code)49424 26943.37 8114.88 3.320 0.001281 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 37560 on 94 degrees of freedom
Multiple R-squared: 0.511, Adjusted R-squared: 0.4641
F-statistic: 10.91 on 9 and 94 DF, p-value: 1.849e-11
#install.packages("car")
#library(car)
#vif(rent_model)
Key Regression Results
Bathrooms (reference = 1 bath)
1.5 baths: +$18,054 (p = 0.12 → not significant)
2 baths: +$25,969 (p = 0.022 → significant)
2.5 baths: +$51,853 (p < 0.001 → very significant)
3 baths: +$11,367 (p = 0.79 → not significant)
3.5 baths: +$41,087 (p = 0.20 → not significant)
Confirmed: the big jump is 1 → 2 baths (~$26K), then another ~$26K (51,853−25,969) when going 2 → 2.5 baths.
Square Footage
+$49.38 per sq.ft. (p < 0.001, highly significant)
Acre Lot
+$21,370 per acre (p = 0.029, significant)
Zip Code (49424 vs 49423 baseline)
+$26,943 premium (p = 0.001, very significant)
Bedrooms
+$490 per bed (not significant, p = 0.44)
Bedrooms don’t matter much in the dataset compared to bathrooms, sq.ft., and lot size.
Overall, the data suggest the best strategy is to target 1-bath homes in 49423 with decent square footage and lot size, then add a bathroom (and possibly a flex room) to capture both equity growth and stronger rental demand.
Baseline Property (Made Up)
1 bath
1,277 sq.ft.
Price = $286,000
Add a Full Bath (50 sq.ft.)
Add a Half Bath
Add a Flex Room (500 sq.ft.)
Total Value Added
Full bath (with sq.ft.): +$28,438
Half bath: +$25,884
Flex room: +$24,690
Grand Total = +$79,012
Projected ARV
$286,000 + $79,012 = $365,012
In the example above, the regression told us:
Adding a full bathroom typically raises value by about $26K (1 → 2 baths).
Each extra square foot adds about $49.
And going from 2 baths to 2.5 adds another $26K.
By plugging those numbers into our “made up” property, we can estimate that renovating a $286K 1-bath home could realistically increase its value to around $365K.
The benefit: it gives investors and homeowners a data-driven estimate of how much renovations are worth in the market, instead of relying on gut feeling.
This code below is filtering the dataset to only include 1-bathroom homes in ZIP code 49423, then calculating the average monthly cost of ownership (Total_Payment = mortgage + taxes + insurance) for those properties. By doing this, I can see what the “typical” all-in mortgage payment looks like in 49423. Since earlier analysis showed that 49423 is more cost-effective compared to 49424, this step helps confirm how much lower the average monthly cost is when targeting entry-level 1-bath homes in that ZIP.
# Step 1: Filter to 49423 and 1-bath homes
one_bath_49423 <- DB3_ONE %>%
filter(zip_code == "49423", Baths == 1)
# Step 2: Get average mortgage of those homes
avg_price_1bath_49423 <- mean(one_bath_49423$Total_Payment, na.rm = TRUE)
avg_price_1bath_49423 # Average Mortgage
[1] 1991.574
We can use this number later in our analysis.
GRM 8:
286,000÷(8×12)=286,000÷96= $2,979/month
GRM 9:
286,000÷(9×12)=286,000÷108=$2,648/month
I will take the Median of these two numbers and find the sweet spot
$2979 + $2648 = $5627/2 = $2813.50/month in rent - is that possible?
Market average rent in Holland: ~$2,265 (lower than this).
upgraded property (extra bath + flex space): positions me above average.
GRM 8–9 rents ($2,648–$2,979) are a stretch above market but not out of line if the house:
Has the added bathrooms (big deal for families)
Flex space makes it feel larger/modern.
Location is desirable.
You target the right tenants (families, professionals, etc.).
$2,813/month is possible
At that rent:
By starting with a 1-bath home priced around $286,000 for example, I used GRM benchmarks to identify a target rent of about $2,813/month (the midpoint between GRM 8 and 9). While the current market average rent in Holland is lower (~$2,265), upgrades such as a full bath, half bath, and a flex space can position the property above average and make that target rent realistic.
The advantage of this approach:
Mortgage Stability → The VA loan keeps the monthly payment fixed at ~$1,991, even after renovations. At $2,813 rent, that leaves a spread of +$822/month before reserves.
DIY Cost Control → By completing much of the renovation work myself (with family/friend help), project costs are kept well below the ~$79K in added value estimated by regression. As long as the total DIY investment doesn’t exceed this value, I’m capturing equity at a discount while boosting rent potential.
✅ Conclusion: Buying a lower-cost 1-bath and strategically adding value through affordable DIY upgrades is more cost-effective than purchasing a 2-bath upfront, since the mortgage stays lower, equity increases, and rents can justify the improvements.
DIY ROI Scenario
🛠️ Renovation Plan (Value-Add)
Full bath (50 sq.ft.) → +$28,438
Half bath (basement) → +$25,884
Flex room (500 sq.ft.) → +$24,690
Total Value Added = ~$79,012
💰 DIY Cost Assumptions Let’s say you can keep costs down with DIY work and family help:
Full bath addition (DIY + subs): ~$12,000
Half bath (basement): ~$5,000
Flex room (500 sq.ft.): ~$8,000
Total DIY Cost = ~$25,000
📊 ROI Calculation
Equity Gained: $79,012
DIY Costs: $25,000
Net Value Created: $54,012
ROI = (54,012/25,000) X 100 = 216%
Not only does the DIY strategy deliver a 216% ROI on renovations, but with rent around $2,813/month, the property can produce steady cash flow on top of added equity.
This is an example—results will vary property by property—but the beauty of the regression model is that we can plug any home’s numbers into the code and instantly update the projections as you can see below:
library(glue)
library(scales)
# ========= MANUAL INPUTS (from Realtor/Zillow) ==========================
price <- 249000 # purchase price you type in
piti_month <- 1737 # monthly PITI (typed in manually)
zip_code <- 49423 # the house's ZIP (used for tax rate rules)
avg_rent <- 2265.32 # your market rent Holland
# Value-add plan (same as before)
baths_current <- 1
add_full_bath_sqft <- 50
add_half_bath <- TRUE
add_flex_sqft <- 500
# Premiums from your regression
bath_premium_2_vs_1 <- 25969.40
bath_premium_2_5_vs_2 <- (51853.94 - 25969.40)
value_per_sqft <- 49.38
# ========= TAX & INSURANCE RULES (recalc to match manual price) ===
# Property taxes: 49423 = 1.30%/yr, 49424 = 1.04%/yr, else 1.20%/yr
tax_rate_ann <- dplyr::case_when(
zip_code == 49423 ~ 0.0130,
zip_code == 49424 ~ 0.0104,
TRUE ~ 0.0120
)
tax_month <- price * tax_rate_ann / 12
# Insurance: 0.45%/yr (from your code)
ins_rate_ann <- 0.0045
ins_month <- price * ins_rate_ann / 12
# (Optional) HOA if known, else 0
hoa_month <- 0
# ========= CALCULATIONS ================================================
# Value add
delta_full_bath <- if (baths_current == 1) bath_premium_2_vs_1 else 0 # 1 → 2 baths
delta_half_bath <- if (add_half_bath) bath_premium_2_5_vs_2 else 0 # + half bath
delta_sqft <- value_per_sqft * (add_full_bath_sqft + add_flex_sqft) # added sqft value
value_add_total <- delta_full_bath + delta_half_bath + delta_sqft
arv_estimate <- price + value_add_total
# GRM targets (what rent you'd need at GRM 7/8/9)
rent_grm7 <- price / (7 * 12)
rent_grm8 <- price / (8 * 12)
rent_grm9 <- price / (9 * 12)
# --- PITI vs GRM targets ------------------------------------
# coverage ratios (how many times rent covers PITI)
cov7 <- rent_grm7 / piti_month
cov8 <- rent_grm8 / piti_month
cov9 <- rent_grm9 / piti_month
# dollar gaps (rent target minus PITI)
gap7 <- rent_grm7 - piti_month
gap8 <- rent_grm8 - piti_month
gap9 <- rent_grm9 - piti_month
# implied GRM if you rented exactly at your PITI
grm_from_piti <- price / (piti_month * 12)
# ========= PRETTY OUTPUT ===============================================
# sq ft breakdown lines
sqft_lines <- c(
if (add_full_bath_sqft > 0) glue("• {add_full_bath_sqft} sq ft for the new full bath"),
if (add_flex_sqft > 0) glue("• {add_flex_sqft} sq ft for the flex room")
)
sqft_lines <- paste(sqft_lines[nzchar(sqft_lines)], collapse = "\n")
knitr::asis_output(glue("
### Quick Property Value-Add Summary
**Purchase price:** {dollar(price)}
**PITI:** {dollar(piti_month)}
**ZIP:** {zip_code}
**Planned improvements & value add**
- Converting from {baths_current} → 2 baths adds about: {dollar(delta_full_bath)}.
{if (add_half_bath) paste0('- Adding a half bath adds about: ', dollar(delta_half_bath), '.') else ''}
- Adding a total of {add_full_bath_sqft + add_flex_sqft} sq ft at: ~{dollar(value_per_sqft)} per sq ft adds {dollar(delta_sqft)}.
{sqft_lines}
**Estimated value added (total):** {dollar(value_add_total)}
**Projected ARV (after renos):** {dollar(arv_estimate)}
**Rent targets from GRM (given this price)**
- GRM 7 ⇒ ~{dollar(rent_grm7)} / month
- GRM 8 ⇒ ~{dollar(rent_grm8)} / month
- GRM 9 ⇒ ~{dollar(rent_grm9)} / month
**Your PITI (entered):** {dollar(piti_month)} / month
**How targets compare to your PITI**
- GRM 7: target {dollar(rent_grm7)} → {if (gap7>=0) 'above' else 'below'} PITI by {dollar(abs(gap7))} (coverage: {round(cov7, 2)}×)
- GRM 8: target {dollar(rent_grm8)} → {if (gap8>=0) 'above' else 'below'} PITI by {dollar(abs(gap8))} (coverage: {round(cov8, 2)}×)
- GRM 9: target {dollar(rent_grm9)} → {if (gap9>=0) 'above' else 'below'} PITI by {dollar(abs(gap9))} (coverage: {round(cov9, 2)}×)
**Implied GRM at your PITI:** {round(grm_from_piti, 2)}
"))
Purchase price: $249,000
PITI: $1,737
ZIP: 49423
Planned improvements & value add
Estimated value added (total): $79,012.94
Projected ARV (after renos): $328,013
Rent targets from GRM (given this price)
Your PITI (entered): $1,737 / month
How targets compare to your PITI
GRM 7: target $2,964.29 → above PITI by $1,227.29 (coverage: 1.71×)
GRM 8: target $2,593.75 → above PITI by $856.75 (coverage: 1.49×)
GRM 9: target $2,305.56 → above PITI by $568.56 (coverage: 1.33×)
Implied GRM at your PITI: 11.95