Home valuations, or assessed market values, are used by local governments to calculate property taxes. A higher assessed value leads to higher property taxes and undervaluation leads to lower taxes, so an accurate estimate of a home’s market value is essential. Inaccurate assessments can lead to overvaluation and a disproportionately high tax burden. This report examines the 2025 market value assessment of 6321 88th Street, valued at $538,409, using data from 45 comparable properties along 88th Street. It analyzes improvement value, land market value, main-area square footage and value, garage square footage and value, and land square footage to identify the influential factors. It then uses a multiple linear regression model to determine whether the home at 6321 88th Street is overvalued or undervalued. The findings advocate for a re-evaluation to align the home’s taxable value with empirical evidence, ensuring fairness and equity in property taxation.
The multiple linear regression equation is given by \[ Y = \beta_0 + \beta_1X_1 + \beta_2X_2 + \dots + \beta_nX_n + \varepsilon \]
Where:
- \(Y\): dependent variable
- \(X_1, X_2, \dots, X_n\): independent variables
- \(\beta_0\): intercept
- \(\beta_1, \dots, \beta_n\): coefficients
- \(\varepsilon\): error term
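The sketch below illustrates how `lm()` estimates the coefficients \(\beta_0, \dots, \beta_n\) in the equation above. The ordinary least squares estimate solves the normal equations \((X^\top X)\hat\beta = X^\top y\), where \(X\) is the design matrix: a column of ones followed by the predictors. It uses R's built-in `mtcars` data purely as a stand-in for the report's data. It then checks that the hand-computed solution matches `coef(lm())`.

```r
# Ordinary least squares by hand: solve (X'X) beta = X'y.
# mtcars is a built-in R dataset used here only as a stand-in.
fit <- lm(mpg ~ wt + hp, data = mtcars)

X <- model.matrix(fit)   # column of 1s (intercept) plus the predictor columns
y <- mtcars$mpg          # response vector

# crossprod(X) = t(X) %*% X and crossprod(X, y) = t(X) %*% y
beta_hat <- solve(crossprod(X), crossprod(X, y))

# Hand-computed estimates side by side with lm()'s estimates
print(cbind(manual = drop(beta_hat), lm = coef(fit)))
```

`lm()` itself uses a QR decomposition rather than forming \(X^\top X\). This is numerically more stable when predictors are highly collinear, as several of them turn out to be in this report.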
The 2025 market value assessment of $538,409 for the home at 6321 88th Street is overvalued, resulting in an unfairly high property tax burden. This report demonstrates this using data from 45 neighboring properties along 88th Street (addresses 6303–6351). Variables analyzed include:
- 2025_Market_Value (the response; read.csv() imports it as X2025_Market_Value because R column names cannot start with a digit)
- Improvement_Market_Value (value of structures on the property)
- Total_Land_Market_Value
- Main_Area_Sq_Ft (square footage of the main living area)
- Main_Area_Value
- Garage_Sq_Ft
- Garage_Value
- Land_Sq_Ft

The goal of this project is to determine whether the assessed value of the home at 6321 88th Street falls above or below the statistically reasonable range. If it does, the report urges the county tax assessor to re-evaluate the home’s value and adjust taxes accordingly.
Load libraries
#load necessary libraries
library(dplyr)
library(ggplot2)
library(car)
library(MASS)
property <- read.csv("https://raw.githubusercontent.com/Ahmedja96/IE-5320-Project-2-Dataset/refs/heads/main/IE%205344%20Project%202%20Dataset.csv")
str(property)
## 'data.frame': 45 obs. of 8 variables:
## $ X2025_Market_Value : int 531703 504815 573558 469131 1218146 569992 602427 460288 968766 550119 ...
## $ Improvement_Market_Value: int 485373 458572 527274 422975 116617 511992 538677 415135 888796 505175 ...
## $ Total_Land_Market_Value : int 46330 46243 46284 46156 101529 58000 63750 45153 79970 44944 ...
## $ Main_Area_Sq_Ft : int 2743 2610 2851 2991 3097 3036 2877 2241 2041 2582 ...
## $ Main_Area_Value : int 449668 419843 460543 390541 624126 447924 432168 376845 367600 433934 ...
## $ Garage_Sq_Ft : int 484 525 918 552 1095 575 909 506 550 550 ...
## $ Garage_Value : int 35705 38729 66731 32434 96763 38175 61445 38290 44577 41595 ...
## $ Land_Sq_Ft : int 7988 7973 7980 7958 17505 10000 10625 7785 13788 7749 ...
summary(property)
## X2025_Market_Value Improvement_Market_Value Total_Land_Market_Value
## Min. : 418286 Min. : 116617 Min. : 43506
## 1st Qu.: 504815 1st Qu.: 458572 1st Qu.: 45112
## Median : 534991 Median : 485962 Median : 45658
## Mean : 575245 Mean : 502189 Mean : 50834
## 3rd Qu.: 573558 3rd Qu.: 527274 3rd Qu.: 46330
## Max. :1218146 Max. :1116617 Max. :101529
## Main_Area_Sq_Ft Main_Area_Value Garage_Sq_Ft Garage_Value
## Min. :2041 Min. :331544 Min. : 325.0 Min. :24934
## 1st Qu.:2610 1st Qu.:415934 1st Qu.: 506.0 1st Qu.:36674
## Median :2745 Median :443268 Median : 528.0 Median :38729
## Mean :2721 Mean :440626 Mean : 570.9 Mean :41673
## 3rd Qu.:2902 3rd Qu.:453758 3rd Qu.: 552.0 3rd Qu.:41033
## Max. :3219 Max. :624126 Max. :1119.0 Max. :96763
## Land_Sq_Ft
## Min. : 7501
## 1st Qu.: 7778
## Median : 7872
## Mean : 8756
## 3rd Qu.: 7988
## Max. :17505
Exploratory data analysis (EDA): scatter plots of each predictor against the 2025 market value
# Scatter plots of each predictor against the 2025 market value.
# read.csv() renamed 2025_Market_Value to X2025_Market_Value, so the column
# must be referenced as property$X2025_Market_Value (the backticked name is NULL).
plot(property$Improvement_Market_Value, property$X2025_Market_Value,
main = "Improvement Market Value vs 2025 Value",
xlab = "Improvement Value", ylab = "2025 Market Value",
col = "blue")
plot(property$Total_Land_Market_Value, property$X2025_Market_Value,
main = "Total Land Value vs 2025 Value",
xlab = "Land Value", ylab = "2025 Market Value",
col = "blue")
plot(property$Main_Area_Sq_Ft, property$X2025_Market_Value,
main = "Main Area vs 2025 Value",
xlab = "Sq Ft", ylab = "2025 Market Value",
col = "blue")
plot(property$Main_Area_Value, property$X2025_Market_Value,
main = "Main Area Value vs 2025 Value",
xlab = "Value", ylab = "2025 Market Value",
col = "blue")
plot(property$Garage_Sq_Ft, property$X2025_Market_Value,
main = "Garage Sq Ft vs 2025 Value",
xlab = "Sq Ft", ylab = "2025 Market Value",
col = "blue")
plot(property$Garage_Value, property$X2025_Market_Value,
main = "Garage Value vs 2025 Value",
xlab = "Value", ylab = "2025 Market Value",
col = "blue")
plot(property$Land_Sq_Ft, property$X2025_Market_Value,
main = "Land Sq Ft vs 2025 Value",
xlab = "Sq Ft", ylab = "2025 Market Value",
col = "blue")
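The seven scatter plots can also be drawn in one call with `pairs(property)`, which produces the full scatter plot matrix. They can be summarized numerically by ranking each predictor's correlation with the response. The sketch below does the ranking. `rank_correlations()` is a hypothetical helper, and the built-in `mtcars` data stands in for the report's data.

```r
# Rank predictors by the strength of their linear (Pearson) correlation
# with the response. rank_correlations() is a hypothetical helper.
rank_correlations <- function(df, response) {
  predictors <- setdiff(names(df), response)
  r <- vapply(predictors, function(p) cor(df[[p]], df[[response]]), numeric(1))
  r[order(abs(r), decreasing = TRUE)]  # strongest first, sign kept
}

# Stand-in example on the built-in mtcars data
print(round(rank_correlations(mtcars, "mpg"), 3))
```

For this report the call would be `rank_correlations(property, "X2025_Market_Value")`. Pairwise correlation ignores the other predictors, so it complements the regression rather than replacing it.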
The predictor Total_Main_Area_Sq_Ft was removed because Main_Area_Sq_Ft and Garage_Sq_Ft are already included as separate predictors.
Fitting the initial multiple linear regression model
#fit initial multiple regression model
model_initial <- lm(X2025_Market_Value ~ ., data = property)
summary(model_initial)
##
## Call:
## lm(formula = X2025_Market_Value ~ ., data = property)
##
## Residuals:
## Min 1Q Median 3Q Max
## -107849 -9540 -1424 15793 109994
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.327e+05 6.084e+04 -2.182 0.0356 *
## Improvement_Market_Value 4.802e-01 7.283e-02 6.593 9.93e-08 ***
## Total_Land_Market_Value 3.355e+01 2.221e+01 1.511 0.1393
## Main_Area_Sq_Ft 6.740e+02 1.461e+02 4.614 4.62e-05 ***
## Main_Area_Value -3.813e+00 8.476e-01 -4.499 6.56e-05 ***
## Garage_Sq_Ft -3.975e+03 7.147e+02 -5.561 2.47e-06 ***
## Garage_Value 5.472e+01 9.524e+00 5.746 1.39e-06 ***
## Land_Sq_Ft -1.603e+02 1.308e+02 -1.226 0.2281
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 34410 on 37 degrees of freedom
## Multiple R-squared: 0.9628, Adjusted R-squared: 0.9558
## F-statistic: 136.9 on 7 and 37 DF, p-value: < 2.2e-16
The initial model included all seven variables as predictors of 2025_Market_Value:
\[ \text{Market Value} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5 + \beta_6 X_6 + \beta_7 X_7 + \varepsilon \]
Here \(X_1\) = Improvement_Market_Value, \(X_2\) = Total_Land_Market_Value, \(X_3\) = Main_Area_Sq_Ft, \(X_4\) = Main_Area_Value, \(X_5\) = Garage_Sq_Ft, \(X_6\) = Garage_Value, and \(X_7\) = Land_Sq_Ft. The terms \(\beta_0, \dots, \beta_7\) are the coefficients. Using the estimates above, the fitted model is
\[ \widehat{\text{Market Value}} = -132{,}700 + 0.4802\,X_1 + 33.55\,X_2 + 674.0\,X_3 - 3.813\,X_4 - 3{,}975\,X_5 + 54.72\,X_6 - 160.3\,X_7 \]
The model has a high R-squared of 0.9628, indicating that approximately 96.3% of the variability in market value is explained by the included predictors. The adjusted R-squared of 0.9558 confirms the model’s strong explanatory power after accounting for the number of predictors. The F-statistic of 136.9, with a p-value below 2.2e-16, indicates that the overall model is highly significant. The significance level used to identify significant factors is 0.05. Five of the seven predictors are statistically significant at that level: Improvement Market Value, Main Area Square Footage, Main Area Value, Garage Square Footage, and Garage Value. Improvement Market Value, Main Area Square Footage, and Garage Value have positive coefficients. Main Area Value and Garage Sq Ft have negative coefficients, which is counter-intuitive, may indicate multicollinearity, and warrants further investigation. Total Land Market Value and Land Square Footage are not statistically significant in this model.
For the multiple linear regression model to be valid and reliable, five key assumptions must hold. First, linearity: each independent variable has a straight-line relationship with the dependent variable, so the model captures the true relationship. Second, independence: the residuals (errors) are not correlated with one another. Third, homoscedasticity: the residuals have constant variance across all levels of the predictors, and patterns or funnel shapes in residual plots indicate a violation. Fourth, normality of residuals: this is needed for the model’s hypothesis tests and confidence intervals to be valid, and it is typically assessed with a Q-Q plot. Lastly, no multicollinearity: the independent variables should not be highly correlated with each other, because multicollinearity inflates standard errors and makes coefficient estimates unstable.
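The diagnostic plots further down check these assumptions visually. As a numeric complement, the minimal sketch below uses only base R (stats). It runs a Shapiro-Wilk test for normality and computes the Durbin-Watson statistic for lag-1 autocorrelation, which is meaningful only if the row order is meaningful, such as homes sorted by address. For constant variance it runs Koenker's studentized Breusch-Pagan test, using the fitted values as the variance predictor. `check_assumptions()` is a hypothetical helper, and `mtcars` is a stand-in dataset so the sketch runs on its own. The car package's `durbinWatsonTest()` and `ncvTest()` are packaged alternatives.

```r
# Numeric checks for three regression assumptions, base R (stats) only.
# check_assumptions() is a hypothetical helper.
check_assumptions <- function(model) {
  e <- residuals(model)
  f <- fitted(model)
  n <- length(e)

  # Normality of residuals: Shapiro-Wilk (small p-value suggests non-normality)
  sw_p <- shapiro.test(e)$p.value

  # Independence: Durbin-Watson statistic, between 0 and 4;
  # values near 2 suggest no lag-1 autocorrelation in row order
  dw <- sum(diff(e)^2) / sum(e^2)

  # Constant variance: Koenker's studentized Breusch-Pagan test,
  # n * R^2 from regressing squared residuals on fitted values, chi-squared with 1 df
  aux <- lm(I(e^2) ~ f)
  bp_p <- pchisq(n * summary(aux)$r.squared, df = 1, lower.tail = FALSE)

  data.frame(
    check = c("Normality (Shapiro-Wilk p-value)",
              "Independence (Durbin-Watson statistic)",
              "Constant variance (Breusch-Pagan p-value)"),
    value = c(sw_p, dw, bp_p)
  )
}

# Stand-in example on the built-in mtcars data;
# in this report one would call check_assumptions(model_initial)
fit <- lm(mpg ~ wt + hp, data = mtcars)
print(check_assumptions(fit))
```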
Checking for Multicollinearity
#check for multicollinearity using VIF
vif_values <- vif(model_initial)
print(vif_values)
## Improvement_Market_Value Total_Land_Market_Value Main_Area_Sq_Ft
## 3.57135 3369.20445 56.00032
## Main_Area_Value Garage_Sq_Ft Garage_Value
## 93.54359 460.41418 548.87583
## Land_Sq_Ft
## 3451.40079
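Before interpreting these values, it helps to recall what a VIF measures. For predictor \(j\), \(\text{VIF}_j = 1/(1 - R_j^2)\), where \(R_j^2\) comes from regressing predictor \(j\) on all the other predictors. A VIF of 3,451 therefore means \(R_j^2 \approx 1 - 1/3451 \approx 0.9997\). The minimal base-R sketch below computes this directly. `manual_vif()` is a hypothetical helper, and `mtcars` is a stand-in dataset.

```r
# VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-squared from regressing
# predictor j on all the other predictors. manual_vif() is a hypothetical helper.
manual_vif <- function(data, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    r2 <- summary(lm(reformulate(others, response = p), data = data))$r.squared
    1 / (1 - r2)
  })
}

# Stand-in example on the built-in mtcars data
print(manual_vif(mtcars, c("wt", "hp", "disp")))
```

Because all predictors here are numeric, `manual_vif(property, setdiff(names(property), "X2025_Market_Value"))` should reproduce the `vif(model_initial)` values printed above.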
The variance inflation factors (VIFs) reveal severe multicollinearity; a common rule of thumb treats a VIF above 10 as serious. Total_Land_Market_Value (VIF = 3,369) and Land_Sq_Ft (VIF = 3,451) are highly collinear. Garage_Value (VIF = 549), Garage_Sq_Ft (VIF = 460), Main_Area_Value (VIF = 94), and Main_Area_Sq_Ft (VIF = 56) also show redundancy.
Plot initial model
#model diagnostics
plot(model_initial, which = 1:4)
#check for influential points
cooksd <- cooks.distance(model_initial)
plot(cooksd, main = "Cook's Distance")
abline(h = 4/nrow(property), col = "red")
The Cook’s distance plot identifies influential observations, namely the points above the red line at 4/n. These observations have a disproportionate effect on the fitted model. Row 5, with a 2025 market value of $1,218,146, is one such point and is removed before refitting.
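The first ten rows printed by `str(property)` help explain both why row 5 stands out and why the later models fit so tightly. In those rows, the 2025 market value equals Improvement_Market_Value + Total_Land_Market_Value exactly, except in row 5, where the gap is exactly 1,000,000. The improvement value implied for row 5 is 1,218,146 − 101,529 = 1,116,617, which is the maximum Improvement_Market_Value reported by `summary(property)`. The 116617 shown for row 5 may therefore be a data-entry issue. This is an observation from the printed output only. If the identity holds for all remaining rows, the response is a deterministic function of two predictors (before the Box-Cox transform), which would account for the R² of 1 reported later. The sketch re-enters those printed values to check the identity.

```r
# First ten rows as printed by str(property) earlier in this report
market      <- c(531703, 504815, 573558, 469131, 1218146, 569992, 602427, 460288, 968766, 550119)
improvement <- c(485373, 458572, 527274, 422975,  116617, 511992, 538677, 415135, 888796, 505175)
land        <- c( 46330,  46243,  46284,  46156,  101529,  58000,  63750,  45153,  79970,  44944)

# Difference between the 2025 market value and improvement + land value
gap <- market - (improvement + land)
print(gap)

# Improvement value implied for row 5 if the identity held there too
print(market[5] - land[5])  # 1116617
```

With the full data loaded, `with(property, table(X2025_Market_Value - Improvement_Market_Value - Total_Land_Market_Value))` would show whether the identity holds for all 45 homes.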
# remove row 5
property <- property[-5, ]
head(property)
## X2025_Market_Value Improvement_Market_Value Total_Land_Market_Value
## 1 531703 485373 46330
## 2 504815 458572 46243
## 3 573558 527274 46284
## 4 469131 422975 46156
## 6 569992 511992 58000
## 7 602427 538677 63750
## Main_Area_Sq_Ft Main_Area_Value Garage_Sq_Ft Garage_Value Land_Sq_Ft
## 1 2743 449668 484 35705 7988
## 2 2610 419843 525 38729 7973
## 3 2851 460543 918 66731 7980
## 4 2991 390541 552 32434 7958
## 6 3036 447924 575 38175 10000
## 7 2877 432168 909 61445 10625
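#Box-Cox transformation for non-normality
# Note: boxcox() profiles lambda using model_initial, which was fit before row 5
# was removed; refitting on the reduced data first would keep the two steps consistent
bc <- boxcox(model_initial)
# lambda on the evaluated grid that maximizes the profile log-likelihood
lambda <- bc$x[which.max(bc$y)]
property_transformed <- property
# (y^lambda - 1)/lambda is the Box-Cox transform for lambda != 0 (lambda = 0 would use log(y))
property_transformed$X2025_Market_Value <- (property$X2025_Market_Value^lambda - 1)/lambda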
#refitting model with transformed response
model_transformed <- lm(X2025_Market_Value ~ ., data = property_transformed)
summary(model_transformed)
##
## Call:
## lm(formula = X2025_Market_Value ~ ., data = property_transformed)
##
## Residuals:
## Min 1Q Median 3Q Max
## -590.35 -55.39 -9.43 35.76 468.71
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.589e+03 3.419e+02 16.345 < 2e-16 ***
## Improvement_Market_Value 8.727e-01 5.940e-04 1469.157 < 2e-16 ***
## Total_Land_Market_Value 1.069e+00 1.208e-01 8.843 1.49e-10 ***
## Main_Area_Sq_Ft 1.371e+00 9.700e-01 1.414 0.166
## Main_Area_Value -6.317e-03 5.577e-03 -1.133 0.265
## Garage_Sq_Ft -6.407e+00 5.123e+00 -1.251 0.219
## Garage_Value 9.002e-02 6.931e-02 1.299 0.202
## Land_Sq_Ft -1.195e+00 7.057e-01 -1.694 0.099 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 182 on 36 degrees of freedom
## Multiple R-squared: 1, Adjusted R-squared: 1
## F-statistic: 2.478e+06 on 7 and 36 DF, p-value: < 2.2e-16
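The Box-Cox profile log-likelihood is maximized at a λ close to 1. A λ of exactly 1 leaves the response unchanged apart from a shift, whereas the log transformation corresponds to λ = 0, so the transformation applied here is mild. The transformed response (X2025_Market_Value) is intended to improve residual behavior, but multicollinearity remains.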
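For reference, the Box-Cox family applied in the code above and its inverse are:

\[ y^{(\lambda)} = \begin{cases} \dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\[4pt] \ln y, & \lambda = 0 \end{cases} \qquad\qquad y = \begin{cases} \left(\lambda\, y^{(\lambda)} + 1\right)^{1/\lambda}, & \lambda \neq 0 \\[4pt] e^{\,y^{(\lambda)}}, & \lambda = 0 \end{cases} \]

As \(\lambda \to 0\), \((y^{\lambda} - 1)/\lambda \to \ln y\), which is why the log transformation is the \(\lambda = 0\) member of the family.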
# Backward elimination by AIC: step() repeatedly drops the term whose removal lowers AIC most, stopping when no removal lowers it
model_final <- step(model_transformed, direction = "backward")
## Start: AIC=465.14
## X2025_Market_Value ~ Improvement_Market_Value + Total_Land_Market_Value +
## Main_Area_Sq_Ft + Main_Area_Value + Garage_Sq_Ft + Garage_Value +
## Land_Sq_Ft
##
## Df Sum of Sq RSS AIC
## - Main_Area_Value 1 4.2515e+04 1.2354e+06 464.68
## - Garage_Sq_Ft 1 5.1822e+04 1.2448e+06 465.01
## <none> 1.1929e+06 465.14
## - Garage_Value 1 5.5904e+04 1.2488e+06 465.16
## - Main_Area_Sq_Ft 1 6.6233e+04 1.2592e+06 465.52
## - Land_Sq_Ft 1 9.5062e+04 1.2880e+06 466.51
## - Total_Land_Market_Value 1 2.5912e+06 3.7842e+06 513.93
## - Improvement_Market_Value 1 7.1523e+10 7.1525e+10 947.20
##
## Step: AIC=464.68
## X2025_Market_Value ~ Improvement_Market_Value + Total_Land_Market_Value +
## Main_Area_Sq_Ft + Garage_Sq_Ft + Garage_Value + Land_Sq_Ft
##
## Df Sum of Sq RSS AIC
## - Garage_Sq_Ft 1 1.6116e+04 1.2516e+06 463.25
## - Garage_Value 1 2.8433e+04 1.2639e+06 463.68
## - Land_Sq_Ft 1 5.5040e+04 1.2905e+06 464.60
## <none> 1.2354e+06 464.68
## - Main_Area_Sq_Ft 1 1.8804e+05 1.4235e+06 468.91
## - Total_Land_Market_Value 1 3.1824e+06 4.4179e+06 518.75
## - Improvement_Market_Value 1 7.1756e+10 7.1757e+10 945.34
##
## Step: AIC=463.25
## X2025_Market_Value ~ Improvement_Market_Value + Total_Land_Market_Value +
## Main_Area_Sq_Ft + Garage_Value + Land_Sq_Ft
##
## Df Sum of Sq RSS AIC
## - Garage_Value 1 4.1724e+04 1.2933e+06 462.69
## - Land_Sq_Ft 1 4.2903e+04 1.2945e+06 462.73
## <none> 1.2516e+06 463.25
## - Main_Area_Sq_Ft 1 1.7271e+05 1.4243e+06 466.94
## - Total_Land_Market_Value 1 3.3427e+06 4.5942e+06 518.47
## - Improvement_Market_Value 1 9.3437e+10 9.3438e+10 954.96
##
## Step: AIC=462.69
## X2025_Market_Value ~ Improvement_Market_Value + Total_Land_Market_Value +
## Main_Area_Sq_Ft + Land_Sq_Ft
##
## Df Sum of Sq RSS AIC
## <none> 1.2933e+06 462.69
## - Land_Sq_Ft 1 8.9094e+04 1.3824e+06 463.63
## - Main_Area_Sq_Ft 1 1.8828e+05 1.4816e+06 466.67
## - Total_Land_Market_Value 1 4.1209e+06 5.4142e+06 523.70
## - Improvement_Market_Value 1 9.7412e+10 9.7413e+10 954.79
summary(model_final)
##
## Call:
## lm(formula = X2025_Market_Value ~ Improvement_Market_Value +
## Total_Land_Market_Value + Main_Area_Sq_Ft + Land_Sq_Ft, data = property_transformed)
##
## Residuals:
## Min 1Q Median 3Q Max
## -640.79 -37.49 2.60 49.72 393.36
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.848e+03 2.956e+02 19.780 < 2e-16 ***
## Improvement_Market_Value 8.730e-01 5.094e-04 1713.922 < 2e-16 ***
## Total_Land_Market_Value 1.012e+00 9.074e-02 11.148 1.08e-13 ***
## Main_Area_Sq_Ft 2.620e-01 1.099e-01 2.383 0.0221 *
## Land_Sq_Ft -8.794e-01 5.365e-01 -1.639 0.1092
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 182.1 on 39 degrees of freedom
## Multiple R-squared: 1, Adjusted R-squared: 1
## F-statistic: 4.333e+06 on 4 and 39 DF, p-value: < 2.2e-16
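Backward selection by AIC retains four predictors: Improvement_Market_Value, Total_Land_Market_Value, Main_Area_Sq_Ft, and Land_Sq_Ft. The first three are significant at the 0.05 level. Land_Sq_Ft (p = 0.109) is not, and it is kept only because step() selects terms by AIC rather than by p-values.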
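The step() trace above ranks candidate models by AIC. For a linear model, R's extractAIC() computes it, up to an additive constant, as

\[ \text{AIC} = n \ln\!\left(\frac{\text{RSS}}{n}\right) + 2k \]

where \(n = 44\) is the number of observations and \(k\) is the number of estimated coefficients, including the intercept. Plugging in the RSS values from the last step of the trace reproduces the printed figures for the final model and for the reduced model without Land_Sq_Ft:

\[ \text{AIC}_{\text{final}} = 44 \ln\!\left(\frac{1.2933 \times 10^{6}}{44}\right) + 2(5) \approx 452.69 + 10 = 462.69 \]

\[ \text{AIC}_{\text{reduced}} = 44 \ln\!\left(\frac{1.3824 \times 10^{6}}{44}\right) + 2(4) \approx 455.63 + 8 = 463.63 \]

Removing Land_Sq_Ft would therefore raise AIC from 462.69 to 463.63.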
#check VIF
vif_final <- vif(model_final)
print(vif_final)
## Improvement_Market_Value Total_Land_Market_Value Main_Area_Sq_Ft
## 5.048144 1356.007531 1.079719
## Land_Sq_Ft
## 1394.895084
VIFs above 1,000 for Total_Land_Market_Value and Land_Sq_Ft indicate that the collinearity between these two variables remains unresolved.
Prediction and Prediction Interval
Predict the value of the home at 6321 88th Street
#predict value for the home 6321
new_home <- data.frame(
Improvement_Market_Value = 494642,
Total_Land_Market_Value = 43767,
Main_Area_Sq_Ft = 2773,
Land_Sq_Ft = 7546
)
predicted_value <- predict(model_final, newdata = new_home, interval = "prediction")
print(predicted_value)
## fit lwr upr
## 1 476045.4 475668.7 476422.1
print(predicted_value[1, "upr"])  # upper limit of the 95% prediction interval
## [1] 476422.1
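Because model_final was fit to the Box-Cox-transformed response, `predict()` returns `fit`, `lwr`, and `upr` on that transformed scale rather than in dollars. Strictly, these values should be mapped back to dollars before they are compared with the $538,409 assessment. The minimal sketch below shows the mapping. `boxcox_transform()` and `boxcox_inverse()` are hypothetical helpers, and the λ values in the example are arbitrary illustrations, not the λ selected in this report.

```r
# Box-Cox transform, as applied to X2025_Market_Value earlier
boxcox_transform <- function(y, lambda) {
  if (lambda == 0) return(log(y))  # log case
  (y^lambda - 1) / lambda
}

# Inverse transform: maps values from the transformed scale back to dollars
boxcox_inverse <- function(z, lambda) {
  if (lambda == 0) return(exp(z))  # inverse of the log case
  (lambda * z + 1)^(1 / lambda)
}

# Round trip on dollar values taken from this report;
# 0.5 is an arbitrary illustrative lambda
y <- c(418286, 538409, 1218146)
z <- boxcox_transform(y, 0.5)
print(boxcox_inverse(z, 0.5))  # recovers y
```

In this report, `boxcox_inverse(predicted_value, lambda)` would convert the `fit`, `lwr`, and `upr` columns back to dollars. The transform is increasing for every λ, so the back-transformed `lwr` and `upr` remain the endpoints of the interval.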
Comparing the assessed market value of $538,409 with the upper limit of the 95% prediction interval, $476,422.1, the home at 6321 88th Street is overvalued.
Questions Answered
How well do the four factors forecast the 2025 home market value? The model’s R² rounds to 1 (100%), indicating that the four predictors (Improvement_Market_Value, Total_Land_Market_Value, Main_Area_Sq_Ft, and Land_Sq_Ft) collectively explain essentially all of the variability in 2025 market values. However, an R² of 1 is highly unusual and suggests overfitting or a structural artifact in the data, such as duplicated variables or circular dependencies. The predictors appear to forecast values almost perfectly in this data set, but the model’s reliability for new data is questionable until these issues are resolved.
How significant is the relationship between 2025 Market Value and the four factors? Because the response was Box-Cox transformed, the coefficients below are on the transformed scale rather than in dollars.
• Improvement_Market_Value (p < 0.001): the strongest predictor. A $1 increase in improvement value corresponds to a 0.87-unit increase in the transformed market value.
• Total_Land_Market_Value (p < 0.001): also highly significant. A $1 increase in land value raises the transformed market value by 1.01.
• Main_Area_Sq_Ft (p = 0.022): significant at the 0.05 level. Each additional square foot adds 0.26 on the transformed scale.
• Land_Sq_Ft (p = 0.109): not statistically significant. Its negative coefficient suggests larger lots may slightly reduce value, but this relationship is unreliable, likely because of its strong collinearity with Total_Land_Market_Value (VIF ≈ 1,395).
In conclusion, only Improvement_Market_Value and Total_Land_Market_Value are robust predictors; the other two variables contribute minimally or ambiguously.
What implications do the results have for property tax and value assessment? Overvaluation evidence: the 95% prediction interval for 6321 88th Street is $475,669–$476,422, well below the assessed value of $538,409. This suggests the home is overvalued by roughly $62,000 (538,409 − 476,422 = 61,987), leading to an unfair tax burden.
Visualize regression model with intervals
# load ggplot2 (already attached above; repeated so this chunk runs on its own)
library(ggplot2)
# fitted values and 95% prediction intervals for all homes, computed once;
# these are on the Box-Cox scale of model_final's response, and R warns that
# intervals on the fitting data refer to future responses
pred_int <- predict(model_final, interval = "prediction")
property_transformed$predicted <- pred_int[, "fit"]
property_transformed$lower <- pred_int[, "lwr"]
property_transformed$upper <- pred_int[, "upr"]
# Plot actual vs predicted with the prediction band (drawn first, under the points)
# and the y = x reference line
ggplot(property_transformed, aes(x = predicted, y = X2025_Market_Value)) +
geom_ribbon(aes(ymin = lower, ymax = upper), fill = "gray70", alpha = 0.3) +
geom_point(color = "blue", alpha = 0.7) +
geom_abline(intercept = 0, slope = 1, linetype = "dashed", color = "red") +
labs(
title = "Final Model: Predicted vs. Actual 2025 Market Values",
x = "Predicted Value (Box-Cox scale)",
y = "Actual Value (Box-Cox scale)"
)
This study establishes a statistically significant overall linear regression relationship, at the 5% significance level, between the 2025 Market Value and four predictors: Improvement_Market_Value, Total_Land_Market_Value, Main_Area_Sq_Ft, and Land_Sq_Ft. The model’s R² rounds to 1. On the Box-Cox-transformed scale, Improvement_Market_Value (+0.87 per 1 USD) and Total_Land_Market_Value (+1.01 per 1 USD) are the strongest contributors to market value, while Main_Area_Sq_Ft (+0.26 per sq. ft.) contributes modestly. Land_Sq_Ft (−0.88 per sq. ft.) shows a counter-intuitive negative trend, though this relationship is not statistically significant (p = 0.109). The 95% prediction interval (475,669–476,422 USD) indicates that the assessed value of $538,409 for the home at 6321 88th Street exceeds reasonable market expectations, suggesting overvaluation. To enhance reliability, future studies should address potential overfitting, include other factors associated with market value, and use a larger data set to improve generality. Exploring non-linear relationships or interaction effects could also improve predictive accuracy.