1 Introduction

Local governments use a home’s assessed market value to calculate property taxes. A higher assessed value leads to higher property taxes, while undervaluation leads to lower taxes, so an accurate estimate of a home’s market value is essential. Overvaluation in particular results in a disproportionately high tax burden. This report examines the 2025 market value assessment of 6321 88th Street, valued at $538,409, using data from 45 comparable properties along 88th Street. A multiple linear regression model is fitted to key factors (improvement value, land market value, main area square footage and value, garage square footage and value, and land square footage) to identify the influential factors and to determine whether the home at 6321 88th Street is overvalued or undervalued. The findings are intended to support a re-evaluation that aligns the home’s taxable value with empirical evidence, ensuring fairness and equity in property taxation.

The multiple linear regression equation is given by \[ Y = \beta_0 + \beta_1X_1 + \beta_2X_2 + \dots + \beta_nX_n + \varepsilon \]

Where:

  • \(Y\): dependent variable
  • \(X_1, X_2, \dots, X_n\): independent variables
  • \(\beta_0\): intercept
  • \(\beta_1, \dots, \beta_n\): coefficients
  • \(\varepsilon\): error term

This report argues that the 2025 market value assessment of $538,409 for the home at 6321 88th Street is too high, resulting in an unfairly high property tax burden. The analysis uses data from 45 neighboring properties along 88th Street (addresses 6303–6351). Variables analyzed include:

  • 2025_Market_Value (the response; read.csv() renames it X2025_Market_Value because the name starts with a digit)
  • Improvement_Market_Value (value of structures on the property)
  • Total_Land_Market_Value
  • Main_Area_Sq_Ft (square footage of the main living area)
  • Main_Area_Value
  • Garage_Sq_Ft
  • Garage_Value
  • Land_Sq_Ft

The goal of this project is to determine whether the assessed value of the home at 6321 88th Street falls above or below the statistically reasonable range and, if it does, to urge the county tax assessor to re-evaluate the home’s value and adjust the taxes accordingly.

Load libraries

#load necessary libraries
library(dplyr)
library(ggplot2)
library(car)   # vif()
library(MASS)  # boxcox()

1.1 Load dataset

property <- read.csv("https://raw.githubusercontent.com/Ahmedja96/IE-5320-Project-2-Dataset/refs/heads/main/IE%205344%20Project%202%20Dataset.csv")
str(property)
## 'data.frame':    45 obs. of  8 variables:
##  $ X2025_Market_Value      : int  531703 504815 573558 469131 1218146 569992 602427 460288 968766 550119 ...
##  $ Improvement_Market_Value: int  485373 458572 527274 422975 116617 511992 538677 415135 888796 505175 ...
##  $ Total_Land_Market_Value : int  46330 46243 46284 46156 101529 58000 63750 45153 79970 44944 ...
##  $ Main_Area_Sq_Ft         : int  2743 2610 2851 2991 3097 3036 2877 2241 2041 2582 ...
##  $ Main_Area_Value         : int  449668 419843 460543 390541 624126 447924 432168 376845 367600 433934 ...
##  $ Garage_Sq_Ft            : int  484 525 918 552 1095 575 909 506 550 550 ...
##  $ Garage_Value            : int  35705 38729 66731 32434 96763 38175 61445 38290 44577 41595 ...
##  $ Land_Sq_Ft              : int  7988 7973 7980 7958 17505 10000 10625 7785 13788 7749 ...
summary(property)
##  X2025_Market_Value Improvement_Market_Value Total_Land_Market_Value
##  Min.   : 418286    Min.   : 116617          Min.   : 43506         
##  1st Qu.: 504815    1st Qu.: 458572          1st Qu.: 45112         
##  Median : 534991    Median : 485962          Median : 45658         
##  Mean   : 575245    Mean   : 502189          Mean   : 50834         
##  3rd Qu.: 573558    3rd Qu.: 527274          3rd Qu.: 46330         
##  Max.   :1218146    Max.   :1116617          Max.   :101529         
##  Main_Area_Sq_Ft Main_Area_Value   Garage_Sq_Ft     Garage_Value  
##  Min.   :2041    Min.   :331544   Min.   : 325.0   Min.   :24934  
##  1st Qu.:2610    1st Qu.:415934   1st Qu.: 506.0   1st Qu.:36674  
##  Median :2745    Median :443268   Median : 528.0   Median :38729  
##  Mean   :2721    Mean   :440626   Mean   : 570.9   Mean   :41673  
##  3rd Qu.:2902    3rd Qu.:453758   3rd Qu.: 552.0   3rd Qu.:41033  
##  Max.   :3219    Max.   :624126   Max.   :1119.0   Max.   :96763  
##    Land_Sq_Ft   
##  Min.   : 7501  
##  1st Qu.: 7778  
##  Median : 7872  
##  Mean   : 8756  
##  3rd Qu.: 7988  
##  Max.   :17505
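Before modeling, it can help to check how the response relates to its components. The sketch below is illustrative only: it re-enters the first six rows shown in the str() output by hand, and the names sample_rows and gap are not part of the original analysis.

```r
# First six rows of the str() output above, entered by hand for illustration.
sample_rows <- data.frame(
  X2025_Market_Value       = c(531703, 504815, 573558, 469131, 1218146, 569992),
  Improvement_Market_Value = c(485373, 458572, 527274, 422975, 116617, 511992),
  Total_Land_Market_Value  = c(46330, 46243, 46284, 46156, 101529, 58000)
)

# Gap between the recorded market value and the sum of its two components.
sample_rows$gap <- sample_rows$X2025_Market_Value -
  (sample_rows$Improvement_Market_Value + sample_rows$Total_Land_Market_Value)

print(sample_rows$gap)
```

In these six rows the gap is zero everywhere except row 5, where it is exactly 1,000,000. That pattern suggests checking the row 5 record against the source before treating it as a genuine high-value home. The same two lines can be applied to the full property data frame.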

Exploratory data analysis (EDA): visualizing with scatter plots

#Scatter plots of each predictor against the 2025 market value
plot(property$Improvement_Market_Value, property$X2025_Market_Value,
     main = "Improvement Market Value vs 2025 Value",
     xlab = "Improvement Value", ylab = "2025 Market Value",
     col = "blue")

plot(property$Total_Land_Market_Value, property$X2025_Market_Value,
     main = "Total Land Value vs 2025 Value",
     xlab = "Land Value", ylab = "2025 Market Value",
     col = "blue")

plot(property$Main_Area_Sq_Ft, property$X2025_Market_Value,
     main = "Main Area vs 2025 Value",
     xlab = "Sq Ft", ylab = "2025 Market Value",
     col = "blue")

plot(property$Main_Area_Value, property$X2025_Market_Value,
     main = "Main Area Value vs 2025 Value",
     xlab = "Value", ylab = "2025 Market Value",
     col = "blue")

plot(property$Garage_Sq_Ft, property$X2025_Market_Value,
     main = "Garage Sq Ft vs 2025 Value",
     xlab = "Sq Ft", ylab = "2025 Market Value",
     col = "blue")

plot(property$Garage_Value, property$X2025_Market_Value,
     main = "Garage Value vs 2025 Value",
     xlab = "Value", ylab = "2025 Market Value",
     col = "blue")

plot(property$Land_Sq_Ft, property$X2025_Market_Value,
     main = "Land Sq Ft vs 2025 Value",
     xlab = "Sq Ft", ylab = "2025 Market Value",
     col = "blue")

The predictor Total_Main_Area_Sq_Ft was removed because Main_Area_Sq_Ft and Garage_Sq_Ft are used.

Fitting initial Multiple Linear Regression model

#fit initial multiple regression model
model_initial <- lm(X2025_Market_Value ~ ., data = property)
summary(model_initial)
## 
## Call:
## lm(formula = X2025_Market_Value ~ ., data = property)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -107849   -9540   -1424   15793  109994 
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)    
## (Intercept)              -1.327e+05  6.084e+04  -2.182   0.0356 *  
## Improvement_Market_Value  4.802e-01  7.283e-02   6.593 9.93e-08 ***
## Total_Land_Market_Value   3.355e+01  2.221e+01   1.511   0.1393    
## Main_Area_Sq_Ft           6.740e+02  1.461e+02   4.614 4.62e-05 ***
## Main_Area_Value          -3.813e+00  8.476e-01  -4.499 6.56e-05 ***
## Garage_Sq_Ft             -3.975e+03  7.147e+02  -5.561 2.47e-06 ***
## Garage_Value              5.472e+01  9.524e+00   5.746 1.39e-06 ***
## Land_Sq_Ft               -1.603e+02  1.308e+02  -1.226   0.2281    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 34410 on 37 degrees of freedom
## Multiple R-squared:  0.9628, Adjusted R-squared:  0.9558 
## F-statistic: 136.9 on 7 and 37 DF,  p-value: < 2.2e-16

1.2 Initial Regression Model

The initial model included all variables to predict 2025_Market_Value:

\[ \text{Market Value} = \beta_0 + \beta_1(\text{Improvement Market Value}) + \beta_2(\text{Total Land Market Value}) + \beta_3(\text{Main Area Sq Ft}) + \beta_4(\text{Main Area Value}) + \beta_5(\text{Garage Sq Ft}) + \beta_6(\text{Garage Value}) + \beta_7(\text{Land Sq Ft}) + \varepsilon \]

where \(\beta_0\) is the intercept and \(\beta_1\) to \(\beta_7\) are the coefficients. The fitted multiple regression model is

\[ \widehat{\text{Market Value}} = -132,700 + 0.480(\text{Improvement Market Value}) + 33.55(\text{Total Land Market Value}) + 674.0(\text{Main Area Sq Ft}) - 3.813(\text{Main Area Value}) - 3,975(\text{Garage Sq Ft}) + 54.72(\text{Garage Value}) - 160.3(\text{Land Sq Ft}) \]

The model has a high R-squared value of 0.9628, indicating that approximately 96.3% of the variability in market value is explained by the included predictors. The adjusted R-squared of 0.9558 confirms the model’s strong explanatory power while accounting for the number of predictors. The F-statistic of 136.9 with a p-value less than 2.2e-16 indicates that the overall model is highly significant.

The significance level used to identify significant factors is 0.05. Five of the seven predictors were statistically significant at this level: Improvement Market Value, Main Area Square Footage, Main Area Value, Garage Square Footage, and Garage Value. Improvement Market Value, Main Area Square Footage, and Garage Value have positive coefficients, while Main Area Value and Garage Square Footage have negative coefficients. The negative signs are counter-intuitive and may indicate multicollinearity, which is checked in the next section. Total Land Market Value (p = 0.139) and Land Square Footage (p = 0.228) were not statistically significant in this model.
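To make the fitted equation concrete, the following sketch evaluates it for row 1 of the data. It assumes the rounded coefficients printed by summary(model_initial) and the row 1 values from the str() output; the names coefs, row1, and contributions are illustrative.

```r
# Coefficients copied (rounded) from summary(model_initial) above.
coefs <- c(
  Intercept                = -1.327e+05,
  Improvement_Market_Value =  4.802e-01,
  Total_Land_Market_Value  =  3.355e+01,
  Main_Area_Sq_Ft          =  6.740e+02,
  Main_Area_Value          = -3.813e+00,
  Garage_Sq_Ft             = -3.975e+03,
  Garage_Value             =  5.472e+01,
  Land_Sq_Ft               = -1.603e+02
)

# Predictor values for row 1 (from the str() output), with a leading 1
# so that the intercept is included in the sum.
row1 <- c(1, 485373, 46330, 2743, 449668, 484, 35705, 7988)

# Contribution of each term, and their sum (the fitted value for row 1).
contributions <- coefs * row1
fitted_row1 <- sum(contributions)

print(round(contributions))
print(round(fitted_row1))
```

With these rounded coefficients the sum comes to about 538,300, against a recorded value of 531,703 for row 1. Several individual terms are in the range of plus or minus 1.5 to 2 million and largely cancel each other, which is the kind of behavior that collinear predictors tend to produce.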

1.3 Assumption Checking

For the multiple linear regression model to be valid and reliable, key assumptions must be satisfied. The first is linearity, which assumes a straight-line relationship between each independent variable and the dependent variable, so that the model captures the true relationship. The second is independence, meaning that the residuals (errors) are not correlated with one another. The third is homoscedasticity, which requires the residuals to have constant variance across all levels of the predictors; patterns or funnel shapes in the residual plots may indicate a violation. The fourth is normality of the residuals, which ensures that hypothesis tests and confidence intervals derived from the model are valid; this is typically assessed with a Q-Q plot. Lastly, the model assumes no multicollinearity, meaning that the independent variables are not highly correlated with each other, because multicollinearity inflates standard errors and makes coefficient estimates unstable.
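Formal tests can complement the diagnostic plots for two of these assumptions. The sketch below is a minimal base-R illustration on the built-in mtcars data, used here only as a stand-in; in this report the same steps would be applied to model_initial. The names fit, aux, bp_stat, and bp_p are illustrative.

```r
# Stand-in model on R's built-in mtcars data (not the property data).
fit <- lm(mpg ~ wt + hp, data = mtcars)
res <- residuals(fit)

# Normality of residuals: Shapiro-Wilk test (a small p-value suggests
# departure from normality).
sw <- shapiro.test(res)

# Constant variance: studentized Breusch-Pagan statistic computed by hand as
# n * R^2 from regressing the squared residuals on the predictors.
res_sq <- res^2
aux <- lm(res_sq ~ wt + hp, data = mtcars)
bp_stat <- length(res) * summary(aux)$r.squared
bp_p <- pchisq(bp_stat, df = 2, lower.tail = FALSE)  # df = number of predictors

print(c(shapiro_p = sw$p.value, breusch_pagan_p = bp_p))
```

A small p-value in either test would point to a violation of the corresponding assumption, and is best read alongside the residual plots rather than in place of them.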

Checking for Multicollinearity

#check for multicollinearity using VIF
vif_values <- vif(model_initial)
print(vif_values)
## Improvement_Market_Value  Total_Land_Market_Value          Main_Area_Sq_Ft 
##                  3.57135               3369.20445                 56.00032 
##          Main_Area_Value             Garage_Sq_Ft             Garage_Value 
##                 93.54359                460.41418                548.87583 
##               Land_Sq_Ft 
##               3451.40079
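The very large VIFs for the two land variables can be traced to how closely one follows the other. The sketch below uses only the first ten rows of each column from the str() output and a pairwise regression, so it is a simplified illustration: car::vif() regresses each predictor on all the others, not just one. The names land_value, land_sqft, and vif_by_hand are illustrative.

```r
# First ten rows of the two land columns, copied from the str() output.
land_value <- c(46330, 46243, 46284, 46156, 101529,
                58000, 63750, 45153, 79970, 44944)
land_sqft  <- c(7988, 7973, 7980, 7958, 17505,
                10000, 10625, 7785, 13788, 7749)

# Implied land value per square foot for each row.
price_per_sqft <- land_value / land_sqft
print(round(price_per_sqft, 2))

# With a single other predictor, VIF = 1 / (1 - R^2), where R^2 comes from
# regressing one predictor on the other.
r2 <- summary(lm(land_value ~ land_sqft))$r.squared
vif_by_hand <- 1 / (1 - r2)
print(vif_by_hand)
```

Nine of these ten rows imply a land value of 5.80 per square foot and the remaining one implies 6.00, so the two columns carry almost the same information. Keeping only one of them in the model is one possible way to reduce the collinearity.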

The Variance Inflation Factors (VIFs) reveal severe multicollinearity. Total_Land_Market_Value (VIF = 3,369) and Land_Sq_Ft (VIF = 3,451) are highly collinear, while Garage_Value (VIF = 549), Garage_Sq_Ft (VIF = 460), Main_Area_Value (VIF = 94), and Main_Area_Sq_Ft (VIF = 56) also show redundancy.

Plot initial model

#model diagnostics
plot(model_initial, which = 1:4)

1.3.0.1 Cook’s distance to find influential points

#check for influential points
cooksd <- cooks.distance(model_initial)
plot(cooksd, main = "Cook's Distance")
abline(h = 4/nrow(property), col = "red")

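The Cook’s distance plot identifies influential observations (points above the red line at 4/n). These points disproportionately affect the fitted model. The most prominent is row 5, with a market value of 1,218,146. Its recorded improvement value (116,617) and land value (101,529) sum to 218,146, exactly 1,000,000 less than its market value, whereas in the other rows shown the two components sum to the market value. Row 5 is therefore removed.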
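The flagged rows can also be listed directly instead of being read off the plot. The sketch below uses synthetic data with one deliberately distorted observation, not the property data; in this report the same rule would be applied to cooksd from model_initial.

```r
# Synthetic data (not the property data): a straight line with small
# deterministic noise, plus one deliberately distorted observation.
x <- 1:20
y <- 3 + 2 * x + sin(x)
y[20] <- y[20] + 30   # hypothetical bad record

fit <- lm(y ~ x)
cooksd <- cooks.distance(fit)

# Same rule of thumb as the red line in the plot: flag D_i > 4 / n.
threshold <- 4 / length(cooksd)
flagged <- which(cooksd > threshold)
print(flagged)
```

The 4/n cutoff is a rule of thumb, so flagged rows are candidates for inspection rather than automatic removals.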

1.3.0.1.1 Removing row 5
# remove row 5
property <- property[-5, ]
head(property)
##   X2025_Market_Value Improvement_Market_Value Total_Land_Market_Value
## 1             531703                   485373                   46330
## 2             504815                   458572                   46243
## 3             573558                   527274                   46284
## 4             469131                   422975                   46156
## 6             569992                   511992                   58000
## 7             602427                   538677                   63750
##   Main_Area_Sq_Ft Main_Area_Value Garage_Sq_Ft Garage_Value Land_Sq_Ft
## 1            2743          449668          484        35705       7988
## 2            2610          419843          525        38729       7973
## 3            2851          460543          918        66731       7980
## 4            2991          390541          552        32434       7958
## 6            3036          447924          575        38175      10000
## 7            2877          432168          909        61445      10625

1.3.0.2 Box-Cox transformation

#Box-Cox transformation for non-normality
bc <- boxcox(model_initial)

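#lambda that maximizes the Box-Cox profile log-likelihood
lambda <- bc$x[which.max(bc$y)]
property_transformed <- property
#apply the Box-Cox transform to the response only
property_transformed$X2025_Market_Value <- (property$X2025_Market_Value^lambda - 1)/lambda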

#refitting model with transformed response
model_transformed <- lm(X2025_Market_Value ~ ., data = property_transformed)
summary(model_transformed)
## 
## Call:
## lm(formula = X2025_Market_Value ~ ., data = property_transformed)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -590.35  -55.39   -9.43   35.76  468.71 
## 
## Coefficients:
##                            Estimate Std. Error  t value Pr(>|t|)    
## (Intercept)               5.589e+03  3.419e+02   16.345  < 2e-16 ***
## Improvement_Market_Value  8.727e-01  5.940e-04 1469.157  < 2e-16 ***
## Total_Land_Market_Value   1.069e+00  1.208e-01    8.843 1.49e-10 ***
## Main_Area_Sq_Ft           1.371e+00  9.700e-01    1.414    0.166    
## Main_Area_Value          -6.317e-03  5.577e-03   -1.133    0.265    
## Garage_Sq_Ft             -6.407e+00  5.123e+00   -1.251    0.219    
## Garage_Value              9.002e-02  6.931e-02    1.299    0.202    
## Land_Sq_Ft               -1.195e+00  7.057e-01   -1.694    0.099 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 182 on 36 degrees of freedom
## Multiple R-squared:      1,  Adjusted R-squared:      1 
## F-statistic: 2.478e+06 on 7 and 36 DF,  p-value: < 2.2e-16

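The Box-Cox plot selects λ ≈ 1. A value of λ = 1 leaves the response essentially untransformed (it only subtracts 1); a log transformation would correspond to λ = 0. Residual behavior improves in the refitted model, which also reflects the removal of row 5, though multicollinearity remains. Because the response (X2025_Market_Value) is now on the Box-Cox scale, the coefficients and predictions of model_transformed are expressed on that scale rather than in dollars.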
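For reference, the Box-Cox family applied to the response in the code above is shown below, using \(Y\) for the 2025 market value as in the regression equation. The notation \(Y^{(\lambda)}\) for the transformed response is introduced here for illustration.

```latex
\[
Y^{(\lambda)} =
\begin{cases}
\dfrac{Y^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\[6pt]
\ln Y, & \lambda = 0
\end{cases}
\qquad\text{so that}\qquad
Y^{(1)} = Y - 1 .
\]
```

The two special cases make the scale explicit: at \(\lambda = 1\) the response is only shifted by 1, and at \(\lambda = 0\) it is the natural logarithm.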

1.3.0.3 Removing non-significant predictors

# Backward elimination: step() drops terms to minimize AIC, not by p-value
model_final <- step(model_transformed, direction = "backward")
## Start:  AIC=465.14
## X2025_Market_Value ~ Improvement_Market_Value + Total_Land_Market_Value + 
##     Main_Area_Sq_Ft + Main_Area_Value + Garage_Sq_Ft + Garage_Value + 
##     Land_Sq_Ft
## 
##                            Df  Sum of Sq        RSS    AIC
## - Main_Area_Value           1 4.2515e+04 1.2354e+06 464.68
## - Garage_Sq_Ft              1 5.1822e+04 1.2448e+06 465.01
## <none>                                   1.1929e+06 465.14
## - Garage_Value              1 5.5904e+04 1.2488e+06 465.16
## - Main_Area_Sq_Ft           1 6.6233e+04 1.2592e+06 465.52
## - Land_Sq_Ft                1 9.5062e+04 1.2880e+06 466.51
## - Total_Land_Market_Value   1 2.5912e+06 3.7842e+06 513.93
## - Improvement_Market_Value  1 7.1523e+10 7.1525e+10 947.20
## 
## Step:  AIC=464.68
## X2025_Market_Value ~ Improvement_Market_Value + Total_Land_Market_Value + 
##     Main_Area_Sq_Ft + Garage_Sq_Ft + Garage_Value + Land_Sq_Ft
## 
##                            Df  Sum of Sq        RSS    AIC
## - Garage_Sq_Ft              1 1.6116e+04 1.2516e+06 463.25
## - Garage_Value              1 2.8433e+04 1.2639e+06 463.68
## - Land_Sq_Ft                1 5.5040e+04 1.2905e+06 464.60
## <none>                                   1.2354e+06 464.68
## - Main_Area_Sq_Ft           1 1.8804e+05 1.4235e+06 468.91
## - Total_Land_Market_Value   1 3.1824e+06 4.4179e+06 518.75
## - Improvement_Market_Value  1 7.1756e+10 7.1757e+10 945.34
## 
## Step:  AIC=463.25
## X2025_Market_Value ~ Improvement_Market_Value + Total_Land_Market_Value + 
##     Main_Area_Sq_Ft + Garage_Value + Land_Sq_Ft
## 
##                            Df  Sum of Sq        RSS    AIC
## - Garage_Value              1 4.1724e+04 1.2933e+06 462.69
## - Land_Sq_Ft                1 4.2903e+04 1.2945e+06 462.73
## <none>                                   1.2516e+06 463.25
## - Main_Area_Sq_Ft           1 1.7271e+05 1.4243e+06 466.94
## - Total_Land_Market_Value   1 3.3427e+06 4.5942e+06 518.47
## - Improvement_Market_Value  1 9.3437e+10 9.3438e+10 954.96
## 
## Step:  AIC=462.69
## X2025_Market_Value ~ Improvement_Market_Value + Total_Land_Market_Value + 
##     Main_Area_Sq_Ft + Land_Sq_Ft
## 
##                            Df  Sum of Sq        RSS    AIC
## <none>                                   1.2933e+06 462.69
## - Land_Sq_Ft                1 8.9094e+04 1.3824e+06 463.63
## - Main_Area_Sq_Ft           1 1.8828e+05 1.4816e+06 466.67
## - Total_Land_Market_Value   1 4.1209e+06 5.4142e+06 523.70
## - Improvement_Market_Value  1 9.7412e+10 9.7413e+10 954.79
summary(model_final)
## 
## Call:
## lm(formula = X2025_Market_Value ~ Improvement_Market_Value + 
##     Total_Land_Market_Value + Main_Area_Sq_Ft + Land_Sq_Ft, data = property_transformed)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -640.79  -37.49    2.60   49.72  393.36 
## 
## Coefficients:
##                            Estimate Std. Error  t value Pr(>|t|)    
## (Intercept)               5.848e+03  2.956e+02   19.780  < 2e-16 ***
## Improvement_Market_Value  8.730e-01  5.094e-04 1713.922  < 2e-16 ***
## Total_Land_Market_Value   1.012e+00  9.074e-02   11.148 1.08e-13 ***
## Main_Area_Sq_Ft           2.620e-01  1.099e-01    2.383   0.0221 *  
## Land_Sq_Ft               -8.794e-01  5.365e-01   -1.639   0.1092    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 182.1 on 39 degrees of freedom
## Multiple R-squared:      1,  Adjusted R-squared:      1 
## F-statistic: 4.333e+06 on 4 and 39 DF,  p-value: < 2.2e-16

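Backward elimination by AIC retains Improvement_Market_Value, Total_Land_Market_Value, Main_Area_Sq_Ft, and Land_Sq_Ft. The first three are significant at the 0.05 level. Land_Sq_Ft (p = 0.109) is not, but it is kept because dropping it would raise the AIC from 462.69 to 463.63.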
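The AIC values in the step() trace can be reproduced from the residual sums of squares. The sketch below assumes the form used for linear models, n·log(RSS/n) + 2p, with the RSS values copied (rounded) from the trace; the helper name step_aic is hypothetical.

```r
# AIC as reported by step() for a linear model: n * log(RSS / n) + 2 * p,
# where p counts the estimated coefficients (including the intercept).
step_aic <- function(rss, n, p) n * log(rss / n) + 2 * p

n <- 44  # 45 properties minus the removed row 5

# Full model: 7 predictors + intercept, RSS from the first "<none>" row.
aic_full <- step_aic(rss = 1.1929e6, n = n, p = 8)

# Final model: 4 predictors + intercept, RSS from the last "<none>" row.
aic_final <- step_aic(rss = 1.2933e6, n = n, p = 5)

print(c(full = aic_full, final = aic_final))
```

The results agree with the reported 465.14 and 462.69 to within the rounding of the RSS values. The difference between the two models is small, so the selection is driven mainly by the penalty on the number of coefficients rather than by a large change in fit.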

1.3.0.4 Forecast for Home 6321

#check VIF 
vif_final <- vif(model_final)
print(vif_final)
## Improvement_Market_Value  Total_Land_Market_Value          Main_Area_Sq_Ft 
##                 5.048144              1356.007531                 1.079719 
##               Land_Sq_Ft 
##              1394.895084

VIFs greater than 1,000 for Total_Land_Market_Value and Land_Sq_Ft indicate that the collinearity between these two variables remains unresolved.

Prediction and Prediction Interval

Predict value for the home 6321

#predict value for the home 6321 (result is on the Box-Cox-transformed scale of the response)
new_home <- data.frame(
  Improvement_Market_Value = 494642,
  Total_Land_Market_Value = 43767,
  Main_Area_Sq_Ft = 2773,
  Land_Sq_Ft = 7546
)

predicted_value <- predict(model_final, newdata = new_home, interval = "prediction")
print(predicted_value)
##        fit      lwr      upr
## 1 476045.4 475668.7 476422.1
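Because model_final was fitted to the Box-Cox-transformed response, the fit, lwr, and upr values above are on the transformed scale. A minimal sketch of the conversion back to dollars follows. The helper names boxcox_forward and boxcox_inverse are hypothetical, and lambda_demo is an illustrative value only, since the λ chosen by boxcox() is not printed in this report.

```r
# Box-Cox transform as applied to the response in this report (lambda != 0).
boxcox_forward <- function(y, lambda) (y^lambda - 1) / lambda

# Inverse transform: maps transformed-scale values back to dollars.
boxcox_inverse <- function(z, lambda) (lambda * z + 1)^(1 / lambda)

# Illustrative lambda only: the value chosen by boxcox() is not printed above.
lambda_demo <- 0.5

# Round trip on the assessed value to confirm the two functions are inverses.
z <- boxcox_forward(538409, lambda_demo)
back <- boxcox_inverse(z, lambda_demo)
print(c(transformed = z, back_transformed = back))
```

In the report’s workflow the equivalent step would be boxcox_inverse(predicted_value, lambda), using the lambda object computed earlier, before the interval is compared with the assessed value of $538,409.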

1.4 Conclusion

This study fits a statistically significant linear regression (overall F-test p-value < 2.2e-16) relating the Box-Cox-transformed 2025 market value to four predictors: Improvement_Market_Value, Total_Land_Market_Value, Main_Area_Sq_Ft, and Land_Sq_Ft. The reported R² rounds to 1, which points to a near-exact relationship rather than an ordinary statistical fit: in the rows displayed above, other than the removed row 5, the 2025 market value equals the improvement value plus the land value. On the transformed scale, Improvement_Market_Value (+0.87 per 1 USD) and Total_Land_Market_Value (+1.01 per 1 USD) are the strongest contributors to market value, while Main_Area_Sq_Ft (+0.26 per sq. ft.) contributes modestly. Land_Sq_Ft (-0.88 per sq. ft.) shows a counter-intuitive negative sign, though it is not statistically significant at the 5% level (p = 0.109). The 95% prediction interval (475,669–476,422) lies below the assessed value of $538,409 for the home at 6321 88th Street, which this report reads as an overvaluation. That interval is on the transformed scale, however, so it should be converted back to dollars before the comparison is treated as conclusive. To enhance reliability, future studies should address potential over-fitting and the remaining multicollinearity, include other factors associated with market value, and use a larger data set to improve generalizability. Exploring non-linear relationships or interaction effects could further refine predictive accuracy.