2 Background and Purpose

Property taxes serve as a vital source of revenue for local and regional governments. Revenue from property taxes funds operations such as public education, streets, roads, police, fire protection, and other services that benefit the local community. The first step in calculating property taxes is to determine the market value of the land and structures. An assessment ratio is then applied to obtain the assessed value, and exemptions are subtracted to obtain the taxable value. The generic breakdown of the amount of property taxes owed can be found using the formula below. Market value assessments are not purely scientific and can be based on the subjective opinion of the assessor; therefore, an assessment can be appealed. This project concerns the assessed value of the property at 6321 88th Street in Lubbock, Texas, specifically whether the assessment is justified given the assessed values of other homes in the neighborhood.

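\[ \begin{aligned} \text{Assessed Value} &= \text{Market Value} \times \text{Assessment Ratio} \\ \text{Taxable Value} &= \text{Assessed Value} - \text{Exemptions} \\ \text{Property Tax Before Credits} &= \text{Taxable Value} \times \text{Tax Rates} \\ \text{Property Taxes Owed} &= \text{Property Tax Before Credits} - \text{Homestead Credits and Circuit Breakers} \end{aligned} \]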
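
As a rough illustration of the chain of calculations in the formula above, the sketch below walks through each step in R. The function name `property_tax` and every input figure are hypothetical and chosen only for the arithmetic; none of them come from the Lubbock data or from any actual tax schedule.

```r
# Hypothetical helper: follows the formula step by step.
property_tax <- function(market_value, assessment_ratio, exemptions,
                         tax_rate, credits = 0) {
  assessed_value     <- market_value * assessment_ratio
  taxable_value      <- max(assessed_value - exemptions, 0)  # never negative
  tax_before_credits <- taxable_value * tax_rate
  max(tax_before_credits - credits, 0)
}

# Made-up inputs: a 500,000 market value, a ratio of 1,
# a 100,000 exemption, and a combined 2% tax rate.
tax_owed <- property_tax(market_value = 500000, assessment_ratio = 1,
                         exemptions = 100000, tax_rate = 0.02)
print(tax_owed)
```

With these made-up inputs the taxable value is 400,000, so the tax owed before any credits is 2% of that amount.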

3 Exploratory Data Analysis

3.1 Data Collection

The data used for this project was collected at https://lubbockcad.org and saved as a CSV file. Below is a breakdown of the key variables that were collected:

| Variable | Explanation | CSV Field Name |
|---|---|---|
| 2025 Market Value | Total value of property appraisal (house + land) | X2025MarketValue |
| Total Improvement Market Value | Total value of improvements (house) appraisal | TotalImprovMarketValue |
| Total Land Market Value | Total value of land appraisal | TotalLandMarketValue |
| Homestead Cap Loss | Represents a discount only in the current tax year if the appraised value from the previous year went up by more than 10% | HomesteadCapLoss |
| Total Main Area (Sq. Ft.) | Total square footage of house | TotalMainAreaSqFt |
| Main Area (Sq. Ft.) | Total square footage of heated house area | MainAreaSqFt |
| Main Area (Value) | Total value of heated house area | MainAreaValue |
| Garage (Sq. Ft.) | Total square footage of non-heated house area | GarageSqFt |
| Garage (Value) | Total value of non-heated house area | GarageValue |
| Land (Sq. Ft.) | Total square footage of land | LandSqFt |

A total of 42 observations were collected. Data related to the second main area, the second garage area, and any pool information were excluded from this analysis. Ten properties included a second main area, a second garage area, or both. Variables representing square footage were chosen as the predictors for the analysis.

3.2 Data Preparation

The data was imported into an R data frame, and column names were redefined for clarity. Columns initially read as integers were converted to real numbers. The StreetNumber column was removed, as it served only as an identifier and was irrelevant to the analysis. A new data frame containing only the response variable and the predictor variables was created. The property in question, 6321 88th Street (row 13 of the data set), was removed so that the fitted model could later be used to predict its value independently. Summary statistics were then generated for this predictor variable (PV) data frame to better understand the distribution and range of the data.

#read in data
df<-read.csv("https://raw.githubusercontent.com/btarin12/IE5344/refs/heads/main/Project%202%20Tax%20Assessment.csv")

#column definition
colnames(df)<-c("StreetNumber","X2025MarketValue","TotalImprovMarketValue","TotalLandMarketValue",
                "HomesteadCapLoss","TotalMainAreaSqFt","MainAreaSqFt","MainAreaValue",
                "GarageSqFt","GarageValue","LandSqFt")

#convert values to real numbers and remove the street number column
#str(df)
df[2:11] <- lapply(df[2:11], as.numeric)
#str(df)
df<-df[,-1]

#create a data frame with just the square footage variables
pv <- df %>% select(X2025MarketValue, TotalMainAreaSqFt, MainAreaSqFt, GarageSqFt, LandSqFt)

#remove 6321 88th Street data
pvf<-pv[-13,]

#summary statistics on data
kable(summary(pvf))

| Statistic | X2025MarketValue | TotalMainAreaSqFt | MainAreaSqFt | GarageSqFt | LandSqFt |
|---|---|---|---|---|---|
| Min. | 418286 | 2591 | 2041 | 325.0 | 7501 |
| 1st Qu. | 504815 | 3064 | 2582 | 506.0 | 7778 |
| Median | 534991 | 3272 | 2743 | 528.0 | 7872 |
| Mean | 570966 | 3278 | 2720 | 557.8 | 8723 |
| 3rd Qu. | 580969 | 3477 | 2910 | 552.0 | 8206 |
| Max. | 1218146 | 4011 | 3226 | 1119.0 | 17505 |

3.2.1 Observations Made on Summary Statistics

  • 2025 Market Value: Property values range from $418,286 to $1,218,146. The mean value of $570,966 is higher than the median of $534,991, suggesting a right skew in the distribution.

  • Total Main Area (Sq. Ft.): The total main area ranges from 2,591 sq ft to 4,011 sq ft. The mean (3,278 sq ft) and the median (3,272 sq ft) are close, indicating that the data is fairly symmetrical.

  • Main Area (Sq. Ft.): The size of the main area ranges from 2,041 to 3,226 square feet. The mean (2,720 sq ft) is close to the median (2,743 sq ft), indicating a symmetric distribution.

  • Garage (Sq. Ft.): Garage sizes vary from 325 to 1,119 square feet. The mean garage size of 557.8 sq ft is slightly higher than the median of 528 sq ft, which suggests a minor right skew.

  • Land (Sq. Ft.): Land sizes range from 7,501 to 17,505 square feet. The mean (8,723 sq ft) is significantly higher than the median (7,872 sq ft), indicating a right-skewed distribution. The larger properties are pulling the mean upward.
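
The mean-versus-median comparisons above can be put on a common scale. The sketch below copies the means and medians from the summary table and expresses each gap as a percentage of the median; the `gap_pct` column is an illustrative heuristic introduced here, not a formal skewness statistic.

```r
# Means and medians copied from the summary statistics table
stats <- data.frame(
  variable = c("X2025MarketValue", "TotalMainAreaSqFt", "MainAreaSqFt",
               "GarageSqFt", "LandSqFt"),
  mean     = c(570966, 3278, 2720, 557.8, 8723),
  median   = c(534991, 3272, 2743, 528.0, 7872)
)

# Gap between mean and median as a percentage of the median:
# positive values hint at a right skew, negative values at a left skew
stats$gap_pct <- round(100 * (stats$mean - stats$median) / stats$median, 1)
print(stats)
```

On this scale Land (Sq. Ft.) shows the largest gap, at roughly 11% of the median, while the two main area variables differ by less than 1%.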

3.2.2 Observations Made on Histograms

pv_long <- pivot_longer(pvf, cols = everything(), names_to = "Variable", values_to = "Value")

# Plot all histograms
ggplot(pv_long, aes(x = Value)) +
  geom_histogram(bins = 30, color = "black", fill = "lightblue") +
  facet_wrap(~ Variable, scales = "free", ncol = 3) +
  labs(title = "Histograms of Tax Assessment Variables", x = "Value", y = "Frequency") +
  theme_minimal()

  • 2025 Market Value: The distribution is right-skewed, as indicated in the summary statistics: most property values sit toward the lower end, while a few high-value properties pull the tail to the right. Most properties have market values between $400,000 and $600,000. A few outliers at the high end of the range could impact the regression model.

  • Total Main Area (Sq. Ft.): As the summary statistics observations indicated, the distribution is fairly symmetric with no strong skewness.

  • Main Area (Sq. Ft.): The distribution of the main area size appears fairly uniform with the exception of a slight peak around 2,500 to 4,000 square feet. The spread is broader, and the distribution looks closer to normal, as the summary statistics suggested.

  • Garage (Sq. Ft.): The distribution is right-skewed, with most garage sizes between 400 and 600 square feet. A few properties have larger garage sizes, indicating outliers, less common garage sizes, or properties with a second garage area.

  • Land (Sq. Ft.): The distribution is also right-skewed, heavily concentrated around 7,500 to 8,000 square feet. There are a few properties that have notably larger land sizes, which suggests outliers.

3.2.3 Observations Made on Box Plots

# Plot all boxplots
ggplot(pv_long, aes(x = Variable, y = Value)) +
  geom_boxplot(fill = "lightblue", color = "black", outlier.color = "red") +
  labs(title = "Boxplots of Tax Assessment Variables", x = "Variable", y = "Value") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

  • 2025 Market Value: Market value shows a much larger spread compared to the predictor variables. Several high-value outliers (noted by the red dots) indicate a few properties assessed higher than the majority of the observations. Regardless of the outliers, the box is relatively compact, meaning most of the property values are concentrated within a tighter range, as seen in the summary statistics.

  • Total Main Area (Sq. Ft.), Garage (Sq. Ft.), Main Area (Sq. Ft.), Land (Sq. Ft.): The predictor variables have relatively small ranges compared to the response variable. There are outliers in each variable (indicated by the red dots above and below the whiskers). The narrow boxes suggest that the middle 50% of observations are fairly close in value.
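
The red outlier dots follow the usual box plot convention of flagging points more than 1.5 times the interquartile range beyond the quartiles, which is the `geom_boxplot` default. A minimal sketch of that rule, using the quartiles and maxima reported in the summary table for two of the variables, is shown below. The quartiles printed by `summary()` can differ slightly from the hinges a box plot uses, so the fences here are approximate.

```r
# Quartiles and maxima copied from the summary statistics table
quartiles <- data.frame(
  variable = c("X2025MarketValue", "LandSqFt"),
  q1       = c(504815, 7778),
  q3       = c(580969, 8206),
  max      = c(1218146, 17505)
)

# Upper fence = Q3 + 1.5 * IQR; anything above it is drawn as an outlier
iqr <- quartiles$q3 - quartiles$q1
quartiles$upper_fence    <- quartiles$q3 + 1.5 * iqr
quartiles$max_is_outlier <- quartiles$max > quartiles$upper_fence
print(quartiles)
```

For both variables the maximum lies well above the upper fence, which agrees with the high-end outliers seen in the plots.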

3.2.4 Observations Made on Scatter Plots

pv_long <- pv %>%
  pivot_longer(cols = c(GarageSqFt, LandSqFt, MainAreaSqFt, TotalMainAreaSqFt),
               names_to = "Variable",
               values_to = "Value")

# scatter plot
ggplot(pv_long, aes(x = X2025MarketValue, y = Value, color = Variable)) +
  geom_point(size = 2) +
  labs(title = "Scatter Plot of Variables vs X2025MarketValue",
       x = "2025 Market Value",
       y = "Square Footage") +
  theme_minimal()

  • Overall Trend: All four variables appear to have a general positive relationship with the market value, indicating that larger properties tend to have higher market values.

  • Total Main Area (Sq. Ft.)/Main Area (Sq. Ft.): Both show a fairly tight, positive relationship with market value, with larger main areas corresponding to higher assessed values.

  • Garage (Sq. Ft.): There is little variation in garage size as market value increases, suggesting garage size does not strongly drive property value.

  • Land (Sq. Ft.): Land square footage has a much wider spread compared to the other variables. A few properties with especially large land areas stand out as outliers.

    Regression Model Fitting

    Multiple Regression Analysis

    Regression models are used to detail the relationship between dependent (response) and independent (predictor) variables. Multiple regression is specifically used to estimate the relationship between two or more predictor variables and one response variable. The value of the response variable at a particular value of the predictor variables can also be obtained using the regression model.

    The multiple regression model is given by

    \[ y=\beta_{0}+\beta_{1}x_{1}+\dots+\beta_{n}x_{n}+\varepsilon \]

    where

    \[ y=\text{response variable} \] \[ x_{n}=\text{predictor variable } n \] \[ \beta_{0}=\text{intercept (value of } y \text{ when all predictors equal 0)} \] \[ \beta_{n}=\text{slope (change in } y \text{ for a one-unit increase in } x_{n}\text{, holding the other predictors constant)} \] \[ \varepsilon=\text{error term (random variation not explained by the predictors)} \]
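
To make the notation concrete, the sketch below simulates data from a model of this form and recovers the coefficients with `lm()`. The data, the variable names `x1` and `x2`, and the "true" coefficients are all made up for illustration and are unrelated to the tax assessment data.

```r
set.seed(1)

# Simulated predictors and response: y = 50000 + 20*x1 + 60*x2 + error
n  <- 200
x1 <- runif(n, 2000, 4000)
x2 <- runif(n, 7000, 18000)
y  <- 50000 + 20 * x1 + 60 * x2 + rnorm(n, sd = 10000)
sim <- data.frame(y, x1, x2)

# Least-squares estimates of the intercept and the two slopes
fit <- lm(y ~ x1 + x2, data = sim)
print(coef(fit))
```

The estimated slopes should land close to the values used in the simulation, with the remaining gap attributable to the error term.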

3.3 Model Selection

Multiple iterations of a multiple linear regression model were explored.

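  • The first model included all square footage variables: Total Main Area Square Foot, Main Area Square Foot, Garage Square Foot, and Land Square Foot. However, the Garage Square Foot coefficient was returned as NA in the summary output ("1 not defined because of singularities"), which means the variable is an exact linear combination of other predictors. This is consistent with Total Main Area being the sum of Main Area and Garage; the summary means, for example, satisfy 2,720 + 557.8 ≈ 3,278.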

  • As a result, the second model was developed, excluding the Garage Square Foot variable. Variance Inflation Factors (VIF) were then calculated for the remaining predictors. The VIF analysis revealed a high degree of multicollinearity between the Total Main Area Square Foot and Main Area Square Foot variables.

  • Third Model: To address the multicollinearity, the third model was created by removing the Main Area Square Foot variable.

Summary statistics for the first and second models, along with the VIF values, can be found in the Appendix section.
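
The two issues encountered during model selection can be reproduced on a small toy data set. In the sketch below, all data are simulated and the column names are hypothetical; `total` is constructed as the exact sum of `main` and `garage`, which is assumed here to mirror the structure of the real variables. The VIF is computed by hand from its definition rather than with `car::vif()`.

```r
set.seed(2)

# Simulated toy data in which total = main + garage exactly
n      <- 40
main   <- round(runif(n, 2000, 3200))
garage <- round(runif(n, 300, 1100))
total  <- main + garage
land   <- round(runif(n, 7500, 17500))
value  <- 50000 + 30 * total + 60 * land + rnorm(n, sd = 20000)
toy    <- data.frame(value, total, main, garage, land)

# With an exact linear dependence, lm() cannot estimate every coefficient
# and reports NA for the aliased one
fit_all <- lm(value ~ total + main + garage + land, data = toy)
print(coef(fit_all))

# VIF by definition: 1 / (1 - R^2), where R^2 comes from regressing
# one predictor on the remaining predictors
r2_total  <- summary(lm(total ~ main + land, data = toy))$r.squared
vif_total <- 1 / (1 - r2_total)
print(vif_total)
```

Dropping the aliased predictor removes the NA but leaves `total` and `main` strongly correlated, which is what an elevated VIF reflects.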

3.3.1 Model 3 Summary Statistics

Alpha is the significance level chosen before analyzing the data. The summary output lists the common choices as significance codes: 0.001, 0.01, 0.05, and 0.1. The p-value for each coefficient and the p-value for the overall model are compared to alpha; a p-value must be less than alpha to be considered statistically significant.

#third iteration removing the main area square footage (MainAreaSqFt)
model3<-lm(X2025MarketValue~TotalMainAreaSqFt+LandSqFt,data=pvf)
summary(model3)
## 
## Call:
## lm(formula = X2025MarketValue ~ TotalMainAreaSqFt + LandSqFt, 
##     data = pvf)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -165957  -44671    4158   39694   91204 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       48113.947  90846.093   0.530    0.599    
## TotalMainAreaSqFt    -9.861     28.759  -0.343    0.734    
## LandSqFt             63.647      4.907  12.970 1.57e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 59390 on 38 degrees of freedom
## Multiple R-squared:  0.8288, Adjusted R-squared:  0.8197 
## F-statistic: 91.95 on 2 and 38 DF,  p-value: 2.746e-15
  • Coefficient p-values:

    • Total Main Area (Sq. Ft.) - the p-value for this variable is high and much greater than the significance codes, 0.001, 0.01, 0.05, and 0.1. This indicates that the Total Main Area (Sq. Ft.) is not statistically significant in predicting the Market Value.

    • Land (Sq. Ft.) - the p-value (1.57e-15) is extremely small which indicates this variable is a strong predictor of Market Value.

  • Coefficient Estimates:

    • Total Main Area (Sq. Ft.) - the estimate of the coefficient, -9.861, suggests that, with Land (Sq. Ft.) held constant, an increase of one square foot of Total Main Area (Sq. Ft.) would decrease the Market Value by $9.86. This contradicts the scatter plot observations, which indicate a positive relationship between Total Main Area (Sq. Ft.) and Market Value; however, because the coefficient is not statistically significant, its sign should not be over-interpreted.

    • Land (Sq. Ft.) - the estimate of the coefficient, 63.647, suggests that if the Total Main Area (Sq. Ft.) is held constant that an increase in one square foot of Land (Sq. Ft.) would increase the Market Value by $63.65.

  • Model p-value: the overall model p-value of 2.746e-15 is extremely small and much less than the lowest significance code threshold of 0.001, which indicates the model as a whole is highly statistically significant.

  • R-Squared: The Multiple R-Squared (0.8288) and Adjusted R-Squared (0.8197) are close to 1, meaning the model explains around 82% of the variance in the Market Value.
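
Several of the quantities in the summary output can be cross-checked by hand. The sketch below copies the estimates, standard errors, and R-squared from the Model 3 output and recomputes the t values, the two-sided p-values, and the adjusted R-squared. Because the inputs are rounded as printed, the recomputed figures are expected to match the output only to within rounding.

```r
# Values copied from the Model 3 summary output
estimate  <- c(TotalMainAreaSqFt = -9.861, LandSqFt = 63.647)
std_error <- c(TotalMainAreaSqFt = 28.759, LandSqFt = 4.907)
n <- 41   # observations in pvf
p <- 2    # predictors in Model 3

# t value = estimate / standard error
t_value <- estimate / std_error
print(round(t_value, 3))

# Two-sided p-value from the t distribution with n - p - 1 = 38 degrees of freedom
p_value <- 2 * pt(-abs(t_value), df = n - p - 1)
print(signif(p_value, 3))

# Adjusted R-squared penalizes R-squared for the number of predictors
r2     <- 0.8288
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(round(adj_r2, 4))
```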

3.3.2 Model 3 Adequacy Observations

plot(model3)


  • Residuals vs Fitted: Most properties are valued between $400,000 and $600,000. The spread of residuals within this range appears to be constant, indicating homoscedasticity. Points labeled 14, 16, and 22 appear to be outliers and potentially influence the model.

  • Normal Q-Q: Most of the points fall close to the dotted line, indicating that the residuals are approximately normally distributed. Points 14, 16, and 22 deviate from the dotted line, suggesting these points represent outliers.

  • Scale-Location: Points 14 and 16 have higher standardized residuals, suggesting they could be influential points or outliers.

  • Residuals vs Leverage: Point 14 has very high leverage and is near the outer Cook's distance curves, suggesting that it may be very influential on the model. Point 6 has moderate-to-high leverage, while point 16 has moderate leverage.

Points with Leverage

Points with leverage can be identified using the equation below, which results in a threshold of approximately 0.098 (2 × 2 / 41 ≈ 0.0976).

\[ \text{Threshold}=\frac{2p}{n} \]

where

\[ p=\text{number of predictor variables (2)} \] \[ n=\text{number of observations (41)} \]


#hat values > 0.098
hatvals<-hatvalues(model3)
hatvals[hatvals>0.098]
##         1         2         6        14        16        24        27 
## 0.1074954 0.1079482 0.3981804 0.5052404 0.1815329 0.1285348 0.1013083

Points 16 and 14 show high leverage, as the model adequacy plots indicate. Although Points 1, 2, 6, 24, and 27 weren’t identified as outliers, they also show high leverage. These observations will be removed from the model, and the effects of the removal will be evaluated.
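
The threshold arithmetic, and one detail worth checking when acting on it, can be sketched as follows. `hatvalues()` names each value by the data frame's row name, and because row 13 was dropped from `pvf`, row names and row positions differ from row 14 onward, whereas negative integer indexing removes rows by position. The toy data below are made up purely to illustrate removing rows by label instead; the names `toy`, `high`, and `toy_trimmed` are hypothetical.

```r
# Threshold as defined in the text: 2p/n with p = 2 and n = 41
threshold <- 2 * 2 / 41
print(round(threshold, 3))

# Made-up toy data; dropping row 3 makes row labels and positions diverge
toy <- data.frame(x = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30),
                  y = c(2, 4, 5, 4, 6, 7, 8, 9, 10, 31))
toy <- toy[-3, ]          # the row labelled "10" now sits at position 9

fit <- lm(y ~ x, data = toy)
h   <- hatvalues(fit)     # named by row label, not by position

# Same 2p/n rule with p = 1 predictor
high <- names(h)[h > 2 * 1 / nrow(toy)]
print(high)

# Remove the flagged rows by label so the intended observations are dropped
toy_trimmed <- toy[!(rownames(toy) %in% high), ]
print(nrow(toy_trimmed))
```

Matching on `rownames()` keeps the removal tied to the labels shown in the diagnostic plots and in the `hatvalues()` output, regardless of earlier row deletions.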

3.3.3 Final Model Selection

3.3.3.1 Summary Statistics

#remove points with leverage
pvf2<- pvf[-c(1,2,6,14,16,24,27),]
model4<-lm(X2025MarketValue~TotalMainAreaSqFt+LandSqFt,data=pvf2)
summary(model4)
## 
## Call:
## lm(formula = X2025MarketValue ~ TotalMainAreaSqFt + LandSqFt, 
##     data = pvf2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -151640  -39476    2846   41506  131144 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       4.388e+04  1.089e+05   0.403    0.690    
## TotalMainAreaSqFt 1.063e+00  3.800e+01   0.028    0.978    
## LandSqFt          5.937e+01  6.219e+00   9.548 9.57e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 61690 on 31 degrees of freedom
## Multiple R-squared:  0.7945, Adjusted R-squared:  0.7812 
## F-statistic: 59.93 on 2 and 31 DF,  p-value: 2.231e-11
  • Coefficient p-values:

    • Total Main Area (Sq. Ft.) - the p-value for this variable increased from 0.734 to 0.978 with the removal of the leverage points.

    • Land (Sq. Ft.) - the p-value for this variable increased from 1.57e-15 to 9.57e-11 but remains statistically significant.

  • Coefficient Estimates:

    • Total Main Area (Sq. Ft.) - the estimate of the coefficient, 1.063, suggests that, with Land (Sq. Ft.) held constant, an increase of one square foot of Total Main Area (Sq. Ft.) would increase the Market Value by $1.06. The positive sign aligns with the observations made during exploratory data analysis, although the coefficient is not statistically significant.

    • Land (Sq. Ft.) - the estimate of the coefficient, 59.37, suggests that, with Total Main Area (Sq. Ft.) held constant, an increase of one square foot of Land (Sq. Ft.) would increase the Market Value by $59.37.

  • Model p-value: the overall model p-value went from 2.746e-15 to 2.231e-11. Both values are extremely small and much less than the lowest significance code threshold of 0.001, which indicates the model as a whole remains highly statistically significant.

  • R-Squared: The Multiple R-Squared (0.8288) and Adjusted R-Squared (0.8197) decreased to 0.7945 and 0.7812, respectively. Both values remain close to 1, meaning the model explains around 79% of the variance in the Market Value.

3.3.3.2 Model Prediction

A confidence interval estimates the range within which a parameter like the mean falls, while the prediction interval estimates the range within which a single future observation is likely to fall. Therefore, a prediction interval will be used to estimate the market value of the property, 6321 88th Street.
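
The difference between the two interval types can be seen on simulated data. In the sketch below, the data and the new observation are made up; `predict()` is called once with `interval = "confidence"` and once with `interval = "prediction"`, both at the default 95% level.

```r
set.seed(3)

# Simulated data: one predictor and a noisy linear response
x <- runif(50, 7500, 17500)
y <- 50000 + 60 * x + rnorm(50, sd = 50000)
fit <- lm(y ~ x)

# A single hypothetical new observation
new_obs <- data.frame(x = 8000)

# Confidence interval: uncertainty about the mean response at x = 8000
ci_new <- predict(fit, newdata = new_obs, interval = "confidence")

# Prediction interval: uncertainty about one new response at x = 8000
pi_new <- predict(fit, newdata = new_obs, interval = "prediction")

print(ci_new)
print(pi_new)
```

The prediction interval is always the wider of the two, because it adds the variability of an individual observation to the uncertainty in the estimated mean.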

#select the row for 6321 88th Street (row 13 of the original data)
pv14 <- pv[13, ]

prediction<-predict(model4,newdata=pv14,interval="prediction")

actual_value <- pv14$X2025MarketValue

# Combine into a small data frame for easy comparison
comparison <- data.frame(
  Actual_Value = actual_value,
  Predicted_Value = prediction[, "fit"],
  Lower_Bound = prediction[, "lwr"],
  Upper_Bound = prediction[, "upr"]
)


# View the comparison
kable(comparison, caption = "Comparison of Actual and Predicted Values with Prediction Interval")

Comparison of Actual and Predicted Values with Prediction Interval

| Actual_Value | Predicted_Value | Lower_Bound | Upper_Bound |
|---|---|---|---|
| 538409 | 495479.8 | 366499.4 | 624460.1 |

The predicted market value of $495,480 is lower than the actual market value of $538,409, which on its own suggests the assessor may have assessed the property somewhat high. The difference between the actual market value and the predicted value is $42,929, which represents a prediction error of around 8%. However, the actual value falls within the $366,499 to $624,460 prediction interval, which indicates that the assessed value lies within the range the model expects for a property of this size. The overall model is statistically significant, with an F-test p-value of 2.23e-11. The interval itself uses the default 95% level of predict(): under the model assumptions, about 95% of such intervals would capture the value of a new observation, and about 5% would not.
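
The figures quoted in this comparison can be reproduced directly from the values in the table. The sketch below copies them from the output and recomputes the difference, the percentage error relative to the actual value, and whether the actual value lies inside the interval.

```r
# Values copied from the comparison table
actual <- 538409
fitted <- 495479.8
lwr    <- 366499.4
upr    <- 624460.1

# Difference between the assessed (actual) and predicted values
difference <- actual - fitted
print(round(difference))

# Error as a percentage of the actual value
pct_of_actual <- 100 * difference / actual
print(round(pct_of_actual, 1))

# Does the actual value fall inside the prediction interval?
inside <- actual >= lwr & actual <= upr
print(inside)
```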

4 Appendix

## 
## Call:
## lm(formula = X2025MarketValue ~ TotalMainAreaSqFt + MainAreaSqFt + 
##     GarageSqFt + LandSqFt, data = pvf)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -163371  -43578    4509   40244   90537 
## 
## Coefficients: (1 not defined because of singularities)
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       44329.188  92701.567   0.478    0.635    
## TotalMainAreaSqFt   -31.741     74.506  -0.426    0.673    
## MainAreaSqFt         28.310     88.744   0.319    0.752    
## GarageSqFt               NA         NA      NA       NA    
## LandSqFt             63.474      4.996  12.706  4.6e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 60100 on 37 degrees of freedom
## Multiple R-squared:  0.8292, Adjusted R-squared:  0.8154 
## F-statistic: 59.89 on 3 and 37 DF,  p-value: 2.859e-14
## 
## Call:
## lm(formula = X2025MarketValue ~ TotalMainAreaSqFt + MainAreaSqFt + 
##     LandSqFt, data = pvf)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -163371  -43578    4509   40244   90537 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       44329.188  92701.567   0.478    0.635    
## TotalMainAreaSqFt   -31.741     74.506  -0.426    0.673    
## MainAreaSqFt         28.310     88.744   0.319    0.752    
## LandSqFt             63.474      4.996  12.706  4.6e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 60100 on 37 degrees of freedom
## Multiple R-squared:  0.8292, Adjusted R-squared:  0.8154 
## F-statistic: 59.89 on 3 and 37 DF,  p-value: 2.859e-14
## TotalMainAreaSqFt      MainAreaSqFt          LandSqFt 
##          7.280989          7.363087          1.124179

4.1 Code

#load packages
library(readr)
library(car)
library(ggplot2)
library(tidyr)
library(dplyr)
library(knitr)
library(broom)

#read in data
df<-read.csv("https://raw.githubusercontent.com/btarin12/IE5344/refs/heads/main/Project%202%20Tax%20Assessment.csv")

#column definition
colnames(df)<-c("StreetNumber","X2025MarketValue","TotalImprovMarketValue","TotalLandMarketValue",
                "HomesteadCapLoss","TotalMainAreaSqFt","MainAreaSqFt","MainAreaValue",
                "GarageSqFt","GarageValue","LandSqFt")

#convert values to real numbers and remove the street number column
#str(df)
df[2:11] <- lapply(df[2:11], as.numeric)
#str(df)
df<-df[,-1]

#create a data frame with just the square footage variables
pv <- df %>% select(X2025MarketValue, TotalMainAreaSqFt, MainAreaSqFt, GarageSqFt, LandSqFt)

#remove 6321 88th Street data
pvf<-pv[-13,]

#summary statistics on data
kable(summary(pvf))

pv_long <- pivot_longer(pvf, cols = everything(), names_to = "Variable", values_to = "Value")

# Plot all histograms
ggplot(pv_long, aes(x = Value)) +
  geom_histogram(bins = 30, color = "black", fill = "lightblue") +
  facet_wrap(~ Variable, scales = "free", ncol = 3) +
  labs(title = "Histograms of Tax Assessment Variables", x = "Value", y = "Frequency") +
  theme_minimal()
# Plot all boxplots
ggplot(pv_long, aes(x = Variable, y = Value)) +
  geom_boxplot(fill = "lightblue", color = "black", outlier.color = "red") +
  labs(title = "Boxplots of Tax Assessment Variables", x = "Variable", y = "Value") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
pv_long <- pv %>%
  pivot_longer(cols = c(GarageSqFt, LandSqFt, MainAreaSqFt, TotalMainAreaSqFt),
               names_to = "Variable",
               values_to = "Value")

# scatter plot
ggplot(pv_long, aes(x = X2025MarketValue, y = Value, color = Variable)) +
  geom_point(size = 2) +
  labs(title = "Scatter Plot of Variables vs X2025MarketValue",
       x = "2025 Market Value",
       y = "Square Footage") +
  theme_minimal()
library(knitr)
#first model iteration using all the square footage variables
model<-lm(X2025MarketValue~TotalMainAreaSqFt+MainAreaSqFt+GarageSqFt+LandSqFt,data=pvf)
summary(model)

#second iteration removing the garage square footage since it returned NA values
#in the summary statistics
model2<-lm(X2025MarketValue~TotalMainAreaSqFt+MainAreaSqFt+LandSqFt,data=pvf)
summary(model2)
vif(model2)

#third iteration removing the main area square footage (MainAreaSqFt)
model3<-lm(X2025MarketValue~TotalMainAreaSqFt+LandSqFt,data=pvf)
summary(model3)
plot(model3)
#hat values > 0.098
hatvals<-hatvalues(model3)
hatvals[hatvals>0.098]

#remove points with leverage
pvf2<- pvf[-c(1,2,6,14,16,24,27),]
model4<-lm(X2025MarketValue~TotalMainAreaSqFt+LandSqFt,data=pvf2)
summary(model4)

#select the row for 6321 88th Street (row 13 of the original data)
pv14 <- pv[13, ]

prediction<-predict(model4,newdata=pv14,interval="prediction")

actual_value <- pv14$X2025MarketValue

# Combine into a small data frame for easy comparison
comparison <- data.frame(
  Actual_Value = actual_value,
  Predicted_Value = prediction[, "fit"],
  Lower_Bound = prediction[, "lwr"],
  Upper_Bound = prediction[, "upr"]
)


# View the comparison
kable(comparison, caption = "Comparison of Actual and Predicted Values with Prediction Interval")