Data <- read.table("./Data/housing.txt", header = TRUE)
head(Data)
## P S A Ut Pol Fp
## 1 205.452 23.46 6 0 0 1
## 2 185.328 20.03 5 0 0 1
## 3 248.422 27.77 6 0 0 0
## 4 154.690 20.17 1 0 0 0
## 5 221.801 26.45 0 0 0 1
## 6 199.119 21.56 6 0 0 1
hist(Data$P)
The histogram is roughly bell-shaped, which suggests that housing price is
approximately normally distributed.
pairs(Data)
cor(Data)
## P S A Ut Pol Fp
## P 1.00000000 0.594678480 -0.07985190 0.728744476 0.051889626 0.064752140
## S 0.59467848 1.000000000 -0.02718335 0.023370608 -0.004175866 0.098473362
## A -0.07985190 -0.027183352 1.00000000 -0.031958507 0.027663447 0.033123525
## Ut 0.72874448 0.023370608 -0.03195851 1.000000000 0.020482871 -0.007378108
## Pol 0.05188963 -0.004175866 0.02766345 0.020482871 1.000000000 -0.043068451
## Fp 0.06475214 0.098473362 0.03312353 -0.007378108 -0.043068451 1.000000000
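From the pairs plot and the correlation matrix, housing price is linearly correlated with living area S (r = 0.59) and with the near-university indicator Ut (r = 0.73). Its correlation with the age of the house A is weak and negative (r = -0.08).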
fitF <- lm(P~S+Ut+I(S*Ut)+A+Pol+Fp+I(Pol*Fp), data = Data)
summary(fitF)
##
## Call:
## lm(formula = P ~ S + Ut + I(S * Ut) + A + Pol + Fp + I(Pol *
## Fp), data = Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -50.410 -10.193 0.112 10.577 44.933
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 24.24138 6.20934 3.904 0.000101 ***
## S 7.61632 0.24536 31.042 < 2e-16 ***
## Ut 27.46998 8.42541 3.260 0.001151 **
## I(S * Ut) 1.29851 0.33216 3.909 9.89e-05 ***
## A -0.18941 0.05123 -3.697 0.000230 ***
## Pol 5.06230 1.67022 3.031 0.002501 **
## Fp 1.93414 1.08628 1.781 0.075298 .
## I(Pol * Fp) -1.40931 2.39584 -0.588 0.556511
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.23 on 992 degrees of freedom
## Multiple R-squared: 0.8706, Adjusted R-squared: 0.8697
## F-statistic: 953.6 on 7 and 992 DF, p-value: < 2.2e-16
The large value of the F statistic (953.6) with a small p-value (< 2.2e-16) implies that we reject the null hypothesis H0: β1 = δ2 = γ1 = β3 = δ4 = δ5 = γ2 = 0 at the 5% significance level. In other words, the regression model provides a better fit than a model that contains no independent variables.
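The decision for the overall F test can be checked from the statistic and its degrees of freedom alone. The following is a minimal sketch that uses only the numbers printed in the summary above; the variable names are illustrative and not part of the original analysis.

```r
# Overall F test: values copied from the summary(fitF) output
f_stat <- 953.6   # F-statistic
df_num <- 7       # number of slope coefficients
df_den <- 992     # residual degrees of freedom

# 5% critical value and upper-tail p-value of the F distribution
f_crit  <- qf(0.95, df_num, df_den)
p_value <- pf(f_stat, df_num, df_den, lower.tail = FALSE)

# H0 (all slopes equal to zero) is rejected when the statistic exceeds the critical value
reject_h0 <- f_stat > f_crit
print(c(critical_value = f_crit, p_value = p_value))
print(reject_h0)
```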
The goodness-of-fit statistic is R^2adj = 0.8697. It also indicates that the model fits the data well: about 87% of the variation in price is explained by the explanatory variables in the model, including living area.
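The adjusted R-squared can be recovered from the multiple R-squared and the sample size. This is a small sketch using the reported values; n = 1000 is inferred from the 992 residual degrees of freedom plus the 8 estimated coefficients, and the variable names are illustrative.

```r
# Adjusted R-squared from the reported multiple R-squared
r2 <- 0.8706   # Multiple R-squared from summary(fitF)
n  <- 1000     # observations: 992 residual df + 8 estimated coefficients
p  <- 7        # slope coefficients in the full model

# Penalise R-squared for the number of predictors
r2_adj <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(round(r2_adj, 4))
```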
plot(fitF)
The residual plots indicate no clear violations of the model assumptions.
# Obtain the residuals from the model
residuals <- resid(fitF)
#print(residuals)
plot(residuals)
# Calculate the sum of squares of residuals
SS_residuals <- sum(residuals^2)
print("Sum of squares of residuals: ")
## [1] "Sum of squares of residuals: "
print(SS_residuals)
## [1] 230104.2
fitF2 <- lm(P~S+A, data = Data)
summary(fitF2)
##
## Call:
## lm(formula = P ~ S + A, data = Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -79.017 -30.271 3.693 30.292 72.867
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 34.2309 9.4052 3.64 0.000287 ***
## S 8.5723 0.3671 23.35 < 2e-16 ***
## A -0.2853 0.1136 -2.51 0.012228 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 33.85 on 997 degrees of freedom
## Multiple R-squared: 0.3577, Adjusted R-squared: 0.3564
## F-statistic: 277.6 on 2 and 997 DF, p-value: < 2.2e-16
# Obtain the residuals from the model
residuals2 <- resid(fitF2)
#print(residuals2)
plot(residuals2)
# Calculate the sum of squares of residuals
SS_residuals2 <- sum(residuals2^2)
print("Sum of squares of reduced model residuals: ")
## [1] "Sum of squares of reduced model residuals: "
print(SS_residuals2)
## [1] 1142293
We reject the null hypothesis H0 that the coefficients on the indicator variables and their interactions are all zero at the 5% level of significance, and conclude that the indicator variables are jointly significant.
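The joint test compares the residual sum of squares of the reduced model P ~ S + A with that of the full model. A minimal sketch of this partial F test follows, using the sums of squares and residual degrees of freedom printed above; `partial_f` is a hypothetical helper written for illustration, not a function from the original analysis.

```r
# Partial (nested-model) F test computed from residual sums of squares
partial_f <- function(sse_reduced, sse_full, df_reduced, df_full, alpha = 0.05) {
  q      <- df_reduced - df_full   # number of restrictions under H0
  f_stat <- ((sse_reduced - sse_full) / q) / (sse_full / df_full)
  list(
    q       = q,
    f_stat  = f_stat,
    f_crit  = qf(1 - alpha, q, df_full),
    p_value = pf(f_stat, q, df_full, lower.tail = FALSE)
  )
}

# Reduced model (SSE 1142293, 997 df) against the full model (SSE 230104.2, 992 df)
res <- partial_f(1142293, 230104.2, 997, 992)
print(unlist(res))
print(res$f_stat > res$f_crit)   # TRUE means H0 is rejected at the 5% level
```

When the fitted objects are available, `anova(fitF2, fitF)` should report the same comparison directly.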
fitF3 <- lm(P~S+Ut+I(S*Ut)+A+Pol+Fp, data = Data)
summary(fitF3)
##
## Call:
## lm(formula = P ~ S + Ut + I(S * Ut) + A + Pol + Fp, data = Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -50.289 -10.141 0.148 10.565 44.783
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 24.5000 6.1917 3.957 8.13e-05 ***
## S 7.6122 0.2452 31.048 < 2e-16 ***
## Ut 27.4530 8.4226 3.259 0.001154 **
## I(S * Ut) 1.2994 0.3321 3.913 9.72e-05 ***
## A -0.1901 0.0512 -3.712 0.000217 ***
## Pol 4.3772 1.1967 3.658 0.000268 ***
## Fp 1.6492 0.9720 1.697 0.090056 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.23 on 993 degrees of freedom
## Multiple R-squared: 0.8706, Adjusted R-squared: 0.8698
## F-statistic: 1113 on 6 and 993 DF, p-value: < 2.2e-16
The goodness-of-fit statistic is R^2adj = 0.8698, indicating that the model fits the data well.
To test the joint significance of the indicator variables, this model is compared with the reduced model P ~ S + A, which imposes 4 restrictions:
F = {(1142293 − 230184.2)/4} / {230184.2/(1000 − 7)} = 983.70
At the 5% level of significance the critical value is F0.05;4,993 = 2.372, so the indicator variables are jointly significant. The individual differential effects are also statistically significant at the 5% level, except for Fp (p = 0.090).
The fitted regression model is
y_hat = 24.50 + 7.612 x1 + 27.453 D2 + 1.299 (D2 * x1) − 0.190 x3 + 4.377 D4 + 1.649 D5
where x1 = S, D2 = Ut, x3 = A, D4 = Pol and D5 = Fp.
Based on the regression results, holding the other variables fixed, we estimate that
- A location near the university raises the base house price by $27,453 (27.453 * 1000).
- The change in expected price per additional square foot is about $89.12 ($76.122 + $12.994) for houses near the university and $76.12 for houses in other areas. Equivalently, each additional 100 square feet adds $7,612 for houses in other areas.
- Houses depreciate by $190.10 per year.
- A pool increases the value of a home by $4,377.20.
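The per-square-foot figures follow from a unit conversion of the slope coefficients. The sketch below assumes, as the interpretation above implies, that P is measured in thousands of dollars and S in hundreds of square feet; the helper name `to_dollars_per_sqft` is hypothetical.

```r
# Coefficients copied from summary(fitF3); P assumed in $1000s, S in 100s of square feet
b_S    <- 7.6122   # living area
b_S_Ut <- 1.2994   # living area x near-university interaction

# Convert "thousand dollars per 100 sq ft" into "dollars per sq ft"
to_dollars_per_sqft <- function(b) b * 1000 / 100

per_sqft_other <- to_dollars_per_sqft(b_S)            # houses in other areas
per_sqft_univ  <- to_dollars_per_sqft(b_S + b_S_Ut)   # houses near the university
print(round(c(other = per_sqft_other, university = per_sqft_univ), 2))
```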
fitF4 <- lm(P~S+Ut+I(S*Ut)+A+Pol, data = Data)
summary(fitF4)
##
## Call:
## lm(formula = P ~ S + Ut + I(S * Ut) + A + Pol, data = Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -49.352 -10.241 0.269 10.397 45.400
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 24.13775 6.19389 3.897 0.000104 ***
## S 7.66031 0.24376 31.426 < 2e-16 ***
## Ut 28.37492 8.41298 3.373 0.000773 ***
## I(S * Ut) 1.26232 0.33164 3.806 0.000150 ***
## A -0.18697 0.05122 -3.650 0.000276 ***
## Pol 4.28786 1.19666 3.583 0.000356 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.24 on 994 degrees of freedom
## Multiple R-squared: 0.8702, Adjusted R-squared: 0.8695
## F-statistic: 1333 on 5 and 994 DF, p-value: < 2.2e-16