library(readxl)
## Warning: package 'readxl' was built under R version 4.4.1
df <- read_xlsx('./Apartments.xlsx')
df <- as.data.frame(df)
head(df)
## Age Distance Price Parking Balcony
## 1 7 28 1640 0 1
## 2 18 1 2800 1 0
## 3 7 28 1660 0 0
## 4 28 29 1850 0 1
## 5 18 18 1640 1 1
## 6 28 12 1770 0 1
Description:
df$Parking <- factor(df$Parking,
levels = c(0,1),
labels = c('No', 'Yes'))
df$Balcony <- factor(df$Balcony,
levels = c(0,1),
labels = c('No','Yes'))
shapiro.test(df$Price)
##
## Shapiro-Wilk normality test
##
## data: df$Price
## W = 0.94017, p-value = 0.0006513
# The Shapiro-Wilk test rejects normality (p-value < 0.001).
# We assume normality regardless of this outcome and proceed with the t-test.
t.test(df$Price, mu = 1900, alternative = 'two.sided')
##
## One Sample t-test
##
## data: df$Price
## t = 2.9022, df = 84, p-value = 0.004731
## alternative hypothesis: true mean is not equal to 1900
## 95 percent confidence interval:
## 1937.443 2100.440
## sample estimates:
## mean of x
## 2018.941
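As a cross-check, the p-value and confidence interval reported by `t.test()` can be reproduced from the printed summary values alone. This is a minimal sketch: the numbers are copied from the output above, and the variable names (`xbar`, `mu0`, `t_stat`, `dof`, `se`) are introduced here purely for illustration.

```r
# Values copied from the t.test() output above
xbar   <- 2018.941  # sample mean of Price
mu0    <- 1900      # hypothesised mean
t_stat <- 2.9022    # reported t statistic
dof    <- 84        # reported degrees of freedom (n - 1)

# Standard error implied by t = (xbar - mu0) / se
se <- (xbar - mu0) / t_stat

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df = dof)

# 95% confidence interval for the mean
ci <- xbar + c(-1, 1) * qt(0.975, df = dof) * se

print(p_value)
print(ci)
```

Up to rounding, the results should agree with the p-value (0.004731) and the interval (1937.443, 2100.440) printed above.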
fit1 <- lm(Price ~ Age,
data = df)
summary(fit1)
##
## Call:
## lm(formula = Price ~ Age, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -623.9 -278.0 -69.8 243.5 776.1
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2185.455 87.043 25.108 <2e-16 ***
## Age -8.975 4.164 -2.156 0.034 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 369.9 on 83 degrees of freedom
## Multiple R-squared: 0.05302, Adjusted R-squared: 0.04161
## F-statistic: 4.647 on 1 and 83 DF, p-value: 0.03401
cor(df$Price, df$Age)
## [1] -0.230255
Regression coefficient \(b_1\): Each additional year of age is associated with an average decrease in price of 8.975 euro per \(m^2\) (p-value = 0.034).
Coefficient of correlation: The coefficient of correlation is -0.23, which indicates a weak negative linear relationship between the age and the price of an apartment.
Coefficient of determination \(R^2\): 5.3% of the variability in price is explained by the age of an apartment.
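In a simple regression with one predictor, the three quantities above are tied together: \(R^2\) is the square of the correlation coefficient, and the t statistic of the slope can be recovered from the correlation and the residual degrees of freedom. A small sketch using only the numbers printed above (the variable names are illustrative):

```r
# Values copied from the output above
r      <- -0.230255  # cor(df$Price, df$Age)
df_res <- 83         # residual degrees of freedom of fit1

# R^2 of a simple regression equals the squared correlation
r_squared <- r^2

# t statistic of the slope implied by the correlation
t_slope <- r * sqrt(df_res) / sqrt(1 - r^2)

print(r_squared)
print(t_slope)
```

Up to rounding, these should match the Multiple R-squared (0.05302) and the t value for Age (-2.156) in `summary(fit1)`.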
library(car)
## Warning: package 'car' was built under R version 4.4.3
## Loading required package: carData
## Warning: package 'carData' was built under R version 4.4.3
scatterplotMatrix(df[, 1:3], smooth = FALSE)
fit2 <- lm(Price ~ Age + Distance,
data = df)
vif(fit2)
## Age Distance
## 1.001845 1.001845
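The variance inflation factor of a predictor is \(1 / (1 - R_j^2)\), where \(R_j^2\) comes from regressing that predictor on the remaining ones. With only two predictors, \(R_j^2\) is simply the squared correlation between Age and Distance, which is why both VIF values are identical. A short sketch that inverts the reported value (names are illustrative):

```r
# VIF copied from the output above (same for Age and Distance)
vif_value <- 1.001845

# Auxiliary R^2 implied by VIF = 1 / (1 - R^2)
r2_aux <- 1 - 1 / vif_value

# Absolute correlation between the two predictors
r_abs <- sqrt(r2_aux)

print(r2_aux)
print(r_abs)
```

A VIF this close to 1 implies that the two predictors are nearly uncorrelated, so multicollinearity does not appear to be a concern here.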
df$stdresid <- round(rstandard(fit2), 3)
df$cooksd <- round(cooks.distance(fit2), 3)
hist(df$stdresid)
hist(df$cooksd)
head(df[order(-df$cooksd),], 10)
## Age Distance Price Parking Balcony stdresid cooksd
## 38 5 45 2180 Yes Yes 2.577 0.320
## 55 43 37 1740 No No 1.445 0.104
## 33 2 11 2790 Yes No 2.051 0.069
## 53 7 2 1760 No Yes -2.152 0.066
## 22 37 3 2540 Yes Yes 1.576 0.061
## 39 40 2 2400 No Yes 1.091 0.038
## 58 8 2 2820 Yes No 1.655 0.037
## 25 8 26 2300 Yes Yes 1.571 0.034
## 57 10 1 2810 No No 1.601 0.032
## 2 18 1 2800 Yes No 1.783 0.030
df <- df[-c(38,55,33,53,22),]
hist(df$cooksd)
fit2 <- lm(Price ~ Age + Distance,
data = df)
df$stdfitted <- scale(fit2$fitted.values)
scatterplot(y = df$stdresid, x = df$stdfitted,
boxplots = FALSE,
regLine = FALSE,
smooth = FALSE,
ylab = 'Standardized Residuals',
xlab = 'Standardized fitted values')
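Note that `df$stdresid` and `df$cooksd` were computed before the five rows were dropped and the model was refitted, so they may no longer correspond to the refitted `fit2`. Below is a minimal sketch of recomputing the diagnostics after a refit. It uses R's built-in `mtcars` data as a stand-in (an assumption, since the apartments file is not available here); the names `d`, `d_trim`, `fit_a`, and `fit_b` are hypothetical, and dropping three rows is an arbitrary illustrative choice.

```r
# Stand-in data: built-in mtcars (NOT the apartments data)
d <- mtcars
fit_a <- lm(mpg ~ wt + hp, data = d)

# Diagnostics from the first fit
d$cooksd <- cooks.distance(fit_a)

# Drop the three rows with the largest Cook's distance (illustrative choice)
drop_rows <- order(-d$cooksd)[1:3]
d_trim <- d[-drop_rows, ]

# Refit, then recompute the diagnostics so they match the new model
fit_b <- lm(mpg ~ wt + hp, data = d_trim)
d_trim$stdresid  <- rstandard(fit_b)
d_trim$cooksd    <- cooks.distance(fit_b)
d_trim$stdfitted <- as.numeric(scale(fitted(fit_b)))
```

With this pattern, any later plot or test based on `stdresid` reflects the model that is actually being interpreted.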
library(olsrr)
## Warning: package 'olsrr' was built under R version 4.4.3
##
## Attaching package: 'olsrr'
## The following object is masked from 'package:datasets':
##
## rivers
ols_test_breusch_pagan(fit2)
##
## Breusch Pagan Test for Heteroskedasticity
## -----------------------------------------
## Ho: the variance is constant
## Ha: the variance is not constant
##
## Data
## ---------------------------------
## Response : Price
## Variables: fitted values of Price
##
## Test Summary
## ----------------------------
## DF = 1
## Chi2 = 1.738591
## Prob > Chi2 = 0.1873174
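The reported probability is the upper tail of a chi-squared distribution with 1 degree of freedom, evaluated at the test statistic. A quick check using the printed values (variable names are illustrative):

```r
# Values copied from the Breusch-Pagan output above
chi2   <- 1.738591
bp_dof <- 1

# Upper-tail probability of the chi-squared distribution
p_bp <- pchisq(chi2, df = bp_dof, lower.tail = FALSE)

print(p_bp)
```

Since this p-value (about 0.187) exceeds 0.05, the null hypothesis of constant variance is not rejected at the 5% level.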
hist(df$stdresid)
shapiro.test(df$stdresid)
##
## Shapiro-Wilk normality test
##
## data: df$stdresid
## W = 0.93418, p-value = 0.0004761
fit2 <- lm(Price ~ Age + Distance,
data = df)
summary(fit2)
##
## Call:
## lm(formula = Price ~ Age + Distance, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -411.50 -203.69 -45.24 191.11 492.56
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2502.467 75.024 33.356 < 2e-16 ***
## Age -8.674 3.221 -2.693 0.00869 **
## Distance -24.063 2.692 -8.939 1.57e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 256.8 on 77 degrees of freedom
## Multiple R-squared: 0.5361, Adjusted R-squared: 0.524
## F-statistic: 44.49 on 2 and 77 DF, p-value: 1.437e-13
Age: An increase of one year in the age of an apartment, while controlling for all other variables, results in an average decrease in the price per square meter of the apartment by 8.674 Euro (p-value = 0.009).
Distance: An increase of 1 km in the distance from the city center, while controlling for all other variables, results in an average decrease in the price per square meter of an apartment by 24.063 Euro (p-value < 0.001).
Coefficient of Determination: The \(R^2\) tells us that 53.6% of the variability in price per square meter of an apartment can be explained by linear effects of Age and Distance.
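The adjusted \(R^2\) and the overall F-statistic in `summary(fit2)` both follow from \(R^2\), the sample size, and the number of predictors. A sketch using the printed values; `n = 80` is inferred from the 77 residual degrees of freedom plus three estimated coefficients, and the variable names are illustrative.

```r
# Values taken from summary(fit2) above
r2 <- 0.5361  # Multiple R-squared
n  <- 80      # observations (77 residual df + 3 coefficients)
k  <- 2       # number of predictors (Age, Distance)

# Adjusted R^2 penalises for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Overall F statistic expressed through R^2
f_stat <- (r2 / k) / ((1 - r2) / (n - k - 1))

print(adj_r2)
print(f_stat)
```

Up to rounding, these should reproduce the Adjusted R-squared (0.524) and the F-statistic (44.49) printed above.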
fit3 <- lm(Price ~ Age + Distance + Parking + Balcony,
data = df)
anova(fit2, fit3)
## Analysis of Variance Table
##
## Model 1: Price ~ Age + Distance
## Model 2: Price ~ Age + Distance + Parking + Balcony
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 77 5077362
## 2 75 4791128 2 286234 2.2403 0.1135
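The F value in the `anova()` table compares the drop in the residual sum of squares from adding Parking and Balcony against the residual variance of the larger model. A sketch that recomputes it from the printed table (variable names are illustrative):

```r
# Values copied from the anova() table above
rss_restricted <- 5077362  # RSS of fit2
rss_full       <- 4791128  # RSS of fit3
df_restricted  <- 77       # residual df of fit2
df_full        <- 75       # residual df of fit3

# Number of added parameters
q <- df_restricted - df_full

# Partial F statistic and its upper-tail p-value
f_stat  <- ((rss_restricted - rss_full) / q) / (rss_full / df_full)
p_value <- pf(f_stat, q, df_full, lower.tail = FALSE)

print(f_stat)
print(p_value)
```

With a p-value of about 0.11, the two added variables taken together do not significantly improve the fit at the 5% level, even though Parking is individually significant in `fit3`.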
summary(fit3)
##
## Call:
## lm(formula = Price ~ Age + Distance + Parking + Balcony, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -390.93 -198.19 -53.64 186.73 518.34
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2393.316 93.930 25.480 < 2e-16 ***
## Age -7.970 3.191 -2.498 0.0147 *
## Distance -21.961 2.830 -7.762 3.39e-11 ***
## ParkingYes 128.700 60.801 2.117 0.0376 *
## BalconyYes 6.032 57.307 0.105 0.9165
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 252.7 on 75 degrees of freedom
## Multiple R-squared: 0.5623, Adjusted R-squared: 0.5389
## F-statistic: 24.08 on 4 and 75 DF, p-value: 7.764e-13
Parking: Keeping all other variables constant, an apartment with parking costs on average 128.7 Euro per square meter more than an apartment without parking (p-value = 0.038).
Balcony: Based on the p-value (0.917), we cannot say that having a balcony has a significant effect on the price per square meter of an apartment.
F-statistic:
\(H_0\): \(\rho^2 = 0\) (All betas equal to zero)
\(H_1\): \(\rho^2 > 0\) (At least one beta is different from 0)
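The decision for this test follows from the F-statistic in `summary(fit3)`: 24.08 on 4 and 75 degrees of freedom. A short sketch of the p-value computation from those printed values (variable names are illustrative):

```r
# Values copied from summary(fit3) above
f_stat <- 24.08
df_num <- 4   # number of predictors
df_den <- 75  # residual degrees of freedom

# Upper-tail probability of the F distribution
p_overall <- pf(f_stat, df_num, df_den, lower.tail = FALSE)

print(p_overall)
```

The p-value is far below 0.001, so the null hypothesis that all betas are zero is rejected: at least one explanatory variable is related to the price.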
df$fittedvals <- fit3$fitted.values
df$Price[2] - df$fittedvals[2]
## [1] 443.4026
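The residual for apartment 2 can be cross-checked by hand from the rounded `fit3` coefficients and that apartment's values in the data listing (Age = 18, Distance = 1, Price = 2800, Parking = Yes, Balcony = No). This is only a sketch; because the coefficients are rounded to three decimals, the result differs slightly from the value printed above.

```r
# Rounded coefficients copied from summary(fit3)
b0        <- 2393.316
b_age     <- -7.970
b_dist    <- -21.961
b_parking <- 128.700
b_balcony <- 6.032

# Apartment 2: Age = 18, Distance = 1, Parking = Yes (1), Balcony = No (0)
fitted_2 <- b0 + b_age * 18 + b_dist * 1 + b_parking * 1 + b_balcony * 0

# Residual = observed price - fitted price
resid_2 <- 2800 - fitted_2

print(fitted_2)
print(resid_2)
```

The positive residual means that the model underestimates the price of this apartment by roughly 443 Euro per square meter.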