Nam Anh Le
Housing Prices
mydata <- read.table("./Housing.csv", header=TRUE,sep = ",", dec=".")
head(mydata)
##      price area bedrooms bathrooms stories mainroad guestroom basement
## 1 13300000 7420        4         2       3      yes        no       no
## 2 12250000 8960        4         4       4      yes        no       no
## 3 12250000 9960        3         2       2      yes        no      yes
## 4 12215000 7500        4         2       2      yes        no      yes
## 5 11410000 7420        4         1       2      yes       yes      yes
## 6 10850000 7500        3         3       1      yes        no      yes
##   hotwaterheating airconditioning parking furnishingstatus
## 1              no             yes       2        furnished
## 2              no             yes       3        furnished
## 3              no              no       2   semi-furnished
## 4              no             yes       3        furnished
## 5              no             yes       2        furnished
## 6              no             yes       2   semi-furnished
Unit of Observation: one house
The sample size is 545
Definition of Variable:
Data Source: https://www.kaggle.com/datasets/yasserh/housing-prices-dataset/data
mydatanew <- mydata
mydatanew$mainroad <- factor(mydata$mainroad,
levels = c("yes","no"),
labels = c("yes","no"))
mydatanew$basement <- factor(mydata$basement,
levels = c("yes","no"),
labels = c("yes","no"))
mydatanew$airconditioning <- factor(mydata$airconditioning,
levels = c("yes","no"),
labels = c("yes","no"))
mydatanew$hotwaterheating <- factor(mydata$hotwaterheating,
levels = c("yes","no"),
labels = c("yes","no"))
mydatanew$guestroom <- factor(mydata$guestroom,
levels = c("yes","no"),
labels = c("yes","no"))
mydatanew$furnishingstatus <- factor(mydata$furnishingstatus,
levels = c("furnished","semi-furnished","unfurnished"),
labels = c("furnished","semi-furnished","unfurnished"))
mydatanew$price <- mydatanew$price / 1000000
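The yes/no columns above are converted one at a time. As a minimal sketch, the same conversion could be written once and applied to several columns together. The toy data frame and the name `binary_cols` are illustrative assumptions, not part of the original analysis.

```r
# Toy stand-in for the housing data (made-up values, illustration only).
toy <- data.frame(
  mainroad        = c("yes", "no", "yes"),
  airconditioning = c("no", "no", "yes"),
  stringsAsFactors = FALSE
)

# Hypothetical helper vector naming the yes/no columns to convert.
binary_cols <- c("mainroad", "airconditioning")

# Apply the same factor() call, with the same level order, to each column.
toy[binary_cols] <- lapply(toy[binary_cols], factor, levels = c("yes", "no"))

str(toy)
```

Listing the columns in one vector makes it harder to convert the same column twice or to skip one by accident.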
library(psych)
describeBy(mydatanew$price,group = mydatanew$furnishingstatus)
##
## Descriptive statistics by group
## group: furnished
##    vars   n mean   sd median trimmed  mad  min  max range skew kurtosis   se
## X1    1 140  5.5 2.12   5.08    5.28 2.02 1.75 13.3 11.55 1.06     1.34 0.18
## ------------------------------------------------------------
## group: semi-furnished
##    vars   n mean  sd median trimmed  mad  min   max range skew kurtosis   se
## X1    1 227 4.91 1.6   4.58    4.71 1.19 1.77 12.25 10.48 1.42     2.85 0.11
## ------------------------------------------------------------
## group: unfurnished
##    vars   n mean   sd median trimmed  mad  min   max range skew kurtosis   se
## X1    1 178 4.01 1.72   3.43    3.78 1.14 1.75 10.15   8.4 1.29     1.25 0.13
Research Question: Are furnishing status and house price related?
Conditions and assumptions:
Homoskedasticity test:
library(car)
## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:psych':
##
## logit
leveneTest(mydatanew$price,group=mydatanew$furnishingstatus)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 7.4278 0.000657 ***
## 542
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
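Reject \(H_0\) (equal price variances across the three furnishing groups) at p < 0.001, so the homoskedasticity assumption is violated.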
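For intuition about what the test measures: with `center = median`, Levene's test amounts to a one-way ANOVA on the absolute deviations of each observation from its own group median. The sketch below reproduces that idea in base R on simulated data; the variable names and the simulated groups are assumptions for illustration, not the housing sample.

```r
set.seed(42)  # simulated data, not the housing sample

# Three groups with deliberately different spreads.
group <- factor(rep(c("a", "b", "c"), each = 40))
y <- c(rnorm(40, sd = 1), rnorm(40, sd = 2), rnorm(40, sd = 4))

# Absolute deviation of each observation from its own group median.
abs_dev <- abs(y - ave(y, group, FUN = median))

# One-way ANOVA on those deviations; its F test plays the role of
# the Levene statistic.
levene_by_hand <- anova(lm(abs_dev ~ group))
levene_by_hand
```

A large F value here means the typical distance from the group median differs between groups, which is what unequal variances look like.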
Normality test:
library(dplyr)
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:car':
##
## recode
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(rstatix)
##
## Attaching package: 'rstatix'
## The following object is masked from 'package:stats':
##
## filter
mydatanew %>%
group_by(furnishingstatus) %>%
shapiro_test(price)
## # A tibble: 3 × 4
## furnishingstatus variable statistic p
## <fct> <chr> <dbl> <dbl>
## 1 furnished price 0.930 1.94e- 6
## 2 semi-furnished price 0.901 4.01e-11
## 3 unfurnished price 0.875 5.20e-11
Reject \(H_0\) (price is normally distributed) in all three groups, since every p-value is < 0.001; the normality assumption is violated.
Parametric Test:
ANOVA_Results <- aov(price~furnishingstatus,
data = mydatanew)
summary(ANOVA_Results)
## Df Sum Sq Mean Sq F value Pr(>F)
## furnishingstatus 2 179.8 89.90 28.27 2.09e-12 ***
## Residuals 542 1723.4 3.18
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Reject \(H_0\) (equal mean price in all three furnishing groups) at p < 0.001.
library(effectsize)
##
## Attaching package: 'effectsize'
## The following objects are masked from 'package:rstatix':
##
## cohens_d, eta_squared
## The following object is masked from 'package:psych':
##
## phi
eta_squared(ANOVA_Results)
## For one-way between subjects designs, partial eta squared is equivalent
## to eta squared. Returning eta squared.
## # Effect Size for ANOVA
##
## Parameter | Eta2 | 95% CI
## --------------------------------------
## furnishingstatus | 0.09 | [0.06, 1.00]
##
## - One-sided CIs: upper bound fixed at [1.00].
interpret_eta_squared(0.09, rules = "cohen1992")
## [1] "small"
## (Rules: cohen1992)
The effect size is small (\(\eta^2\) = 0.09): furnishing status accounts for about 9% of the variance in price.
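As a quick check, the effect size can be recomputed by hand from the sums of squares printed in the ANOVA table. This is only a sketch using the rounded values shown in that output; the variable names are illustrative.

```r
# Sums of squares as printed in the ANOVA table (rounded values).
ss_effect   <- 179.8
ss_residual <- 1723.4

# Eta squared: share of the total variation attributed to the grouping factor.
eta2 <- ss_effect / (ss_effect + ss_residual)
round(eta2, 2)

# The F statistic is the ratio of the two mean squares (df = 2 and 542).
f_value <- (ss_effect / 2) / (ss_residual / 542)
round(f_value, 2)
```

Both results agree with the reported values of 0.09 and 28.27.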
library(onewaytests)
##
## Attaching package: 'onewaytests'
## The following object is masked from 'package:psych':
##
## describe
welch.test(price ~ furnishingstatus,
data = mydatanew)
##
## Welch's Heteroscedastic F Test (alpha = 0.05)
## -------------------------------------------------------------
## data : price and furnishingstatus
##
## statistic : 25.89152
## num df : 2
## denom df : 311.3324
## p.value : 3.965734e-11
##
## Result : Difference is statistically significant.
## -------------------------------------------------------------
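Reject \(H_0\) (equal mean price in all three furnishing groups) at p < 0.001. Unlike the classical ANOVA, Welch's test does not assume equal variances.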
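Base R offers the same kind of test without an extra package: `oneway.test()` with `var.equal = FALSE` performs Welch's one-way test, and `pairwise.t.test()` accepts `pool.sd = FALSE` for comparisons that do not pool the standard deviation. A minimal sketch on simulated data (group sizes and values are made up, not the housing sample):

```r
set.seed(7)  # simulated data for illustration only

toy <- data.frame(
  status = factor(rep(c("furnished", "semi-furnished", "unfurnished"),
                      times = c(30, 50, 40))),
  price  = c(rnorm(30, 5.5, 2.1), rnorm(50, 4.9, 1.6), rnorm(40, 4.0, 1.7))
)

# Welch's one-way test: does not assume equal group variances.
welch <- oneway.test(price ~ status, data = toy, var.equal = FALSE)
welch

# Pairwise t tests with separate (unpooled) standard deviations.
pw <- pairwise.t.test(toy$price, toy$status,
                      p.adjust.method = "bonferroni",
                      pool.sd = FALSE)
pw
```

When the equal-variance assumption is rejected, unpooled pairwise tests are arguably the more consistent follow-up to a Welch test than the default pooled version.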
pairwise.t.test(x = mydatanew$price, g = mydatanew$furnishingstatus,
p.adj = "bonf")
##
## Pairwise comparisons using t tests with pooled SD
##
## data: mydatanew$price and mydatanew$furnishingstatus
##
## furnished semi-furnished
## semi-furnished 0.0068 -
## unfurnished 2.1e-12 2.3e-06
##
## P value adjustment method: bonferroni
Mean price differs significantly in all pairwise comparisons (Bonferroni-adjusted p-values):
There is a significant difference between furnished and semi-furnished houses (p = 0.007).
There is a highly significant difference between semi-furnished and unfurnished houses (p < 0.001).
There is a highly significant difference between furnished and unfurnished houses (p < 0.001).
Non-Parametric Test:
kruskal.test(price ~ furnishingstatus,
data = mydatanew)
##
## Kruskal-Wallis rank sum test
##
## data: price by furnishingstatus
## Kruskal-Wallis chi-squared = 69.583, df = 2, p-value = 7.767e-16
Reject \(H_0\) (the three furnishing groups share the same price distribution) at p < 0.001.
kruskal_effsize(price ~ furnishingstatus,
data = mydatanew)
## # A tibble: 1 × 5
## .y. n effsize method magnitude
## * <chr> <int> <dbl> <chr> <ord>
## 1 price 545 0.125 eta2[H] moderate
The effect size (eta2[H] = 0.125) indicates moderate differences between the price distributions of the three furnishing groups.
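The reported effect size can be reproduced from the Kruskal-Wallis statistic with the formula eta2[H] = (H - k + 1) / (n - k), where H is the test statistic, k the number of groups, and n the total sample size. A short sketch using the values printed above (variable names are illustrative):

```r
h <- 69.583  # Kruskal-Wallis chi-squared from the test output
k <- 3       # number of furnishing groups
n <- 545     # total number of houses

# Eta squared based on the H statistic.
eta2_h <- (h - k + 1) / (n - k)
round(eta2_h, 3)
```

The result rounds to 0.125, which falls in the range (0.06 up to 0.14) that `rstatix` labels as moderate.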
library(rstatix)
groups_nonpar <- wilcox_test(price ~ furnishingstatus,
paired = FALSE,
p.adjust.method = "bonferroni",
data = mydatanew)
groups_nonpar
## # A tibble: 3 × 9
## .y. group1 group2 n1 n2 statistic p p.adj p.adj.signif
## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
## 1 price furnished semi-… 140 227 18238. 1.7 e- 2 5.2 e- 2 ns
## 2 price furnished unfur… 140 178 18366. 4 e-13 1.20e-12 ****
## 3 price semi-furnis… unfur… 227 178 28260. 5.55e-12 1.67e-11 ****
There is no significant difference between furnished and semi-furnished houses after adjusting for multiple comparisons (p.adj = 0.052).
There is a highly significant difference between furnished and unfurnished houses (p.adj = 1.20e-12).
There is a highly significant difference between semi-furnished and unfurnished houses (p.adj = 1.67e-11).
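The Bonferroni adjustment behind the `p.adj` column is simple arithmetic: each raw p-value is multiplied by the number of comparisons (three here) and capped at 1. The sketch below applies `p.adjust()` to the raw p-values as displayed in the table; the vector names are illustrative.

```r
# Unadjusted p-values as displayed (rounded) in the table.
p_raw <- c(furnished_vs_semi        = 0.017,
           furnished_vs_unfurnished = 4e-13,
           semi_vs_unfurnished      = 5.55e-12)

# Bonferroni: multiply by the number of comparisons, cap at 1.
p_adj <- p.adjust(p_raw, method = "bonferroni")
p_adj

# Which comparisons remain significant at the 5% level?
p_adj < 0.05
```

The first adjusted value comes out as 0.051 rather than the reported 0.052, presumably because the table shows the raw p-value rounded to two significant digits. Either way it stays just above 0.05, which is why the furnished versus semi-furnished comparison is flagged as not significant.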
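Most Suitable Test: the Kruskal-Wallis rank sum test with Bonferroni-adjusted pairwise Wilcoxon tests, because normality is rejected in all three groups and the group variances are unequal.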
Conclusion:
There is strong statistical evidence (p < 0.001)
that house prices differ by furnishing status. The effect size
suggests moderate differences in the location of the price
distributions. Post-hoc analysis shows that the distribution
location of unfurnished homes' prices differs significantly from that of
both furnished and semi-furnished homes. However, there is no
significant difference in distribution location between the prices
of furnished and semi-furnished homes (p.adj = 0.052).