Fitzroy or St Kilda: Testing Airbnb Prices

MATH1324 Assignment 3

Nely Niawati (s3757886), Kathleen Magbual (s3768288), Karen Gonzalez (s3697003)

Last updated: 31 May, 2019

Introduction

Problem Statement

Data preprocessing

Data import to R

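# Packages used in this analysis
library(readxl)   # read_excel()
library(dplyr)    # %>%, filter(), group_by(), summarise()
library(knitr)    # kable()
library(car)      # qqPlot(), leveneTest()

# Folder containing airbnb.xlsx
setwd("C:/Users/karen/OneDrive/Documents/MC242/Intro to Statistics/Assignments/A3")
airbnb <- read_excel("airbnb.xlsx")
head(airbnb)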

Data import to R (cont.)

# Convert "suburb" to a factor with Fitzroy as the first level
airbnb$suburb <- airbnb$suburb %>% factor(levels=c("Fitzroy", "St Kilda"))

airbnb
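The order of the factor levels matters later: with the formula interface, `t.test()` reports the group means, and takes their difference, in level order, so `levels = c("Fitzroy", "St Kilda")` is what makes Fitzroy group 1. A minimal sketch on hypothetical toy prices (the `toy` data frame is illustrative only, not the Airbnb data):

```r
# Hypothetical toy data, NOT the airbnb dataset: three prices per suburb
toy <- data.frame(
  price  = c(100, 120, 110, 130, 140, 150),
  suburb = factor(c("Fitzroy", "Fitzroy", "Fitzroy",
                    "St Kilda", "St Kilda", "St Kilda"),
                  levels = c("Fitzroy", "St Kilda"))
)

# The first factor level becomes group 1 in the formula interface
levels(toy$suburb)
tt <- t.test(price ~ suburb, data = toy, var.equal = TRUE)

# Estimates are listed in level order (Fitzroy first), and the t statistic
# is based on mean(Fitzroy) - mean(St Kilda), so it is negative here
tt$estimate
tt$statistic
```

Reversing the `levels` argument would swap the order of the estimates and flip the sign of the t statistic and confidence interval, but not the two-sided p-value.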

Descriptive Statistics

# Descriptive Statistics
desc_stats <- airbnb %>% group_by(suburb) %>% summarise(Min = min(price,na.rm = TRUE),
                                        Q1 = quantile(price,probs = .25,na.rm = TRUE),
                                        Median = median(price, na.rm = TRUE),
                                        Q3 = quantile(price,probs = .75,na.rm = TRUE),
                                        IQR = IQR(price, na.rm=TRUE),
                                        Max = max(price,na.rm = TRUE),
                                        Mean = mean(price, na.rm = TRUE),
                                        SD = sd(price, na.rm = TRUE),
                                        n = n(),
                                        Missing = sum(is.na(price))) 
kable(desc_stats)
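|suburb   | Min|  Q1| Median|  Q3| IQR| Max|     Mean|       SD|   n| Missing|
|:--------|---:|---:|------:|---:|---:|---:|--------:|--------:|---:|-------:|
|Fitzroy  |  64| 100|    129| 150|  50| 695| 146.3655| 85.82725| 145|       0|
|St Kilda |  62| 107|    130| 174|  67| 450| 146.2552| 60.43152| 145|       0|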

Visualization

# Horizontal box plots of daily price, one per suburb
boxplot(price ~ suburb, data = airbnb, ylab = "Suburb", xlab="Daily Price of an Entire Apartment", 
        main = "Entire Apartment Daily Price Comparison", horizontal = TRUE)
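By default `boxplot()` draws whiskers out to the most extreme observation within 1.5 × IQR of the box (its `range = 1.5` argument) and plots anything further out as an individual point. Using the quartiles in the descriptive statistics table, the fences can be approximated as below. This is only approximate, because `boxplot()` builds its box from Tukey's hinges (`fivenum()`), which can differ slightly from the `quantile()` values in the table.

```r
# Quartiles, minima and maxima copied from the descriptive statistics table
q1        <- c(Fitzroy = 100, "St Kilda" = 107)
q3        <- c(Fitzroy = 150, "St Kilda" = 174)
min_price <- c(Fitzroy = 64,  "St Kilda" = 62)
max_price <- c(Fitzroy = 695, "St Kilda" = 450)

iqr         <- q3 - q1            # 50 and 67, matching the IQR column
lower_fence <- q1 - 1.5 * iqr     # approximate lower whisker limit
upper_fence <- q3 + 1.5 * iqr     # approximate upper whisker limit

# TRUE means the extreme value lies beyond the fence (drawn as an outlier)
rbind(lower_fence, upper_fence)
min_price < lower_fence
max_price > upper_fence
```

With these numbers, both suburbs have at least one unusually expensive listing beyond the upper fence and none below the lower fence, which is consistent with the right skew suggested by means above medians.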

Hypothesis Testing

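\[H_0: \mu_1 = \mu_2 \quad (\mu_1 - \mu_2 = 0)\]

\[H_A: \mu_1 \ne \mu_2 \quad (\mu_1 - \mu_2 \ne 0)\]

where \(\mu_1\) and \(\mu_2\) are the population mean daily prices of entire apartments in Fitzroy and St Kilda, respectively.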
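Because the t-test in this analysis is run with `var.equal = TRUE`, the relevant statistic is the standard pooled-variance (Student) form. Writing \(\bar{x}_i\), \(s_i\) and \(n_i\) for the sample mean, standard deviation and size of group \(i\) (1 = Fitzroy, 2 = St Kilda):

\[s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}, \qquad t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}\]

Under \(H_0\), \(t\) follows a t distribution with \(n_1 + n_2 - 2 = 145 + 145 - 2 = 288\) degrees of freedom, and the 95% confidence interval for \(\mu_1 - \mu_2\) is

\[(\bar{x}_1 - \bar{x}_2) \pm t_{0.975,\,288}\; s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\]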

Normality Assumption

# Normal QQ plot of Fitzroy prices
airbnb_fitzroy <- airbnb %>% filter(suburb == "Fitzroy")
airbnb_fitzroy$price %>% qqPlot(distribution = "norm")

## [1] 69 82
# Normal QQ plot of St Kilda prices
airbnb_sk <- airbnb %>% filter(suburb == "St Kilda")
airbnb_sk$price %>% qqPlot(distribution = "norm")

## [1]  42 121
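A normal QQ plot pairs the sorted sample with standard-normal quantiles; points lying close to a straight line suggest the normal model is reasonable, while points bending away at the top indicate right skew or high outliers. `car::qqPlot()` adds a reference line and a confidence envelope on top of these coordinates. A minimal base-R sketch of the coordinates themselves, using `qnorm(ppoints(n))` as `stats::qqnorm()` does, on a hypothetical toy vector `x` (not the Airbnb prices):

```r
# Hypothetical toy prices, NOT the airbnb data
x <- c(120, 95, 150, 130, 110, 400)

n <- length(x)
sample_q      <- sort(x)              # observed quantiles (vertical axis)
theoretical_q <- qnorm(ppoints(n))    # standard-normal quantiles (horizontal axis)

# qqnorm() returns the same theoretical quantiles, matched to the unsorted data
ref <- qqnorm(x, plot.it = FALSE)
all.equal(sort(ref$x), theoretical_q)

# The largest value (400) sits far above the trend of the others
cbind(theoretical_q, sample_q)
```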

Homogeneity of Variance Assumption

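\[H_0: \sigma_1^2 = \sigma_2^2 \]

\[H_A: \sigma_1^2 \ne \sigma_2^2 \]

where \(\sigma_1^2\) and \(\sigma_2^2\) are the population variances of daily price in Fitzroy and St Kilda, respectively.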

leveneTest(price ~ suburb, data=airbnb)
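By default `car::leveneTest()` centres each group at its median (`center = median`, the Brown-Forsythe variant) and then runs a one-way ANOVA on the absolute deviations from those centres; a small p-value is evidence against equal variances. The sketch below reproduces that computation in base R. `levene_median()` is a hypothetical helper name and the toy data are illustrative only.

```r
# Hypothetical helper: median-centred Levene test via a one-way ANOVA
levene_median <- function(y, group) {
  group   <- factor(group)
  centres <- as.numeric(tapply(y, group, median))   # one median per level
  abs_dev <- abs(y - centres[as.integer(group)])    # |y_ij - median_j|
  tab     <- anova(lm(abs_dev ~ group))             # F test on the deviations
  c(F   = tab[1, "F value"],
    p   = tab[1, "Pr(>F)"],
    df1 = tab[1, "Df"],
    df2 = tab[2, "Df"])
}

# Toy data: group B is far more spread out than group A
y <- c(1, 2, 3, 0, 10, 20)
g <- c("A", "A", "A", "B", "B", "B")
res <- levene_median(y, g)
res
```

On real data, `levene_median(airbnb$price, airbnb$suburb)` should give the same F statistic and p-value as the `leveneTest()` call above, assuming no missing prices (the table reports none).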

Independent Two-Sample T-test

# Pooled-variance (Student) two-sample t-test of mean daily price by suburb
result <- t.test(price ~ suburb,
                 data = airbnb,
                 var.equal = TRUE,
                 alternative = "two.sided")
result
## 
##  Two Sample t-test
## 
## data:  price by suburb
## t = 0.012658, df = 288, p-value = 0.9899
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -17.04700  17.26769
## sample estimates:
##  mean in group Fitzroy mean in group St Kilda 
##               146.3655               146.2552
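As a cross-check, the pooled t statistic, degrees of freedom, p-value and 95% confidence interval can be recomputed from the descriptive statistics table alone. Because the table's means are rounded to four decimal places, this agrees with the `t.test()` output to roughly three or four significant figures rather than exactly.

```r
# Summary statistics copied from the descriptive statistics table
n  <- c(Fitzroy = 145,      "St Kilda" = 145)
xb <- c(Fitzroy = 146.3655, "St Kilda" = 146.2552)
s  <- c(Fitzroy = 85.82725, "St Kilda" = 60.43152)

dof <- sum(n) - 2                              # n1 + n2 - 2 = 288
sp2 <- sum((n - 1) * s^2) / dof                # pooled variance
se  <- sqrt(sp2 * sum(1 / n))                  # SE of the difference in means

mean_diff <- unname(xb[1] - xb[2])             # Fitzroy minus St Kilda
t_stat    <- mean_diff / se
p_value   <- 2 * pt(-abs(t_stat), dof)         # two-sided p-value
ci        <- mean_diff + c(-1, 1) * qt(0.975, dof) * se   # 95% CI

round(c(t = t_stat, df = dof, p = p_value), 4)
round(ci, 3)
```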

Discussion
