The Great Australian Dream
Kenneth Tsang (s3750542)
Ahmad Hasnain (s3712538)
Goran Stojkoski (s3017862)
“The Great Australian Dream is a belief that in Australia, home-ownership can lead to a better life and is an expression of success and security. Although this standard of living is enjoyed by many in the existing Australian population, rising house prices compared to average wages are making it increasingly difficult for many to achieve the “great Australian Dream”, especially for those living in large cities.”
— Wikipedia on the ‘Australian Dream’[1]
[1] Wikipedia, ‘The Australian Dream’, (https://en.wikipedia.org/wiki/Australian_Dream), (accessed 27 October 2018).
[2] [3] C. Butt, and T. Jacks, ‘Cars continue to rule Melbourne roads, census shows’, (https://blogs.crikey.com.au/theurbanist/2018/03/19/jobs-centre-city/), 24 October 2017, (accessed 27 October 2018).
Travel time by private vehicle was used because it is a more informative measure than kilometers from the CBD: it takes existing infrastructure and congestion into account. Two suburbs at the same distance from the CBD can have different travel times because one has better access to roads and freeways or less congestion.
- The Housing Dataset is sourced from: Melbourne Housing Market
par(mfrow =c(1,2))
hist(Dataset$median_price, main="Median property prices", xlab = "Median Price",col = "lightblue")
hist(Dataset$Travel_Time, main = "Travel time to RMIT", col = "lightblue")
p1 <- ggplot(Dataset, aes(x=Time_category, y=median_price, fill=Time_category)) +
geom_boxplot() + scale_y_continuous(labels = dollar) + coord_flip() +
scale_alpha_manual(values=c("0-15 mins","15-30 mins","30-45 mins","45-60 mins","60+ mins")) +
scale_fill_manual(values=c("#336600","#9ACD32", "#FFFF00","#FF6600","red"))+ theme(legend.position = "none") +
ggtitle("Property price against time category") + xlab("Time categories") + ylab("Median Price") +
theme(plot.title = element_text(colour = "Black",size = 14, face = "bold", hjust = 0.5 ) )
p2 <- barchart(col=c("#336600","#9ACD32", "#FFFF00","#FF6600","red"), main="Travel Duration",
xlab="Population percentage", x = percentage)
grid.arrange(p1, p2, nrow=1)
ggplot(Dataset, aes(x=Travel_Time, y=median_price, col=Time_category)) +
scale_y_continuous(trans='log', limits=c(400000, 3500000),
breaks=c(400000,500000,600000,700000,800000,900000,1000000,1200000,1400000,1600000,1800000,2000000,2250000,2500000,2750000,3000000,3500000)) +
geom_point() + geom_smooth(method = lm, se=FALSE) +
scale_color_manual(values=c('#336600','#9ACD32', '#FFFF00',"#FF6600","red")) +
labs(title="Melbourne: Property price against travel time", x="Travel time (in mins)", y="Median property price (log scale)") +
theme(plot.title = element_text(colour = "Black", size = 14, face = "bold", hjust = 0.5))
Dataset %>% group_by(Time_category) %>% summarise(Min = min(median_price,na.rm = TRUE),
Q1 = quantile(median_price,probs = .25,na.rm=TRUE),
Median = median(median_price, na.rm = TRUE),
Q3 = quantile(median_price,probs = .75,na.rm=TRUE),
Max = max(median_price,na.rm = TRUE),
Mean = mean(median_price, na.rm = TRUE),
SD = sd(median_price, na.rm = TRUE),
IQR=IQR(median_price,na.rm = TRUE),
n = n(),
Missing = sum(is.na(median_price))) ->table_stats
knitr::kable(table_stats)

| Time_category | Min | Q1 | Median | Q3 | Max | Mean | SD | IQR | n | Missing |
|---|---|---|---|---|---|---|---|---|---|---|
| 0-15 mins | 680000 | 1230062 | 1411250 | 1587500 | 3339000 | 1490203.1 | 572479.5 | 357437.5 | 16 | 0 |
| 15-30 mins | 650000 | 982500 | 1220000 | 1645500 | 2812500 | 1302472.5 | 416519.0 | 663000.0 | 91 | 0 |
| 30-45 mins | 362500 | 642000 | 773500 | 1030000 | 2300000 | 865817.8 | 316246.5 | 388000.0 | 133 | 0 |
| 45-60 mins | 381000 | 550000 | 640000 | 743000 | 1320000 | 669101.7 | 188585.6 | 193000.0 | 59 | 0 |
| 60+ mins | 407000 | 510000 | 667500 | 737500 | 912500 | 642442.3 | 168632.9 | 227500.0 | 13 | 0 |
cor.test(Dataset$log_median_price,Dataset$Travel_Time)
##
## Pearson's product-moment correlation
##
## data: Dataset$log_median_price and Dataset$Travel_Time
## t = -14.617, df = 310, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.7001423 -0.5680117
## sample estimates:
## cor
## -0.6387628
The hypothesis test for Pearson’s correlation is as follows, where \(\rho\) denotes the population correlation between log median price and travel time:
\[H_0: \rho = 0\]
\[H_A: \rho \ne 0\]
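As a check on the `cor.test` output above, the reported t statistic can be recovered from the sample correlation \(r\) and the degrees of freedom \(n - 2 = 310\). This uses the standard test statistic for a Pearson correlation, which follows a t distribution with \(n - 2\) degrees of freedom under \(H_0\) when the usual assumptions hold:

\[t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{-0.6388 \times \sqrt{310}}{\sqrt{1 - 0.4080}} \approx \frac{-11.247}{0.7694} \approx -14.62\]

This agrees with the reported value of t = -14.617.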
log_lm <- lm(log_median_price ~ Travel_Time, data = Dataset)
log_lm %>% summary()
##
## Call:
## lm(formula = log_median_price ~ Travel_Time, data = Dataset)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.83460 -0.22031 -0.03803 0.21610 1.02360
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 14.425999 0.052243 276.13 <2e-16 ***
## Travel_Time -0.019994 0.001368 -14.62 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3142 on 310 degrees of freedom
## Multiple R-squared: 0.408, Adjusted R-squared: 0.4061
## F-statistic: 213.7 on 1 and 310 DF, p-value: < 2.2e-16
The hypothesis test for the intercept is:
\(H_0: \alpha = 0\)
\(H_A: \alpha \ne 0\)
and the hypothesis test for the slope is:
\(H_0: \beta = 0\)
\(H_A: \beta \ne 0\)
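Reading the coefficients from the summary output above, and assuming `log_median_price` is the natural logarithm of `median_price` (the size of the intercept is consistent with this, but the transformation step is not shown here), the fitted line is roughly:

\[\widehat{\log(\text{median price})} = 14.426 - 0.019994 \times \text{Travel Time}\]

Under that assumption, the slope can be read as a multiplicative effect on the price scale:

\[e^{-0.019994} \approx 0.980, \qquad e^{-0.019994 \times 15} \approx 0.741\]

That is, each additional minute of travel time is associated with a median price about 2% lower, and each additional 15 minutes with a median price about 26% lower.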
Travel Time: As the Google API algorithm is not open source, there is no way to verify that one travel time is truly independent of another. It is assumed that each suburb’s travel time is independent of every other suburb’s.
House Prices: There is potentially a small amount of dependence between individual sale prices, as real estate agents set valuations based on recent sales of similar properties in the area. This risk is reduced by taking the median sale price for each suburb.
plot(log_lm,which=c(1))
invisible(qqPlot(log_lm,which=c(2)))
plot(log_lm,which=c(3))
plot(log_lm,which=c(5))

House and Sold Image: https://pixabay.com
Time and architect Image: https://www.pexels.com/