Suburbs Selection

The D13 dataset contains street-level sale records for a selection of suburbs. The original idea was to plot every address on the following map. However, because of the Google API request limit, we ran out of time to geocode all the addresses, so we rolled the data up to suburb (postcode) level instead.

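# Extract the distinct postcodes from the dataset and look up their geocodes.
# Note: recent ggmap versions need a Google API key registered with register_google().
d13_postcode <- d13 %>%
  distinct(postcode) %>%
  mutate(postcode_key = paste(postcode, "NSW", sep = ", "),
         lon = NA_real_, lat = NA_real_)  # pre-allocate the coordinate columns
for (i in seq_len(nrow(d13_postcode))) {
  result <- geocode(as.character(d13_postcode$postcode_key[i]), output = "latlona", source = "google")
  d13_postcode$lon[i] <- as.numeric(result$lon)
  d13_postcode$lat[i] <- as.numeric(result$lat)
}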
# Load the geocoded postcodes prepared earlier (avoids repeating the API calls)
d13_postcode <- read.csv("c:/temp/d13_postcode.csv")
sydney <- get_map(location=c(lon=mean(d13_postcode$lon, na.rm = TRUE),
                             lat=mean(d13_postcode$lat, na.rm = TRUE)),
                  zoom=11, color='color')

# Constant colour and size are set outside aes() so that "darkred" is used literally
ggmap(sydney) +
  geom_point(aes(x = lon, y = lat), data = d13_postcode, color = "darkred", size = 10) +
  theme(legend.position = "none") + ggtitle("Sydney Housing - Suburbs Selection") + xlab("Longitude") + ylab("Latitude")

Number of properties sold by property type and LGA

This is why data visualisation is so important: it turns out that our dataset contains non-residential property types. Since we are only interested in House and Unit, the other property types will be removed during data cleaning.

# Count of sales per LGA, one panel per property type
ggplot(d13) + geom_bar(mapping = aes(x = LGA, fill = LGA)) + facet_wrap(~property_type, nrow = 2) +
  theme(axis.title.x = element_blank(), axis.text.x = element_blank(), axis.ticks.x = element_blank()) + ylab("Number of Sales")
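
The cleaning step described above is not shown in this section. As a minimal sketch, assuming the residential categories are stored in property_type as the strings "House" and "Unit", it could look like the following. The toy data frame and its values are made up for illustration and are not taken from D13.

```r
library(dplyr)

# Toy stand-in for d13; the values below are invented (assumption)
d13_toy <- data.frame(
  LGA = c("A", "B", "C", "A"),
  property_type = c("House", "Unit", "Commercial", "Vacant Land"),
  sale_price = c(650000, 480000, 1200000, 300000),
  stringsAsFactors = FALSE
)

# Keep only the residential property types of interest
residential_types <- c("House", "Unit")
d13_clean <- d13_toy %>%
  filter(property_type %in% residential_types)
```

On the real data the same filter would be applied to d13 itself, after checking the exact spelling of the category labels with distinct(d13, property_type).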

After data cleaning

Highest sales - by LGA and property type

Top 3 LGAs by number of properties sold:
1. Blacktown
2. Baulkham Hills
3. Hornsby

# Stacked bar chart of sales per LGA, split by property type
ggplot(d13, mapping = aes(x = LGA)) + geom_bar(aes(fill = property_type)) + ylab("Number of Sales") + coord_flip()
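
The ranking can also be read off a table rather than the chart. The sketch below counts sales per LGA and keeps the three largest. It runs on a made-up toy data frame with placeholder LGA names, so its output says nothing about the real D13 figures.

```r
library(dplyr)

# Toy stand-in for d13 with placeholder LGA names (assumption)
d13_toy <- data.frame(
  LGA = c("A", "A", "A", "B", "B", "C", "D", "D", "D", "D"),
  stringsAsFactors = FALSE
)

# Count sales per LGA, sort descending, keep the top 3
top_lgas <- d13_toy %>%
  count(LGA, sort = TRUE) %>%
  head(3)
```

Replacing d13_toy with the cleaned d13 would give the sale counts behind the list above.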

Price Variation

This boxplot shows:
1. within the same LGA, houses are more expensive than units
2. houses have more outliers than units
3. property prices in expensive LGAs vary widely

Top 3 LGAs by House/Unit price
1. Woollahra
2. Mosman
3. Willoughby

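# Drop rows with missing values before plotting
d13 <- na.omit(d13)
d13 %>%
  ggplot(aes(x = LGA, y = sale_price, fill = LGA)) + geom_boxplot() +
  ylim(0, 5000000) +  # note: sales above $5M are dropped before the box statistics are computed
  facet_wrap(~property_type, nrow = 2) + coord_flip() +
  theme(legend.position = "none") + ylab("Sale Price ($)")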
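
A numeric summary can back up what the boxplot suggests. The sketch below ranks LGAs by median sale price within each property type. The median is an assumption here, chosen because it is robust to the outliers noted above, and the text does not say which statistic the ranking was based on. The toy data and placeholder LGA names are invented.

```r
library(dplyr)

# Toy stand-in for d13; prices and LGA names are made up (assumption)
d13_toy <- data.frame(
  LGA = c("A", "A", "A", "B", "B", "B"),
  property_type = c("House", "House", "Unit", "House", "House", "Unit"),
  sale_price = c(900000, 1100000, 600000, 1500000, 2500000, 800000),
  stringsAsFactors = FALSE
)

# Median sale price and number of sales per property type and LGA,
# ordered from most to least expensive within each property type
price_by_lga <- d13_toy %>%
  group_by(property_type, LGA) %>%
  summarise(median_price = median(sale_price), n_sales = n(), .groups = "drop") %>%
  arrange(property_type, desc(median_price))
```

Unlike ylim() in the plot, this summary keeps sales above $5M, so the two views may differ slightly for the most expensive LGAs.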

CPI

The Consumer Price Index (CPI) measures quarterly changes in the price of a ‘basket’ of goods and services (food, alcohol/tobacco, clothing, housing, household equipment, health, transport, communication, recreation, education, financial services).

Food, housing and health are basic necessities. This graph shows that health and housing are the major contributors to household expenditure. If house prices keep rising, we may have to trade off other basic necessities, which is not ideal.