Ibrahim Pinzon Perez
4/12/2023
Melbourne, Australia. For those who know the city, the first things that come to mind might be that it has wild penguins (yes, it really does), that it is the city with the most cafes per capita, or the phenomenon of feeling like you’re experiencing all four seasons in a single day. What many won’t mention, however, is the complexity of its housing market. Michael Yardney, head writer of propertyupdate.com.au, which has been rated #1 in the property investment blog category for the past 7 years, states that one in three Melbourne suburbs has a median house price of at least $1 million, and it is estimated that by 2030 Melbourne will overtake Sydney as Australia’s largest capital city (Melbourne Property Forecast).
The dataset was created by Tony Pino, a Telecoms Engineer at Telstra in Melbourne, who scraped it from publicly available results posted every week on Domain.com.au. It covers many variables that aren’t much different from those in other housing market analyses around the world. Some of them, however, may require unit conversions for readers to get a better understanding of the data, such as land size (in square meters), distance from the CBD (in km), price (in Australian dollars), and more. There are also variables that aren’t of much use to us, such as real estate agent name, scraped bedroom data, and type of sale.
To understand the data, here is the list of variables and their meaning in the context of our data:
Rooms: Number of rooms
Price: Price in dollars (AUD)
Type: br - bedroom(s); h - house, cottage, villa, semi, terrace; u - unit, duplex; t - townhouse; dev site - development site; o res - other residential.
Date: Date sold
Distance: Distance from CBD (in km)
Regionname: General region (West, North West, North, North East, etc.)
Propertycount: Number of properties that exist in the suburb.
Bathroom: Number of Bathrooms
Car: Number of car spots
Landsize: Land Size (square meters)
BuildingArea: Building Size (square meters)
CouncilArea: Governing council for the area
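Since land size, building area, and distance are recorded in metric units, readers used to imperial units may want a quick conversion. The sketch below is only an illustration: the helper names `sqm_to_sqft` and `km_to_miles` and the example values are made up here and are not part of the original analysis.

```r
# Conversion factors follow from the definitions 1 ft = 0.3048 m and 1 mi = 1.609344 km
sqm_to_sqft <- function(sqm) sqm / 0.3048^2
km_to_miles <- function(km) km / 1.609344

# Made-up example values, not rows from the dataset
landsize_sqm <- c(200, 650, 1000)
distance_km  <- c(2.5, 10, 35)

# Land size in square feet, rounded to whole units
print(round(sqm_to_sqft(landsize_sqm)))

# Distance from the CBD in miles, rounded to one decimal place
print(round(km_to_miles(distance_km), 1))
```

Price is left in Australian dollars here, because any exchange rate would be an extra assumption.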
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.1 ✔ purrr 1.0.1
## ✔ tibble 3.1.8 ✔ dplyr 1.1.0
## ✔ tidyr 1.3.0 ✔ stringr 1.5.0
## ✔ readr 2.1.4 ✔ forcats 1.0.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## Rows: 13580 Columns: 21
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (8): Suburb, Address, Type, Method, SellerG, Date, CouncilArea, Regionname
## dbl (13): Rooms, Price, Distance, Postcode, Bedroom2, Bathroom, Car, Landsiz...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
We are going to remove columns that will not be of use to us in this data exploration.
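The code for this step is not shown in the report, so the following is a minimal sketch of how it might look with dplyr. It assumes the unneeded columns are `SellerG` (agent), `Method` (type of sale), and `Bedroom2` (scraped bedroom count), names taken from the column specification above. The small `melb_raw` table is a made-up stand-in for the real file.

```r
library(dplyr)

# Made-up stand-in for the raw data (the real file has 13,580 rows and 21 columns)
melb_raw <- tibble(
  Suburb   = c("Brighton", "Brighton East"),
  Rooms    = c(4, 3),
  Price    = c(2500000, 1400000),
  SellerG  = c("AgentA", "AgentB"),
  Method   = c("S", "SP"),
  Bedroom2 = c(4, 3)
)

# Assumed set of columns to drop; any_of() skips names that are absent
drop_cols <- c("SellerG", "Method", "Bedroom2")
melbh <- melb_raw %>% select(-any_of(drop_cols))

print(names(melbh))
```

Using `any_of()` rather than bare column names keeps the call from failing if one of the listed columns has already been removed.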
We have listed the variable names, meanings, and units. Now it’s time to get a better understanding of our dataset with some simple observations.
Let us first look at the relationships between our quantitative variables.
## corrplot 0.92 loaded
melbquant <- melbh %>% select(c(Rooms, Price, Distance, Bathroom, Car, Landsize, BuildingArea))
melbmat <- cor(melbquant, use = "pairwise.complete.obs")
corrplot(melbmat, method = "color")
We used a correlation matrix heatmap as an easy visual aid for understanding the relationships between the quantitative variables in our dataset. As anticipated for a housing market, many pairs of variables show fairly strong positive correlation. Let’s plot just some of the stronger ones.
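To make the `use = "pairwise.complete.obs"` argument more concrete, here is a small base R sketch on made-up numbers (not rows from the dataset). It computes the correlation matrix with a missing value present and then ranks the variable pairs by strength, which is one way to decide which relationships are worth plotting. The names `toy`, `toy_cor`, and `ranked` are illustrative only.

```r
# Made-up data with one missing price, standing in for the housing columns
toy <- data.frame(
  Rooms    = c(2, 3, 4, 5, 3),
  Price    = c(600, 900, 1300, 1800, NA),  # thousands of AUD, invented
  Distance = c(12, 9, 7, 3, 8)
)

# Pairwise deletion: each pair of columns uses every row where both values exist
toy_cor <- cor(toy, use = "pairwise.complete.obs")

# Collect each pair once (upper triangle) and sort by absolute correlation
idx <- which(upper.tri(toy_cor), arr.ind = TRUE)
ranked <- data.frame(
  var1 = rownames(toy_cor)[idx[, "row"]],
  var2 = colnames(toy_cor)[idx[, "col"]],
  r    = toy_cor[idx]
)
ranked <- ranked[order(-abs(ranked$r)), ]
print(ranked)
```

With pairwise deletion, the row with the missing price still contributes to the Rooms and Distance correlation, so less data is thrown away than with listwise deletion.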
ggplot(melbh, aes(x = Rooms, y = Price)) +
  geom_point() +
  ggtitle("Price vs. Number of Rooms") +
  ylab("Price (in AUD)") +
  xlab("Number of Rooms") +
  theme_minimal()
ggplot(melbh, aes(x = Distance, y = Price)) +
  geom_point() +
  xlab("Distance (in km)") +
  ylab("Price (in AUD)") +
  ggtitle("Distance from CBD vs Price")
ggplot(melbh, aes(x = Rooms, y = Bathroom)) +
  geom_point() +
  xlab("Number of Rooms") +
  ylab("Number of Bathrooms") +
  ggtitle("Number of Bathrooms vs. Number of Rooms")
It is said that no two houses are built the same, and in our dataset we see a lot of variation within our observations. For starters, the ranges of our variables extend further than many of us anticipated. There are houses with up to 10 rooms, land sizes exceeding 20,000 square meters (maybe they own the beach?), and prices reaching nearly 10 million Australian dollars. As for the correlations, I stated previously that positive correlation is expected between the quantitative variables in any housing market dataset. Usually, more of anything means a more expensive house, and the plots bear this out: houses with more rooms tended to be more expensive, and they also tended to have more bathrooms. We also saw a negative correlation between distance from the CBD and price, meaning that the further a house was from the central business district, the cheaper it was on the market.
If you Google the top neighborhoods in Melbourne, I can almost guarantee you’ll be amazed by how beautiful the architecture and interior design are. In my opinion, the one that stands out the most is Brighton. I will be providing some graphics to portray the beauty of the housing market within Brighton, a bayside suburb south-east of Melbourne’s central business district.
In the code below, we filter the data to look only at the Brighton community, which includes Brighton East, and build a 3D scatter plot of its housing market by sold date, price, and distance from the central business district.
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
##
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
##
## discard
## The following object is masked from 'package:readr':
##
## col_factor
# Keep only sales in Brighton and Brighton East
brighton <- melbh %>%
  filter(Suburb %in% c("Brighton", "Brighton East"))
# Month-year label parsed from the day/month/year date strings
brighton$MonthYear <- format(as.Date(brighton$Date, "%d/%m/%Y"), "%b %Y")
# 3D scatter of sale date, price, and distance from the CBD, colored by suburb
plot_ly(brighton, x = ~Date, y = ~Price, z = ~Distance, color = ~Suburb,
        type = "scatter3d", mode = "markers") %>%
  add_markers(y = ~Price, z = ~Distance,
              text = ~paste("Date: ", Date, "<br>", "Price: ", comma(Price), "<br>",
                            "Distance: ", Distance, "km", "<br>", "Rooms: ", Rooms, "<br>",
                            "Postal Code: ", Postcode),
              showlegend = FALSE, hovertemplate = "%{text}") %>%
  layout(scene = list(hoverlabel = list(bgcolor = "white", align = "left")),
         title = "Brighton House Prices by Date, Price, and Distance from CBD")
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels
It need not be said that the housing market in Melbourne is incredibly diverse. This, however, is no surprise, given that Greater Melbourne has an area of 3,858 square miles. You can fit about 7.5 Montgomery Counties in that! This data exploration was more than just a mere project. I am a DACA recipient, and while living as one may pose many disadvantages, I take pride in knowing that I always manage to overcome any obstacle in my way. While I’m not eligible for financial aid, I’m still a first-generation college student. While I can’t be employed federally, I’ve networked with dozens of politicians to provide those with my status a pathway to citizenship, and while I’m also at a disadvantage when it comes to home ownership, I know I can make it work. It’s not just a luxury, it’s a milestone.
As stated, the dataset was created by Tony Pino, a Telecoms Engineer at Telstra in Melbourne, and was scraped from publicly available results posted every week on Domain.com.au. There were many interesting findings in our data exploration, but not all of them were surprising. We saw positive correlation between nearly all quantitative variables, though most were not as strong as price vs. rooms, price vs. bathrooms, rooms vs. bathrooms, and land size vs. building area, which is to be expected.
Numbers rarely tell the complete story. There is so much more within any data that cannot be calculated, formally observed, or referenced. We see that in our 3D plot of the Brighton neighborhood. We plotted price vs. date sold vs. distance from the central business district and made an interesting observation: over the period covered, the prices of houses with similar characteristics appear to decrease.
There is so much more that could be done with this dataset, and I provided just one of the possibilities for gaining insight into what the data can offer.
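One way to check a trend like this numerically, rather than by eye from the 3D plot, is to compare median prices per year among houses with the same number of rooms. The sketch below is only an illustration on invented sales (the `toy_sales` values are not from the dataset), and it treats "similar characteristics" as "same room count", which is a simplifying assumption.

```r
library(dplyr)

# Invented sales using the dataset's day/month/year text format
toy_sales <- tibble(
  Date  = c("3/09/2016", "10/12/2016", "4/03/2017", "23/09/2017", "8/07/2017"),
  Rooms = c(3, 3, 3, 3, 4),
  Price = c(1500000, 1700000, 1600000, 1400000, 2400000)
)

# Median sale price for each room count and sale year
yearly <- toy_sales %>%
  mutate(Year = as.integer(format(as.Date(Date, "%d/%m/%Y"), "%Y"))) %>%
  group_by(Rooms, Year) %>%
  summarise(median_price = median(Price), n = n(), .groups = "drop")

print(yearly)
```

Keeping the count `n` next to each median helps show when a group is too small for the comparison to mean much.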