This dataset is about the selling price of real estate in Melbourne. We can obtain the dataset from https://www.kaggle.com/anthonypino/melbourne-housing-market . The type of real estate contain house, unit, and townhouse. The variables in the dataset can be seen below.
house <- read.csv("MELBOURNE_HOUSE_PRICES_LESS.csv")
head(house)
## Suburb Address Rooms Type Price Method SellerG Date
## 1 Abbotsford 49 Lithgow St 3 h 1490000 S Jellis 1/04/2017
## 2 Abbotsford 59A Turner St 3 h 1220000 S Marshall 1/04/2017
## 3 Abbotsford 119B Yarra St 3 h 1420000 S Nelson 1/04/2017
## 4 Aberfeldie 68 Vida St 3 h 1515000 S Barry 1/04/2017
## 5 Airport West 92 Clydesdale Rd 2 h 670000 S Nelson 1/04/2017
## 6 Airport West 4/32 Earl St 2 t 530000 S Jellis 1/04/2017
## Postcode Regionname Propertycount Distance
## 1 3067 Northern Metropolitan 4019 3.0
## 2 3067 Northern Metropolitan 4019 3.0
## 3 3067 Northern Metropolitan 4019 3.0
## 4 3040 Western Metropolitan 1543 7.5
## 5 3042 Western Metropolitan 3464 10.4
## 6 3042 Western Metropolitan 3464 10.4
## CouncilArea
## 1 Yarra City Council
## 2 Yarra City Council
## 3 Yarra City Council
## 4 Moonee Valley City Council
## 5 Moonee Valley City Council
## 6 Moonee Valley City Council
tail(house)
## Suburb Address Rooms Type Price Method SellerG
## 63018 Preston 229 Murray Rd 3 h 808000 S RW
## 63019 Roxburgh Park 3 Carr Pl 3 h 566000 S Raine
## 63020 Roxburgh Park 9 Parker Ct 3 h 500000 S Raine
## 63021 Roxburgh Park 5 Parkinson Wy 3 h 545000 S Raine
## 63022 Thomastown 3/1 Travers St 3 u NA PI Barry
## 63023 Williams Landing 1 Diadem Wy 4 h NA SP Aussie
## Date Postcode Regionname Propertycount Distance
## 63018 31/03/2018 3072 Northern Metropolitan 14577 8.4
## 63019 31/03/2018 3064 Northern Metropolitan 5833 20.6
## 63020 31/03/2018 3064 Northern Metropolitan 5833 20.6
## 63021 31/03/2018 3064 Northern Metropolitan 5833 20.6
## 63022 31/03/2018 3074 Northern Metropolitan 7955 15.3
## 63023 31/03/2018 3027 Western Metropolitan 1999 17.6
## CouncilArea
## 63018 Darebin City Council
## 63019 Hume City Council
## 63020 Hume City Council
## 63021 Hume City Council
## 63022 Whittlesea City Council
## 63023 Wyndham City Council
str(house)
## 'data.frame': 63023 obs. of 13 variables:
## $ Suburb : chr "Abbotsford" "Abbotsford" "Abbotsford" "Aberfeldie" ...
## $ Address : chr "49 Lithgow St" "59A Turner St" "119B Yarra St" "68 Vida St" ...
## $ Rooms : int 3 3 3 3 2 2 2 3 6 3 ...
## $ Type : chr "h" "h" "h" "h" ...
## $ Price : int 1490000 1220000 1420000 1515000 670000 530000 540000 715000 NA 1925000 ...
## $ Method : chr "S" "S" "S" "S" ...
## $ SellerG : chr "Jellis" "Marshall" "Nelson" "Barry" ...
## $ Date : chr "1/04/2017" "1/04/2017" "1/04/2017" "1/04/2017" ...
## $ Postcode : int 3067 3067 3067 3040 3042 3042 3042 3042 3021 3206 ...
## $ Regionname : chr "Northern Metropolitan" "Northern Metropolitan" "Northern Metropolitan" "Western Metropolitan" ...
## $ Propertycount: int 4019 4019 4019 1543 3464 3464 3464 3464 1899 3280 ...
## $ Distance : num 3 3 3 7.5 10.4 10.4 10.4 10.4 14 3 ...
## $ CouncilArea : chr "Yarra City Council" "Yarra City Council" "Yarra City Council" "Moonee Valley City Council" ...
dim(house)
## [1] 63023 13
names(house)
## [1] "Suburb" "Address" "Rooms" "Type"
## [5] "Price" "Method" "SellerG" "Date"
## [9] "Postcode" "Regionname" "Propertycount" "Distance"
## [13] "CouncilArea"
# Renaming all of column house to be more easily understand
names(house) <- c("Suburb","Address","Rooms","Type",
"Price","Method_of_Sales","Agent",
"Date_of_Sold","Postcode","Region",
"Property_Count","Distance", "Council_Area")
From our inspection we can conclude:
The dataset contain 63023 of rows and 13 columns
Each of column name: “Suburb”,“Address”,“Rooms”,“Type”, “Price”,“Method_of_Sales”,“Agent”, “Date_of_Sold”,“Postcode”,“Region”, “Property_Count”,“Distance”, “Council_Area”
summary(house)
## Suburb Address Rooms Type
## Length:63023 Length:63023 Min. : 1.000 Length:63023
## Class :character Class :character 1st Qu.: 3.000 Class :character
## Mode :character Mode :character Median : 3.000 Mode :character
## Mean : 3.111
## 3rd Qu.: 4.000
## Max. :31.000
##
## Price Method_of_Sales Agent Date_of_Sold
## Min. : 85000 Length:63023 Length:63023 Length:63023
## 1st Qu.: 620000 Class :character Class :character Class :character
## Median : 830000 Mode :character Mode :character Mode :character
## Mean : 997898
## 3rd Qu.: 1220000
## Max. :11200000
## NA's :14590
## Postcode Region Property_Count Distance
## Min. :3000 Length:63023 Min. : 39 Min. : 0.00
## 1st Qu.:3056 Class :character 1st Qu.: 4380 1st Qu.: 7.00
## Median :3107 Mode :character Median : 6795 Median :11.40
## Mean :3126 Mean : 7618 Mean :12.68
## 3rd Qu.:3163 3rd Qu.:10412 3rd Qu.:16.70
## Max. :3980 Max. :21650 Max. :64.10
##
## Council_Area
## Length:63023
## Class :character
## Mode :character
##
##
##
##
Check missing value
colSums(is.na(house))
## Suburb Address Rooms Type Price
## 0 0 0 0 14590
## Method_of_Sales Agent Date_of_Sold Postcode Region
## 0 0 0 0 0
## Property_Count Distance Council_Area
## 0 0 0
There’re some missing value at column Price. So we need to replace that missing value
Replace missing value Price with 0
house$Price[is.na(house$Price)] <- 0
Re-checking missing value
colSums(is.na(house))
## Suburb Address Rooms Type Price
## 0 0 0 0 0
## Method_of_Sales Agent Date_of_Sold Postcode Region
## 0 0 0 0 0
## Property_Count Distance Council_Area
## 0 0 0
Great!! We’re already done with this missing value. Next, we need to change the data type for each column
Change the data type
house$Rooms <- as.factor(house$Rooms)
house$Type <- as.factor(house$Type)
house$Method_of_Sales <- as.factor(house$Method_of_Sales)
house$Region <- as.factor(house$Region)
house$Date_of_Sold <- as.Date(house$Date_of_Sold, format = "%d/%m/%Y")
# renaming levels of factor
levels(house$Type) <- list(House = "h", Townhouse = "t", Unit = "u")
levels(house$Method_of_Sales) <- list(`property sold` = "S",
`property sold prior` = "SP",
`property passed in` = "PI",
`sold prior not disclosed` = "PN",
`sold not disclosed` = "SN",
`no bid` = "NB",
`vendor bid` = "VB",
`withdrawn prior to auction` = "W",
`sold after auction` = "SA",
`sold after auction price not disclosed` = "SS")
str(house)
## 'data.frame': 63023 obs. of 13 variables:
## $ Suburb : chr "Abbotsford" "Abbotsford" "Abbotsford" "Aberfeldie" ...
## $ Address : chr "49 Lithgow St" "59A Turner St" "119B Yarra St" "68 Vida St" ...
## $ Rooms : Factor w/ 14 levels "1","2","3","4",..: 3 3 3 3 2 2 2 3 6 3 ...
## $ Type : Factor w/ 3 levels "House","Townhouse",..: 1 1 1 1 1 2 3 1 1 1 ...
## $ Price : num 1490000 1220000 1420000 1515000 670000 ...
## $ Method_of_Sales: Factor w/ 10 levels "property sold",..: 1 1 1 1 1 1 1 2 3 1 ...
## $ Agent : chr "Jellis" "Marshall" "Nelson" "Barry" ...
## $ Date_of_Sold : Date, format: "2017-04-01" "2017-04-01" ...
## $ Postcode : int 3067 3067 3067 3040 3042 3042 3042 3042 3021 3206 ...
## $ Region : Factor w/ 8 levels "Eastern Metropolitan",..: 3 3 3 7 7 7 7 7 7 6 ...
## $ Property_Count : int 4019 4019 4019 1543 3464 3464 3464 3464 1899 3280 ...
## $ Distance : num 3 3 3 7.5 10.4 10.4 10.4 10.4 14 3 ...
## $ Council_Area : chr "Yarra City Council" "Yarra City Council" "Yarra City Council" "Moonee Valley City Council" ...
summary(house)
## Suburb Address Rooms Type
## Length:63023 Length:63023 3 :27950 House :45053
## Class :character Class :character 4 :15747 Townhouse: 6315
## Mode :character Mode :character 2 :13248 Unit :11655
## 5 : 3434
## 1 : 2111
## 6 : 424
## (Other): 109
## Price Method_of_Sales Agent
## Min. : 0 property sold :34063 Length:63023
## 1st Qu.: 363000 property passed in : 9790 Class :character
## Median : 697500 property sold prior : 8916 Mode :character
## Mean : 766882 vendor bid : 5956
## 3rd Qu.: 1065000 sold not disclosed : 2674
## Max. :11200000 sold prior not disclosed: 651
## (Other) : 973
## Date_of_Sold Postcode Region
## Min. :2016-01-28 Min. :3000 Southern Metropolitan :17559
## 1st Qu.:2016-12-03 1st Qu.:3056 Northern Metropolitan :16781
## Median :2017-09-03 Median :3107 Western Metropolitan :11717
## Mean :2017-08-04 Mean :3126 Eastern Metropolitan :10396
## 3rd Qu.:2018-03-10 3rd Qu.:3163 South-Eastern Metropolitan: 5212
## Max. :2018-10-13 Max. :3980 Eastern Victoria : 564
## (Other) : 794
## Property_Count Distance Council_Area
## Min. : 39 Min. : 0.00 Length:63023
## 1st Qu.: 4380 1st Qu.: 7.00 Class :character
## Median : 6795 Median :11.40 Mode :character
## Mean : 7618 Mean :12.68
## 3rd Qu.:10412 3rd Qu.:16.70
## Max. :21650 Max. :64.10
##
Summary:
The first real estate sold occur at 2016-01-28
Type of real estate contain house, unit, and townhouse. House is most for selling among the others
The maximum price about 11200000 AUD
The method of sales commonly used by the agent property is selling the real estate directly to consumer (S)
The most property count is 21650
The most closest distance to Central Business Development is about 0 km
table(house$Type, house$Rooms)
##
## 1 2 3 4 5 6 7 8 9 10 11
## House 142 4503 22017 14479 3389 419 60 26 4 7 1
## Townhouse 36 1529 3554 1152 41 0 1 1 1 0 0
## Unit 1933 7216 2379 116 4 5 0 2 0 0 0
##
## 12 16 31
## House 4 1 1
## Townhouse 0 0 0
## Unit 0 0 0
Answer: house with 3 numbers of rooms are the most type of house for selling
house_high <- aggregate(Price~Type+Rooms, house, FUN = max)
house_high[order(house_high$Price, decreasing = T),][1,]
## Type Rooms Price
## 10 House 4 11200000
Answer : House, Cottage, Villa, Semi, and Terrace with 4 numbers of rooms are type of real estate which is has highest price
sub_high <- aggregate(Property_Count~Suburb+Region, house, sum)
sub_high[order(sub_high$Property_Count, decreasing = T),][1,]
## Suburb Region Property_Count
## 149 Reservoir Northern Metropolitan 26867650
Answer: Northern Metropolitan has the most property count
total_price <- aggregate(Price~Suburb+Type, house, sum)
total_price[order(total_price$Price, decreasing = T),][1,]
## Suburb Type Price
## 24 Balwyn North House 615319800
Answer: The highest of total price in Balwyn North with type of real estate are House, Cottage, Villa, Semi, and Terrace
house$year = as.numeric(format(house$Date_of_Sold, "%Y"))
unit_18 <- aggregate(Price~Type+year, house, mean)
unit_18[(unit_18$Type == "u" & unit_18$year == 2018),]
## [1] Type year Price
## <0 rows> (or 0-length row.names)
Answer : Rate price of unit in 2018 is 495973.3 AUD
council <- aggregate(Distance~Council_Area, house, min)
council[order(council$Distance, decreasing = F),][1,]
## Council_Area Distance
## 19 Melbourne City Council 0
Answer : Melbourne City Council is about closest to CDB
pie(xtabs(~Type, house))
Answer : Based on pie chart, we could know that type real estate with house is highest percentage among the others. It means type of real estate selling is dominated by house
year_total <- xtabs(Price~year, house)
options(scipen = 999)
barplot(year_total)
title(main = "Total Price Household Market per Year",
ylab = "Total Price")
Answer : The total higher price is on 2017
townhouse <- house[house$Type == "Townhouse",]
townhouse_2 <- aggregate(Price~Type+Suburb, townhouse, mean)
# Display Top 5 Rate price of Townhouse
townhouse_2[order(townhouse_2$Price, decreasing = T), ][1:5,]
## Type Suburb Price
## 152 Townhouse Kooyong 1565500
## 37 Townhouse Brighton 1288916
## 241 Townhouse Toorak 1283929
## 52 Townhouse Canterbury 1230250
## 4 Townhouse Albert Park 1226000
Answer: the highest rate price for townhouse at Kooyoung. The rate price is 1565500 AUD
barplot(prop.table(table(house$Method_of_Sales)),
ylim = c(0,1))
title(xlab = "Method of Sales",
main = "Proportion Method of Sales Household Market",
ylab = "Proportions")
Answer : Based on pie chart above, we could know that the method for selling is commonly used by selling the property directly
barplot(xtabs(~Type + Method_of_Sales, house),
ylim = c(0,50000), ylab = "Total", xlab = "Method of Sales")
table(droplevels(house$Method_of_Sales), house$Type)
##
## House Townhouse Unit
## property sold 24865 3212 5986
## property sold prior 6036 840 2040
## property passed in 6882 1159 1749
## sold prior not disclosed 446 85 120
## sold not disclosed 2149 192 333
## vendor bid 3976 738 1242
## withdrawn prior to auction 335 43 106
## sold after auction 308 37 71
## sold after auction price not disclosed 56 9 8
Answer: Based from our analysis, we could know that selling the house by directly to consumer is most used by the agent property in Melbourne
mean_house <- house[(house$Type == "House"),]
mean_high <- aggregate(Price~Agent+Type, mean_house, mean)
mean_high[order(mean_high$Price, decreasing = T),][1,]
## Agent Type Price
## 120 For House 3780000
Answer : The agent property who has highest rate price for house is Agent For
From our report we can conclude that house with 3 numbers of rooms are the most selling. Balwyn North has highest total of price for house selling. Suburb region has the most property count is Northern Metropolitan. Melbourne City Council is about the most closest to CDB. It’s about 0 km. The highest total price occur at 2017. The method of sales is commonly used by agent real estate is selling directly to consumer. Usually, this method is used for selling house. The property agent who has highest rate price for house is Agent For