Price is one of the most important decision criteria when people buy homes. Real estate firms need to price consistently in order to attract buyers. A predictive model for property prices would be a useful tool; it could also guide property development by putting more emphasis on the qualities that increase a property's value.
The goal is to build a machine learning model that predicts property prices accurately.
The evaluation metric will be RMSE (root mean squared error).
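As a quick illustration of the metric, RMSE is the square root of the mean of the squared differences between actual and predicted values. The helper name `rmse` and the toy prices below are assumptions made for this sketch; they are not part of the original analysis.

```r
# Root mean squared error between observed and predicted values.
# `rmse` is a hypothetical helper name used only for illustration.
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# Toy prices, made up purely for illustration
actual    <- c(500000, 750000, 1200000)
predicted <- c(520000, 700000, 1250000)
rmse(actual, predicted)
```

Because the errors are squared before averaging, a few large misses on expensive properties weigh more heavily than many small ones.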
There are two datasets, housing_train.csv and housing_test.csv. We will use housing_train to build a predictive model for the response variable “Price”. housing_test contains all the other variables except “Price”, and we can use it for testing.
Suburb : categorical :: which suburb the property is located in
Address : categorical :: short address
Rooms : numeric :: Number of Rooms
Type : categorical :: type of the property
Price : numeric :: This is the target variable, price of the property
Method : categorical :: method for selling
SellerG : categorical :: Name of the seller
Distance : numeric :: distance from the city center
Postcode : categorical :: postcode of the property
Bedroom2 : numeric :: number of secondary bedrooms (this is different from Rooms)
Bathroom : numeric :: number of bathrooms
Car : numeric :: number of parking spaces
Landsize : numeric :: land size
BuildingArea : numeric :: built-up area
YearBuilt : numeric :: year of building
CouncilArea : categorical :: council area to which the property belongs
We will build a linear regression model to predict the response variable “Price”, in the following steps:
1. Imputing NA values in the datasets.
2. Data preparation.
3. Model building.
4. Performance measurement of the model.
5. Predicting real estate prices for the final test dataset.
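The model-building and performance-measurement steps can be sketched end to end on synthetic data. Everything in this block (the `toy` data frame, its coefficients, the 80/20 split) is an assumption for illustration only; the real analysis uses the housing data and a larger set of predictors.

```r
set.seed(1)

# Synthetic stand-in for the housing data (all values are made up)
n <- 200
toy <- data.frame(
  Rooms    = sample(1:5, n, replace = TRUE),
  Distance = runif(n, 1, 30)
)
toy$Price <- 300000 + 150000 * toy$Rooms - 8000 * toy$Distance +
  rnorm(n, sd = 50000)

# Hold out 20% of the rows to measure performance
idx      <- sample(seq_len(n), size = 0.8 * n)
toy_fit  <- toy[idx, ]
toy_hold <- toy[-idx, ]

# Fit a linear regression and predict on the held-out rows
fit  <- lm(Price ~ Rooms + Distance, data = toy_fit)
pred <- predict(fit, newdata = toy_hold)

# RMSE on the held-out rows
holdout_rmse <- sqrt(mean((toy_hold$Price - pred)^2))
holdout_rmse
```

Measuring RMSE on rows the model has not seen gives a more honest estimate of how it may behave on housing_test, where the true prices are not available.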
Loading the dplyr library:
library(dplyr)
setwd("C:\\Users\\INS15R\\Documents\\R latest\\R EDVANCER\\Industry Based Projects\\Industry-Based-Projects-Edvancer-Eduventures")
getwd()
## [1] "C:/Users/INS15R/Documents/R latest/R EDVANCER/Industry Based Projects/Industry-Based-Projects-Edvancer-Eduventures"
Reading train and test datasets:
train=read.csv("housing_train.csv",stringsAsFactors = FALSE,header = T )
#7536 obs,16 variables
test=read.csv("housing_test.csv",stringsAsFactors = FALSE,header = T )
#1885 obs,15 variables
Let's first impute the NA values of the train dataset.
Replacing the NA values (1559 obs) of the Bedroom2 variable with its median, 3:
apply(train,2,function(x)sum(is.na(x)))
## Suburb Address Rooms Type Price
## 0 0 0 0 0
## Method SellerG Distance Postcode Bedroom2
## 0 0 0 0 1559
## Bathroom Car Landsize BuildingArea YearBuilt
## 1559 1559 1564 4209 3717
## CouncilArea
## 0
train$Bedroom2[is.na(train$Bedroom2)]=median(train$Bedroom2,na.rm=T)
Similarly, the remaining NA values are imputed one variable at a time, each with the rounded mean of that variable:
For variable Bathroom:
apply(train,2,function(x)sum(is.na(x)))
## Suburb Address Rooms Type Price
## 0 0 0 0 0
## Method SellerG Distance Postcode Bedroom2
## 0 0 0 0 0
## Bathroom Car Landsize BuildingArea YearBuilt
## 1559 1559 1564 4209 3717
## CouncilArea
## 0
train$Bathroom[is.na(train$Bathroom)]=round(mean(train$Bathroom,na.rm=T),0)
For variable Car:
apply(train,2,function(x)sum(is.na(x)))
## Suburb Address Rooms Type Price
## 0 0 0 0 0
## Method SellerG Distance Postcode Bedroom2
## 0 0 0 0 0
## Bathroom Car Landsize BuildingArea YearBuilt
## 0 1559 1564 4209 3717
## CouncilArea
## 0
train$Car[is.na(train$Car)]=round(mean(train$Car,na.rm=T),0)
For variable Landsize:
apply(train,2,function(x)sum(is.na(x)))
## Suburb Address Rooms Type Price
## 0 0 0 0 0
## Method SellerG Distance Postcode Bedroom2
## 0 0 0 0 0
## Bathroom Car Landsize BuildingArea YearBuilt
## 0 0 1564 4209 3717
## CouncilArea
## 0
train$Landsize[is.na(train$Landsize)]=round(mean(train$Landsize,na.rm=T),0)
For variable BuildingArea:
apply(train,2,function(x)sum(is.na(x)))
## Suburb Address Rooms Type Price
## 0 0 0 0 0
## Method SellerG Distance Postcode Bedroom2
## 0 0 0 0 0
## Bathroom Car Landsize BuildingArea YearBuilt
## 0 0 0 4209 3717
## CouncilArea
## 0
train$BuildingArea[is.na(train$BuildingArea)]=round(mean(train$BuildingArea,na.rm=T),0)
For variable YearBuilt:
apply(train,2,function(x)sum(is.na(x)))
## Suburb Address Rooms Type Price
## 0 0 0 0 0
## Method SellerG Distance Postcode Bedroom2
## 0 0 0 0 0
## Bathroom Car Landsize BuildingArea YearBuilt
## 0 0 0 0 3717
## CouncilArea
## 0
train$YearBuilt[is.na(train$YearBuilt)]=round(mean(train$YearBuilt,na.rm=T),0)
Thus all NA values of the train dataset are successfully imputed.
apply(train,2,function(x)sum(is.na(x)))
## Suburb Address Rooms Type Price
## 0 0 0 0 0
## Method SellerG Distance Postcode Bedroom2
## 0 0 0 0 0
## Bathroom Car Landsize BuildingArea YearBuilt
## 0 0 0 0 0
## CouncilArea
## 0
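The column-by-column imputation above can also be written more compactly. The sketch below is one possible way to do it, run on a small made-up data frame; `impute_with` is a hypothetical helper, not a function from the original analysis or from dplyr.

```r
# Small made-up data frame with missing values
toy <- data.frame(
  Bedroom2 = c(3, NA, 2, 4),
  Bathroom = c(1, 2, NA, 2),
  Car      = c(NA, 1, 2, 2)
)

# Count missing values per column (same information as the apply() call above)
colSums(is.na(toy))

# Hypothetical helper: fill NAs with a statistic of the observed values,
# optionally rounded to a whole number
impute_with <- function(x, stat = median, round_result = FALSE) {
  fill <- stat(x, na.rm = TRUE)
  if (round_result) fill <- round(fill, 0)
  x[is.na(x)] <- fill
  x
}

# Median for Bedroom2, rounded mean for the other columns
toy$Bedroom2 <- impute_with(toy$Bedroom2, median)
for (col in c("Bathroom", "Car")) {
  toy[[col]] <- impute_with(toy[[col]], mean, round_result = TRUE)
}

colSums(is.na(toy))
```

Keeping the fill rule in one function makes it harder to apply a different statistic to a column by accident.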
Now, let's impute the NA values of the test dataset.
apply(test,2,function(x)sum(is.na(x)))
## Suburb Address Rooms Type Method
## 0 0 0 0 0
## SellerG Distance Postcode Bedroom2 Bathroom
## 0 0 0 419 419
## Car Landsize BuildingArea YearBuilt CouncilArea
## 419 421 1060 943 0
test$Bedroom2[is.na(test$Bedroom2)]=median(test$Bedroom2,na.rm=T)
apply(test,2,function(x)sum(is.na(x)))
## Suburb Address Rooms Type Method
## 0 0 0 0 0
## SellerG Distance Postcode Bedroom2 Bathroom
## 0 0 0 0 419
## Car Landsize BuildingArea YearBuilt CouncilArea
## 419 421 1060 943 0
test$Bathroom[is.na(test$Bathroom)]=round(mean(test$Bathroom,na.rm=T),0)
apply(test,2,function(x)sum(is.na(x)))
## Suburb Address Rooms Type Method
## 0 0 0 0 0
## SellerG Distance Postcode Bedroom2 Bathroom
## 0 0 0 0 0
## Car Landsize BuildingArea YearBuilt CouncilArea
## 419 421 1060 943 0
test$Car[is.na(test$Car)]=round(mean(test$Car,na.rm=T),0)
apply(test,2,function(x)sum(is.na(x)))
## Suburb Address Rooms Type Method
## 0 0 0 0 0
## SellerG Distance Postcode Bedroom2 Bathroom
## 0 0 0 0 0
## Car Landsize BuildingArea YearBuilt CouncilArea
## 0 421 1060 943 0
test$Landsize[is.na(test$Landsize)]=round(mean(test$Landsize,na.rm=T),0)
apply(test,2,function(x)sum(is.na(x)))
## Suburb Address Rooms Type Method
## 0 0 0 0 0
## SellerG Distance Postcode Bedroom2 Bathroom
## 0 0 0 0 0
## Car Landsize BuildingArea YearBuilt CouncilArea
## 0 0 1060 943 0
test$BuildingArea[is.na(test$BuildingArea)]=round(mean(test$BuildingArea,na.rm=T),0)
apply(test,2,function(x)sum(is.na(x)))
## Suburb Address Rooms Type Method
## 0 0 0 0 0
## SellerG Distance Postcode Bedroom2 Bathroom
## 0 0 0 0 0
## Car Landsize BuildingArea YearBuilt CouncilArea
## 0 0 0 943 0
test$YearBuilt[is.na(test$YearBuilt)]=round(median(test$YearBuilt,na.rm=T),0)
Thus, the NA values of the test dataset are successfully imputed with the median (Bedroom2, YearBuilt) or the rounded mean (all other variables).
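Note that the test set above is imputed with statistics computed on the test set itself, and that YearBuilt is filled with the mean in train but the median in test. A common alternative, shown here only as a sketch on made-up values and not as the method used in this analysis, is to compute each fill value once on the training rows and reuse it for the test rows, so both sets are treated identically.

```r
# Made-up values for illustration
toy_train <- data.frame(YearBuilt = c(1920, 1960, NA, 2000))
toy_test  <- data.frame(YearBuilt = c(NA, 1985, NA))

# Fill value learned from the training rows only
fill_year <- round(mean(toy_train$YearBuilt, na.rm = TRUE), 0)

# Apply the same value to both datasets
toy_train$YearBuilt[is.na(toy_train$YearBuilt)] <- fill_year
toy_test$YearBuilt[is.na(toy_test$YearBuilt)]   <- fill_year

toy_test$YearBuilt
```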
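# add a placeholder Price column to test so both datasets have the same columns
test$Price=NA
# flag each row's origin so the two datasets can be separated again later
train$data='train'
test$data='test'
all_data=rbind(train,test)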
apply(all_data,2,function(x)sum(is.na(x)))
## Suburb Address Rooms Type Price
## 0 0 0 0 1885
## Method SellerG Distance Postcode Bedroom2
## 0 0 0 0 0
## Bathroom Car Landsize BuildingArea YearBuilt
## 0 0 0 0 0
## CouncilArea data
## 0 0
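The `data` flag exists so that the combined data can be separated again after the shared data preparation. A minimal sketch of that round trip on made-up rows follows; the names `toy_all`, `train_back` and `test_back` are assumptions for illustration.

```r
# Made-up rows for illustration
toy_train <- data.frame(Rooms = c(3, 2), Price = c(800000, 650000))
toy_test  <- data.frame(Rooms = c(4))

toy_test$Price <- NA        # placeholder so both frames share the same columns
toy_train$data <- "train"
toy_test$data  <- "test"
toy_all <- rbind(toy_train, toy_test)

# ... shared feature engineering on toy_all would go here ...

# Recover the two sets: drop the flag, and drop the empty Price from test.
# drop = FALSE keeps the result a data frame even if one column remains.
train_back <- toy_all[toy_all$data == "train",
                      setdiff(names(toy_all), "data"), drop = FALSE]
test_back  <- toy_all[toy_all$data == "test",
                      setdiff(names(toy_all), c("data", "Price")), drop = FALSE]
```

Preparing both sets together guarantees that they end up with the same derived columns, which `predict()` needs later.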
Let's look at the structure and data types of the combined dataset.
glimpse(all_data)
## Observations: 9,421
## Variables: 17
## $ Suburb <chr> "Brunswick", "Reservoir", "Newport", "Brighton Ea...
## $ Address <chr> "52 Evans St", "85 Radford Rd", "99 Anderson St",...
## $ Rooms <int> 3, 5, 3, 2, 5, 3, 3, 3, 4, 2, 3, 2, 2, 2, 4, 3, 2...
## $ Type <chr> "h", "h", "h", "u", "h", "h", "h", "h", "h", "u",...
## $ Price <int> 1650000, 791000, 785000, 755000, 2500000, 3020000...
## $ Method <chr> "S", "S", "S", "SP", "VB", "S", "VB", "VB", "PI",...
## $ SellerG <chr> "Nelson", "Ray", "RT", "Buxton", "RT", "Hooper", ...
## $ Distance <dbl> 5.2, 11.2, 8.4, 10.7, 7.5, 7.5, 13.9, 11.2, 12.8,...
## $ Postcode <int> 3056, 3073, 3015, 3187, 3123, 3123, 3165, 3127, 3...
## $ Bedroom2 <dbl> 3, 4, 3, 3, 5, 3, 3, 3, 3, 2, 3, 2, 2, 2, 4, 3, 2...
## $ Bathroom <dbl> 1, 3, 1, 1, 3, 2, 1, 2, 1, 2, 1, 1, 2, 1, 2, 1, 1...
## $ Car <dbl> 2, 1, 1, 2, 3, 2, 1, 4, 2, 2, 2, 1, 1, 1, 1, 2, 1...
## $ Landsize <dbl> 495, 961, 185, 452, 757, 832, 710, 816, 452, 0, 4...
## $ BuildingArea <dbl> 141, 143, 143, 143, 240, 143, 143, 143, 143, 80, ...
## $ YearBuilt <dbl> 1920, 1961, 1961, 1961, 1925, 1961, 1966, 1961, 1...
## $ CouncilArea <chr> "Moreland", "Darebin", "Hobsons Bay", "", "Boroon...
## $ data <chr> "train", "train", "train", "train", "train", "tra...
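# number of properties per suburb
t=table(all_data$Suburb)
View(t)
# mean price per suburb (test rows have NA Price and are ignored)
t1=round(tapply(all_data$Price,all_data$Suburb,mean,na.rm=T),0)
View(t1)
# sort suburbs by mean price; the sub_1 ... sub_16 groups below appear to follow this ordering
t1=sort(t1)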
all_data=all_data %>%
mutate(
sub_1=as.numeric(Suburb%in%c("Campbellfield","Jacana")),
sub_2=as.numeric(Suburb%in%c("Kealba","Brooklyn","Albion","Sunshine West","Ripponlea","Fawkner")),
sub_3=as.numeric(Suburb%in%c("Glenroy","Southbank","Sunshine North","Keilor Park","Heidelberg West","Reservoir","Braybrook","Kingsbury","Gowanbrae","Hadfield","Watsonia","Footscray","South Kingsville","Balaclava","Melbourne","Maidstone","Sunshine")),
sub_4=as.numeric(Suburb%in%c("Airport West","Heidelberg Heights","Pascoe Vale","West Footscray","Altona North","Williamstown North","Brunswick West","Keilor East","Oak Park","Maribyrnong","Altona","Flemington","Coburg North","Yallambie","Avondale Heights","Bellfield")),
sub_5=as.numeric(Suburb%in%c("Strathmore Heights","Glen Huntly","Kensington","Essendon North","St Kilda","Preston","North Melbourne","Coburg","Kingsville","Collingwood","Brunswick East","Gardenvale","Thornbury","Niddrie","West Melbourne","Viewbank")),
sub_6=as.numeric(Suburb%in%c("Spotswood","Carnegie","Elwood","Heidelberg","Moorabbin","Oakleigh","Rosanna","Docklands","Yarraville","Cremorne","Seddon","Brunswick","Oakleigh South","Ascot Vale","Windsor","Caulfield","Essendon West","Newport")),
sub_7=as.numeric(Suburb%in%c("Chadstone","South Yarra","Essendon","Bentleigh East","Murrumbeena","Hughesdale","Fairfield","Ashwood","Clifton Hill","Caulfield North","Abbotsford","Carlton","Prahran","Fitzroy","Ivanhoe","Hampton East","Caulfield East")),
sub_8=as.numeric(Suburb%in%c("Richmond","Travancore","Templestowe Lower","Ormond","Caulfield South","Moonee Ponds","Hawthorn","Box Hill","Bulleen","Burnley","Burwood","Strathmore","Port Melbourne","Fitzroy North","Alphington")),
sub_9=as.numeric(Suburb%in%c("Doncaster","South Melbourne","Northcote","Aberfeldie","Elsternwick","Bentleigh","Kooyong","Parkville")),
sub_10=as.numeric(Suburb%in%c("Williamstown","East Melbourne","Seaholme")),
sub_11=as.numeric(Suburb%in%c("Malvern East","Carlton North","Hawthorn East","Surrey Hills")),
sub_12=as.numeric(Suburb%in%c("Princes Hill","Mont Albert","Armadale","Kew East","Glen Iris","Ashburton")),
sub_13=as.numeric(Suburb%in%c("Brighton East","Eaglemont","Hampton")),
sub_14=as.numeric(Suburb%in%c("Toorak","Ivanhoe East","Camberwell","Balwyn North","Kew")),
sub_15=as.numeric(Suburb%in%c("Brighton","Middle Park")),
sub_16=as.numeric(Suburb%in%c("Albert Park","Balwyn","Malvern"))
) %>%
select(-Suburb)
glimpse(all_data)
## Observations: 9,421
## Variables: 32
## $ Address <chr> "52 Evans St", "85 Radford Rd", "99 Anderson St",...
## $ Rooms <int> 3, 5, 3, 2, 5, 3, 3, 3, 4, 2, 3, 2, 2, 2, 4, 3, 2...
## $ Type <chr> "h", "h", "h", "u", "h", "h", "h", "h", "h", "u",...
## $ Price <int> 1650000, 791000, 785000, 755000, 2500000, 3020000...
## $ Method <chr> "S", "S", "S", "SP", "VB", "S", "VB", "VB", "PI",...
## $ SellerG <chr> "Nelson", "Ray", "RT", "Buxton", "RT", "Hooper", ...
## $ Distance <dbl> 5.2, 11.2, 8.4, 10.7, 7.5, 7.5, 13.9, 11.2, 12.8,...
## $ Postcode <int> 3056, 3073, 3015, 3187, 3123, 3123, 3165, 3127, 3...
## $ Bedroom2 <dbl> 3, 4, 3, 3, 5, 3, 3, 3, 3, 2, 3, 2, 2, 2, 4, 3, 2...
## $ Bathroom <dbl> 1, 3, 1, 1, 3, 2, 1, 2, 1, 2, 1, 1, 2, 1, 2, 1, 1...
## $ Car <dbl> 2, 1, 1, 2, 3, 2, 1, 4, 2, 2, 2, 1, 1, 1, 1, 2, 1...
## $ Landsize <dbl> 495, 961, 185, 452, 757, 832, 710, 816, 452, 0, 4...
## $ BuildingArea <dbl> 141, 143, 143, 143, 240, 143, 143, 143, 143, 80, ...
## $ YearBuilt <dbl> 1920, 1961, 1961, 1961, 1925, 1961, 1966, 1961, 1...
## $ CouncilArea <chr> "Moreland", "Darebin", "Hobsons Bay", "", "Boroon...
## $ data <chr> "train", "train", "train", "train", "train", "tra...
## $ sub_1 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_2 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_3 <dbl> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0...
## $ sub_4 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1...
## $ sub_5 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0...
## $ sub_6 <dbl> 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0...
## $ sub_7 <dbl> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0...
## $ sub_8 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_9 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_10 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_11 <dbl> 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_12 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0...
## $ sub_13 <dbl> 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_14 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_15 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_16 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
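The suburb groups above are typed out by hand. As a sketch of how similar groups could be derived programmatically, the block below bins suburbs by mean price with `cut()`. The toy data, the break points and the band labels are all assumptions; they do not reproduce the sub_1 to sub_16 groups of the original.

```r
# Made-up data; suburb "D" has no known price, like a suburb seen only in test
toy <- data.frame(
  Suburb = c("A", "A", "B", "B", "C", "C", "D"),
  Price  = c(400000, 420000, 900000, 950000, 1500000, 1600000, NA),
  stringsAsFactors = FALSE
)

# Mean price per suburb, ignoring missing prices
sub_mean <- tapply(toy$Price, toy$Suburb, mean, na.rm = TRUE)

# Assign each suburb to a price band; the break points are arbitrary here
band <- as.character(cut(as.numeric(sub_mean),
                         breaks = c(0, 500000, 1000000, Inf),
                         labels = c("low", "mid", "high")))
names(band) <- names(sub_mean)

# Look up each row's band by suburb name
toy$sub_band <- unname(band[toy$Suburb])
toy
```

A suburb with no known prices gets an NA band, so it would need an explicit rule (for example, a default group) before modelling.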
all_data=all_data %>%
select(-Address)
glimpse(all_data)
## Observations: 9,421
## Variables: 31
## $ Rooms <int> 3, 5, 3, 2, 5, 3, 3, 3, 4, 2, 3, 2, 2, 2, 4, 3, 2...
## $ Type <chr> "h", "h", "h", "u", "h", "h", "h", "h", "h", "u",...
## $ Price <int> 1650000, 791000, 785000, 755000, 2500000, 3020000...
## $ Method <chr> "S", "S", "S", "SP", "VB", "S", "VB", "VB", "PI",...
## $ SellerG <chr> "Nelson", "Ray", "RT", "Buxton", "RT", "Hooper", ...
## $ Distance <dbl> 5.2, 11.2, 8.4, 10.7, 7.5, 7.5, 13.9, 11.2, 12.8,...
## $ Postcode <int> 3056, 3073, 3015, 3187, 3123, 3123, 3165, 3127, 3...
## $ Bedroom2 <dbl> 3, 4, 3, 3, 5, 3, 3, 3, 3, 2, 3, 2, 2, 2, 4, 3, 2...
## $ Bathroom <dbl> 1, 3, 1, 1, 3, 2, 1, 2, 1, 2, 1, 1, 2, 1, 2, 1, 1...
## $ Car <dbl> 2, 1, 1, 2, 3, 2, 1, 4, 2, 2, 2, 1, 1, 1, 1, 2, 1...
## $ Landsize <dbl> 495, 961, 185, 452, 757, 832, 710, 816, 452, 0, 4...
## $ BuildingArea <dbl> 141, 143, 143, 143, 240, 143, 143, 143, 143, 80, ...
## $ YearBuilt <dbl> 1920, 1961, 1961, 1961, 1925, 1961, 1966, 1961, 1...
## $ CouncilArea <chr> "Moreland", "Darebin", "Hobsons Bay", "", "Boroon...
## $ data <chr> "train", "train", "train", "train", "train", "tra...
## $ sub_1 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_2 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_3 <dbl> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0...
## $ sub_4 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1...
## $ sub_5 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0...
## $ sub_6 <dbl> 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0...
## $ sub_7 <dbl> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0...
## $ sub_8 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_9 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_10 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_11 <dbl> 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_12 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0...
## $ sub_13 <dbl> 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_14 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_15 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_16 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
table(all_data$Type)
##
## h t u
## 5916 1048 2457
all_data=all_data %>%
mutate(Type_t=as.numeric(Type=="t"),
type_u=as.numeric(Type=="u"))
all_data=all_data %>%
select(-Type)
glimpse(all_data) # 9,421 obs and 32 variables
## Observations: 9,421
## Variables: 32
## $ Rooms <int> 3, 5, 3, 2, 5, 3, 3, 3, 4, 2, 3, 2, 2, 2, 4, 3, 2...
## $ Price <int> 1650000, 791000, 785000, 755000, 2500000, 3020000...
## $ Method <chr> "S", "S", "S", "SP", "VB", "S", "VB", "VB", "PI",...
## $ SellerG <chr> "Nelson", "Ray", "RT", "Buxton", "RT", "Hooper", ...
## $ Distance <dbl> 5.2, 11.2, 8.4, 10.7, 7.5, 7.5, 13.9, 11.2, 12.8,...
## $ Postcode <int> 3056, 3073, 3015, 3187, 3123, 3123, 3165, 3127, 3...
## $ Bedroom2 <dbl> 3, 4, 3, 3, 5, 3, 3, 3, 3, 2, 3, 2, 2, 2, 4, 3, 2...
## $ Bathroom <dbl> 1, 3, 1, 1, 3, 2, 1, 2, 1, 2, 1, 1, 2, 1, 2, 1, 1...
## $ Car <dbl> 2, 1, 1, 2, 3, 2, 1, 4, 2, 2, 2, 1, 1, 1, 1, 2, 1...
## $ Landsize <dbl> 495, 961, 185, 452, 757, 832, 710, 816, 452, 0, 4...
## $ BuildingArea <dbl> 141, 143, 143, 143, 240, 143, 143, 143, 143, 80, ...
## $ YearBuilt <dbl> 1920, 1961, 1961, 1961, 1925, 1961, 1966, 1961, 1...
## $ CouncilArea <chr> "Moreland", "Darebin", "Hobsons Bay", "", "Boroon...
## $ data <chr> "train", "train", "train", "train", "train", "tra...
## $ sub_1 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_2 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_3 <dbl> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0...
## $ sub_4 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1...
## $ sub_5 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0...
## $ sub_6 <dbl> 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0...
## $ sub_7 <dbl> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0...
## $ sub_8 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_9 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_10 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_11 <dbl> 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_12 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0...
## $ sub_13 <dbl> 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_14 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_15 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_16 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ Type_t <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0...
## $ type_u <dbl> 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1...
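Type has three levels (h, t, u) but only two indicator columns are created; the omitted level, h, acts as the baseline. Leaving one level out avoids perfect collinearity between the indicators and the intercept of a linear model. The same pattern can be wrapped in a small function, sketched below on made-up data; `add_dummies` is a hypothetical helper, not part of dplyr or of the original analysis.

```r
# Hypothetical helper: add one 0/1 column per level except the baseline,
# then drop the original categorical column
add_dummies <- function(df, var, baseline) {
  lv <- setdiff(unique(df[[var]]), baseline)
  for (l in lv) {
    df[[paste(var, l, sep = "_")]] <- as.numeric(df[[var]] == l)
  }
  df[[var]] <- NULL
  df
}

# Made-up rows for illustration
toy <- data.frame(Type = c("h", "u", "t", "h"), stringsAsFactors = FALSE)
toy <- add_dummies(toy, "Type", baseline = "h")
toy
```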
table(all_data$Method)
##
## PI S SA SP VB
## 1235 6103 35 1162 886
all_data=all_data %>%
mutate(Method_PI=as.numeric(Method=="PI"),
Method_SA=as.numeric(Method=="SA"),
Method_SP=as.numeric(Method=="SP"),
Method_VB=as.numeric(Method=="VB")) %>%
select(-Method)
glimpse(all_data)
## Observations: 9,421
## Variables: 35
## $ Rooms <int> 3, 5, 3, 2, 5, 3, 3, 3, 4, 2, 3, 2, 2, 2, 4, 3, 2...
## $ Price <int> 1650000, 791000, 785000, 755000, 2500000, 3020000...
## $ SellerG <chr> "Nelson", "Ray", "RT", "Buxton", "RT", "Hooper", ...
## $ Distance <dbl> 5.2, 11.2, 8.4, 10.7, 7.5, 7.5, 13.9, 11.2, 12.8,...
## $ Postcode <int> 3056, 3073, 3015, 3187, 3123, 3123, 3165, 3127, 3...
## $ Bedroom2 <dbl> 3, 4, 3, 3, 5, 3, 3, 3, 3, 2, 3, 2, 2, 2, 4, 3, 2...
## $ Bathroom <dbl> 1, 3, 1, 1, 3, 2, 1, 2, 1, 2, 1, 1, 2, 1, 2, 1, 1...
## $ Car <dbl> 2, 1, 1, 2, 3, 2, 1, 4, 2, 2, 2, 1, 1, 1, 1, 2, 1...
## $ Landsize <dbl> 495, 961, 185, 452, 757, 832, 710, 816, 452, 0, 4...
## $ BuildingArea <dbl> 141, 143, 143, 143, 240, 143, 143, 143, 143, 80, ...
## $ YearBuilt <dbl> 1920, 1961, 1961, 1961, 1925, 1961, 1966, 1961, 1...
## $ CouncilArea <chr> "Moreland", "Darebin", "Hobsons Bay", "", "Boroon...
## $ data <chr> "train", "train", "train", "train", "train", "tra...
## $ sub_1 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_2 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_3 <dbl> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0...
## $ sub_4 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1...
## $ sub_5 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0...
## $ sub_6 <dbl> 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0...
## $ sub_7 <dbl> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0...
## $ sub_8 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_9 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_10 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_11 <dbl> 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_12 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0...
## $ sub_13 <dbl> 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_14 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_15 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_16 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ Type_t <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0...
## $ type_u <dbl> 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1...
## $ Method_PI <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0...
## $ Method_SA <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ Method_SP <dbl> 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1...
## $ Method_VB <dbl> 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0...
t=table(all_data$SellerG)
sort(t)
##
## AIME Airport Allan
## 1 1 1
## Appleby Batty Blue
## 1 1 1
## Bustin Buxton/Find CASTRAN
## 1 1 1
## Century Clairmont Coventry
## 1 1 1
## Del Direct Elite
## 1 1 1
## Fletchers/Fletchers Fletchers/One Geoff
## 1 1 1
## Ham hockingstuart/Advantage hockingstuart/Barry
## 1 1 1
## hockingstuart/Buxton hockingstuart/Village Homes
## 1 1 1
## Hooper Iconek iOne
## 1 1 1
## iTRAK Joe Johnston
## 1 1 1
## Joseph Karen Lucas
## 1 1 1
## Luxe Luxton Mandy
## 1 1 1
## Mason Meadows Naison
## 1 1 1
## Nardella North Oak
## 1 1 1
## One Parkinson Private/Tiernan's
## 1 1 1
## Professionals Property Propertyau
## 1 1 1
## Prowse R&H Reach
## 1 1 1
## S&L Steveway Tiernan's
## 1 1 1
## Vic Weast Win
## 1 1 1
## Zahn Allens Australian
## 1 2 2
## Besser Buxton/Advantage Calder
## 2 2 2
## Changing Charlton Crane
## 2 2 2
## David Dixon Galldon
## 2 2 2
## Grantham JMRE Ken
## 2 2 2
## LJ Nguyen RE
## 2 2 2
## Red Redina Ross
## 2 2 2
## Scott Sweeney/Advantage VICPROP
## 2 2 2
## Walsh Wood Ascend
## 2 2 3
## ASL Assisi Bayside
## 3 3 3
## Compton Garvey Hamilton
## 3 3 3
## Jason Kelly Leased
## 3 3 3
## Maddison New Owen
## 3 3 3
## Thomas Weda Anderson
## 3 3 4
## First Morrison Nicholson
## 4 4 4
## O'Brien Prof. Raine&Horne
## 4 4 4
## D'Aprano Domain Holland
## 5 5 5
## Matthew Parkes Bekdon
## 5 5 6
## FN Re Sotheby's
## 6 6 6
## HAR Morleys Pagan
## 7 7 7
## W.B. William Christopher
## 7 7 9
## O'Donoghues Chambers J
## 9 10 10
## Gunn&Co Hunter Pride
## 11 11 11
## Trimson Brace Castran
## 11 12 13
## Darren Melbourne Rodney
## 14 14 14
## Tim Whiting Caine
## 14 14 15
## Haughton Lindellas MICM
## 15 15 15
## GL Beller Harrington
## 16 17 17
## Paul Purplebricks Abercromby's
## 17 17 18
## Barlow Wilson Philip
## 18 18 19
## Buckingham Walshe Edward
## 20 20 22
## McDonald Alexkarbon RW
## 24 25 25
## Bells C21 Considine
## 26 26 26
## Eview Frank Thomson
## 27 27 27
## Burnham Peter Dingle
## 28 28 29
## YPA Moonee LITTLE
## 31 33 34
## Nick Harcourts Cayzer
## 34 41 43
## Collins Chisholm Rendina
## 44 53 58
## Raine Love Douglas
## 62 69 73
## Williams Village Stockdale
## 88 93 101
## Kay Hodges McGrath
## 103 104 117
## Noel Gary Jas
## 119 147 163
## Miles Greg Sweeney
## 167 173 175
## RT Fletchers Woodards
## 183 191 213
## Brad Biggin Ray
## 274 292 361
## Buxton Marshall Barry
## 481 539 660
## hockingstuart Jellis Nelson
## 874 995 1194
all_data=all_data %>%
mutate(Gnelson=as.numeric(SellerG=="Nelson"),
GJellis=as.numeric(SellerG=="Jellis"),
Ghstuart=as.numeric(SellerG=="hockingstuart"),
Gbarry=as.numeric(SellerG=="Barry"),
GMarshall=as.numeric(SellerG=="Marshall"),
GWoodards=as.numeric(SellerG=="Woodards"),
GBrad=as.numeric(SellerG=="Brad"),
GBiggin=as.numeric(SellerG=="Biggin"),
GRay=as.numeric(SellerG=="Ray"),
GFletchers=as.numeric(SellerG=="Fletchers"),
GRT=as.numeric(SellerG=="RT"),
GSweeney=as.numeric(SellerG=="Sweeney"),
GGreg=as.numeric(SellerG=="Greg"),
GNoel=as.numeric(SellerG=="Noel"),
GGary=as.numeric(SellerG=="Gary"),
GJas=as.numeric(SellerG=="Jas"),
GMiles=as.numeric(SellerG=="Miles"),
GMcGrath=as.numeric(SellerG=="McGrath"),
GHodges=as.numeric(SellerG=="Hodges"),
GKay=as.numeric(SellerG=="Kay"),
GStockdale=as.numeric(SellerG=="Stockdale"),
GLove=as.numeric(SellerG=="Love"),
GDouglas=as.numeric(SellerG=="Douglas"),
GWilliams=as.numeric(SellerG=="Williams"),
GVillage=as.numeric(SellerG=="Village"),
GRaine=as.numeric(SellerG=="Raine"),
GRendina=as.numeric(SellerG=="Rendina"),
GChisholm=as.numeric(SellerG=="Chisholm"),
GCollins=as.numeric(SellerG=="Collins"),
GLITTLE=as.numeric(SellerG=="LITTLE"),
GNick=as.numeric(SellerG=="Nick"),
GHarcourts=as.numeric(SellerG=="Harcourts"),
GCayzer=as.numeric(SellerG=="Cayzer"),
GMoonee=as.numeric(SellerG=="Moonee"),
GYPA=as.numeric(SellerG=="YPA")
) %>%
select(-SellerG)
glimpse(all_data)
## Observations: 9,421
## Variables: 69
## $ Rooms <int> 3, 5, 3, 2, 5, 3, 3, 3, 4, 2, 3, 2, 2, 2, 4, 3, 2...
## $ Price <int> 1650000, 791000, 785000, 755000, 2500000, 3020000...
## $ Distance <dbl> 5.2, 11.2, 8.4, 10.7, 7.5, 7.5, 13.9, 11.2, 12.8,...
## $ Postcode <int> 3056, 3073, 3015, 3187, 3123, 3123, 3165, 3127, 3...
## $ Bedroom2 <dbl> 3, 4, 3, 3, 5, 3, 3, 3, 3, 2, 3, 2, 2, 2, 4, 3, 2...
## $ Bathroom <dbl> 1, 3, 1, 1, 3, 2, 1, 2, 1, 2, 1, 1, 2, 1, 2, 1, 1...
## $ Car <dbl> 2, 1, 1, 2, 3, 2, 1, 4, 2, 2, 2, 1, 1, 1, 1, 2, 1...
## $ Landsize <dbl> 495, 961, 185, 452, 757, 832, 710, 816, 452, 0, 4...
## $ BuildingArea <dbl> 141, 143, 143, 143, 240, 143, 143, 143, 143, 80, ...
## $ YearBuilt <dbl> 1920, 1961, 1961, 1961, 1925, 1961, 1966, 1961, 1...
## $ CouncilArea <chr> "Moreland", "Darebin", "Hobsons Bay", "", "Boroon...
## $ data <chr> "train", "train", "train", "train", "train", "tra...
## $ sub_1 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_2 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_3 <dbl> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0...
## $ sub_4 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1...
## $ sub_5 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0...
## $ sub_6 <dbl> 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0...
## $ sub_7 <dbl> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0...
## $ sub_8 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_9 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_10 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_11 <dbl> 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_12 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0...
## $ sub_13 <dbl> 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_14 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_15 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_16 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ Type_t <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0...
## $ type_u <dbl> 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1...
## $ Method_PI <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0...
## $ Method_SA <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ Method_SP <dbl> 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1...
## $ Method_VB <dbl> 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ Gnelson <dbl> 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0...
## $ GJellis <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0...
## $ Ghstuart <dbl> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ Gbarry <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0...
## $ GMarshall <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0...
## $ GWoodards <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ GBrad <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ GBiggin <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1...
## $ GRay <dbl> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ GFletchers <dbl> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ GRT <dbl> 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ GSweeney <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ GGreg <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ GNoel <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ GGary <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ GJas <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ GMiles <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ GMcGrath <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ GHodges <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ GKay <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ GStockdale <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ GLove <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ GDouglas <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ GWilliams <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ GVillage <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ GRaine <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ GRendina <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ GChisholm <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ GCollins <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ GLITTLE <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ GNick <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ GHarcourts <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ GCayzer <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ GMoonee <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ GYPA <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0...
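Every seller that receives an indicator above appears at least 31 times in the frequency table, while sellers with fewer sales are left in the baseline. Buxton (481 sales) has no indicator in the code above, so it also falls into the baseline. A frequency cutoff of this kind can be applied programmatically, as in the sketch below on made-up seller names; the threshold `min_count` is an arbitrary assumption, and a pure threshold rule would not reproduce the original columns exactly because of the Buxton case.

```r
# Made-up seller names for illustration
toy_seller <- c("Nelson", "Nelson", "Nelson", "Jellis", "Jellis", "Ray", "AIME")

counts <- table(toy_seller)

# Keep only the levels seen at least `min_count` times
min_count <- 2
keep <- names(counts)[counts >= min_count]
keep

# One 0/1 column per retained level; all other sellers form the baseline
dummies <- sapply(keep, function(l) as.numeric(toy_seller == l))
dummies
```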
table(all_data$CouncilArea)
##
## Banyule Bayside Boroondara Brimbank
## 1985 359 311 803 238
## Darebin Glen Eira Hobsons Bay Hume Kingston
## 648 608 286 14 65
## Manningham Maribyrnong Melbourne Monash Moonee Valley
## 181 478 324 132 678
## Moreland Port Phillip Stonnington Whitehorse Yarra
## 765 441 511 139 455
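In the table above, the first count (1985) has no label: those rows have a blank CouncilArea, which is stored as an empty string rather than NA and so was not caught by the earlier NA counts. Since the mutate() call that follows creates no indicator for the blank label, those rows become the baseline. A minimal sketch of how blank labels could be counted and made explicit is shown below on made-up values; relabelling them as "Unknown" is an assumption, not something the original analysis does.

```r
# Made-up values for illustration
toy_ca <- c("Moreland", "", "Darebin", "", "Yarra")

# Number of blank labels (is.na() would report 0 here)
n_blank <- sum(toy_ca == "")
n_blank

# Give the blank label an explicit name
toy_ca[toy_ca == ""] <- "Unknown"
table(toy_ca)
```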
all_data=all_data %>%
mutate(CA_Banyule=as.numeric(CouncilArea=="Banyule"),
CA_Bayside=as.numeric(CouncilArea=="Bayside"),
CA_Boroondara=as.numeric(CouncilArea=="Boroondara"),
CA_Brimbank=as.numeric(CouncilArea=="Brimbank"),
CA_Darebin=as.numeric(CouncilArea=="Darebin"),
CA_Glen_Eira=as.numeric(CouncilArea=="Glen Eira"),
CA_Monash=as.numeric(CouncilArea=="Monash"),
CA_Melbourne=as.numeric(CouncilArea=="Melbourne"),
CA_Maribyrnong=as.numeric(CouncilArea=="Maribyrnong"),
CA_Manningham=as.numeric(CouncilArea=="Manningham"),
CA_Kingston=as.numeric(CouncilArea=="Kingston"),
CA_Hume=as.numeric(CouncilArea=="Hume"),
CA_HobsonsB=as.numeric(CouncilArea=="Hobsons Bay"),
CA_MoonValley=as.numeric(CouncilArea=="Moonee Valley"),
CA_Moreland=as.numeric(CouncilArea=="Moreland"),
CA_PortP=as.numeric(CouncilArea=="Port Phillip"),
CA_Stonnington=as.numeric(CouncilArea=="Stonnington"),
CA_Whitehorse=as.numeric(CouncilArea=="Whitehorse"),
CA_Yarra=as.numeric(CouncilArea=="Yarra")) %>%
select(-CouncilArea)
glimpse(all_data)
## Observations: 9,421
## Variables: 87
## $ Rooms <int> 3, 5, 3, 2, 5, 3, 3, 3, 4, 2, 3, 2, 2, 2, 4, 3,...
## $ Price <int> 1650000, 791000, 785000, 755000, 2500000, 30200...
## $ Distance <dbl> 5.2, 11.2, 8.4, 10.7, 7.5, 7.5, 13.9, 11.2, 12....
## $ Postcode <int> 3056, 3073, 3015, 3187, 3123, 3123, 3165, 3127,...
## $ Bedroom2 <dbl> 3, 4, 3, 3, 5, 3, 3, 3, 3, 2, 3, 2, 2, 2, 4, 3,...
## $ Bathroom <dbl> 1, 3, 1, 1, 3, 2, 1, 2, 1, 2, 1, 1, 2, 1, 2, 1,...
## $ Car <dbl> 2, 1, 1, 2, 3, 2, 1, 4, 2, 2, 2, 1, 1, 1, 1, 2,...
## $ Landsize <dbl> 495, 961, 185, 452, 757, 832, 710, 816, 452, 0,...
## $ BuildingArea <dbl> 141, 143, 143, 143, 240, 143, 143, 143, 143, 80...
## $ YearBuilt <dbl> 1920, 1961, 1961, 1961, 1925, 1961, 1966, 1961,...
## $ data <chr> "train", "train", "train", "train", "train", "t...
## $ sub_1 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_2 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_3 <dbl> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,...
## $ sub_4 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_5 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,...
## $ sub_6 <dbl> 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0,...
## $ sub_7 <dbl> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0,...
## $ sub_8 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_9 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_10 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_11 <dbl> 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_12 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0,...
## $ sub_13 <dbl> 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_14 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_15 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_16 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ Type_t <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,...
## $ type_u <dbl> 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0,...
## $ Method_PI <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0,...
## $ Method_SA <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ Method_SP <dbl> 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ Method_VB <dbl> 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ Gnelson <dbl> 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0,...
## $ GJellis <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,...
## $ Ghstuart <dbl> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ Gbarry <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,...
## $ GMarshall <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,...
## $ GWoodards <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GBrad <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GBiggin <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,...
## $ GRay <dbl> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GFletchers <dbl> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GRT <dbl> 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GSweeney <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GGreg <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GNoel <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GGary <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GJas <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GMiles <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GMcGrath <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GHodges <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GKay <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GStockdale <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GLove <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GDouglas <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GWilliams <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GVillage <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GRaine <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GRendina <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GChisholm <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GCollins <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GLITTLE <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GNick <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GHarcourts <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GCayzer <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GMoonee <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GYPA <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,...
## $ CA_Banyule <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Bayside <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Boroondara <dbl> 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Brimbank <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Darebin <dbl> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Glen_Eira <dbl> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0,...
## $ CA_Monash <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Melbourne <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Maribyrnong <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Manningham <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Kingston <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Hume <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_HobsonsB <dbl> 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_MoonValley <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Moreland <dbl> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0,...
## $ CA_PortP <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Stonnington <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,...
## $ CA_Whitehorse <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,...
## $ CA_Yarra <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
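The dummy columns above are written out by hand, one per council area. As a rough sketch only, the same 0/1 encoding can be generated in a loop over the category levels. The helper name `add_dummies`, the automatic naming rule (spaces replaced by underscores) and the toy data frame are assumptions for illustration; they are not part of the original analysis, and the generated names will not match hand-picked ones such as CA_HobsonsB.

```r
# Hypothetical helper: adds one 0/1 column per requested level of a
# categorical column, then drops the original column.
add_dummies <- function(df, col, levels, prefix) {
  for (lv in levels) {
    # Build a syntactic column name, e.g. "Glen Eira" -> "CA_Glen_Eira"
    new_name <- paste0(prefix, gsub(" ", "_", lv, fixed = TRUE))
    df[[new_name]] <- as.numeric(df[[col]] == lv)
  }
  df[[col]] <- NULL
  df
}

# Toy data standing in for all_data (values are made up)
toy <- data.frame(
  Rooms = c(3, 2, 4),
  CouncilArea = c("Yarra", "Glen Eira", "Other"),
  stringsAsFactors = FALSE
)

# Levels that are not listed (here "Other") become the all-zero baseline
toy_dummies <- add_dummies(toy, "CouncilArea", c("Yarra", "Glen Eira"), "CA_")
print(toy_dummies)
```

Leaving one level without its own column keeps the dummies from summing to a constant, which matters later when the linear model is fitted with an intercept.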
Separating test and train:
train=all_data %>%
filter(data=='train') %>%
select(-data)
# train has 7,536 observations and 86 variables (85 predictors plus Price)
test=all_data %>%
filter(data=='test') %>%
select(-data,-Price) # test has 1,885 observations and 85 variables (the same columns as train, without Price)
Let's view the structure of the train and test datasets:
glimpse(train) #7536 obs and 86 variables.
## Observations: 7,536
## Variables: 86
## $ Rooms <int> 3, 5, 3, 2, 5, 3, 3, 3, 4, 2, 3, 2, 2, 2, 4, 3,...
## $ Price <int> 1650000, 791000, 785000, 755000, 2500000, 30200...
## $ Distance <dbl> 5.2, 11.2, 8.4, 10.7, 7.5, 7.5, 13.9, 11.2, 12....
## $ Postcode <int> 3056, 3073, 3015, 3187, 3123, 3123, 3165, 3127,...
## $ Bedroom2 <dbl> 3, 4, 3, 3, 5, 3, 3, 3, 3, 2, 3, 2, 2, 2, 4, 3,...
## $ Bathroom <dbl> 1, 3, 1, 1, 3, 2, 1, 2, 1, 2, 1, 1, 2, 1, 2, 1,...
## $ Car <dbl> 2, 1, 1, 2, 3, 2, 1, 4, 2, 2, 2, 1, 1, 1, 1, 2,...
## $ Landsize <dbl> 495, 961, 185, 452, 757, 832, 710, 816, 452, 0,...
## $ BuildingArea <dbl> 141, 143, 143, 143, 240, 143, 143, 143, 143, 80...
## $ YearBuilt <dbl> 1920, 1961, 1961, 1961, 1925, 1961, 1966, 1961,...
## $ sub_1 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_2 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_3 <dbl> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,...
## $ sub_4 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_5 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,...
## $ sub_6 <dbl> 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0,...
## $ sub_7 <dbl> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0,...
## $ sub_8 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_9 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_10 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_11 <dbl> 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_12 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0,...
## $ sub_13 <dbl> 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_14 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_15 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_16 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ Type_t <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,...
## $ type_u <dbl> 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0,...
## $ Method_PI <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0,...
## $ Method_SA <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ Method_SP <dbl> 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ Method_VB <dbl> 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ Gnelson <dbl> 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0,...
## $ GJellis <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,...
## $ Ghstuart <dbl> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ Gbarry <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,...
## $ GMarshall <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,...
## $ GWoodards <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GBrad <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GBiggin <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,...
## $ GRay <dbl> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GFletchers <dbl> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GRT <dbl> 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GSweeney <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GGreg <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GNoel <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GGary <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GJas <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GMiles <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GMcGrath <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GHodges <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GKay <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GStockdale <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GLove <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GDouglas <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GWilliams <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GVillage <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GRaine <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GRendina <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GChisholm <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GCollins <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GLITTLE <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GNick <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GHarcourts <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GCayzer <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GMoonee <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GYPA <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,...
## $ CA_Banyule <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Bayside <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Boroondara <dbl> 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Brimbank <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Darebin <dbl> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Glen_Eira <dbl> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0,...
## $ CA_Monash <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Melbourne <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Maribyrnong <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Manningham <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Kingston <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Hume <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_HobsonsB <dbl> 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_MoonValley <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Moreland <dbl> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0,...
## $ CA_PortP <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Stonnington <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,...
## $ CA_Whitehorse <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,...
## $ CA_Yarra <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
glimpse(test) #1885 obs and 85 variables.
## Observations: 1,885
## Variables: 85
## $ Rooms <int> 1, 2, 1, 4, 3, 3, 3, 1, 1, 2, 3, 1, 3, 2, 3, 3,...
## $ Distance <dbl> 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2....
## $ Postcode <int> 3067, 3067, 3067, 3067, 3067, 3067, 3067, 3067,...
## $ Bedroom2 <dbl> 1, 3, 3, 3, 2, 3, 3, 3, 1, 2, 3, 1, 3, 3, 3, 2,...
## $ Bathroom <dbl> 1, 2, 2, 2, 1, 2, 2, 2, 1, 2, 2, 1, 2, 2, 2, 1,...
## $ Car <dbl> 1, 2, 2, 2, 1, 2, 2, 2, 1, 2, 2, 1, 1, 2, 2, 1,...
## $ Landsize <dbl> 0, 461, 461, 461, 138, 0, 4290, 461, 0, 98, 461...
## $ BuildingArea <dbl> 151, 151, 151, 151, 105, 151, 27, 151, 151, 128...
## $ YearBuilt <dbl> 1965, 1965, 1965, 1965, 1890, 2010, 1965, 1965,...
## $ sub_1 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_2 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_3 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_4 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1,...
## $ sub_5 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_6 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_7 <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0,...
## $ sub_8 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_9 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_10 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_11 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_12 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_13 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_14 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_15 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_16 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ Type_t <dbl> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0,...
## $ type_u <dbl> 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0,...
## $ Method_PI <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ Method_SA <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ Method_SP <dbl> 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0,...
## $ Method_VB <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ Gnelson <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0,...
## $ GJellis <dbl> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0,...
## $ Ghstuart <dbl> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ Gbarry <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GMarshall <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GWoodards <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GBrad <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,...
## $ GBiggin <dbl> 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,...
## $ GRay <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GFletchers <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GRT <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GSweeney <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GGreg <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,...
## $ GNoel <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GGary <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GJas <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GMiles <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GMcGrath <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GHodges <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GKay <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GStockdale <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GLove <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GDouglas <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GWilliams <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GVillage <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GRaine <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GRendina <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GChisholm <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GCollins <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GLITTLE <dbl> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GNick <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GHarcourts <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GCayzer <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GMoonee <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GYPA <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Banyule <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Bayside <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Boroondara <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Brimbank <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Darebin <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Glen_Eira <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Monash <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Melbourne <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Maribyrnong <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Manningham <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Kingston <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Hume <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_HobsonsB <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_MoonValley <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1,...
## $ CA_Moreland <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_PortP <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Stonnington <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Whitehorse <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Yarra <dbl> 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0,...
set.seed(123)
s=sample(1:nrow(train),0.75*nrow(train))
train_75=train[s,] #5652
test_25=train[-s,] #1884
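The split above holds out 25% of the training rows so that a model can be scored on data it has not seen. The following is a minimal sketch of that workflow on simulated data, scored with RMSE, the evaluation metric stated for this project. The `rmse` helper and the simulated columns are assumptions for illustration and do not reproduce the housing results.

```r
# Hypothetical helper: root mean squared error between actual and predicted
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# Simulated data standing in for `train` (not the housing data)
set.seed(123)
n <- 200
sim <- data.frame(Rooms = sample(1:5, n, replace = TRUE),
                  Distance = runif(n, 1, 20))
sim$Price <- 300000 + 200000 * sim$Rooms - 30000 * sim$Distance +
  rnorm(n, sd = 50000)

# Same 75/25 split pattern as in the text
s <- sample(1:nrow(sim), 0.75 * nrow(sim))
sim_75 <- sim[s, ]
sim_25 <- sim[-s, ]

# Fit on the 75% part, score on the held-out 25%
fit <- lm(Price ~ ., data = sim_75)
holdout_rmse <- rmse(sim_25$Price, predict(fit, newdata = sim_25))
print(holdout_rmse)
```

RMSE is in the same units as the response, so on the housing data it would read as a typical prediction error in price units.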
library(car)
## Warning: package 'car' was built under R version 3.3.3
##
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
##
## recode
LRf=lm(Price ~ .,data=train_75)
summary(LRf)
##
## Call:
## lm(formula = Price ~ ., data = train_75)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1662340 -193462 -21807 150940 3847200
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.372e+05 7.479e+05 0.852 0.394246
## Rooms 2.256e+05 9.758e+03 23.116 < 2e-16 ***
## Distance -3.843e+04 2.660e+03 -14.445 < 2e-16 ***
## Postcode 1.346e+03 2.005e+02 6.714 2.08e-11 ***
## Bedroom2 -4.562e+04 1.113e+04 -4.099 4.21e-05 ***
## Bathroom 1.585e+05 1.065e+04 14.878 < 2e-16 ***
## Car 6.976e+04 7.054e+03 9.890 < 2e-16 ***
## Landsize 7.163e+00 3.886e+00 1.843 0.065310 .
## BuildingArea 4.549e+02 6.187e+01 7.352 2.24e-13 ***
## YearBuilt -1.719e+03 2.087e+02 -8.235 < 2e-16 ***
## sub_1 -1.088e+06 1.607e+05 -6.769 1.43e-11 ***
## sub_2 -1.026e+06 9.489e+04 -10.809 < 2e-16 ***
## sub_3 -9.449e+05 8.868e+04 -10.656 < 2e-16 ***
## sub_4 -9.259e+05 8.855e+04 -10.456 < 2e-16 ***
## sub_5 -8.619e+05 8.791e+04 -9.804 < 2e-16 ***
## sub_6 -8.078e+05 8.786e+04 -9.194 < 2e-16 ***
## sub_7 -7.538e+05 8.713e+04 -8.651 < 2e-16 ***
## sub_8 -7.713e+05 8.672e+04 -8.893 < 2e-16 ***
## sub_9 -6.876e+05 8.906e+04 -7.720 1.37e-14 ***
## sub_10 -5.407e+05 1.026e+05 -5.268 1.43e-07 ***
## sub_11 -4.600e+05 8.803e+04 -5.226 1.80e-07 ***
## sub_12 -4.807e+05 8.764e+04 -5.485 4.33e-08 ***
## sub_13 -3.529e+05 9.646e+04 -3.658 0.000256 ***
## sub_14 -3.220e+05 8.600e+04 -3.744 0.000183 ***
## sub_15 -4.508e+04 9.946e+04 -0.453 0.650437
## sub_16 -1.503e+05 9.067e+04 -1.658 0.097411 .
## Type_t -2.415e+05 1.770e+04 -13.645 < 2e-16 ***
## type_u -4.015e+05 1.515e+04 -26.509 < 2e-16 ***
## Method_PI -1.191e+05 1.537e+04 -7.746 1.12e-14 ***
## Method_SA -5.995e+04 8.179e+04 -0.733 0.463596
## Method_SP -4.376e+04 1.577e+04 -2.775 0.005544 **
## Method_VB -8.951e+04 1.758e+04 -5.093 3.65e-07 ***
## Gnelson -2.796e+04 2.077e+04 -1.346 0.178350
## GJellis 8.341e+04 2.156e+04 3.869 0.000111 ***
## Ghstuart -4.208e+04 2.057e+04 -2.045 0.040873 *
## Gbarry -2.244e+04 2.433e+04 -0.922 0.356379
## GMarshall 2.343e+05 2.623e+04 8.931 < 2e-16 ***
## GWoodards -1.405e+03 3.648e+04 -0.039 0.969285
## GBrad -1.072e+04 3.482e+04 -0.308 0.758266
## GBiggin -3.286e+04 3.243e+04 -1.013 0.310919
## GRay -3.854e+04 2.893e+04 -1.332 0.182864
## GFletchers -7.255e+04 3.877e+04 -1.871 0.061394 .
## GRT 6.609e+04 3.748e+04 1.763 0.077878 .
## GSweeney -3.095e+04 4.384e+04 -0.706 0.480230
## GGreg 1.225e+04 3.979e+04 0.308 0.758218
## GNoel -6.138e+04 4.607e+04 -1.332 0.182887
## GGary 2.250e+04 4.305e+04 0.523 0.601335
## GJas -1.155e+04 4.300e+04 -0.269 0.788210
## GMiles 5.773e+04 4.600e+04 1.255 0.209538
## GMcGrath 8.733e+04 4.537e+04 1.925 0.054290 .
## GHodges -6.915e+04 5.054e+04 -1.368 0.171308
## GKay 4.588e+05 4.982e+04 9.208 < 2e-16 ***
## GStockdale -2.027e+04 5.267e+04 -0.385 0.700401
## GLove -8.090e+03 5.853e+04 -0.138 0.890067
## GDouglas -8.696e+04 6.364e+04 -1.366 0.171866
## GWilliams 5.151e+04 5.514e+04 0.934 0.350244
## GVillage 3.106e+04 5.869e+04 0.529 0.596665
## GRaine -1.486e+05 6.267e+04 -2.371 0.017797 *
## GRendina -4.041e+02 6.693e+04 -0.006 0.995183
## GChisholm 3.283e+04 6.953e+04 0.472 0.636858
## GCollins 2.181e+05 7.278e+04 2.996 0.002743 **
## GLITTLE -1.246e+05 7.579e+04 -1.644 0.100222
## GNick 3.151e+04 9.411e+04 0.335 0.737771
## GHarcourts -1.778e+04 7.775e+04 -0.229 0.819166
## GCayzer -5.045e+04 7.335e+04 -0.688 0.491568
## GMoonee -1.097e+05 8.480e+04 -1.294 0.195805
## GYPA -1.072e+05 9.894e+04 -1.083 0.278712
## CA_Banyule -1.345e+05 3.341e+04 -4.026 5.75e-05 ***
## CA_Bayside -1.619e+05 4.886e+04 -3.314 0.000926 ***
## CA_Boroondara -9.435e+04 2.701e+04 -3.494 0.000480 ***
## CA_Brimbank -5.146e+04 4.044e+04 -1.272 0.203279
## CA_Darebin -8.003e+04 2.555e+04 -3.132 0.001742 **
## CA_Glen_Eira -2.365e+04 2.909e+04 -0.813 0.416168
## CA_Monash -4.768e+04 4.569e+04 -1.044 0.296687
## CA_Melbourne -9.328e+03 3.490e+04 -0.267 0.789258
## CA_Maribyrnong -9.527e+04 3.341e+04 -2.852 0.004367 **
## CA_Manningham -9.730e+04 4.275e+04 -2.276 0.022875 *
## CA_Kingston -1.910e+05 6.450e+04 -2.961 0.003084 **
## CA_Hume -9.056e+04 1.809e+05 -0.501 0.616592
## CA_HobsonsB -6.407e+04 3.956e+04 -1.619 0.105404
## CA_MoonValley -5.413e+04 2.719e+04 -1.991 0.046514 *
## CA_Moreland -1.158e+05 2.443e+04 -4.739 2.20e-06 ***
## CA_PortP -9.684e+04 3.581e+04 -2.705 0.006860 **
## CA_Stonnington -6.407e+04 2.982e+04 -2.149 0.031700 *
## CA_Whitehorse -2.575e+04 4.507e+04 -0.571 0.567739
## CA_Yarra -1.164e+05 3.162e+04 -3.680 0.000236 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 370000 on 5566 degrees of freedom
## Multiple R-squared: 0.6914, Adjusted R-squared: 0.6866
## F-statistic: 146.7 on 85 and 5566 DF, p-value: < 2.2e-16
To address multicollinearity, we remove variables whose VIF is greater than 5, as follows:
a=vif(LRf)
sort(a,decreasing = T)[1:3]
## sub_3 sub_7 sub_8
## 36.00506 35.58604 32.98879
Removing the variables sub_3 and Postcode:
LRf=lm(Price ~ .-Postcode-sub_3,data=train_75)
summary(LRf)
##
## Call:
## lm(formula = Price ~ . - Postcode - sub_3, data = train_75)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1674852 -198669 -26909 150628 3836158
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.866e+06 4.137e+05 9.345 < 2e-16 ***
## Rooms 2.277e+05 9.893e+03 23.013 < 2e-16 ***
## Distance -3.406e+04 2.593e+03 -13.133 < 2e-16 ***
## Bedroom2 -4.694e+04 1.129e+04 -4.156 3.29e-05 ***
## Bathroom 1.564e+05 1.082e+04 14.458 < 2e-16 ***
## Car 7.201e+04 7.163e+03 10.053 < 2e-16 ***
## Landsize 6.351e+00 3.946e+00 1.609 0.107598
## BuildingArea 4.603e+02 6.284e+01 7.326 2.72e-13 ***
## YearBuilt -1.742e+03 2.120e+02 -8.217 2.57e-16 ***
## sub_1 -2.075e+05 1.374e+05 -1.511 0.130859
## sub_2 -1.179e+05 3.799e+04 -3.103 0.001924 **
## sub_4 -6.069e+03 2.328e+04 -0.261 0.794341
## sub_5 8.810e+04 2.245e+04 3.924 8.82e-05 ***
## sub_6 1.423e+05 2.396e+04 5.939 3.05e-09 ***
## sub_7 2.094e+05 2.452e+04 8.541 < 2e-16 ***
## sub_8 1.950e+05 2.572e+04 7.581 3.99e-14 ***
## sub_9 2.950e+05 2.927e+04 10.077 < 2e-16 ***
## sub_10 3.789e+05 5.808e+04 6.524 7.43e-11 ***
## sub_11 4.560e+05 3.622e+04 12.592 < 2e-16 ***
## sub_12 4.518e+05 3.545e+04 12.744 < 2e-16 ***
## sub_13 6.588e+05 4.791e+04 13.749 < 2e-16 ***
## sub_14 5.878e+05 3.265e+04 18.005 < 2e-16 ***
## sub_15 9.648e+05 5.414e+04 17.820 < 2e-16 ***
## sub_16 7.646e+05 4.175e+04 18.315 < 2e-16 ***
## Type_t -2.417e+05 1.797e+04 -13.448 < 2e-16 ***
## type_u -3.962e+05 1.536e+04 -25.786 < 2e-16 ***
## Method_PI -1.226e+05 1.561e+04 -7.852 4.89e-15 ***
## Method_SA -6.103e+04 8.307e+04 -0.735 0.462539
## Method_SP -5.110e+04 1.600e+04 -3.194 0.001409 **
## Method_VB -9.258e+04 1.785e+04 -5.188 2.20e-07 ***
## Gnelson -5.174e+04 2.086e+04 -2.480 0.013163 *
## GJellis 9.423e+04 2.188e+04 4.307 1.68e-05 ***
## Ghstuart -3.412e+04 2.087e+04 -1.635 0.102088
## Gbarry -4.847e+04 2.458e+04 -1.972 0.048676 *
## GMarshall 2.669e+05 2.649e+04 10.075 < 2e-16 ***
## GWoodards 6.422e+03 3.704e+04 0.173 0.862365
## GBrad -3.629e+04 3.522e+04 -1.030 0.302837
## GBiggin -1.909e+04 3.285e+04 -0.581 0.561153
## GRay -4.027e+04 2.936e+04 -1.372 0.170205
## GFletchers -3.266e+04 3.920e+04 -0.833 0.404799
## GRT 9.239e+04 3.801e+04 2.431 0.015107 *
## GSweeney -5.954e+04 4.442e+04 -1.341 0.180107
## GGreg 8.703e+03 4.041e+04 0.215 0.829491
## GNoel -4.855e+04 4.678e+04 -1.038 0.299470
## GGary 3.866e+04 4.361e+04 0.887 0.375343
## GJas -3.798e+04 4.351e+04 -0.873 0.382775
## GMiles 4.045e+04 4.653e+04 0.869 0.384626
## GMcGrath 1.018e+05 4.604e+04 2.210 0.027112 *
## GHodges -5.632e+04 5.130e+04 -1.098 0.272371
## GKay 4.870e+05 5.055e+04 9.633 < 2e-16 ***
## GStockdale -6.112e+04 5.340e+04 -1.145 0.252404
## GLove -2.331e+04 5.944e+04 -0.392 0.694925
## GDouglas -1.426e+05 6.439e+04 -2.214 0.026859 *
## GWilliams 6.003e+04 5.600e+04 1.072 0.283760
## GVillage 2.473e+03 5.941e+04 0.042 0.966800
## GRaine -1.677e+05 6.362e+04 -2.636 0.008422 **
## GRendina -3.314e+04 6.782e+04 -0.489 0.625105
## GChisholm 6.223e+04 7.054e+04 0.882 0.377668
## GCollins 1.963e+05 7.381e+04 2.660 0.007835 **
## GLITTLE -1.289e+05 7.698e+04 -1.674 0.094166 .
## GNick 5.440e+04 9.555e+04 0.569 0.569162
## GHarcourts -3.790e+04 7.893e+04 -0.480 0.631142
## GCayzer -2.627e+04 7.445e+04 -0.353 0.724150
## GMoonee -1.301e+05 8.611e+04 -1.510 0.131029
## GYPA -1.412e+05 1.005e+05 -1.406 0.159908
## CA_Banyule -1.519e+05 3.391e+04 -4.480 7.62e-06 ***
## CA_Bayside -1.310e+05 4.941e+04 -2.652 0.008013 **
## CA_Boroondara -2.486e+04 2.688e+04 -0.925 0.355181
## CA_Brimbank -1.321e+05 4.013e+04 -3.291 0.001004 **
## CA_Darebin -1.115e+05 2.580e+04 -4.322 1.57e-05 ***
## CA_Glen_Eira 3.886e+04 2.824e+04 1.376 0.168950
## CA_Monash -4.451e+03 4.602e+04 -0.097 0.922957
## CA_Melbourne -5.553e+04 3.514e+04 -1.580 0.114066
## CA_Maribyrnong -1.804e+05 3.243e+04 -5.562 2.79e-08 ***
## CA_Manningham -1.255e+05 4.304e+04 -2.917 0.003552 **
## CA_Kingston -1.064e+05 6.425e+04 -1.655 0.097912 .
## CA_Hume -1.012e+05 1.837e+05 -0.551 0.581728
## CA_HobsonsB -1.582e+05 3.810e+04 -4.150 3.37e-05 ***
## CA_MoonValley -1.191e+05 2.565e+04 -4.644 3.50e-06 ***
## CA_Moreland -1.515e+05 2.456e+04 -6.169 7.35e-10 ***
## CA_PortP 2.208e+04 3.123e+04 0.707 0.479515
## CA_Stonnington 1.269e+04 2.892e+04 0.439 0.660879
## CA_Whitehorse -1.870e+04 4.575e+04 -0.409 0.682699
## CA_Yarra -1.139e+05 3.211e+04 -3.548 0.000392 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 375800 on 5568 degrees of freedom
## Multiple R-squared: 0.6815, Adjusted R-squared: 0.6767
## F-statistic: 143.5 on 83 and 5568 DF, p-value: < 2.2e-16
a=vif(LRf)
sort(a,decreasing = T)[1:3]
## Bedroom2 Rooms Distance
## 3.770476 3.705437 3.549597
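The VIF of a predictor equals 1 / (1 - R²), where R² comes from regressing that predictor on all the other predictors. As a hedged sketch, the remove-and-recheck procedure used above can be written as a loop. It computes VIF by hand in base R instead of calling `vif()` from the car package, the helper names `manual_vif` and `drop_high_vif` are hypothetical, and the data are simulated rather than taken from train_75.

```r
# Hypothetical helper: VIF of each predictor, computed as 1 / (1 - R^2)
# from regressing that predictor on all the others.
manual_vif <- function(df, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    r2 <- summary(lm(reformulate(others, response = p), data = df))$r.squared
    1 / (1 - r2)
  })
}

# Hypothetical helper: drop the highest-VIF predictor, one at a time,
# until every remaining VIF is at or below the cutoff.
drop_high_vif <- function(df, predictors, cutoff = 5) {
  while (length(predictors) > 1) {
    v <- manual_vif(df, predictors)
    if (max(v) <= cutoff) break
    predictors <- setdiff(predictors, names(which.max(v)))
  }
  predictors
}

# Simulated data: x3 is nearly a copy of x1, so the two are collinear
set.seed(1)
n <- 300
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$x3 <- sim$x1 + rnorm(n, sd = 0.05)

kept <- drop_high_vif(sim, c("x1", "x2", "x3"))
print(kept)
```

Dropping one variable per pass matters because removing a single collinear predictor usually lowers the VIF of its partners, so they may no longer need to be removed.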
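Now we remove the variables whose p-value is > 0.05 one by one, refitting the model after each removal. Note that GMiles, GDouglas and GChisholm are kept in the final model below even though their p-values are still above 0.05.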
LRf=lm(Price ~ .-Landsize-GRaine-GMoonee-CA_Bayside-GLITTLE-Gnelson-GSweeney-Ghstuart-CA_Kingston-Gbarry-GRay-GStockdale-GNoel-GJas-GBiggin-GYPA-CA_PortP-CA_Whitehorse-GRendina-GFletchers-GBrad-GHodges-GVillage-GLove-sub_4-GGary-CA_Hume-CA_Boroondara-Method_SA-GWilliams-GHarcourts-GNick-GGreg-CA_Monash-GWoodards-CA_Stonnington-GCayzer-Postcode-sub_3,data=train_75)
summary(LRf)
##
## Call:
## lm(formula = Price ~ . - Landsize - GRaine - GMoonee - CA_Bayside -
## GLITTLE - Gnelson - GSweeney - Ghstuart - CA_Kingston - Gbarry -
## GRay - GStockdale - GNoel - GJas - GBiggin - GYPA - CA_PortP -
## CA_Whitehorse - GRendina - GFletchers - GBrad - GHodges -
## GVillage - GLove - sub_4 - GGary - CA_Hume - CA_Boroondara -
## Method_SA - GWilliams - GHarcourts - GNick - GGreg - CA_Monash -
## GWoodards - CA_Stonnington - GCayzer - Postcode - sub_3,
## data = train_75)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1656253 -198103 -26651 152266 3860482
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3895526.2 410315.4 9.494 < 2e-16 ***
## Rooms 230453.8 9817.6 23.474 < 2e-16 ***
## Distance -36543.2 2177.0 -16.786 < 2e-16 ***
## Bedroom2 -49181.9 10962.2 -4.486 7.39e-06 ***
## Bathroom 152338.3 9796.0 15.551 < 2e-16 ***
## Car 71883.3 6932.3 10.369 < 2e-16 ***
## BuildingArea 456.2 62.7 7.276 3.90e-13 ***
## YearBuilt -1758.6 210.9 -8.339 < 2e-16 ***
## sub_1 -273475.0 105758.2 -2.586 0.00974 **
## sub_2 -112772.3 36858.2 -3.060 0.00223 **
## sub_5 94069.4 19621.4 4.794 1.68e-06 ***
## sub_6 144194.2 19408.8 7.429 1.26e-13 ***
## sub_7 216978.0 20449.8 10.610 < 2e-16 ***
## sub_8 196349.4 21932.0 8.953 < 2e-16 ***
## sub_9 299226.1 27050.0 11.062 < 2e-16 ***
## sub_10 377663.0 55937.6 6.752 1.61e-11 ***
## sub_11 456073.2 31635.9 14.416 < 2e-16 ***
## sub_12 450564.6 31299.9 14.395 < 2e-16 ***
## sub_13 597071.7 34730.8 17.191 < 2e-16 ***
## sub_14 580413.7 27760.1 20.908 < 2e-16 ***
## sub_15 898556.4 40814.8 22.015 < 2e-16 ***
## sub_16 768474.4 38796.4 19.808 < 2e-16 ***
## Type_t -241570.0 17712.5 -13.638 < 2e-16 ***
## type_u -394543.4 15236.5 -25.895 < 2e-16 ***
## Method_PI -122614.8 15513.2 -7.904 3.23e-15 ***
## Method_SP -49640.2 15859.5 -3.130 0.00176 **
## Method_VB -90027.6 17697.0 -5.087 3.75e-07 ***
## GJellis 119215.4 18021.3 6.615 4.05e-11 ***
## GMarshall 291140.0 23689.8 12.290 < 2e-16 ***
## GRT 112799.9 36148.7 3.120 0.00181 **
## GMiles 74389.9 44864.5 1.658 0.09735 .
## GMcGrath 128966.3 44398.0 2.905 0.00369 **
## GKay 515973.8 48812.4 10.571 < 2e-16 ***
## GDouglas -104470.2 62903.4 -1.661 0.09681 .
## GChisholm 102832.9 67807.5 1.517 0.12944
## GCollins 223817.5 72596.2 3.083 0.00206 **
## CA_Banyule -149576.7 32588.6 -4.590 4.53e-06 ***
## CA_Brimbank -127545.6 38862.2 -3.282 0.00104 **
## CA_Darebin -121975.2 22895.1 -5.328 1.03e-07 ***
## CA_Glen_Eira 63622.7 24966.5 2.548 0.01085 *
## CA_Melbourne -66990.4 31641.0 -2.117 0.03429 *
## CA_Maribyrnong -182664.9 26894.4 -6.792 1.22e-11 ***
## CA_Manningham -116875.2 40860.4 -2.860 0.00425 **
## CA_HobsonsB -138397.7 33563.2 -4.123 3.79e-05 ***
## CA_MoonValley -136609.5 21863.4 -6.248 4.45e-10 ***
## CA_Moreland -167533.8 22109.9 -7.577 4.10e-14 ***
## CA_Yarra -130267.9 28117.7 -4.633 3.69e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 376000 on 5605 degrees of freedom
## Multiple R-squared: 0.6791, Adjusted R-squared: 0.6764
## F-statistic: 257.8 on 46 and 5605 DF, p-value: < 2.2e-16
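The one-by-one removal above was done by hand. As a rough sketch, the same idea can be written as a loop that refits the model and drops the predictor with the largest p-value until every remaining one is at or below 0.05. This is not the procedure that produced LRf: the function name backward_eliminate and the simulated data frame toy are made up for illustration, and the sketch assumes all predictors are numeric columns (like the dummy columns sub_1 or GKay), so that coefficient names match column names.

```r
# Simulated data: y depends on x1 and x2 only; x3 and x4 are pure noise.
set.seed(42)
n <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
toy$y <- 3 * toy$x1 - 2 * toy$x2 + rnorm(n)

# Hypothetical helper: drop the least significant predictor, refit, repeat.
backward_eliminate <- function(data, response, alpha = 0.05) {
  predictors <- setdiff(names(data), response)
  repeat {
    fit <- lm(reformulate(predictors, response = response), data = data)
    coefs <- summary(fit)$coefficients
    # p-values of all terms except the intercept, named by predictor
    p <- setNames(coefs[-1, "Pr(>|t|)"], rownames(coefs)[-1])
    # stop when everything left is significant (or only one predictor remains)
    if (max(p) <= alpha || length(predictors) == 1) return(fit)
    predictors <- setdiff(predictors, names(which.max(p)))
  }
}

final_fit <- backward_eliminate(toy, "y")
print(names(coef(final_fit)))
```

Base R's step() automates a similar search, but it selects on AIC rather than on individual p-values, so it may keep or drop different variables.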
PP_test_25=predict(LRf,newdata =test_25)
PP_test_25=round(PP_test_25,1)
class(PP_test_25)
## [1] "numeric"
PP_test_25 contains the predicted prices for the corresponding observations in test_25, based on the model LRf.
Calculating the RMSE and plotting the graph.
#let's plot the real price vs the predicted price for dataset test_25:
plot(test_25$Price,PP_test_25)
res=test_25$Price-PP_test_25 #(real value-predicted value)
#root mean square error is as follows
RMSE_test_25=sqrt(mean(res^2))
RMSE_test_25
## [1] 387505.6
#the passing criterion was a score (212467/RMSE) greater than 0.5, which we have successfully crossed, so the model meets the requirement.
212467/RMSE_test_25
## [1] 0.548294
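The two numbers above come from a short formula: RMSE is the square root of the mean squared residual, and the score is 212467 divided by that RMSE. A minimal sketch with made-up prices is given below. The helper name rmse and the toy vectors are only for illustration and are not part of the original script.

```r
# Hypothetical helper: root mean square error of a set of predictions.
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# Toy values (not from the housing data): every prediction is off by 10.
actual    <- c(100, 200, 300)
predicted <- c(110, 190, 310)
toy_rmse  <- rmse(actual, predicted)
print(toy_rmse)          # 10

# Score as used above: 212467 / RMSE, with the RMSE reported for test_25.
score <- 212467 / 387505.6
print(round(score, 4))   # 0.5483
```

A smaller RMSE gives a larger score, so the 0.5 threshold corresponds to an RMSE below 424934.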
library(ggplot2)
d=data.frame(real=test_25$Price,predicted=PP_test_25)
ggplot(d,aes(x=real,y=predicted))+geom_point()
plot(LRf,which = 1) #gives the residuals vs fitted plot
plot(LRf,which = 2) #gives the normal Q-Q plot
plot(LRf,which = 3) #gives the scale-location plot
plot(LRf,which = 4) #gives the Cook's distance plot
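The Cook's distance plot shows which observations pull the fit the most. The same values can also be pulled out as numbers with cooks.distance(). The sketch below uses simulated data with one deliberately bad point. The data frame toy and the 4/n cut-off are assumptions for illustration (4/n is only a common rule of thumb, not something used in the original analysis).

```r
# Simulated straight-line data with one gross outlier at the last point.
set.seed(1)
toy <- data.frame(x = 1:30)
toy$y <- 2 * toy$x + rnorm(30)
toy$y[30] <- 200   # expected value is about 60, so this point is far off

fit <- lm(y ~ x, data = toy)

# Cook's distance for every observation.
cd <- cooks.distance(fit)

# Flag observations above the rule-of-thumb threshold 4/n.
influential <- which(cd > 4 / nrow(toy))
print(influential)
```

For the housing model the same two lines would be applied to LRf and train_75, and the flagged rows could then be inspected before deciding whether to keep them.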
PP_test_final=predict(LRf,newdata =test)
PP_test_final=round(PP_test_final,1)
class(PP_test_final)
## [1] "numeric"
write.csv(PP_test_final, "PP_test_final.csv") #stores the predicted prices in a CSV file in the current working directory.
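Calling write.csv on a bare numeric vector gives a file with a row-name column and a single value column called x. If a labelled file is preferred, one option is to wrap the predictions in a data frame first. This is only a sketch with stand-in values: the column name Price and the temporary file path are assumptions, not a required submission format.

```r
# Stand-in predictions (the real ones come from predict(LRf, newdata = test)).
preds <- c(1050000.5, 870250.0, 1320400.2)

# Wrap in a data frame so the column gets an explicit name.
out <- data.frame(Price = preds)

# Write to a temporary file here; the original writes to the working directory.
path <- tempfile(fileext = ".csv")
write.csv(out, path, row.names = FALSE)

# Read the file back to confirm the layout.
check <- read.csv(path)
print(names(check))   # "Price"
```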