Problem Statement:

The price of a property is one of the most important decision criteria when people buy homes. Real estate firms need to be consistent in their pricing in order to attract buyers. A predictive model for prices would be a valuable tool. It could also guide the development of properties by putting more emphasis on the qualities that increase a property's value.

Aim:

To build a machine learning model that predicts property prices as accurately as possible.

The evaluation metric will be RMSE (root mean squared error).
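RMSE is the square root of the mean squared difference between actual and predicted prices. Here is a minimal R sketch of the metric. The function name rmse is a hypothetical helper introduced for illustration and is not part of the original script:

```r
# Root mean squared error between observed and predicted values.
# rmse() is a hypothetical helper name, not taken from the original project.
rmse <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  sqrt(mean((actual - predicted)^2))
}

# Toy example: errors of 0, 2 and -2 give sqrt((0 + 4 + 4) / 3)
rmse(c(10, 12, 14), c(10, 10, 16))  # sqrt(8 / 3), about 1.633
```

The errors are squared before they are averaged. As a result, RMSE is expressed in the same units as Price, and it penalises large mispricings more heavily than small ones.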

Data:

There are two datasets, housing_train.csv and housing_test.csv. We will use housing_train to build a predictive model for the response variable “Price”. housing_test contains all the other variables except “Price”. It is used for testing, that is, for generating the final price predictions.

Data dictionary:

Variables : Type :: Definition

Suburb : categorical :: suburb in which the property is located

Address : categorical :: short address

Rooms : numeric :: Number of Rooms

Type : categorical :: type of the property

Price : numeric :: This is the target variable, price of the property

Method : categorical :: method for selling

SellerG : categorical :: Name of the seller

Distance : numeric :: distance from the city center

Postcode : categorical :: postcode of the property

Bedroom2 : numeric :: number of secondary bedrooms (this is different from Rooms)

Bathroom : numeric :: number of bathrooms

Car : numeric :: number of parking spaces

Landsize : numeric :: land size of the property

BuildingArea : numeric :: built-up area

YearBuilt : numeric :: year in which the property was built

CouncilArea : categorical :: council area to which the property belongs

Methodology:

We will build a linear regression model to predict the response variable “Price” in the following steps:

1. Imputing NA values in the datasets.

2. Data preparation.

3. Model building.

4. Performance measurement of the model.

5. Predicting real estate prices for the final test dataset.
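Steps 3 and 4 can be sketched in R as shown below. This is a minimal illustration, not the project's final model. It assumes the prepared data is a fully numeric data frame with a Price column. The synthetic columns x1 and x2, the random seed and the 80/20 hold-out split are all placeholders:

```r
set.seed(2)

# Synthetic stand-in for the prepared training data (placeholder columns x1, x2)
n <- 200
df <- data.frame(x1 = runif(n, 0, 10), x2 = rbinom(n, 1, 0.5))
df$Price <- 50000 + 20000 * df$x1 + 100000 * df$x2 + rnorm(n, sd = 10000)

# Hold out 20% of the rows to estimate out-of-sample error
idx <- sample(seq_len(n), size = 0.8 * n)
fit_part <- df[idx, ]
val_part <- df[-idx, ]

# Step 3: fit a linear regression of Price on all remaining columns
fit <- lm(Price ~ ., data = fit_part)

# Step 4: predict on the hold-out rows and compute RMSE
pred <- predict(fit, newdata = val_part)
val_rmse <- sqrt(mean((val_part$Price - pred)^2))
val_rmse
```

On the real data, the two parts would typically come from the train rows only. The model would then be refit on all train rows before predicting on the test set.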

Step 1: Imputing NA values in the datasets.

Initial setup

Loading the dplyr library:

library(dplyr)
setwd("C:\\Users\\INS15R\\Documents\\R latest\\R EDVANCER\\Industry Based Projects\\Industry-Based-Projects-Edvancer-Eduventures")
getwd()
## [1] "C:/Users/INS15R/Documents/R latest/R EDVANCER/Industry Based Projects/Industry-Based-Projects-Edvancer-Eduventures"

Reading train and test datasets:

train=read.csv("housing_train.csv",stringsAsFactors = FALSE,header = T )
#7536 obs,16 variables
test=read.csv("housing_test.csv",stringsAsFactors = FALSE,header = T )
#1885 obs,15 variables

Imputing NA values in both the train and test datasets:

Let's first impute the NA values of the train dataset.

Replacing the 1559 NA values of the Bedroom2 variable with its median (3):

apply(train,2,function(x)sum(is.na(x)))
##       Suburb      Address        Rooms         Type        Price 
##            0            0            0            0            0 
##       Method      SellerG     Distance     Postcode     Bedroom2 
##            0            0            0            0         1559 
##     Bathroom          Car     Landsize BuildingArea    YearBuilt 
##         1559         1559         1564         4209         3717 
##  CouncilArea 
##            0
train$Bedroom2[is.na(train$Bedroom2)]=median(train$Bedroom2,na.rm=T)

Similarly, the remaining NA values are imputed with the rounded mean of each variable:

For variable Bathroom:

apply(train,2,function(x)sum(is.na(x)))
##       Suburb      Address        Rooms         Type        Price 
##            0            0            0            0            0 
##       Method      SellerG     Distance     Postcode     Bedroom2 
##            0            0            0            0            0 
##     Bathroom          Car     Landsize BuildingArea    YearBuilt 
##         1559         1559         1564         4209         3717 
##  CouncilArea 
##            0
train$Bathroom[is.na(train$Bathroom)]=round(mean(train$Bathroom,na.rm=T),0)

For variable Car:

apply(train,2,function(x)sum(is.na(x)))
##       Suburb      Address        Rooms         Type        Price 
##            0            0            0            0            0 
##       Method      SellerG     Distance     Postcode     Bedroom2 
##            0            0            0            0            0 
##     Bathroom          Car     Landsize BuildingArea    YearBuilt 
##            0         1559         1564         4209         3717 
##  CouncilArea 
##            0
train$Car[is.na(train$Car)]=round(mean(train$Car,na.rm=T),0)

For variable Landsize:

apply(train,2,function(x)sum(is.na(x)))
##       Suburb      Address        Rooms         Type        Price 
##            0            0            0            0            0 
##       Method      SellerG     Distance     Postcode     Bedroom2 
##            0            0            0            0            0 
##     Bathroom          Car     Landsize BuildingArea    YearBuilt 
##            0            0         1564         4209         3717 
##  CouncilArea 
##            0
train$Landsize[is.na(train$Landsize)]=round(mean(train$Landsize,na.rm=T),0)

For variable BuildingArea:

apply(train,2,function(x)sum(is.na(x)))
##       Suburb      Address        Rooms         Type        Price 
##            0            0            0            0            0 
##       Method      SellerG     Distance     Postcode     Bedroom2 
##            0            0            0            0            0 
##     Bathroom          Car     Landsize BuildingArea    YearBuilt 
##            0            0            0         4209         3717 
##  CouncilArea 
##            0
train$BuildingArea[is.na(train$BuildingArea)]=round(mean(train$BuildingArea,na.rm=T),0)

For variable YearBuilt:

apply(train,2,function(x)sum(is.na(x)))
##       Suburb      Address        Rooms         Type        Price 
##            0            0            0            0            0 
##       Method      SellerG     Distance     Postcode     Bedroom2 
##            0            0            0            0            0 
##     Bathroom          Car     Landsize BuildingArea    YearBuilt 
##            0            0            0            0         3717 
##  CouncilArea 
##            0
train$YearBuilt[is.na(train$YearBuilt)]=round(mean(train$YearBuilt,na.rm=T),0)

All NA values in the train dataset have now been imputed:

apply(train,2,function(x)sum(is.na(x)))
##       Suburb      Address        Rooms         Type        Price 
##            0            0            0            0            0 
##       Method      SellerG     Distance     Postcode     Bedroom2 
##            0            0            0            0            0 
##     Bathroom          Car     Landsize BuildingArea    YearBuilt 
##            0            0            0            0            0 
##  CouncilArea 
##            0
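The column-by-column steps above can be written more compactly as a loop. The sketch below mirrors the rules used here: the median for Bedroom2 and the rounded mean for the other columns. The small data frame train_toy is a placeholder standing in for train:

```r
# Toy stand-in for train with some missing values
train_toy <- data.frame(
  Bedroom2 = c(3, NA, 2, 4),
  Bathroom = c(1, 2, NA, 2),
  Car      = c(NA, 1, 2, 2)
)

# Median for Bedroom2, as in the original script
train_toy$Bedroom2[is.na(train_toy$Bedroom2)] <- median(train_toy$Bedroom2, na.rm = TRUE)

# Rounded mean for the remaining columns that contain NAs
mean_cols <- c("Bathroom", "Car")
for (col in mean_cols) {
  fill <- round(mean(train_toy[[col]], na.rm = TRUE), 0)
  train_toy[[col]][is.na(train_toy[[col]])] <- fill
}

# colSums(is.na(...)) gives the same per-column NA counts as
# apply(..., 2, function(x) sum(is.na(x)))
colSums(is.na(train_toy))
```

On the real train data, mean_cols would be c("Bathroom", "Car", "Landsize", "BuildingArea", "YearBuilt").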

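Now, let's impute the NA values of the test dataset in the same way. The one difference is that YearBuilt is filled with its median here, whereas train used the mean.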

apply(test,2,function(x)sum(is.na(x)))
##       Suburb      Address        Rooms         Type       Method 
##            0            0            0            0            0 
##      SellerG     Distance     Postcode     Bedroom2     Bathroom 
##            0            0            0          419          419 
##          Car     Landsize BuildingArea    YearBuilt  CouncilArea 
##          419          421         1060          943            0
test$Bedroom2[is.na(test$Bedroom2)]=median(test$Bedroom2,na.rm=T)
apply(test,2,function(x)sum(is.na(x)))
##       Suburb      Address        Rooms         Type       Method 
##            0            0            0            0            0 
##      SellerG     Distance     Postcode     Bedroom2     Bathroom 
##            0            0            0            0          419 
##          Car     Landsize BuildingArea    YearBuilt  CouncilArea 
##          419          421         1060          943            0
test$Bathroom[is.na(test$Bathroom)]=round(mean(test$Bathroom,na.rm=T),0)
apply(test,2,function(x)sum(is.na(x)))
##       Suburb      Address        Rooms         Type       Method 
##            0            0            0            0            0 
##      SellerG     Distance     Postcode     Bedroom2     Bathroom 
##            0            0            0            0            0 
##          Car     Landsize BuildingArea    YearBuilt  CouncilArea 
##          419          421         1060          943            0
test$Car[is.na(test$Car)]=round(mean(test$Car,na.rm=T),0)
apply(test,2,function(x)sum(is.na(x)))
##       Suburb      Address        Rooms         Type       Method 
##            0            0            0            0            0 
##      SellerG     Distance     Postcode     Bedroom2     Bathroom 
##            0            0            0            0            0 
##          Car     Landsize BuildingArea    YearBuilt  CouncilArea 
##            0          421         1060          943            0
test$Landsize[is.na(test$Landsize)]=round(mean(test$Landsize,na.rm=T),0)
apply(test,2,function(x)sum(is.na(x)))
##       Suburb      Address        Rooms         Type       Method 
##            0            0            0            0            0 
##      SellerG     Distance     Postcode     Bedroom2     Bathroom 
##            0            0            0            0            0 
##          Car     Landsize BuildingArea    YearBuilt  CouncilArea 
##            0            0         1060          943            0
test$BuildingArea[is.na(test$BuildingArea)]=round(mean(test$BuildingArea,na.rm=T),0)
apply(test,2,function(x)sum(is.na(x)))
##       Suburb      Address        Rooms         Type       Method 
##            0            0            0            0            0 
##      SellerG     Distance     Postcode     Bedroom2     Bathroom 
##            0            0            0            0            0 
##          Car     Landsize BuildingArea    YearBuilt  CouncilArea 
##            0            0            0          943            0
test$YearBuilt[is.na(test$YearBuilt)]=round(median(test$YearBuilt,na.rm=T),0)

The NA values of the test dataset have now been imputed with the corresponding means or medians.
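In the steps above, the test NAs are filled with statistics computed on the test data itself. A common alternative is to compute each fill value on train only and reuse it for test, so that both datasets are imputed consistently. The sketch below shows this alternative; it is not what this project does. The data frames are toy placeholders, and impute_with() is a hypothetical helper:

```r
# Toy stand-ins for train and test
train_toy <- data.frame(Car = c(1, 2, 2, NA), YearBuilt = c(1920, 1960, 1970, NA))
test_toy  <- data.frame(Car = c(NA, 3),       YearBuilt = c(NA, 2000))

# Assumption: fill values come from train only (rounded means here);
# the original script uses test's own statistics instead
fill_values <- sapply(train_toy, function(x) round(mean(x, na.rm = TRUE), 0))

# Hypothetical helper: replace NAs in each named column with its fill value
impute_with <- function(df, fills) {
  for (col in names(fills)) {
    df[[col]][is.na(df[[col]])] <- fills[[col]]
  }
  df
}

train_toy <- impute_with(train_toy, fill_values)
test_toy  <- impute_with(test_toy, fill_values)
```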

Step 2: Data Preparation

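Combining the train and test datasets prior to data preparation, so that both go through exactly the same transformations. Price is set to NA for the test rows. A new column, data, records which dataset each row came from: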

test$Price=NA
train$data='train'
test$data='test'
all_data=rbind(train,test)
apply(all_data,2,function(x)sum(is.na(x)))
##       Suburb      Address        Rooms         Type        Price 
##            0            0            0            0         1885 
##       Method      SellerG     Distance     Postcode     Bedroom2 
##            0            0            0            0            0 
##     Bathroom          Car     Landsize BuildingArea    YearBuilt 
##            0            0            0            0            0 
##  CouncilArea         data 
##            0            0
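Once preparation is finished, the data column lets us split the combined frame back into its train and test parts. The following is a minimal base-R sketch on a toy frame with placeholder values:

```r
# Toy stand-in for all_data after preparation
all_toy <- data.frame(
  Rooms    = c(3, 2, 4),
  Bathroom = c(1, 1, 2),
  Price    = c(1650000, 755000, NA),
  data     = c("train", "train", "test"),
  stringsAsFactors = FALSE
)

# Train rows keep Price; only the helper column is dropped
train_part <- all_toy[all_toy$data == "train",
                      setdiff(names(all_toy), "data"), drop = FALSE]

# Test rows have Price = NA, so Price is dropped as well
test_part <- all_toy[all_toy$data == "test",
                     setdiff(names(all_toy), c("data", "Price")), drop = FALSE]
```

With dplyr, which the script already loads, the same split could be written with filter() and select().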

Let's look at the structure and data types of the combined dataset.

glimpse(all_data)
## Observations: 9,421
## Variables: 17
## $ Suburb       <chr> "Brunswick", "Reservoir", "Newport", "Brighton Ea...
## $ Address      <chr> "52 Evans St", "85 Radford Rd", "99 Anderson St",...
## $ Rooms        <int> 3, 5, 3, 2, 5, 3, 3, 3, 4, 2, 3, 2, 2, 2, 4, 3, 2...
## $ Type         <chr> "h", "h", "h", "u", "h", "h", "h", "h", "h", "u",...
## $ Price        <int> 1650000, 791000, 785000, 755000, 2500000, 3020000...
## $ Method       <chr> "S", "S", "S", "SP", "VB", "S", "VB", "VB", "PI",...
## $ SellerG      <chr> "Nelson", "Ray", "RT", "Buxton", "RT", "Hooper", ...
## $ Distance     <dbl> 5.2, 11.2, 8.4, 10.7, 7.5, 7.5, 13.9, 11.2, 12.8,...
## $ Postcode     <int> 3056, 3073, 3015, 3187, 3123, 3123, 3165, 3127, 3...
## $ Bedroom2     <dbl> 3, 4, 3, 3, 5, 3, 3, 3, 3, 2, 3, 2, 2, 2, 4, 3, 2...
## $ Bathroom     <dbl> 1, 3, 1, 1, 3, 2, 1, 2, 1, 2, 1, 1, 2, 1, 2, 1, 1...
## $ Car          <dbl> 2, 1, 1, 2, 3, 2, 1, 4, 2, 2, 2, 1, 1, 1, 1, 2, 1...
## $ Landsize     <dbl> 495, 961, 185, 452, 757, 832, 710, 816, 452, 0, 4...
## $ BuildingArea <dbl> 141, 143, 143, 143, 240, 143, 143, 143, 143, 80, ...
## $ YearBuilt    <dbl> 1920, 1961, 1961, 1961, 1925, 1961, 1966, 1961, 1...
## $ CouncilArea  <chr> "Moreland", "Darebin", "Hobsons Bay", "", "Boroon...
## $ data         <chr> "train", "train", "train", "train", "train", "tra...

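Creating dummy variables for Suburb (character type) by grouping suburbs with similar prices. The mean price per suburb is computed with tapply() and sorted, and suburbs with comparable means are placed in the same group: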

t=table(all_data$Suburb)
View(t)
t1=round(tapply(all_data$Price,all_data$Suburb,mean,na.rm=T),0)
View(t1)
t1=sort(t1)
all_data=all_data %>% 
  mutate(
    sub_1=as.numeric(Suburb%in%c("Campbellfield","Jacana")),
    sub_2=as.numeric(Suburb%in%c("Kealba","Brooklyn","Albion","Sunshine West","Ripponlea","Fawkner")),
    sub_3=as.numeric(Suburb%in%c("Glenroy","Southbank","Sunshine North","Keilor Park","Heidelberg West","Reservoir","Braybrook","Kingsbury","Gowanbrae","Hadfield","Watsonia","Footscray","South Kingsville","Balaclava","Melbourne","Maidstone","Sunshine")),
    sub_4=as.numeric(Suburb%in%c("Airport West","Heidelberg Heights","Pascoe Vale","West Footscray","Altona North","Williamstown North","Brunswick West","Keilor East","Oak Park","Maribyrnong","Altona","Flemington","Coburg North","Yallambie","Avondale Heights","Bellfield")),
    sub_5=as.numeric(Suburb%in%c("Strathmore Heights","Glen Huntly","Kensington","Essendon North","St Kilda","Preston","North Melbourne","Coburg","Kingsville","Collingwood","Brunswick East","Gardenvale","Thornbury","Niddrie","West Melbourne","Viewbank")),
    sub_6=as.numeric(Suburb%in%c("Spotswood","Carnegie","Elwood","Heidelberg","Moorabbin","Oakleigh","Rosanna","Docklands","Yarraville","Cremorne","Seddon","Brunswick","Oakleigh South","Ascot Vale","Windsor","Caulfield","Essendon West","Newport")),
    sub_7=as.numeric(Suburb%in%c("Chadstone","South Yarra","Essendon","Bentleigh East","Murrumbeena","Hughesdale","Fairfield","Ashwood","Clifton Hill","Caulfield North","Abbotsford","Carlton","Prahran","Fitzroy","Ivanhoe","Hampton East","Caulfield East")),
    sub_8=as.numeric(Suburb%in%c("Richmond","Travancore","Templestowe Lower","Ormond","Caulfield South","Moonee Ponds","Hawthorn","Box Hill","Bulleen","Burnley","Burwood","Strathmore","Port Melbourne","Fitzroy North","Alphington")),
    sub_9=as.numeric(Suburb%in%c("Doncaster","South Melbourne","Northcote","Aberfeldie","Elsternwick","Bentleigh","Kooyong","Parkville")),
    sub_10=as.numeric(Suburb%in%c("Williamstown","East Melbourne","Seaholme")),
    sub_11=as.numeric(Suburb%in%c("Malvern East","Carlton North","Hawthorn East","Surrey Hills")),
    sub_12=as.numeric(Suburb%in%c("Princes Hill","Mont Albert","Armadale","Kew East","Glen Iris","Ashburton")),
    sub_13=as.numeric(Suburb%in%c("Brighton East","Eaglemont","Hampton")),
    sub_14=as.numeric(Suburb%in%c("Toorak","Ivanhoe East","Camberwell","Balwyn North","Kew")),
    sub_15=as.numeric(Suburb%in%c("Brighton","Middle Park")),
    sub_16=as.numeric(Suburb%in%c("Albert Park","Balwyn","Malvern"))
     ) %>% 
  select(-Suburb)

glimpse(all_data)
## Observations: 9,421
## Variables: 32
## $ Address      <chr> "52 Evans St", "85 Radford Rd", "99 Anderson St",...
## $ Rooms        <int> 3, 5, 3, 2, 5, 3, 3, 3, 4, 2, 3, 2, 2, 2, 4, 3, 2...
## $ Type         <chr> "h", "h", "h", "u", "h", "h", "h", "h", "h", "u",...
## $ Price        <int> 1650000, 791000, 785000, 755000, 2500000, 3020000...
## $ Method       <chr> "S", "S", "S", "SP", "VB", "S", "VB", "VB", "PI",...
## $ SellerG      <chr> "Nelson", "Ray", "RT", "Buxton", "RT", "Hooper", ...
## $ Distance     <dbl> 5.2, 11.2, 8.4, 10.7, 7.5, 7.5, 13.9, 11.2, 12.8,...
## $ Postcode     <int> 3056, 3073, 3015, 3187, 3123, 3123, 3165, 3127, 3...
## $ Bedroom2     <dbl> 3, 4, 3, 3, 5, 3, 3, 3, 3, 2, 3, 2, 2, 2, 4, 3, 2...
## $ Bathroom     <dbl> 1, 3, 1, 1, 3, 2, 1, 2, 1, 2, 1, 1, 2, 1, 2, 1, 1...
## $ Car          <dbl> 2, 1, 1, 2, 3, 2, 1, 4, 2, 2, 2, 1, 1, 1, 1, 2, 1...
## $ Landsize     <dbl> 495, 961, 185, 452, 757, 832, 710, 816, 452, 0, 4...
## $ BuildingArea <dbl> 141, 143, 143, 143, 240, 143, 143, 143, 143, 80, ...
## $ YearBuilt    <dbl> 1920, 1961, 1961, 1961, 1925, 1961, 1966, 1961, 1...
## $ CouncilArea  <chr> "Moreland", "Darebin", "Hobsons Bay", "", "Boroon...
## $ data         <chr> "train", "train", "train", "train", "train", "tra...
## $ sub_1        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_2        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_3        <dbl> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0...
## $ sub_4        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1...
## $ sub_5        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0...
## $ sub_6        <dbl> 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0...
## $ sub_7        <dbl> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0...
## $ sub_8        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_9        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_10       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_11       <dbl> 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_12       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0...
## $ sub_13       <dbl> 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_14       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_15       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_16       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
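The 16 groups above were assigned by hand from the sorted mean prices. As an alternative, the banding can be automated by cutting the per-suburb mean prices at quantiles. The sketch below is illustrative only. The toy data, the number of bands (n_groups) and the quantile cut points are all assumptions, and the resulting bands will not match the hand-made ones:

```r
# Toy combined data; the NA Price marks a test row
all_toy <- data.frame(
  Suburb = c("A", "A", "B", "B", "C", "C", "D", "D"),
  Price  = c(500, 520, 610, NA, 900, 880, 1500, 1450),
  stringsAsFactors = FALSE
)

# Mean price per suburb, ignoring rows where Price is NA
sub_mean <- tapply(all_toy$Price, all_toy$Suburb, mean, na.rm = TRUE)

# Assign each suburb to a price band using quantiles of the suburb means
n_groups <- 2
breaks <- unique(quantile(sub_mean, probs = seq(0, 1, length.out = n_groups + 1)))
sub_group <- cut(sub_mean, breaks = breaks, include.lowest = TRUE, labels = FALSE)
names(sub_group) <- names(sub_mean)

# One dummy per band except band 1 (lowest prices), which is the baseline
grp <- sub_group[all_toy$Suburb]
for (g in seq_len(n_groups)[-1]) {
  all_toy[[paste0("sub_", g)]] <- as.numeric(grp == g)
}
all_toy$Suburb <- NULL
```

Because the means are computed with na.rm = TRUE, the test rows (whose Price is NA) do not influence the bands.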

Dropping the Address variable, since it is unique to each property and therefore carries no generalizable information.

all_data=all_data %>% 
  select(-Address)
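Before dropping a column because it is unique, we can check that claim quickly by comparing the number of distinct values with the number of rows. The following minimal sketch uses placeholder toy addresses. On the real data, the same expression would be applied to all_data$Address before the select() call:

```r
# Toy stand-in for the combined data
all_toy <- data.frame(
  Address = c("52 Evans St", "85 Radford Rd", "99 Anderson St", "52 Evans St"),
  Rooms   = c(3, 5, 3, 3),
  stringsAsFactors = FALSE
)

# Share of distinct values: a value close to 1 means the column behaves like an ID
uniq_ratio <- length(unique(all_toy$Address)) / nrow(all_toy)
uniq_ratio
```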

Making dummies for variable Type.

glimpse(all_data)
## Observations: 9,421
## Variables: 31
## $ Rooms        <int> 3, 5, 3, 2, 5, 3, 3, 3, 4, 2, 3, 2, 2, 2, 4, 3, 2...
## $ Type         <chr> "h", "h", "h", "u", "h", "h", "h", "h", "h", "u",...
## $ Price        <int> 1650000, 791000, 785000, 755000, 2500000, 3020000...
## $ Method       <chr> "S", "S", "S", "SP", "VB", "S", "VB", "VB", "PI",...
## $ SellerG      <chr> "Nelson", "Ray", "RT", "Buxton", "RT", "Hooper", ...
## $ Distance     <dbl> 5.2, 11.2, 8.4, 10.7, 7.5, 7.5, 13.9, 11.2, 12.8,...
## $ Postcode     <int> 3056, 3073, 3015, 3187, 3123, 3123, 3165, 3127, 3...
## $ Bedroom2     <dbl> 3, 4, 3, 3, 5, 3, 3, 3, 3, 2, 3, 2, 2, 2, 4, 3, 2...
## $ Bathroom     <dbl> 1, 3, 1, 1, 3, 2, 1, 2, 1, 2, 1, 1, 2, 1, 2, 1, 1...
## $ Car          <dbl> 2, 1, 1, 2, 3, 2, 1, 4, 2, 2, 2, 1, 1, 1, 1, 2, 1...
## $ Landsize     <dbl> 495, 961, 185, 452, 757, 832, 710, 816, 452, 0, 4...
## $ BuildingArea <dbl> 141, 143, 143, 143, 240, 143, 143, 143, 143, 80, ...
## $ YearBuilt    <dbl> 1920, 1961, 1961, 1961, 1925, 1961, 1966, 1961, 1...
## $ CouncilArea  <chr> "Moreland", "Darebin", "Hobsons Bay", "", "Boroon...
## $ data         <chr> "train", "train", "train", "train", "train", "tra...
## $ sub_1        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_2        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_3        <dbl> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0...
## $ sub_4        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1...
## $ sub_5        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0...
## $ sub_6        <dbl> 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0...
## $ sub_7        <dbl> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0...
## $ sub_8        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_9        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_10       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_11       <dbl> 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_12       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0...
## $ sub_13       <dbl> 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_14       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_15       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_16       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
table(all_data$Type)
## 
##    h    t    u 
## 5916 1048 2457
all_data=all_data %>%
  mutate(Type_t=as.numeric(Type=="t"),
         type_u=as.numeric(Type=="u"))
all_data=all_data %>% 
  select(-Type)
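The Type dummies above follow a general pattern: one 0/1 column per category, omitting a baseline category (here h, the most frequent type). A small helper, make_dummies(), can capture that pattern. This helper is hypothetical and is not part of the original script. Its column names also use consistent capitalisation (Type_u rather than type_u), so it would need adjusting if later code relies on the original names:

```r
# Hypothetical helper: add one 0/1 column per category of `var`
# except `base`, then drop the original column.
make_dummies <- function(df, var, base) {
  levs <- setdiff(sort(unique(df[[var]])), base)
  for (lev in levs) {
    df[[paste(var, lev, sep = "_")]] <- as.numeric(df[[var]] == lev)
  }
  df[[var]] <- NULL
  df
}

toy <- data.frame(Type = c("h", "h", "u", "t"), Rooms = c(3, 5, 2, 3),
                  stringsAsFactors = FALSE)
toy <- make_dummies(toy, "Type", base = "h")
names(toy)
```

Some variables have labels that are not valid R names, such as SellerG values containing "/". For these, wrapping the new column name in make.names() would keep it syntactic.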

Making dummies for variable Method.

glimpse(all_data)  # 9421 obs and 32 variables
## Observations: 9,421
## Variables: 32
## $ Rooms        <int> 3, 5, 3, 2, 5, 3, 3, 3, 4, 2, 3, 2, 2, 2, 4, 3, 2...
## $ Price        <int> 1650000, 791000, 785000, 755000, 2500000, 3020000...
## $ Method       <chr> "S", "S", "S", "SP", "VB", "S", "VB", "VB", "PI",...
## $ SellerG      <chr> "Nelson", "Ray", "RT", "Buxton", "RT", "Hooper", ...
## $ Distance     <dbl> 5.2, 11.2, 8.4, 10.7, 7.5, 7.5, 13.9, 11.2, 12.8,...
## $ Postcode     <int> 3056, 3073, 3015, 3187, 3123, 3123, 3165, 3127, 3...
## $ Bedroom2     <dbl> 3, 4, 3, 3, 5, 3, 3, 3, 3, 2, 3, 2, 2, 2, 4, 3, 2...
## $ Bathroom     <dbl> 1, 3, 1, 1, 3, 2, 1, 2, 1, 2, 1, 1, 2, 1, 2, 1, 1...
## $ Car          <dbl> 2, 1, 1, 2, 3, 2, 1, 4, 2, 2, 2, 1, 1, 1, 1, 2, 1...
## $ Landsize     <dbl> 495, 961, 185, 452, 757, 832, 710, 816, 452, 0, 4...
## $ BuildingArea <dbl> 141, 143, 143, 143, 240, 143, 143, 143, 143, 80, ...
## $ YearBuilt    <dbl> 1920, 1961, 1961, 1961, 1925, 1961, 1966, 1961, 1...
## $ CouncilArea  <chr> "Moreland", "Darebin", "Hobsons Bay", "", "Boroon...
## $ data         <chr> "train", "train", "train", "train", "train", "tra...
## $ sub_1        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_2        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_3        <dbl> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0...
## $ sub_4        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1...
## $ sub_5        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0...
## $ sub_6        <dbl> 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0...
## $ sub_7        <dbl> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0...
## $ sub_8        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_9        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_10       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_11       <dbl> 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_12       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0...
## $ sub_13       <dbl> 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_14       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_15       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_16       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ Type_t       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0...
## $ type_u       <dbl> 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1...
table(all_data$Method)
## 
##   PI    S   SA   SP   VB 
## 1235 6103   35 1162  886
all_data=all_data %>%
  mutate(Method_PI=as.numeric(Method=="PI"),
         Method_SA=as.numeric(Method=="SA"),
         Method_SP=as.numeric(Method=="SP"),
         Method_VB=as.numeric(Method=="VB")) %>% 
  select(-Method)
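The same Method dummies can also be produced with base R's model.matrix(), which builds treatment-coded indicator columns from a factor. The sketch below uses toy values. It relevels the factor so that S (the most frequent method) is the omitted baseline, which matches the manual code above:

```r
toy <- data.frame(Method = c("S", "S", "PI", "SP", "VB", "SA"),
                  stringsAsFactors = FALSE)

# Make S the reference level, as in the manual dummies
method_f <- relevel(factor(toy$Method), ref = "S")

# model.matrix() creates one column per non-reference level plus an intercept;
# the intercept column is dropped here
dummies <- model.matrix(~ method_f)[, -1]
colnames(dummies) <- sub("^method_f", "Method_", colnames(dummies))
colnames(dummies)
```

By default, model.matrix() drops rows with missing values. This is not an issue for Method, which has no NAs. The resulting matrix could be combined with the data using cbind().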

Making dummies for variable SellerG.
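SellerG has a large number of levels, and many of them occur only once or twice. Creating one dummy per seller would therefore add many near-empty columns. One common approach, not necessarily the one used in the rest of this project, is to keep dummies only for sellers that appear more often than a chosen cutoff. The remaining sellers fall into the all-zero baseline. In the sketch below, the cutoff value 2 and the toy data are arbitrary assumptions:

```r
toy <- data.frame(
  SellerG = c("Nelson", "Nelson", "Nelson", "Ray", "Ray", "Ray", "AIME", "Buxton/Find"),
  stringsAsFactors = FALSE
)

freq_cutoff <- 2  # arbitrary threshold chosen for this example
counts <- table(toy$SellerG)
keep <- names(counts)[counts > freq_cutoff]

# One dummy per frequent seller; rare sellers end up with all zeros.
# make.names() would turn labels such as "Buxton/Find" into valid column names.
for (s in keep) {
  toy[[make.names(paste0("SellerG_", s))]] <- as.numeric(toy$SellerG == s)
}
toy$SellerG <- NULL
names(toy)
```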

glimpse(all_data)
## Observations: 9,421
## Variables: 35
## $ Rooms        <int> 3, 5, 3, 2, 5, 3, 3, 3, 4, 2, 3, 2, 2, 2, 4, 3, 2...
## $ Price        <int> 1650000, 791000, 785000, 755000, 2500000, 3020000...
## $ SellerG      <chr> "Nelson", "Ray", "RT", "Buxton", "RT", "Hooper", ...
## $ Distance     <dbl> 5.2, 11.2, 8.4, 10.7, 7.5, 7.5, 13.9, 11.2, 12.8,...
## $ Postcode     <int> 3056, 3073, 3015, 3187, 3123, 3123, 3165, 3127, 3...
## $ Bedroom2     <dbl> 3, 4, 3, 3, 5, 3, 3, 3, 3, 2, 3, 2, 2, 2, 4, 3, 2...
## $ Bathroom     <dbl> 1, 3, 1, 1, 3, 2, 1, 2, 1, 2, 1, 1, 2, 1, 2, 1, 1...
## $ Car          <dbl> 2, 1, 1, 2, 3, 2, 1, 4, 2, 2, 2, 1, 1, 1, 1, 2, 1...
## $ Landsize     <dbl> 495, 961, 185, 452, 757, 832, 710, 816, 452, 0, 4...
## $ BuildingArea <dbl> 141, 143, 143, 143, 240, 143, 143, 143, 143, 80, ...
## $ YearBuilt    <dbl> 1920, 1961, 1961, 1961, 1925, 1961, 1966, 1961, 1...
## $ CouncilArea  <chr> "Moreland", "Darebin", "Hobsons Bay", "", "Boroon...
## $ data         <chr> "train", "train", "train", "train", "train", "tra...
## $ sub_1        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_2        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_3        <dbl> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0...
## $ sub_4        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1...
## $ sub_5        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0...
## $ sub_6        <dbl> 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0...
## $ sub_7        <dbl> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0...
## $ sub_8        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_9        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_10       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_11       <dbl> 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_12       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0...
## $ sub_13       <dbl> 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_14       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_15       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_16       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ Type_t       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0...
## $ type_u       <dbl> 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1...
## $ Method_PI    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0...
## $ Method_SA    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ Method_SP    <dbl> 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1...
## $ Method_VB    <dbl> 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0...
t=table(all_data$SellerG)
sort(t)
## 
##                    AIME                 Airport                   Allan 
##                       1                       1                       1 
##                 Appleby                   Batty                    Blue 
##                       1                       1                       1 
##                  Bustin             Buxton/Find                 CASTRAN 
##                       1                       1                       1 
##                 Century               Clairmont                Coventry 
##                       1                       1                       1 
##                     Del                  Direct                   Elite 
##                       1                       1                       1 
##     Fletchers/Fletchers           Fletchers/One                   Geoff 
##                       1                       1                       1 
##                     Ham hockingstuart/Advantage     hockingstuart/Barry 
##                       1                       1                       1 
##    hockingstuart/Buxton   hockingstuart/Village                   Homes 
##                       1                       1                       1 
##                  Hooper                  Iconek                    iOne 
##                       1                       1                       1 
##                   iTRAK                     Joe                Johnston 
##                       1                       1                       1 
##                  Joseph                   Karen                   Lucas 
##                       1                       1                       1 
##                    Luxe                  Luxton                   Mandy 
##                       1                       1                       1 
##                   Mason                 Meadows                  Naison 
##                       1                       1                       1 
##                Nardella                   North                     Oak 
##                       1                       1                       1 
##                     One               Parkinson       Private/Tiernan's 
##                       1                       1                       1 
##           Professionals                Property              Propertyau 
##                       1                       1                       1 
##                  Prowse                     R&H                   Reach 
##                       1                       1                       1 
##                     S&L                Steveway               Tiernan's 
##                       1                       1                       1 
##                     Vic                   Weast                     Win 
##                       1                       1                       1 
##                    Zahn                  Allens              Australian 
##                       1                       2                       2 
##                  Besser        Buxton/Advantage                  Calder 
##                       2                       2                       2 
##                Changing                Charlton                   Crane 
##                       2                       2                       2 
##                   David                   Dixon                 Galldon 
##                       2                       2                       2 
##                Grantham                    JMRE                     Ken 
##                       2                       2                       2 
##                      LJ                  Nguyen                      RE 
##                       2                       2                       2 
##                     Red                  Redina                    Ross 
##                       2                       2                       2 
##                   Scott       Sweeney/Advantage                 VICPROP 
##                       2                       2                       2 
##                   Walsh                    Wood                  Ascend 
##                       2                       2                       3 
##                     ASL                  Assisi                 Bayside 
##                       3                       3                       3 
##                 Compton                  Garvey                Hamilton 
##                       3                       3                       3 
##                   Jason                   Kelly                  Leased 
##                       3                       3                       3 
##                Maddison                     New                    Owen 
##                       3                       3                       3 
##                  Thomas                    Weda                Anderson 
##                       3                       3                       4 
##                   First                Morrison               Nicholson 
##                       4                       4                       4 
##                 O'Brien                   Prof.             Raine&Horne 
##                       4                       4                       4 
##                D'Aprano                  Domain                 Holland 
##                       5                       5                       5 
##                 Matthew                  Parkes                  Bekdon 
##                       5                       5                       6 
##                      FN                      Re               Sotheby's 
##                       6                       6                       6 
##                     HAR                 Morleys                   Pagan 
##                       7                       7                       7 
##                    W.B.                 William             Christopher 
##                       7                       7                       9 
##             O'Donoghues                Chambers                       J 
##                       9                      10                      10 
##                 Gunn&Co                  Hunter                   Pride 
##                      11                      11                      11 
##                 Trimson                   Brace                 Castran 
##                      11                      12                      13 
##                  Darren               Melbourne                  Rodney 
##                      14                      14                      14 
##                     Tim                 Whiting                   Caine 
##                      14                      14                      15 
##                Haughton               Lindellas                    MICM 
##                      15                      15                      15 
##                      GL                  Beller              Harrington 
##                      16                      17                      17 
##                    Paul            Purplebricks            Abercromby's 
##                      17                      17                      18 
##                  Barlow                  Wilson                  Philip 
##                      18                      18                      19 
##              Buckingham                  Walshe                  Edward 
##                      20                      20                      22 
##                McDonald              Alexkarbon                      RW 
##                      24                      25                      25 
##                   Bells                     C21               Considine 
##                      26                      26                      26 
##                   Eview                   Frank                 Thomson 
##                      27                      27                      27 
##                 Burnham                   Peter                  Dingle 
##                      28                      28                      29 
##                     YPA                  Moonee                  LITTLE 
##                      31                      33                      34 
##                    Nick               Harcourts                  Cayzer 
##                      34                      41                      43 
##                 Collins                Chisholm                 Rendina 
##                      44                      53                      58 
##                   Raine                    Love                 Douglas 
##                      62                      69                      73 
##                Williams                 Village               Stockdale 
##                      88                      93                     101 
##                     Kay                  Hodges                 McGrath 
##                     103                     104                     117 
##                    Noel                    Gary                     Jas 
##                     119                     147                     163 
##                   Miles                    Greg                 Sweeney 
##                     167                     173                     175 
##                      RT               Fletchers                Woodards 
##                     183                     191                     213 
##                    Brad                  Biggin                     Ray 
##                     274                     292                     361 
##                  Buxton                Marshall                   Barry 
##                     481                     539                     660 
##           hockingstuart                  Jellis                  Nelson 
##                     874                     995                    1194
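# Dummy variables for the most frequent sellers: every seller with 31 or
# more sales in the table above except Buxton (481 sales), which has no
# dummy and is therefore pooled with the rare sellers in the baseline.
all_data=all_data %>%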
  mutate(Gnelson=as.numeric(SellerG=="Nelson"),
         GJellis=as.numeric(SellerG=="Jellis"),
         Ghstuart=as.numeric(SellerG=="hockingstuart"),
         Gbarry=as.numeric(SellerG=="Barry"),
         GMarshall=as.numeric(SellerG=="Marshall"),
         GWoodards=as.numeric(SellerG=="Woodards"),
         GBrad=as.numeric(SellerG=="Brad"),
         GBiggin=as.numeric(SellerG=="Biggin"),
         GRay=as.numeric(SellerG=="Ray"),
         GFletchers=as.numeric(SellerG=="Fletchers"),
         GRT=as.numeric(SellerG=="RT"),
         GSweeney=as.numeric(SellerG=="Sweeney"),
         GGreg=as.numeric(SellerG=="Greg"),
         GNoel=as.numeric(SellerG=="Noel"),
         GGary=as.numeric(SellerG=="Gary"),
         GJas=as.numeric(SellerG=="Jas"),
         GMiles=as.numeric(SellerG=="Miles"),
         GMcGrath=as.numeric(SellerG=="McGrath"),
         GHodges=as.numeric(SellerG=="Hodges"),
         GKay=as.numeric(SellerG=="Kay"),
         GStockdale=as.numeric(SellerG=="Stockdale"),
         GLove=as.numeric(SellerG=="Love"),
         GDouglas=as.numeric(SellerG=="Douglas"),
         GWilliams=as.numeric(SellerG=="Williams"),
         GVillage=as.numeric(SellerG=="Village"),
         GRaine=as.numeric(SellerG=="Raine"),
         GRendina=as.numeric(SellerG=="Rendina"),
         GChisholm=as.numeric(SellerG=="Chisholm"),
         GCollins=as.numeric(SellerG=="Collins"),
         GLITTLE=as.numeric(SellerG=="LITTLE"),
         GNick=as.numeric(SellerG=="Nick"),
         GHarcourts=as.numeric(SellerG=="Harcourts"),
         GCayzer=as.numeric(SellerG=="Cayzer"),
         GMoonee=as.numeric(SellerG=="Moonee"),
         GYPA=as.numeric(SellerG=="YPA")
                ) %>% 
  select(-SellerG)
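Writing one `mutate()` line per seller is tedious and makes it easy to skip a frequent level. A minimal base-R sketch derives the dummy columns from a frequency cut-off instead. The helper `make_dummies()`, its arguments and the toy data are hypothetical and are not part of the original analysis.

```r
# Hypothetical helper (not part of the original analysis): add one 0/1
# column for every level of `var` that occurs at least `min_count` times.
# Rarer levels, and the blank level "", get no column, so together they
# form the baseline category in a regression.
make_dummies <- function(df, var, min_count, prefix = paste0(var, "_")) {
  counts <- table(df[[var]])                      # frequency of each level
  keep <- setdiff(names(counts)[counts >= min_count], "")
  for (lvl in keep) {
    # make.names() turns e.g. "Glen Eira" into the legal name "Glen.Eira"
    df[[paste0(prefix, make.names(lvl))]] <- as.numeric(df[[var]] == lvl)
  }
  df[[var]] <- NULL                               # drop the original column
  df
}

# Usage on a toy data frame
toy <- data.frame(Price = c(1, 2, 3, 4, 5),
                  SellerG = c("Nelson", "Nelson", "Jellis", "Jellis", "Ray"),
                  stringsAsFactors = FALSE)
out <- make_dummies(toy, "SellerG", min_count = 2, prefix = "G_")
print(out)
```

Called as `make_dummies(all_data, "SellerG", min_count = 31, prefix = "G")`, it would create a column for every seller with at least 31 sales, Buxton included. The resulting names (e.g. `GBarry` rather than `Gbarry`) would differ from the ones used in the rest of this analysis.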

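Next, we create dummy variables for CouncilArea. The blank level ("", 1985 properties with no council recorded) gets no dummy and serves as the baseline.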

glimpse(all_data)
## Observations: 9,421
## Variables: 69
## $ Rooms        <int> 3, 5, 3, 2, 5, 3, 3, 3, 4, 2, 3, 2, 2, 2, 4, 3, 2...
## $ Price        <int> 1650000, 791000, 785000, 755000, 2500000, 3020000...
## $ Distance     <dbl> 5.2, 11.2, 8.4, 10.7, 7.5, 7.5, 13.9, 11.2, 12.8,...
## $ Postcode     <int> 3056, 3073, 3015, 3187, 3123, 3123, 3165, 3127, 3...
## $ Bedroom2     <dbl> 3, 4, 3, 3, 5, 3, 3, 3, 3, 2, 3, 2, 2, 2, 4, 3, 2...
## $ Bathroom     <dbl> 1, 3, 1, 1, 3, 2, 1, 2, 1, 2, 1, 1, 2, 1, 2, 1, 1...
## $ Car          <dbl> 2, 1, 1, 2, 3, 2, 1, 4, 2, 2, 2, 1, 1, 1, 1, 2, 1...
## $ Landsize     <dbl> 495, 961, 185, 452, 757, 832, 710, 816, 452, 0, 4...
## $ BuildingArea <dbl> 141, 143, 143, 143, 240, 143, 143, 143, 143, 80, ...
## $ YearBuilt    <dbl> 1920, 1961, 1961, 1961, 1925, 1961, 1966, 1961, 1...
## $ CouncilArea  <chr> "Moreland", "Darebin", "Hobsons Bay", "", "Boroon...
## $ data         <chr> "train", "train", "train", "train", "train", "tra...
## $ sub_1        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_2        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_3        <dbl> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0...
## $ sub_4        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1...
## $ sub_5        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0...
## $ sub_6        <dbl> 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0...
## $ sub_7        <dbl> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0...
## $ sub_8        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_9        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_10       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_11       <dbl> 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_12       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0...
## $ sub_13       <dbl> 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_14       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_15       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ sub_16       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ Type_t       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0...
## $ type_u       <dbl> 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1...
## $ Method_PI    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0...
## $ Method_SA    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ Method_SP    <dbl> 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1...
## $ Method_VB    <dbl> 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ Gnelson      <dbl> 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0...
## $ GJellis      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0...
## $ Ghstuart     <dbl> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ Gbarry       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0...
## $ GMarshall    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0...
## $ GWoodards    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ GBrad        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ GBiggin      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1...
## $ GRay         <dbl> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ GFletchers   <dbl> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ GRT          <dbl> 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ GSweeney     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ GGreg        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ GNoel        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ GGary        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ GJas         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ GMiles       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ GMcGrath     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ GHodges      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ GKay         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ GStockdale   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ GLove        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ GDouglas     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ GWilliams    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ GVillage     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ GRaine       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ GRendina     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ GChisholm    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ GCollins     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ GLITTLE      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ GNick        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ GHarcourts   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ GCayzer      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ GMoonee      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ GYPA         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0...
table(all_data$CouncilArea)
## 
##                     Banyule       Bayside    Boroondara      Brimbank 
##          1985           359           311           803           238 
##       Darebin     Glen Eira   Hobsons Bay          Hume      Kingston 
##           648           608           286            14            65 
##    Manningham   Maribyrnong     Melbourne        Monash Moonee Valley 
##           181           478           324           132           678 
##      Moreland  Port Phillip   Stonnington    Whitehorse         Yarra 
##           765           441           511           139           455
all_data=all_data %>%
  mutate(CA_Banyule=as.numeric(CouncilArea=="Banyule"),
         CA_Bayside=as.numeric(CouncilArea=="Bayside"),
         CA_Boroondara=as.numeric(CouncilArea=="Boroondara"),
         CA_Brimbank=as.numeric(CouncilArea=="Brimbank"),
         CA_Darebin=as.numeric(CouncilArea=="Darebin"),
         CA_Glen_Eira=as.numeric(CouncilArea=="Glen Eira"),
         CA_Monash=as.numeric(CouncilArea=="Monash"),
         CA_Melbourne=as.numeric(CouncilArea=="Melbourne"),
         CA_Maribyrnong=as.numeric(CouncilArea=="Maribyrnong"),
         CA_Manningham=as.numeric(CouncilArea=="Manningham"),
         CA_Kingston=as.numeric(CouncilArea=="Kingston"),
         CA_Hume=as.numeric(CouncilArea=="Hume"),
         CA_HobsonsB=as.numeric(CouncilArea=="Hobsons Bay"),
         CA_MoonValley=as.numeric(CouncilArea=="Moonee Valley"),
         CA_Moreland=as.numeric(CouncilArea=="Moreland"),
         CA_PortP=as.numeric(CouncilArea=="Port Phillip"),
         CA_Stonnington=as.numeric(CouncilArea=="Stonnington"),
         CA_Whitehorse=as.numeric(CouncilArea=="Whitehorse"),
         CA_Yarra=as.numeric(CouncilArea=="Yarra")) %>% 
  select(-CouncilArea)
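As an alternative to listing every council by hand, base R's `model.matrix()` expands a factor into treatment-coded dummies, leaving out the reference level. A minimal sketch on made-up values, assuming (as above) that the blank level `""` should be the baseline. Note that `model.matrix()` names the columns `councilBanyule` and so on, not `CA_Banyule`.

```r
# Toy council values; "" stands for a missing CouncilArea, as in all_data
council <- factor(c("", "Banyule", "Yarra", "", "Yarra"))

# Make the blank level the reference category (it gets no column)
council <- relevel(council, ref = "")

# model.matrix() builds an intercept plus one 0/1 column per other level;
# dropping the intercept keeps only the dummy columns
dummies <- model.matrix(~ council)[, -1, drop = FALSE]
print(dummies)
```

Rows with a blank council have zeros in every dummy column, which is exactly how the hand-written `CA_` columns encode them.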

Data preparation is now complete, so we will separate the combined data back into the train and test sets.

glimpse(all_data)
## Observations: 9,421
## Variables: 87
## $ Rooms          <int> 3, 5, 3, 2, 5, 3, 3, 3, 4, 2, 3, 2, 2, 2, 4, 3,...
## $ Price          <int> 1650000, 791000, 785000, 755000, 2500000, 30200...
## $ Distance       <dbl> 5.2, 11.2, 8.4, 10.7, 7.5, 7.5, 13.9, 11.2, 12....
## $ Postcode       <int> 3056, 3073, 3015, 3187, 3123, 3123, 3165, 3127,...
## $ Bedroom2       <dbl> 3, 4, 3, 3, 5, 3, 3, 3, 3, 2, 3, 2, 2, 2, 4, 3,...
## $ Bathroom       <dbl> 1, 3, 1, 1, 3, 2, 1, 2, 1, 2, 1, 1, 2, 1, 2, 1,...
## $ Car            <dbl> 2, 1, 1, 2, 3, 2, 1, 4, 2, 2, 2, 1, 1, 1, 1, 2,...
## $ Landsize       <dbl> 495, 961, 185, 452, 757, 832, 710, 816, 452, 0,...
## $ BuildingArea   <dbl> 141, 143, 143, 143, 240, 143, 143, 143, 143, 80...
## $ YearBuilt      <dbl> 1920, 1961, 1961, 1961, 1925, 1961, 1966, 1961,...
## $ data           <chr> "train", "train", "train", "train", "train", "t...
## $ sub_1          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_2          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_3          <dbl> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,...
## $ sub_4          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_5          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,...
## $ sub_6          <dbl> 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0,...
## $ sub_7          <dbl> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0,...
## $ sub_8          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_9          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_10         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_11         <dbl> 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_12         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0,...
## $ sub_13         <dbl> 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_14         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_15         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_16         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ Type_t         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,...
## $ type_u         <dbl> 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0,...
## $ Method_PI      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0,...
## $ Method_SA      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ Method_SP      <dbl> 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ Method_VB      <dbl> 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ Gnelson        <dbl> 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0,...
## $ GJellis        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,...
## $ Ghstuart       <dbl> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ Gbarry         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,...
## $ GMarshall      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,...
## $ GWoodards      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GBrad          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GBiggin        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,...
## $ GRay           <dbl> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GFletchers     <dbl> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GRT            <dbl> 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GSweeney       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GGreg          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GNoel          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GGary          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GJas           <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GMiles         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GMcGrath       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GHodges        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GKay           <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GStockdale     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GLove          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GDouglas       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GWilliams      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GVillage       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GRaine         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GRendina       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GChisholm      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GCollins       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GLITTLE        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GNick          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GHarcourts     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GCayzer        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GMoonee        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GYPA           <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,...
## $ CA_Banyule     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Bayside     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Boroondara  <dbl> 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Brimbank    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Darebin     <dbl> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Glen_Eira   <dbl> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0,...
## $ CA_Monash      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Melbourne   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Maribyrnong <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Manningham  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Kingston    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Hume        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_HobsonsB    <dbl> 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_MoonValley  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Moreland    <dbl> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0,...
## $ CA_PortP       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Stonnington <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,...
## $ CA_Whitehorse  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,...
## $ CA_Yarra       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...

Separating test and train:

train=all_data %>% 
  filter(data=='train') %>% 
  select(-data)
# train now has 7536 observations and 86 variables (85 predictors + Price)

test=all_data %>% 
  filter(data=='test') %>% 
  select(-data,-Price) # test has its original 1885 observations and 85 variables (no Price)
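Because the dummies were created on the combined `all_data`, train and test should share exactly the same predictor columns. A small check, not in the original, could be run at this point. It is shown on two toy data frames so that it runs on its own, and `check_columns()` is a hypothetical name.

```r
# Hypothetical check: the test set should contain exactly the train
# predictors (every train column except the response). predict() looks
# predictors up by name, so a column missing from the test set typically
# makes it stop with an error.
check_columns <- function(train, test, response = "Price") {
  predictors <- setdiff(names(train), response)
  list(same            = setequal(predictors, names(test)),
       missing_in_test = setdiff(predictors, names(test)),
       extra_in_test   = setdiff(names(test), predictors))
}

toy_train <- data.frame(Rooms = 3, Price = 1650000, CA_Yarra = 0)
toy_test  <- data.frame(Rooms = 1, CA_Yarra = 1)
res <- check_columns(toy_train, toy_test)
print(res$same)
```

With the real data the call would be `check_columns(train, test)`.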

Let's view the structure of the train and test datasets:

glimpse(train) #7536 obs and 86 variables.
## Observations: 7,536
## Variables: 86
## $ Rooms          <int> 3, 5, 3, 2, 5, 3, 3, 3, 4, 2, 3, 2, 2, 2, 4, 3,...
## $ Price          <int> 1650000, 791000, 785000, 755000, 2500000, 30200...
## $ Distance       <dbl> 5.2, 11.2, 8.4, 10.7, 7.5, 7.5, 13.9, 11.2, 12....
## $ Postcode       <int> 3056, 3073, 3015, 3187, 3123, 3123, 3165, 3127,...
## $ Bedroom2       <dbl> 3, 4, 3, 3, 5, 3, 3, 3, 3, 2, 3, 2, 2, 2, 4, 3,...
## $ Bathroom       <dbl> 1, 3, 1, 1, 3, 2, 1, 2, 1, 2, 1, 1, 2, 1, 2, 1,...
## $ Car            <dbl> 2, 1, 1, 2, 3, 2, 1, 4, 2, 2, 2, 1, 1, 1, 1, 2,...
## $ Landsize       <dbl> 495, 961, 185, 452, 757, 832, 710, 816, 452, 0,...
## $ BuildingArea   <dbl> 141, 143, 143, 143, 240, 143, 143, 143, 143, 80...
## $ YearBuilt      <dbl> 1920, 1961, 1961, 1961, 1925, 1961, 1966, 1961,...
## $ sub_1          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_2          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_3          <dbl> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,...
## $ sub_4          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_5          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,...
## $ sub_6          <dbl> 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0,...
## $ sub_7          <dbl> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0,...
## $ sub_8          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_9          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_10         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_11         <dbl> 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_12         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0,...
## $ sub_13         <dbl> 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_14         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_15         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_16         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ Type_t         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,...
## $ type_u         <dbl> 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0,...
## $ Method_PI      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0,...
## $ Method_SA      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ Method_SP      <dbl> 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ Method_VB      <dbl> 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ Gnelson        <dbl> 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0,...
## $ GJellis        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,...
## $ Ghstuart       <dbl> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ Gbarry         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,...
## $ GMarshall      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,...
## $ GWoodards      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GBrad          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GBiggin        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,...
## $ GRay           <dbl> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GFletchers     <dbl> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GRT            <dbl> 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GSweeney       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GGreg          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GNoel          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GGary          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GJas           <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GMiles         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GMcGrath       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GHodges        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GKay           <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GStockdale     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GLove          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GDouglas       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GWilliams      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GVillage       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GRaine         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GRendina       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GChisholm      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GCollins       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GLITTLE        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GNick          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GHarcourts     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GCayzer        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GMoonee        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GYPA           <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,...
## $ CA_Banyule     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Bayside     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Boroondara  <dbl> 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Brimbank    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Darebin     <dbl> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Glen_Eira   <dbl> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0,...
## $ CA_Monash      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Melbourne   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Maribyrnong <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Manningham  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Kingston    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Hume        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_HobsonsB    <dbl> 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_MoonValley  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Moreland    <dbl> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0,...
## $ CA_PortP       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Stonnington <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,...
## $ CA_Whitehorse  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,...
## $ CA_Yarra       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
glimpse(test) #1885 obs and 85 variables.
## Observations: 1,885
## Variables: 85
## $ Rooms          <int> 1, 2, 1, 4, 3, 3, 3, 1, 1, 2, 3, 1, 3, 2, 3, 3,...
## $ Distance       <dbl> 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2....
## $ Postcode       <int> 3067, 3067, 3067, 3067, 3067, 3067, 3067, 3067,...
## $ Bedroom2       <dbl> 1, 3, 3, 3, 2, 3, 3, 3, 1, 2, 3, 1, 3, 3, 3, 2,...
## $ Bathroom       <dbl> 1, 2, 2, 2, 1, 2, 2, 2, 1, 2, 2, 1, 2, 2, 2, 1,...
## $ Car            <dbl> 1, 2, 2, 2, 1, 2, 2, 2, 1, 2, 2, 1, 1, 2, 2, 1,...
## $ Landsize       <dbl> 0, 461, 461, 461, 138, 0, 4290, 461, 0, 98, 461...
## $ BuildingArea   <dbl> 151, 151, 151, 151, 105, 151, 27, 151, 151, 128...
## $ YearBuilt      <dbl> 1965, 1965, 1965, 1965, 1890, 2010, 1965, 1965,...
## $ sub_1          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_2          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_3          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_4          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1,...
## $ sub_5          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_6          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_7          <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0,...
## $ sub_8          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_9          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_10         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_11         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_12         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_13         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_14         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_15         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ sub_16         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ Type_t         <dbl> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0,...
## $ type_u         <dbl> 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0,...
## $ Method_PI      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ Method_SA      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ Method_SP      <dbl> 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0,...
## $ Method_VB      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ Gnelson        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0,...
## $ GJellis        <dbl> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0,...
## $ Ghstuart       <dbl> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ Gbarry         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GMarshall      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GWoodards      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GBrad          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,...
## $ GBiggin        <dbl> 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,...
## $ GRay           <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GFletchers     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GRT            <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GSweeney       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GGreg          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,...
## $ GNoel          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GGary          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GJas           <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GMiles         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GMcGrath       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GHodges        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GKay           <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GStockdale     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GLove          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GDouglas       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GWilliams      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GVillage       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GRaine         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GRendina       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GChisholm      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GCollins       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GLITTLE        <dbl> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GNick          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GHarcourts     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GCayzer        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GMoonee        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ GYPA           <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Banyule     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Bayside     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Boroondara  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Brimbank    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Darebin     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Glen_Eira   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Monash      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Melbourne   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Maribyrnong <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Manningham  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Kingston    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Hume        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_HobsonsB    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_MoonValley  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1,...
## $ CA_Moreland    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_PortP       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Stonnington <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Whitehorse  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CA_Yarra       <dbl> 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0,...

Now let's split the train dataset into two parts in a 75:25 ratio.

set.seed(123)
s=sample(1:nrow(train),0.75*nrow(train))
train_75=train[s,] #5652
test_25=train[-s,]  #1884
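The split above works by sampling 75% of the row indices for train_75 and keeping the remaining rows for test_25. The sketch below wraps the same idea in a small helper, `split_75_25`, which is a hypothetical name used only for illustration. It runs on a toy data frame (not the housing data) and checks that the two parts have the expected sizes and share no rows.

```r
# Hypothetical helper: 75:25 split on row indices, same idea as the code above
split_75_25 <- function(df, seed = 123) {
  set.seed(seed)
  idx <- sample(seq_len(nrow(df)), floor(0.75 * nrow(df)))
  list(train = df[idx, , drop = FALSE], holdout = df[-idx, , drop = FALSE])
}

# Toy data standing in for the real train dataset
toy <- data.frame(id = 1:100, Price = runif(100, 3e5, 2e6))
parts <- split_75_25(toy)
nrow(parts$train)    # 75
nrow(parts$holdout)  # 25
length(intersect(parts$train$id, parts$holdout$id))  # 0: no row is in both parts
```

Using `floor()` makes the training size an explicit whole number; with 7536 rows, 0.75 * 7536 is already the integer 5652, so the result matches the row counts noted in the comments above.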

Step 3: Model Building

We will use train_75 to build the linear regression model and test_25 to measure the performance of the resulting model.

Let's build a linear regression model on the train_75 dataset, using all available predictors.

library(car)
## Warning: package 'car' was built under R version 3.3.3
## 
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
## 
##     recode
LRf=lm(Price ~ .,data=train_75)
summary(LRf)
## 
## Call:
## lm(formula = Price ~ ., data = train_75)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1662340  -193462   -21807   150940  3847200 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     6.372e+05  7.479e+05   0.852 0.394246    
## Rooms           2.256e+05  9.758e+03  23.116  < 2e-16 ***
## Distance       -3.843e+04  2.660e+03 -14.445  < 2e-16 ***
## Postcode        1.346e+03  2.005e+02   6.714 2.08e-11 ***
## Bedroom2       -4.562e+04  1.113e+04  -4.099 4.21e-05 ***
## Bathroom        1.585e+05  1.065e+04  14.878  < 2e-16 ***
## Car             6.976e+04  7.054e+03   9.890  < 2e-16 ***
## Landsize        7.163e+00  3.886e+00   1.843 0.065310 .  
## BuildingArea    4.549e+02  6.187e+01   7.352 2.24e-13 ***
## YearBuilt      -1.719e+03  2.087e+02  -8.235  < 2e-16 ***
## sub_1          -1.088e+06  1.607e+05  -6.769 1.43e-11 ***
## sub_2          -1.026e+06  9.489e+04 -10.809  < 2e-16 ***
## sub_3          -9.449e+05  8.868e+04 -10.656  < 2e-16 ***
## sub_4          -9.259e+05  8.855e+04 -10.456  < 2e-16 ***
## sub_5          -8.619e+05  8.791e+04  -9.804  < 2e-16 ***
## sub_6          -8.078e+05  8.786e+04  -9.194  < 2e-16 ***
## sub_7          -7.538e+05  8.713e+04  -8.651  < 2e-16 ***
## sub_8          -7.713e+05  8.672e+04  -8.893  < 2e-16 ***
## sub_9          -6.876e+05  8.906e+04  -7.720 1.37e-14 ***
## sub_10         -5.407e+05  1.026e+05  -5.268 1.43e-07 ***
## sub_11         -4.600e+05  8.803e+04  -5.226 1.80e-07 ***
## sub_12         -4.807e+05  8.764e+04  -5.485 4.33e-08 ***
## sub_13         -3.529e+05  9.646e+04  -3.658 0.000256 ***
## sub_14         -3.220e+05  8.600e+04  -3.744 0.000183 ***
## sub_15         -4.508e+04  9.946e+04  -0.453 0.650437    
## sub_16         -1.503e+05  9.067e+04  -1.658 0.097411 .  
## Type_t         -2.415e+05  1.770e+04 -13.645  < 2e-16 ***
## type_u         -4.015e+05  1.515e+04 -26.509  < 2e-16 ***
## Method_PI      -1.191e+05  1.537e+04  -7.746 1.12e-14 ***
## Method_SA      -5.995e+04  8.179e+04  -0.733 0.463596    
## Method_SP      -4.376e+04  1.577e+04  -2.775 0.005544 ** 
## Method_VB      -8.951e+04  1.758e+04  -5.093 3.65e-07 ***
## Gnelson        -2.796e+04  2.077e+04  -1.346 0.178350    
## GJellis         8.341e+04  2.156e+04   3.869 0.000111 ***
## Ghstuart       -4.208e+04  2.057e+04  -2.045 0.040873 *  
## Gbarry         -2.244e+04  2.433e+04  -0.922 0.356379    
## GMarshall       2.343e+05  2.623e+04   8.931  < 2e-16 ***
## GWoodards      -1.405e+03  3.648e+04  -0.039 0.969285    
## GBrad          -1.072e+04  3.482e+04  -0.308 0.758266    
## GBiggin        -3.286e+04  3.243e+04  -1.013 0.310919    
## GRay           -3.854e+04  2.893e+04  -1.332 0.182864    
## GFletchers     -7.255e+04  3.877e+04  -1.871 0.061394 .  
## GRT             6.609e+04  3.748e+04   1.763 0.077878 .  
## GSweeney       -3.095e+04  4.384e+04  -0.706 0.480230    
## GGreg           1.225e+04  3.979e+04   0.308 0.758218    
## GNoel          -6.138e+04  4.607e+04  -1.332 0.182887    
## GGary           2.250e+04  4.305e+04   0.523 0.601335    
## GJas           -1.155e+04  4.300e+04  -0.269 0.788210    
## GMiles          5.773e+04  4.600e+04   1.255 0.209538    
## GMcGrath        8.733e+04  4.537e+04   1.925 0.054290 .  
## GHodges        -6.915e+04  5.054e+04  -1.368 0.171308    
## GKay            4.588e+05  4.982e+04   9.208  < 2e-16 ***
## GStockdale     -2.027e+04  5.267e+04  -0.385 0.700401    
## GLove          -8.090e+03  5.853e+04  -0.138 0.890067    
## GDouglas       -8.696e+04  6.364e+04  -1.366 0.171866    
## GWilliams       5.151e+04  5.514e+04   0.934 0.350244    
## GVillage        3.106e+04  5.869e+04   0.529 0.596665    
## GRaine         -1.486e+05  6.267e+04  -2.371 0.017797 *  
## GRendina       -4.041e+02  6.693e+04  -0.006 0.995183    
## GChisholm       3.283e+04  6.953e+04   0.472 0.636858    
## GCollins        2.181e+05  7.278e+04   2.996 0.002743 ** 
## GLITTLE        -1.246e+05  7.579e+04  -1.644 0.100222    
## GNick           3.151e+04  9.411e+04   0.335 0.737771    
## GHarcourts     -1.778e+04  7.775e+04  -0.229 0.819166    
## GCayzer        -5.045e+04  7.335e+04  -0.688 0.491568    
## GMoonee        -1.097e+05  8.480e+04  -1.294 0.195805    
## GYPA           -1.072e+05  9.894e+04  -1.083 0.278712    
## CA_Banyule     -1.345e+05  3.341e+04  -4.026 5.75e-05 ***
## CA_Bayside     -1.619e+05  4.886e+04  -3.314 0.000926 ***
## CA_Boroondara  -9.435e+04  2.701e+04  -3.494 0.000480 ***
## CA_Brimbank    -5.146e+04  4.044e+04  -1.272 0.203279    
## CA_Darebin     -8.003e+04  2.555e+04  -3.132 0.001742 ** 
## CA_Glen_Eira   -2.365e+04  2.909e+04  -0.813 0.416168    
## CA_Monash      -4.768e+04  4.569e+04  -1.044 0.296687    
## CA_Melbourne   -9.328e+03  3.490e+04  -0.267 0.789258    
## CA_Maribyrnong -9.527e+04  3.341e+04  -2.852 0.004367 ** 
## CA_Manningham  -9.730e+04  4.275e+04  -2.276 0.022875 *  
## CA_Kingston    -1.910e+05  6.450e+04  -2.961 0.003084 ** 
## CA_Hume        -9.056e+04  1.809e+05  -0.501 0.616592    
## CA_HobsonsB    -6.407e+04  3.956e+04  -1.619 0.105404    
## CA_MoonValley  -5.413e+04  2.719e+04  -1.991 0.046514 *  
## CA_Moreland    -1.158e+05  2.443e+04  -4.739 2.20e-06 ***
## CA_PortP       -9.684e+04  3.581e+04  -2.705 0.006860 ** 
## CA_Stonnington -6.407e+04  2.982e+04  -2.149 0.031700 *  
## CA_Whitehorse  -2.575e+04  4.507e+04  -0.571 0.567739    
## CA_Yarra       -1.164e+05  3.162e+04  -3.680 0.000236 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 370000 on 5566 degrees of freedom
## Multiple R-squared:  0.6914, Adjusted R-squared:  0.6866 
## F-statistic: 146.7 on 85 and 5566 DF,  p-value: < 2.2e-16

To deal with multicollinearity, we remove variables whose VIF (variance inflation factor) is greater than 5, as follows:

a=vif(LRf)
sort(a,decreasing = T)[1:3]
##    sub_3    sub_7    sub_8 
## 36.00506 35.58604 32.98879
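For a numeric predictor, the VIF is 1 / (1 - R²), where R² comes from regressing that predictor on all the other predictors. The sketch below computes this by hand on synthetic data so the number reported by `car::vif()` is easier to interpret; `manual_vif` is a hypothetical helper, and we assume (rather than guarantee) that it agrees with `car::vif()` when every predictor is numeric, as the 0/1 dummy columns here are.

```r
# Sketch: VIF of each column, from regressing it on all the other columns
manual_vif <- function(X) {
  X <- as.data.frame(X)
  vapply(names(X), function(v) {
    others <- setdiff(names(X), v)
    r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
    1 / (1 - r2)
  }, numeric(1))
}

set.seed(1)
x1 <- rnorm(200)
# x2 is almost a copy of x1, x3 is unrelated noise
X <- data.frame(x1 = x1, x2 = x1 + rnorm(200, sd = 0.1), x3 = rnorm(200))
v <- manual_vif(X)
round(v, 1)  # x1 and x2 far above 5, x3 close to 1
```

The near-duplicate pair gets a very large VIF, which is the same pattern seen for the sub_* dummies above.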

Removing the variables sub_3 and Postcode:

LRf=lm(Price ~ .-Postcode-sub_3,data=train_75)
summary(LRf)
## 
## Call:
## lm(formula = Price ~ . - Postcode - sub_3, data = train_75)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1674852  -198669   -26909   150628  3836158 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     3.866e+06  4.137e+05   9.345  < 2e-16 ***
## Rooms           2.277e+05  9.893e+03  23.013  < 2e-16 ***
## Distance       -3.406e+04  2.593e+03 -13.133  < 2e-16 ***
## Bedroom2       -4.694e+04  1.129e+04  -4.156 3.29e-05 ***
## Bathroom        1.564e+05  1.082e+04  14.458  < 2e-16 ***
## Car             7.201e+04  7.163e+03  10.053  < 2e-16 ***
## Landsize        6.351e+00  3.946e+00   1.609 0.107598    
## BuildingArea    4.603e+02  6.284e+01   7.326 2.72e-13 ***
## YearBuilt      -1.742e+03  2.120e+02  -8.217 2.57e-16 ***
## sub_1          -2.075e+05  1.374e+05  -1.511 0.130859    
## sub_2          -1.179e+05  3.799e+04  -3.103 0.001924 ** 
## sub_4          -6.069e+03  2.328e+04  -0.261 0.794341    
## sub_5           8.810e+04  2.245e+04   3.924 8.82e-05 ***
## sub_6           1.423e+05  2.396e+04   5.939 3.05e-09 ***
## sub_7           2.094e+05  2.452e+04   8.541  < 2e-16 ***
## sub_8           1.950e+05  2.572e+04   7.581 3.99e-14 ***
## sub_9           2.950e+05  2.927e+04  10.077  < 2e-16 ***
## sub_10          3.789e+05  5.808e+04   6.524 7.43e-11 ***
## sub_11          4.560e+05  3.622e+04  12.592  < 2e-16 ***
## sub_12          4.518e+05  3.545e+04  12.744  < 2e-16 ***
## sub_13          6.588e+05  4.791e+04  13.749  < 2e-16 ***
## sub_14          5.878e+05  3.265e+04  18.005  < 2e-16 ***
## sub_15          9.648e+05  5.414e+04  17.820  < 2e-16 ***
## sub_16          7.646e+05  4.175e+04  18.315  < 2e-16 ***
## Type_t         -2.417e+05  1.797e+04 -13.448  < 2e-16 ***
## type_u         -3.962e+05  1.536e+04 -25.786  < 2e-16 ***
## Method_PI      -1.226e+05  1.561e+04  -7.852 4.89e-15 ***
## Method_SA      -6.103e+04  8.307e+04  -0.735 0.462539    
## Method_SP      -5.110e+04  1.600e+04  -3.194 0.001409 ** 
## Method_VB      -9.258e+04  1.785e+04  -5.188 2.20e-07 ***
## Gnelson        -5.174e+04  2.086e+04  -2.480 0.013163 *  
## GJellis         9.423e+04  2.188e+04   4.307 1.68e-05 ***
## Ghstuart       -3.412e+04  2.087e+04  -1.635 0.102088    
## Gbarry         -4.847e+04  2.458e+04  -1.972 0.048676 *  
## GMarshall       2.669e+05  2.649e+04  10.075  < 2e-16 ***
## GWoodards       6.422e+03  3.704e+04   0.173 0.862365    
## GBrad          -3.629e+04  3.522e+04  -1.030 0.302837    
## GBiggin        -1.909e+04  3.285e+04  -0.581 0.561153    
## GRay           -4.027e+04  2.936e+04  -1.372 0.170205    
## GFletchers     -3.266e+04  3.920e+04  -0.833 0.404799    
## GRT             9.239e+04  3.801e+04   2.431 0.015107 *  
## GSweeney       -5.954e+04  4.442e+04  -1.341 0.180107    
## GGreg           8.703e+03  4.041e+04   0.215 0.829491    
## GNoel          -4.855e+04  4.678e+04  -1.038 0.299470    
## GGary           3.866e+04  4.361e+04   0.887 0.375343    
## GJas           -3.798e+04  4.351e+04  -0.873 0.382775    
## GMiles          4.045e+04  4.653e+04   0.869 0.384626    
## GMcGrath        1.018e+05  4.604e+04   2.210 0.027112 *  
## GHodges        -5.632e+04  5.130e+04  -1.098 0.272371    
## GKay            4.870e+05  5.055e+04   9.633  < 2e-16 ***
## GStockdale     -6.112e+04  5.340e+04  -1.145 0.252404    
## GLove          -2.331e+04  5.944e+04  -0.392 0.694925    
## GDouglas       -1.426e+05  6.439e+04  -2.214 0.026859 *  
## GWilliams       6.003e+04  5.600e+04   1.072 0.283760    
## GVillage        2.473e+03  5.941e+04   0.042 0.966800    
## GRaine         -1.677e+05  6.362e+04  -2.636 0.008422 ** 
## GRendina       -3.314e+04  6.782e+04  -0.489 0.625105    
## GChisholm       6.223e+04  7.054e+04   0.882 0.377668    
## GCollins        1.963e+05  7.381e+04   2.660 0.007835 ** 
## GLITTLE        -1.289e+05  7.698e+04  -1.674 0.094166 .  
## GNick           5.440e+04  9.555e+04   0.569 0.569162    
## GHarcourts     -3.790e+04  7.893e+04  -0.480 0.631142    
## GCayzer        -2.627e+04  7.445e+04  -0.353 0.724150    
## GMoonee        -1.301e+05  8.611e+04  -1.510 0.131029    
## GYPA           -1.412e+05  1.005e+05  -1.406 0.159908    
## CA_Banyule     -1.519e+05  3.391e+04  -4.480 7.62e-06 ***
## CA_Bayside     -1.310e+05  4.941e+04  -2.652 0.008013 ** 
## CA_Boroondara  -2.486e+04  2.688e+04  -0.925 0.355181    
## CA_Brimbank    -1.321e+05  4.013e+04  -3.291 0.001004 ** 
## CA_Darebin     -1.115e+05  2.580e+04  -4.322 1.57e-05 ***
## CA_Glen_Eira    3.886e+04  2.824e+04   1.376 0.168950    
## CA_Monash      -4.451e+03  4.602e+04  -0.097 0.922957    
## CA_Melbourne   -5.553e+04  3.514e+04  -1.580 0.114066    
## CA_Maribyrnong -1.804e+05  3.243e+04  -5.562 2.79e-08 ***
## CA_Manningham  -1.255e+05  4.304e+04  -2.917 0.003552 ** 
## CA_Kingston    -1.064e+05  6.425e+04  -1.655 0.097912 .  
## CA_Hume        -1.012e+05  1.837e+05  -0.551 0.581728    
## CA_HobsonsB    -1.582e+05  3.810e+04  -4.150 3.37e-05 ***
## CA_MoonValley  -1.191e+05  2.565e+04  -4.644 3.50e-06 ***
## CA_Moreland    -1.515e+05  2.456e+04  -6.169 7.35e-10 ***
## CA_PortP        2.208e+04  3.123e+04   0.707 0.479515    
## CA_Stonnington  1.269e+04  2.892e+04   0.439 0.660879    
## CA_Whitehorse  -1.870e+04  4.575e+04  -0.409 0.682699    
## CA_Yarra       -1.139e+05  3.211e+04  -3.548 0.000392 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 375800 on 5568 degrees of freedom
## Multiple R-squared:  0.6815, Adjusted R-squared:  0.6767 
## F-statistic: 143.5 on 83 and 5568 DF,  p-value: < 2.2e-16
a=vif(LRf)
sort(a,decreasing = T)[1:3]
## Bedroom2    Rooms Distance 
## 3.770476 3.705437 3.549597
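All remaining VIFs are now below 5. Refitting with `. - Postcode - sub_3` rewrites the whole formula; base R's `update()` expresses the same "drop these terms from the existing model" step more compactly. The sketch below uses toy data and checks that both routes give identical coefficients.

```r
set.seed(42)
d <- data.frame(y = rnorm(50), a = rnorm(50), b = rnorm(50), c = rnorm(50))

fit_full  <- lm(y ~ ., data = d)
# Route 1: write the exclusion into a new formula (as done above)
fit_drop1 <- lm(y ~ . - c, data = d)
# Route 2: start from the existing fit and remove the term with update()
fit_drop2 <- update(fit_full, . ~ . - c)

all.equal(coef(fit_drop1), coef(fit_drop2))  # TRUE
```

In this document the equivalent call would be `update(LRf, . ~ . - Postcode - sub_3)` applied to the first full model.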

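Next, we remove variables whose p-value is greater than 0.05, one at a time, refitting the model after each removal. The final formula below lists every variable that was dropped.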

LRf=lm(Price ~ .-Landsize-GRaine-GMoonee-CA_Bayside-GLITTLE-Gnelson-GSweeney-Ghstuart-CA_Kingston-Gbarry-GRay-GStockdale-GNoel-GJas-GBiggin-GYPA-CA_PortP-CA_Whitehorse-GRendina-GFletchers-GBrad-GHodges-GVillage-GLove-sub_4-GGary-CA_Hume-CA_Boroondara-Method_SA-GWilliams-GHarcourts-GNick-GGreg-CA_Monash-GWoodards-CA_Stonnington-GCayzer-Postcode-sub_3,data=train_75)
summary(LRf)
## 
## Call:
## lm(formula = Price ~ . - Landsize - GRaine - GMoonee - CA_Bayside - 
##     GLITTLE - Gnelson - GSweeney - Ghstuart - CA_Kingston - Gbarry - 
##     GRay - GStockdale - GNoel - GJas - GBiggin - GYPA - CA_PortP - 
##     CA_Whitehorse - GRendina - GFletchers - GBrad - GHodges - 
##     GVillage - GLove - sub_4 - GGary - CA_Hume - CA_Boroondara - 
##     Method_SA - GWilliams - GHarcourts - GNick - GGreg - CA_Monash - 
##     GWoodards - CA_Stonnington - GCayzer - Postcode - sub_3, 
##     data = train_75)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1656253  -198103   -26651   152266  3860482 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    3895526.2   410315.4   9.494  < 2e-16 ***
## Rooms           230453.8     9817.6  23.474  < 2e-16 ***
## Distance        -36543.2     2177.0 -16.786  < 2e-16 ***
## Bedroom2        -49181.9    10962.2  -4.486 7.39e-06 ***
## Bathroom        152338.3     9796.0  15.551  < 2e-16 ***
## Car              71883.3     6932.3  10.369  < 2e-16 ***
## BuildingArea       456.2       62.7   7.276 3.90e-13 ***
## YearBuilt        -1758.6      210.9  -8.339  < 2e-16 ***
## sub_1          -273475.0   105758.2  -2.586  0.00974 ** 
## sub_2          -112772.3    36858.2  -3.060  0.00223 ** 
## sub_5            94069.4    19621.4   4.794 1.68e-06 ***
## sub_6           144194.2    19408.8   7.429 1.26e-13 ***
## sub_7           216978.0    20449.8  10.610  < 2e-16 ***
## sub_8           196349.4    21932.0   8.953  < 2e-16 ***
## sub_9           299226.1    27050.0  11.062  < 2e-16 ***
## sub_10          377663.0    55937.6   6.752 1.61e-11 ***
## sub_11          456073.2    31635.9  14.416  < 2e-16 ***
## sub_12          450564.6    31299.9  14.395  < 2e-16 ***
## sub_13          597071.7    34730.8  17.191  < 2e-16 ***
## sub_14          580413.7    27760.1  20.908  < 2e-16 ***
## sub_15          898556.4    40814.8  22.015  < 2e-16 ***
## sub_16          768474.4    38796.4  19.808  < 2e-16 ***
## Type_t         -241570.0    17712.5 -13.638  < 2e-16 ***
## type_u         -394543.4    15236.5 -25.895  < 2e-16 ***
## Method_PI      -122614.8    15513.2  -7.904 3.23e-15 ***
## Method_SP       -49640.2    15859.5  -3.130  0.00176 ** 
## Method_VB       -90027.6    17697.0  -5.087 3.75e-07 ***
## GJellis         119215.4    18021.3   6.615 4.05e-11 ***
## GMarshall       291140.0    23689.8  12.290  < 2e-16 ***
## GRT             112799.9    36148.7   3.120  0.00181 ** 
## GMiles           74389.9    44864.5   1.658  0.09735 .  
## GMcGrath        128966.3    44398.0   2.905  0.00369 ** 
## GKay            515973.8    48812.4  10.571  < 2e-16 ***
## GDouglas       -104470.2    62903.4  -1.661  0.09681 .  
## GChisholm       102832.9    67807.5   1.517  0.12944    
## GCollins        223817.5    72596.2   3.083  0.00206 ** 
## CA_Banyule     -149576.7    32588.6  -4.590 4.53e-06 ***
## CA_Brimbank    -127545.6    38862.2  -3.282  0.00104 ** 
## CA_Darebin     -121975.2    22895.1  -5.328 1.03e-07 ***
## CA_Glen_Eira     63622.7    24966.5   2.548  0.01085 *  
## CA_Melbourne    -66990.4    31641.0  -2.117  0.03429 *  
## CA_Maribyrnong -182664.9    26894.4  -6.792 1.22e-11 ***
## CA_Manningham  -116875.2    40860.4  -2.860  0.00425 ** 
## CA_HobsonsB    -138397.7    33563.2  -4.123 3.79e-05 ***
## CA_MoonValley  -136609.5    21863.4  -6.248 4.45e-10 ***
## CA_Moreland    -167533.8    22109.9  -7.577 4.10e-14 ***
## CA_Yarra       -130267.9    28117.7  -4.633 3.69e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 376000 on 5605 degrees of freedom
## Multiple R-squared:  0.6791, Adjusted R-squared:  0.6764 
## F-statistic: 257.8 on 46 and 5605 DF,  p-value: < 2.2e-16
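The one-at-a-time removal described above can be automated as backward elimination: fit, drop the single predictor with the largest p-value, refit, and stop once every remaining p-value is at most 0.05. The sketch below is an illustration on synthetic data, not the procedure that produced the formula above (that model still keeps a few terms with p > 0.05). `backward_eliminate` is a hypothetical helper, and it assumes numeric, non-aliased predictors such as the 0/1 dummies used here.

```r
# Hypothetical helper: backward elimination by p-value
backward_eliminate <- function(data, response, alpha = 0.05) {
  preds <- setdiff(names(data), response)
  repeat {
    fit <- lm(reformulate(preds, response = response), data = data)
    # p-values of all slope coefficients (intercept row dropped)
    p <- summary(fit)$coefficients[-1, "Pr(>|t|)"]
    # stop when only one predictor is left or all remaining ones are significant
    if (length(preds) == 1 || max(p) <= alpha) return(fit)
    # otherwise remove the single least significant predictor and refit
    preds <- setdiff(preds, names(which.max(p)))
  }
}

set.seed(7)
n <- 300
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n),
                  z1 = rnorm(n), z2 = rnorm(n), z3 = rnorm(n))
toy$Price <- 2 * toy$x1 - 3 * toy$x2 + rnorm(n)  # only x1 and x2 carry signal
final_fit <- backward_eliminate(toy, "Price")
names(coef(final_fit))
```

Removing one term per refit matters because p-values shift whenever a correlated predictor leaves the model; dropping all insignificant terms in a single step can discard variables that would have become significant.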

The linear regression model is now built. Note that three terms (GMiles, GDouglas and GChisholm) still have p-values above 0.05 in this final fit.

Step 4: Performance measurement of the model.

Let's check the performance of the model on test_25 and calculate the RMSE.

PP_test_25=predict(LRf,newdata =test_25)
PP_test_25=round(PP_test_25,1)
class(PP_test_25)
## [1] "numeric"

PP_test_25 contains the predicted prices for the corresponding test_25 observations, based on the model LRf.

Calculating the RMSE and plotting the graph

# Plot the real price against the predicted price for test_25
plot(test_25$Price,PP_test_25)

res=test_25$Price-PP_test_25 # residuals: real value - predicted value
# root mean squared error (RMSE)
RMSE_test_25=sqrt(mean(res^2))
RMSE_test_25
## [1] 387505.6
# Passing criterion given for this task: the score 212467/RMSE must exceed 0.5. The model reaches about 0.548, so it passes.
212467/RMSE_test_25
## [1] 0.548294
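An RMSE of 387505.6 is easier to judge next to a naive baseline that predicts the same price for every property. The sketch below packages the RMSE formula used above into a small function and compares a model's predictions with a constant "training mean" prediction on made-up numbers; the values and the `train_mean` constant are hypothetical.

```r
# RMSE = sqrt(mean((actual - predicted)^2)), same formula as above
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

actual     <- c(100, 200, 300)
predicted  <- c(110, 190, 320)
train_mean <- 200  # hypothetical mean Price from the training split

rmse(actual, predicted)                        # sqrt(200), about 14.14
rmse(actual, rep(train_mean, length(actual)))  # sqrt(20000 / 3), about 81.65
```

On the real data, the comparison would be `rmse(test_25$Price, PP_test_25)` against `rmse(test_25$Price, rep(mean(train_75$Price), nrow(test_25)))`; a useful model should clearly beat the baseline.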

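Let's create diagnostic plots to check the regression assumptions (linearity, normality of residuals, constant variance) and to look for influential observations: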

library(ggplot2)
d=data.frame(real=test_25$Price,predicted=PP_test_25)
ggplot(d,aes(x=real,y=predicted))+geom_point()

plot(LRf,which = 1) # residuals vs fitted values

plot(LRf,which = 2) # normal Q-Q plot of the residuals

plot(LRf,which = 3) # scale-location plot

plot(LRf,which = 4) # Cook's distance
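The Cook's distance plot can also be turned into a list of row numbers worth inspecting. The sketch below flags observations whose Cook's distance exceeds 4/n, a common rule of thumb rather than a strict test. `flag_influential` is a hypothetical helper, and the data are synthetic with one planted outlier.

```r
# Sketch: observations whose Cook's distance exceeds a rule-of-thumb cutoff
flag_influential <- function(fit, cutoff = 4 / nobs(fit)) {
  d <- cooks.distance(fit)
  which(d > cutoff)
}

set.seed(3)
x <- rnorm(50)
y <- 1 + 2 * x + rnorm(50, sd = 0.5)
x[50] <- 4; y[50] <- -10  # plant one high-leverage outlier
fit <- lm(y ~ x)
flag_influential(fit)     # row 50 is among the flagged observations
```

On this document's model, `flag_influential(LRf)` would return the train_75 rows behind the tallest bars in `plot(LRf, which = 4)`.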

Step 5: Predicting Real Estate Prices for the final Test Dataset.

Using the model built above, we can now predict prices for the test dataset as follows:

PP_test_final=predict(LRf,newdata =test)
PP_test_final=round(PP_test_final,1)
class(PP_test_final)
## [1] "numeric"
write.csv(PP_test_final, "PP_test_final.csv") # writes the predicted prices to PP_test_final.csv in the current working directory
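`predict()` only works if the test data carries every predictor column the model uses, which means the test set must have been through the same dummy-variable preparation as the training set. If a column is missing, R either stops with an "object not found" error or, if an object with that name happens to exist in the formula's environment, can silently use it instead. The sketch below checks for missing columns before predicting and writing the CSV. `missing_predictors` is a hypothetical helper that assumes main-effect terms only (no interactions), and the output goes to a temporary file.

```r
# Hypothetical helper: model predictors that are absent from newdata
missing_predictors <- function(fit, newdata) {
  setdiff(attr(terms(fit), "term.labels"), names(newdata))
}

d   <- data.frame(y = c(1, 3, 2, 5, 4), a = c(1, 2, 3, 4, 5), b = c(0, 1, 0, 1, 1))
fit <- lm(y ~ ., data = d)

new_ok  <- data.frame(a = 6, b = 0)
new_bad <- data.frame(a = 6)
missing_predictors(fit, new_ok)   # character(0)
missing_predictors(fit, new_bad)  # "b"

if (length(missing_predictors(fit, new_ok)) == 0) {
  preds <- round(predict(fit, newdata = new_ok), 1)
  out <- tempfile(fileext = ".csv")
  write.csv(data.frame(row = seq_along(preds), Price = preds), out, row.names = FALSE)
}
```

Running `missing_predictors(LRf, test)` before Step 5 would catch a mismatch between the prepared train and test columns early.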

Conclusion:

Real estate prices were predicted with a linear regression model that reaches an adjusted R-squared of 0.6764 on train_75 and an RMSE of about 387,506 on the held-out test_25 split.