31/01/2020

Introduction

  • Kaggle is an online community of data scientists and machine learning practitioners that hosts datasets for training and testing data analysis algorithms
  • melb_data.csv is a freely available dataset from Kaggle containing information about the housing market in Melbourne, Australia
  • The goal of this project is to plot, on a map of the city, information about 1000 houses sampled from the dataset
  • The map can be used as a basis for further analysis

Dataset Information

  • The dataset size (rows, columns) is:
## [1] 18396    22
  • It contains the following columns:
##  [1] "X"             "Suburb"        "Address"       "Rooms"        
##  [5] "Type"          "Price"         "Method"        "SellerG"      
##  [9] "Date"          "Distance"      "Postcode"      "Bedroom2"     
## [13] "Bathroom"      "Car"           "Landsize"      "BuildingArea" 
## [17] "YearBuilt"     "CouncilArea"   "Lattitude"     "Longtitude"   
## [21] "Regionname"    "Propertycount"

Dataset Transformation

  • The dataset was reduced by removing all rows without longitude and latitude information for the house
  • A further variable was added to split the dataset into 4 price categories:
    • Low Price (up to $640,000)
    • Medium Low Price (from $640,000 to $900,000)
    • Medium High Price (from $900,000 to $1,320,000)
    • High Price (from $1,320,000 to $9,000,000)
  • A sample of 1000 houses was extracted from the dataset in order to plot the information on the map
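
The three transformation steps could be sketched in base R roughly as follows. This is only an illustration under stated assumptions, not the project's actual code: the toy data frame stands in for melb_data.csv with invented values, the names `houses`, `located`, `PriceCategory` and `house_sample` are hypothetical, and a price that falls exactly on a boundary is assumed to belong to the lower category.

```r
# Toy stand-in for melb_data.csv; the values are invented for illustration.
houses <- data.frame(
  Price      = c(500000, 640000, 750000, 1000000, 1500000, 820000),
  Lattitude  = c(-37.80, -37.81, NA, -37.79, -37.85, -37.90),
  Longtitude = c(144.99, 144.95, 145.00, NA, 144.98, 145.02)
)

# 1. Keep only the rows where both coordinates are present.
located <- houses[!is.na(houses$Lattitude) & !is.na(houses$Longtitude), ]

# 2. Add the price category. cut() closes intervals on the right by default,
#    so a boundary price such as 640000 lands in the lower category
#    (an assumption; the slides do not say which side is inclusive).
located$PriceCategory <- cut(
  located$Price,
  breaks = c(0, 640000, 900000, 1320000, 9000000),
  labels = c("Low Price", "Medium Low Price",
             "Medium High Price", "High Price"),
  include.lowest = TRUE
)

# 3. Draw a random sample of up to 1000 rows without replacement.
set.seed(1)  # arbitrary seed, only to make the sketch reproducible
n <- min(1000, nrow(located))
house_sample <- located[sample(nrow(located), n), ]
```

On the real dataset, `houses` would instead come from something like `read.csv("melb_data.csv")`, and `n` would be 1000.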

House Pricing Map
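
A map of this kind might be drawn along the following lines. The slides do not say which mapping library was used, so the choice of the leaflet package is an assumption, as are the colors, the marker size and the three invented houses in `house_sample`.

```r
library(leaflet)

# Hypothetical stand-in for the 1000 sampled houses (invented values).
categories <- c("Low Price", "Medium Low Price",
                "Medium High Price", "High Price")
house_sample <- data.frame(
  Lattitude     = c(-37.80, -37.81, -37.85),
  Longtitude    = c(144.99, 144.95, 144.98),
  Price         = c(500000, 820000, 1500000),
  PriceCategory = factor(c("Low Price", "Medium Low Price", "High Price"),
                         levels = categories)
)

# One color per price category (colors chosen arbitrarily).
pal <- colorFactor(
  palette = c("green", "gold", "orange", "red"),
  domain  = house_sample$PriceCategory
)

# Base tiles, one circle marker per house, and a legend for the categories.
house_map <- leaflet(house_sample) %>%
  addTiles() %>%
  addCircleMarkers(
    lng    = ~Longtitude,
    lat    = ~Lattitude,
    color  = ~pal(PriceCategory),
    radius = 5,
    popup  = ~paste0(PriceCategory, ": ",
                     format(Price, big.mark = ",", scientific = FALSE))
  ) %>%
  addLegend(position = "bottomright", pal = pal,
            values = ~PriceCategory, title = "Price category")

house_map  # printing the widget renders the interactive map
```

The column names `Lattitude` and `Longtitude` keep the spelling used in the dataset itself.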

Acknowledgment