Created by James Lim - 2016-03-27

Introduction

This paper attempts to determine the factors (variables) that may affect the PROFITS or LOSS on the sale of a Singapore property. The data source is here

Questions a typical property buyer or seller will ask about PROFITS are:

  1. Does the District determine the highest profits? Or is it the type of dwelling?
  2. Is it the Holding Period, i.e. the longer the property is held, the higher the chance of a higher profit?
  3. Is it the timing of the BUY and SELL dates that makes the highest profits?
  4. Or is it a combination of the above variables that generates the highest profits?

If you are given the set of data below, how do you know which variables affect the profits?

Statistically, we say PROFITS is the response variable, while the others are independent variables.

Algorithm 1: Boruta (for data scientists only)

## Warning in TentativeRoughFix(boruta.train): There are no Tentative attributes! Returning original
## object.
##                 meanImp medianImp    minImp    maxImp  normHits  decision
## TYPE           9.679534  9.626888  7.832471 11.130314 1.0000000 Confirmed
## DISTRICT       2.966274  2.850645  1.742487  4.477639 0.9285714 Confirmed
## AREA          12.340088 12.296343 10.605783 14.337192 1.0000000 Confirmed
## SALES         13.832120 13.839041 12.245022 16.109153 1.0000000 Confirmed
## PURCHASE       5.699052  5.725489  3.490076  7.532352 1.0000000 Confirmed
## HOLDING_YEARS 12.099892 12.443896  9.180466 13.498201 1.0000000 Confirmed

The decision column shows which variables are important: all six are Confirmed.

In the Boruta plot, blue boxplots correspond to the minimal, average and maximum Z score of a shadow attribute.

Red boxplots represent Z scores of rejected attributes.

Yellow boxplots represent Z scores of tentative attributes.

Green boxplots represent Z scores of confirmed attributes.
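
For readers curious about what a "shadow attribute" is, here is a minimal sketch in Python of the idea, not the actual Boruta code. A shadow is a shuffled copy of a real variable, so it cannot carry real information; a real variable scores a "hit" in a run when it beats the best shadow, and the share of hits is what the normHits column reports. Assumptions: absolute correlation stands in for the random forest importance that Boruta really uses, the statistical test behind the final decision is left out, and the names `abs_corr`, `shadow_hits`, `signal` and `noise` are made up for this toy example.

```python
import random
import statistics

def abs_corr(xs, ys):
    """Absolute Pearson correlation, used here as a stand-in importance score."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5

def shadow_hits(features, response, runs=50, seed=0):
    """Share of runs in which each feature beats the best shuffled (shadow) copy."""
    rng = random.Random(seed)
    hits = {name: 0 for name in features}
    for _ in range(runs):
        # Build one shadow per feature by shuffling its values.
        shadow_scores = []
        for values in features.values():
            shuffled = list(values)
            rng.shuffle(shuffled)
            shadow_scores.append(abs_corr(shuffled, response))
        best_shadow = max(shadow_scores)
        # A real feature scores a hit when it beats the best shadow.
        for name, values in features.items():
            if abs_corr(values, response) > best_shadow:
                hits[name] += 1
    return {name: count / runs for name, count in hits.items()}

# Toy data (made up): the response depends on `signal` only.
signal = list(range(40))
noise = [i % 2 for i in range(40)]
response = [3 * v + 5 for v in signal]
hits = shadow_hits({"signal": signal, "noise": noise}, response)
print(hits)
```

In this toy run `signal` beats the best shadow every time, while `noise` does so only occasionally. That is the pattern behind a Confirmed versus a Rejected decision.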

Algorithm 2: RFE (Recursive Feature Elimination) to determine variables (for data scientists)

Laymen need not worry about what RFE is; the point is only to check that it gives a result that agrees with, or is similar to, Algorithm 1.
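
For those who do want the idea, RFE fits a model on all variables, drops the weakest one, refits on what is left, and repeats. Below is a minimal sketch in Python, not the routine that produced the output in this paper. Assumptions: a plain least-squares fit on standardized columns stands in for whatever model the real run used, the size of each coefficient stands in for importance, and all names (`standardize`, `solve`, `fit_coefficients`, `rfe_ranking`, the toy columns `a`, `b`, `c`) are made up for illustration.

```python
import statistics

def standardize(column):
    """Center a column and scale it to unit standard deviation."""
    mean = statistics.fmean(column)
    sd = statistics.pstdev(column)
    return [(v - mean) / sd for v in column]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_coefficients(columns, response):
    """Least-squares coefficients of the centered response on standardized columns."""
    names = list(columns)
    xs = [standardize(columns[name]) for name in names]
    mean_y = statistics.fmean(response)
    y = [v - mean_y for v in response]
    gram = [[sum(p * q for p, q in zip(xi, xj)) for xj in xs] for xi in xs]
    rhs = [sum(p * q for p, q in zip(xi, y)) for xi in xs]
    return dict(zip(names, solve(gram, rhs)))

def rfe_ranking(columns, response):
    """Repeatedly refit and drop the weakest variable; return most important first."""
    remaining = dict(columns)
    eliminated = []
    while len(remaining) > 1:
        coefs = fit_coefficients(remaining, response)
        weakest = min(coefs, key=lambda name: abs(coefs[name]))
        eliminated.append(weakest)
        del remaining[weakest]
    eliminated.extend(remaining)
    return eliminated[::-1]

# Toy data (made up): the response depends strongly on a, weakly on b, not on c.
a = list(range(12))
b = [i % 3 for i in range(12)]
c = [i % 4 for i in range(12)]
y = [4 * ai + bi for ai, bi in zip(a, b)]
ranking = rfe_ranking({"a": a, "b": b, "c": c}, y)
print(ranking)  # ['a', 'b', 'c']
```

The refit after every drop is what makes the elimination "recursive": a variable's importance is judged again each time a competitor is removed.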

## [1] "SALES"         "HOLDING_YEARS" "AREA"          "TYPE"          "PURCHASE"      "DISTRICT"
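
To check how similar the two algorithms are, the sketch below ranks the variables by the Boruta meanImp and medianImp values copied from the table above and compares each ranking with the RFE list. It assumes the RFE output lists variables from most to least important; the helper names `ranked` and `spearman` are made up for this example.

```python
# (meanImp, medianImp) copied from the Boruta table above.
boruta = {
    "TYPE": (9.679534, 9.626888),
    "DISTRICT": (2.966274, 2.850645),
    "AREA": (12.340088, 12.296343),
    "SALES": (13.832120, 13.839041),
    "PURCHASE": (5.699052, 5.725489),
    "HOLDING_YEARS": (12.099892, 12.443896),
}

# RFE output copied from above, assumed to be ordered most important first.
rfe_order = ["SALES", "HOLDING_YEARS", "AREA", "TYPE", "PURCHASE", "DISTRICT"]

def ranked(stat_index):
    """Variable names sorted from highest to lowest importance."""
    return sorted(boruta, key=lambda name: boruta[name][stat_index], reverse=True)

def spearman(order_a, order_b):
    """Spearman rank correlation between two orderings of the same names (no ties)."""
    n = len(order_a)
    d2 = sum((order_a.index(name) - order_b.index(name)) ** 2 for name in order_a)
    return 1 - 6 * d2 / (n * (n * n - 1))

by_mean = ranked(0)
by_median = ranked(1)

print(by_mean)
# ['SALES', 'AREA', 'HOLDING_YEARS', 'TYPE', 'PURCHASE', 'DISTRICT']
print(by_median == rfe_order)                  # True
print(round(spearman(by_mean, rfe_order), 3))  # 0.943
```

By medianImp the Boruta ranking matches the RFE list exactly. By meanImp only AREA and HOLDING_YEARS swap places, so the two algorithms tell the same story.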

Graphical Relationships