class: center, middle, inverse, title-slide

# Singapore Flat Resale Price Prediction Model - A Data Science Approach
## WQD7001 Principles of Data Science Group Project
### Amy Lang S2127213, Ching Peng Liaw S2038321, Yong Kok Khuen S17147279, Yeoh Li Tian S2120306, Wong Wei Wen S2121928
### Universiti Malaya
### 2022/1/18

---

# Introduction and Background

.pull-left[
According to the Urban Redevelopment Authority, SG residential property recorded its highest annual growth since 2010, at 10.6%, fueled by macroeconomic factors and the pandemic.
]

.pull-right[
In Malaysia, housing affordability has long been a growing concern because it significantly __influences socioeconomic health and well-being__. However, because data on the local property market is limited, we opted to use SG data as a starting point.
]

---

# Research Questions and Objectives

__Problem Statement__

1. What are the factors affecting SG flat resale price?
2. What are the prices of SG flats available for sale, based on historical transaction data and the selected features?
3. What is the statistical relationship between the influencing factors and resale price?

__Domain: Property__

Motivation & Objectives:

- To empower investors with a __data-driven approach__ to property selection for maximal return on investment
- To help people buy their home with __proper budget planning__ and affordability in mind

__Data Source: Singapore Housing and Development Board__

---

# Visualization of Trend and Heatmap

This visualization answers Q1: "What are the factors affecting SG flat resale price?"

---

# Prediction Visualized

This visualization answers Q2: "What are the prices of SG flats available for sale, based on historical transaction data and the selected features?"

.pull-left[
__Investors or house buyers can screen by criteria such as__

- Region
- Presence of commercial centers
- Features of the property itself
]

.pull-right[
__The prediction results display the predicted current price__
]

---

# Comparison between ML Predictive Models

We evaluated a list of predictive models and __selected random forest as the final model__ because it is the most accurate by the metrics below: it has the lowest MAE, MSE and RMSE and the highest R^2 (coefficient of determination).

| ML Predictive Models   | MAE          | MSE            | RMSE         | R^2           |
| ---------------------- |:------------:|:--------------:|:------------:| -------------:|
| Linear Regression      | 64959.65     | 6760393066     | 82221.61     | 0.74333811    |
| Polynomial Regression  | 60087.26     | 5722711681     | 75648.61     | 0.7827342     |
| Lasso Regression       | 64945.49     | 6804664478     | 82490.39     | 0.7416573     |
| Ridge Regression       | 64542.10     | 6882723777     | 82962.18     | 0.7386937     |
| Elastic Net Regression | 64738.73     | 6753693720     | 82180.86     | 0.7435924     |
| Decision Tree          | 73261.68     | 9652784300     | 98248.58     | 0.6335269     |
| __Random Forest__      | __48536.04__ | __3647279626__ | __60392.71__ | __0.8615291__ |
| Support Vector         | 60440.25     | 5676513252     | 75342.64     | 0.7844881     |

---

# Correlation Analytics of Features

This analysis answers Q3: "What is the statistical relationship between the influencing factors and resale price?"

```
##                 Df    Sum Sq   Mean Sq F value Pr(>F)
## street_name    555 1.726e+15 3.109e+12   251.4 <2e-16 ***
## Residuals   151899 1.879e+15 1.237e+10
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

```
##                 Df    Sum Sq   Mean Sq F value Pr(>F)
## flat_type        6 1.490e+15 2.483e+14   17903 <2e-16 ***
## Residuals   152448 2.115e+15 1.387e+10
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

```
##                  Df    Sum Sq   Mean Sq F value Pr(>F)
## storey_range     16 5.709e+14 3.568e+13    1793 <2e-16 ***
## Residuals    152438 3.034e+15 1.990e+10
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

```
##                 Df    Sum Sq   Mean Sq F value Pr(>F)
## flat_model      19 1.226e+15 6.452e+13    4135 <2e-16 ***
## Residuals   152435 2.379e+15 1.560e+10
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

```
##                 Df    Sum Sq   Mean Sq F value Pr(>F)
## region           4 3.545e+14 8.862e+13    4157 <2e-16 ***
## Residuals   152450 3.250e+15 2.132e+10
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

```
## Loading required package: carData
```

```
## Anova Table (Type III tests)
##
## Response: resale_price
##                 Sum Sq     Df F value    Pr(>F)
## (Intercept) 1.8670e+12      1   138.4 < 2.2e-16 ***
## region      3.2232e+14      4  5973.2 < 2.2e-16 ***
## flat_model  1.1937e+15     19  4657.3 < 2.2e-16 ***
## Residuals   2.0563e+15 152431
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

__The one-way ANOVA and Type III ANOVA results tell us whether mean resale price differs significantly across the levels of__

- flat type
- flat model
- storey range
- street name
- region

This gives us more confidence when deciding whether to drop or group the levels of these variables. As denoted by the `***` in the Pr(>F) column (p < 0.001), there __is a significant difference in mean resale price for all variables.__

---

# Open for Q&A and more!

- [_"Check your predicted SG property price through our Shiny app here"_: SG Flat Resale Price Predictor](https://ltyeoh.shinyapps.io/sg_flat_resale_price/)
- [_"Our GitHub repo here"_: GitHub](https://github.com/yongkokkhuen/pds-group-project)

.center[
]
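---

# Appendix: Reading the ANOVA Tables

As a rough sanity check on the one-way ANOVA tables in the Correlation Analytics slide, the sketch below recomputes each F value as (Sum Sq / Df) divided by (residual Sum Sq / residual Df), using the rounded figures printed on that slide. It is written in Python for illustration only and is not the code used in the project. It also reports eta squared, Sum Sq / (Sum Sq + residual Sum Sq), a standard effect-size measure that is an addition here rather than part of the original output. With roughly 152,000 transactions every factor comes out significant, so the share of variance explained may be a more useful guide when deciding what to drop or group. The `AnovaRow` class and the `ROWS` list are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnovaRow:
    """One one-way ANOVA table, copied from the slide (rounded values)."""
    factor: str
    df: int               # degrees of freedom of the factor
    sum_sq: float         # between-group sum of squares
    resid_df: int         # residual degrees of freedom
    resid_sum_sq: float   # residual sum of squares

    @property
    def f_value(self) -> float:
        # F = mean square of the factor / mean square of the residuals
        return (self.sum_sq / self.df) / (self.resid_sum_sq / self.resid_df)

    @property
    def eta_squared(self) -> float:
        # Share of total variation in resale_price attributed to the factor
        return self.sum_sq / (self.sum_sq + self.resid_sum_sq)


ROWS = [
    AnovaRow("street_name", 555, 1.726e15, 151899, 1.879e15),
    AnovaRow("flat_type", 6, 1.490e15, 152448, 2.115e15),
    AnovaRow("storey_range", 16, 5.709e14, 152438, 3.034e15),
    AnovaRow("flat_model", 19, 1.226e15, 152435, 2.379e15),
    AnovaRow("region", 4, 3.545e14, 152450, 3.250e15),
]

# List factors from largest to smallest share of variance explained.
for row in sorted(ROWS, key=lambda r: r.eta_squared, reverse=True):
    print(f"{row.factor:<13} F = {row.f_value:9.1f}  eta^2 = {row.eta_squared:.3f}")
```

Because the inputs are rounded to four significant figures, the recomputed F values agree with the printed ones only to within about 0.1%.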