A Comprehensive Analysis and Predictive Modeling of Condominium Prices in Malaysia


Group Member Name            Student ID   Role
YANG HUILAN                  S2171624     Data Pre-Processing
ALVIN CHUA CHEE SIANG        22094960     Exploratory Data Analysis
CUI SHIYU                    22086544     Data Modelling
ZAHRA SYAHIDA BINTI EHWAN    22092140     Data Evaluation
LIANG RUIJIE                 22100508     Data Interpretation

Background Introduction

Our team has been assigned a project to conduct a comprehensive analysis and predictive modeling of condominium prices in Malaysia. The Malaysian real estate market is dynamic, diverse, and highly competitive, and it is influenced by economic and social factors such as population growth, urbanization, and changing consumer preferences. Demographic shifts, particularly an aging society, are altering housing demand and leading developers to diversify their offerings. Supply and demand also vary by region, which causes prices to fluctuate.

Real estate companies navigate this complexity through extensive market research to understand and forecast trends. They strive to balance attractive pricing with profitability by investing in locations with promising future potential. This research project uses data analytics to address these challenges. The goal is to analyze the factors affecting condominium prices in Malaysia, to develop predictive models for price estimation, and to classify whether a given property is leasehold or freehold. The study aims to provide the client with market insights, reliable data support, and strategic recommendations for future building planning and design.

The initial dataset, sourced from Mudah.com, an online marketplace platform, comprises 4,000 records and 32 columns with detailed information on property listings, prices, and descriptions.

Research Objectives

1. Exploratory Data Analysis (EDA): Understand the relationships and correlations among different variables related to condominium properties.

2. Classification Modeling: Classify tenure types (e.g., Freehold or Leasehold) based on various property-related features.

3. Regression Modeling: Predict the prices of condominiums using linear regression, random forest regression, and decision tree regression models.

1. Data Pre-Processing

1.1. Environment Setup and Library Loading

This section sets up the environment and loads the required libraries.
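Before the libraries are loaded, it can help to confirm that they are actually installed. The following is a minimal sketch, not part of the original analysis; the variable names `required_pkgs`, `is_installed`, and `missing_pkgs` are illustrative, and the package list is simply taken from the `library()` calls used in this section.

```r
# Packages attached in this section (taken from the library() calls in the report)
required_pkgs <- c("dplyr", "tidyr", "tidyverse", "ggplot2", "lattice", "caret")

# requireNamespace() reports TRUE/FALSE per package without attaching it
is_installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required_pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install the missing ones
} else {
  message("All required packages are installed.")
}
```

Checking up front gives one clear message instead of an error at the first failing `library()` call.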

# Clear all variables
#rm(list = ls(all.names = TRUE))

# Libraries for data manipulation
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.3.2
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyr)
## Warning: package 'tidyr' was built under R version 4.3.2
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.3.2
## Warning: package 'ggplot2' was built under R version 4.3.2
## Warning: package 'readr' was built under R version 4.3.2
## Warning: package 'purrr' was built under R version 4.3.2
## Warning: package 'forcats' was built under R version 4.3.2
## Warning: package 'lubridate' was built under R version 4.3.2
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats   1.0.0     ✔ readr     2.1.4
## ✔ ggplot2   3.4.4     ✔ stringr   1.5.0
## ✔ lubridate 1.9.3     ✔ tibble    3.2.1
## ✔ purrr     1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# Libraries for data visualization
library(ggplot2)
library(lattice)

# Libraries for pre-processing
#install.packages("caret")
library(caret)
## Warning: package 'caret' was built under R version 4.3.2
## 
## Attaching package: 'caret'
## 
## The following object is masked from 'package:purrr':
## 
##     lift

1.2. Data Reading and Initial Exploration

This section reads the data file and performs a basic exploration of its structure and contents.
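The listings use empty strings and "-" as placeholders for missing values. One option, sketched here as an assumption rather than as the approach used in this report, is to map those placeholders to `NA` while reading the file through the `na.strings` argument of `read.csv()`. The three-row sample is hypothetical: it reuses a few values from the listings and is written to a temporary file so that the example is self-contained.

```r
# Hypothetical three-row sample that mimics the placeholder conventions in the listings
sample_csv <- tempfile(fileext = ".csv")
writeLines(c(
  "Building.Name,Developer,Completion.Year,Nearby.Mall",
  "Sri Lavender Apartment,TLS Group,2007,",
  "Flat Pandan Indah,-,-,",
  "i-Soho @ i-City,i-Berhad,-,i-Soho i-City"
), sample_csv)

# Treat empty fields and a lone "-" as missing values at read time
sample_data <- read.csv(
  sample_csv,
  fileEncoding = "UTF-8",
  na.strings = c("", "-"),
  stringsAsFactors = FALSE
)

# Count missing values per column and inspect the inferred column types
colSums(is.na(sample_data))
str(sample_data)
```

Because `na.strings` matches whole fields only, names that merely contain a hyphen, such as "i-Soho @ i-City", are left untouched. A column whose only non-numeric entries were "-" can then be inferred as numeric instead of character.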

# Reading the data
data <- read.csv('houses.csv', fileEncoding = "UTF-8")

# Initial Data Exploration
head(data)
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            description
## 1 Iconic Building @ KL SETAPAK\nNew launching & Latest condo !!!!! 🔥\nHouse with luxury hotel concept 😍👑\n💎 Freehold\n🔑 Dual key\n🛌 2 / 3 / 4 rooms\n💰Affordable and Low entry price\n💼 100% full furnish, move in with a luggage\nWhatsapp / Call \nShow contact number\n ( Eugine ) for more details & showroom viewing 🔥\nWhatsapp / Call \nShow contact number\n ( Eugine ) for more details & showroom viewing 🔥\nWhatsapp / Call \nShow contact number\n ( Eugine ) for more details & showroom viewing 🔥\nWhatsapp link :\nhttps://hotnewcondo.wasap.my\nhttps://hotnewcondo.wasap.my\nhttps://hotnewcondo.wasap.my\n🏝5 🌟 facilities : Sky lounge, Sky bridge, Sky garden\n🚗 6km to KLCC/ Bkt Bintang\n🍱 Food Heaven\n📈 Freehold Appreciation\n👑 Luxury Hotel Drop-off Lobby\n🏊🏻 Infinity Pool\n🏡 Sky Garden, Sky Lounge, Sky bridge\nFacilities:\nLevel 10 – Elevated lawn for Yoga, Jogging Trail, Jacuzzi, Infinity Pool, Wading Pool, Pool Deck, Play land, Sunbathe Terrace, Sun Lounge, Gymnamsium, Viewing Deck, Squash, Futsal, Half Basketball court, central lobby and relaxing yard.\nLevel 50 – Viewing Terrace, Barbeque area, Gathering space, Turf Mound, Multistep Seating Lounge, Rooftop Lounge and open stage.\nWhatsapp / Call \nShow contact number\n ( Eugine ) for more details & showroom viewing 🔥\nWhatsapp / Call \nShow contact number\n ( Eugine ) for more details & showroom viewing 🔥\nWhatsapp / Call \nShow contact number\n ( Eugine ) for more details & showroom viewing 🔥\nWhatsapp link :\nhttps://hotnewcondo.wasap.my\nhttps://hotnewcondo.wasap.my\nhttps://hotnewcondo.wasap.my\nContinue Reading\nPROPERTY HIGHLIGHTS\nNEW!\nBedroom\n4\nBathroom\n2\nProperty Size\n1000 sq.ft.\nNearby School\nSekolah Menengah Pendidikan Khas Cacat Penglihatan\nNearby Mall\nSetapak Central\nSee more details
## 2                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                FOR SALE @ RM250,000\nIntroduction:\n~ Pangsapuri Kenanga @ Kampung Lapan\n~ 980 sqft\n~ 3 Bedrooms & 2 Bathrooms\n~ Bathroom with Water Heather\n~ Master Bedroom with Aircond\n~ Walk Up Apartment\n~ Bare Unit\nFacilities\n~ Enter with access card\n~ Swimming Pool\n~ Nice environment\nNearby:\n- Strategic Location\n- Jonker Walk, UNESCO World Heritage Site\n- Dataran Pahlawan\n- Hospital Mahkota\n- Kota Laksamana\n- Melaka Central (Bus Station)\n>>> https://www.youtube.com/@propertysoldier6688\n>>> https://www.facebook.com/propertysoldiermelaka\n>>> You may call me up for other house inquiries, we have other houses ready for sale,\nMelaka Property Wanted For Sale: House, Apartment, Condominium, Shop, Factory, Land\nand etc.\n#Free Consultation Loan Service & Lawyer advise\nMy Profile:\nWe are a property management company.\nIn this industry for over eight years.\nProvide Service :\n* Sales & Purchase\n* Super Host Home Stay Management\n* Long-Term Contract Renting\n* House Cleaning Service\n* Design & Renovation\nContinue Reading
## 3                                                                                                                                                                                                                                                                                    [Below Market] Sri Lavender Apartment,Tmn Sepakat Indah,100% FULL LOAN\nRM 230,000\n(💥BELOW MARKET VALUE💥)\nMARKET VALUE: RM 330,000\n# LPPSA | KWSP AC 2\n# Mark up Price Available\n# 100% Full Loan Available\n# Cash Back Available\n# Below Market Price\nPlease contact me for arrange viewing \nShow contact number\nPlease contact me for arrange viewing \nShow contact number\nPlease contact me for arrange viewing \nShow contact number\n=============================\nProperty Type: Apartment\nTitle type: Freehold\nBedrooms: 3\nBathroom: 2\nSize: 1000 sq.ft.\nProperty Details:\n- 7th Floor with Balcony\n- Freehold, Open title & Strata Ready\n- 1 Covered parking lot(can be seen from the unit ) with smart access card\n- Corner unit with awesome scenery from any angle of the unit.\n=============================\nFacilities:\nMini Market, Playground, 24 Hour Security, Balcony/Patio, Cable TV\n \nLocation :\n- 3 min to Silk Highway\n- 7 min to Plus Highway/Uniten\n- 5 min to Bandar Baru Bangi(nearest to seksyen 7)\n- 10 min to IOI Mall/Hospital Serdang/UPM\n- 10 min to Kajang/MRT Stadium Kajang\nPublic Transport\n- 2 min walking distance to Smart Selangor(free) bus stop/MRT Feeder Bus(MRT Kajang)\nPlease contact me for arrange viewing \nShow contact number\nPlease contact me for arrange viewing \nShow contact number\nPlease contact me for arrange viewing \nShow contact number\nContinue Reading\nPROPERTY HIGHLIGHTS\nNEW!\nBedroom\n3\nBathroom\n2\nProperty Size\n1000 sq.ft.\nSee more details
## 4                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Flat Pandan Indah\nJalan Pandan Indah 3/3\nNon Bumi lot, 100% loan\n=====================\n• Walk up flat\n• Non Bumi lot\n• Leasehold with strata title\n• Build up 592sqft\n• 3 bedroom,1 bathroom\n• Flooring fully tiles\n• Build  in concreate table top\n• Well Maintain and Good Condition\nHenrick Tan\nShow contact number\nShow contact number\nSenior Negotiator
## 5                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        * Open-concept Soho with balcony, unblock view\n* fully furnished studio unit 467 sq feet\n*consists of built in kitchen cabinets with induction cookers, microwave oven, washing machine, fridge, led tv, aircond, ceiling fan, water heater, alarm bell system, dining table sets with chairs, curtains, bed with mattress plus wardrobe & sofa, installed iron grill for front door\n*Best for the newly married\nPROPERTY HIGHLIGHTS\nNEW!\nBedroom\n1\nBathroom\n1\nProperty Size\n467 sq.ft.\nNearby School\nSekolah Jenis Kebangsaan (T) Ladang Midlands\nNearby Mall\ni-Soho i-City\nSee more details
## 6                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         D'Piazza, Bayan Baru for SALE\nDetails:\n✅1100 sqf\n✅3 bedroom\n✅2bathroom\n✅1 car park\n📌Original unit, airconds, heater \nInterested please call \nShow contact number
##   Bedroom Bathroom Property.Size
## 1       4        2   1000 sq.ft.
## 2       3        2    980 sq.ft.
## 3       3        2   1000 sq.ft.
## 4       3        1    592 sq.ft.
## 5       1        1    467 sq.ft.
## 6       3        2   1100 sq.ft.
##                                        Nearby.School     Nearby.Mall   Ad.List
## 1 Sekolah Menengah Pendidikan Khas Cacat Penglihatan Setapak Central  98187451
## 2                                                                    101683090
## 3                                                                    103792905
## 4                                                                    103806240
## 5       Sekolah Jenis Kebangsaan (T) Ladang Midlands   i-Soho i-City 103806234
## 6                                                                    103739787
##                            Category
## 1 Apartment / Condominium, For sale
## 2 Apartment / Condominium, For sale
## 3 Apartment / Condominium, For sale
## 4 Apartment / Condominium, For sale
## 5 Apartment / Condominium, For sale
## 6 Apartment / Condominium, For sale
##                                                                                                                       Facilities
## 1                                                                                                                              -
## 2                                                     Parking, Security, Swimming Pool, Playground, Barbeque area, Jogging Track
## 3                                                    Playground, Minimart, Jogging Track, Barbeque area, Parking, Security, Lift
## 4                                                                                   Parking, Playground, Minimart, Jogging Track
## 5                                                                                         Minimart, Gymnasium, Parking, Security
## 6 Parking, Swimming Pool, Multipurpose hall, Sauna, Minimart, Barbeque area, Security, Playground, Gymnasium, Tennis Court, Lift
##               Building.Name             Developer Tenure.Type
## 1         Kenwingston Platz     Kenwingston Group    Freehold
## 2 Kenanga (Park View Court)                     -    Freehold
## 3    Sri Lavender Apartment             TLS Group    Freehold
## 4         Flat Pandan Indah                     -   Leasehold
## 5           i-Soho @ i-City              i-Berhad    Freehold
## 6      D'Piazza Condominium X-Scan Penang Sdn Bhd    Freehold
##                                                            Address
## 1                              Jalan Gombak, Setapak, Kuala Lumpur
## 2                           Jalan Kenanga 3/8, Melaka City, Melaka
## 3 Jalan Sepakat Indah 2/1, Taman Sepakat Indah 2, Kajang, Selangor
## 4                         jalan pandan indah 3/3, Selangor, Ampang
## 5                         Jalan Plumbum 7/102, Shah Alam, Selangor
## 6                         Jalan Mayang Pasir 2, Bayan Baru, Penang
##   Completion.Year X..of.Floors Total.Units     Property.Type Parking.Lot
## 1               -            -           - Service Residence           2
## 2               -            -           -         Apartment           1
## 3            2007           13         445         Apartment           1
## 4               -            -           -              Flat           1
## 5               -           43         956            Studio           -
## 6            2010           19         706       Condominium           1
##   Floor.Range   Land.Title Firm.Type Firm.Number REN.Number
## 1           - Non Bumi Lot        VE       30338          -
## 2         Low Non Bumi Lot         E       30812  REN 15862
## 3      Medium Non Bumi Lot         -           -          -
## 4           - Non Bumi Lot         E       11584  REN 16279
## 5         Low     Bumi Lot         E       31916          -
## 6         Low Non Bumi Lot         E       11307  REN 61472
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Bus.Stop
## 1 Bus Stop Starparc Point\nBus Stop Setapak Central\nBus Stop Setapak Sentral (Opp)\nBus Stop Columbia Hospital\nBus Stop PV12 Residence\nBus Stop PV15 Platinum\nBus Stop PV12 Condominium (Opp)\nBus Stop Sri Utama Schools (Opp)\nBus Stop CIMB Genting Klang\nBus Stop 1 Sri Utama Schools\nBus Stop Aeon Big Danau Kota\nBus Stop 2 Sri Utama Schools\nBus Stop Setapak Commercial\nBus Stop 1 Setapak Food Court\nBus Stop Setapak Industrial Area\nBus Stop 2 Setapak Food Court\nBus Stop 2 Medan Makmur Setapak\nBus Stop 1 Medan Makmur Setapak\nBus Stop Langkawi Apartment (Opp)\nBus Stop BHP Genting Klang\nBus Stop Jalan Kilang\nBus Stop PV128 Setapak
## 2                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
## 3                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
## 4                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
## 5                                                                                                                                                                                                                                                    Bus Stop at Persiaran Permai 1\nBus Stop at Persiaran Permai 2\nBus Stop at Pusat Komersial Seksyen 7 (Timur)\nBus Stop at Persiaran Bestari 1\nBus Stop at Jalan Plumbum N7/N\nBus Stop at UITM Shah Alam (Barat)\nBus Stop at Jakel (Seksyen 7)\nBus Stop at Jalan Sungai Rasau 1\nBus Stop at Federal Highway 1\nBus Stop at Jalan Sungai Rasau 2\nBus Stop at Jakel 2 (Seksyen 7)\nBus Stop at Pusat Kesihatan
## 6                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
##                                                    Mall
## 1                                       Setapak Central
## 2                                                      
## 3                                                      
## 4                                                      
## 5 i-Soho i-City\nGulati\nCentral i-City Shopping Centre
## 6                                                      
##                                                                                                                                                                                    Park
## 1 Park at Taman Tasik Danau Kota, Setapak, Kuala Lumpur, Malaysia\nPark at Taman Danau Kota, Setapak, Kuala Lumpur, Malaysia\nPark 1 at Setapak Garden, Setapak, Kuala Lumpur, Malaysia
## 2                                                                                                                                                                                      
## 3                                                                                                                                                                                      
## 4                                                                                                                                                                                      
## 5                                                                                                                        Park 2 at Section 7, Shah Alam\nPark 1 at Section 7, Shah Alam
## 6                                                                                                                                                                                      
##                                                                                                                                                                                   School
## 1 Sekolah Menengah Pendidikan Khas Cacat Penglihatan\nSekolah Kebangsaan Danau Kota\nSJK (C) Wangsa Maju\nKolej Vokasional Setapak\nSri Utama Schools\nSK Danau Kota (2)\nSMK Danau Kota
## 2                                                                                                                                                                                       
## 3                                                                                                                                                                                       
## 4                                                                                                                                                                                       
## 5                                                                                                             Sekolah Jenis Kebangsaan (T) Ladang Midlands\nSekolah Kebangsaan Seksyen 7
## 6                                                                                                                                                                                       
##                                      Hospital      price
## 1                      Columbia Asia Hospital RM 340 000
## 2                                             RM 250 000
## 3                                             RM 230 000
## 4                                             RM 158 000
## 5 Osel Clinic (Shah Alam)\nHospital Shah Alam RM 305 000
## 6                                             RM 425 000
##                        Highway Nearby.Railway.Station Railway.Station
## 1                                                                    
## 2                                                                    
## 3 SILK Sg Ramal (T) Toll Plaza                                       
## 4                                                                    
## 5                                                                    
## 6
str(data)
## 'data.frame':    4000 obs. of  32 variables:
##  $ description           : chr  "Iconic Building @ KL SETAPAK\nNew launching & Latest condo !!!!! 🔥\nHouse with luxury hotel concept 😍👑\n💎 Freeh"| __truncated__ "FOR SALE @ RM250,000\nIntroduction:\n~ Pangsapuri Kenanga @ Kampung Lapan\n~ 980 sqft\n~ 3 Bedrooms & 2 Bathroo"| __truncated__ "[Below Market] Sri Lavender Apartment,Tmn Sepakat Indah,100% FULL LOAN\nRM 230,000\n(💥BELOW MARKET VALUE💥)\nMAR"| __truncated__ "Flat Pandan Indah\nJalan Pandan Indah 3/3\nNon Bumi lot, 100% loan\n=====================\n• Walk up flat\n• No"| __truncated__ ...
##  $ Bedroom               : chr  "4" "3" "3" "3" ...
##  $ Bathroom              : chr  "2" "2" "2" "1" ...
##  $ Property.Size         : chr  "1000 sq.ft." "980 sq.ft." "1000 sq.ft." "592 sq.ft." ...
##  $ Nearby.School         : chr  "Sekolah Menengah Pendidikan Khas Cacat Penglihatan" "" "" "" ...
##  $ Nearby.Mall           : chr  "Setapak Central" "" "" "" ...
##  $ Ad.List               : int  98187451 101683090 103792905 103806240 103806234 103739787 103690767 103615852 103615849 102460346 ...
##  $ Category              : chr  "Apartment / Condominium, For sale" "Apartment / Condominium, For sale" "Apartment / Condominium, For sale" "Apartment / Condominium, For sale" ...
##  $ Facilities            : chr  "-" "Parking, Security, Swimming Pool, Playground, Barbeque area, Jogging Track" "Playground, Minimart, Jogging Track, Barbeque area, Parking, Security, Lift" "Parking, Playground, Minimart, Jogging Track" ...
##  $ Building.Name         : chr  "Kenwingston Platz" "Kenanga (Park View Court)" "Sri Lavender Apartment" "Flat Pandan Indah" ...
##  $ Developer             : chr  "Kenwingston Group" "-" "TLS Group" "-" ...
##  $ Tenure.Type           : chr  "Freehold" "Freehold" "Freehold" "Leasehold" ...
##  $ Address               : chr  "Jalan Gombak, Setapak, Kuala Lumpur" "Jalan Kenanga 3/8, Melaka City, Melaka" "Jalan Sepakat Indah 2/1, Taman Sepakat Indah 2, Kajang, Selangor" "jalan pandan indah 3/3, Selangor, Ampang" ...
##  $ Completion.Year       : chr  "-" "-" "2007" "-" ...
##  $ X..of.Floors          : chr  "-" "-" "13" "-" ...
##  $ Total.Units           : chr  "-" "-" "445" "-" ...
##  $ Property.Type         : chr  "Service Residence" "Apartment" "Apartment" "Flat" ...
##  $ Parking.Lot           : chr  "2" "1" "1" "1" ...
##  $ Floor.Range           : chr  "-" "Low" "Medium" "-" ...
##  $ Land.Title            : chr  "Non Bumi Lot" "Non Bumi Lot" "Non Bumi Lot" "Non Bumi Lot" ...
##  $ Firm.Type             : chr  "VE" "E" "-" "E" ...
##  $ Firm.Number           : chr  "30338" "30812" "-" "11584" ...
##  $ REN.Number            : chr  "-" "REN 15862" "-" "REN 16279" ...
##  $ Bus.Stop              : chr  "Bus Stop Starparc Point\nBus Stop Setapak Central\nBus Stop Setapak Sentral (Opp)\nBus Stop Columbia Hospital\n"| __truncated__ "" "" "" ...
##  $ Mall                  : chr  "Setapak Central" "" "" "" ...
##  $ Park                  : chr  "Park at Taman Tasik Danau Kota, Setapak, Kuala Lumpur, Malaysia\nPark at Taman Danau Kota, Setapak, Kuala Lumpu"| __truncated__ "" "" "" ...
##  $ School                : chr  "Sekolah Menengah Pendidikan Khas Cacat Penglihatan\nSekolah Kebangsaan Danau Kota\nSJK (C) Wangsa Maju\nKolej V"| __truncated__ "" "" "" ...
##  $ Hospital              : chr  "Columbia Asia Hospital" "" "" "" ...
##  $ price                 : chr  "RM 340 000" "RM 250 000" "RM 230 000" "RM 158 000" ...
##  $ Highway               : chr  "" "" "SILK Sg Ramal (T) Toll Plaza" "" ...
##  $ Nearby.Railway.Station: chr  "" "" "" "" ...
##  $ Railway.Station       : chr  "" "" "" "" ...
summary(data)
##  description          Bedroom            Bathroom         Property.Size     
##  Length:4000        Length:4000        Length:4000        Length:4000       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##  Nearby.School      Nearby.Mall           Ad.List            Category        
##  Length:4000        Length:4000        Min.   : 30964923   Length:4000       
##  Class :character   Class :character   1st Qu.:102384201   Class :character  
##  Mode  :character   Mode  :character   Median :103350207   Mode  :character  
##                                        Mean   :102443246                     
##                                        3rd Qu.:103782293                     
##                                        Max.   :103806285                     
##   Facilities        Building.Name       Developer         Tenure.Type       
##  Length:4000        Length:4000        Length:4000        Length:4000       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##    Address          Completion.Year    X..of.Floors       Total.Units       
##  Length:4000        Length:4000        Length:4000        Length:4000       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##  Property.Type      Parking.Lot        Floor.Range         Land.Title       
##  Length:4000        Length:4000        Length:4000        Length:4000       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##   Firm.Type         Firm.Number         REN.Number          Bus.Stop        
##  Length:4000        Length:4000        Length:4000        Length:4000       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##      Mall               Park              School            Hospital        
##  Length:4000        Length:4000        Length:4000        Length:4000       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##     price             Highway          Nearby.Railway.Station
##  Length:4000        Length:4000        Length:4000           
##  Class :character   Class :character   Class :character      
##  Mode  :character   Mode  :character   Mode  :character      
##                                                              
##                                                              
##                                                              
##  Railway.Station   
##  Length:4000       
##  Class :character  
##  Mode  :character  
##                    
##                    
## 

1.3. Data Cleaning and Transformation

This part cleans the data and prepares it for analysis. We first check for missing values and then remove duplicate rows. After that, we extract information from the “description” and “Facilities” columns, creating new binary columns based on specific keywords.

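# Check for missing values: count the non-NA entries in each column
# (placeholders such as "-" or "" are not NA, so they are counted as present)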
column_counts <- colSums(!is.na(data))
column_counts
##            description                Bedroom               Bathroom 
##                   4000                   4000                   4000 
##          Property.Size          Nearby.School            Nearby.Mall 
##                   4000                   4000                   4000 
##                Ad.List               Category             Facilities 
##                   4000                   4000                   4000 
##          Building.Name              Developer            Tenure.Type 
##                   4000                   4000                   4000 
##                Address        Completion.Year           X..of.Floors 
##                   4000                   4000                   4000 
##            Total.Units          Property.Type            Parking.Lot 
##                   4000                   4000                   4000 
##            Floor.Range             Land.Title              Firm.Type 
##                   4000                   4000                   4000 
##            Firm.Number             REN.Number               Bus.Stop 
##                   4000                   4000                   4000 
##                   Mall                   Park                 School 
##                   4000                   4000                   4000 
##               Hospital                  price                Highway 
##                   4000                   4000                   4000 
## Nearby.Railway.Station        Railway.Station 
##                   4000                   4000
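All 32 columns report 4000 non-NA values, yet the str() output shows that absent entries are stored as "-" or as empty strings rather than as NA. A minimal sketch of counting such placeholders is given below. It uses the first four values of three columns as printed by str(data); the helper name is_placeholder is hypothetical and is not part of the original script.

```r
# First four values of three columns, as printed by str(data) earlier.
toy <- data.frame(
  Developer       = c("Kenwingston Group", "-", "TLS Group", "-"),
  Completion.Year = c("-", "-", "2007", "-"),
  Nearby.Mall     = c("Setapak Central", "", "", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: treat NA, "-" and "" (after trimming) as missing.
is_placeholder <- function(x) is.na(x) | trimws(x) %in% c("-", "")

# Number of effectively missing entries per column.
placeholder_counts <- colSums(sapply(toy, is_placeholder))
placeholder_counts
```

Running the same helper over the full data frame would give a more realistic picture of missingness than the NA count alone, assuming "-" and "" are the only placeholders in use.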
# Remove duplicates
data <- distinct(data)
summary(data)
##  description          Bedroom            Bathroom         Property.Size     
##  Length:3815        Length:3815        Length:3815        Length:3815       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##  Nearby.School      Nearby.Mall           Ad.List            Category        
##  Length:3815        Length:3815        Min.   : 30964923   Length:3815       
##  Class :character   Class :character   1st Qu.:102383709   Class :character  
##  Mode  :character   Mode  :character   Median :103343288   Mode  :character  
##                                        Mean   :102446511                     
##                                        3rd Qu.:103782204                     
##                                        Max.   :103806285                     
##   Facilities        Building.Name       Developer         Tenure.Type       
##  Length:3815        Length:3815        Length:3815        Length:3815       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##    Address          Completion.Year    X..of.Floors       Total.Units       
##  Length:3815        Length:3815        Length:3815        Length:3815       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##  Property.Type      Parking.Lot        Floor.Range         Land.Title       
##  Length:3815        Length:3815        Length:3815        Length:3815       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##   Firm.Type         Firm.Number         REN.Number          Bus.Stop        
##  Length:3815        Length:3815        Length:3815        Length:3815       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##      Mall               Park              School            Hospital        
##  Length:3815        Length:3815        Length:3815        Length:3815       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##     price             Highway          Nearby.Railway.Station
##  Length:3815        Length:3815        Length:3815           
##  Class :character   Class :character   Class :character      
##  Mode  :character   Mode  :character   Mode  :character      
##                                                              
##                                                              
##                                                              
##  Railway.Station   
##  Length:3815       
##  Class :character  
##  Mode  :character  
##                    
##                    
## 
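The summaries show the row count falling from 4000 to 3815, so 185 exact duplicate rows were removed. A small sketch of how that number could be checked explicitly is shown below on an invented three-row data frame; it uses base R's duplicated() as an analogue of distinct().

```r
# Toy example: rows 1 and 3 are identical across all columns.
toy <- data.frame(
  Ad.List = c(1L, 2L, 1L),
  price   = c("RM 340 000", "RM 250 000", "RM 340 000"),
  stringsAsFactors = FALSE
)

n_before   <- nrow(toy)
toy_unique <- toy[!duplicated(toy), ]   # keeps the first copy of each row
n_removed  <- n_before - nrow(toy_unique)
n_removed
```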
# Define a list of keywords for each relevant column
garden_keywords <- c('garden')
Hospital_keywords <- c('hospital')
security_keywords <- c('security', 'access card', 'gated')
lift_keywords <- c('lift')
sauna_keywords <- c('sauna')
Basketball_keywords <- c('basketball')
parking_keywords <- c('parking')
Badminton_keywords <- c('badminton')
swimming_pool_keywords <- c('swimming pool', 'infinity pool', 'pool')
playground_keywords <- c('playground')
tennis_keywords <- c('tennis')
squash_keywords <- c('squash')
surau_keywords <- c('surau')
gymnasium_keywords <- c('gymnasium','gym')
barbeque_area_keywords <- c('barbeque area')
minimart_keywords <- c('minimart', 'supermarket', 'mart')
multipurpose_hall_keywords <- c('multipurpose hall','hall')
clubhouse_keywords <- c('club house')
jogging_track_keywords <- c('jogging track','jogging')
mrt_keywords <- c('mrt','lrt','erl','ktm')

# Convert the 'description' and 'Facilities' columns to lowercase
data$description <- tolower(data$description)
data$Facilities <- tolower(data$Facilities)

# Extract features from the semi-structured description text:
# create one binary column per relevant feature
data$garden <- ifelse(sapply(data$description, function(x) any(grepl(paste(garden_keywords, collapse = '|'), x))), 1, 0)
data$securitynew <- ifelse(sapply(data$description, function(x) any(grepl(paste(security_keywords, collapse = '|'), x))), 1, 0)
data$liftnew <- ifelse(sapply(data$description, function(x) any(grepl(paste(lift_keywords, collapse = '|'), x))), 1, 0)
data$saunanew <- ifelse(sapply(data$description, function(x) any(grepl(paste(sauna_keywords, collapse = '|'), x))), 1, 0)
data$tennisnew <- ifelse(sapply(data$description, function(x) any(grepl(paste(tennis_keywords, collapse = '|'), x))), 1, 0)
data$squashnew <- ifelse(sapply(data$description, function(x) any(grepl(paste(squash_keywords, collapse = '|'), x))), 1, 0)
data$surau <- ifelse(sapply(data$description, function(x) any(grepl(paste(surau_keywords, collapse = '|'), x))), 1, 0)
data$parkingnew <- ifelse(sapply(data$description, function(x) any(grepl(paste(parking_keywords, collapse = '|'), x))), 1, 0)
data$swimmingpoolnew <- ifelse(sapply(data$description, function(x) any(grepl(paste(swimming_pool_keywords, collapse = '|'), x))), 1, 0)
data$playgroundnew <- ifelse(sapply(data$description, function(x) any(grepl(paste(playground_keywords, collapse = '|'), x))), 1, 0)
data$gymnasiumnew <- ifelse(sapply(data$description, function(x) any(grepl(paste(gymnasium_keywords, collapse = '|'), x))), 1, 0)
data$barbequeareanew <- ifelse(sapply(data$description, function(x) any(grepl(paste(barbeque_area_keywords, collapse = '|'), x))), 1, 0)
data$minimartnew <- ifelse(sapply(data$description, function(x) any(grepl(paste(minimart_keywords, collapse = '|'), x))), 1, 0)
data$multipurposehallnew <- ifelse(sapply(data$description, function(x) any(grepl(paste(multipurpose_hall_keywords, collapse = '|'), x))), 1, 0)
data$joggingtracknew <- ifelse(sapply(data$description, function(x) any(grepl(paste(jogging_track_keywords, collapse = '|'), x))), 1, 0)
data$hospital <- ifelse(sapply(data$description, function(x) any(grepl(paste(Hospital_keywords, collapse = '|'), x))), 1, 0)
data$mrt.lrt <- ifelse(sapply(data$description, function(x) any(grepl(paste(mrt_keywords, collapse = '|'), x))), 1, 0)
data$basketball <- ifelse(sapply(data$description, function(x) any(grepl(paste(Basketball_keywords, collapse = '|'), x))), 1, 0)
data$badminton <- ifelse(sapply(data$description, function(x) any(grepl(paste(Badminton_keywords, collapse = '|'), x))), 1, 0)
data$clubhousenew <- ifelse(sapply(data$description, function(x) any(grepl(paste(clubhouse_keywords, collapse = '|'), x))), 1, 0)
summary(data)
##  description          Bedroom            Bathroom         Property.Size     
##  Length:3815        Length:3815        Length:3815        Length:3815       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##  Nearby.School      Nearby.Mall           Ad.List            Category        
##  Length:3815        Length:3815        Min.   : 30964923   Length:3815       
##  Class :character   Class :character   1st Qu.:102383709   Class :character  
##  Mode  :character   Mode  :character   Median :103343288   Mode  :character  
##                                        Mean   :102446511                     
##                                        3rd Qu.:103782204                     
##                                        Max.   :103806285                     
##   Facilities        Building.Name       Developer         Tenure.Type       
##  Length:3815        Length:3815        Length:3815        Length:3815       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##    Address          Completion.Year    X..of.Floors       Total.Units       
##  Length:3815        Length:3815        Length:3815        Length:3815       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##  Property.Type      Parking.Lot        Floor.Range         Land.Title       
##  Length:3815        Length:3815        Length:3815        Length:3815       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##   Firm.Type         Firm.Number         REN.Number          Bus.Stop        
##  Length:3815        Length:3815        Length:3815        Length:3815       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##      Mall               Park              School            Hospital        
##  Length:3815        Length:3815        Length:3815        Length:3815       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##     price             Highway          Nearby.Railway.Station
##  Length:3815        Length:3815        Length:3815           
##  Class :character   Class :character   Class :character      
##  Mode  :character   Mode  :character   Mode  :character      
##                                                              
##                                                              
##                                                              
##  Railway.Station        garden        securitynew        liftnew       
##  Length:3815        Min.   :0.0000   Min.   :0.0000   Min.   :0.00000  
##  Class :character   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.00000  
##  Mode  :character   Median :0.0000   Median :0.0000   Median :0.00000  
##                     Mean   :0.1012   Mean   :0.2375   Mean   :0.07837  
##                     3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:0.00000  
##                     Max.   :1.0000   Max.   :1.0000   Max.   :1.00000  
##     saunanew         tennisnew         squashnew           surau       
##  Min.   :0.00000   Min.   :0.00000   Min.   :0.00000   Min.   :0.0000  
##  1st Qu.:0.00000   1st Qu.:0.00000   1st Qu.:0.00000   1st Qu.:0.0000  
##  Median :0.00000   Median :0.00000   Median :0.00000   Median :0.0000  
##  Mean   :0.02805   Mean   :0.02595   Mean   :0.01232   Mean   :0.0789  
##  3rd Qu.:0.00000   3rd Qu.:0.00000   3rd Qu.:0.00000   3rd Qu.:0.0000  
##  Max.   :1.00000   Max.   :1.00000   Max.   :1.00000   Max.   :1.0000  
##    parkingnew     swimmingpoolnew  playgroundnew     gymnasiumnew   
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :0.0000   Median :0.0000   Median :0.0000   Median :0.0000  
##  Mean   :0.1678   Mean   :0.2013   Mean   :0.1471   Mean   :0.1261  
##  3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:0.0000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
##  barbequeareanew    minimartnew      multipurposehallnew joggingtracknew  
##  Min.   :0.00000   Min.   :0.00000   Min.   :0.00000     Min.   :0.00000  
##  1st Qu.:0.00000   1st Qu.:0.00000   1st Qu.:0.00000     1st Qu.:0.00000  
##  Median :0.00000   Median :0.00000   Median :0.00000     Median :0.00000  
##  Mean   :0.01389   Mean   :0.09174   Mean   :0.08598     Mean   :0.03827  
##  3rd Qu.:0.00000   3rd Qu.:0.00000   3rd Qu.:0.00000     3rd Qu.:0.00000  
##  Max.   :1.00000   Max.   :1.00000   Max.   :1.00000     Max.   :1.00000  
##     hospital         mrt.lrt         basketball        badminton      
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.00000   Min.   :0.00000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:0.00000  
##  Median :0.0000   Median :0.0000   Median :0.00000   Median :0.00000  
##  Mean   :0.1127   Mean   :0.2571   Mean   :0.02333   Mean   :0.03853  
##  3rd Qu.:0.0000   3rd Qu.:1.0000   3rd Qu.:0.00000   3rd Qu.:0.00000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.00000   Max.   :1.00000  
##   clubhousenew     
##  Min.   :0.000000  
##  1st Qu.:0.000000  
##  Median :0.000000  
##  Mean   :0.007339  
##  3rd Qu.:0.000000  
##  Max.   :1.000000
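The keyword flags above rely on substring matching, so short keywords can also match inside longer words (for example, 'erl' occurs in "overlooking" and 'mart' in "smart"). The sketch below shows the same flagging idea as a single vectorised helper with an optional whole-word mode. The function flag_keywords and the sample descriptions are hypothetical and are not part of the original script.

```r
# Hypothetical helper: 1 if any keyword occurs in the text, 0 otherwise.
# grepl() is already vectorised, so no sapply() wrapper is needed.
flag_keywords <- function(text, keywords, whole_word = FALSE) {
  pattern <- paste(keywords, collapse = "|")
  if (whole_word) pattern <- paste0("\\b(", pattern, ")\\b")
  as.integer(grepl(pattern, tolower(text), perl = TRUE))
}

# Invented descriptions, for illustration only.
desc <- c("Near LRT station, gym and pool",
          "Corner unit overlooking the park",
          "Smart home, no facilities listed")

rail_keywords <- c("mrt", "lrt", "erl", "ktm")

rail_substring <- flag_keywords(desc, rail_keywords)                     # 1 1 0
rail_whole     <- flag_keywords(desc, rail_keywords, whole_word = TRUE)  # 1 0 0
mart_substring <- flag_keywords(desc, c("minimart", "supermarket", "mart"))  # 0 0 1
```

Whether whole-word matching is preferable depends on how listings are written; it would change the indicator means reported in the summary above, so it is offered only as an option to evaluate.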

1.4. Advanced Data Manipulation

This part covers more complex data manipulation: applying one-hot encoding to the ‘Facilities’ column, merging similar columns with a logical OR, and removing columns that are no longer required for our analysis.

# Separate the facilities into multiple columns (one-hot encoding)
data <- data %>%
  separate_rows(Facilities, sep = ", ") %>%
  mutate(value = 1) %>%
  spread(Facilities, value, fill = 0)

summary(data)
##  description          Bedroom            Bathroom         Property.Size     
##  Length:3815        Length:3815        Length:3815        Length:3815       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##  Nearby.School      Nearby.Mall           Ad.List            Category        
##  Length:3815        Length:3815        Min.   : 30964923   Length:3815       
##  Class :character   Class :character   1st Qu.:102383709   Class :character  
##  Mode  :character   Mode  :character   Median :103343288   Mode  :character  
##                                        Mean   :102446511                     
##                                        3rd Qu.:103782204                     
##                                        Max.   :103806285                     
##  Building.Name       Developer         Tenure.Type          Address         
##  Length:3815        Length:3815        Length:3815        Length:3815       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##  Completion.Year    X..of.Floors       Total.Units        Property.Type     
##  Length:3815        Length:3815        Length:3815        Length:3815       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##  Parking.Lot        Floor.Range         Land.Title         Firm.Type        
##  Length:3815        Length:3815        Length:3815        Length:3815       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##  Firm.Number         REN.Number          Bus.Stop             Mall          
##  Length:3815        Length:3815        Length:3815        Length:3815       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##      Park              School            Hospital            price          
##  Length:3815        Length:3815        Length:3815        Length:3815       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##    Highway          Nearby.Railway.Station Railway.Station        garden      
##  Length:3815        Length:3815            Length:3815        Min.   :0.0000  
##  Class :character   Class :character       Class :character   1st Qu.:0.0000  
##  Mode  :character   Mode  :character       Mode  :character   Median :0.0000  
##                                                               Mean   :0.1012  
##                                                               3rd Qu.:0.0000  
##                                                               Max.   :1.0000  
##   securitynew        liftnew           saunanew         tennisnew      
##  Min.   :0.0000   Min.   :0.00000   Min.   :0.00000   Min.   :0.00000  
##  1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:0.00000   1st Qu.:0.00000  
##  Median :0.0000   Median :0.00000   Median :0.00000   Median :0.00000  
##  Mean   :0.2375   Mean   :0.07837   Mean   :0.02805   Mean   :0.02595  
##  3rd Qu.:0.0000   3rd Qu.:0.00000   3rd Qu.:0.00000   3rd Qu.:0.00000  
##  Max.   :1.0000   Max.   :1.00000   Max.   :1.00000   Max.   :1.00000  
##    squashnew           surau          parkingnew     swimmingpoolnew 
##  Min.   :0.00000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.00000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :0.00000   Median :0.0000   Median :0.0000   Median :0.0000  
##  Mean   :0.01232   Mean   :0.0789   Mean   :0.1678   Mean   :0.2013  
##  3rd Qu.:0.00000   3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:0.0000  
##  Max.   :1.00000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
##  playgroundnew     gymnasiumnew    barbequeareanew    minimartnew     
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.00000   Min.   :0.00000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:0.00000  
##  Median :0.0000   Median :0.0000   Median :0.00000   Median :0.00000  
##  Mean   :0.1471   Mean   :0.1261   Mean   :0.01389   Mean   :0.09174  
##  3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:0.00000   3rd Qu.:0.00000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.00000   Max.   :1.00000  
##  multipurposehallnew joggingtracknew      hospital         mrt.lrt      
##  Min.   :0.00000     Min.   :0.00000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.00000     1st Qu.:0.00000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :0.00000     Median :0.00000   Median :0.0000   Median :0.0000  
##  Mean   :0.08598     Mean   :0.03827   Mean   :0.1127   Mean   :0.2571  
##  3rd Qu.:0.00000     3rd Qu.:0.00000   3rd Qu.:0.0000   3rd Qu.:1.0000  
##  Max.   :1.00000     Max.   :1.00000   Max.   :1.0000   Max.   :1.0000  
##    basketball        badminton        clubhousenew            -         
##  Min.   :0.00000   Min.   :0.00000   Min.   :0.000000   Min.   :0.0000  
##  1st Qu.:0.00000   1st Qu.:0.00000   1st Qu.:0.000000   1st Qu.:0.0000  
##  Median :0.00000   Median :0.00000   Median :0.000000   Median :0.0000  
##  Mean   :0.02333   Mean   :0.03853   Mean   :0.007339   Mean   :0.1594  
##  3rd Qu.:0.00000   3rd Qu.:0.00000   3rd Qu.:0.000000   3rd Qu.:0.0000  
##  Max.   :1.00000   Max.   :1.00000   Max.   :1.000000   Max.   :1.0000  
##        10            barbeque area      club house       gymnasium     
##  Min.   :0.0000000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :0.0000000   Median :0.0000   Median :0.0000   Median :0.0000  
##  Mean   :0.0002621   Mean   :0.3714   Mean   :0.1583   Mean   :0.4938  
##  3rd Qu.:0.0000000   3rd Qu.:1.0000   3rd Qu.:0.0000   3rd Qu.:1.0000  
##  Max.   :1.0000000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
##  jogging track         lift           minimart      multipurpose hall
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   
##  Median :0.0000   Median :1.0000   Median :0.0000   Median :0.0000   
##  Mean   :0.3554   Mean   :0.5245   Mean   :0.3992   Mean   :0.3269   
##  3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000   
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   
##     parking         playground         sauna           security     
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:1.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:1.0000  
##  Median :1.0000   Median :1.0000   Median :0.0000   Median :1.0000  
##  Mean   :0.7596   Mean   :0.6826   Mean   :0.2663   Mean   :0.7554  
##  3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
##   squash court    swimming pool     tennis court   
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :0.0000   Median :1.0000   Median :0.0000  
##  Mean   :0.1499   Mean   :0.6005   Mean   :0.1714  
##  3rd Qu.:0.0000   3rd Qu.:1.0000   3rd Qu.:0.0000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000
# Flag listings that have any text in a nearby-facility field.
# The pattern "." matches any single character, so NA or empty entries become 0.
data$hospital1 <- ifelse(grepl(".", data$Hospital), 1, 0)
data$busstop <- ifelse(grepl(".", data$Bus.Stop), 1, 0)
data$school <- ifelse(grepl(".", data$School), 1, 0)
data$mall <- ifelse(grepl(".", data$Mall), 1, 0)
data$railway <- ifelse(grepl(".", data$Railway.Station), 1, 0)
data$highway <- ifelse(grepl(".", data$Highway), 1, 0)

# Merge similar columns with Logical OR
data$barbequearea <- as.integer(data$"barbeque area" | data$barbequeareanew)
data$clubhouse <- as.integer(data$"club house" | data$clubhousenew)
data$gymnasium <- as.integer(data$gymnasium | data$gymnasiumnew)
data$joggingtrack <- as.integer(data$"jogging track" | data$joggingtracknew)
data$lift <- as.integer(data$lift | data$liftnew)
data$minimart <- as.integer(data$minimart | data$minimartnew)
data$multipurposehall <- as.integer(data$"multipurpose hall" | data$multipurposehallnew)
data$parking <- as.integer(data$parking | data$parkingnew)
data$playground <- as.integer(data$playground | data$playgroundnew)
data$sauna <- as.integer(data$sauna | data$saunanew)
data$security <- as.integer(data$security | data$securitynew)
data$squashcourt <- as.integer(data$"squash court"| data$squashnew)
data$swimmingpool <- as.integer(data$"swimming pool" | data$swimmingpoolnew)
data$tennis <- as.integer(data$"tennis court" | data$tennisnew)
data$hospital <- as.integer(data$hospital1 | data$hospital )
data$railway<-as.integer(data$railway|data$mrt.lrt)

# Drop the additional columns
data <- data[, !(names(data) %in% c("barbeque area","barbequeareanew","club house","clubhousenew","gymnasiumnew", "joggingtracknew","jogging track","liftnew","minimartnew","multipurposehallnew" ,"multipurpose hall","parkingnew","playgroundnew","saunanew","securitynew", "squashnew","squash court","swimming pool","swimmingpoolnew","tennis court", "tennisnew","Bus.Stop","schoolnew","Hospital","hospital1","School","Mall","Railway.Station", "mrt.lrt","Highway"))]
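
The pairwise merge above repeats the same logical-OR pattern for every amenity. As a minimal sketch of that pattern, the toy example below merges two column pairs in a loop and then drops the helper columns. The toy data frame `toy` and the lookup list `pairs` are illustrative assumptions, not objects from the report.

```r
# Toy data: two pairs of duplicate indicator columns (values are illustrative)
toy <- data.frame(
  gymnasium    = c(1, 0, 0, 1),
  gymnasiumnew = c(0, 0, 1, 1),
  sauna        = c(0, 0, 1, 0),
  saunanew     = c(0, 1, 0, 0)
)

# Hypothetical lookup: target column -> the two source columns to merge
pairs <- list(
  gymnasium = c("gymnasium", "gymnasiumnew"),
  sauna     = c("sauna", "saunanew")
)

for (target in names(pairs)) {
  src <- pairs[[target]]
  # Logical OR: the amenity counts as present if either source column flags it
  toy[[target]] <- as.integer(toy[[src[1]]] | toy[[src[2]]])
}

# Drop the helper columns once they have been merged
toy <- toy[, !(names(toy) %in% c("gymnasiumnew", "saunanew"))]
print(toy)
```

A lookup list like this keeps each target and its sources in one place, which may make it easier to spot a mismatched pair than in a long run of near-identical assignments.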

1.5. Final Data Processing

The final processing stage removes irrelevant columns, handles missing values, extracts the state and city from each address, replaces placeholder values ("-") with NA or 0, and converts each column to an appropriate type for further analysis.
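
To illustrate the placeholder handling in isolation, here is a minimal sketch on made-up values. The data frame `raw` and the helper `dash_to_na` are assumptions introduced for this example only. Unlike the report's code, which maps "-" to 0 for some columns, the sketch uses NA throughout.

```r
# Hypothetical raw values, shaped like the scraped listing fields
raw <- data.frame(
  Bedroom       = c("3", "-", "2"),
  Property.Size = c("1000 sq.ft.", "850 sq.ft.", "-"),
  Developer     = c("TLS Group", "-", "i-Berhad"),
  stringsAsFactors = FALSE
)

# Illustrative helper: replace a value that is exactly "-" with NA.
# Comparing the whole string leaves names such as "i-Berhad" untouched.
dash_to_na <- function(x) ifelse(x == "-", NA, x)

clean <- data.frame(
  Bedroom       = as.numeric(dash_to_na(raw$Bedroom)),
  # fixed = TRUE treats " sq.ft." as literal text rather than a regular expression
  Property.Size = as.numeric(sub(" sq.ft.", "", dash_to_na(raw$Property.Size), fixed = TRUE)),
  Developer     = dash_to_na(raw$Developer),
  stringsAsFactors = FALSE
)
str(clean)
```

Whether a missing count should become 0 or NA is a modelling choice: 0 asserts that the property has none, while NA records that the value is unknown.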

# Removal of irrelevant columns
data <- dplyr::select(data, -description, -Ad.List, -Firm.Type, -Firm.Number, -REN.Number,-Category,-Park,-Nearby.School,-Nearby.Mall,-Nearby.Railway.Station)
data <- dplyr::select(data, -c('-', '10'))
summary(data)
##    Bedroom            Bathroom         Property.Size      Building.Name     
##  Length:3815        Length:3815        Length:3815        Length:3815       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##   Developer         Tenure.Type          Address          Completion.Year   
##  Length:3815        Length:3815        Length:3815        Length:3815       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##  X..of.Floors       Total.Units        Property.Type      Parking.Lot       
##  Length:3815        Length:3815        Length:3815        Length:3815       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##  Floor.Range         Land.Title           price               garden      
##  Length:3815        Length:3815        Length:3815        Min.   :0.0000  
##  Class :character   Class :character   Class :character   1st Qu.:0.0000  
##  Mode  :character   Mode  :character   Mode  :character   Median :0.0000  
##                                                           Mean   :0.1012  
##                                                           3rd Qu.:0.0000  
##                                                           Max.   :1.0000  
##      surau           hospital        basketball        badminton      
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.00000   Min.   :0.00000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:0.00000  
##  Median :0.0000   Median :0.0000   Median :0.00000   Median :0.00000  
##  Mean   :0.0789   Mean   :0.1798   Mean   :0.02333   Mean   :0.03853  
##  3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:0.00000   3rd Qu.:0.00000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.00000   Max.   :1.00000  
##    gymnasium           lift          minimart         parking      
##  Min.   :0.0000   Min.   :0.000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.000   1st Qu.:0.0000   1st Qu.:1.0000  
##  Median :1.0000   Median :1.000   Median :0.0000   Median :1.0000  
##  Mean   :0.5174   Mean   :0.557   Mean   :0.4385   Mean   :0.7924  
##  3rd Qu.:1.0000   3rd Qu.:1.000   3rd Qu.:1.0000   3rd Qu.:1.0000  
##  Max.   :1.0000   Max.   :1.000   Max.   :1.0000   Max.   :1.0000  
##    playground         sauna           security         busstop      
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:1.0000   1st Qu.:0.0000  
##  Median :1.0000   Median :0.0000   Median :1.0000   Median :0.0000  
##  Mean   :0.7101   Mean   :0.2771   Mean   :0.7885   Mean   :0.1793  
##  3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:0.0000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
##      school            mall           railway          highway       
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.00000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.00000  
##  Median :0.0000   Median :0.0000   Median :0.0000   Median :0.00000  
##  Mean   :0.2406   Mean   :0.1201   Mean   :0.2587   Mean   :0.03591  
##  3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:1.0000   3rd Qu.:0.00000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.00000  
##   barbequearea      clubhouse       joggingtrack    multipurposehall
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :0.0000   Median :0.0000   Median :0.0000   Median :0.0000  
##  Mean   :0.3767   Mean   :0.1638   Mean   :0.3712   Mean   :0.3672  
##  3rd Qu.:1.0000   3rd Qu.:0.0000   3rd Qu.:1.0000   3rd Qu.:1.0000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
##   squashcourt     swimmingpool        tennis      
##  Min.   :0.000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :0.000   Median :1.0000   Median :0.0000  
##  Mean   :0.157   Mean   :0.6265   Mean   :0.1811  
##  3rd Qu.:0.000   3rd Qu.:1.0000   3rd Qu.:0.0000  
##  Max.   :1.000   Max.   :1.0000   Max.   :1.0000
# Removing rows where Address is NA
data <- data %>% filter(!is.na(Address))

# Extracting State and City
# Define the set of Malaysian states and federal territories
malay_states <- c(
  'Johor', 'Kedah', 'Kelantan', 'Perak', 'Selangor', 'Melaka', 'Negeri Sembilan',
  'Pahang', 'Perlis', 'Penang', 'Sabah', 'Sarawak', 'Terengganu',
  'Kuala Lumpur', 'Labuan', 'Putrajaya'
)

# Extract state information and create a new column 'State'
data <- data %>%
  mutate(State = str_extract(Address, paste(malay_states, collapse = '|')))

# Remove rows whose address is the "-" placeholder
data <- subset(data, Address != "-")

# Extract city information and create a new column 'City'
# Split each address once and reuse the parts
address_parts <- strsplit(as.character(data$Address), ", ")
second_last <- sapply(address_parts, function(x) ifelse(length(x) > 1, x[length(x) - 1], NA))
last_part <- sapply(address_parts, function(x) x[length(x)])
# If the second-to-last part is a state name, take the last part; otherwise take the second-to-last part
data$City <- ifelse(second_last %in% malay_states, last_part, second_last)


# Replace "-" with 0 or NA and convert to numeric or appropriate type
data$Bedroom <- as.numeric(gsub("-", "0", data$Bedroom))
data$Bathroom <- as.numeric(gsub("-", "0", data$Bathroom))
data$No.of.Floors <- as.numeric(gsub("-", NA, data$X..of.Floors))
data$Total.Units <- as.numeric(gsub("-", NA, data$Total.Units))
data$Parking.Lot <- as.numeric(gsub("-", "0", data$Parking.Lot))
data$Property.Size <- as.numeric(gsub(" sq.ft.", "", gsub("-", "0", data$Property.Size)))
data$price <- as.numeric(gsub("[^0-9]", "", data$price))
data$Building.Name <- ifelse(data$Building.Name == "-", NA, data$Building.Name)
data$Developer <- ifelse(data$Developer == "-", NA, data$Developer)
data$Completion.Year <- ifelse(data$Completion.Year == "-", NA, data$Completion.Year)
data$Floor.Range <- ifelse(data$Floor.Range == "-", NA, data$Floor.Range)

# Remove X..of.Floors (renamed to No.of.Floors) and Address (replaced by State and City)
data <- subset(data, select = -c(X..of.Floors, Address))
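
The state and city extraction can be checked on a few hand-written addresses. The sketch below uses base R only and a shortened state list; the addresses are invented for illustration, and the city rule is simplified to "the part before the last comma", without the report's fallback for addresses whose second-to-last part is itself a state.

```r
# Shortened state list and made-up addresses, for illustration only
states <- c("Selangor", "Kuala Lumpur", "Penang")
addresses <- c(
  "Jalan Example 1, Kajang, Selangor",
  "Jalan Example 2, Setapak, Kuala Lumpur",
  "Ayer Itam, Penang"
)

# State: first state name found anywhere in the address (NA if none)
pattern <- paste(states, collapse = "|")
m <- regexpr(pattern, addresses)
state <- rep(NA_character_, length(addresses))
state[m > 0] <- regmatches(addresses, m)

# City: the comma-separated part just before the last one
parts <- strsplit(addresses, ", ", fixed = TRUE)
city <- vapply(
  parts,
  function(x) if (length(x) > 1) x[length(x) - 1] else NA_character_,
  character(1)
)

print(data.frame(state = state, city = city))
```

Running a check like this on a handful of real addresses before applying the rule to the full dataset can reveal formats the rule does not cover.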

2. Exploratory Data Analysis

Exploratory Data Analysis (EDA) is a crucial step in the data analysis process. It involves exploring and understanding the structure, patterns, and characteristics of a dataset before applying more advanced statistical methods or machine learning algorithms.

2.1. Data Inspection

This section examines the first few rows of the dataset to understand its structure, checks the data type of each variable (numeric, categorical, datetime, etc.), and identifies missing values and outliers.

#view first six rows of dataset
head(data)
## # A tibble: 6 × 40
##   Bedroom Bathroom Property.Size Building.Name             Developer Tenure.Type
##     <dbl>    <dbl>         <dbl> <chr>                     <chr>     <chr>      
## 1       4        2          1000 Kenwingston Platz         Kenwings… Freehold   
## 2       3        2           980 Kenanga (Park View Court) <NA>      Freehold   
## 3       3        2          1000 Sri Lavender Apartment    TLS Group Freehold   
## 4       3        1           592 Flat Pandan Indah         <NA>      Leasehold  
## 5       1        1           467 i-Soho @ i-City           i-Berhad  Freehold   
## 6       3        2          1100 D'Piazza Condominium      X-Scan P… Freehold   
## # ℹ 34 more variables: Completion.Year <chr>, Total.Units <dbl>,
## #   Property.Type <chr>, Parking.Lot <dbl>, Floor.Range <chr>,
## #   Land.Title <chr>, price <dbl>, garden <dbl>, surau <dbl>, hospital <int>,
## #   basketball <dbl>, badminton <dbl>, gymnasium <int>, lift <int>,
## #   minimart <int>, parking <int>, playground <int>, sauna <int>,
## #   security <int>, busstop <dbl>, school <dbl>, mall <dbl>, railway <int>,
## #   highway <dbl>, barbequearea <int>, clubhouse <int>, joggingtrack <int>, …
# display the dimensions of the dataset
dim(data)
## [1] 3730   40
#data structure
str(data)
## tibble [3,730 × 40] (S3: tbl_df/tbl/data.frame)
##  $ Bedroom         : num [1:3730] 4 3 3 3 1 3 3 3 3 3 ...
##  $ Bathroom        : num [1:3730] 2 2 2 1 1 2 2 2 2 2 ...
##  $ Property.Size   : num [1:3730] 1000 980 1000 592 467 1100 780 852 918 850 ...
##  $ Building.Name   : chr [1:3730] "Kenwingston Platz" "Kenanga (Park View Court)" "Sri Lavender Apartment" "Flat Pandan Indah" ...
##  $ Developer       : chr [1:3730] "Kenwingston Group" NA "TLS Group" NA ...
##  $ Tenure.Type     : chr [1:3730] "Freehold" "Freehold" "Freehold" "Leasehold" ...
##  $ Completion.Year : chr [1:3730] NA NA "2007" NA ...
##  $ Total.Units     : num [1:3730] NA NA 445 NA 956 706 281 NA NA 435 ...
##  $ Property.Type   : chr [1:3730] "Service Residence" "Apartment" "Apartment" "Flat" ...
##  $ Parking.Lot     : num [1:3730] 2 1 1 1 0 1 1 1 1 2 ...
##  $ Floor.Range     : chr [1:3730] NA "Low" "Medium" NA ...
##  $ Land.Title      : chr [1:3730] "Non Bumi Lot" "Non Bumi Lot" "Non Bumi Lot" "Non Bumi Lot" ...
##  $ price           : num [1:3730] 340000 250000 230000 158000 305000 425000 230000 200000 275000 300000 ...
##  $ garden          : num [1:3730] 1 0 0 0 0 0 0 0 0 0 ...
##  $ surau           : num [1:3730] 0 0 0 0 0 0 0 0 1 0 ...
##  $ hospital        : int [1:3730] 1 1 1 0 1 0 0 0 0 1 ...
##  $ basketball      : num [1:3730] 1 0 0 0 0 0 0 0 0 0 ...
##  $ badminton       : num [1:3730] 0 0 0 0 0 0 0 0 0 1 ...
##  $ gymnasium       : int [1:3730] 1 0 0 0 1 1 0 1 1 0 ...
##  $ lift            : int [1:3730] 0 0 1 0 0 1 0 1 1 0 ...
##  $ minimart        : int [1:3730] 0 0 1 1 1 1 0 1 1 0 ...
##  $ parking         : int [1:3730] 0 1 1 1 1 1 1 1 1 1 ...
##  $ playground      : int [1:3730] 0 1 1 1 0 1 1 1 1 0 ...
##  $ sauna           : int [1:3730] 0 0 0 0 0 1 0 0 0 0 ...
##  $ security        : int [1:3730] 0 1 1 0 1 1 1 1 1 1 ...
##  $ busstop         : num [1:3730] 1 0 0 0 1 0 0 1 1 1 ...
##  $ school          : num [1:3730] 1 0 0 0 1 0 0 1 1 0 ...
##  $ mall            : num [1:3730] 1 0 0 0 1 0 0 0 1 0 ...
##  $ railway         : int [1:3730] 0 0 1 0 0 0 0 0 0 0 ...
##  $ highway         : num [1:3730] 0 0 1 0 0 0 0 0 0 0 ...
##  $ barbequearea    : int [1:3730] 1 1 1 0 0 1 1 1 0 0 ...
##  $ clubhouse       : int [1:3730] 0 0 0 0 0 0 0 1 0 0 ...
##  $ joggingtrack    : int [1:3730] 1 1 1 1 0 0 0 1 0 0 ...
##  $ multipurposehall: int [1:3730] 0 0 0 0 0 1 0 1 1 1 ...
##  $ squashcourt     : int [1:3730] 1 0 0 0 0 0 0 0 0 0 ...
##  $ swimmingpool    : int [1:3730] 1 1 0 0 0 1 0 1 0 1 ...
##  $ tennis          : int [1:3730] 0 0 0 0 0 1 0 0 0 0 ...
##  $ State           : chr [1:3730] "Kuala Lumpur" "Melaka" "Selangor" "Selangor" ...
##  $ City            : chr [1:3730] "Setapak" "Melaka City" "Kajang" "Ampang" ...
##  $ No.of.Floors    : num [1:3730] NA NA 13 NA 43 19 5 12 NA 435 ...
# list data types for each features
sapply(data,class)
##          Bedroom         Bathroom    Property.Size    Building.Name 
##        "numeric"        "numeric"        "numeric"      "character" 
##        Developer      Tenure.Type  Completion.Year      Total.Units 
##      "character"      "character"      "character"        "numeric" 
##    Property.Type      Parking.Lot      Floor.Range       Land.Title 
##      "character"        "numeric"      "character"      "character" 
##            price           garden            surau         hospital 
##        "numeric"        "numeric"        "numeric"        "integer" 
##       basketball        badminton        gymnasium             lift 
##        "numeric"        "numeric"        "integer"        "integer" 
##         minimart          parking       playground            sauna 
##        "integer"        "integer"        "integer"        "integer" 
##         security          busstop           school             mall 
##        "integer"        "numeric"        "numeric"        "numeric" 
##          railway          highway     barbequearea        clubhouse 
##        "integer"        "numeric"        "integer"        "integer" 
##     joggingtrack multipurposehall      squashcourt     swimmingpool 
##        "integer"        "integer"        "integer"        "integer" 
##           tennis            State             City     No.of.Floors 
##        "integer"      "character"      "character"        "numeric"
# Convert count and indicator variables to integer, since they are discrete
data$Bedroom<-as.integer(data$Bedroom)
data$Bathroom<-as.integer(data$Bathroom)
data$Completion.Year<-as.integer(data$Completion.Year)
data$Total.Units<-as.integer(data$Total.Units)
data$Parking.Lot <-as.integer(data$Parking.Lot)
data$garden <-as.integer(data$garden)
data$surau <-as.integer(data$surau)
data$basketball <-as.integer(data$basketball)
data$badminton <-as.integer(data$badminton)
data$busstop <-as.integer(data$busstop)
data$school <-as.integer(data$school)
data$mall <-as.integer(data$mall)
data$highway <-as.integer(data$highway)
data$No.of.Floors <-as.integer(data$No.of.Floors)

data$Developer <- as.factor(data$Developer)
data$Tenure.Type <- as.factor(data$Tenure.Type)
data$Property.Type <- as.factor(data$Property.Type)
data$Floor.Range <- as.factor(data$Floor.Range)
data$Land.Title <- as.factor(data$Land.Title)
data$State <- as.factor(data$State)
data$City <- as.factor(data$City)

#summarize dataset
summary(data)
##     Bedroom          Bathroom     Property.Size    Building.Name     
##  Min.   : 1.000   Min.   :1.000   Min.   :     1   Length:3730       
##  1st Qu.: 3.000   1st Qu.:2.000   1st Qu.:   750   Class :character  
##  Median : 3.000   Median :2.000   Median :   900   Mode  :character  
##  Mean   : 2.916   Mean   :2.018   Mean   :  1038                     
##  3rd Qu.: 3.000   3rd Qu.:2.000   3rd Qu.:  1116                     
##  Max.   :10.000   Max.   :8.000   Max.   :122774                     
##                                                                      
##                             Developer       Tenure.Type   Completion.Year
##  Ideal Property Group            :  67   Freehold :2264   Min.   :1985   
##  Belleview Group                 :  61   Leasehold:1466   1st Qu.:2006   
##  Asia Green Group                :  51                    Median :2014   
##  IJM LAND BERHAD                 :  49                    Mean   :2011   
##  Syarikat Perumahan Negara Berhad:  29                    3rd Qu.:2017   
##  (Other)                         :1908                    Max.   :2026   
##  NA's                            :1565                    NA's   :1829   
##   Total.Units               Property.Type   Parking.Lot     Floor.Range  
##  Min.   :   1.0   Condominium      :1585   Min.   : 0.000   High  : 781  
##  1st Qu.: 290.0   Apartment        :1402   1st Qu.: 0.000   Low   : 646  
##  Median : 462.0   Service Residence: 474   Median : 1.000   Medium:1315  
##  Mean   : 613.3   Flat             : 233   Mean   : 1.046   NA's  : 988  
##  3rd Qu.: 754.0   Others           :  14   3rd Qu.: 2.000                
##  Max.   :7810.0   Studio           :  13   Max.   :10.000                
##  NA's   :1721     (Other)          :   9                                 
##           Land.Title       price             garden           surau        
##  Bumi Lot      : 604   Min.   :  38000   Min.   :0.0000   Min.   :0.00000  
##  Malay Reserved:   7   1st Qu.: 250000   1st Qu.:0.0000   1st Qu.:0.00000  
##  Non Bumi Lot  :3119   Median : 350000   Median :0.0000   Median :0.00000  
##                        Mean   : 421198   Mean   :0.1016   Mean   :0.07962  
##                        3rd Qu.: 490000   3rd Qu.:0.0000   3rd Qu.:0.00000  
##                        Max.   :6016000   Max.   :1.0000   Max.   :1.00000  
##                                                                            
##     hospital       basketball        badminton         gymnasium     
##  Min.   :0.000   Min.   :0.00000   Min.   :0.00000   Min.   :0.0000  
##  1st Qu.:0.000   1st Qu.:0.00000   1st Qu.:0.00000   1st Qu.:0.0000  
##  Median :0.000   Median :0.00000   Median :0.00000   Median :1.0000  
##  Mean   :0.181   Mean   :0.02386   Mean   :0.03941   Mean   :0.5185  
##  3rd Qu.:0.000   3rd Qu.:0.00000   3rd Qu.:0.00000   3rd Qu.:1.0000  
##  Max.   :1.000   Max.   :1.00000   Max.   :1.00000   Max.   :1.0000  
##                                                                      
##       lift           minimart         parking         playground    
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:1.0000   1st Qu.:0.0000  
##  Median :1.0000   Median :0.0000   Median :1.0000   Median :1.0000  
##  Mean   :0.5601   Mean   :0.4421   Mean   :0.7944   Mean   :0.7115  
##  3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
##                                                                     
##      sauna           security         busstop           school      
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:1.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :0.0000   Median :1.0000   Median :0.0000   Median :0.0000  
##  Mean   :0.2796   Mean   :0.7879   Mean   :0.1834   Mean   :0.2461  
##  3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:0.0000   3rd Qu.:0.0000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
##                                                                     
##       mall           railway          highway         barbequearea   
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.00000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:0.0000  
##  Median :0.0000   Median :0.0000   Median :0.00000   Median :0.0000  
##  Mean   :0.1228   Mean   :0.2617   Mean   :0.03673   Mean   :0.3807  
##  3rd Qu.:0.0000   3rd Qu.:1.0000   3rd Qu.:0.00000   3rd Qu.:1.0000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.00000   Max.   :1.0000  
##                                                                      
##    clubhouse       joggingtrack    multipurposehall  squashcourt    
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :0.0000   Median :0.0000   Median :0.0000   Median :0.0000  
##  Mean   :0.1651   Mean   :0.3729   Mean   :0.3697   Mean   :0.1563  
##  3rd Qu.:0.0000   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:0.0000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
##                                                                     
##   swimmingpool        tennis                State               City     
##  Min.   :0.0000   Min.   :0.0000   Selangor    :1224   Johor Bahru: 235  
##  1st Qu.:0.0000   1st Qu.:0.0000   Penang      : 952   Cheras     : 193  
##  Median :1.0000   Median :0.0000   Kuala Lumpur: 671   Ayer Itam  : 187  
##  Mean   :0.6249   Mean   :0.1818   Johor       : 393   Jelutong   : 166  
##  3rd Qu.:1.0000   3rd Qu.:0.0000   Sabah       : 160   Shah Alam  : 164  
##  Max.   :1.0000   Max.   :1.0000   Sarawak     : 120   Bayan Lepas: 120  
##                                    (Other)     : 210   (Other)    :2665  
##   No.of.Floors   
##  Min.   :  2.00  
##  1st Qu.: 12.00  
##  Median : 20.00  
##  Mean   : 21.77  
##  3rd Qu.: 28.00  
##  Max.   :504.00  
##  NA's   :1570
# Count the distinct values in each column
unique_counts <- sapply(data, n_distinct)
print(unique_counts)
##          Bedroom         Bathroom    Property.Size    Building.Name 
##                8                8              839             1937 
##        Developer      Tenure.Type  Completion.Year      Total.Units 
##              581                2               41              502 
##    Property.Type      Parking.Lot      Floor.Range       Land.Title 
##                8                9                4                3 
##            price           garden            surau         hospital 
##              555                2                2                2 
##       basketball        badminton        gymnasium             lift 
##                2                2                2                2 
##         minimart          parking       playground            sauna 
##                2                2                2                2 
##         security          busstop           school             mall 
##                2                2                2                2 
##          railway          highway     barbequearea        clubhouse 
##                2                2                2                2 
##     joggingtrack multipurposehall      squashcourt     swimmingpool 
##                2                2                2                2 
##           tennis            State             City     No.of.Floors 
##                2               16              188               61
# Check for missing values in each column
sapply(data, function(x) sum(is.na(x)))
##          Bedroom         Bathroom    Property.Size    Building.Name 
##                0                0                0                0 
##        Developer      Tenure.Type  Completion.Year      Total.Units 
##             1565                0             1829             1721 
##    Property.Type      Parking.Lot      Floor.Range       Land.Title 
##                0                0              988                0 
##            price           garden            surau         hospital 
##                0                0                0                0 
##       basketball        badminton        gymnasium             lift 
##                0                0                0                0 
##         minimart          parking       playground            sauna 
##                0                0                0                0 
##         security          busstop           school             mall 
##                0                0                0                0 
##          railway          highway     barbequearea        clubhouse 
##                0                0                0                0 
##     joggingtrack multipurposehall      squashcourt     swimmingpool 
##                0                0                0                0 
##           tennis            State             City     No.of.Floors 
##                0                0                0             1570
# Calculate the percentage of missing values in each column 
sapply(data, function(x) sum(is.na(x)) / nrow(data)) * 100
##          Bedroom         Bathroom    Property.Size    Building.Name 
##          0.00000          0.00000          0.00000          0.00000 
##        Developer      Tenure.Type  Completion.Year      Total.Units 
##         41.95710          0.00000         49.03485         46.13941 
##    Property.Type      Parking.Lot      Floor.Range       Land.Title 
##          0.00000          0.00000         26.48794          0.00000 
##            price           garden            surau         hospital 
##          0.00000          0.00000          0.00000          0.00000 
##       basketball        badminton        gymnasium             lift 
##          0.00000          0.00000          0.00000          0.00000 
##         minimart          parking       playground            sauna 
##          0.00000          0.00000          0.00000          0.00000 
##         security          busstop           school             mall 
##          0.00000          0.00000          0.00000          0.00000 
##          railway          highway     barbequearea        clubhouse 
##          0.00000          0.00000          0.00000          0.00000 
##     joggingtrack multipurposehall      squashcourt     swimmingpool 
##          0.00000          0.00000          0.00000          0.00000 
##           tennis            State             City     No.of.Floors 
##          0.00000          0.00000          0.00000         42.09115
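
The summary above shows some extreme values, for example a maximum Property.Size of 122774 against a median of 900. One common rule of thumb for flagging such values is the 1.5 × IQR rule. The sketch below applies it to a small made-up vector; the function name `flag_outliers` and the sample values are assumptions for illustration, and the rule is one option among several rather than the method used in this report.

```r
# Illustrative values only; the last entry mimics an implausibly large size
size <- c(750, 900, 1000, 1116, 980, 850, 122774)

# Flag values outside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR]
flag_outliers <- function(x) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  fence <- 1.5 * (q[2] - q[1])
  x < q[1] - fence | x > q[2] + fence
}

is_out <- flag_outliers(size)
print(size[is_out])
```

Flagged values still need a manual look: a very large unit may be a data-entry error or a genuine listing, and the rule cannot tell the two apart.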

2.2. Data Visualization

Create visualizations to understand the distribution of variables.

Univariate Analysis:

# Numeric columns to include in the box plots
numeric_columns <- c("Property.Size", "Completion.Year", "Total.Units", "Parking.Lot", "price", "No.of.Floors")

# Draw a separate box plot for each numeric variable
par(mfrow = c(3, 2))  # Set the layout to 3 rows and 2 columns for the plots
for (col in numeric_columns) {
  boxplot(data[, col], col = "skyblue", main = paste("Box Plot for", col),horizontal = TRUE)
}

# Reset the layout to default
par(mfrow = c(1, 1))


#create histogram of values for Property.Size
ggplot(data=data, aes(x=Property.Size)) +
  geom_histogram(fill="steelblue", color="black") +
  ggtitle("Histogram of Property Size")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

#create histogram of values for Completion.Year
ggplot(data=data, aes(x=Completion.Year)) +
  geom_histogram(fill="steelblue", color="black") +
  ggtitle("Histogram of Property's Completion Year")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 1829 rows containing non-finite values (`stat_bin()`).

#create histogram of values for Total.Units
ggplot(data=data, aes(x=Total.Units)) +
  geom_histogram(fill="steelblue", color="black") +
  ggtitle("Histogram of Property's Total Units")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 1721 rows containing non-finite values (`stat_bin()`).

#create histogram of values for Parking.Lot
ggplot(data=data, aes(x=Parking.Lot)) +
  geom_histogram(fill="steelblue", color="black") +
  ggtitle("Histogram of Property's Total Parking Lot")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

#create histogram of values for price
ggplot(data=data, aes(x=price)) +
  geom_histogram(fill="steelblue", color="black") +
  ggtitle("Histogram of Property's Price Values")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

#create histogram of values for No.of.Floors
ggplot(data=data, aes(x=No.of.Floors)) +
  geom_histogram(fill="steelblue", color="black") +
  ggtitle("Histogram of Property's No of Floors")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 1570 rows containing non-finite values (`stat_bin()`).
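
Each histogram above falls back to the default of 30 bins, as the `stat_bin()` messages note. One way to pick a bin width from the data is the Freedman-Diaconis rule, binwidth = 2 × IQR / n^(1/3). The sketch below applies it to simulated prices; the helper `fd_binwidth` and the simulated vector are assumptions for illustration, not part of the report's analysis.

```r
# Freedman-Diaconis rule: binwidth = 2 * IQR(x) / n^(1/3)
fd_binwidth <- function(x) {
  x <- x[is.finite(x)]  # ignore NA and infinite values
  2 * IQR(x) / length(x)^(1 / 3)
}

# Simulated right-skewed prices, for illustration only
set.seed(1)
price <- rlnorm(500, meanlog = log(350000), sdlog = 0.5)

bw <- fd_binwidth(price)

# Build the histogram without drawing it, to inspect the resulting bins
h <- hist(price, breaks = seq(min(price), max(price) + bw, by = bw), plot = FALSE)
length(h$counts)
```

With ggplot2, the computed width could be passed as `geom_histogram(binwidth = bw)`. For heavily skewed variables, a log-scaled axis may be more informative than any single bin width.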

# Top 10 categories for building name
top_10_bncounts <- head(sort(table(data$Building.Name), decreasing = TRUE), 10)

# Labels for the top 10 counts
top_10_bnlabels <- names(top_10_bncounts)

# Bar plot for the top 10 building name's categories with labels
barplot(top_10_bncounts, col = "skyblue", 
        main = "Bar Plot of Building.Name (Top 10)", ylab = "Count", 
        names.arg = top_10_bnlabels, las = 2)

# Top 10 categories for developer
top_10_dcounts <- head(sort(table(data$Developer), decreasing = TRUE), 10)

# Labels for the top 10 counts
top_10_dlabels <- names(top_10_dcounts)

# Bar plot for the top 10 developer categories with labels
barplot(top_10_dcounts, col = "skyblue", 
        main = "Bar Plot of Developer (Top 10)", ylab = "Count", 
        names.arg = top_10_dlabels, las = 2)

# Bar plot based on the number of bedrooms
barplot(table(data$Bedroom), col = "skyblue", main = "Bar Plot of Bedroom", xlab = "No of Bedroom", ylab = "Count")

# Bar plot based on the number of Bathroom
barplot(table(data$Bathroom), col = "skyblue", main = "Bar Plot of Bathroom", xlab = "No of Bathroom", ylab = "Count")

# Bar plot of listing counts by Tenure.Type
ggplot(data, aes(x = Tenure.Type, fill = Tenure.Type)) + geom_bar(color = "black") + ggtitle("Bar Plot of Tenure Type") + xlab("Tenure Type") + ylab("Count") + theme_minimal()

# Bar plot of listing counts by Property.Type
ggplot(data, aes(x = Property.Type, fill = Property.Type)) + geom_bar(color = "black") + ggtitle("Bar Plot of Property Type") + xlab("Property Type") + ylab("Count") + theme_minimal()

# Bar plot of listing counts by Floor.Range
ggplot(data, aes(x = Floor.Range, fill = Floor.Range)) + geom_bar(color = "black") + ggtitle("Bar Plot of Floor Range") + xlab("Floor Range") + ylab("Count") + theme_minimal()

# Bar plot of listing counts by State
ggplot(data, aes(x = State, fill = State)) + geom_bar(color = "black") + ggtitle("Bar Plot of State") + xlab("State") + ylab("Count") + theme_minimal()+theme(axis.text.x = element_text(angle = 45, hjust = 1))

# Top 10 city categories
top_10_ccounts <- head(sort(table(data$City), decreasing = TRUE), 10)
top_10_clabels <- names(top_10_ccounts)

top_10_cdata <- data.frame(City = names(top_10_ccounts), Freq = as.vector(top_10_ccounts))

# Reorder the levels of City based on the frequency count
top_10_cdata$City <- factor(top_10_cdata$City, levels = top_10_cdata$City[order(top_10_cdata$Freq)])

ggplot(top_10_cdata, aes(x = City, y = Freq, fill = City)) +
  geom_bar(stat = "identity", color = "black") +
  ggtitle("Bar Plot of City (Top 10)") +
  xlab("City") +
  ylab("Count") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

# List of variables to create plots for
variables <- c('garden', 'surau', 'basketball', 'badminton', 'gymnasium', 
               'lift', 'playground', 'sauna', 'security', 'barbequearea', 
               'clubhouse', 'joggingtrack')

# List to store individual ggplot objects
plots <- list()

#Loop through each variable and create a plot
for (variable in variables) {
  plot <- ggplot(data, aes(x = factor(.data[[variable]]), fill = factor(.data[[variable]]))) +
    geom_bar() +
    scale_fill_manual(values = c("red", "green"), labels = c("No", "Yes")) +
    scale_x_discrete(labels = c("No", "Yes")) +
    labs(x = variable, y = "Count") +
    theme_minimal() +
    theme(legend.position = "none")  # Remove the legend
  
  # Add the plot to the list
  plots[[variable]] <- plot
}

# Arrange and print the plots
library(gridExtra)
## Warning: package 'gridExtra' was built under R version 4.3.2
## 
## Attaching package: 'gridExtra'
## 
## The following object is masked from 'package:dplyr':
## 
##     combine
grid.arrange(grobs = plots, ncol = 3)

# List of variables to create plots for
variables1<- c('multipurposehall', 'squashcourt', 
               'swimmingpool', 'tennis', 'hospital', 'school', 'minimart', 
               'mall', 'parking', 'busstop', 'railway', 'highway')

# Create a list to store individual ggplot objects
plots <- list()

#Loop through each variable and create a plot
for (variable in variables1) {
  plot <- ggplot(data, aes(x = factor(.data[[variable]]), fill = factor(.data[[variable]]))) +
    geom_bar() +
    scale_fill_manual(values = c("red", "green"), labels = c("No", "Yes")) +
    scale_x_discrete(labels = c("No", "Yes")) +
    labs(x = variable, y = "Count") +
    theme_minimal() +
    theme(legend.position = "none")  # Remove the legend
  
  # Add the plot to the list
  plots[[variable]] <- plot
}

# Arrange and print the plots
library(gridExtra)
grid.arrange(grobs = plots, ncol = 3)
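
Because the facility and amenity indicators are coded 0/1, the share of listings that offer each feature can also be tabulated directly, which complements the bar charts. A minimal sketch on a toy data frame (the values in `toy` are made up for illustration; only the column names follow the report):

```r
# Toy 0/1 indicator data; values are illustrative, not taken from the dataset
toy <- data.frame(
  garden   = c(1, 0, 1, 1),
  surau    = c(0, 0, 1, 0),
  security = c(1, 1, 1, 1)
)

# For 0/1 columns, the column mean is the share of listings with the feature
feature_share <- sort(colMeans(toy), decreasing = TRUE)
print(round(feature_share, 2))
```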

Bivariate Analysis:

# Scatterplot of Property.Size vs. price, colored by Property.Type
ggplot(data=data, aes(y=Property.Size, x=price, color=Property.Type)) + 
  geom_point()

# Scatterplot of Completion.Year vs. price, colored by State
ggplot(data=data, aes(y=Completion.Year, x=price, color=State)) + 
  geom_point()
## Warning: Removed 1829 rows containing missing values (`geom_point()`).

# Scatterplot of Bedroom vs. price, colored by Property.Type
ggplot(data=data, aes(y=Bedroom , x=price, color=Property.Type)) + 
  geom_point()

# Scatterplot of Bathroom vs. price, colored by Property.Type
ggplot(data=data, aes(y=Bathroom , x=price, color=Property.Type)) + 
  geom_point()

# Scatterplot of Parking.Lot vs. price, colored by Property.Type
ggplot(data=data, aes(y=Parking.Lot , x=price, color=Property.Type)) + 
  geom_point()

# Scatterplot of Total.Units vs. price, colored by Floor.Range
ggplot(data=data, aes(y=Total.Units , x=price, color=Floor.Range)) + 
  geom_point()
## Warning: Removed 1721 rows containing missing values (`geom_point()`).

# Scatterplot of No.of.Floors vs. price, colored by Floor.Range
ggplot(data=data, aes(y=No.of.Floors , x=price, color=Floor.Range)) + 
  geom_point()
## Warning: Removed 1570 rows containing missing values (`geom_point()`).

correlation_test <- cor.test(data$Property.Size, data$price)
print(correlation_test)
## 
##  Pearson's product-moment correlation
## 
## data:  data$Property.Size and data$price
## t = 11.368, df = 3728, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1518334 0.2138725
## sample estimates:
##       cor 
## 0.1830352
# There is a weak but statistically significant positive correlation (r = 0.18) between property size and price. As the property size increases, the price tends to increase.

correlation_test <- cor.test(data$Total.Units, data$price)
print(correlation_test)
## 
##  Pearson's product-moment correlation
## 
## data:  data$Total.Units and data$price
## t = -5.3906, df = 2007, p-value = 7.847e-08
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.16235060 -0.07613132
## sample estimates:
##        cor 
## -0.1194662
# There is a weak but statistically significant negative correlation (r = -0.12) between the total number of units and the price. As the total number of units increases, the price tends to decrease slightly.

correlation_test <- cor.test(data$No.of.Floors, data$price)
print(correlation_test)
## 
##  Pearson's product-moment correlation
## 
## data:  data$No.of.Floors and data$price
## t = 4.8263, df = 2158, p-value = 1.489e-06
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.0614283 0.1448811
## sample estimates:
##       cor 
## 0.1033366
# There is a weak but statistically significant positive correlation (r = 0.10) between the number of floors and the price. As the number of floors increases, the price tends to increase slightly.

correlation_test <- cor.test(data$Bathroom, data$price)
print(correlation_test)
## 
##  Pearson's product-moment correlation
## 
## data:  data$Bathroom and data$price
## t = 43.172, df = 3728, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.5555288 0.5983366
## sample estimates:
##       cor 
## 0.5773293
# There is a moderate-to-strong positive correlation (r = 0.58) between the number of bathrooms and the price. Properties with more bathrooms tend to have higher prices.

correlation_test <- cor.test(data$Bedroom, data$price)
print(correlation_test)
## 
##  Pearson's product-moment correlation
## 
## data:  data$Bedroom and data$price
## t = 20.499, df = 3728, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2891398 0.3468308
## sample estimates:
##       cor 
## 0.3182799
#There is a moderate positive correlation between the number of bedrooms and the price. As the number of bedrooms increases, the price tends to increase.

correlation_test <- cor.test(data$Parking.Lot, data$price)
print(correlation_test)
## 
##  Pearson's product-moment correlation
## 
## data:  data$Parking.Lot and data$price
## t = 29.774, df = 3728, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4120109 0.4638772
## sample estimates:
##       cor 
## 0.4383088
# There is a moderate positive correlation (r = 0.44) between the number of parking lots and the price. Properties with more parking lots tend to have higher prices.

correlation_test <- cor.test(data$Completion.Year, data$price)
print(correlation_test)
## 
##  Pearson's product-moment correlation
## 
## data:  data$Completion.Year and data$price
## t = 9.8411, df = 1899, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1770773 0.2626386
## sample estimates:
##       cor 
## 0.2202816
#There is a positive correlation between the completion year and the price. Generally, more recently completed properties tend to have higher prices.
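
The tests above repeat the same call for each predictor. As a sketch, the calls can be wrapped in a loop that collects the estimate, confidence interval, and p-value into one table, which makes the strengths easier to compare side by side. The data below are simulated purely for illustration; `toy`, `predictors`, and `cor_summary` are hypothetical names, not objects from the report.

```r
# Simulated stand-in for the listing data; values are not from the dataset
set.seed(1)
toy <- data.frame(
  price         = rnorm(50, mean = 400000, sd = 100000),
  Property.Size = rnorm(50, mean = 900, sd = 200),
  Bathroom      = sample(1:4, 50, replace = TRUE)
)

predictors <- c("Property.Size", "Bathroom")

# Run cor.test() for each predictor against price and keep the key numbers
cor_summary <- do.call(rbind, lapply(predictors, function(v) {
  ct <- cor.test(toy[[v]], toy$price)
  data.frame(
    variable = v,
    r        = unname(ct$estimate),
    ci_low   = ct$conf.int[1],
    ci_high  = ct$conf.int[2],
    p_value  = ct$p.value
  )
}))
print(cor_summary)
```

Reading the estimate together with its interval helps separate "statistically significant" from "strong": with several thousand listings, even a correlation near 0.1 produces a very small p-value.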

# Use ANOVA to compare the mean of a numeric variable across the groups of a categorical variable with more than two levels. Use the chi-squared test to check the association between two categorical variables.

# Tenure.Type is categorical and price is numeric, so chisq.test() treats each distinct price as its own category; the warning below reflects the resulting sparse table.
chi_sq_result <- chisq.test(data$Tenure.Type, data$price)
## Warning in chisq.test(data$Tenure.Type, data$price): Chi-squared approximation
## may be incorrect
print(chi_sq_result)
## 
##  Pearson's Chi-squared test
## 
## data:  data$Tenure.Type and data$price
## X-squared = 739.94, df = 554, p-value = 1.885e-07
# Check if the p-value is less than 0.05
if (chi_sq_result$p.value <= 0.05) {
  cat("The association is statistically significant.\n")
} else {
  cat("The association is not statistically significant.\n")
}
## The association is statistically significant.
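
For a binary grouping factor and a numeric outcome, a two-sample comparison is a more conventional check than a chi-squared test. The following is a minimal sketch on simulated data, not a re-analysis of the listings: the group means and the `toy` data frame are assumptions chosen only to make the example run.

```r
# Simulated prices for two tenure types; values are illustrative only
set.seed(42)
toy <- data.frame(
  Tenure.Type = factor(rep(c("Freehold", "Leasehold"), each = 40)),
  price       = c(rnorm(40, mean = 420000, sd = 90000),
                  rnorm(40, mean = 350000, sd = 90000))
)

# Welch two-sample t-test: compares mean price between the two tenure types
welch <- t.test(price ~ Tenure.Type, data = toy)
print(welch)

# Rank-based alternative that does not assume normally distributed prices
rank_test <- wilcox.test(price ~ Tenure.Type, data = toy)
print(rank_test$p.value)
```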
# We perform an ANOVA test when the categorical variable has more than two levels
# Perform ANOVA for Developer and Price
anova_result <- aov(price ~ Developer, data = data)

# Print the result
print(summary(anova_result))
##               Df    Sum Sq   Mean Sq F value Pr(>F)    
## Developer    579 1.676e+14 2.895e+11   3.322 <2e-16 ***
## Residuals   1585 1.381e+14 8.715e+10                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 1565 observations deleted due to missingness
# Check if the p-value is less than 0.05
if (summary(anova_result)[[1]]["Developer", "Pr(>F)"] <= 0.05) {
  cat("The means are significantly different.\n")
} else {
  cat("There is no significant difference in means.\n")
}
## The means are significantly different.
# Perform ANOVA for Floor.Range and Price
anova_result <- aov(price ~ Floor.Range, data = data)

# Print the result
print(summary(anova_result))
##               Df    Sum Sq   Mean Sq F value   Pr(>F)    
## Floor.Range    2 4.817e+12 2.409e+12   22.05 3.15e-10 ***
## Residuals   2739 2.991e+14 1.092e+11                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 988 observations deleted due to missingness
# Check if the p-value is less than 0.05
if (summary(anova_result)[[1]]["Floor.Range", "Pr(>F)"] <= 0.05) {
  cat("The means are significantly different.\n")
} else {
  cat("There is no significant difference in means.\n")
}
## The means are significantly different.
# Perform ANOVA for Property.Type and Price
anova_result <- aov(price ~ Property.Type, data = data)

# Print the result
print(summary(anova_result))
##                 Df    Sum Sq   Mean Sq F value Pr(>F)    
## Property.Type    7 8.376e+13 1.197e+13   140.7 <2e-16 ***
## Residuals     3722 3.166e+14 8.507e+10                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Check if the p-value is less than 0.05
if (summary(anova_result)[[1]]["Property.Type", "Pr(>F)"] <= 0.05) {
  cat("The means are significantly different.\n")
} else {
  cat("There is no significant difference in means.\n")
}
## The means are significantly different.
# Perform ANOVA for Land.Title and Price
anova_result <- aov(price ~ Land.Title, data = data)

# Print the result
print(summary(anova_result))
##               Df    Sum Sq   Mean Sq F value Pr(>F)    
## Land.Title     2 1.235e+13 6.176e+12   59.32 <2e-16 ***
## Residuals   3727 3.880e+14 1.041e+11                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Check if the p-value is less than 0.05
if (summary(anova_result)[[1]]["Land.Title", "Pr(>F)"] <= 0.05) {
  cat("The means are significantly different.\n")
} else {
  cat("There is no significant difference in means.\n")
}
## The means are significantly different.
# Perform ANOVA for State and Price
anova_result <- aov(price ~ State, data = data)

# Print the result
print(summary(anova_result))
##               Df    Sum Sq   Mean Sq F value Pr(>F)    
## State         15 4.891e+13 3.261e+12   34.45 <2e-16 ***
## Residuals   3714 3.515e+14 9.463e+10                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Check if the p-value is less than 0.05
if (summary(anova_result)[[1]]["State", "Pr(>F)"] <= 0.05) {
  cat("The means are significantly different.\n")
} else {
  cat("There is no significant difference in means.\n")
}
## The means are significantly different.
# Perform ANOVA for City and Price
anova_result <- aov(price ~ City, data = data)

# Print the result
print(summary(anova_result))
##               Df    Sum Sq   Mean Sq F value Pr(>F)    
## City         187 1.089e+14 5.822e+11   7.075 <2e-16 ***
## Residuals   3542 2.915e+14 8.230e+10                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Check if the p-value is less than 0.05
if (summary(anova_result)[[1]]["City", "Pr(>F)"] <= 0.05) {
  cat("The means are significantly different.\n")
} else {
  cat("There is no significant difference in means.\n")
}
## The means are significantly different.
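
The six ANOVA blocks above follow one pattern, so they can be expressed as a single loop over the categorical predictors. This is a sketch under stated assumptions: the data are a small made-up frame, and `factors_to_test` and `anova_table` are hypothetical names introduced here.

```r
# Made-up data with two categorical predictors; values are illustrative only
set.seed(7)
toy <- data.frame(
  price       = rnorm(90, mean = 400000, sd = 80000),
  Floor.Range = factor(rep(c("Low", "Medium", "High"), times = 30)),
  Land.Title  = factor(rep(c("Bumi Lot", "Non Bumi Lot"), each = 45))
)

factors_to_test <- c("Floor.Range", "Land.Title")

# Fit a one-way ANOVA per predictor and collect the first row of each table
anova_table <- do.call(rbind, lapply(factors_to_test, function(f) {
  fit <- aov(reformulate(f, response = "price"), data = toy)
  tab <- summary(fit)[[1]]
  data.frame(
    term    = f,
    df      = tab[1, "Df"],
    F_value = tab[1, "F value"],
    p_value = tab[1, "Pr(>F)"]
  )
}))
print(anova_table)
```

A significant F test only says that at least one group mean differs; a follow-up such as `TukeyHSD()` on the fitted `aov` object would be needed to see which groups differ.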

2.3.Data Pre-processing and Feature Engineering

Data pre-processing and feature engineering are crucial steps in the data analysis and machine learning pipeline. They involve preparing and transforming the raw data to make it suitable for analysis or model training.

#Outlier Removal

# Use Tukey's method to identify outliers in Property.Size.
# Values outside the range Q1 - 1.5*IQR to Q3 + 1.5*IQR are treated as outliers.
Q1 <- quantile(data$Property.Size, 0.25)
Q3 <- quantile(data$Property.Size, 0.75)
iqr_size <- Q3 - Q1  # named iqr_size to avoid reusing the name of the stats function IQR()


# Define the lower and upper bounds for outliers
lower_bound <- Q1 - 1.5 * iqr_size
upper_bound <- Q3 + 1.5 * iqr_size

# Remove outliers
data <- data[data$Property.Size >= lower_bound & data$Property.Size <= upper_bound, ]
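
The Tukey rule used above can be packaged as a small helper so the same fences can be computed for other numeric columns. This is a sketch: `tukey_bounds` is a hypothetical helper name and the `sizes` vector is made up for illustration.

```r
# Return the Tukey fences for a numeric vector (k = 1.5 is the usual multiplier)
tukey_bounds <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  spread <- q[2] - q[1]  # interquartile range
  c(lower = q[1] - k * spread, upper = q[2] + k * spread)
}

# Toy property sizes; 5000 is an obvious outlier
sizes <- c(650, 700, 800, 850, 900, 950, 1000, 1100, 5000)
b <- tukey_bounds(sizes)

# Keep only the values inside the fences
kept <- sizes[sizes >= b["lower"] & sizes <= b["upper"]]
print(b)
print(kept)
```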

#Outlier Adjustment
Adata <- data[complete.cases(data$No.of.Floors), ]
Bdata <- data[complete.cases(data$Total.Units), ]
Cdata <- data[complete.cases(data$Parking.Lot), ]
Ddata <- data[complete.cases(data$Bathroom), ]

# Winsorization function
winsorize <- function(x, trim = 0.05) {
  q <- quantile(x, c(trim, 1 - trim), na.rm = TRUE)
  x[x < q[1]] <- q[1]
  x[x > q[2]] <- q[2]
  return(x)
}
# Compute the median of the winsorized values for each of the four variables
Adjusted_medianNF <- median(winsorize(Adata$No.of.Floors))
Adjusted_medianTU <- median(winsorize(Bdata$Total.Units))
Adjusted_medianPL <- median(winsorize(Cdata$Parking.Lot))
Adjusted_medianb <- median(winsorize(Ddata$Bathroom))

# Replace values above a fixed threshold with the corresponding median
data$No.of.Floors[which(data$No.of.Floors >120)]<-Adjusted_medianNF 
data$Total.Units[which(data$Total.Units >4000)] <-Adjusted_medianTU
data$Parking.Lot[which(data$Parking.Lot >6)] <-Adjusted_medianPL
data$Bathroom[which(data$Bathroom >6)] <-Adjusted_medianb
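
Winsorizing clamps only the tails of a vector to the chosen percentiles, so with a 5% trim the median of the winsorized values is the same as the median of the original values. The replacement value used above is therefore the ordinary median of the observed data. A small sketch with made-up floor counts (the `floors` vector is an assumption for illustration; `winsorize` is copied from the report's code):

```r
winsorize <- function(x, trim = 0.05) {
  q <- quantile(x, c(trim, 1 - trim), na.rm = TRUE)
  x[x < q[1]] <- q[1]
  x[x > q[2]] <- q[2]
  x
}

# Toy floor counts with one implausible entry
floors <- c(4, 5, 5, 10, 12, 16, 18, 20, 24, 30, 999)

w <- winsorize(floors)
print(range(w))                     # tails pulled in to the 5th/95th percentiles
print(median(w) == median(floors))  # clamping the tails leaves the median unchanged

# Threshold-based replacement, following the same pattern as the report's code
floors[floors > 120] <- median(w)
print(floors)
```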

# Imputation
# Load the VIM package for k-nearest-neighbour imputation
library(VIM)
## Warning: package 'VIM' was built under R version 4.3.2
## Loading required package: colorspace
## Warning: package 'colorspace' was built under R version 4.3.2
## Loading required package: grid
## VIM is ready to use.
## Suggestions and bug-reports can be submitted at: https://github.com/statistikat/VIM/issues
## 
## Attaching package: 'VIM'
## The following object is masked from 'package:datasets':
## 
##     sleep
# Perform k-nearest neighbors imputation separately for each variable with missing values
imputed_data <- kNN(data , variable = "Completion.Year")
imputed_data1 <- kNN(data , variable = "Floor.Range")
imputed_data2 <-kNN(data,variable="No.of.Floors")
imputed_data3 <-kNN(data,variable="Total.Units")
imputed_data4 <-kNN(data,variable="Developer")


# Copy each imputed column back into the original dataset
data$Completion.Year <- imputed_data$Completion.Year
data$Floor.Range <- imputed_data1$Floor.Range
data$No.of.Floors <- imputed_data2$No.of.Floors
data$Total.Units <- imputed_data3$Total.Units
data$Developer <- imputed_data4$Developer
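
To make the idea behind k-nearest-neighbour imputation concrete, the sketch below fills a missing numeric value with the median of its `k` closest complete rows. It is an illustration in base R only and is not VIM's implementation: `VIM::kNN()` uses a Gower-type distance that also handles categorical columns, whereas this hypothetical helper, `knn_impute_numeric`, uses Euclidean distance on scaled numeric predictors that are assumed to have no missing values.

```r
# Illustrative kNN imputation for one numeric column (base R only)
knn_impute_numeric <- function(df, target, predictors, k = 5) {
  scaled <- scale(df[, predictors, drop = FALSE])  # put predictors on one scale
  missing_rows <- which(is.na(df[[target]]))
  donor_rows   <- which(!is.na(df[[target]]))
  for (i in missing_rows) {
    # Euclidean distance from row i to every donor row
    diffs <- sweep(scaled[donor_rows, , drop = FALSE], 2, scaled[i, ])
    d <- sqrt(rowSums(diffs^2))
    nearest <- donor_rows[order(d)[seq_len(min(k, length(donor_rows)))]]
    # Impute with the median of the nearest donors' observed values
    df[[target]][i] <- median(df[[target]][nearest])
  }
  df
}

# Toy data: two clusters of listings, one missing year in each
toy <- data.frame(
  Property.Size   = c(700, 720, 710, 1500, 1520, 1510),
  Bathroom        = c(1, 1, 1, 3, 3, 3),
  Completion.Year = c(2001, 2003, NA, 2018, 2020, NA)
)
filled <- knn_impute_numeric(toy, "Completion.Year",
                             c("Property.Size", "Bathroom"), k = 2)
print(filled)
```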


#Feature engineering

#Calculate the age of the property by subtracting the Completion Year from the current year.
data <- subset(data, Completion.Year < 2024)
data$Property.Age <- as.numeric(format(Sys.Date(), "%Y")) - data$Completion.Year
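
Because the age is computed from `Sys.Date()`, rerunning the script in a later year shifts every age by one. A fixed reference year keeps the feature reproducible. The sketch below assumes 2024 as that year, which is consistent with the recorded summary (ages 1 to 39 for completion years 1985 to 2023); `reference_year` and the `toy` data are hypothetical.

```r
# Assumption: 2024 is the year the analysis was run; adjust as needed
reference_year <- 2024

toy <- data.frame(Completion.Year = c(1985, 2014, 2023, 2026))

# Keep only completed properties, then compute age against the fixed year
toy <- subset(toy, Completion.Year < reference_year)
toy$Property.Age <- reference_year - toy$Completion.Year
print(toy)
```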

#Check for missing values in each column
sapply(data, function(x) sum(is.na(x)))
##          Bedroom         Bathroom    Property.Size    Building.Name 
##                0                0                0                0 
##        Developer      Tenure.Type  Completion.Year      Total.Units 
##                0                0                0                0 
##    Property.Type      Parking.Lot      Floor.Range       Land.Title 
##                0                0                0                0 
##            price           garden            surau         hospital 
##                0                0                0                0 
##       basketball        badminton        gymnasium             lift 
##                0                0                0                0 
##         minimart          parking       playground            sauna 
##                0                0                0                0 
##         security          busstop           school             mall 
##                0                0                0                0 
##          railway          highway     barbequearea        clubhouse 
##                0                0                0                0 
##     joggingtrack multipurposehall      squashcourt     swimmingpool 
##                0                0                0                0 
##           tennis            State             City     No.of.Floors 
##                0                0                0                0 
##     Property.Age 
##                0
#Count the total number of facilities provided for each property.
data$Total_Facilities <- rowSums(data[, c('garden', 'surau', 'basketball', 'badminton', 'gymnasium', 'lift', 'playground', 'sauna', 'security', 'barbequearea', 'clubhouse', 'joggingtrack', 'multipurposehall', 'squashcourt', 'swimmingpool', 'tennis')])

#Count the total number of surrounding amenities near for each property.
data$Total_surroundingamenities<- rowSums(data[, c('hospital','school','minimart','mall')])

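# Count the transport links near each property. Note: this assignment reuses the column name Total_surroundingamenities, so it overwrites the amenity count computed on the previous line.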
data$Total_surroundingamenities<- rowSums(data[, c('parking','busstop','railway','highway')])

# Drop the specified columns
data <- data[, -which(names(data) %in% c(
  'garden', 'surau', 'basketball', 'badminton', 'gymnasium', 'lift', 'playground', 'sauna', 'security',
  'barbequearea', 'clubhouse', 'joggingtrack', 'multipurposehall', 'squashcourt', 'swimmingpool', 'tennis',
  'hospital', 'school', 'minimart', 'mall', 'parking', 'busstop', 'railway', 'highway'
))]
summary(data)
##     Bedroom        Bathroom     Property.Size    Building.Name     
##  Min.   :1.00   Min.   :1.000   Min.   : 280.0   Length:3510       
##  1st Qu.:3.00   1st Qu.:2.000   1st Qu.: 736.0   Class :character  
##  Median :3.00   Median :2.000   Median : 887.0   Mode  :character  
##  Mean   :2.86   Mean   :1.943   Mean   : 928.3                     
##  3rd Qu.:3.00   3rd Qu.:2.000   3rd Qu.:1091.0                     
##  Max.   :5.00   Max.   :5.000   Max.   :1661.0                     
##                                                                    
##                             Developer       Tenure.Type   Completion.Year
##  Tasek Maju Realty Sdn Bhd       : 101   Freehold :2118   Min.   :1985   
##  Ideal Property Group            :  98   Leasehold:1392   1st Qu.:2005   
##  Syarikat Perumahan Negara Berhad:  90                    Median :2014   
##  Belleview Group                 :  85                    Mean   :2011   
##  Hunza Properties Berhad         :  73                    3rd Qu.:2016   
##  Asia Green Group                :  59                    Max.   :2023   
##  (Other)                         :3004                                   
##   Total.Units               Property.Type   Parking.Lot     Floor.Range  
##  Min.   :   1.0   Condominium      :1437   Min.   :0.0000   High  : 939  
##  1st Qu.: 280.0   Apartment        :1366   1st Qu.:0.0000   Low   : 800  
##  Median : 420.0   Service Residence: 453   Median :1.0000   Medium:1771  
##  Mean   : 518.5   Flat             : 230   Mean   :0.9823                
##  3rd Qu.: 670.0   Studio           :  13   3rd Qu.:2.0000                
##  Max.   :3600.0   Others           :   7   Max.   :5.0000                
##                   (Other)          :   4                                 
##           Land.Title       price                  State               City     
##  Bumi Lot      : 576   Min.   :  38000   Selangor    :1184   Johor Bahru: 214  
##  Malay Reserved:   4   1st Qu.: 248000   Penang      : 890   Cheras     : 190  
##  Non Bumi Lot  :2930   Median : 344000   Kuala Lumpur: 628   Ayer Itam  : 180  
##                        Mean   : 382530   Johor       : 363   Jelutong   : 163  
##                        3rd Qu.: 470000   Sabah       : 143   Shah Alam  : 161  
##                        Max.   :2300000   Sarawak     : 109   Kajang     : 117  
##                                          (Other)     : 193   (Other)    :2485  
##   No.of.Floors    Property.Age  Total_Facilities Total_surroundingamenities
##  Min.   : 2.00   Min.   : 1.0   Min.   : 0.000   Min.   :0.000             
##  1st Qu.: 5.25   1st Qu.: 8.0   1st Qu.: 2.000   1st Qu.:1.000             
##  Median :16.00   Median :10.0   Median : 5.000   Median :1.000             
##  Mean   :17.39   Mean   :13.3   Mean   : 5.202   Mean   :1.281             
##  3rd Qu.:24.00   3rd Qu.:19.0   3rd Qu.: 8.000   3rd Qu.:2.000             
##  Max.   :63.00   Max.   :39.0   Max.   :15.000   Max.   :4.000             
## 
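
In the feature-engineering code above, both the amenity count and the transport count are assigned to `Total_surroundingamenities`, so only the second assignment survives in the summary. A sketch of how the two counts could be kept apart is shown below; `Total_transport` is a hypothetical column name and the `toy` values are made up.

```r
# Toy 0/1 indicators; values are illustrative only
toy <- data.frame(
  hospital = c(1, 0), school  = c(1, 1), minimart = c(0, 0), mall    = c(1, 0),
  parking  = c(1, 1), busstop = c(0, 1), railway  = c(0, 0), highway = c(1, 0)
)

amenity_cols   <- c("hospital", "school", "minimart", "mall")
transport_cols <- c("parking", "busstop", "railway", "highway")

# Separate column names keep both counts available for modelling
toy$Total_surroundingamenities <- rowSums(toy[, amenity_cols])
toy$Total_transport            <- rowSums(toy[, transport_cols])

print(toy[, c("Total_surroundingamenities", "Total_transport")])
```

Adding a column this way would change the column positions used later in the report, so any selection by index would need to be checked.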

2.4.Grouped Analysis

Explore patterns within subgroups of the data.

# list data types for each features
sapply(data,class)
##                    Bedroom                   Bathroom 
##                  "integer"                  "numeric" 
##              Property.Size              Building.Name 
##                  "numeric"                "character" 
##                  Developer                Tenure.Type 
##                   "factor"                   "factor" 
##            Completion.Year                Total.Units 
##                  "integer"                  "numeric" 
##              Property.Type                Parking.Lot 
##                   "factor"                  "numeric" 
##                Floor.Range                 Land.Title 
##                   "factor"                   "factor" 
##                      price                      State 
##                  "numeric"                   "factor" 
##                       City               No.of.Floors 
##                   "factor"                  "numeric" 
##               Property.Age           Total_Facilities 
##                  "numeric"                  "numeric" 
## Total_surroundingamenities 
##                  "numeric"
#correlation test
numeric_data <- data[, sapply(data, is.numeric)]

# Now, you can use pairs function
pairs(numeric_data)

sum(is.na(numeric_data))
## [1] 0
correlation_matrix <- cor(numeric_data)
print(correlation_matrix)
##                                Bedroom    Bathroom Property.Size
## Bedroom                     1.00000000  0.59292229    0.52388971
## Bathroom                    0.59292229  1.00000000    0.60440645
## Property.Size               0.52388971  0.60440645    1.00000000
## Completion.Year            -0.09555783  0.03538928    0.14232579
## Total.Units                -0.07776785 -0.03417073   -0.06417774
## Parking.Lot                 0.17974323  0.28891621    0.39726460
## price                       0.13985341  0.38646225    0.62066965
## No.of.Floors               -0.01478028  0.17618437    0.25103494
## Property.Age                0.09555783 -0.03538928   -0.14232579
## Total_Facilities            0.01521986  0.19884093    0.32962134
## Total_surroundingamenities  0.05811323  0.07991480    0.07324648
##                            Completion.Year Total.Units Parking.Lot       price
## Bedroom                        -0.09555783 -0.07776785  0.17974323  0.13985341
## Bathroom                        0.03538928 -0.03417073  0.28891621  0.38646225
## Property.Size                   0.14232579 -0.06417774  0.39726460  0.62066965
## Completion.Year                 1.00000000  0.21945101  0.20697150  0.27664688
## Total.Units                     0.21945101  1.00000000  0.07962889  0.04601732
## Parking.Lot                     0.20697150  0.07962889  1.00000000  0.45607481
## price                           0.27664688  0.04601732  0.45607481  1.00000000
## No.of.Floors                    0.30651056  0.37117103  0.36711262  0.48473039
## Property.Age                   -1.00000000 -0.21945101 -0.20697150 -0.27664688
## Total_Facilities                0.28646226  0.18208918  0.38858592  0.37049140
## Total_surroundingamenities     -0.02971119  0.11298602  0.07970466  0.05032444
##                            No.of.Floors Property.Age Total_Facilities
## Bedroom                     -0.01478028   0.09555783       0.01521986
## Bathroom                     0.17618437  -0.03538928       0.19884093
## Property.Size                0.25103494  -0.14232579       0.32962134
## Completion.Year              0.30651056  -1.00000000       0.28646226
## Total.Units                  0.37117103  -0.21945101       0.18208918
## Parking.Lot                  0.36711262  -0.20697150       0.38858592
## price                        0.48473039  -0.27664688       0.37049140
## No.of.Floors                 1.00000000  -0.30651056       0.47627068
## Property.Age                -0.30651056   1.00000000      -0.28646226
## Total_Facilities             0.47627068  -0.28646226       1.00000000
## Total_surroundingamenities   0.11974253   0.02971119       0.33400489
##                            Total_surroundingamenities
## Bedroom                                    0.05811323
## Bathroom                                   0.07991480
## Property.Size                              0.07324648
## Completion.Year                           -0.02971119
## Total.Units                                0.11298602
## Parking.Lot                                0.07970466
## price                                      0.05032444
## No.of.Floors                               0.11974253
## Property.Age                               0.02971119
## Total_Facilities                           0.33400489
## Total_surroundingamenities                 1.00000000
# Scatterplot of price vs. Property.Size, colored by Property.Type
ggplot(data=data, aes(y=price, x=Property.Size, color=Property.Type)) + 
  geom_point()

# Scatterplot of Completion.Year vs. price, colored by State
ggplot(data=data, aes(y=Completion.Year, x=price, color=State)) + 
  geom_point()

# Scatterplot of Bedroom vs. price, colored by Property.Type
ggplot(data=data, aes(y=Bedroom , x=price, color=Property.Type)) + 
  geom_point()

# Scatterplot of Bathroom vs. price, colored by Property.Type
ggplot(data=data, aes(y=Bathroom , x=price, color=Property.Type)) + 
  geom_point()

# Scatterplot of Parking.Lot vs. price, colored by Property.Type
ggplot(data=data, aes(y=Parking.Lot , x=price, color=Property.Type)) + 
  geom_point()

# Scatterplot of Total.Units vs. price, colored by Floor.Range
ggplot(data=data, aes(y=Total.Units , x=price, color=Floor.Range)) + 
  geom_point()

# Scatterplot of No.of.Floors vs. price, colored by Floor.Range
ggplot(data=data, aes(y=No.of.Floors , x=price, color=Floor.Range)) + 
  geom_point()

# Select numeric variables
numeric_data <- subset(data, select = c("Bedroom", "Bathroom", "Property.Size", "Completion.Year", "Total.Units", "price", "No.of.Floors", "Property.Age", "Total_Facilities", "Total_surroundingamenities"))

# Calculate the correlation matrix
cor_matrix <- cor(numeric_data)

# Convert the correlation matrix to long format
cor_long <- reshape2::melt(cor_matrix)

# Create a ggplot heatmap
ggplot(cor_long, aes(Var1, Var2, fill = value)) +
  geom_tile() +
  scale_fill_gradient(low = "white", high = 'red') +
  labs(title = "Correlation Heatmap", x = "Variables", y = "Variables")+theme(axis.text.x = element_text(angle = 45, hjust = 1))

Numerical Insight

Positive Correlations:

The number of bedrooms and the number of bathrooms are positively correlated (0.59). Property size is positively correlated with the number of bedrooms (0.52), the number of bathrooms (0.60), and the price (0.62); the last is the strongest correlation with price in the matrix. The number of bathrooms and the price are positively correlated (0.39), as are the total number of units and the number of floors (0.37), and the number of parking lots and the price (0.46). The positive correlation (0.37) between the total number of facilities and the price implies that properties with more facilities may have higher prices.

Negative Correlations:

The completion year and property age have a perfect negative correlation (-1.00) because property age is computed directly from the completion year. The price and Property.Age have a weak negative correlation (-0.28). This suggests that, on average, newer properties may be priced higher than older ones.

Other Observations:

The completion year is positively correlated with the number of floors (0.31) and with the total number of units (0.22). Property age is negatively correlated with the total number of facilities (-0.29), so older properties tend to offer fewer facilities.

Strong Correlations:

The strongest of the remaining correlations are moderate: “Total_Facilities” and the number of floors (0.48), and “Total_surroundingamenities” and “Total_Facilities” (0.33). The former suggests that properties with more floors may have more facilities.
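
The pairs quoted in this section can also be read off a correlation matrix programmatically, which avoids transcription slips. The sketch below ranks every variable pair by absolute correlation. It runs on simulated data, and `toy`, `cor_upper`, and `pairs_long` are hypothetical names; in the report the same steps would apply to the matrix returned by `cor(numeric_data)`.

```r
# Simulated data: price depends on Property.Size, Total.Units is unrelated
set.seed(3)
x <- rnorm(100)
toy <- data.frame(
  price         = 2 * x + rnorm(100),
  Property.Size = x,
  Total.Units   = rnorm(100)
)
cor_matrix <- cor(toy)

# Keep each pair once by blanking the diagonal and the lower triangle
cor_upper <- cor_matrix
cor_upper[lower.tri(cor_upper, diag = TRUE)] <- NA

# Convert to long format and drop the blanked cells
pairs_long <- as.data.frame(as.table(cor_upper), stringsAsFactors = FALSE)
names(pairs_long) <- c("var1", "var2", "r")
pairs_long <- pairs_long[!is.na(pairs_long$r), ]

# Sort by absolute correlation, strongest first
pairs_long <- pairs_long[order(-abs(pairs_long$r)), ]
print(pairs_long, row.names = FALSE)
```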

#bar plot for each categorical variable with mean price
ggplot(data, aes(x = Floor.Range, y = price, fill = Floor.Range)) +
  stat_summary(fun = "mean", geom = "bar", position = "dodge") +
  labs(title = "Comparison of Mean Price across Floor Ranges",
       x = "Floor.Range",
       y = "Mean Price")

ggplot(data, aes(x = Tenure.Type, y = price, fill = Tenure.Type)) +
  stat_summary(fun = "mean", geom = "bar", position = "dodge") +
  labs(title = "Comparison of Mean Price across Tenure Type",
       x = " Tenure Type",
       y = "Mean Price")

ggplot(data, aes(x = Property.Type, y = price, fill = Property.Type)) +
  stat_summary(fun = "mean", geom = "bar", position = "dodge") +
  labs(title = "Comparison of Mean Price across Property Type",
       x = "Property Type",
       y = "Mean Price")+theme(axis.text.x = element_text(angle = 45, hjust = 1))

ggplot(data, aes(x = Land.Title, y = price, fill = Land.Title)) +
  stat_summary(fun = "mean", geom = "bar", position = "dodge") +
  labs(title = "Comparison of Mean Price across Land Title",
       x = "Land Title",
       y = "Mean Price")

ggplot(data, aes(x = State, y = price, fill = Property.Type)) +
  stat_summary(fun = "mean", geom = "bar", position = "dodge") +
  labs(title = "Comparison of Mean Price across State",
       x = "State",
       y = "Mean Price")+theme(axis.text.x = element_text(angle = 45, hjust = 1))

ggplot(data, aes(x = Bedroom, y = price, fill = Property.Type)) +
  stat_summary(fun = "mean", geom = "bar", position = "dodge") +
  labs(title = "Comparison of Mean Price across Bedroom",
       x = "Bedroom",
       y = "Mean Price")

ggplot(data, aes(x = Bathroom, y = price, fill = Property.Type)) +
  stat_summary(fun = "mean", geom = "bar", position = "dodge") +
  labs(title = "Comparison of Mean Price across Bathroom",
       x = "Bathroom",
       y = "Mean Price")

ggplot(data, aes(x = Property.Type, y = Property.Size, fill = Property.Type)) +
  stat_summary(fun = "mean", geom = "bar", position = "dodge") +
  labs(title = "Comparison of Mean Property Size across Property Type",
       x = "Property.Type",
       y = "Property.Size")+theme(axis.text.x = element_text(angle = 45, hjust = 1))

Categorical Insight

The average property price tends to increase with a higher floor range, suggesting a positive association between floor range and price.

Freehold properties are generally more expensive than leasehold properties.

Condominiums typically have the highest average prices, while flats tend to have lower prices among different property types.

Non-bumi lot properties are generally priced higher than bumi lot properties.

Penang stands out with the highest property prices, which is likely attributable to the greater availability of service residences and condominiums there compared to other areas.

Service residences, among various property types, tend to have the highest number of bathrooms.

Townhouse condos are observed to have the highest prices compared to other property types.

Property size exhibits a notable correlation with property type, with flats commonly being smaller and condominiums or service residences having larger sizes and higher prices.

The heatmap shows that property size, the number of bathrooms, and the number of bedrooms are correlated with one another. Property price, in turn, is notably correlated with the number of floors, property size, the number of bathrooms, and the total number of facilities.
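
The statements above rest on two kinds of summaries: group means (the bar charts) and pairwise correlations (the heatmap). The following is a minimal sketch of how both can be checked numerically in base R. The `toy` data frame and its values are made up for illustration; only the column names follow the report.

```r
# Toy data standing in for the listings; values are invented for illustration.
toy <- data.frame(
  Tenure.Type   = c("Freehold", "Freehold", "Leasehold", "Leasehold"),
  price         = c(500000, 700000, 300000, 400000),
  Property.Size = c(1000, 1400, 800, 900),
  Bathroom      = c(2, 3, 1, 2)
)

# Mean price per tenure type: the same statistic that
# stat_summary(fun = "mean") draws as bar heights.
mean_by_tenure <- aggregate(price ~ Tenure.Type, data = toy, FUN = mean)
print(mean_by_tenure)

# Pairwise Pearson correlations among the numeric columns,
# i.e. the numbers a correlation heatmap encodes as colours.
num_cols <- sapply(toy, is.numeric)
cor_mat <- cor(toy[, num_cols])
print(round(cor_mat, 2))
```

Reading the table of means next to each chart makes it easier to confirm which category is actually highest, and the correlation matrix shows the strength of an association rather than only its direction.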

3. Data Modelling

3.1 Classification

3.1.1. Data Preparation and Exploration

In this section, we load and prepare the dataset for classification. The data is read from a CSV file, the columns are reordered so that the target variable, Tenure.Type, comes last, and every column is converted to a factor. Checking the resulting classes confirms the structure and type of the data we are dealing with, which informs further data processing and analysis.

# Reading the data - replace 'file_path' with the actual path of your CSV file

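# Reorder the columns so that the target, Tenure.Type, comes last
data <- data[, c(1, 2, 3, 7, 8, 10, 16, 17, 18, 19, 4, 5, 9, 11, 12, 14, 15, 13, 6)]
# Convert every column to a factor
data <- as.data.frame(lapply(data, as.factor))
# Inspect the class of each column
sapply(data, class)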
##                    Bedroom                   Bathroom 
##                   "factor"                   "factor" 
##              Property.Size            Completion.Year 
##                   "factor"                   "factor" 
##                Total.Units                Parking.Lot 
##                   "factor"                   "factor" 
##               No.of.Floors               Property.Age 
##                   "factor"                   "factor" 
##           Total_Facilities Total_surroundingamenities 
##                   "factor"                   "factor" 
##              Building.Name                  Developer 
##                   "factor"                   "factor" 
##              Property.Type                Floor.Range 
##                   "factor"                   "factor" 
##                 Land.Title                      State 
##                   "factor"                   "factor" 
##                       City                      price 
##                   "factor"                   "factor" 
##                Tenure.Type 
##                   "factor"
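
The class listing shows that every column, including price and Property.Size, is stored as a factor, so each distinct numeric value is treated as a separate, unordered category. As an alternative sketch (not the approach used in this report), only the text columns could be converted while numeric columns stay numeric. The `toy` data frame below is hypothetical and its values are invented.

```r
# Hypothetical miniature of the dataset; values are made up.
toy <- data.frame(
  Bedroom       = c(3, 2, 4),
  Property.Size = c(1000, 850, 1400),
  price         = c(450000, 320000, 780000),
  Property.Type = c("Condominium", "Flat", "Service Residence"),
  Tenure.Type   = c("Freehold", "Leasehold", "Freehold"),
  stringsAsFactors = FALSE
)

# Convert only the character columns to factors;
# numeric columns keep their numeric type.
is_text <- sapply(toy, is.character)
toy[is_text] <- lapply(toy[is_text], factor)

print(sapply(toy, class))
```

Whether numeric predictors should stay numeric depends on the classifier. The sketch only illustrates the mechanics of a selective conversion.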
# Classification And REgression Training
library(caret)
# Classification and Visualisation (Naive Bayes)
library(klaR)
## Warning: package 'klaR' was built under R version 4.3.2
## Loading required package: MASS
## 
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
## 
##     select
# Classification and Regression with Random Forest
library(randomForest)
## Warning: package 'randomForest' was built under R version 4.3.2
## randomForest 4.7-1.1
## Type rfNews() to see new features/changes/bug fixes.
## 
## Attaching package: 'randomForest'
## The following object is masked from 'package:gridExtra':
## 
##     combine
## The following object is masked from 'package:ggplot2':
## 
##     margin
## The following object is masked from 'package:dplyr':
## 
##     combine
# Classification functions for k-nearest neighbour
library(class)
## Warning: package 'class' was built under R version 4.3.2
# Machine Learning Benchmark Problems
library(mlbench)
## Warning: package 'mlbench' was built under R version 4.3.2
# Multivariate regression methods (PLSR)
library(pls)
## Warning: package 'pls' was built under R version 4.3.2
## 
## Attaching package: 'pls'
## The following object is masked from 'package:caret':
## 
##     R2
## The following object is masked from 'package:stats':
## 
##     loadings
library(ggplot2)


# Stratified 80/20 split on the target variable
trainIndex <- createDataPartition(data$Tenure.Type, p = 0.80, list = FALSE)
data_train <- data[trainIndex, ]
data_test <- data[-trainIndex, ]
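
For a factor outcome, `createDataPartition` samples within each class so that the class proportions are roughly preserved in both subsets. The base R sketch below mimics that idea on made-up labels. It is an illustration of stratified sampling, not caret's actual implementation, and the seed value is arbitrary. Calling `set.seed()` before the split is what makes the partition reproducible between runs.

```r
set.seed(42)  # arbitrary seed, used only to make this sketch reproducible

# Made-up labels standing in for data$Tenure.Type.
tenure <- factor(rep(c("Freehold", "Leasehold"), times = c(60, 40)))

# Sample 80% of the row indices within each class, so both classes
# keep the same share in the training set as in the full data.
idx_by_class <- split(seq_along(tenure), tenure)
train_idx <- sort(unlist(lapply(idx_by_class, function(idx) {
  idx[sample.int(length(idx), size = round(0.8 * length(idx)))]
}), use.names = FALSE))

train_labels <- tenure[train_idx]
test_labels  <- tenure[-train_idx]

print(table(train_labels))
print(table(test_labels))
```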

3.1.2. Naive Bayes Model for Classification

With the dataset split into training and testing sets, a Naive Bayes classifier is fitted on the training set. After fitting, we predict on the test data and evaluate the model’s performance using a confusion matrix. The confusion matrix is also visualized to make the model’s performance easier to interpret.
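
A Naive Bayes warning of the form "Numerical 0 probability for all classes" typically means that, for the observation in question, the product of class-conditional probabilities is zero for every class. One plausible cause is a factor level that never occurs with a given class in the training data, which is likely when high-cardinality columns are stored as factors. The base R sketch below shows the effect on made-up data and how Laplace smoothing (adding a pseudo-count `alpha`) avoids it. It also shows how a confusion matrix and accuracy are obtained from predicted and actual labels. All names and values here are hypothetical. In klaR, the `fL` argument of `NaiveBayes()` is, as far as we are aware, the corresponding Laplace correction setting.

```r
# Made-up training data: one categorical predictor and a binary class.
train <- data.frame(
  State  = c("Selangor", "Selangor", "Penang", "Johor"),
  Tenure = c("Freehold", "Freehold", "Leasehold", "Leasehold")
)
# "Perak" is a valid level that never appears in the training rows.
states <- c("Selangor", "Penang", "Johor", "Perak")

# Estimate P(State | Tenure); alpha is the Laplace pseudo-count.
cond_prob <- function(alpha) {
  counts <- table(factor(train$Tenure), factor(train$State, levels = states))
  (counts + alpha) / (rowSums(counts) + alpha * length(states))
}

p_raw    <- cond_prob(0)  # no smoothing: unseen level gets probability 0
p_smooth <- cond_prob(1)  # add-one smoothing: unseen level gets a small probability

print(p_raw[, "Perak"])
print(p_smooth[, "Perak"])

# Confusion matrix: cross-tabulate predicted against actual labels.
actual    <- factor(c("Freehold", "Freehold", "Leasehold", "Leasehold", "Freehold"))
predicted <- factor(c("Freehold", "Leasehold", "Leasehold", "Leasehold", "Freehold"),
                    levels = levels(actual))
cm <- table(Predicted = predicted, Actual = actual)
accuracy <- sum(diag(cm)) / sum(cm)

print(cm)
print(accuracy)
```

Without smoothing, both classes assign probability zero to the unseen level, so the classifier has no basis for choosing between them. With smoothing, every level keeps a small non-zero probability.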

## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 1
## (the same warning is repeated for observations 2 to 304)
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 305
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 306
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 307
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 308
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 309
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 310
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 311
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 312
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 313
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 314
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 315
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 316
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 317
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 318
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 319
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 320
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 321
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 322
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 323
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 324
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 325
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 326
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 327
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 328
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 329
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 330
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 331
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 332
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 333
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 334
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 335
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 336
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 337
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 338
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 339
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 340
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 341
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 342
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 343
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 344
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 345
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 346
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 347
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 348
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 349
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 350
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 351
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 352
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 353
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 354
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 355
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 356
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 357
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 358
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 359
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 360
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 361
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 362
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 363
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 364
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 365
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 366
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 367
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 368
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 369
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 370
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 371
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 372
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 373
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 374
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 375
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 376
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 377
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 378
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 379
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 380
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 381
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 382
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 383
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 384
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 385
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 386
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 387
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 388
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 389
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 390
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 391
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 392
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 393
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 394
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 395
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 396
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 397
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 398
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 399
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 400
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 401
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 402
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 403
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 404
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 405
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 406
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 407
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 408
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 409
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 410
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 411
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 412
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 413
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 414
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 415
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 416
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 417
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 418
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 419
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 420
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 421
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 422
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 423
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 424
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 425
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 426
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 427
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 428
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 429
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 430
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 431
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 432
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 433
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 434
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 435
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 436
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 437
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 438
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 439
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 440
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 441
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 442
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 443
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 444
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 445
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 446
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 447
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 448
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 449
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 450
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 451
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 452
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 453
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 454
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 455
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 456
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 457
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 458
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 459
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 460
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 461
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 462
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 463
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 464
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 465
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 466
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 467
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 468
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 469
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 470
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 471
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 472
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 473
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 474
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 475
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 476
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 477
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 478
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 479
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 480
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 481
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 482
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 483
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 484
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 485
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 486
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 487
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 488
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 489
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 490
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 491
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 492
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 493
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 494
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 495
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 496
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 497
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 498
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 499
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 500
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 501
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 502
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 503
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 504
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 505
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 506
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 507
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 508
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 509
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 510
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 511
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 512
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 513
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 514
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 515
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 516
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 517
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 518
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 519
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 520
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 521
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 522
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 523
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 524
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 525
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 526
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 527
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 528
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 529
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 530
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 531
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 532
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 533
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 534
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 535
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 536
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 537
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 538
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 539
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 540
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 541
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 542
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 543
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 544
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 545
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 546
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 547
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 548
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 549
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 550
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 551
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 552
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 553
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 554
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 555
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 556
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 557
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 558
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 559
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 560
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 561
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 562
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 563
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 564
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 565
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 566
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 567
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 568
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 569
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 570
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 571
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 572
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 573
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 574
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 575
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 576
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 577
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 578
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 579
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 580
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 581
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 582
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 583
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 584
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 585
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 586
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 587
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 588
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 589
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 590
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 591
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 592
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 593
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 594
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 595
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 596
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 597
## (the same warning repeats for observations 598 through 701)
## Confusion Matrix and Statistics
## 
##            Reference
## Prediction  Freehold Leasehold
##   Freehold       366        44
##   Leasehold       57       234
##                                           
##                Accuracy : 0.8559          
##                  95% CI : (0.8277, 0.8811)
##     No Information Rate : 0.6034          
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.7014          
##                                           
##  Mcnemar's Test P-Value : 0.2325          
##                                           
##             Sensitivity : 0.8652          
##             Specificity : 0.8417          
##          Pos Pred Value : 0.8927          
##          Neg Pred Value : 0.8041          
##              Prevalence : 0.6034          
##          Detection Rate : 0.5221          
##    Detection Prevalence : 0.5849          
##       Balanced Accuracy : 0.8535          
##                                           
##        'Positive' Class : Freehold        
## 
## Recall: 0.8652482

3.1.3. Support Vector Machine (SVM) Model

In this part, we develop a Support Vector Machine (SVM) model with a radial kernel, again aimed at classifying ‘Tenure Type’. We train the model, make predictions on the test set, and assess its performance through a confusion matrix and the recall for the positive class (Freehold). As the output shows, the fitted model predicts Freehold for every test observation, so its accuracy equals the no-information rate (0.6034) and its Kappa is 0.

## Warning: package 'e1071' was built under R version 4.3.2
## Confusion Matrix and Statistics
## 
##            Reference
## Prediction  Freehold Leasehold
##   Freehold       423       278
##   Leasehold        0         0
##                                           
##                Accuracy : 0.6034          
##                  95% CI : (0.5661, 0.6398)
##     No Information Rate : 0.6034          
##     P-Value [Acc > NIR] : 0.5165          
##                                           
##                   Kappa : 0               
##                                           
##  Mcnemar's Test P-Value : <2e-16          
##                                           
##             Sensitivity : 1.0000          
##             Specificity : 0.0000          
##          Pos Pred Value : 0.6034          
##          Neg Pred Value :    NaN          
##              Prevalence : 0.6034          
##          Detection Rate : 0.6034          
##    Detection Prevalence : 1.0000          
##       Balanced Accuracy : 0.5000          
##                                           
##        'Positive' Class : Freehold        
## 
## Recall: 1
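
To make the link between the confusion matrix, the no-information rate, and Kappa concrete, the following is a minimal base-R sketch that recomputes them from the counts printed above. The helper name `kappa_from_counts` is hypothetical and not part of the original analysis; the counts are copied from the SVM and Naive Bayes outputs.

```r
# Hypothetical helper: accuracy, no-information rate (NIR) and Cohen's kappa
# from a 2x2 confusion matrix (rows = prediction, columns = reference).
kappa_from_counts <- function(m) {
  n <- sum(m)
  observed <- sum(diag(m)) / n                    # accuracy
  expected <- sum(rowSums(m) * colSums(m)) / n^2  # agreement expected by chance
  nir <- max(colSums(m)) / n                      # share of the largest class
  c(accuracy = observed, nir = nir,
    kappa = (observed - expected) / (1 - expected))
}

labels <- c("Freehold", "Leasehold")

# Counts copied from the SVM output (filled column by column).
cm_svm_counts <- matrix(c(423, 0, 278, 0), nrow = 2,
                        dimnames = list(Prediction = labels, Reference = labels))

# Counts copied from the Naive Bayes output, for comparison.
cm_nb_counts <- matrix(c(366, 57, 44, 234), nrow = 2,
                       dimnames = list(Prediction = labels, Reference = labels))

svm_stats <- kappa_from_counts(cm_svm_counts)
nb_stats  <- kappa_from_counts(cm_nb_counts)
print(round(rbind(SVM = svm_stats, NaiveBayes = nb_stats), 4))
```

Because every prediction is Freehold, observed and chance agreement coincide for the SVM, which is why Kappa is 0 even though accuracy looks moderate.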

3.1.4. Decision Tree Model

This section implements a Decision Tree model for classification. The model is trained on the training dataset and then used to make predictions on the test set. Its effectiveness is evaluated using a confusion matrix, and the fitted tree is visualized to give an intuitive view of how the model reaches its decisions.

## Confusion Matrix and Statistics
## 
##            Reference
## Prediction  Freehold Leasehold
##   Freehold       393        33
##   Leasehold       30       245
##                                           
##                Accuracy : 0.9101          
##                  95% CI : (0.8865, 0.9302)
##     No Information Rate : 0.6034          
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.8119          
##                                           
##  Mcnemar's Test P-Value : 0.8011          
##                                           
##             Sensitivity : 0.9291          
##             Specificity : 0.8813          
##          Pos Pred Value : 0.9225          
##          Neg Pred Value : 0.8909          
##              Prevalence : 0.6034          
##          Detection Rate : 0.5606          
##    Detection Prevalence : 0.6077          
##       Balanced Accuracy : 0.9052          
##                                           
##        'Positive' Class : Freehold        
## 

## Recall: 0.929078

3.2. Regression Analysis

3.2.1. Linear Regression

This part deals with predicting a continuous variable, price. A linear regression model is fitted on the training data, and predictions are made on the test dataset. The model’s accuracy is assessed through MAE (Mean Absolute Error), RMSE (Root Mean Squared Error), and R-squared. Scatter plots are also generated to visually compare actual vs. predicted values.

##                    Bedroom                   Bathroom 
##                  "numeric"                  "numeric" 
##              Property.Size            Completion.Year 
##                  "numeric"                  "numeric" 
##                Total.Units                Parking.Lot 
##                  "numeric"                  "numeric" 
##               No.of.Floors               Property.Age 
##                  "numeric"                  "numeric" 
##           Total_Facilities Total_surroundingamenities 
##                  "numeric"                  "numeric" 
##              Building.Name                  Developer 
##                  "numeric"                  "numeric" 
##              Property.Type                Floor.Range 
##                  "numeric"                  "numeric" 
##                 Land.Title                      State 
##                  "numeric"                  "numeric" 
##                       City                Tenure.Type 
##                  "numeric"                  "numeric" 
##                      price 
##                  "numeric"
## 
## Call:
## lm(formula = price ~ Bedroom + Bathroom + Property.Size + Total.Units + 
##     Parking.Lot + No.of.Floors + Property.Age, data = data_train)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -371.22  -44.79   -3.86   41.18  328.84 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    93.52484    9.33434  10.019  < 2e-16 ***
## Bedroom       -39.14701    3.18450 -12.293  < 2e-16 ***
## Bathroom       18.14541    4.56543   3.975 7.26e-05 ***
## Property.Size   0.37943    0.01092  34.747  < 2e-16 ***
## Total.Units    -0.05711    0.01286  -4.441 9.35e-06 ***
## Parking.Lot    24.23669    2.02536  11.967  < 2e-16 ***
## No.of.Floors    3.56045    0.15565  22.875  < 2e-16 ***
## Property.Age   -0.61872    0.20615  -3.001  0.00271 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 71.78 on 2450 degrees of freedom
## Multiple R-squared:  0.6398, Adjusted R-squared:  0.6388 
## F-statistic: 621.7 on 7 and 2450 DF,  p-value: < 2.2e-16
## Length of actual values:  1052
## Length of predicted values:  1052

## Mean Absolute Error (MAE): 52.9102
## Root Mean Squared Error (RMSE): 68.16736
## R-squared: 0.6397955
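
The reported R-squared (0.6397955) coincides with the Multiple R-squared of the training fit (0.6398), which suggests it may have been read from the model summary rather than computed on the test set. A minimal sketch of computing all three metrics on held-out data is given below; the helper name `regression_metrics` and the toy data are assumptions for illustration only, and the R-squared formula is the same one used for the tree-based models later in this section.

```r
# Hypothetical helper: MAE, RMSE and R-squared for any pair of vectors.
regression_metrics <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  err <- actual - predicted
  c(MAE  = mean(abs(err)),
    RMSE = sqrt(mean(err^2)),
    R2   = 1 - sum(err^2) / sum((actual - mean(actual))^2))
}

# Toy data, made up for illustration (not the report's dataset).
set.seed(1)
toy <- data.frame(size = runif(200, 400, 2000))
toy$price <- 90 + 0.4 * toy$size + rnorm(200, sd = 70)

# 70/30 split, fit on the training rows, evaluate on the held-out rows only.
idx <- sample(seq_len(nrow(toy)), size = floor(0.7 * nrow(toy)))
fit <- lm(price ~ size, data = toy[idx, ])
held_out <- toy[-idx, ]
test_metrics <- regression_metrics(held_out$price,
                                   predict(fit, newdata = held_out))
print(test_metrics)
```

Using one helper for every model keeps the comparison consistent, since each metric is then computed on the same held-out rows in the same way.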

3.2.2. Random Forest for Regression

A Random Forest model is trained to predict the continuous variable ‘price’, and its performance is evaluated using MAE, RMSE, and R-squared. A scatter plot comparing actual and predicted values is also presented.

library(dplyr)
library(randomForest)

data2 <- data %>% dplyr::select(-Building.Name, -City, -Developer)

# Assuming data2 is your dataframe
# Assuming price is the column you want to predict, and you have other columns as features
set.seed(777)
# Split your data into training and testing sets
sample_index <- sample(1:nrow(data2), 0.8 * nrow(data2))
train_data <- data2[sample_index, ]
test_data <- data2[-sample_index, ]

# Create a random forest model
model_rf <- randomForest(price ~ ., data = train_data)

# Make predictions on the test set
predictions_rf <- predict(model_rf, newdata = test_data)

# Evaluate the model
# Example: Mean Absolute Error (MAE)
mae_rf <- mean(abs(predictions_rf - test_data$price))
cat("Random Forest Mean Absolute Error (MAE):", mae_rf, "\n")
## Random Forest Mean Absolute Error (MAE): 43.52476
# Calculate R-squared
rsquared_rf <- 1 - (sum((test_data$price - predictions_rf)^2) / sum((test_data$price - mean(test_data$price))^2))
cat("Random Forest R-squared:", rsquared_rf, "\n")
## Random Forest R-squared: 0.740809
# Calculate RMSE
rmse_rf <- sqrt(mean((predictions_rf - test_data$price)^2))
cat("Random Forest RMSE:", rmse_rf, "\n")
## Random Forest RMSE: 61.66981
plot(test_data$price, predictions_rf, main = "Random Forest: Actual vs Predicted", xlab = "Actual", ylab = "Predicted", pch = 20,   
       col = "blue")
abline(0, 1, col = "red", lwd = 2)

3.2.3. Decision Tree for Regression

In this final section, a Decision Tree model is constructed for regression analysis. After training the model, predictions are made, and the model’s accuracy is evaluated through MAE, RMSE, and R-squared. Additionally, the decision tree is visualized for a comprehensive understanding of the model structure.

library(rpart)
set.seed(777)
sample_index <- sample(1:nrow(data2), 0.8 * nrow(data2))
train_data <- data2[sample_index, ]
test_data <- data2[-sample_index, ]

# Create a decision tree model
model_tree <- rpart(price ~ ., data = train_data)

# Visualize the decision tree
plot(model_tree)
text(model_tree, cex = 0.7)

# Make predictions on the test set
predictions_tree <- predict(model_tree, newdata = test_data)

# Evaluate the model
# Example: Mean Absolute Error (MAE)
mae_tree <- mean(abs(predictions_tree - test_data$price))
cat("Decision Tree Mean Absolute Error (MAE):", mae_tree, "\n")
## Decision Tree Mean Absolute Error (MAE): 59.66233
# Calculate R-squared
rsquared_tree <- 1 - (sum((test_data$price - predictions_tree)^2) / sum((test_data$price - mean(test_data$price))^2))
# Calculate RMSE
rmse_tree <- sqrt(mean((predictions_tree - test_data$price)^2))

cat("Decision Tree R-squared:", rsquared_tree, "\n")
## Decision Tree R-squared: 0.5802981
cat("Decision Tree RMSE:", rmse_tree, "\n")
## Decision Tree RMSE: 78.47533
plot(test_data$price, predictions_tree, main = "Decision Tree: Actual vs Predicted", xlab = "Actual", ylab = "Predicted", pch = 20,   
       col = "blue")
abline(0, 1, col = "red", lwd = 2)

4. Comparison

4.1. Performance Description

4.1.1. Classification

# Assuming you have already calculated the confusion matrices for each model:
# cm for Naive Bayes, cm_svm for SVM, and cm_tree for Decision Tree

# Initialize an empty data frame to store the summary of all models
models_summary <- data.frame(
  Model = character(),
  Accuracy = numeric(),
  Recall = numeric(),
  Precision = numeric(),
  F1_Score = numeric(),
  stringsAsFactors = FALSE
)

# Naive Bayes performance metrics
nb_recall <- cm$byClass['Sensitivity']
nb_precision <- cm$byClass['Pos Pred Value']
nb_F1 <- 2 * (nb_precision * nb_recall) / (nb_precision + nb_recall)
nb_accuracy <- cm$overall['Accuracy']

# Add Naive Bayes to summary
models_summary <- rbind(models_summary, data.frame(
  Model = "Naive Bayes",
  Accuracy = nb_accuracy,
  Recall = nb_recall,
  Precision = nb_precision,
  F1_Score = nb_F1
))

# SVM performance metrics
svm_recall <- cm_svm$byClass['Sensitivity']
svm_precision <- cm_svm$byClass['Pos Pred Value']
svm_F1 <- 2 * (svm_precision * svm_recall) / (svm_precision + svm_recall)
svm_accuracy <- cm_svm$overall['Accuracy']

# Add SVM to summary
models_summary <- rbind(models_summary, data.frame(
  Model = "SVM",
  Accuracy = svm_accuracy,
  Recall = svm_recall,
  Precision = svm_precision,
  F1_Score = svm_F1
))

# Decision Tree performance metrics
dt_recall <- cm_tree$byClass['Sensitivity']
dt_precision <- cm_tree$byClass['Pos Pred Value']
dt_F1 <- 2 * (dt_precision * dt_recall) / (dt_precision + dt_recall)
dt_accuracy <- cm_tree$overall['Accuracy']

# Add Decision Tree to summary
models_summary <- rbind(models_summary, data.frame(
  Model = "Decision Tree",
  Accuracy = dt_accuracy,
  Recall = dt_recall,
  Precision = dt_precision,
  F1_Score = dt_F1
))

# Print the summary table
print(models_summary)
##                   Model  Accuracy    Recall Precision  F1_Score
## Accuracy    Naive Bayes 0.8559201 0.8652482 0.8926829 0.8787515
## Accuracy1           SVM 0.6034237 1.0000000 0.6034237 0.7526690
## Accuracy2 Decision Tree 0.9101284 0.9290780 0.9225352 0.9257951

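Best Performance in Classification: Decision Tree. This model has the highest accuracy (0.9101) and balanced accuracy (0.9052), indicating better overall performance in correctly classifying the data. Balanced accuracy is particularly important because it accounts for the class imbalance in the dataset (about 60% of the test properties are Freehold). Naive Bayes also performs well (accuracy 0.8559, balanced accuracy 0.8535), but Decision Tree edges ahead on both measures. The SVM predicts Freehold for every observation, so its perfect recall is not meaningful: its balanced accuracy is only 0.5.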
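
As a cross-check, the figures in the summary table, plus balanced accuracy, can be recomputed directly from the confusion-matrix counts reported in Section 3.1. This is a minimal base-R sketch; the data frame `counts` and its column names are assumptions for illustration, with Freehold treated as the positive class.

```r
# Counts copied from the three confusion matrices
# (TP = Freehold predicted as Freehold, FP = Leasehold predicted as Freehold).
counts <- data.frame(
  Model = c("Naive Bayes", "SVM", "Decision Tree"),
  TP = c(366, 423, 393),
  FP = c(44, 278, 33),
  FN = c(57, 0, 30),
  TN = c(234, 0, 245)
)

summary_tbl <- data.frame(Model = counts$Model)
summary_tbl$Accuracy    <- with(counts, (TP + TN) / (TP + FP + FN + TN))
summary_tbl$Recall      <- with(counts, TP / (TP + FN))
summary_tbl$Precision   <- with(counts, TP / (TP + FP))
summary_tbl$Specificity <- with(counts, TN / (TN + FP))
summary_tbl$F1_Score    <- with(summary_tbl,
                                2 * Precision * Recall / (Precision + Recall))
summary_tbl$Balanced_Accuracy <- with(summary_tbl, (Recall + Specificity) / 2)

print(summary_tbl, digits = 4)
```

Reporting specificity and balanced accuracy next to recall makes a degenerate classifier easy to spot, because a model that only ever predicts one class scores 0 on specificity.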

The reason why Decision Tree might be preferred:

No feature scaling: Decision trees don’t require feature scaling, such as normalization or standardization. This makes them convenient when features are measured on very different scales.

Handle non-linear data: Decision trees work well when the relationship between the features and the target is non-linear. Where linear models struggle with such relationships, decision trees can provide a better solution.

Robust to outliers: Decision trees are not very sensitive to outliers in the features. Because splits depend on the ordering of values rather than their magnitude, an outlier usually affects only a small portion of the tree.
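
The scaling point can be illustrated with a single threshold split, the building block of a decision tree. The sketch below is a simplified stand-in written in base R, not the rpart algorithm itself; the function `best_split` and the toy values are hypothetical.

```r
# Hypothetical single-split search: choose the threshold on x that minimises
# misclassifications when each side predicts its own majority class.
best_split <- function(x, y) {
  xs <- sort(unique(x))
  cuts <- (head(xs, -1) + tail(xs, -1)) / 2   # midpoints between neighbours
  errs <- sapply(cuts, function(cut) {
    left  <- y[x <= cut]
    right <- y[x > cut]
    sum(left != names(which.max(table(left)))) +
      sum(right != names(which.max(table(right))))
  })
  cuts[which.min(errs)]
}

# Toy data, made up for illustration.
size   <- c(500, 650, 700, 900, 1100, 1500)
tenure <- c("Leasehold", "Leasehold", "Leasehold",
            "Freehold", "Freehold", "Freehold")

# Standardise the feature and search again.
size_scaled <- (size - mean(size)) / sd(size)
cut_raw    <- best_split(size, tenure)
cut_scaled <- best_split(size_scaled, tenure)

# The thresholds differ numerically, but they separate the same observations.
same_partition <- identical(size <= cut_raw, size_scaled <= cut_scaled)
print(same_partition)
```

Because the split depends only on the order of the values, any monotone rescaling of a feature leaves the resulting partition unchanged.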

4.1.2. Regression

models_summary2 <- data.frame(
  Model = character(),
  mae = numeric(),
  rsquared = numeric(),
  rmse = numeric(),
  stringsAsFactors = FALSE
)

# Linear Regression performance metrics
lmmae <- mae1
lmrsquared <- rsquared1
lmrmse <- rmse1

# Adding linear regression metrics to summary
models_summary2 <- rbind(models_summary2, data.frame(
  Model = "Linear Regression",
  mae = lmmae,
  rsquared = lmrsquared,
  rmse = lmrmse
))

# Random Forest performance metrics
rf_mae <- mae_rf
rf_rsquared <- rsquared_rf
rf_rmse <- rmse_rf

# Adding Random Forest metrics to summary
models_summary2 <- rbind(models_summary2, data.frame(
  Model = "Random Forest",
  mae = rf_mae,
  rsquared = rf_rsquared,
  rmse = rf_rmse
))

# Decision Tree performance metrics
dt_mae <- mae_tree
dt_rsquared <- rsquared_tree
dt_rmse <- rmse_tree

# Adding Decision Tree metrics to summary
models_summary2 <- rbind(models_summary2, data.frame(
  Model = "Decision Tree",
  mae = dt_mae,
  rsquared = dt_rsquared,
  rmse = dt_rmse
))

# Print the summary table
print(models_summary2)
##               Model      mae  rsquared     rmse
## 1 Linear Regression 52.91020 0.6397955 68.16736
## 2     Random Forest 43.52476 0.7408090 61.66981
## 3     Decision Tree 59.66233 0.5802981 78.47533

Best Performance in Regression: Random Forest. This model has the lowest MAE and RMSE, indicating it has the least average error in predictions and the predictions are closer to the actual values. Additionally, it has the highest R-squared value, suggesting the best fit of the model to the data. Lower MAE and RMSE are critical for a good regression model as they directly relate to the accuracy of the predictions. Higher R-squared value indicates that the model explains a greater proportion of variance in the dependent variable.

Random Forest often outperforms an individual Decision Tree or a Linear Regression model because of its ensemble learning approach.

The reasons why Random Forest might be preferred:

Reduced Overfitting:

Individual decision trees are prone to overfitting, meaning they may capture noise in the training data and perform poorly on new, unseen data. Random Forest helps mitigate this issue by averaging the predictions of many trees, each grown on a bootstrap sample of the data, which reduces the risk of overfitting.

Improved Generalization:

Random Forest generally provides better generalization to unseen data compared to a single Decision Tree or Linear Regression model. The ensemble nature of Random Forest helps in capturing a more accurate and robust representation of the underlying patterns in the data.

Handling Non-linearity:

Linear Regression assumes a linear relationship between the input features and the target variable. Random Forest, on the other hand, can capture non-linear relationships in the data more effectively, making it suitable for a wider range of problems.

Robustness to Outliers:

Random Forest is less sensitive to outliers than Linear Regression. Outliers can heavily influence the coefficients in a linear model, leading to a skewed representation of the data.

Handling Missing Values:

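Some Random Forest implementations can work with missing values directly, but the randomForest package used here does not: by default, randomForest() stops with an error when the data contain NA values. The package instead offers na.roughfix (a median or most-frequent-value fill) and rfImpute (proximity-based imputation), which make missing data relatively easy to accommodate.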
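
With the randomForest package, one common way to deal with missing predictors is a rough median or most-frequent-value fill, which is the idea behind the package's na.roughfix option. The base-R sketch below imitates that idea on made-up data; the function `rough_fill` is a hypothetical stand-in, not the package's implementation.

```r
# Hypothetical stand-in for a median/mode fill:
# numeric columns get the column median, other columns the most frequent value.
rough_fill <- function(df) {
  for (col in names(df)) {
    missing <- is.na(df[[col]])
    if (!any(missing)) next
    if (is.numeric(df[[col]])) {
      df[[col]][missing] <- median(df[[col]], na.rm = TRUE)
    } else {
      tab <- table(df[[col]])               # table() ignores NA by default
      df[[col]][missing] <- names(tab)[which.max(tab)]
    }
  }
  df
}

# Toy rows, made up for illustration; column names follow the dataset.
toy <- data.frame(
  Property.Size = c(850, NA, 1200, 1000),
  Tenure.Type   = factor(c("Freehold", "Leasehold", NA, "Freehold"))
)

filled <- rough_fill(toy)
print(filled)
```

A fill like this is crude, so it is best treated as a starting point before trying a model-based imputation.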

4.2. Evaluation Conclusion

For classification tasks, the Decision Tree model demonstrates the best performance based on the confusion matrix metrics, with the highest accuracy and F1-score together with high recall. For the regression tasks, the Random Forest model demonstrates the best performance based on MAE, RMSE, and R-squared. The Decision Tree combines high accuracy and balanced accuracy in classification, while the Random Forest delivers the lowest prediction errors and the highest explained variance in regression. Random Forest is a versatile model, and its advantage in regression is likely due to its ability to manage high-dimensional data and guard against overfitting.

5. Conclusion

5.1 Interpretation and conclusion

In conclusion, this study addresses the challenging task of predicting condominium prices in the dynamic Malaysian property market. Employing a comprehensive data analysis methodology that incorporates classification and regression techniques, the research delves beyond mere price prediction to explore underlying factors, including the impact of property tenure types. The findings from this in-depth exploration have revealed crucial insights:

  1. Penang stands out with the highest property prices, likely attributed to the greater availability of service residences and condominiums compared to other areas, while Selangor has the highest number of total facilities provided.

  2. Visualizing the data through charts has uncovered trends and relationships, highlighting factors such as parking lot availability, number of floors, property size, number of bathrooms, and total facilities as influential in house prices. Condominiums typically command higher average prices, while flats tend to be more affordable among different property types.

  3. Nine indicators, including amenities, facilities, age, floors, units, size, bathrooms, bedrooms, and completion year, were analyzed for their correlations. Property size, bathrooms, and bedrooms showed strong interplay, with price influenced significantly by floors, size, bathrooms, and facilities.

  4. The Random Forest model emerged as the standout performer for price regression, with the lowest Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) values. For classification, the basic Decision Tree performs best at predicting whether a property’s tenure type is Freehold or Leasehold.

5.2 Areas for Improvement

However, several areas for improvement have been identified:

  1. Insufficient Data: The study acknowledges the challenge of insufficient data. Future research could periodically supplement the dataset from Mudah.com to ensure more comprehensive and accurate predictions.

  2. Raw Data Processing: While various factors were considered in raw data processing, there is a recognition that some discarded data columns may have potential impacts on prices. Further research and market surveys, incorporating techniques like natural language processing, could enhance data preprocessing.

  3. Model Applicability Clarification: Although the Random Forest model performed well in regression modeling, further exploration into its commercial applicability is deemed necessary.