| Group Member’s Name | Student ID | Role |
|---|---|---|
| YANG HUILAN | S2171624 | Data Pre-Processing |
| ALVIN CHUA CHEE SIANG | 22094960 | Exploratory Data Analysis |
| CUI SHIYU | 22086544 | Data Modelling |
| ZAHRA SYAHIDA BINTI EHWAN | 22092140 | Data Evaluation |
| LIANG RUIJIE | 22100508 | Data Interpretation |
Our team has been assigned a project to conduct a comprehensive analysis and predictive modeling of condominium prices in Malaysia. The Malaysian real estate market is dynamic, diverse, and highly competitive, shaped by economic and demographic factors such as population growth, urbanization, and changing consumer preferences. Demographic shifts, particularly an aging society, are altering housing demand and leading developers to diversify their offerings. Supply and demand also vary by region, which causes prices to fluctuate.
Real estate companies navigate this complexity through extensive market research to understand and forecast trends. They strive to balance attractive pricing with profitability by investing in locations with promising future potential. This project uses data analytics to address these challenges. The goal is to analyze the factors that affect condominium prices in Malaysia, to develop predictive models for price estimation, and to classify whether a given property is leasehold or freehold. The study aims to provide the client with deep market insights, reliable data support, and strategic recommendations for future building planning and design.
The initial dataset, sourced from Mudah.com, an online marketplace platform, comprises 4,000 records and 32 columns with detailed information on property listings, prices, and descriptions.
1. Exploratory Data Analysis (EDA): Understand the relationships and correlations among different variables related to condominium properties.
2. Classification Modeling: Classify tenure types (e.g., Freehold or Leasehold) based on various property-related features.
3. Regression Modeling: Predict the prices of condominiums using linear regression, random forest regression, and decision tree regression models.
This section sets up the environment and loads the necessary libraries.
# Clear all variables
#rm(list = ls(all.names = TRUE))
# Libraries for data manipulation
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.3.2
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
## Warning: package 'tidyr' was built under R version 4.3.2
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.3.2
## Warning: package 'ggplot2' was built under R version 4.3.2
## Warning: package 'readr' was built under R version 4.3.2
## Warning: package 'purrr' was built under R version 4.3.2
## Warning: package 'forcats' was built under R version 4.3.2
## Warning: package 'lubridate' was built under R version 4.3.2
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ readr 2.1.4
## ✔ ggplot2 3.4.4 ✔ stringr 1.5.0
## ✔ lubridate 1.9.3 ✔ tibble 3.2.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# Libraries for data visualization
library(ggplot2)
library(lattice)
# Libraries for pre-processing
#install.packages("caret")
library(caret)
## Warning: package 'caret' was built under R version 4.3.2
##
## Attaching package: 'caret'
##
## The following object is masked from 'package:purrr':
##
## lift
Next, we read the data file and perform basic data exploration.
# Reading the data
data <- read.csv('houses.csv', fileEncoding = "UTF-8")
# Initial Data Exploration
head(data)
## description
## 1 Iconic Building @ KL SETAPAK\nNew launching & Latest condo !!!!! 🔥\nHouse with luxury hotel concept 😍👑\n💎 Freehold\n🔑 Dual key\n🛌 2 / 3 / 4 rooms\n💰Affordable and Low entry price\n💼 100% full furnish, move in with a luggage\nWhatsapp / Call \nShow contact number\n ( Eugine ) for more details & showroom viewing 🔥\nWhatsapp / Call \nShow contact number\n ( Eugine ) for more details & showroom viewing 🔥\nWhatsapp / Call \nShow contact number\n ( Eugine ) for more details & showroom viewing 🔥\nWhatsapp link :\nhttps://hotnewcondo.wasap.my\nhttps://hotnewcondo.wasap.my\nhttps://hotnewcondo.wasap.my\n🏝5 🌟 facilities : Sky lounge, Sky bridge, Sky garden\n🚗 6km to KLCC/ Bkt Bintang\n🍱 Food Heaven\n📈 Freehold Appreciation\n👑 Luxury Hotel Drop-off Lobby\n🏊🏻 Infinity Pool\n🏡 Sky Garden, Sky Lounge, Sky bridge\nFacilities:\nLevel 10 – Elevated lawn for Yoga, Jogging Trail, Jacuzzi, Infinity Pool, Wading Pool, Pool Deck, Play land, Sunbathe Terrace, Sun Lounge, Gymnamsium, Viewing Deck, Squash, Futsal, Half Basketball court, central lobby and relaxing yard.\nLevel 50 – Viewing Terrace, Barbeque area, Gathering space, Turf Mound, Multistep Seating Lounge, Rooftop Lounge and open stage.\nWhatsapp / Call \nShow contact number\n ( Eugine ) for more details & showroom viewing 🔥\nWhatsapp / Call \nShow contact number\n ( Eugine ) for more details & showroom viewing 🔥\nWhatsapp / Call \nShow contact number\n ( Eugine ) for more details & showroom viewing 🔥\nWhatsapp link :\nhttps://hotnewcondo.wasap.my\nhttps://hotnewcondo.wasap.my\nhttps://hotnewcondo.wasap.my\nContinue Reading\nPROPERTY HIGHLIGHTS\nNEW!\nBedroom\n4\nBathroom\n2\nProperty Size\n1000 sq.ft.\nNearby School\nSekolah Menengah Pendidikan Khas Cacat Penglihatan\nNearby Mall\nSetapak Central\nSee more details
## 2 FOR SALE @ RM250,000\nIntroduction:\n~ Pangsapuri Kenanga @ Kampung Lapan\n~ 980 sqft\n~ 3 Bedrooms & 2 Bathrooms\n~ Bathroom with Water Heather\n~ Master Bedroom with Aircond\n~ Walk Up Apartment\n~ Bare Unit\nFacilities\n~ Enter with access card\n~ Swimming Pool\n~ Nice environment\nNearby:\n- Strategic Location\n- Jonker Walk, UNESCO World Heritage Site\n- Dataran Pahlawan\n- Hospital Mahkota\n- Kota Laksamana\n- Melaka Central (Bus Station)\n>>> https://www.youtube.com/@propertysoldier6688\n>>> https://www.facebook.com/propertysoldiermelaka\n>>> You may call me up for other house inquiries, we have other houses ready for sale,\nMelaka Property Wanted For Sale: House, Apartment, Condominium, Shop, Factory, Land\nand etc.\n#Free Consultation Loan Service & Lawyer advise\nMy Profile:\nWe are a property management company.\nIn this industry for over eight years.\nProvide Service :\n* Sales & Purchase\n* Super Host Home Stay Management\n* Long-Term Contract Renting\n* House Cleaning Service\n* Design & Renovation\nContinue Reading
## 3 [Below Market] Sri Lavender Apartment,Tmn Sepakat Indah,100% FULL LOAN\nRM 230,000\n(💥BELOW MARKET VALUE💥)\nMARKET VALUE: RM 330,000\n# LPPSA | KWSP AC 2\n# Mark up Price Available\n# 100% Full Loan Available\n# Cash Back Available\n# Below Market Price\nPlease contact me for arrange viewing \nShow contact number\nPlease contact me for arrange viewing \nShow contact number\nPlease contact me for arrange viewing \nShow contact number\n=============================\nProperty Type: Apartment\nTitle type: Freehold\nBedrooms: 3\nBathroom: 2\nSize: 1000 sq.ft.\nProperty Details:\n- 7th Floor with Balcony\n- Freehold, Open title & Strata Ready\n- 1 Covered parking lot(can be seen from the unit ) with smart access card\n- Corner unit with awesome scenery from any angle of the unit.\n=============================\nFacilities:\nMini Market, Playground, 24 Hour Security, Balcony/Patio, Cable TV\n \nLocation :\n- 3 min to Silk Highway\n- 7 min to Plus Highway/Uniten\n- 5 min to Bandar Baru Bangi(nearest to seksyen 7)\n- 10 min to IOI Mall/Hospital Serdang/UPM\n- 10 min to Kajang/MRT Stadium Kajang\nPublic Transport\n- 2 min walking distance to Smart Selangor(free) bus stop/MRT Feeder Bus(MRT Kajang)\nPlease contact me for arrange viewing \nShow contact number\nPlease contact me for arrange viewing \nShow contact number\nPlease contact me for arrange viewing \nShow contact number\nContinue Reading\nPROPERTY HIGHLIGHTS\nNEW!\nBedroom\n3\nBathroom\n2\nProperty Size\n1000 sq.ft.\nSee more details
## 4 Flat Pandan Indah\nJalan Pandan Indah 3/3\nNon Bumi lot, 100% loan\n=====================\n• Walk up flat\n• Non Bumi lot\n• Leasehold with strata title\n• Build up 592sqft\n• 3 bedroom,1 bathroom\n• Flooring fully tiles\n• Build in concreate table top\n• Well Maintain and Good Condition\nHenrick Tan\nShow contact number\nShow contact number\nSenior Negotiator
## 5 * Open-concept Soho with balcony, unblock view\n* fully furnished studio unit 467 sq feet\n*consists of built in kitchen cabinets with induction cookers, microwave oven, washing machine, fridge, led tv, aircond, ceiling fan, water heater, alarm bell system, dining table sets with chairs, curtains, bed with mattress plus wardrobe & sofa, installed iron grill for front door\n*Best for the newly married\nPROPERTY HIGHLIGHTS\nNEW!\nBedroom\n1\nBathroom\n1\nProperty Size\n467 sq.ft.\nNearby School\nSekolah Jenis Kebangsaan (T) Ladang Midlands\nNearby Mall\ni-Soho i-City\nSee more details
## 6 D'Piazza, Bayan Baru for SALE\nDetails:\n✅1100 sqf\n✅3 bedroom\n✅2bathroom\n✅1 car park\n📌Original unit, airconds, heater \nInterested please call \nShow contact number
## Bedroom Bathroom Property.Size
## 1 4 2 1000 sq.ft.
## 2 3 2 980 sq.ft.
## 3 3 2 1000 sq.ft.
## 4 3 1 592 sq.ft.
## 5 1 1 467 sq.ft.
## 6 3 2 1100 sq.ft.
## Nearby.School Nearby.Mall Ad.List
## 1 Sekolah Menengah Pendidikan Khas Cacat Penglihatan Setapak Central 98187451
## 2 101683090
## 3 103792905
## 4 103806240
## 5 Sekolah Jenis Kebangsaan (T) Ladang Midlands i-Soho i-City 103806234
## 6 103739787
## Category
## 1 Apartment / Condominium, For sale
## 2 Apartment / Condominium, For sale
## 3 Apartment / Condominium, For sale
## 4 Apartment / Condominium, For sale
## 5 Apartment / Condominium, For sale
## 6 Apartment / Condominium, For sale
## Facilities
## 1 -
## 2 Parking, Security, Swimming Pool, Playground, Barbeque area, Jogging Track
## 3 Playground, Minimart, Jogging Track, Barbeque area, Parking, Security, Lift
## 4 Parking, Playground, Minimart, Jogging Track
## 5 Minimart, Gymnasium, Parking, Security
## 6 Parking, Swimming Pool, Multipurpose hall, Sauna, Minimart, Barbeque area, Security, Playground, Gymnasium, Tennis Court, Lift
## Building.Name Developer Tenure.Type
## 1 Kenwingston Platz Kenwingston Group Freehold
## 2 Kenanga (Park View Court) - Freehold
## 3 Sri Lavender Apartment TLS Group Freehold
## 4 Flat Pandan Indah - Leasehold
## 5 i-Soho @ i-City i-Berhad Freehold
## 6 D'Piazza Condominium X-Scan Penang Sdn Bhd Freehold
## Address
## 1 Jalan Gombak, Setapak, Kuala Lumpur
## 2 Jalan Kenanga 3/8, Melaka City, Melaka
## 3 Jalan Sepakat Indah 2/1, Taman Sepakat Indah 2, Kajang, Selangor
## 4 jalan pandan indah 3/3, Selangor, Ampang
## 5 Jalan Plumbum 7/102, Shah Alam, Selangor
## 6 Jalan Mayang Pasir 2, Bayan Baru, Penang
## Completion.Year X..of.Floors Total.Units Property.Type Parking.Lot
## 1 - - - Service Residence 2
## 2 - - - Apartment 1
## 3 2007 13 445 Apartment 1
## 4 - - - Flat 1
## 5 - 43 956 Studio -
## 6 2010 19 706 Condominium 1
## Floor.Range Land.Title Firm.Type Firm.Number REN.Number
## 1 - Non Bumi Lot VE 30338 -
## 2 Low Non Bumi Lot E 30812 REN 15862
## 3 Medium Non Bumi Lot - - -
## 4 - Non Bumi Lot E 11584 REN 16279
## 5 Low Bumi Lot E 31916 -
## 6 Low Non Bumi Lot E 11307 REN 61472
## Bus.Stop
## 1 Bus Stop Starparc Point\nBus Stop Setapak Central\nBus Stop Setapak Sentral (Opp)\nBus Stop Columbia Hospital\nBus Stop PV12 Residence\nBus Stop PV15 Platinum\nBus Stop PV12 Condominium (Opp)\nBus Stop Sri Utama Schools (Opp)\nBus Stop CIMB Genting Klang\nBus Stop 1 Sri Utama Schools\nBus Stop Aeon Big Danau Kota\nBus Stop 2 Sri Utama Schools\nBus Stop Setapak Commercial\nBus Stop 1 Setapak Food Court\nBus Stop Setapak Industrial Area\nBus Stop 2 Setapak Food Court\nBus Stop 2 Medan Makmur Setapak\nBus Stop 1 Medan Makmur Setapak\nBus Stop Langkawi Apartment (Opp)\nBus Stop BHP Genting Klang\nBus Stop Jalan Kilang\nBus Stop PV128 Setapak
## 2
## 3
## 4
## 5 Bus Stop at Persiaran Permai 1\nBus Stop at Persiaran Permai 2\nBus Stop at Pusat Komersial Seksyen 7 (Timur)\nBus Stop at Persiaran Bestari 1\nBus Stop at Jalan Plumbum N7/N\nBus Stop at UITM Shah Alam (Barat)\nBus Stop at Jakel (Seksyen 7)\nBus Stop at Jalan Sungai Rasau 1\nBus Stop at Federal Highway 1\nBus Stop at Jalan Sungai Rasau 2\nBus Stop at Jakel 2 (Seksyen 7)\nBus Stop at Pusat Kesihatan
## 6
## Mall
## 1 Setapak Central
## 2
## 3
## 4
## 5 i-Soho i-City\nGulati\nCentral i-City Shopping Centre
## 6
## Park
## 1 Park at Taman Tasik Danau Kota, Setapak, Kuala Lumpur, Malaysia\nPark at Taman Danau Kota, Setapak, Kuala Lumpur, Malaysia\nPark 1 at Setapak Garden, Setapak, Kuala Lumpur, Malaysia
## 2
## 3
## 4
## 5 Park 2 at Section 7, Shah Alam\nPark 1 at Section 7, Shah Alam
## 6
## School
## 1 Sekolah Menengah Pendidikan Khas Cacat Penglihatan\nSekolah Kebangsaan Danau Kota\nSJK (C) Wangsa Maju\nKolej Vokasional Setapak\nSri Utama Schools\nSK Danau Kota (2)\nSMK Danau Kota
## 2
## 3
## 4
## 5 Sekolah Jenis Kebangsaan (T) Ladang Midlands\nSekolah Kebangsaan Seksyen 7
## 6
## Hospital price
## 1 Columbia Asia Hospital RM 340 000
## 2 RM 250 000
## 3 RM 230 000
## 4 RM 158 000
## 5 Osel Clinic (Shah Alam)\nHospital Shah Alam RM 305 000
## 6 RM 425 000
## Highway Nearby.Railway.Station Railway.Station
## 1
## 2
## 3 SILK Sg Ramal (T) Toll Plaza
## 4
## 5
## 6
str(data)
## 'data.frame': 4000 obs. of 32 variables:
## $ description : chr "Iconic Building @ KL SETAPAK\nNew launching & Latest condo !!!!! 🔥\nHouse with luxury hotel concept 😍👑\n💎 Freeh"| __truncated__ "FOR SALE @ RM250,000\nIntroduction:\n~ Pangsapuri Kenanga @ Kampung Lapan\n~ 980 sqft\n~ 3 Bedrooms & 2 Bathroo"| __truncated__ "[Below Market] Sri Lavender Apartment,Tmn Sepakat Indah,100% FULL LOAN\nRM 230,000\n(💥BELOW MARKET VALUE💥)\nMAR"| __truncated__ "Flat Pandan Indah\nJalan Pandan Indah 3/3\nNon Bumi lot, 100% loan\n=====================\n• Walk up flat\n• No"| __truncated__ ...
## $ Bedroom : chr "4" "3" "3" "3" ...
## $ Bathroom : chr "2" "2" "2" "1" ...
## $ Property.Size : chr "1000 sq.ft." "980 sq.ft." "1000 sq.ft." "592 sq.ft." ...
## $ Nearby.School : chr "Sekolah Menengah Pendidikan Khas Cacat Penglihatan" "" "" "" ...
## $ Nearby.Mall : chr "Setapak Central" "" "" "" ...
## $ Ad.List : int 98187451 101683090 103792905 103806240 103806234 103739787 103690767 103615852 103615849 102460346 ...
## $ Category : chr "Apartment / Condominium, For sale" "Apartment / Condominium, For sale" "Apartment / Condominium, For sale" "Apartment / Condominium, For sale" ...
## $ Facilities : chr "-" "Parking, Security, Swimming Pool, Playground, Barbeque area, Jogging Track" "Playground, Minimart, Jogging Track, Barbeque area, Parking, Security, Lift" "Parking, Playground, Minimart, Jogging Track" ...
## $ Building.Name : chr "Kenwingston Platz" "Kenanga (Park View Court)" "Sri Lavender Apartment" "Flat Pandan Indah" ...
## $ Developer : chr "Kenwingston Group" "-" "TLS Group" "-" ...
## $ Tenure.Type : chr "Freehold" "Freehold" "Freehold" "Leasehold" ...
## $ Address : chr "Jalan Gombak, Setapak, Kuala Lumpur" "Jalan Kenanga 3/8, Melaka City, Melaka" "Jalan Sepakat Indah 2/1, Taman Sepakat Indah 2, Kajang, Selangor" "jalan pandan indah 3/3, Selangor, Ampang" ...
## $ Completion.Year : chr "-" "-" "2007" "-" ...
## $ X..of.Floors : chr "-" "-" "13" "-" ...
## $ Total.Units : chr "-" "-" "445" "-" ...
## $ Property.Type : chr "Service Residence" "Apartment" "Apartment" "Flat" ...
## $ Parking.Lot : chr "2" "1" "1" "1" ...
## $ Floor.Range : chr "-" "Low" "Medium" "-" ...
## $ Land.Title : chr "Non Bumi Lot" "Non Bumi Lot" "Non Bumi Lot" "Non Bumi Lot" ...
## $ Firm.Type : chr "VE" "E" "-" "E" ...
## $ Firm.Number : chr "30338" "30812" "-" "11584" ...
## $ REN.Number : chr "-" "REN 15862" "-" "REN 16279" ...
## $ Bus.Stop : chr "Bus Stop Starparc Point\nBus Stop Setapak Central\nBus Stop Setapak Sentral (Opp)\nBus Stop Columbia Hospital\n"| __truncated__ "" "" "" ...
## $ Mall : chr "Setapak Central" "" "" "" ...
## $ Park : chr "Park at Taman Tasik Danau Kota, Setapak, Kuala Lumpur, Malaysia\nPark at Taman Danau Kota, Setapak, Kuala Lumpu"| __truncated__ "" "" "" ...
## $ School : chr "Sekolah Menengah Pendidikan Khas Cacat Penglihatan\nSekolah Kebangsaan Danau Kota\nSJK (C) Wangsa Maju\nKolej V"| __truncated__ "" "" "" ...
## $ Hospital : chr "Columbia Asia Hospital" "" "" "" ...
## $ price : chr "RM 340 000" "RM 250 000" "RM 230 000" "RM 158 000" ...
## $ Highway : chr "" "" "SILK Sg Ramal (T) Toll Plaza" "" ...
## $ Nearby.Railway.Station: chr "" "" "" "" ...
## $ Railway.Station : chr "" "" "" "" ...
summary(data)
## description Bedroom Bathroom Property.Size
## Length:4000 Length:4000 Length:4000 Length:4000
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## Nearby.School Nearby.Mall Ad.List Category
## Length:4000 Length:4000 Min. : 30964923 Length:4000
## Class :character Class :character 1st Qu.:102384201 Class :character
## Mode :character Mode :character Median :103350207 Mode :character
## Mean :102443246
## 3rd Qu.:103782293
## Max. :103806285
## Facilities Building.Name Developer Tenure.Type
## Length:4000 Length:4000 Length:4000 Length:4000
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## Address Completion.Year X..of.Floors Total.Units
## Length:4000 Length:4000 Length:4000 Length:4000
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## Property.Type Parking.Lot Floor.Range Land.Title
## Length:4000 Length:4000 Length:4000 Length:4000
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## Firm.Type Firm.Number REN.Number Bus.Stop
## Length:4000 Length:4000 Length:4000 Length:4000
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## Mall Park School Hospital
## Length:4000 Length:4000 Length:4000 Length:4000
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## price Highway Nearby.Railway.Station
## Length:4000 Length:4000 Length:4000
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
## Railway.Station
## Length:4000
## Class :character
## Mode :character
##
##
##
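The `str()` and `summary()` output above shows that every column except `Ad.List` is stored as character, including `price` ("RM 340 000") and `Property.Size` ("1000 sq.ft."). A minimal sketch of how such strings could be converted to numbers is given below. The helper `digits_to_number` and the new column names are hypothetical, the toy values are copied from the `head(data)` output, and the approach assumes the values contain no decimal part.

```r
# Toy values copied from the head(data) output above; not the full dataset.
listings <- data.frame(
  price = c("RM 340 000", "RM 250 000", "RM 158 000"),
  Property.Size = c("1000 sq.ft.", "980 sq.ft.", "592 sq.ft."),
  Completion.Year = c("-", "2007", "2010"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only the digits, then convert to numeric.
# Strings without any digit (such as the "-" placeholder) become NA.
# Assumption: values are whole numbers, so dropping "." is harmless.
digits_to_number <- function(x) {
  cleaned <- gsub("[^0-9]", "", x)
  cleaned[cleaned == ""] <- NA_character_
  as.numeric(cleaned)
}

listings$price_rm <- digits_to_number(listings$price)
listings$size_sqft <- digits_to_number(listings$Property.Size)
listings$completion_year <- digits_to_number(listings$Completion.Year)

str(listings)
```

Stripping every non-digit character also removes any non-breaking spaces inside the price strings, which a plain `gsub(" ", "", x)` could miss.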
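This part cleans the data and prepares it for analysis. We first check for missing values and then remove duplicate rows. After that, we extract information from the “description” and “Facilities” columns, creating new binary columns based on specific keywords. Note that the check below only detects NA values: the “-” and empty-string placeholders visible in the output above are not counted as missing, which is why every column reports 4000 values.
# Count non-missing (non-NA) values per column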
column_counts <- colSums(!is.na(data))
column_counts
## description Bedroom Bathroom
## 4000 4000 4000
## Property.Size Nearby.School Nearby.Mall
## 4000 4000 4000
## Ad.List Category Facilities
## 4000 4000 4000
## Building.Name Developer Tenure.Type
## 4000 4000 4000
## Address Completion.Year X..of.Floors
## 4000 4000 4000
## Total.Units Property.Type Parking.Lot
## 4000 4000 4000
## Floor.Range Land.Title Firm.Type
## 4000 4000 4000
## Firm.Number REN.Number Bus.Stop
## 4000 4000 4000
## Mall Park School
## 4000 4000 4000
## Hospital price Highway
## 4000 4000 4000
## Nearby.Railway.Station Railway.Station
## 4000 4000
# Remove exact duplicate rows (4000 -> 3815 rows)
data <- distinct(data)
summary(data)
## description Bedroom Bathroom Property.Size
## Length:3815 Length:3815 Length:3815 Length:3815
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## Nearby.School Nearby.Mall Ad.List Category
## Length:3815 Length:3815 Min. : 30964923 Length:3815
## Class :character Class :character 1st Qu.:102383709 Class :character
## Mode :character Mode :character Median :103343288 Mode :character
## Mean :102446511
## 3rd Qu.:103782204
## Max. :103806285
## Facilities Building.Name Developer Tenure.Type
## Length:3815 Length:3815 Length:3815 Length:3815
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## Address Completion.Year X..of.Floors Total.Units
## Length:3815 Length:3815 Length:3815 Length:3815
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## Property.Type Parking.Lot Floor.Range Land.Title
## Length:3815 Length:3815 Length:3815 Length:3815
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## Firm.Type Firm.Number REN.Number Bus.Stop
## Length:3815 Length:3815 Length:3815 Length:3815
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## Mall Park School Hospital
## Length:3815 Length:3815 Length:3815 Length:3815
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## price Highway Nearby.Railway.Station
## Length:3815 Length:3815 Length:3815
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
## Railway.Station
## Length:3815
## Class :character
## Mode :character
##
##
##
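Because the dataset encodes missing entries as "-" or as empty strings rather than `NA`, the non-NA counts above cannot reveal them. The sketch below shows one possible way to count such placeholders per column on toy data. The helper `is_placeholder` is hypothetical, and treating "-" and blank strings as missing is an assumption about this dataset.

```r
# Toy data mimicking the placeholders seen in the head(data) output.
toy <- data.frame(
  Developer = c("Kenwingston Group", "-", "TLS Group", "-"),
  Mall = c("Setapak Central", "", "", ""),
  Tenure.Type = c("Freehold", "Freehold", "Freehold", "Leasehold"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: treat NA, "-", empty and whitespace-only strings as missing.
is_placeholder <- function(x) is.na(x) | trimws(x) %in% c("", "-")

# Number of placeholder entries in each column.
placeholder_counts <- sapply(toy, function(col) sum(is_placeholder(col)))
placeholder_counts

# Optionally recode the placeholders to NA so that is.na() can see them.
toy[] <- lapply(toy, function(col) {
  col[is_placeholder(col)] <- NA
  col
})
colSums(is.na(toy))
```

An alternative is to recode at load time with the `na.strings` argument of `read.csv()`, for example `na.strings = c("", "-")`.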
# Define a list of keywords for each relevant column
garden_keywords <- c('garden')
Hospital_keywords <- c('hospital')
security_keywords <- c('security', 'access card', 'gated')
lift_keywords <- c('lift')
sauna_keywords <- c('sauna')
Basketball_keywords <- c('basketball')
parking_keywords <- c('parking')
Badminton_keywords <- c('badminton')
swimming_pool_keywords <- c('swimming pool', 'infinity pool', 'pool')
playground_keywords <- c('playground')
tennis_keywords <- c('tennis')
squash_keywords <- c('squash')
surau_keywords <- c('surau')
gymnasium_keywords <- c('gymnasium','gym')
barbeque_area_keywords <- c('barbeque area')
minimart_keywords <- c('minimart', 'supermarket', 'mart')
multipurpose_hall_keywords <- c('multipurpose hall','hall')
clubhouse_keywords <- c('club house')
jogging_track_keywords <- c('jogging track','jogging')
mrt_keywords <- c('mrt','lrt','erl','ktm')
# Convert the 'description' and 'Facilities' columns to lowercase
data$description <- tolower(data$description)
data$Facilities <- tolower(data$Facilities)
# Extract features from the semi-structured 'description' text:
# create one binary (0/1) column per feature, set to 1 when any keyword matches.
# grepl() is vectorized, so no sapply() over the rows is needed.
has_keyword <- function(text, keywords) {
  as.numeric(grepl(paste(keywords, collapse = '|'), text))
}
data$garden <- has_keyword(data$description, garden_keywords)
data$securitynew <- has_keyword(data$description, security_keywords)
data$liftnew <- has_keyword(data$description, lift_keywords)
data$saunanew <- has_keyword(data$description, sauna_keywords)
data$tennisnew <- has_keyword(data$description, tennis_keywords)
data$squashnew <- has_keyword(data$description, squash_keywords)
data$surau <- has_keyword(data$description, surau_keywords)
data$parkingnew <- has_keyword(data$description, parking_keywords)
data$swimmingpoolnew <- has_keyword(data$description, swimming_pool_keywords)
data$playgroundnew <- has_keyword(data$description, playground_keywords)
data$gymnasiumnew <- has_keyword(data$description, gymnasium_keywords)
data$barbequeareanew <- has_keyword(data$description, barbeque_area_keywords)
data$minimartnew <- has_keyword(data$description, minimart_keywords)
data$multipurposehallnew <- has_keyword(data$description, multipurpose_hall_keywords)
data$joggingtracknew <- has_keyword(data$description, jogging_track_keywords)
data$hospital <- has_keyword(data$description, Hospital_keywords)
data$mrt.lrt <- has_keyword(data$description, mrt_keywords)
data$basketball <- has_keyword(data$description, Basketball_keywords)
data$badminton <- has_keyword(data$description, Badminton_keywords)
data$clubhousenew <- has_keyword(data$description, clubhouse_keywords)
summary(data)
## description Bedroom Bathroom Property.Size
## Length:3815 Length:3815 Length:3815 Length:3815
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## Nearby.School Nearby.Mall Ad.List Category
## Length:3815 Length:3815 Min. : 30964923 Length:3815
## Class :character Class :character 1st Qu.:102383709 Class :character
## Mode :character Mode :character Median :103343288 Mode :character
## Mean :102446511
## 3rd Qu.:103782204
## Max. :103806285
## Facilities Building.Name Developer Tenure.Type
## Length:3815 Length:3815 Length:3815 Length:3815
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## Address Completion.Year X..of.Floors Total.Units
## Length:3815 Length:3815 Length:3815 Length:3815
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## Property.Type Parking.Lot Floor.Range Land.Title
## Length:3815 Length:3815 Length:3815 Length:3815
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## Firm.Type Firm.Number REN.Number Bus.Stop
## Length:3815 Length:3815 Length:3815 Length:3815
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## Mall Park School Hospital
## Length:3815 Length:3815 Length:3815 Length:3815
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## price Highway Nearby.Railway.Station
## Length:3815 Length:3815 Length:3815
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
## Railway.Station garden securitynew liftnew
## Length:3815 Min. :0.0000 Min. :0.0000 Min. :0.00000
## Class :character 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.00000
## Mode :character Median :0.0000 Median :0.0000 Median :0.00000
## Mean :0.1012 Mean :0.2375 Mean :0.07837
## 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.00000
## Max. :1.0000 Max. :1.0000 Max. :1.00000
## saunanew tennisnew squashnew surau
## Min. :0.00000 Min. :0.00000 Min. :0.00000 Min. :0.0000
## 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.0000
## Median :0.00000 Median :0.00000 Median :0.00000 Median :0.0000
## Mean :0.02805 Mean :0.02595 Mean :0.01232 Mean :0.0789
## 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.0000
## Max. :1.00000 Max. :1.00000 Max. :1.00000 Max. :1.0000
## parkingnew swimmingpoolnew playgroundnew gymnasiumnew
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.1678 Mean :0.2013 Mean :0.1471 Mean :0.1261
## 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## barbequeareanew minimartnew multipurposehallnew joggingtracknew
## Min. :0.00000 Min. :0.00000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.00000 Median :0.00000 Median :0.00000 Median :0.00000
## Mean :0.01389 Mean :0.09174 Mean :0.08598 Mean :0.03827
## 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :1.00000 Max. :1.00000 Max. :1.00000 Max. :1.00000
## hospital mrt.lrt basketball badminton
## Min. :0.0000 Min. :0.0000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.0000 Median :0.0000 Median :0.00000 Median :0.00000
## Mean :0.1127 Mean :0.2571 Mean :0.02333 Mean :0.03853
## 3rd Qu.:0.0000 3rd Qu.:1.0000 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :1.0000 Max. :1.0000 Max. :1.00000 Max. :1.00000
## clubhousenew
## Min. :0.000000
## 1st Qu.:0.000000
## Median :0.000000
## Mean :0.007339
## 3rd Qu.:0.000000
## Max. :1.000000
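The keyword extraction above matches plain substrings, so short keywords can also fire inside longer words: "mart" occurs in "smart" (as in "smart access card") and "erl" occurs in "overlooking". The sketch below compares substring matching with a whole-word variant on made-up descriptions. Both helper functions are hypothetical, and whether whole-word matching suits this dataset is an assumption that would need checking against the real listings.

```r
# Made-up lower-case snippets, loosely based on the listings shown earlier.
descriptions <- c(
  "1 covered parking lot with smart access card",
  "corner unit overlooking the park",
  "5 min walk to mrt, minimart downstairs"
)

# Plain substring match, equivalent to the extraction step above.
match_substring <- function(text, keywords) {
  as.numeric(grepl(paste(keywords, collapse = "|"), text))
}

# Variant (assumption): wrap the alternatives in \\b so that a keyword
# only counts when it appears as a whole word.
match_whole_word <- function(text, keywords) {
  pattern <- paste0("\\b(", paste(keywords, collapse = "|"), ")\\b")
  as.numeric(grepl(pattern, text, perl = TRUE))
}

minimart_keywords <- c("minimart", "supermarket", "mart")
mrt_keywords <- c("mrt", "lrt", "erl", "ktm")

# "smart" contains "mart", so the substring version flags the first snippet.
minimart_substring <- match_substring(descriptions, minimart_keywords)
minimart_whole <- match_whole_word(descriptions, minimart_keywords)

# "overlooking" contains "erl", so the substring version flags the second snippet.
mrt_substring <- match_substring(descriptions, mrt_keywords)
mrt_whole <- match_whole_word(descriptions, mrt_keywords)

data.frame(descriptions, minimart_substring, minimart_whole, mrt_substring, mrt_whole)
```

Whole-word matching trades false positives for possible false negatives, since a listing that writes "mrt/lrt" or "gymroom" without separators may behave differently under each rule. Inspecting a sample of matched descriptions is a reasonable way to choose between them.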
This step performs more complex data manipulation: one-hot encoding the ‘Facilities’ column, merging columns that describe the same feature, and removing columns that are no longer required for our analysis.
# Separate the facilities into multiple columns (one-hot encoding)
data <- data %>%
  separate_rows(Facilities, sep = ", ") %>%
  mutate(value = 1) %>%
  spread(Facilities, value, fill = 0)
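To make the one-hot encoding step easier to follow, here is a minimal sketch on two toy listings. It uses `pivot_wider()`, which the tidyr documentation recommends over `spread()` for new code. The `distinct()` call is an added safeguard (an assumption, not part of the original pipeline) against a facility being listed twice in the same row.

```r
library(dplyr)
library(tidyr)

# Toy listings; the Facilities strings follow the head(data) output (lower-cased).
toy <- tibble(
  Ad.List = c(103806240, 103806234),
  Facilities = c("parking, playground, minimart, jogging track",
                 "minimart, gymnasium, parking, security")
)

one_hot <- toy %>%
  separate_rows(Facilities, sep = ", ") %>%   # one row per (listing, facility)
  distinct() %>%                              # guard against repeated facilities
  mutate(value = 1) %>%
  pivot_wider(names_from = Facilities, values_from = value, values_fill = 0)

one_hot
```

Each distinct facility becomes its own 0/1 column. A placeholder such as "-" in `Facilities` would also become a column, so it may be worth recoding or dropping it before this step.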
summary(data)
## description Bedroom Bathroom Property.Size
## Length:3815 Length:3815 Length:3815 Length:3815
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## Nearby.School Nearby.Mall Ad.List Category
## Length:3815 Length:3815 Min. : 30964923 Length:3815
## Class :character Class :character 1st Qu.:102383709 Class :character
## Mode :character Mode :character Median :103343288 Mode :character
## Mean :102446511
## 3rd Qu.:103782204
## Max. :103806285
## Building.Name Developer Tenure.Type Address
## Length:3815 Length:3815 Length:3815 Length:3815
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## Completion.Year X..of.Floors Total.Units Property.Type
## Length:3815 Length:3815 Length:3815 Length:3815
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## Parking.Lot Floor.Range Land.Title Firm.Type
## Length:3815 Length:3815 Length:3815 Length:3815
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## Firm.Number REN.Number Bus.Stop Mall
## Length:3815 Length:3815 Length:3815 Length:3815
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## Park School Hospital price
## Length:3815 Length:3815 Length:3815 Length:3815
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## Highway Nearby.Railway.Station Railway.Station garden
## Length:3815 Length:3815 Length:3815 Min. :0.0000
## Class :character Class :character Class :character 1st Qu.:0.0000
## Mode :character Mode :character Mode :character Median :0.0000
## Mean :0.1012
## 3rd Qu.:0.0000
## Max. :1.0000
## securitynew liftnew saunanew tennisnew
## Min. :0.0000 Min. :0.00000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.0000 Median :0.00000 Median :0.00000 Median :0.00000
## Mean :0.2375 Mean :0.07837 Mean :0.02805 Mean :0.02595
## 3rd Qu.:0.0000 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :1.0000 Max. :1.00000 Max. :1.00000 Max. :1.00000
## squashnew surau parkingnew swimmingpoolnew
## Min. :0.00000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.00000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.01232 Mean :0.0789 Mean :0.1678 Mean :0.2013
## 3rd Qu.:0.00000 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000
## Max. :1.00000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## playgroundnew gymnasiumnew barbequeareanew minimartnew
## Min. :0.0000 Min. :0.0000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.0000 Median :0.0000 Median :0.00000 Median :0.00000
## Mean :0.1471 Mean :0.1261 Mean :0.01389 Mean :0.09174
## 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :1.0000 Max. :1.0000 Max. :1.00000 Max. :1.00000
## multipurposehallnew joggingtracknew hospital mrt.lrt
## Min. :0.00000 Min. :0.00000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.00000 Median :0.00000 Median :0.0000 Median :0.0000
## Mean :0.08598 Mean :0.03827 Mean :0.1127 Mean :0.2571
## 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.0000 3rd Qu.:1.0000
## Max. :1.00000 Max. :1.00000 Max. :1.0000 Max. :1.0000
## basketball badminton clubhousenew -
## Min. :0.00000 Min. :0.00000 Min. :0.000000 Min. :0.0000
## 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.000000 1st Qu.:0.0000
## Median :0.00000 Median :0.00000 Median :0.000000 Median :0.0000
## Mean :0.02333 Mean :0.03853 Mean :0.007339 Mean :0.1594
## 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.000000 3rd Qu.:0.0000
## Max. :1.00000 Max. :1.00000 Max. :1.000000 Max. :1.0000
## 10 barbeque area club house gymnasium
## Min. :0.0000000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.0002621 Mean :0.3714 Mean :0.1583 Mean :0.4938
## 3rd Qu.:0.0000000 3rd Qu.:1.0000 3rd Qu.:0.0000 3rd Qu.:1.0000
## Max. :1.0000000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## jogging track lift minimart multipurpose hall
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000 Median :1.0000 Median :0.0000 Median :0.0000
## Mean :0.3554 Mean :0.5245 Mean :0.3992 Mean :0.3269
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## parking playground sauna security
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:1.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:1.0000
## Median :1.0000 Median :1.0000 Median :0.0000 Median :1.0000
## Mean :0.7596 Mean :0.6826 Mean :0.2663 Mean :0.7554
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## squash court swimming pool tennis court
## Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000 Median :1.0000 Median :0.0000
## Mean :0.1499 Mean :0.6005 Mean :0.1714
## 3rd Qu.:0.0000 3rd Qu.:1.0000 3rd Qu.:0.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000
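To make the encoding step concrete, here is a minimal sketch on a hypothetical three-row table. The `id` column and the facility strings are invented for illustration; it assumes each listing is uniquely identified by its other columns, as `spread()` requires. It uses the same `separate_rows()` and `spread()` calls as the code above. It also suggests why the summary contains near-duplicate columns such as `gymnasium` and `gymnasiumnew`: each distinct spelling in the raw text presumably becomes its own indicator column.

```r
library(dplyr)
library(tidyr)

# Hypothetical listings; 'id' stands in for the columns that identify a row
toy <- tibble(
  id = 1:3,
  Facilities = c("gymnasium, lift", "lift", "gymnasiumnew, sauna")
)

toy_encoded <- toy %>%
  separate_rows(Facilities, sep = ", ") %>%  # one row per (listing, facility)
  mutate(value = 1) %>%                      # mark the facility as present
  spread(Facilities, value, fill = 0)        # one 0/1 column per facility

print(toy_encoded)
```

Because `gymnasium` and `gymnasiumnew` end up as separate columns, a merge step like the logical OR used later in this section is needed to recover a single indicator per facility.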
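# Flag listings with a non-empty entry: the regex "." matches any single character, so empty strings and NA yield 0
data$hospital1 <- ifelse(grepl(".", data$Hospital), 1, 0)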
data$busstop <- ifelse(grepl(".", data$Bus.Stop), 1, 0)
data$school <- ifelse(grepl(".", data$School), 1, 0)
data$mall <- ifelse(grepl(".", data$Mall), 1, 0)
data$railway <- ifelse(grepl(".", data$Railway.Station), 1, 0)
data$highway <- ifelse(grepl(".", data$Highway), 1, 0)
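The behaviour of `grepl(".", x)` is easy to misread, so the sketch below checks it on a few hypothetical values (the strings are invented; the raw column contents are not shown in this report). It also shows a stricter variant, which is an assumption on our part and not what the code above does, that treats a lone "-" placeholder as missing.

```r
# Hypothetical raw entries: a real value, an empty string, NA, and a placeholder
x <- c("Hospital Pantai", "", NA, "-")

# Same pattern as above: "." matches any character, so any non-empty string is flagged
flag_any <- ifelse(grepl(".", x), 1, 0)

# Stricter variant (assumption): also treat "-" as "no information"
flag_strict <- ifelse(!is.na(x) & nzchar(trimws(x)) & x != "-", 1, 0)

print(data.frame(x = x, flag_any = flag_any, flag_strict = flag_strict))
```

If the nearby-amenity columns contain "-" placeholders, the two variants would disagree on those rows, so it may be worth inspecting the raw values before relying on the flags.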
# Merge similar columns with Logical OR
data$barbequearea <- as.integer(data$"barbeque area" | data$barbequeareanew)
data$clubhouse <- as.integer(data$"club house" | data$clubhousenew)
data$gymnasium <- as.integer(data$gymnasium | data$gymnasiumnew)
data$joggingtrack <- as.integer(data$"jogging track" | data$joggingtracknew)
data$lift <- as.integer(data$lift | data$liftnew)
data$minimart <- as.integer(data$minimart | data$minimartnew)
data$multipurposehall <- as.integer(data$"multipurpose hall" | data$multipurposehallnew)
data$parking <- as.integer(data$parking | data$parkingnew)
data$playground <- as.integer(data$playground | data$playgroundnew)
data$sauna <- as.integer(data$sauna | data$saunanew)
data$security <- as.integer(data$security | data$securitynew)
data$squashcourt <- as.integer(data$"squash court"| data$squashnew)
data$swimmingpool <- as.integer(data$"swimming pool" | data$swimmingpoolnew)
data$tennis <- as.integer(data$"tennis court" | data$tennisnew)
data$hospital <- as.integer(data$hospital1 | data$hospital)
data$railway <- as.integer(data$railway | data$mrt.lrt)
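The pairwise merges above all follow one pattern, which can be sketched as a loop over a lookup table. This is only an illustrative alternative on a hypothetical data frame; the `pairs` list and the toy values are assumptions, not part of the original analysis.

```r
# Hypothetical indicator columns; check.names = FALSE keeps the space in "club house"
toy <- data.frame(
  `club house` = c(1, 0, 0),
  clubhousenew = c(0, 0, 1),
  lift         = c(1, 1, 0),
  liftnew      = c(0, 0, 0),
  check.names  = FALSE
)

# Target column name -> the two source columns to combine
pairs <- list(
  clubhouse = c("club house", "clubhousenew"),
  lift      = c("lift", "liftnew")
)

for (target in names(pairs)) {
  cols <- pairs[[target]]
  # Logical OR: 1 if either spelling marks the facility as present
  toy[[target]] <- as.integer(toy[[cols[1]]] | toy[[cols[2]]])
}

print(toy)
```

Keeping the pairs in one list makes it harder to mistype a column name in a single line, at the cost of being slightly less explicit than the line-by-line version.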
# Drop the additional columns
data <- data[, !(names(data) %in% c(
  "barbeque area", "barbequeareanew", "club house", "clubhousenew", "gymnasiumnew",
  "joggingtracknew", "jogging track", "liftnew", "minimartnew", "multipurposehallnew",
  "multipurpose hall", "parkingnew", "playgroundnew", "saunanew", "securitynew",
  "squashnew", "squash court", "swimming pool", "swimmingpoolnew", "tennis court",
  "tennisnew", "Bus.Stop", "schoolnew", "Hospital", "hospital1", "School", "Mall",
  "Railway.Station", "mrt.lrt", "Highway"
))]
The final data-cleaning steps handle missing values, extract the state and city from the address, replace placeholder values with NA, and convert each column to the correct type for further analysis.
# Removal of irrelevant columns
data <- dplyr::select(data, -description, -Ad.List, -Firm.Type, -Firm.Number, -REN.Number,-Category,-Park,-Nearby.School,-Nearby.Mall,-Nearby.Railway.Station)
data <- dplyr::select(data, -c('-', '10'))
summary(data)
## Bedroom Bathroom Property.Size Building.Name
## Length:3815 Length:3815 Length:3815 Length:3815
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## Developer Tenure.Type Address Completion.Year
## Length:3815 Length:3815 Length:3815 Length:3815
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## X..of.Floors Total.Units Property.Type Parking.Lot
## Length:3815 Length:3815 Length:3815 Length:3815
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## Floor.Range Land.Title price garden
## Length:3815 Length:3815 Length:3815 Min. :0.0000
## Class :character Class :character Class :character 1st Qu.:0.0000
## Mode :character Mode :character Mode :character Median :0.0000
## Mean :0.1012
## 3rd Qu.:0.0000
## Max. :1.0000
## surau hospital basketball badminton
## Min. :0.0000 Min. :0.0000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.0000 Median :0.0000 Median :0.00000 Median :0.00000
## Mean :0.0789 Mean :0.1798 Mean :0.02333 Mean :0.03853
## 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :1.0000 Max. :1.0000 Max. :1.00000 Max. :1.00000
## gymnasium lift minimart parking
## Min. :0.0000 Min. :0.000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.:1.0000
## Median :1.0000 Median :1.000 Median :0.0000 Median :1.0000
## Mean :0.5174 Mean :0.557 Mean :0.4385 Mean :0.7924
## 3rd Qu.:1.0000 3rd Qu.:1.000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.000 Max. :1.0000 Max. :1.0000
## playground sauna security busstop
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:1.0000 1st Qu.:0.0000
## Median :1.0000 Median :0.0000 Median :1.0000 Median :0.0000
## Mean :0.7101 Mean :0.2771 Mean :0.7885 Mean :0.1793
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:0.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## school mall railway highway
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.00000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.00000
## Median :0.0000 Median :0.0000 Median :0.0000 Median :0.00000
## Mean :0.2406 Mean :0.1201 Mean :0.2587 Mean :0.03591
## 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:1.0000 3rd Qu.:0.00000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.00000
## barbequearea clubhouse joggingtrack multipurposehall
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.3767 Mean :0.1638 Mean :0.3712 Mean :0.3672
## 3rd Qu.:1.0000 3rd Qu.:0.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## squashcourt swimmingpool tennis
## Min. :0.000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.000 Median :1.0000 Median :0.0000
## Mean :0.157 Mean :0.6265 Mean :0.1811
## 3rd Qu.:0.000 3rd Qu.:1.0000 3rd Qu.:0.0000
## Max. :1.000 Max. :1.0000 Max. :1.0000
# Removing rows where Address is NA
data <- data %>% filter(!is.na(Address))
# Extracting State and City
# Define the set of Malaysian states and federal territories
malay_states <- c(
'Johor', 'Kedah', 'Kelantan', 'Perak', 'Selangor', 'Melaka', 'Negeri Sembilan',
'Pahang', 'Perlis', 'Penang', 'Sabah', 'Sarawak', 'Terengganu',
'Kuala Lumpur', 'Labuan', 'Putrajaya'
)
# Extract the state name from the address and create a new column 'State'
data <- data %>%
mutate(State = str_extract(Address, paste(malay_states, collapse = '|')))
# Remove rows whose address is the "-" placeholder
data <- subset(data, Address != "-")
# Extract the city from the address and create a new column 'City'
address_parts <- strsplit(as.character(data$Address), ", ")
second_last <- sapply(address_parts, function(x) ifelse(length(x) > 1, x[length(x) - 1], NA))
last_part <- sapply(address_parts, function(x) x[length(x)])
# If the second-to-last component is a state, take the last component as the city; otherwise take the second-to-last
data$City <- ifelse(second_last %in% malay_states, last_part, second_last)
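The state and city rules can be traced on a few hypothetical addresses. The addresses and the shortened `states` vector below are invented for illustration; the logic mirrors the extraction above.

```r
library(stringr)

# Hypothetical subset of states and invented addresses, for illustration only
states <- c("Selangor", "Kuala Lumpur", "Penang")
addresses <- c(
  "Jalan Genting Klang, Setapak, Kuala Lumpur",
  "Kajang, Selangor",
  "Penang, Bayan Lepas"
)

# State: first state name found anywhere in the address
state <- str_extract(addresses, paste(states, collapse = "|"))

# City: second-to-last component, unless that component is itself a state
parts <- strsplit(addresses, ", ")
second_last <- sapply(parts, function(x) if (length(x) > 1) x[length(x) - 1] else NA_character_)
last_part <- sapply(parts, function(x) x[length(x)])
city <- ifelse(second_last %in% states, last_part, second_last)

print(data.frame(address = addresses, state = state, city = city))
```

One caveat of the regex approach: `str_extract()` returns the first match in the string, so an address whose street name contains a state name (for example a hypothetical "Jalan Johor" in Kuala Lumpur) would be assigned to the wrong state.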
# Replace "-" with 0 or NA and convert to numeric or appropriate type
data$Bedroom <- as.numeric(gsub("-", "0", data$Bedroom))
data$Bathroom <- as.numeric(gsub("-", "0", data$Bathroom))
data$No.of.Floors <- as.numeric(gsub("-", NA, data$X..of.Floors))
data$Total.Units <- as.numeric(gsub("-", NA, data$Total.Units))
data$Parking.Lot <- as.numeric(gsub("-", "0", data$Parking.Lot))
data$Property.Size <- as.numeric(gsub(" sq.ft.", "", gsub("-", "0", data$Property.Size)))
data$price <- as.numeric(gsub("[^0-9]", "", data$price))
data$Building.Name <- ifelse(data$Building.Name == "-", NA, data$Building.Name)
data$Developer <- ifelse(data$Developer == "-", NA, data$Developer)
data$Completion.Year <- ifelse(data$Completion.Year == "-", NA, data$Completion.Year)
data$Floor.Range <- ifelse(data$Floor.Range == "-", NA, data$Floor.Range)
# Remove the renamed X..of.Floors column and the Address column, which is no longer needed
data <- subset(data, select = -c(X..of.Floors, Address))
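The string-to-number conversions above can be checked on a few hypothetical raw values. The raw formats of the listing fields are not shown in this report, so the strings below are assumptions chosen only to exercise each pattern.

```r
# Hypothetical raw strings, for illustration only
price_raw  <- c("RM 340 000", "RM 1,250,000")
size_raw   <- c("1000 sq.ft.", "-")
floors_raw <- c("13", "-")

# Keep digits only, then convert
price <- as.numeric(gsub("[^0-9]", "", price_raw))

# Treat "-" as 0, strip the unit suffix, then convert
size <- as.numeric(gsub(" sq.ft.", "", gsub("-", "0", size_raw)))

# With replacement = NA, gsub() returns NA for every element that matches
floors <- as.numeric(gsub("-", NA, floors_raw))

print(price)
print(size)
print(floors)
```

Two consequences are worth keeping in mind. Stripping every non-digit would also remove a decimal point, so a value such as "RM 340,000.50" would become 34000050. And replacing "-" with "0" records a missing size or room count as a genuine zero, which affects summaries and models differently from NA.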
Exploratory Data Analysis (EDA) is a crucial step in the data analysis process. It involves exploring the structure, patterns, and characteristics of a dataset before applying more advanced statistical methods or machine learning algorithms.
In this section, we examine the first few rows of the dataset to understand its structure, check the data type of each variable (numeric, categorical, datetime, etc.), and identify missing values and outliers.
#view first six rows of dataset
head(data)
## # A tibble: 6 × 40
## Bedroom Bathroom Property.Size Building.Name Developer Tenure.Type
## <dbl> <dbl> <dbl> <chr> <chr> <chr>
## 1 4 2 1000 Kenwingston Platz Kenwings… Freehold
## 2 3 2 980 Kenanga (Park View Court) <NA> Freehold
## 3 3 2 1000 Sri Lavender Apartment TLS Group Freehold
## 4 3 1 592 Flat Pandan Indah <NA> Leasehold
## 5 1 1 467 i-Soho @ i-City i-Berhad Freehold
## 6 3 2 1100 D'Piazza Condominium X-Scan P… Freehold
## # ℹ 34 more variables: Completion.Year <chr>, Total.Units <dbl>,
## # Property.Type <chr>, Parking.Lot <dbl>, Floor.Range <chr>,
## # Land.Title <chr>, price <dbl>, garden <dbl>, surau <dbl>, hospital <int>,
## # basketball <dbl>, badminton <dbl>, gymnasium <int>, lift <int>,
## # minimart <int>, parking <int>, playground <int>, sauna <int>,
## # security <int>, busstop <dbl>, school <dbl>, mall <dbl>, railway <int>,
## # highway <dbl>, barbequearea <int>, clubhouse <int>, joggingtrack <int>, …
# display the dimensions of the dataset
dim(data)
## [1] 3730 40
#data structure
str(data)
## tibble [3,730 × 40] (S3: tbl_df/tbl/data.frame)
## $ Bedroom : num [1:3730] 4 3 3 3 1 3 3 3 3 3 ...
## $ Bathroom : num [1:3730] 2 2 2 1 1 2 2 2 2 2 ...
## $ Property.Size : num [1:3730] 1000 980 1000 592 467 1100 780 852 918 850 ...
## $ Building.Name : chr [1:3730] "Kenwingston Platz" "Kenanga (Park View Court)" "Sri Lavender Apartment" "Flat Pandan Indah" ...
## $ Developer : chr [1:3730] "Kenwingston Group" NA "TLS Group" NA ...
## $ Tenure.Type : chr [1:3730] "Freehold" "Freehold" "Freehold" "Leasehold" ...
## $ Completion.Year : chr [1:3730] NA NA "2007" NA ...
## $ Total.Units : num [1:3730] NA NA 445 NA 956 706 281 NA NA 435 ...
## $ Property.Type : chr [1:3730] "Service Residence" "Apartment" "Apartment" "Flat" ...
## $ Parking.Lot : num [1:3730] 2 1 1 1 0 1 1 1 1 2 ...
## $ Floor.Range : chr [1:3730] NA "Low" "Medium" NA ...
## $ Land.Title : chr [1:3730] "Non Bumi Lot" "Non Bumi Lot" "Non Bumi Lot" "Non Bumi Lot" ...
## $ price : num [1:3730] 340000 250000 230000 158000 305000 425000 230000 200000 275000 300000 ...
## $ garden : num [1:3730] 1 0 0 0 0 0 0 0 0 0 ...
## $ surau : num [1:3730] 0 0 0 0 0 0 0 0 1 0 ...
## $ hospital : int [1:3730] 1 1 1 0 1 0 0 0 0 1 ...
## $ basketball : num [1:3730] 1 0 0 0 0 0 0 0 0 0 ...
## $ badminton : num [1:3730] 0 0 0 0 0 0 0 0 0 1 ...
## $ gymnasium : int [1:3730] 1 0 0 0 1 1 0 1 1 0 ...
## $ lift : int [1:3730] 0 0 1 0 0 1 0 1 1 0 ...
## $ minimart : int [1:3730] 0 0 1 1 1 1 0 1 1 0 ...
## $ parking : int [1:3730] 0 1 1 1 1 1 1 1 1 1 ...
## $ playground : int [1:3730] 0 1 1 1 0 1 1 1 1 0 ...
## $ sauna : int [1:3730] 0 0 0 0 0 1 0 0 0 0 ...
## $ security : int [1:3730] 0 1 1 0 1 1 1 1 1 1 ...
## $ busstop : num [1:3730] 1 0 0 0 1 0 0 1 1 1 ...
## $ school : num [1:3730] 1 0 0 0 1 0 0 1 1 0 ...
## $ mall : num [1:3730] 1 0 0 0 1 0 0 0 1 0 ...
## $ railway : int [1:3730] 0 0 1 0 0 0 0 0 0 0 ...
## $ highway : num [1:3730] 0 0 1 0 0 0 0 0 0 0 ...
## $ barbequearea : int [1:3730] 1 1 1 0 0 1 1 1 0 0 ...
## $ clubhouse : int [1:3730] 0 0 0 0 0 0 0 1 0 0 ...
## $ joggingtrack : int [1:3730] 1 1 1 1 0 0 0 1 0 0 ...
## $ multipurposehall: int [1:3730] 0 0 0 0 0 1 0 1 1 1 ...
## $ squashcourt : int [1:3730] 1 0 0 0 0 0 0 0 0 0 ...
## $ swimmingpool : int [1:3730] 1 1 0 0 0 1 0 1 0 1 ...
## $ tennis : int [1:3730] 0 0 0 0 0 1 0 0 0 0 ...
## $ State : chr [1:3730] "Kuala Lumpur" "Melaka" "Selangor" "Selangor" ...
## $ City : chr [1:3730] "Setapak" "Melaka City" "Kajang" "Ampang" ...
## $ No.of.Floors : num [1:3730] NA NA 13 NA 43 19 5 12 NA 435 ...
# list data types for each features
sapply(data,class)
## Bedroom Bathroom Property.Size Building.Name
## "numeric" "numeric" "numeric" "character"
## Developer Tenure.Type Completion.Year Total.Units
## "character" "character" "character" "numeric"
## Property.Type Parking.Lot Floor.Range Land.Title
## "character" "numeric" "character" "character"
## price garden surau hospital
## "numeric" "numeric" "numeric" "integer"
## basketball badminton gymnasium lift
## "numeric" "numeric" "integer" "integer"
## minimart parking playground sauna
## "integer" "integer" "integer" "integer"
## security busstop school mall
## "integer" "numeric" "numeric" "numeric"
## railway highway barbequearea clubhouse
## "integer" "numeric" "integer" "integer"
## joggingtrack multipurposehall squashcourt swimmingpool
## "integer" "integer" "integer" "integer"
## tennis State City No.of.Floors
## "integer" "character" "character" "numeric"
# Convert numeric columns to integer, since they represent discrete variables
data$Bedroom<-as.integer(data$Bedroom)
data$Bathroom<-as.integer(data$Bathroom)
data$Completion.Year<-as.integer(data$Completion.Year)
data$Total.Units<-as.integer(data$Total.Units)
data$Parking.Lot <-as.integer(data$Parking.Lot)
data$garden <-as.integer(data$garden)
data$surau <-as.integer(data$surau)
data$basketball <-as.integer(data$basketball)
data$badminton <-as.integer(data$badminton)
data$busstop <-as.integer(data$busstop)
data$school <-as.integer(data$school)
data$mall <-as.integer(data$mall)
data$highway <-as.integer(data$highway)
data$No.of.Floors <-as.integer(data$No.of.Floors)
data$Developer <- as.factor(data$Developer)
data$Tenure.Type <- as.factor(data$Tenure.Type)
data$Property.Type <- as.factor(data$Property.Type)
data$Floor.Range <- as.factor(data$Floor.Range)
data$Land.Title <- as.factor(data$Land.Title)
data$State <- as.factor(data$State)
data$City <- as.factor(data$City)
#summarize dataset
summary(data)
## Bedroom Bathroom Property.Size Building.Name
## Min. : 1.000 Min. :1.000 Min. : 1 Length:3730
## 1st Qu.: 3.000 1st Qu.:2.000 1st Qu.: 750 Class :character
## Median : 3.000 Median :2.000 Median : 900 Mode :character
## Mean : 2.916 Mean :2.018 Mean : 1038
## 3rd Qu.: 3.000 3rd Qu.:2.000 3rd Qu.: 1116
## Max. :10.000 Max. :8.000 Max. :122774
##
## Developer Tenure.Type Completion.Year
## Ideal Property Group : 67 Freehold :2264 Min. :1985
## Belleview Group : 61 Leasehold:1466 1st Qu.:2006
## Asia Green Group : 51 Median :2014
## IJM LAND BERHAD : 49 Mean :2011
## Syarikat Perumahan Negara Berhad: 29 3rd Qu.:2017
## (Other) :1908 Max. :2026
## NA's :1565 NA's :1829
## Total.Units Property.Type Parking.Lot Floor.Range
## Min. : 1.0 Condominium :1585 Min. : 0.000 High : 781
## 1st Qu.: 290.0 Apartment :1402 1st Qu.: 0.000 Low : 646
## Median : 462.0 Service Residence: 474 Median : 1.000 Medium:1315
## Mean : 613.3 Flat : 233 Mean : 1.046 NA's : 988
## 3rd Qu.: 754.0 Others : 14 3rd Qu.: 2.000
## Max. :7810.0 Studio : 13 Max. :10.000
## NA's :1721 (Other) : 9
## Land.Title price garden surau
## Bumi Lot : 604 Min. : 38000 Min. :0.0000 Min. :0.00000
## Malay Reserved: 7 1st Qu.: 250000 1st Qu.:0.0000 1st Qu.:0.00000
## Non Bumi Lot :3119 Median : 350000 Median :0.0000 Median :0.00000
## Mean : 421198 Mean :0.1016 Mean :0.07962
## 3rd Qu.: 490000 3rd Qu.:0.0000 3rd Qu.:0.00000
## Max. :6016000 Max. :1.0000 Max. :1.00000
##
## hospital basketball badminton gymnasium
## Min. :0.000 Min. :0.00000 Min. :0.00000 Min. :0.0000
## 1st Qu.:0.000 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.0000
## Median :0.000 Median :0.00000 Median :0.00000 Median :1.0000
## Mean :0.181 Mean :0.02386 Mean :0.03941 Mean :0.5185
## 3rd Qu.:0.000 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:1.0000
## Max. :1.000 Max. :1.00000 Max. :1.00000 Max. :1.0000
##
## lift minimart parking playground
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:1.0000 1st Qu.:0.0000
## Median :1.0000 Median :0.0000 Median :1.0000 Median :1.0000
## Mean :0.5601 Mean :0.4421 Mean :0.7944 Mean :0.7115
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
##
## sauna security busstop school
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:1.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000 Median :1.0000 Median :0.0000 Median :0.0000
## Mean :0.2796 Mean :0.7879 Mean :0.1834 Mean :0.2461
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:0.0000 3rd Qu.:0.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
##
## mall railway highway barbequearea
## Min. :0.0000 Min. :0.0000 Min. :0.00000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.0000
## Median :0.0000 Median :0.0000 Median :0.00000 Median :0.0000
## Mean :0.1228 Mean :0.2617 Mean :0.03673 Mean :0.3807
## 3rd Qu.:0.0000 3rd Qu.:1.0000 3rd Qu.:0.00000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000 Max. :1.00000 Max. :1.0000
##
## clubhouse joggingtrack multipurposehall squashcourt
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.1651 Mean :0.3729 Mean :0.3697 Mean :0.1563
## 3rd Qu.:0.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:0.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
##
## swimmingpool tennis State City
## Min. :0.0000 Min. :0.0000 Selangor :1224 Johor Bahru: 235
## 1st Qu.:0.0000 1st Qu.:0.0000 Penang : 952 Cheras : 193
## Median :1.0000 Median :0.0000 Kuala Lumpur: 671 Ayer Itam : 187
## Mean :0.6249 Mean :0.1818 Johor : 393 Jelutong : 166
## 3rd Qu.:1.0000 3rd Qu.:0.0000 Sabah : 160 Shah Alam : 164
## Max. :1.0000 Max. :1.0000 Sarawak : 120 Bayan Lepas: 120
## (Other) : 210 (Other) :2665
## No.of.Floors
## Min. : 2.00
## 1st Qu.: 12.00
## Median : 20.00
## Mean : 21.77
## 3rd Qu.: 28.00
## Max. :504.00
## NA's :1570
# Check for unique values in each column
unique_counts <- sapply(data, n_distinct)
print(unique_counts)
## Bedroom Bathroom Property.Size Building.Name
## 8 8 839 1937
## Developer Tenure.Type Completion.Year Total.Units
## 581 2 41 502
## Property.Type Parking.Lot Floor.Range Land.Title
## 8 9 4 3
## price garden surau hospital
## 555 2 2 2
## basketball badminton gymnasium lift
## 2 2 2 2
## minimart parking playground sauna
## 2 2 2 2
## security busstop school mall
## 2 2 2 2
## railway highway barbequearea clubhouse
## 2 2 2 2
## joggingtrack multipurposehall squashcourt swimmingpool
## 2 2 2 2
## tennis State City No.of.Floors
## 2 16 188 61
# Check for missing values in each column
sapply(data, function(x) sum(is.na(x)))
## Bedroom Bathroom Property.Size Building.Name
## 0 0 0 0
## Developer Tenure.Type Completion.Year Total.Units
## 1565 0 1829 1721
## Property.Type Parking.Lot Floor.Range Land.Title
## 0 0 988 0
## price garden surau hospital
## 0 0 0 0
## basketball badminton gymnasium lift
## 0 0 0 0
## minimart parking playground sauna
## 0 0 0 0
## security busstop school mall
## 0 0 0 0
## railway highway barbequearea clubhouse
## 0 0 0 0
## joggingtrack multipurposehall squashcourt swimmingpool
## 0 0 0 0
## tennis State City No.of.Floors
## 0 0 0 1570
# Calculate the percentage of missing values in each column
sapply(data, function(x) sum(is.na(x)) / nrow(data)) * 100
## Bedroom Bathroom Property.Size Building.Name
## 0.00000 0.00000 0.00000 0.00000
## Developer Tenure.Type Completion.Year Total.Units
## 41.95710 0.00000 49.03485 46.13941
## Property.Type Parking.Lot Floor.Range Land.Title
## 0.00000 0.00000 26.48794 0.00000
## price garden surau hospital
## 0.00000 0.00000 0.00000 0.00000
## basketball badminton gymnasium lift
## 0.00000 0.00000 0.00000 0.00000
## minimart parking playground sauna
## 0.00000 0.00000 0.00000 0.00000
## security busstop school mall
## 0.00000 0.00000 0.00000 0.00000
## railway highway barbequearea clubhouse
## 0.00000 0.00000 0.00000 0.00000
## joggingtrack multipurposehall squashcourt swimmingpool
## 0.00000 0.00000 0.00000 0.00000
## tennis State City No.of.Floors
## 0.00000 0.00000 0.00000 42.09115
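The missing-value percentages can be reproduced on a small hypothetical data frame, together with an equivalent one-liner. The toy columns are invented; `colMeans(is.na(...))` is offered as an alternative formulation, not as the method used in this report.

```r
# Hypothetical data frame with some missing values
toy <- data.frame(
  a = c(1, NA, 3, NA),
  b = c("x", "y", NA, "z"),
  c = 1:4
)

# Same computation as above: share of NA per column, in percent
pct_sapply <- sapply(toy, function(x) sum(is.na(x)) / nrow(toy)) * 100

# Equivalent formulation: is.na() gives a logical matrix, colMeans() averages each column
pct_colmeans <- colMeans(is.na(toy)) * 100

print(pct_sapply)
print(pct_colmeans)
```

With roughly 42 to 49 percent missing in Developer, Completion.Year, Total.Units and No.of.Floors, any analysis using those columns effectively runs on about half of the listings, which is why the later plots and tests report removed rows.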
Next, we create visualizations to understand the distribution of each variable.
# Numeric columns to include in the box plots
numeric_columns <- c("Property.Size", "Completion.Year", "Total.Units", "Parking.Lot", "price", "No.of.Floors")
# Draw a separate box plot for each numeric variable
par(mfrow = c(3, 2)) # Set the layout to 3 rows and 2 columns for the plots
for (col in numeric_columns) {
boxplot(data[, col], col = "skyblue", main = paste("Box Plot for", col),horizontal = TRUE)
}
# Reset the layout to default
par(mfrow = c(1, 1))
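By default, a box plot draws as individual points any values lying more than 1.5 times the interquartile range beyond the hinges. The sketch below lists such points numerically with `boxplot.stats()`. The vector is hypothetical: it borrows a few values from the Property.Size summary above (including the maximum of 122774) purely for illustration.

```r
# Illustrative values only, not the full Property.Size column
size <- c(750, 900, 980, 1000, 1100, 1116, 122774)

stats <- boxplot.stats(size)  # default coef = 1.5, as in boxplot()

# Hinges and whisker ends: lower whisker, lower hinge, median, upper hinge, upper whisker
print(stats$stats)

# Values the box plot would draw as individual outlier points
print(stats$out)
```

Listing the flagged values alongside the plot makes it easier to decide whether an extreme entry is a plausible listing or a data-entry error.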
#create histogram of values for Property.Size
ggplot(data=data, aes(x=Property.Size)) +
geom_histogram(fill="steelblue", color="black") +
ggtitle("Histogram of Property Size")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#create histogram of values for Completion.Year
ggplot(data=data, aes(x=Completion.Year)) +
geom_histogram(fill="steelblue", color="black") +
ggtitle("Histogram of Property's Completion Year")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 1829 rows containing non-finite values (`stat_bin()`).
#create histogram of values for Total.Units
ggplot(data=data, aes(x=Total.Units)) +
geom_histogram(fill="steelblue", color="black") +
ggtitle("Histogram of Property's Total Units")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 1721 rows containing non-finite values (`stat_bin()`).
#create histogram of values for Parking.Lot
ggplot(data=data, aes(x=Parking.Lot)) +
geom_histogram(fill="steelblue", color="black") +
ggtitle("Histogram of Property's Total Parking Lot")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#create histogram of values for price
ggplot(data=data, aes(x=price)) +
geom_histogram(fill="steelblue", color="black") +
ggtitle("Histogram of Property's Price Values")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#create histogram of values for No.of.Floors
ggplot(data=data, aes(x=No.of.Floors)) +
geom_histogram(fill="steelblue", color="black") +
ggtitle("Histogram of Property's No of Floors")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 1570 rows containing non-finite values (`stat_bin()`).
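The six histograms share one template, and each triggers the `stat_bin()` message because no bin count is given. A possible refactoring is sketched below; `plot_hist` is a hypothetical helper name and the toy prices are invented. Passing `bins` explicitly avoids the message.

```r
library(ggplot2)

# Hypothetical helper: histogram of one column, with an explicit number of bins
plot_hist <- function(df, column, bins = 30) {
  ggplot(df, aes(x = .data[[column]])) +
    geom_histogram(bins = bins, fill = "steelblue", color = "black") +
    ggtitle(paste("Histogram of", column))
}

# Invented values for illustration
toy <- data.frame(price = c(250000, 300000, 350000, 420000, 490000, 6016000))

p <- plot_hist(toy, "price", bins = 10)
print(p)
```

With the report's data frame, a call such as `plot_hist(data, "price", bins = 50)` would follow the same pattern; the best bin count depends on the variable and is left as a judgment call.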
# Top 10 categories for building name
top_10_bncounts <- head(sort(table(data$Building.Name), decreasing = TRUE), 10)
# Labels for the top 10 counts
top_10_bnlabels <- names(top_10_bncounts)
# Bar plot for the top 10 building name's categories with labels
barplot(top_10_bncounts, col = "skyblue",
main = "Bar Plot of Building.Name (Top 10)", ylab = "Count",
names.arg = top_10_bnlabels, las = 2)
# Top 10 categories for developer
top_10_dcounts <- head(sort(table(data$Developer), decreasing = TRUE), 10)
# Labels for the top 10 counts
top_10_dlabels <- names(top_10_dcounts)
# Bar plot for the top 10 developer categories with labels
barplot(top_10_dcounts, col = "skyblue",
main = "Bar Plot of Developer (Top 10)", ylab = "Count",
names.arg = top_10_dlabels, las = 2)
# Bar plot based on the number of bedrooms
barplot(table(data$Bedroom), col = "skyblue", main = "Bar Plot of Bedroom", xlab = "No of Bedroom", ylab = "Count")
# Bar plot based on the number of Bathroom
barplot(table(data$Bathroom), col = "skyblue", main = "Bar Plot of Bathroom", xlab = "No of Bathroom", ylab = "Count")
# Bar plot based on the number of Tenure.Type
ggplot(data, aes(x = Tenure.Type, fill = Tenure.Type)) + geom_bar(color = "black") + ggtitle("Bar Plot of Tenure Type") + xlab("Tenure Type") + ylab("Count") + theme_minimal()
# Bar plot based on the number of Property.Type
ggplot(data, aes(x = Property.Type, fill = Property.Type)) + geom_bar(color = "black") + ggtitle("Bar Plot of Property Type") + xlab("Property Type") + ylab("Count") + theme_minimal()
# Bar plot based on the number of Floor.Range
ggplot(data, aes(x = Floor.Range, fill = Floor.Range)) + geom_bar(color = "black") + ggtitle("Bar Plot of Floor Range") + xlab("Floor Range") + ylab("Count") + theme_minimal()
# bar plot based on the number of State
ggplot(data, aes(x = State, fill = State)) + geom_bar(color = "black") + ggtitle("Bar Plot of State") + xlab("State") + ylab("Count") + theme_minimal()+theme(axis.text.x = element_text(angle = 45, hjust = 1))
# Top 10 city categories
top_10_ccounts <- head(sort(table(data$City), decreasing = TRUE), 10)
top_10_clabels <- names(top_10_ccounts)
top_10_cdata <- data.frame(City = names(top_10_ccounts), Freq = as.vector(top_10_ccounts))
# Reorder the levels of City based on the frequency count
top_10_cdata$City <- factor(top_10_cdata$City, levels = top_10_cdata$City[order(top_10_cdata$Freq)])
ggplot(top_10_cdata, aes(x = City, y = Freq, fill = City)) +
geom_bar(stat = "identity", color = "black") +
ggtitle("Bar Plot of City (Top 10)") +
xlab("City") +
ylab("Count") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
# List of variables to create plots for
variables <- c('garden', 'surau', 'basketball', 'badminton', 'gymnasium',
'lift', 'playground', 'sauna', 'security', 'barbequearea',
'clubhouse', 'joggingtrack')
# List to store individual ggplot objects
plots <- list()
#Loop through each variable and create a plot
for (variable in variables) {
plot <- ggplot(data, aes(x = factor(.data[[variable]]), fill = factor(.data[[variable]]))) +
geom_bar() +
scale_fill_manual(values = c("red", "green"), labels = c("No", "Yes")) +
scale_x_discrete(labels = c("No", "Yes")) +
labs(x = variable, y = "Count") +
theme_minimal() +
theme(legend.position = "none") # Remove the legend
# Add the plot to the list
plots[[variable]] <- plot
}
# Arrange and print the plots
library(gridExtra)
## Warning: package 'gridExtra' was built under R version 4.3.2
##
## Attaching package: 'gridExtra'
##
## The following object is masked from 'package:dplyr':
##
## combine
grid.arrange(grobs = plots, ncol = 3)
# List of variables to create plots for
variables1<- c('multipurposehall', 'squashcourt',
'swimmingpool', 'tennis', 'hospital', 'school', 'minimart',
'mall', 'parking', 'busstop', 'railway', 'highway')
# Create a list to store individual ggplot objects
plots <- list()
#Loop through each variable and create a plot
for (variable in variables1) {
plot <- ggplot(data, aes(x = factor(.data[[variable]]), fill = factor(.data[[variable]]))) +
geom_bar() +
scale_fill_manual(values = c("red", "green"), labels = c("No", "Yes")) +
scale_x_discrete(labels = c("No", "Yes")) +
labs(x = variable, y = "Count") +
theme_minimal() +
theme(legend.position = "none") # Remove the legend
# Add the plot to the list
plots[[variable]] <- plot
}
# Arrange and print the plots
library(gridExtra)
grid.arrange(grobs = plots, ncol = 3)
# Scatterplot of Property.Size vs. price, colored by Property.Type
ggplot(data=data, aes(y=Property.Size, x=price, color=Property.Type)) +
geom_point()
# Scatterplot of Completion.Year vs. price, colored by State
ggplot(data=data, aes(y=Completion.Year, x=price, color=State)) +
geom_point()
## Warning: Removed 1829 rows containing missing values (`geom_point()`).
# Scatterplot of Bedroom vs. price, colored by Property.Type
ggplot(data=data, aes(y=Bedroom , x=price, color=Property.Type)) +
geom_point()
# Scatterplot of Bathroom vs. price, colored by Property.Type
ggplot(data=data, aes(y=Bathroom , x=price, color=Property.Type)) +
geom_point()
# Scatterplot of Parking.Lot vs. price, colored by Property.Type
ggplot(data=data, aes(y=Parking.Lot , x=price, color=Property.Type)) +
geom_point()
# Scatterplot of Total.Units vs. price, colored by Floor.Range
ggplot(data=data, aes(y=Total.Units , x=price, color=Floor.Range)) +
geom_point()
## Warning: Removed 1721 rows containing missing values (`geom_point()`).
# Scatterplot of No.of.Floors vs. price, colored by Floor.Range
ggplot(data=data, aes(y=No.of.Floors , x=price, color=Floor.Range)) +
geom_point()
## Warning: Removed 1570 rows containing missing values (`geom_point()`).
correlation_test <- cor.test(data$Property.Size, data$price)
print(correlation_test)
##
## Pearson's product-moment correlation
##
## data: data$Property.Size and data$price
## t = 11.368, df = 3728, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1518334 0.2138725
## sample estimates:
## cor
## 0.1830352
#There is a weak positive correlation (r = 0.18) between property size and price: larger properties tend to be priced slightly higher.
correlation_test <- cor.test(data$Total.Units, data$price)
print(correlation_test)
##
## Pearson's product-moment correlation
##
## data: data$Total.Units and data$price
## t = -5.3906, df = 2007, p-value = 7.847e-08
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.16235060 -0.07613132
## sample estimates:
## cor
## -0.1194662
#There is a weak negative correlation (r = -0.12) between the total number of units and the price: developments with more units tend to be priced slightly lower.
correlation_test <- cor.test(data$No.of.Floors, data$price)
print(correlation_test)
##
## Pearson's product-moment correlation
##
## data: data$No.of.Floors and data$price
## t = 4.8263, df = 2158, p-value = 1.489e-06
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.0614283 0.1448811
## sample estimates:
## cor
## 0.1033366
#There is a weak positive correlation (r = 0.10) between the number of floors and the price.
correlation_test <- cor.test(data$Bathroom, data$price)
print(correlation_test)
##
## Pearson's product-moment correlation
##
## data: data$Bathroom and data$price
## t = 43.172, df = 3728, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.5555288 0.5983366
## sample estimates:
## cor
## 0.5773293
#There is a moderate-to-strong positive correlation (r = 0.58) between the number of bathrooms and the price. This is the strongest of the correlations tested here.
correlation_test <- cor.test(data$Bedroom, data$price)
print(correlation_test)
##
## Pearson's product-moment correlation
##
## data: data$Bedroom and data$price
## t = 20.499, df = 3728, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2891398 0.3468308
## sample estimates:
## cor
## 0.3182799
#There is a moderate positive correlation (r = 0.32) between the number of bedrooms and the price. As the number of bedrooms increases, the price tends to increase.
correlation_test <- cor.test(data$Parking.Lot, data$price)
print(correlation_test)
##
## Pearson's product-moment correlation
##
## data: data$Parking.Lot and data$price
## t = 29.774, df = 3728, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.4120109 0.4638772
## sample estimates:
## cor
## 0.4383088
#There is a moderate positive correlation (r = 0.44) between the number of parking lots and the price. Properties with more parking lots tend to have higher prices.
correlation_test <- cor.test(data$Completion.Year, data$price)
print(correlation_test)
##
## Pearson's product-moment correlation
##
## data: data$Completion.Year and data$price
## t = 9.8411, df = 1899, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1770773 0.2626386
## sample estimates:
## cor
## 0.2202816
#There is a weak positive correlation (r = 0.22) between the completion year and the price. More recently completed properties tend to have somewhat higher prices.
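The seven correlation tests above repeat the same call with a different column each time. As a minimal sketch, assuming a toy data frame in place of the real listings (the simulated values and the `cor_table` name are illustrative only), the tests can be run in a loop and collected into one table so that the estimates, confidence intervals, and p-values are easy to compare side by side:

```r
# Toy data standing in for the listings; column names mirror the report.
set.seed(1)
toy <- data.frame(
  Property.Size = runif(200, 300, 1700),
  Bathroom      = sample(1:4, 200, replace = TRUE)
)
toy$price <- 300 * toy$Property.Size + 40000 * toy$Bathroom + rnorm(200, sd = 50000)

# Run cor.test() for each predictor against price and keep the key numbers.
predictors <- c("Property.Size", "Bathroom")
cor_table <- do.call(rbind, lapply(predictors, function(v) {
  ct <- cor.test(toy[[v]], toy$price)
  data.frame(
    variable = v,
    r        = unname(ct$estimate),
    ci_low   = ct$conf.int[1],
    ci_high  = ct$conf.int[2],
    p_value  = ct$p.value
  )
}))
print(cor_table)
```

With a sample of several thousand rows, even a weak correlation produces a very small p-value, so the size of `r` is usually more informative than the p-value alone.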
#Use ANOVA to compare the mean of a numeric variable across the levels of a categorical variable. Use the chi-square test to check for association between two categorical variables.
# Tenure.Type is categorical, but price is numeric, so chisq.test() treats every distinct price as its own category (hence df = 554 and the approximation warning below); interpret this result with caution.
chi_sq_result <- chisq.test(data$Tenure.Type, data$price)
## Warning in chisq.test(data$Tenure.Type, data$price): Chi-squared approximation
## may be incorrect
print(chi_sq_result)
##
## Pearson's Chi-squared test
##
## data: data$Tenure.Type and data$price
## X-squared = 739.94, df = 554, p-value = 1.885e-07
# Check if the p-value is less than 0.05
if (chi_sq_result$p.value <= 0.05) {
cat("The association is statistically significant.\n")
} else {
cat("The association is not statistically significant.\n")
}
## The association is statistically significant.
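`chisq.test()` is designed for two categorical variables. In the test above, the reported df = 554 corresponds to (2 - 1) x (555 - 1), meaning 555 distinct prices were treated as separate categories. The sketch below shows two alternatives on simulated data (the values and the `price_band` column are assumptions for illustration, not results from the report): a Welch t-test that compares mean price between tenure types, and a chi-square test after binning price into quartiles.

```r
set.seed(2)
toy <- data.frame(
  Tenure.Type = factor(rep(c("Freehold", "Leasehold"), each = 100)),
  price       = c(rnorm(100, 420000, 90000), rnorm(100, 340000, 90000))
)

# Option 1: compare mean price between the two tenure types (Welch t-test).
t_res <- t.test(price ~ Tenure.Type, data = toy)

# Option 2: bin price into quartiles so that both variables are categorical.
toy$price_band <- cut(toy$price,
                      breaks = quantile(toy$price, probs = seq(0, 1, 0.25)),
                      include.lowest = TRUE)
chi_res <- chisq.test(table(toy$Tenure.Type, toy$price_band))

print(t_res$p.value)
print(chi_res$parameter)  # degrees of freedom: (2 - 1) * (4 - 1) = 3
```

Either option keeps the expected cell counts large enough for the test's approximation to hold, which is what the warning in the original output is about.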
#ANOVA is used when the categorical variable has more than two levels.
# Perform ANOVA for Developer and Price
anova_result <- aov(price ~ Developer, data = data)
# Print the result
print(summary(anova_result))
## Df Sum Sq Mean Sq F value Pr(>F)
## Developer 579 1.676e+14 2.895e+11 3.322 <2e-16 ***
## Residuals 1585 1.381e+14 8.715e+10
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 1565 observations deleted due to missingness
# Check if the p-value is less than 0.05
if (summary(anova_result)[[1]]["Developer", "Pr(>F)"] <= 0.05) {
cat("The means are significantly different.\n")
} else {
cat("There is no significant difference in means.\n")
}
## The means are significantly different.
# Perform ANOVA for Floor.Range and Price
anova_result <- aov(price ~ Floor.Range, data = data)
# Print the result
print(summary(anova_result))
## Df Sum Sq Mean Sq F value Pr(>F)
## Floor.Range 2 4.817e+12 2.409e+12 22.05 3.15e-10 ***
## Residuals 2739 2.991e+14 1.092e+11
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 988 observations deleted due to missingness
# Check if the p-value is less than 0.05
if (summary(anova_result)[[1]]["Floor.Range", "Pr(>F)"] <= 0.05) {
cat("The means are significantly different.\n")
} else {
cat("There is no significant difference in means.\n")
}
## The means are significantly different.
# Perform ANOVA for Property.Type and Price
anova_result <- aov(price ~ Property.Type, data = data)
# Print the result
print(summary(anova_result))
## Df Sum Sq Mean Sq F value Pr(>F)
## Property.Type 7 8.376e+13 1.197e+13 140.7 <2e-16 ***
## Residuals 3722 3.166e+14 8.507e+10
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Check if the p-value is less than 0.05
if (summary(anova_result)[[1]]["Property.Type", "Pr(>F)"] <= 0.05) {
cat("The means are significantly different.\n")
} else {
cat("There is no significant difference in means.\n")
}
## The means are significantly different.
# Perform ANOVA for Land.Title and Price
anova_result <- aov(price ~ Land.Title, data = data)
# Print the result
print(summary(anova_result))
## Df Sum Sq Mean Sq F value Pr(>F)
## Land.Title 2 1.235e+13 6.176e+12 59.32 <2e-16 ***
## Residuals 3727 3.880e+14 1.041e+11
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Check if the p-value is less than 0.05
if (summary(anova_result)[[1]]["Land.Title", "Pr(>F)"] <= 0.05) {
cat("The means are significantly different.\n")
} else {
cat("There is no significant difference in means.\n")
}
## The means are significantly different.
# Perform ANOVA for State and Price
anova_result <- aov(price ~ State, data = data)
# Print the result
print(summary(anova_result))
## Df Sum Sq Mean Sq F value Pr(>F)
## State 15 4.891e+13 3.261e+12 34.45 <2e-16 ***
## Residuals 3714 3.515e+14 9.463e+10
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Check if the p-value is less than 0.05
if (summary(anova_result)[[1]]["State", "Pr(>F)"] <= 0.05) {
cat("The means are significantly different.\n")
} else {
cat("There is no significant difference in means.\n")
}
## The means are significantly different.
# Perform ANOVA for City and Price
anova_result <- aov(price ~ City, data = data)
# Print the result
print(summary(anova_result))
## Df Sum Sq Mean Sq F value Pr(>F)
## City 187 1.089e+14 5.822e+11 7.075 <2e-16 ***
## Residuals 3542 2.915e+14 8.230e+10
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Check if the p-value is less than 0.05
if (summary(anova_result)[[1]]["City", "Pr(>F)"] <= 0.05) {
cat("The means are significantly different.\n")
} else {
cat("There is no significant difference in means.\n")
}
## The means are significantly different.
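All six ANOVA tests report a significant difference, which is expected with this many rows. An effect size helps to tell the factors apart. From the tables above, eta squared (between-group sum of squares divided by the total) is about 0.016 for Floor.Range and about 0.21 for Property.Type. The sketch below, on simulated data with illustrative column values, loops over several categorical variables and reports F, p, and eta squared together:

```r
set.seed(3)
toy <- data.frame(
  Floor.Range = factor(sample(c("Low", "Medium", "High"), 300, replace = TRUE)),
  Land.Title  = factor(sample(c("Bumi Lot", "Non Bumi Lot"), 300, replace = TRUE))
)
toy$price <- 350000 + 40000 * (toy$Floor.Range == "High") + rnorm(300, sd = 80000)

anova_table <- do.call(rbind, lapply(c("Floor.Range", "Land.Title"), function(v) {
  fit <- aov(reformulate(v, response = "price"), data = toy)
  tab <- summary(fit)[[1]]          # row 1 = the factor, row 2 = residuals
  data.frame(
    variable = v,
    F_value  = tab[1, "F value"],
    p_value  = tab[1, "Pr(>F)"],
    eta_sq   = tab[1, "Sum Sq"] / sum(tab[, "Sum Sq"])
  )
}))
print(anova_table)
```

Indexing the summary table by row position avoids relying on how the row names are padded.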
Data pre-processing and feature engineering are crucial steps in the data analysis and machine learning pipeline. They involve preparing and transforming the raw data to make it suitable for analysis or model training.
#Outlier Removal
#Using Tukey's method for identifying outliers in Property Size.
# Set a threshold for outliers (e.g., values outside the range of Q1 - 1.5*IQR to Q3 + 1.5*IQR)
Q1 <- quantile(data$Property.Size, 0.25)
Q3 <- quantile(data$Property.Size, 0.75)
iqr_size <- Q3 - Q1 # avoids reusing the name of stats::IQR()
# Define the lower and upper bounds for outliers
lower_bound <- Q1 - 1.5 * iqr_size
upper_bound <- Q3 + 1.5 * iqr_size
# Remove outliers
data <- data[data$Property.Size >= lower_bound & data$Property.Size <= upper_bound, ]
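The Tukey rule used above can be wrapped in a small helper so that the same bounds can be computed for any numeric column. This is a minimal sketch; the function name `tukey_bounds` and the sample vector are hypothetical and not part of the original analysis.

```r
# Lower and upper Tukey fences: Q1 - k * IQR and Q3 + k * IQR.
tukey_bounds <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  c(lower = q[1] - k * iqr, upper = q[2] + k * iqr)
}

sizes <- c(700, 750, 800, 850, 900, 950, 1000, 5000)
b <- tukey_bounds(sizes)
kept <- sizes[sizes >= b["lower"] & sizes <= b["upper"]]
print(b)     # lower = 525, upper = 1225
print(kept)  # the value 5000 is dropped
```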
#Outlier Adjustment
Adata <- data[complete.cases(data$No.of.Floors), ]
Bdata <- data[complete.cases(data$Total.Units), ]
Cdata <- data[complete.cases(data$Parking.Lot), ]
Ddata <- data[complete.cases(data$Bathroom), ]
# Winsorization function
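winsorize <- function(x, trim = 0.05) {
  # Cap values below the lower and above the upper quantile (5% and 95% by default)
  q <- quantile(x, c(trim, 1 - trim), na.rm = TRUE)
  x[x < q[1]] <- q[1]
  x[x > q[2]] <- q[2]
  return(x)
}
# Median of the winsorized values for each of the four variables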
Adjusted_medianNF <- median(winsorize(Adata$No.of.Floors))
Adjusted_medianTU <- median(winsorize(Bdata$Total.Units))
Adjusted_medianPL <- median(winsorize(Cdata$Parking.Lot))
Adjusted_medianb <- median(winsorize(Ddata$Bathroom))
# Impute outliers with the median
data$No.of.Floors[which(data$No.of.Floors >120)]<-Adjusted_medianNF
data$Total.Units[which(data$Total.Units >4000)] <-Adjusted_medianTU
data$Parking.Lot[which(data$Parking.Lot >6)] <-Adjusted_medianPL
data$Bathroom[which(data$Bathroom >6)] <-Adjusted_medianb
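To make the effect of winsorization concrete, the sketch below applies the same function to a small made-up vector (the `floors` values are illustrative). It caps the extreme value at the 95th percentile, and it also shows that the median is unchanged, because winsorizing at 5% only alters the tails. Under that reading, the adjusted medians used above should equal the ordinary medians of the non-missing values.

```r
winsorize <- function(x, trim = 0.05) {
  q <- quantile(x, c(trim, 1 - trim), na.rm = TRUE)
  x[x < q[1]] <- q[1]
  x[x > q[2]] <- q[2]
  x
}

floors <- c(2, 5, 8, 12, 16, 20, 24, 30, 45, 900)  # 900 mimics a data-entry error
w <- winsorize(floors)

print(max(w))          # capped at the 95th percentile instead of 900
print(median(floors))  # 18
print(median(w))       # 18, the same as before winsorizing
```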
##Imputation
# Load the VIM package for k-nearest neighbors imputation
library(VIM)
## Warning: package 'VIM' was built under R version 4.3.2
## Loading required package: colorspace
## Warning: package 'colorspace' was built under R version 4.3.2
## Loading required package: grid
## VIM is ready to use.
## Suggestions and bug-reports can be submitted at: https://github.com/statistikat/VIM/issues
##
## Attaching package: 'VIM'
## The following object is masked from 'package:datasets':
##
## sleep
# Perform k-nearest neighbors imputation, one variable per call
imputed_data <- kNN(data , variable = "Completion.Year")
imputed_data1 <- kNN(data , variable = "Floor.Range")
imputed_data2 <-kNN(data,variable="No.of.Floors")
imputed_data3 <-kNN(data,variable="Total.Units")
imputed_data4 <-kNN(data,variable="Developer")
# Copy the imputed columns back into the original dataset
data$Completion.Year <- imputed_data$Completion.Year
data$Floor.Range <- imputed_data1$Floor.Range
data$No.of.Floors <- imputed_data2$No.of.Floors
data$Total.Units <- imputed_data3$Total.Units
data$Developer <- imputed_data4$Developer
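The five separate calls above can also be written as a single call, since the `variable` argument of `VIM::kNN()` accepts a vector of column names. The sketch below uses a toy data frame (values are simulated, and `k = 5` is written out only to make the default explicit). The result is not guaranteed to be identical to running the calls one at a time.

```r
library(VIM)

set.seed(4)
toy <- data.frame(
  Property.Size   = runif(50, 400, 1500),
  Completion.Year = sample(1990:2023, 50, replace = TRUE),
  Total.Units     = sample(100:900, 50, replace = TRUE)
)
toy$Completion.Year[c(3, 10)] <- NA
toy$Total.Units[c(5, 20)]     <- NA

# One call imputes both columns; imp_var = FALSE drops the *_imp flag columns.
imputed <- kNN(toy, variable = c("Completion.Year", "Total.Units"),
               k = 5, imp_var = FALSE)
print(colSums(is.na(imputed)))
```

Setting `imp_var = FALSE` returns a data frame with the same columns as the input, so it can replace the original directly.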
#Feature engineering
# Keep only properties with a completion year before 2024
data <- subset(data, Completion.Year < 2024)
# Property age = current year minus completion year
data$Property.Age <- as.numeric(format(Sys.Date(), "%Y")) - data$Completion.Year
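Because the age is derived from `Sys.Date()`, re-running the report in a later year shifts every age by one. A minimal sketch with a fixed reference year keeps the feature reproducible. The value 2024 is an assumption, chosen because it is consistent with the summary reported later (ages 1 to 39 for completion years 2023 to 1985).

```r
reference_year <- 2024  # assumed year of analysis; not stated explicitly in the report
completion_year <- c(1985, 2014, 2023)
property_age <- reference_year - completion_year
print(property_age)  # 39 10 1
```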
#Check for missing values in each column
sapply(data, function(x) sum(is.na(x)))
## Bedroom Bathroom Property.Size Building.Name
## 0 0 0 0
## Developer Tenure.Type Completion.Year Total.Units
## 0 0 0 0
## Property.Type Parking.Lot Floor.Range Land.Title
## 0 0 0 0
## price garden surau hospital
## 0 0 0 0
## basketball badminton gymnasium lift
## 0 0 0 0
## minimart parking playground sauna
## 0 0 0 0
## security busstop school mall
## 0 0 0 0
## railway highway barbequearea clubhouse
## 0 0 0 0
## joggingtrack multipurposehall squashcourt swimmingpool
## 0 0 0 0
## tennis State City No.of.Floors
## 0 0 0 0
## Property.Age
## 0
#Count the total number of facilities provided for each property.
data$Total_Facilities <- rowSums(data[, c('garden', 'surau', 'basketball', 'badminton', 'gymnasium', 'lift', 'playground', 'sauna', 'security', 'barbequearea', 'clubhouse', 'joggingtrack', 'multipurposehall', 'squashcourt', 'swimmingpool', 'tennis')])
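#Count the number of surrounding amenities near each property.
data$Total_surroundingamenities <- rowSums(data[, c('hospital','school','minimart','mall')])
#Count the number of transportation advantages for each property.
#Note: this assignment reuses the name Total_surroundingamenities, so it overwrites the amenity count above; the column used in the rest of the analysis therefore holds the transportation count.
data$Total_surroundingamenities <- rowSums(data[, c('parking','busstop','railway','highway')])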
# Drop the individual indicator columns now that the totals have been computed
drop_cols <- c(
  'garden', 'surau', 'basketball', 'badminton', 'gymnasium', 'lift', 'playground', 'sauna', 'security',
  'barbequearea', 'clubhouse', 'joggingtrack', 'multipurposehall', 'squashcourt', 'swimmingpool', 'tennis',
  'hospital', 'school', 'minimart', 'mall', 'parking', 'busstop', 'railway', 'highway'
)
data <- data[, !(names(data) %in% drop_cols)]
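The three counts described in the comments (facilities, surrounding amenities, transportation) can each be stored under their own name. The sketch below uses a tiny made-up data frame with only two indicator columns per group, and `Total_transportation` is a hypothetical column name that does not appear in the original analysis.

```r
toy <- data.frame(
  garden = c(1, 0), lift = c(1, 1),      # facilities
  hospital = c(0, 1), mall = c(1, 1),    # surrounding amenities
  busstop = c(1, 0), highway = c(0, 0)   # transportation
)
facility_cols  <- c("garden", "lift")
amenity_cols   <- c("hospital", "mall")
transport_cols <- c("busstop", "highway")

toy$Total_Facilities           <- rowSums(toy[, facility_cols])
toy$Total_surroundingamenities <- rowSums(toy[, amenity_cols])
toy$Total_transportation       <- rowSums(toy[, transport_cols])  # hypothetical name

# Drop the indicator columns once the totals exist.
toy <- toy[, !(names(toy) %in% c(facility_cols, amenity_cols, transport_cols))]
print(toy)
```

Keeping the column lists in named vectors means the same vectors drive both the sums and the later drop step, so the two cannot drift apart.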
summary(data)
## Bedroom Bathroom Property.Size Building.Name
## Min. :1.00 Min. :1.000 Min. : 280.0 Length:3510
## 1st Qu.:3.00 1st Qu.:2.000 1st Qu.: 736.0 Class :character
## Median :3.00 Median :2.000 Median : 887.0 Mode :character
## Mean :2.86 Mean :1.943 Mean : 928.3
## 3rd Qu.:3.00 3rd Qu.:2.000 3rd Qu.:1091.0
## Max. :5.00 Max. :5.000 Max. :1661.0
##
## Developer Tenure.Type Completion.Year
## Tasek Maju Realty Sdn Bhd : 101 Freehold :2118 Min. :1985
## Ideal Property Group : 98 Leasehold:1392 1st Qu.:2005
## Syarikat Perumahan Negara Berhad: 90 Median :2014
## Belleview Group : 85 Mean :2011
## Hunza Properties Berhad : 73 3rd Qu.:2016
## Asia Green Group : 59 Max. :2023
## (Other) :3004
## Total.Units Property.Type Parking.Lot Floor.Range
## Min. : 1.0 Condominium :1437 Min. :0.0000 High : 939
## 1st Qu.: 280.0 Apartment :1366 1st Qu.:0.0000 Low : 800
## Median : 420.0 Service Residence: 453 Median :1.0000 Medium:1771
## Mean : 518.5 Flat : 230 Mean :0.9823
## 3rd Qu.: 670.0 Studio : 13 3rd Qu.:2.0000
## Max. :3600.0 Others : 7 Max. :5.0000
## (Other) : 4
## Land.Title price State City
## Bumi Lot : 576 Min. : 38000 Selangor :1184 Johor Bahru: 214
## Malay Reserved: 4 1st Qu.: 248000 Penang : 890 Cheras : 190
## Non Bumi Lot :2930 Median : 344000 Kuala Lumpur: 628 Ayer Itam : 180
## Mean : 382530 Johor : 363 Jelutong : 163
## 3rd Qu.: 470000 Sabah : 143 Shah Alam : 161
## Max. :2300000 Sarawak : 109 Kajang : 117
## (Other) : 193 (Other) :2485
## No.of.Floors Property.Age Total_Facilities Total_surroundingamenities
## Min. : 2.00 Min. : 1.0 Min. : 0.000 Min. :0.000
## 1st Qu.: 5.25 1st Qu.: 8.0 1st Qu.: 2.000 1st Qu.:1.000
## Median :16.00 Median :10.0 Median : 5.000 Median :1.000
## Mean :17.39 Mean :13.3 Mean : 5.202 Mean :1.281
## 3rd Qu.:24.00 3rd Qu.:19.0 3rd Qu.: 8.000 3rd Qu.:2.000
## Max. :63.00 Max. :39.0 Max. :15.000 Max. :4.000
##
Explore patterns within subgroups of the data.
# list data types for each features
sapply(data,class)
## Bedroom Bathroom
## "integer" "numeric"
## Property.Size Building.Name
## "numeric" "character"
## Developer Tenure.Type
## "factor" "factor"
## Completion.Year Total.Units
## "integer" "numeric"
## Property.Type Parking.Lot
## "factor" "numeric"
## Floor.Range Land.Title
## "factor" "factor"
## price State
## "numeric" "factor"
## City No.of.Floors
## "factor" "numeric"
## Property.Age Total_Facilities
## "numeric" "numeric"
## Total_surroundingamenities
## "numeric"
# Scatterplot matrix and correlation matrix of the numeric variables
numeric_data <- data[, sapply(data, is.numeric)]
# Pairwise scatterplots of all numeric variables
pairs(numeric_data)
sum(is.na(numeric_data))
## [1] 0
correlation_matrix <- cor(numeric_data)
print(correlation_matrix)
## Bedroom Bathroom Property.Size
## Bedroom 1.00000000 0.59292229 0.52388971
## Bathroom 0.59292229 1.00000000 0.60440645
## Property.Size 0.52388971 0.60440645 1.00000000
## Completion.Year -0.09555783 0.03538928 0.14232579
## Total.Units -0.07776785 -0.03417073 -0.06417774
## Parking.Lot 0.17974323 0.28891621 0.39726460
## price 0.13985341 0.38646225 0.62066965
## No.of.Floors -0.01478028 0.17618437 0.25103494
## Property.Age 0.09555783 -0.03538928 -0.14232579
## Total_Facilities 0.01521986 0.19884093 0.32962134
## Total_surroundingamenities 0.05811323 0.07991480 0.07324648
## Completion.Year Total.Units Parking.Lot price
## Bedroom -0.09555783 -0.07776785 0.17974323 0.13985341
## Bathroom 0.03538928 -0.03417073 0.28891621 0.38646225
## Property.Size 0.14232579 -0.06417774 0.39726460 0.62066965
## Completion.Year 1.00000000 0.21945101 0.20697150 0.27664688
## Total.Units 0.21945101 1.00000000 0.07962889 0.04601732
## Parking.Lot 0.20697150 0.07962889 1.00000000 0.45607481
## price 0.27664688 0.04601732 0.45607481 1.00000000
## No.of.Floors 0.30651056 0.37117103 0.36711262 0.48473039
## Property.Age -1.00000000 -0.21945101 -0.20697150 -0.27664688
## Total_Facilities 0.28646226 0.18208918 0.38858592 0.37049140
## Total_surroundingamenities -0.02971119 0.11298602 0.07970466 0.05032444
## No.of.Floors Property.Age Total_Facilities
## Bedroom -0.01478028 0.09555783 0.01521986
## Bathroom 0.17618437 -0.03538928 0.19884093
## Property.Size 0.25103494 -0.14232579 0.32962134
## Completion.Year 0.30651056 -1.00000000 0.28646226
## Total.Units 0.37117103 -0.21945101 0.18208918
## Parking.Lot 0.36711262 -0.20697150 0.38858592
## price 0.48473039 -0.27664688 0.37049140
## No.of.Floors 1.00000000 -0.30651056 0.47627068
## Property.Age -0.30651056 1.00000000 -0.28646226
## Total_Facilities 0.47627068 -0.28646226 1.00000000
## Total_surroundingamenities 0.11974253 0.02971119 0.33400489
## Total_surroundingamenities
## Bedroom 0.05811323
## Bathroom 0.07991480
## Property.Size 0.07324648
## Completion.Year -0.02971119
## Total.Units 0.11298602
## Parking.Lot 0.07970466
## price 0.05032444
## No.of.Floors 0.11974253
## Property.Age 0.02971119
## Total_Facilities 0.33400489
## Total_surroundingamenities 1.00000000
# Scatterplot of price against Property.Size, colored by Property.Type
ggplot(data=data, aes(y=price, x=Property.Size, color=Property.Type)) +
geom_point()
# Scatterplot of Completion.Year against price, colored by State
ggplot(data=data, aes(y=Completion.Year, x=price, color=State)) +
geom_point()
# Scatterplot of Bedroom against price, colored by Property.Type
ggplot(data=data, aes(y=Bedroom , x=price, color=Property.Type)) +
geom_point()
# Scatterplot of Bathroom against price, colored by Property.Type
ggplot(data=data, aes(y=Bathroom , x=price, color=Property.Type)) +
geom_point()
# Scatterplot of Parking.Lot against price, colored by Property.Type
ggplot(data=data, aes(y=Parking.Lot , x=price, color=Property.Type)) +
geom_point()
# Scatterplot of Total.Units against price, colored by Floor.Range
ggplot(data=data, aes(y=Total.Units , x=price, color=Floor.Range)) +
geom_point()
# Scatterplot of No.of.Floors against price, colored by Floor.Range
ggplot(data=data, aes(y=No.of.Floors , x=price, color=Floor.Range)) +
geom_point()
# Select numeric variables
numeric_data <- subset(data, select = c("Bedroom", "Bathroom", "Property.Size", "Completion.Year", "Total.Units", "price", "No.of.Floors", "Property.Age", "Total_Facilities", "Total_surroundingamenities"))
# Calculate the correlation matrix
cor_matrix <- cor(numeric_data)
# Convert the correlation matrix to long format
cor_long <- reshape2::melt(cor_matrix)
# Create a ggplot heatmap
ggplot(cor_long, aes(Var1, Var2, fill = value)) +
geom_tile() +
scale_fill_gradient(low = "white", high = 'red') +
labs(title = "Correlation Heatmap", x = "Variables", y = "Variables")+theme(axis.text.x = element_text(angle = 45, hjust = 1))
Positive Correlations:
The number of bedrooms and the number of bathrooms are positively correlated (0.59). Property size is positively correlated with both the number of bedrooms (0.52) and the number of bathrooms (0.60). Property size has the strongest correlation with price (0.62), followed by the number of floors (0.48) and the number of parking lots (0.46). The number of bathrooms and the price are also positively correlated (0.39). The total number of units and the number of floors show a positive correlation (0.37). The positive correlation between the total number of facilities and the price (0.37) implies that properties with more facilities tend to have higher prices.
Negative Correlations:
The completion year and property age have a perfect negative correlation (-1.00). This is expected by construction, because property age is computed as the current year minus the completion year. Price and Property.Age have a weak negative correlation (-0.28), which suggests that, on average, newer properties are priced higher than older ones.
Other Observations:
The completion year is positively correlated with the number of floors (0.31) and with the total number of units (0.22). Property age is negatively correlated with the total number of facilities (-0.29).
Moderate Correlations:
“Total_Facilities” has a moderate positive correlation with the number of floors (0.48), which suggests that taller buildings tend to offer more facilities. “Total_surroundingamenities” and “Total_Facilities” are also positively correlated, though more weakly (0.33).
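The observations above can be checked by ranking every variable by the absolute size of its correlation with price. The sketch below copies the price column of the printed correlation matrix, rounded to three decimals, into a named vector. In the analysis itself the same vector would come from `correlation_matrix["price", ]` with the price entry removed.

```r
# Correlations with price, copied (rounded) from the printed matrix.
price_cor <- c(
  Bedroom = 0.140, Bathroom = 0.386, Property.Size = 0.621,
  Completion.Year = 0.277, Total.Units = 0.046, Parking.Lot = 0.456,
  No.of.Floors = 0.485, Property.Age = -0.277, Total_Facilities = 0.370,
  Total_surroundingamenities = 0.050
)

# Rank by absolute value so that strong negative correlations are not missed.
ranked <- price_cor[order(abs(price_cor), decreasing = TRUE)]
print(ranked)
```

Property.Size, No.of.Floors, and Parking.Lot come out on top, while Total.Units and Total_surroundingamenities are close to zero.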
#bar plot for each categorical variable with mean price
ggplot(data, aes(x = Floor.Range, y = price, fill = Floor.Range)) +
stat_summary(fun = "mean", geom = "bar", position = "dodge") +
labs(title = "Comparison of Mean Price across Floor Ranges",
x = "Floor.Range",
y = "Mean Price")
ggplot(data, aes(x = Tenure.Type, y = price, fill = Tenure.Type)) +
stat_summary(fun = "mean", geom = "bar", position = "dodge") +
labs(title = "Comparison of Mean Price across Tenure Type",
x = " Tenure Type",
y = "Mean Price")
ggplot(data, aes(x = Property.Type, y = price, fill = Property.Type)) +
stat_summary(fun = "mean", geom = "bar", position = "dodge") +
labs(title = "Comparison of Mean Price across Property Type",
x = "Property Type",
y = "Mean Price")+theme(axis.text.x = element_text(angle = 45, hjust = 1))
ggplot(data, aes(x = Land.Title, y = price, fill = Land.Title)) +
stat_summary(fun = "mean", geom = "bar", position = "dodge") +
labs(title = "Comparison of Mean Price across Land Title",
x = "Land Title",
y = "Mean Price")
ggplot(data, aes(x = State, y = price, fill = Property.Type)) +
stat_summary(fun = "mean", geom = "bar", position = "dodge") +
labs(title = "Comparison of Mean Price across State",
x = "State",
y = "Mean Price")+theme(axis.text.x = element_text(angle = 45, hjust = 1))
ggplot(data, aes(x = Bedroom, y = price, fill = Property.Type)) +
stat_summary(fun = "mean", geom = "bar", position = "dodge") +
labs(title = "Comparison of Mean Price across Bedroom",
x = "Bedroom",
y = "Mean Price")
ggplot(data, aes(x = Bathroom, y = price, fill = Property.Type)) +
stat_summary(fun = "mean", geom = "bar", position = "dodge") +
labs(title = "Comparison of Mean Price across Bathroom",
x = "Bathroom",
y = "Mean Price")
ggplot(data, aes(x = Property.Type, y = Property.Size, fill = Property.Type)) +
stat_summary(fun = "mean", geom = "bar", position = "dodge") +
labs(title = "Comparison of Mean Property Size across Property Type",
x = "Property.Type",
y = "Property.Size")+theme(axis.text.x = element_text(angle = 45, hjust = 1))
The property price tends to increase with a higher floor range, suggesting a positive correlation between floor range and average property price.
Freehold properties are generally more expensive than leasehold properties.
Condominiums typically have the highest average prices, while flats tend to have lower prices among different property types.
Non-bumi lot properties are generally priced higher than bumi lot properties.
Penang stands out with the highest property prices, likely attributed to the greater availability of service residences and condominiums compared to other areas.
Service residences, among various property types, tend to have the highest number of bathrooms.
Townhouse condos are observed to have the highest prices compared to other property types.
Property size exhibits a notable correlation with property type, with flats commonly being smaller and condominiums or service residences having larger sizes and higher prices.
Property prices are influenced by various factors, including the number of floors, property size, the number of bathrooms, and the total number of facilities.
The heatmap reveals correlations indicating that property size, the number of bathrooms, and the number of bedrooms are interrelated. Additionally, property price is notably affected by the number of floors, property size, the number of bathrooms, and the number of facilities.
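The bar charts show group means only, so it helps to tabulate the mean alongside the median and the group size before drawing conclusions. The summary printed earlier shows that some property types have very few listings (for example 13 studios), and a mean based on a handful of listings is easily distorted. The sketch below uses simulated prices and illustrative group labels:

```r
set.seed(6)
toy <- data.frame(
  Property.Type = factor(rep(c("Condominium", "Apartment", "Flat"), each = 30)),
  price = c(rnorm(30, 480000, 60000),
            rnorm(30, 330000, 60000),
            rnorm(30, 180000, 40000))
)

# Mean, median, and count of price per property type.
summary_tab <- aggregate(price ~ Property.Type, data = toy,
                         FUN = function(p) c(mean = mean(p), median = median(p), n = length(p)))
summary_tab <- do.call(data.frame, summary_tab)  # flatten the matrix column
print(summary_tab[order(-summary_tab$price.mean), ])
```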
In this section, we prepare the dataset for classification. The columns are reordered so that the predictors come first and the target, Tenure.Type, comes last, and every column is then converted to a factor. Note that this conversion also turns numeric variables such as Property.Size and price into factors. Checking the resulting classes confirms the structure of the data that the models will receive.
# Reorder the columns by position so that Tenure.Type (the target) is last
data <- data[, c(1, 2, 3, 7, 8, 10, 16, 17, 18 ,19, 4, 5, 9, 11 ,12, 14, 15, 13, 6 )]
data <- as.data.frame(lapply(data, as.factor))
sapply(data,class)
## Bedroom Bathroom
## "factor" "factor"
## Property.Size Completion.Year
## "factor" "factor"
## Total.Units Parking.Lot
## "factor" "factor"
## No.of.Floors Property.Age
## "factor" "factor"
## Total_Facilities Total_surroundingamenities
## "factor" "factor"
## Building.Name Developer
## "factor" "factor"
## Property.Type Floor.Range
## "factor" "factor"
## Land.Title State
## "factor" "factor"
## City price
## "factor" "factor"
## Tenure.Type
## "factor"
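Converting every column to a factor gives continuous variables one level per distinct value, so a value that appears only in the test set has no matching level in the training set. An alternative, sketched below on a made-up data frame, converts only the text columns and leaves numeric columns as numbers. This illustrates a different preprocessing choice and is not the approach used in the report.

```r
toy <- data.frame(
  Property.Size = c(850, 1200, 700),
  price         = c(350000, 520000, 240000),
  State         = c("Selangor", "Penang", "Johor"),
  Tenure.Type   = c("Freehold", "Leasehold", "Freehold"),
  stringsAsFactors = FALSE
)

# Convert only character columns to factors; numeric columns stay numeric.
is_text <- sapply(toy, is.character)
toy[is_text] <- lapply(toy[is_text], factor)
print(sapply(toy, class))
```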
# Classification And Regression Training (caret)
library(caret)
# Classification and Visualisation (Naive Bayes)
library(klaR)
## Warning: package 'klaR' was built under R version 4.3.2
## Loading required package: MASS
##
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##
## select
# Classification and Regression with Random Forest
library(randomForest)
## Warning: package 'randomForest' was built under R version 4.3.2
## randomForest 4.7-1.1
## Type rfNews() to see new features/changes/bug fixes.
##
## Attaching package: 'randomForest'
## The following object is masked from 'package:gridExtra':
##
## combine
## The following object is masked from 'package:ggplot2':
##
## margin
## The following object is masked from 'package:dplyr':
##
## combine
# Classification functions for k-nearest neighbour
library(class)
## Warning: package 'class' was built under R version 4.3.2
# Machine Learning Benchmark Problems
library(mlbench)
## Warning: package 'mlbench' was built under R version 4.3.2
# Multivariate regression methods (PLSR)
library(pls)
## Warning: package 'pls' was built under R version 4.3.2
##
## Attaching package: 'pls'
## The following object is masked from 'package:caret':
##
## R2
## The following object is masked from 'package:stats':
##
## loadings
library(ggplot2)
# Stratified 80/20 split on Tenure.Type
trainIndex <- createDataPartition(data$Tenure.Type, p = 0.80, list = FALSE)
data_train <- data[trainIndex, ]
data_test <- data[-trainIndex, ]
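`createDataPartition()` samples at random within each class, so the split changes on every run unless a seed is set first. The sketch below, on simulated data, fixes a seed (the value 123 is arbitrary) and checks that the class proportions are preserved in both subsets:

```r
library(caret)

set.seed(123)  # arbitrary seed, chosen only for illustration
toy <- data.frame(
  Tenure.Type = factor(rep(c("Freehold", "Leasehold"), times = c(60, 40))),
  price       = runif(100, 100000, 900000)
)

train_index <- createDataPartition(toy$Tenure.Type, p = 0.80, list = FALSE)
toy_train <- toy[train_index, ]
toy_test  <- toy[-train_index, ]

# The Freehold/Leasehold mix should be close to 60/40 in both subsets.
print(prop.table(table(toy_train$Tenure.Type)))
print(prop.table(table(toy_test$Tenure.Type)))
```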
The dataset is split into training and testing sets. Post model fitting, we predict on the test data and evaluate the model’s performance using a confusion matrix. Visualization of the confusion matrix is also provided for better understanding of the model’s performance.
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 1
## (The same warning is repeated for observations 2 to 70; repeats omitted.)
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 71
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 72
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 73
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 74
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 75
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 76
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 77
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 78
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 79
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 80
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 81
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 82
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 83
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 84
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 85
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 86
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 87
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 88
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 89
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 90
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 91
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 92
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 93
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 94
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 95
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 96
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 97
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 98
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 99
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 100
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 101
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 102
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 103
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 104
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 105
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 106
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 107
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 108
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 109
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 110
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 111
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 112
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 113
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 114
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 115
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 116
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 117
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 118
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 119
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 120
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 121
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 122
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 123
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 124
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 125
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 126
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 127
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 128
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 129
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 130
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 131
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 132
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 133
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 134
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 135
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 136
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 137
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 138
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 139
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 140
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 141
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 142
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 143
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 144
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 145
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 146
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 147
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 148
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 149
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 150
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 151
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 152
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 153
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 154
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 155
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 156
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 157
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 158
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 159
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 160
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 161
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 162
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 163
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 164
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 165
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 166
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 167
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 168
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 169
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 170
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 171
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 172
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 173
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 174
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 175
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 176
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 177
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 178
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 179
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 180
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 181
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 182
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 183
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 184
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 185
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 186
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 187
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 188
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 189
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 190
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 191
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 192
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 193
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 194
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 195
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 196
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 197
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 198
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 199
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 200
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 201
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 202
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 203
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 204
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 205
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 206
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 207
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 208
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 209
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 210
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 211
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 212
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 213
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 214
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 215
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 216
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 217
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 218
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 219
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 220
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 221
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 222
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 223
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 224
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 225
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 226
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 227
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 228
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 229
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 230
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 231
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 232
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 233
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 234
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 235
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 236
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 237
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 238
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 239
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 240
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 241
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 242
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 243
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 244
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 245
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 246
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 247
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 248
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 249
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 250
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 251
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 252
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 253
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 254
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 255
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 256
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 257
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 258
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 259
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 260
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 261
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 262
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 263
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 264
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 265
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 266
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 267
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 268
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 269
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 270
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 271
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 272
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 273
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 274
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 275
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 276
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 277
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 278
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 279
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 280
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 281
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 282
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 283
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 284
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 285
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 286
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 287
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 288
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 289
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 290
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 291
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 292
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 293
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 294
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 295
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 296
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 297
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 298
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 299
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 300
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 301
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 302
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 303
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 304
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 305
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 306
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 307
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 308
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 309
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 310
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 311
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 312
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 313
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 314
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 315
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 316
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 317
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 318
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 319
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 320
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 321
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 322
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 323
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 324
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 325
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 326
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 327
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 328
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 329
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 330
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 331
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 332
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 333
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 334
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 335
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 336
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 337
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 338
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 339
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 340
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 341
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 342
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 343
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 344
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 345
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 346
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 347
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 348
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 349
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 350
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 351
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 352
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 353
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 354
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 355
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 356
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 357
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 358
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 359
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 360
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 361
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 362
## Warning in FUN(X[[i]], ...): Numerical 0 probability for all classes with
## observation 363
## Confusion Matrix and Statistics
##
##             Reference
## Prediction  Freehold Leasehold
##   Freehold       366        44
##   Leasehold       57       234
##
## Accuracy : 0.8559
## 95% CI : (0.8277, 0.8811)
## No Information Rate : 0.6034
## P-Value [Acc > NIR] : <2e-16
##
## Kappa : 0.7014
##
## Mcnemar's Test P-Value : 0.2325
##
## Sensitivity : 0.8652
## Specificity : 0.8417
## Pos Pred Value : 0.8927
## Neg Pred Value : 0.8041
## Prevalence : 0.6034
## Detection Rate : 0.5221
## Detection Prevalence : 0.5849
## Balanced Accuracy : 0.8535
##
## 'Positive' Class : Freehold
##
## Recall: 0.8652482
In this part, we fit a Support Vector Machine (SVM) with a radial kernel, again to classify ‘Tenure Type’. We train the model on the training set, predict on the test set, and assess its performance with a confusion matrix. We also report recall, the share of actual Freehold listings (the positive class) that the model labels correctly. As the output shows, this SVM predicts Freehold for every test observation, so its accuracy (0.6034) is no better than the no-information rate.
## Warning: package 'e1071' was built under R version 4.3.2
## Confusion Matrix and Statistics
##
##             Reference
## Prediction  Freehold Leasehold
##   Freehold       423       278
##   Leasehold        0         0
##
## Accuracy : 0.6034
## 95% CI : (0.5661, 0.6398)
## No Information Rate : 0.6034
## P-Value [Acc > NIR] : 0.5165
##
## Kappa : 0
##
## Mcnemar's Test P-Value : <2e-16
##
## Sensitivity : 1.0000
## Specificity : 0.0000
## Pos Pred Value : 0.6034
## Neg Pred Value : NaN
## Prevalence : 0.6034
## Detection Rate : 0.6034
## Detection Prevalence : 1.0000
## Balanced Accuracy : 0.5000
##
## 'Positive' Class : Freehold
##
## Recall: 0.8652482
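The Recall line printed above is identical to the Naive Bayes value, whereas the sensitivity in the SVM matrix itself is 1.0000. A minimal base-R sketch, using only the four counts from the SVM matrix, recomputes the rates directly; the helper name `class_rates` is hypothetical and not part of caret.

```r
# Hypothetical helper: derive rates from 2x2 confusion-matrix counts,
# treating Freehold as the positive class (as in the output above).
class_rates <- function(tp, fp, fn, tn) {
  sensitivity <- tp / (tp + fn)  # actual Freehold predicted as Freehold
  specificity <- tn / (tn + fp)  # actual Leasehold predicted as Leasehold
  c(accuracy = (tp + tn) / (tp + fp + fn + tn),
    sensitivity = sensitivity,
    specificity = specificity,
    balanced_accuracy = (sensitivity + specificity) / 2)
}

# Counts read from the SVM matrix: 423 Freehold and 278 Leasehold cases,
# all of them predicted as Freehold.
svm_rates <- class_rates(tp = 423, fp = 278, fn = 0, tn = 0)
print(round(svm_rates, 4))
```

A classifier that always predicts the majority class reaches the no-information rate by construction, which is why balanced accuracy (0.5 here) is the more telling figure.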
This section focuses on implementing a Decision Tree model for classification. The process involves training the model on the training dataset and then making predictions on the test set. The model’s effectiveness is evaluated using a confusion matrix, and its visualization is provided for an intuitive understanding of the model’s accuracy.
## Confusion Matrix and Statistics
##
## Reference
## Prediction Freehold Leasehold
## Freehold 393 33
## Leasehold 30 245
##
## Accuracy : 0.9101
## 95% CI : (0.8865, 0.9302)
## No Information Rate : 0.6034
## P-Value [Acc > NIR] : <2e-16
##
## Kappa : 0.8119
##
## Mcnemar's Test P-Value : 0.8011
##
## Sensitivity : 0.9291
## Specificity : 0.8813
## Pos Pred Value : 0.9225
## Neg Pred Value : 0.8909
## Prevalence : 0.6034
## Detection Rate : 0.5606
## Detection Prevalence : 0.6077
## Balanced Accuracy : 0.9052
##
## 'Positive' Class : Freehold
##
## Recall: 0.8652482
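The Kappa value reported for the Decision Tree can be checked by hand. The sketch below, written in base R under the assumption that rows are predictions and columns are reference labels (as printed above), computes Cohen's kappa from the four counts; the variable names are illustrative only.

```r
# Decision Tree confusion matrix: rows = prediction, columns = reference.
cm_counts <- matrix(c(393, 30, 33, 245), nrow = 2,
                    dimnames = list(Prediction = c("Freehold", "Leasehold"),
                                    Reference  = c("Freehold", "Leasehold")))

n <- sum(cm_counts)
# Observed agreement: share of cases on the diagonal (this is the accuracy).
observed <- sum(diag(cm_counts)) / n
# Agreement expected by chance, from the row and column totals.
expected <- sum(rowSums(cm_counts) * colSums(cm_counts)) / n^2
# Cohen's kappa: agreement beyond chance, rescaled to a maximum of 1.
cohen_kappa <- (observed - expected) / (1 - expected)

cat("Accuracy:", round(observed, 4), "\n")
cat("Kappa:", round(cohen_kappa, 4), "\n")
```

Applying the same arithmetic to the SVM matrix gives observed and expected agreement that are equal, hence a Kappa of 0.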
This part deals with predicting a continuous variable, ‘price’. A linear regression model is built on the training data, and predictions are made on the test dataset. The model’s accuracy is assessed through MAE (Mean Absolute Error), RMSE (Root Mean Squared Error), and R-squared. Scatter plots are also generated to visually compare actual vs. predicted values.
## Bedroom Bathroom
## "numeric" "numeric"
## Property.Size Completion.Year
## "numeric" "numeric"
## Total.Units Parking.Lot
## "numeric" "numeric"
## No.of.Floors Property.Age
## "numeric" "numeric"
## Total_Facilities Total_surroundingamenities
## "numeric" "numeric"
## Building.Name Developer
## "numeric" "numeric"
## Property.Type Floor.Range
## "numeric" "numeric"
## Land.Title State
## "numeric" "numeric"
## City Tenure.Type
## "numeric" "numeric"
## price
## "numeric"
##
## Call:
## lm(formula = price ~ Bedroom + Bathroom + Property.Size + Total.Units +
## Parking.Lot + No.of.Floors + Property.Age, data = data_train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -371.22 -44.79 -3.86 41.18 328.84
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 93.52484 9.33434 10.019 < 2e-16 ***
## Bedroom -39.14701 3.18450 -12.293 < 2e-16 ***
## Bathroom 18.14541 4.56543 3.975 7.26e-05 ***
## Property.Size 0.37943 0.01092 34.747 < 2e-16 ***
## Total.Units -0.05711 0.01286 -4.441 9.35e-06 ***
## Parking.Lot 24.23669 2.02536 11.967 < 2e-16 ***
## No.of.Floors 3.56045 0.15565 22.875 < 2e-16 ***
## Property.Age -0.61872 0.20615 -3.001 0.00271 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 71.78 on 2450 degrees of freedom
## Multiple R-squared: 0.6398, Adjusted R-squared: 0.6388
## F-statistic: 621.7 on 7 and 2450 DF, p-value: < 2.2e-16
## Length of actual values: 1052
## Length of predicted values: 1052
## Mean Absolute Error (MAE): 52.9102
## Root Mean Squared Error (RMSE): 68.16736
## R-squared: 0.6397955
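For reference, the three regression metrics reported above can be written as small base-R functions. This is a sketch only: the function names `mae`, `rmse` and `r_squared` are hypothetical, and the toy vectors are invented for illustration rather than taken from the condominium data.

```r
# Mean Absolute Error: average size of the prediction errors.
mae <- function(actual, predicted) mean(abs(actual - predicted))

# Root Mean Squared Error: penalises large errors more heavily than MAE.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# R-squared: 1 minus residual sum of squares over total sum of squares.
r_squared <- function(actual, predicted) {
  1 - sum((actual - predicted)^2) / sum((actual - mean(actual))^2)
}

# Toy values (assumed, not from the dataset).
actual    <- c(100, 200, 300)
predicted <- c(110, 190, 320)

cat("MAE:", mae(actual, predicted), "\n")
cat("RMSE:", rmse(actual, predicted), "\n")
cat("R-squared:", r_squared(actual, predicted), "\n")
```

Computing all three on the held-out test rows, rather than on the training rows, is what makes the figures comparable across models.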
Next, a Random Forest regression model is trained to predict ‘price’, and its performance is evaluated using MAE, RMSE, and R-squared. A scatter plot comparing actual and predicted values is also presented.
library(dplyr)
library(randomForest)
# Drop Building.Name, City and Developer; price is the target and
# all remaining columns are used as features
data2 <- data %>% dplyr::select(-Building.Name, -City, -Developer)
set.seed(777)
# Split your data into training and testing sets
sample_index <- sample(1:nrow(data2), 0.8 * nrow(data2))
train_data <- data2[sample_index, ]
test_data <- data2[-sample_index, ]
# Create a random forest model
model_rf <- randomForest(price ~ ., data = train_data)
# Make predictions on the test set
predictions_rf <- predict(model_rf, newdata = test_data)
# Evaluate the model
# Example: Mean Absolute Error (MAE)
mae_rf <- mean(abs(predictions_rf - test_data$price))
cat("Random Forest Mean Absolute Error (MAE):", mae_rf, "\n")
## Random Forest Mean Absolute Error (MAE): 43.52476
# Calculate R-squared
rsquared_rf <- 1 - (sum((test_data$price - predictions_rf)^2) / sum((test_data$price - mean(test_data$price))^2))
cat("Random Forest R-squared:", rsquared_rf, "\n")
## Random Forest R-squared: 0.740809
# Calculate RMSE
rmse_rf <- sqrt(mean((predictions_rf - test_data$price)^2))
cat("Random Forest RMSE:", rmse_rf, "\n")
## Random Forest RMSE: 61.66981
plot(test_data$price, predictions_rf, main = "Random Forest: Actual vs Predicted", xlab = "Actual", ylab = "Predicted", pch = 20,
col = "blue")
abline(0, 1, col = "red", lwd = 2)
In this final section, a Decision Tree model is constructed for regression analysis. After training the model, predictions are made, and the model’s accuracy is evaluated through MAE, RMSE, and R-squared. Additionally, the decision tree is visualized for a comprehensive understanding of the model structure.
library(rpart)
set.seed(777)
sample_index <- sample(1:nrow(data2), 0.8 * nrow(data2))
train_data <- data2[sample_index, ]
test_data <- data2[-sample_index, ]
# Create a decision tree model
model_tree <- rpart(price ~ ., data = train_data)
# Visualize the decision tree
plot(model_tree)
text(model_tree, cex = 0.7)
# Make predictions on the test set
predictions_tree <- predict(model_tree, newdata = test_data)
# Evaluate the model
# Example: Mean Absolute Error (MAE)
mae_tree <- mean(abs(predictions_tree - test_data$price))
cat("Decision Tree Mean Absolute Error (MAE):", mae_tree, "\n")
## Decision Tree Mean Absolute Error (MAE): 59.66233
# Calculate R-squared
rsquared_tree <- 1 - (sum((test_data$price - predictions_tree)^2) / sum((test_data$price - mean(test_data$price))^2))
# Calculate RMSE
rmse_tree <- sqrt(mean((predictions_tree - test_data$price)^2))
cat("Decision Tree R-squared:", rsquared_tree, "\n")
## Decision Tree R-squared: 0.5802981
cat("Decision Tree RMSE:", rmse_tree, "\n")
## Decision Tree RMSE: 78.47533
plot(test_data$price, predictions_tree, main = "Decision Tree: Actual vs Predicted", xlab = "Actual", ylab = "Predicted", pch = 20,
col = "blue")
abline(0, 1, col = "red", lwd = 2)
# Assuming you have already calculated the confusion matrices for each model
# cm for Naive Bayes, cm_svm for SVM, and cm_tree for Decision Tree
# Initialize an empty data frame to store the summary of all models
models_summary <- data.frame(
Model = character(),
Accuracy = numeric(),
Recall = numeric(),
Precision = numeric(),
F1_Score = numeric(),
stringsAsFactors = FALSE
)
# Naive Bayes performance metrics
nb_recall <- cm$byClass['Sensitivity']
nb_precision <- cm$byClass['Pos Pred Value']
nb_F1 <- 2 * (nb_precision * nb_recall) / (nb_precision + nb_recall)
nb_accuracy <- cm$overall['Accuracy']
# Add Naive Bayes to summary
models_summary <- rbind(models_summary, data.frame(
Model = "Naive Bayes",
Accuracy = nb_accuracy,
Recall = nb_recall,
Precision = nb_precision,
F1_Score = nb_F1
))
# SVM performance metrics
svm_recall <- cm_svm$byClass['Sensitivity']
svm_precision <- cm_svm$byClass['Pos Pred Value']
svm_F1 <- 2 * (svm_precision * svm_recall) / (svm_precision + svm_recall)
svm_accuracy <- cm_svm$overall['Accuracy']
# Add SVM to summary
models_summary <- rbind(models_summary, data.frame(
Model = "SVM",
Accuracy = svm_accuracy,
Recall = svm_recall,
Precision = svm_precision,
F1_Score = svm_F1
))
# Decision Tree performance metrics
dt_recall <- cm_tree$byClass['Sensitivity']
dt_precision <- cm_tree$byClass['Pos Pred Value']
dt_F1 <- 2 * (dt_precision * dt_recall) / (dt_precision + dt_recall)
dt_accuracy <- cm_tree$overall['Accuracy']
# Add Decision Tree to summary
models_summary <- rbind(models_summary, data.frame(
Model = "Decision Tree",
Accuracy = dt_accuracy,
Recall = dt_recall,
Precision = dt_precision,
F1_Score = dt_F1
))
# Print the summary table
print(models_summary)
## Model Accuracy Recall Precision F1_Score
## Accuracy Naive Bayes 0.8559201 0.8652482 0.8926829 0.8787515
## Accuracy1 SVM 0.6034237 1.0000000 0.6034237 0.7526690
## Accuracy2 Decision Tree 0.9101284 0.9290780 0.9225352 0.9257951
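The precision, recall and F1 columns in the table above follow directly from the confusion-matrix counts. The sketch below recomputes them in base R with Freehold as the positive class; `f1_from_counts` is a hypothetical helper, and the counts are copied from the three matrices printed earlier.

```r
# Hypothetical helper: precision, recall and F1 from raw counts.
f1_from_counts <- function(tp, fp, fn) {
  precision <- tp / (tp + fp)
  recall    <- tp / (tp + fn)
  c(precision = precision,
    recall = recall,
    f1 = 2 * precision * recall / (precision + recall))
}

# tp = predicted Freehold, actually Freehold
# fp = predicted Freehold, actually Leasehold
# fn = predicted Leasehold, actually Freehold
counts <- data.frame(
  model = c("Naive Bayes", "SVM", "Decision Tree"),
  tp = c(366, 423, 393),
  fp = c(44, 278, 33),
  fn = c(57, 0, 30)
)

# One row of scores per model.
scores <- t(mapply(f1_from_counts, counts$tp, counts$fp, counts$fn))
rownames(scores) <- counts$model
print(round(scores, 4))
```

The SVM row illustrates why recall alone can mislead: predicting Freehold for every case yields a recall of 1 while precision falls to the class prevalence.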
Best Performance in Classification: Decision Tree. This model has the highest accuracy (0.9101) and balanced accuracy (0.9052), indicating the best overall performance in correctly classifying the data. Balanced accuracy is particularly informative here because the classes are imbalanced (about 60% of the test cases are Freehold). Naive Bayes also performs well (accuracy 0.8559, balanced accuracy 0.8535), but the Decision Tree edges ahead on both measures. The SVM predicts Freehold for every test observation, so its accuracy only matches the no-information rate.
Reasons why a Decision Tree might be preferred:
No feature scaling: Decision trees don’t require feature scaling, such as normalization or standardization. This makes them very convenient when dealing with features on different scales.
Handle non-linear data: Decision trees work well with data that has non-linear relationships. Where linear models struggle with such problems, decision trees can provide a better fit.
Robust to outliers: Decision trees are relatively insensitive to outliers in the features. Because splits depend on the ordering of values rather than their magnitude, outliers usually affect only a small portion of the tree.
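The point about feature scaling can be illustrated with a small experiment. The sketch below uses rpart on the built-in iris data as a stand-in (an assumption made for self-containment; it is not the condominium dataset) and fits the same tree on raw and on standardized features.

```r
library(rpart)

# Same data twice: once raw, once with the numeric columns standardized.
raw    <- iris
scaled <- iris
scaled[1:4] <- lapply(iris[1:4], function(x) (x - mean(x)) / sd(x))

# Fit an identical classification tree specification on each version.
tree_raw    <- rpart(Species ~ ., data = raw, method = "class")
tree_scaled <- rpart(Species ~ ., data = scaled, method = "class")

# Predict on the matching version of the data.
pred_raw    <- predict(tree_raw, newdata = raw, type = "class")
pred_scaled <- predict(tree_scaled, newdata = scaled, type = "class")

# Splits depend only on the ordering of values, so the two trees
# partition the observations in the same way.
same_predictions <- all(pred_raw == pred_scaled)
print(same_predictions)
```

Only the split thresholds change units; the groups of observations on each side of a split stay the same.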
models_summary2 <- data.frame(
Model = character(),
mae = numeric(),
rsquared = numeric(),
rmse = numeric(),
stringsAsFactors = FALSE
)
# Linear Regression performance metrics
lmmae <- mae1
lmrsquared <- rsquared1
lmrmse <- rmse1
# Adding linear regression metrics to summary
models_summary2 <- rbind(models_summary2, data.frame(
Model = "Linear Regression",
mae = lmmae,
rsquared = lmrsquared,
rmse = lmrmse
))
# Random Forest performance metrics
rf_mae <- mae_rf
rf_rsquared <- rsquared_rf
rf_rmse <- rmse_rf
# Adding Random Forest metrics to summary
models_summary2 <- rbind(models_summary2, data.frame(
Model = "Random Forest",
mae = rf_mae,
rsquared = rf_rsquared,
rmse = rf_rmse
))
# Decision Tree performance metrics
dt_mae <- mae_tree
dt_rsquared <- rsquared_tree
dt_rmse <- rmse_tree
# Adding Decision Tree metrics to summary
models_summary2 <- rbind(models_summary2, data.frame(
Model = "Decision Tree",
mae = dt_mae,
rsquared = dt_rsquared,
rmse = dt_rmse
))
# Print the summary table
print(models_summary2)
## Model mae rsquared rmse
## 1 Linear Regression 52.91020 0.6397955 68.16736
## 2 Random Forest 43.52476 0.7408090 61.66981
## 3 Decision Tree 59.66233 0.5802981 78.47533
Best Performance in Regression: Random Forest. This model has the lowest MAE and RMSE, indicating it has the least average error in predictions and the predictions are closer to the actual values. Additionally, it has the highest R-squared value, suggesting the best fit of the model to the data. Lower MAE and RMSE are critical for a good regression model as they directly relate to the accuracy of the predictions. Higher R-squared value indicates that the model explains a greater proportion of variance in the dependent variable.
Reasons for Random Forest’s Best Performance
Random Forest is often considered superior to individual Decision Trees and Linear Regression in certain scenarios due to its ensemble learning approach.
Reasons why Random Forest might be preferred:
Reduced Overfitting:
Individual Decision Trees are prone to overfitting, meaning they may capture noise in the training data and perform poorly on new, unseen data. Random Forest helps mitigate this issue by averaging the predictions of many trees grown on bootstrap samples, reducing the risk of overfitting.
Improved Generalization:
Random Forest generally provides better generalization to unseen data compared to a single Decision Tree or Linear Regression model. The ensemble nature of Random Forest helps in capturing a more accurate and robust representation of the underlying patterns in the data.
Handling Non-linearity:
Linear Regression assumes a linear relationship between the input features and the target variable. Random Forest, on the other hand, can capture non-linear relationships in the data more effectively, making it suitable for a wider range of problems.
Robustness to Outliers:
Random Forest is less sensitive to outliers than Linear Regression. Outliers can heavily influence the coefficients in a linear model, leading to a skewed representation of the data.
Handling Missing Values:
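Random Forest can be made robust to missing values, although this depends on the implementation. The randomForest package used here does not accept missing values by default; it provides na.roughfix (median or mode fill) and rfImpute (proximity-based imputation) to handle them before or during fitting.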
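Where missing values do need to be filled before fitting, a rough median or mode fill is one simple option. The base-R sketch below imitates the idea behind randomForest's na.roughfix without calling the package; `rough_fill` and the toy data frame are hypothetical and only reuse two column names from this report.

```r
# Hypothetical helper: fill numeric NAs with the column median and
# categorical NAs with the most frequent level.
rough_fill <- function(df) {
  for (col in names(df)) {
    missing <- is.na(df[[col]])
    if (!any(missing)) next
    if (is.numeric(df[[col]])) {
      df[[col]][missing] <- median(df[[col]], na.rm = TRUE)
    } else {
      counts <- table(df[[col]])  # table() drops NA by default
      df[[col]][missing] <- names(counts)[which.max(counts)]
    }
  }
  df
}

# Toy data (assumed values, not from the dataset).
toy <- data.frame(
  Property.Size = c(800, NA, 1200, 1000),
  Tenure.Type = factor(c("Freehold", "Freehold", NA, "Leasehold"))
)

filled <- rough_fill(toy)
print(filled)
```

A fill of this kind is quick but crude; it ignores relationships between columns, which is what proximity-based imputation tries to exploit.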
For classification tasks, the Decision Tree model demonstrates the best performance, with the highest accuracy, balanced accuracy and F1-score derived from the confusion matrix. For the regression tasks, the Random Forest model demonstrates the best performance based on MAE, RMSE and R-squared, with the lowest prediction errors and the highest share of variance explained. Random Forest is a versatile model capable of handling different types of data and tasks effectively, likely due to its ability to manage high-dimensional data and protect against overfitting, both of which benefit the regression model.
In conclusion, this study addresses the challenging task of predicting condominium prices in the dynamic Malaysian property market. Employing a comprehensive data analysis methodology that incorporates classification and regression techniques, the research delves beyond mere price prediction to explore underlying factors, including the impact of property tenure types. The findings from this in-depth exploration have revealed crucial insights:
Penang stands out with the highest property prices, likely attributed to the greater availability of service residences and condominiums compared to other areas, while Selangor has the highest number of total facilities provided.
Visualizing the data through charts has uncovered trends and relationships, highlighting factors such as parking lot availability, number of floors, property size, number of bathrooms, and total facilities as influential in house prices. Condominiums typically command higher average prices, while flats tend to be more affordable among different property types.
Nine indicators, including amenities, facilities, age, floors, units, size, bathrooms, bedrooms, and completion year, were analyzed for their correlations. Property size, bathrooms, and bedrooms showed strong interplay, with price influenced significantly by floors, size, bathrooms, and facilities.
The Random Forest model emerged as the standout performer for price prediction, with the lowest Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) values of the three regression models. For classification, the base Decision Tree performed best at predicting whether a property’s tenure type is Freehold or Leasehold.
However, areas of improvement have been identified:
Insufficient Data: The study acknowledges the challenge of insufficient data. Future research could periodically supplement the dataset from Mudah.com to ensure more comprehensive and accurate predictions.
Raw Data Processing: While various factors were considered in raw data processing, there is a recognition that some discarded data columns may have potential impacts on prices. Further research and market surveys, incorporating techniques like natural language processing, could enhance data preprocessing.
Model Applicability Clarification: Although the Random Forest model performed well in regression modeling, further exploration into its commercial applicability is deemed necessary.