Jakarta, the bustling capital city of Indonesia, is a hub for various
economic activities, including a vibrant used car market. The demand for
used cars in Jakarta has been steadily increasing due to various factors
such as affordability, the rising cost of new vehicles, and the rapid
urbanization leading to a growing middle class. In Jakarta, popular used
car models include compact cars, sedans, and SUVs from brands like
Toyota, Honda, Suzuki, and
Daihatsu. These brands are favored for their reliability,
affordability, and availability of spare parts. Used car prices in
Jakarta vary significantly based on the car's age,
condition, mileage, and brand. On
average, used cars can cost anywhere from IDR 50 million for older
models to over IDR 300 million for newer, high-end models. The market
has seen stable growth in pricing, with occasional fluctuations due to
economic conditions and regulatory changes.
knitr::include_graphics("assets/bmw.jpg")
Let’s try to import the data from csv file
data <- read.csv("data_input/used_car_data_new.csv")
now let’s inspect the data
head() functionhead(data)
## id_ucd id_merk type model color year
## 1 1 Mazda CX-5 SUV Hitam 2016
## 2 44 Mazda 3 5 HB 6E MINIBUS HITAM METALIK 2016
## 3 160 Mazda CX 5 SUV PUIH METALIK 2014
## 4 90 Hyundai I-20 MINIBUS MERAH 2010
## 5 25 Range Rover RANGE ROVER 4.2L V8 AT JEEP S.C.HDTP HITAM 2007
## 6 135 Range Rover RANGE ROVER SPORT HSE SUV PUTIH 2014
## id_transmission id_fuel_type door cylinder_size cylinder_total turbo mileage
## 1 2 Gasoline 5 2500 4 0 72000
## 2 2 Gasoline 5 1998 4 0 55000
## 3 2 Gasoline 5 2498 4 0 95000
## 4 2 Gasoline 5 1396 4 0 100000
## 5 2 Gasoline 5 4187 8 0 95
## 6 2 Gasoline 5 2994 4 0 43000
## license_plate price_cash price_credit nilai_jual_pkb pkb_pokok stnk_date
## 1 B 2961 SXV 2.95e+08 2.80e+08 335000000 NULL 3/23/2024
## 2 B 2713 BYI 3.40e+08 3.60e+08 273000000 5733000 10/22/2023
## 3 B 1458 TYZ 2.87e+08 2.77e+08 265000000 5565000 10/20/2024
## 4 B 1849 TKP 8.50e+07 1.00e+08 88000000 3234000 11/16/2025
## 5 B 2907 SXV 5.35e+08 5.50e+08 536000000 11256000 10/16/2024
## 6 B 8647 GK 1.25e+09 1.15e+09 1212000000 25452000 8/27/2023
## levy_date swdkllj total_levy showroom_name
## 1 3/23/2027 NULL NULL O2 Auto
## 2 10/22/2023 143000 5876000 GARAGE 88
## 3 10/20/2024 153000 5718000 BEST MOBILINDO
## 4 11/16/2023 143000 3377000 MAJU JAYA MOTOR
## 5 10/16/2023 143000 11399000 ARJUNA MOTOR DEALER MOBIL BEKAS
## 6 8/27/2023 143000 25595000 SPECTRUM MOTOR
## showroom_location
## 1 Bintaro, Banten
## 2 Jl.Radio 1 No.19, Kebayoran Baru, Jakarta Selatan
## 3 Jl. Industri Raya No.10, RW.10, Danau Sunter Dll, Kec. Pademangan, Kota Jkt Utara
## 4 JL. Mangga Dua Raya, No. 8, RW.5, Ancol, Kec. Pademangan, Jkt Utara
## 5 jln raya jatiwaringin no 2 pangkalan jati kalimalang Jakarta timur
## 6 Bursa otomotif, Jl. Sunter Kirana Raya, RT.10/RW.11, Sunter Jaya, Tanjung Priok, JAKARTA UTARA
## showroom_regency photo_1 photo_2 photo_3 link created_at
## 1 Banten NULL NULL NULL NULL 4/3/2023 9:31
## 2 Jakarta Selatan 2018-11-18.jpg NULL NULL NULL 4/11/2023 17:50
## 3 Jakarta Utara BEST MOBILINDO.jpg NULL NULL NULL 4/15/2023 21:08
## 4 Jakarta Utara 2020-02-29.jpg NULL NULL NULL 4/13/2023 4:38
## 5 Jakarta Timur 2021-01-22.jpg NULL NULL NULL 4/11/2023 7:29
## 6 Jakarta Utara SPECTRUM MOTOR.jpg NULL NULL NULL 4/14/2023 0:56
## updated_at
## 1 4/3/2023 9:31
## 2 4/11/2023 17:50
## 3 4/15/2023 21:08
## 4 4/13/2023 4:38
## 5 4/11/2023 7:29
## 6 4/14/2023 0:56
str() functionstr(data)
## 'data.frame': 210 obs. of 31 variables:
## $ id_ucd : int 1 44 160 90 25 135 19 136 3 28 ...
## $ id_merk : chr "Mazda" "Mazda" "Mazda" "Hyundai" ...
## $ type : chr "CX-5" "3 5 HB 6E" "CX 5" "I-20" ...
## $ model : chr "SUV" "MINIBUS" "SUV" "MINIBUS" ...
## $ color : chr "Hitam" "HITAM METALIK" "PUIH METALIK" "MERAH" ...
## $ year : int 2016 2016 2014 2010 2007 2014 2011 2012 2017 2011 ...
## $ id_transmission : int 2 2 2 2 2 2 2 2 2 2 ...
## $ id_fuel_type : chr "Gasoline" "Gasoline" "Gasoline" "Gasoline" ...
## $ door : int 5 5 5 5 5 5 4 2 5 5 ...
## $ cylinder_size : int 2500 1998 2498 1396 4187 2994 5654 3604 1400 2967 ...
## $ cylinder_total : int 4 4 4 4 8 4 8 4 4 6 ...
## $ turbo : int 0 0 0 0 0 0 0 0 1 0 ...
## $ mileage : int 72000 55000 95000 100000 95 43000 45000 51000 42000 75000 ...
## $ license_plate : chr "B 2961 SXV" "B 2713 BYI" "B 1458 TYZ" "B 1849 TKP" ...
## $ price_cash : num 2.95e+08 3.40e+08 2.87e+08 8.50e+07 5.35e+08 1.25e+09 4.25e+08 7.75e+08 2.05e+08 2.89e+08 ...
## $ price_credit : num 2.80e+08 3.60e+08 2.77e+08 1.00e+08 5.50e+08 1.15e+09 4.40e+08 7.50e+08 1.95e+08 3.04e+08 ...
## $ nilai_jual_pkb : chr "335000000" "273000000" "265000000" "88000000" ...
## $ pkb_pokok : chr "NULL" "5733000" "5565000" "3234000" ...
## $ stnk_date : chr "3/23/2024" "10/22/2023" "10/20/2024" "11/16/2025" ...
## $ levy_date : chr "3/23/2027" "10/22/2023" "10/20/2024" "11/16/2023" ...
## $ swdkllj : chr "NULL" "143000" "153000" "143000" ...
## $ total_levy : chr "NULL" "5876000" "5718000" "3377000" ...
## $ showroom_name : chr "O2 Auto" "GARAGE 88" "BEST MOBILINDO" "MAJU JAYA MOTOR" ...
## $ showroom_location: chr "Bintaro, Banten" "Jl.Radio 1 No.19, Kebayoran Baru, Jakarta Selatan" "Jl. Industri Raya No.10, RW.10, Danau Sunter Dll, Kec. Pademangan, Kota Jkt Utara" "JL. Mangga Dua Raya, No. 8, RW.5, Ancol, Kec. Pademangan, Jkt Utara" ...
## $ showroom_regency : chr "Banten" "Jakarta Selatan" "Jakarta Utara" "Jakarta Utara" ...
## $ photo_1 : chr "NULL" "2018-11-18.jpg" "BEST MOBILINDO.jpg" "2020-02-29.jpg" ...
## $ photo_2 : chr "NULL" "NULL" "NULL" "NULL" ...
## $ photo_3 : chr "NULL" "NULL" "NULL" "NULL" ...
## $ link : chr "NULL" "NULL" "NULL" "NULL" ...
## $ created_at : chr "4/3/2023 9:31" "4/11/2023 17:50" "4/15/2023 21:08" "4/13/2023 4:38" ...
## $ updated_at : chr "4/3/2023 9:31" "4/11/2023 17:50" "4/15/2023 21:08" "4/13/2023 4:38" ...
summary() functionsummary(data)
## id_ucd id_merk type model
## Min. : 1.00 Length:210 Length:210 Length:210
## 1st Qu.: 53.25 Class :character Class :character Class :character
## Median :105.50 Mode :character Mode :character Mode :character
## Mean :105.50
## 3rd Qu.:157.75
## Max. :210.00
## color year id_transmission id_fuel_type
## Length:210 Min. :1993 Min. :1.000 Length:210
## Class :character 1st Qu.:2013 1st Qu.:2.000 Class :character
## Mode :character Median :2016 Median :2.000 Mode :character
## Mean :2015 Mean :1.857
## 3rd Qu.:2018 3rd Qu.:2.000
## Max. :2022 Max. :2.000
## door cylinder_size cylinder_total turbo
## Min. :2.0 Min. : 998 Min. :3.000 Min. :0.00000
## 1st Qu.:5.0 1st Qu.:1495 1st Qu.:4.000 1st Qu.:0.00000
## Median :5.0 Median :1498 Median :4.000 Median :0.00000
## Mean :4.9 Mean :1839 Mean :4.119 Mean :0.05714
## 3rd Qu.:5.0 3rd Qu.:1998 3rd Qu.:4.000 3rd Qu.:0.00000
## Max. :5.0 Max. :5663 Max. :8.000 Max. :1.00000
## mileage license_plate price_cash price_credit
## Min. : 55 Length:210 Min. :6.250e+07 Min. :5.750e+07
## 1st Qu.: 45000 Class :character 1st Qu.:1.326e+08 1st Qu.:1.435e+08
## Median : 71000 Mode :character Median :1.790e+08 Median :1.950e+08
## Mean : 75075 Mean :2.622e+08 Mean :2.832e+08
## 3rd Qu.:100000 3rd Qu.:2.750e+08 3rd Qu.:2.848e+08
## Max. :185000 Max. :2.400e+09 Max. :2.250e+09
## nilai_jual_pkb pkb_pokok stnk_date levy_date
## Length:210 Length:210 Length:210 Length:210
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## swdkllj total_levy showroom_name showroom_location
## Length:210 Length:210 Length:210 Length:210
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## showroom_regency photo_1 photo_2 photo_3
## Length:210 Length:210 Length:210 Length:210
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## link created_at updated_at
## Length:210 Length:210 Length:210
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
Let’s change id_merk, model, color, id_transmission, id_fuel_type, turbo and showroom regency into factor
data$id_merk <- as.factor(data$id_merk)
data$model <- as.factor(data$model)
data$color <- as.factor(data$color)
data$id_transmission <- as.factor(data$id_transmission)
data$id_fuel_type <- as.factor(data$id_fuel_type)
data$turbo <- as.factor(data$turbo)
data$showroom_regency <- as.factor(data$showroom_regency)
Now let’s change stnk_date and levy_date into date format instead of character
data$stnk_date <- as.Date(data$stnk_date, format = "%m/%d/%y")
data$levy_date <- as.Date(data$levy_date, format = "%m/%d/%y")
summary(data)
## id_ucd id_merk type model
## Min. : 1.00 Toyota :68 Length:210 MINIBUS :91
## 1st Qu.: 53.25 Honda :41 Class :character Minibus :39
## Median :105.50 Daihatsu :26 Mode :character SEDAN :30
## Mean :105.50 Mitsubishi :15 SUV :18
## 3rd Qu.:157.75 Nissan :12 JEEP L.C.HDTP:12
## Max. :210.00 Mercedes Benz: 9 JEEP S.C.HDTP: 9
## (Other) :39 (Other) :11
## color year id_transmission id_fuel_type
## HITAM METALIK :39 Min. :1993 1: 30 Diesel : 14
## HITAM :30 1st Qu.:2013 2:180 Gasoline:196
## SILVER METALIK :29 Median :2016
## PUTIH :27 Mean :2015
## ABU ABU METALIK:19 3rd Qu.:2018
## MERAH :11 Max. :2022
## (Other) :55
## door cylinder_size cylinder_total turbo mileage
## Min. :2.0 Min. : 998 Min. :3.000 0:198 Min. : 55
## 1st Qu.:5.0 1st Qu.:1495 1st Qu.:4.000 1: 12 1st Qu.: 45000
## Median :5.0 Median :1498 Median :4.000 Median : 71000
## Mean :4.9 Mean :1839 Mean :4.119 Mean : 75075
## 3rd Qu.:5.0 3rd Qu.:1998 3rd Qu.:4.000 3rd Qu.:100000
## Max. :5.0 Max. :5663 Max. :8.000 Max. :185000
##
## license_plate price_cash price_credit nilai_jual_pkb
## Length:210 Min. :6.250e+07 Min. :5.750e+07 Length:210
## Class :character 1st Qu.:1.326e+08 1st Qu.:1.435e+08 Class :character
## Mode :character Median :1.790e+08 Median :1.950e+08 Mode :character
## Mean :2.622e+08 Mean :2.832e+08
## 3rd Qu.:2.750e+08 3rd Qu.:2.848e+08
## Max. :2.400e+09 Max. :2.250e+09
##
## pkb_pokok stnk_date levy_date
## Length:210 Min. :2020-01-06 Min. :2020-01-06
## Class :character 1st Qu.:2020-03-28 1st Qu.:2020-03-24
## Mode :character Median :2020-06-28 Median :2020-06-28
## Mean :2020-06-29 Mean :2020-06-30
## 3rd Qu.:2020-09-29 3rd Qu.:2020-10-12
## Max. :2020-12-30 Max. :2020-12-30
## NA's :9 NA's :9
## swdkllj total_levy showroom_name showroom_location
## Length:210 Length:210 Length:210 Length:210
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## showroom_regency photo_1 photo_2
## Banten :36 Length:210 Length:210
## Jakarta Barat :13 Class :character Class :character
## Jakarta Pusat :30 Mode :character Mode :character
## Jakarta Selatan:25
## Jakarta Timur :73
## Jakarta Utara :33
##
## photo_3 link created_at updated_at
## Length:210 Length:210 Length:210 Length:210
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
Now let’s check if our data has some missing values
anyNA(data)
## [1] TRUE
using function anyNA() it is found that the data has
missing values, we should check where the missing values are
colSums(is.na(data))
## id_ucd id_merk type model
## 0 0 0 0
## color year id_transmission id_fuel_type
## 0 0 0 0
## door cylinder_size cylinder_total turbo
## 0 0 0 0
## mileage license_plate price_cash price_credit
## 0 0 0 0
## nilai_jual_pkb pkb_pokok stnk_date levy_date
## 0 0 9 9
## swdkllj total_levy showroom_name showroom_location
## 0 0 0 0
## showroom_regency photo_1 photo_2 photo_3
## 0 0 0 0
## link created_at updated_at
## 0 0 0
the missing values are found on column
stnk_dateandlevy_date
let’s explore the data to answer some of the business question
using aggregate() function, we found that SEDAN and JEEP
has the highest average price
aggregate(price_cash ~ model, data = data, FUN = mean)
## model price_cash
## 1 JEEP 775000000
## 2 JEEP L.C.HDTP 313916667
## 3 JEEP S.C.HDTP 489222222
## 4 MICRO/MINIBUS 156250000
## 5 Minibus 180038462
## 6 MINIBUS 165302187
## 7 MINIVAN 158000000
## 8 MPV 408750000
## 9 Sedan 815000000
## 10 SEDAN 353163333
## 11 SUV 568527778
using table() and prop.table() function we
found that Jakarta Timur has the largest number of used car sold
table(data$showroom_regency)
##
## Banten Jakarta Barat Jakarta Pusat Jakarta Selatan Jakarta Timur
## 36 13 30 25 73
## Jakarta Utara
## 33
prop.table(table(data$showroom_regency))
##
## Banten Jakarta Barat Jakarta Pusat Jakarta Selatan Jakarta Timur
## 0.17142857 0.06190476 0.14285714 0.11904762 0.34761905
## Jakarta Utara
## 0.15714286
using xtabs() function we found that Toyota & Honda
Minibus is the largest population in the market
xtabs( ~ model + id_merk, data = data)
## id_merk
## model Audi BMW Chevrolet Daihatsu Datsun Ford Honda Hyundai Isuzu
## JEEP 0 0 0 0 0 0 0 0 0
## JEEP L.C.HDTP 1 0 0 0 0 0 1 0 0
## JEEP S.C.HDTP 0 0 0 0 0 0 3 0 0
## MICRO/MINIBUS 0 0 0 1 0 0 0 0 0
## Minibus 0 0 1 9 0 0 3 1 0
## MINIBUS 0 0 0 16 1 0 23 1 1
## MINIVAN 0 0 0 0 0 0 1 0 0
## MPV 0 0 0 0 0 0 0 0 0
## Sedan 0 1 0 0 0 0 0 0 0
## SEDAN 0 5 0 0 0 1 6 0 0
## SUV 0 1 0 0 0 0 4 2 0
## id_merk
## model Jaguar Jeep Lexus Mazda Mercedes Benz Mitsubishi Nissan Porsche
## JEEP 0 1 0 0 0 0 0 0
## JEEP L.C.HDTP 0 1 0 0 1 3 1 0
## JEEP S.C.HDTP 0 0 1 0 1 1 0 0
## MICRO/MINIBUS 0 0 0 0 0 1 0 0
## Minibus 0 0 0 0 0 2 0 0
## MINIBUS 0 0 0 2 0 5 10 0
## MINIVAN 0 0 0 0 0 0 0 0
## MPV 0 0 0 0 0 0 0 0
## Sedan 0 0 0 0 0 0 0 0
## SEDAN 1 0 0 0 7 0 0 0
## SUV 0 0 1 2 0 3 1 1
## id_merk
## model Range Rover Suzuki Toyota VW Wuling
## JEEP 0 0 0 0 0
## JEEP L.C.HDTP 0 0 3 1 0
## JEEP S.C.HDTP 1 0 2 0 0
## MICRO/MINIBUS 0 0 2 0 0
## Minibus 0 0 23 0 0
## MINIBUS 0 7 23 0 2
## MINIVAN 0 0 0 0 0
## MPV 0 0 4 0 0
## Sedan 0 0 0 0 0
## SEDAN 0 1 9 0 0
## SUV 1 0 2 0 0
As already known in Jakarta, popular models are minibus and sedan and the popular brands are Toyota, Honda, Daihatsu. Apparently the regency that has the most used car listed is Jakarta Timur.