Overview

Jakarta, the bustling capital city of Indonesia, is a hub for various economic activities, including a vibrant used car market. The demand for used cars in Jakarta has been steadily increasing due to various factors such as affordability, the rising cost of new vehicles, and the rapid urbanization leading to a growing middle class. In Jakarta, popular used car models include compact cars, sedans, and SUVs from brands like Toyota, Honda, Suzuki, and Daihatsu. These brands are favored for their reliability, affordability, and availability of spare parts. Used car prices in Jakarta vary significantly based on the car's age, condition, mileage, and brand. On average, used cars can cost anywhere from IDR 50 million for older models to over IDR 300 million for newer, high-end models. The market has seen stable growth in pricing, with occasional fluctuations due to economic conditions and regulatory changes.

knitr::include_graphics("assets/bmw.jpg")

Import & Inspect Data

Let’s try to import the data from csv file

data <- read.csv("data_input/used_car_data_new.csv")

now let’s inspect the data

using head() function

head(data)
##   id_ucd     id_merk                   type         model         color year
## 1      1       Mazda                   CX-5           SUV         Hitam 2016
## 2     44       Mazda              3 5 HB 6E       MINIBUS HITAM METALIK 2016
## 3    160       Mazda                   CX 5           SUV  PUIH METALIK 2014
## 4     90     Hyundai                   I-20       MINIBUS         MERAH 2010
## 5     25 Range Rover RANGE ROVER 4.2L V8 AT JEEP S.C.HDTP         HITAM 2007
## 6    135 Range Rover  RANGE ROVER SPORT HSE           SUV         PUTIH 2014
##   id_transmission id_fuel_type door cylinder_size cylinder_total turbo mileage
## 1               2     Gasoline    5          2500              4     0   72000
## 2               2     Gasoline    5          1998              4     0   55000
## 3               2     Gasoline    5          2498              4     0   95000
## 4               2     Gasoline    5          1396              4     0  100000
## 5               2     Gasoline    5          4187              8     0      95
## 6               2     Gasoline    5          2994              4     0   43000
##   license_plate price_cash price_credit nilai_jual_pkb pkb_pokok  stnk_date
## 1    B 2961 SXV   2.95e+08     2.80e+08      335000000      NULL  3/23/2024
## 2    B 2713 BYI   3.40e+08     3.60e+08      273000000   5733000 10/22/2023
## 3    B 1458 TYZ   2.87e+08     2.77e+08      265000000   5565000 10/20/2024
## 4    B 1849 TKP   8.50e+07     1.00e+08       88000000   3234000 11/16/2025
## 5    B 2907 SXV   5.35e+08     5.50e+08      536000000  11256000 10/16/2024
## 6     B 8647 GK   1.25e+09     1.15e+09     1212000000  25452000  8/27/2023
##    levy_date swdkllj total_levy                   showroom_name
## 1  3/23/2027    NULL       NULL                         O2 Auto
## 2 10/22/2023  143000    5876000                       GARAGE 88
## 3 10/20/2024  153000    5718000                  BEST MOBILINDO
## 4 11/16/2023  143000    3377000                 MAJU JAYA MOTOR
## 5 10/16/2023  143000   11399000 ARJUNA MOTOR DEALER MOBIL BEKAS
## 6  8/27/2023  143000   25595000                  SPECTRUM MOTOR
##                                                                                showroom_location
## 1                                                                                Bintaro, Banten
## 2                                              Jl.Radio 1 No.19, Kebayoran Baru, Jakarta Selatan
## 3              Jl. Industri Raya No.10, RW.10, Danau Sunter Dll, Kec. Pademangan, Kota Jkt Utara
## 4                            JL. Mangga Dua Raya, No. 8, RW.5, Ancol, Kec. Pademangan, Jkt Utara
## 5                             jln raya jatiwaringin no 2 pangkalan jati kalimalang Jakarta timur
## 6 Bursa otomotif, Jl. Sunter Kirana Raya, RT.10/RW.11, Sunter Jaya, Tanjung Priok, JAKARTA UTARA
##   showroom_regency            photo_1 photo_2 photo_3 link      created_at
## 1           Banten               NULL    NULL    NULL NULL   4/3/2023 9:31
## 2  Jakarta Selatan     2018-11-18.jpg    NULL    NULL NULL 4/11/2023 17:50
## 3    Jakarta Utara BEST MOBILINDO.jpg    NULL    NULL NULL 4/15/2023 21:08
## 4    Jakarta Utara     2020-02-29.jpg    NULL    NULL NULL  4/13/2023 4:38
## 5    Jakarta Timur     2021-01-22.jpg    NULL    NULL NULL  4/11/2023 7:29
## 6    Jakarta Utara SPECTRUM MOTOR.jpg    NULL    NULL NULL  4/14/2023 0:56
##        updated_at
## 1   4/3/2023 9:31
## 2 4/11/2023 17:50
## 3 4/15/2023 21:08
## 4  4/13/2023 4:38
## 5  4/11/2023 7:29
## 6  4/14/2023 0:56

using str() function

str(data)
## 'data.frame':    210 obs. of  31 variables:
##  $ id_ucd           : int  1 44 160 90 25 135 19 136 3 28 ...
##  $ id_merk          : chr  "Mazda" "Mazda" "Mazda" "Hyundai" ...
##  $ type             : chr  "CX-5" "3 5 HB 6E" "CX 5" "I-20" ...
##  $ model            : chr  "SUV" "MINIBUS" "SUV" "MINIBUS" ...
##  $ color            : chr  "Hitam" "HITAM METALIK" "PUIH METALIK" "MERAH" ...
##  $ year             : int  2016 2016 2014 2010 2007 2014 2011 2012 2017 2011 ...
##  $ id_transmission  : int  2 2 2 2 2 2 2 2 2 2 ...
##  $ id_fuel_type     : chr  "Gasoline" "Gasoline" "Gasoline" "Gasoline" ...
##  $ door             : int  5 5 5 5 5 5 4 2 5 5 ...
##  $ cylinder_size    : int  2500 1998 2498 1396 4187 2994 5654 3604 1400 2967 ...
##  $ cylinder_total   : int  4 4 4 4 8 4 8 4 4 6 ...
##  $ turbo            : int  0 0 0 0 0 0 0 0 1 0 ...
##  $ mileage          : int  72000 55000 95000 100000 95 43000 45000 51000 42000 75000 ...
##  $ license_plate    : chr  "B 2961 SXV" "B 2713 BYI" "B 1458 TYZ" "B 1849 TKP" ...
##  $ price_cash       : num  2.95e+08 3.40e+08 2.87e+08 8.50e+07 5.35e+08 1.25e+09 4.25e+08 7.75e+08 2.05e+08 2.89e+08 ...
##  $ price_credit     : num  2.80e+08 3.60e+08 2.77e+08 1.00e+08 5.50e+08 1.15e+09 4.40e+08 7.50e+08 1.95e+08 3.04e+08 ...
##  $ nilai_jual_pkb   : chr  "335000000" "273000000" "265000000" "88000000" ...
##  $ pkb_pokok        : chr  "NULL" "5733000" "5565000" "3234000" ...
##  $ stnk_date        : chr  "3/23/2024" "10/22/2023" "10/20/2024" "11/16/2025" ...
##  $ levy_date        : chr  "3/23/2027" "10/22/2023" "10/20/2024" "11/16/2023" ...
##  $ swdkllj          : chr  "NULL" "143000" "153000" "143000" ...
##  $ total_levy       : chr  "NULL" "5876000" "5718000" "3377000" ...
##  $ showroom_name    : chr  "O2 Auto" "GARAGE 88" "BEST MOBILINDO" "MAJU JAYA MOTOR" ...
##  $ showroom_location: chr  "Bintaro, Banten" "Jl.Radio 1 No.19, Kebayoran Baru, Jakarta Selatan" "Jl. Industri Raya No.10, RW.10, Danau Sunter Dll, Kec. Pademangan, Kota Jkt Utara" "JL. Mangga Dua Raya, No. 8, RW.5, Ancol, Kec. Pademangan, Jkt Utara" ...
##  $ showroom_regency : chr  "Banten" "Jakarta Selatan" "Jakarta Utara" "Jakarta Utara" ...
##  $ photo_1          : chr  "NULL" "2018-11-18.jpg" "BEST MOBILINDO.jpg" "2020-02-29.jpg" ...
##  $ photo_2          : chr  "NULL" "NULL" "NULL" "NULL" ...
##  $ photo_3          : chr  "NULL" "NULL" "NULL" "NULL" ...
##  $ link             : chr  "NULL" "NULL" "NULL" "NULL" ...
##  $ created_at       : chr  "4/3/2023 9:31" "4/11/2023 17:50" "4/15/2023 21:08" "4/13/2023 4:38" ...
##  $ updated_at       : chr  "4/3/2023 9:31" "4/11/2023 17:50" "4/15/2023 21:08" "4/13/2023 4:38" ...

using summary() function

summary(data)
##      id_ucd         id_merk              type              model          
##  Min.   :  1.00   Length:210         Length:210         Length:210        
##  1st Qu.: 53.25   Class :character   Class :character   Class :character  
##  Median :105.50   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :105.50                                                           
##  3rd Qu.:157.75                                                           
##  Max.   :210.00                                                           
##     color                year      id_transmission id_fuel_type      
##  Length:210         Min.   :1993   Min.   :1.000   Length:210        
##  Class :character   1st Qu.:2013   1st Qu.:2.000   Class :character  
##  Mode  :character   Median :2016   Median :2.000   Mode  :character  
##                     Mean   :2015   Mean   :1.857                     
##                     3rd Qu.:2018   3rd Qu.:2.000                     
##                     Max.   :2022   Max.   :2.000                     
##       door     cylinder_size  cylinder_total      turbo        
##  Min.   :2.0   Min.   : 998   Min.   :3.000   Min.   :0.00000  
##  1st Qu.:5.0   1st Qu.:1495   1st Qu.:4.000   1st Qu.:0.00000  
##  Median :5.0   Median :1498   Median :4.000   Median :0.00000  
##  Mean   :4.9   Mean   :1839   Mean   :4.119   Mean   :0.05714  
##  3rd Qu.:5.0   3rd Qu.:1998   3rd Qu.:4.000   3rd Qu.:0.00000  
##  Max.   :5.0   Max.   :5663   Max.   :8.000   Max.   :1.00000  
##     mileage       license_plate        price_cash         price_credit      
##  Min.   :    55   Length:210         Min.   :6.250e+07   Min.   :5.750e+07  
##  1st Qu.: 45000   Class :character   1st Qu.:1.326e+08   1st Qu.:1.435e+08  
##  Median : 71000   Mode  :character   Median :1.790e+08   Median :1.950e+08  
##  Mean   : 75075                      Mean   :2.622e+08   Mean   :2.832e+08  
##  3rd Qu.:100000                      3rd Qu.:2.750e+08   3rd Qu.:2.848e+08  
##  Max.   :185000                      Max.   :2.400e+09   Max.   :2.250e+09  
##  nilai_jual_pkb      pkb_pokok          stnk_date          levy_date        
##  Length:210         Length:210         Length:210         Length:210        
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##    swdkllj           total_levy        showroom_name      showroom_location 
##  Length:210         Length:210         Length:210         Length:210        
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##  showroom_regency     photo_1            photo_2            photo_3         
##  Length:210         Length:210         Length:210         Length:210        
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##      link            created_at         updated_at       
##  Length:210         Length:210         Length:210        
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
## 

Simple Data Cleansing

Change Data Type

Let’s change id_merk, model, color, id_transmission, id_fuel_type, turbo and showroom regency into factor

data$id_merk <- as.factor(data$id_merk)
data$model <- as.factor(data$model)
data$color <- as.factor(data$color)
data$id_transmission <- as.factor(data$id_transmission)
data$id_fuel_type <- as.factor(data$id_fuel_type)
data$turbo <- as.factor(data$turbo)
data$showroom_regency <- as.factor(data$showroom_regency)

Now let’s change stnk_date and levy_date into date format instead of character

data$stnk_date <- as.Date(data$stnk_date, format = "%m/%d/%y")
data$levy_date <- as.Date(data$levy_date, format = "%m/%d/%y")

summary(data)
##      id_ucd                id_merk       type                     model   
##  Min.   :  1.00   Toyota       :68   Length:210         MINIBUS      :91  
##  1st Qu.: 53.25   Honda        :41   Class :character   Minibus      :39  
##  Median :105.50   Daihatsu     :26   Mode  :character   SEDAN        :30  
##  Mean   :105.50   Mitsubishi   :15                      SUV          :18  
##  3rd Qu.:157.75   Nissan       :12                      JEEP L.C.HDTP:12  
##  Max.   :210.00   Mercedes Benz: 9                      JEEP S.C.HDTP: 9  
##                   (Other)      :39                      (Other)      :11  
##              color         year      id_transmission   id_fuel_type
##  HITAM METALIK  :39   Min.   :1993   1: 30           Diesel  : 14  
##  HITAM          :30   1st Qu.:2013   2:180           Gasoline:196  
##  SILVER METALIK :29   Median :2016                                 
##  PUTIH          :27   Mean   :2015                                 
##  ABU ABU METALIK:19   3rd Qu.:2018                                 
##  MERAH          :11   Max.   :2022                                 
##  (Other)        :55                                                
##       door     cylinder_size  cylinder_total  turbo      mileage      
##  Min.   :2.0   Min.   : 998   Min.   :3.000   0:198   Min.   :    55  
##  1st Qu.:5.0   1st Qu.:1495   1st Qu.:4.000   1: 12   1st Qu.: 45000  
##  Median :5.0   Median :1498   Median :4.000           Median : 71000  
##  Mean   :4.9   Mean   :1839   Mean   :4.119           Mean   : 75075  
##  3rd Qu.:5.0   3rd Qu.:1998   3rd Qu.:4.000           3rd Qu.:100000  
##  Max.   :5.0   Max.   :5663   Max.   :8.000           Max.   :185000  
##                                                                       
##  license_plate        price_cash         price_credit       nilai_jual_pkb    
##  Length:210         Min.   :6.250e+07   Min.   :5.750e+07   Length:210        
##  Class :character   1st Qu.:1.326e+08   1st Qu.:1.435e+08   Class :character  
##  Mode  :character   Median :1.790e+08   Median :1.950e+08   Mode  :character  
##                     Mean   :2.622e+08   Mean   :2.832e+08                     
##                     3rd Qu.:2.750e+08   3rd Qu.:2.848e+08                     
##                     Max.   :2.400e+09   Max.   :2.250e+09                     
##                                                                               
##   pkb_pokok           stnk_date            levy_date         
##  Length:210         Min.   :2020-01-06   Min.   :2020-01-06  
##  Class :character   1st Qu.:2020-03-28   1st Qu.:2020-03-24  
##  Mode  :character   Median :2020-06-28   Median :2020-06-28  
##                     Mean   :2020-06-29   Mean   :2020-06-30  
##                     3rd Qu.:2020-09-29   3rd Qu.:2020-10-12  
##                     Max.   :2020-12-30   Max.   :2020-12-30  
##                     NA's   :9            NA's   :9           
##    swdkllj           total_levy        showroom_name      showroom_location 
##  Length:210         Length:210         Length:210         Length:210        
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##         showroom_regency   photo_1            photo_2         
##  Banten         :36      Length:210         Length:210        
##  Jakarta Barat  :13      Class :character   Class :character  
##  Jakarta Pusat  :30      Mode  :character   Mode  :character  
##  Jakarta Selatan:25                                           
##  Jakarta Timur  :73                                           
##  Jakarta Utara  :33                                           
##                                                               
##    photo_3              link            created_at         updated_at       
##  Length:210         Length:210         Length:210         Length:210        
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
## 

Check for Missing Values

Now let’s check if our data has some missing values

anyNA(data)
## [1] TRUE

using function anyNA() it is found that the data has missing values, we should check where the missing values are

colSums(is.na(data))
##            id_ucd           id_merk              type             model 
##                 0                 0                 0                 0 
##             color              year   id_transmission      id_fuel_type 
##                 0                 0                 0                 0 
##              door     cylinder_size    cylinder_total             turbo 
##                 0                 0                 0                 0 
##           mileage     license_plate        price_cash      price_credit 
##                 0                 0                 0                 0 
##    nilai_jual_pkb         pkb_pokok         stnk_date         levy_date 
##                 0                 0                 9                 9 
##           swdkllj        total_levy     showroom_name showroom_location 
##                 0                 0                 0                 0 
##  showroom_regency           photo_1           photo_2           photo_3 
##                 0                 0                 0                 0 
##              link        created_at        updated_at 
##                 0                 0                 0

the missing values are found on column stnk_date and levy_date

Data Exploration

let’s explore the data to answer some of the business question

What is the model that has highest average cash price?

using aggregate() function, we found that SEDAN and JEEP has the highest average price

aggregate(price_cash ~ model, data = data, FUN = mean)
##            model price_cash
## 1           JEEP  775000000
## 2  JEEP L.C.HDTP  313916667
## 3  JEEP S.C.HDTP  489222222
## 4  MICRO/MINIBUS  156250000
## 5        Minibus  180038462
## 6        MINIBUS  165302187
## 7        MINIVAN  158000000
## 8            MPV  408750000
## 9          Sedan  815000000
## 10         SEDAN  353163333
## 11           SUV  568527778

How many used cars are being sold in each regency?

using table() and prop.table() function we found that Jakarta Timur has the largest number of used car sold

table(data$showroom_regency)
## 
##          Banten   Jakarta Barat   Jakarta Pusat Jakarta Selatan   Jakarta Timur 
##              36              13              30              25              73 
##   Jakarta Utara 
##              33
prop.table(table(data$showroom_regency))
## 
##          Banten   Jakarta Barat   Jakarta Pusat Jakarta Selatan   Jakarta Timur 
##      0.17142857      0.06190476      0.14285714      0.11904762      0.34761905 
##   Jakarta Utara 
##      0.15714286

Which model and brand has the largest population in the market?

using xtabs() function we found that Toyota & Honda Minibus is the largest population in the market

xtabs( ~ model + id_merk, data = data)
##                id_merk
## model           Audi BMW Chevrolet Daihatsu Datsun Ford Honda Hyundai Isuzu
##   JEEP             0   0         0        0      0    0     0       0     0
##   JEEP L.C.HDTP    1   0         0        0      0    0     1       0     0
##   JEEP S.C.HDTP    0   0         0        0      0    0     3       0     0
##   MICRO/MINIBUS    0   0         0        1      0    0     0       0     0
##   Minibus          0   0         1        9      0    0     3       1     0
##   MINIBUS          0   0         0       16      1    0    23       1     1
##   MINIVAN          0   0         0        0      0    0     1       0     0
##   MPV              0   0         0        0      0    0     0       0     0
##   Sedan            0   1         0        0      0    0     0       0     0
##   SEDAN            0   5         0        0      0    1     6       0     0
##   SUV              0   1         0        0      0    0     4       2     0
##                id_merk
## model           Jaguar Jeep Lexus Mazda Mercedes Benz Mitsubishi Nissan Porsche
##   JEEP               0    1     0     0             0          0      0       0
##   JEEP L.C.HDTP      0    1     0     0             1          3      1       0
##   JEEP S.C.HDTP      0    0     1     0             1          1      0       0
##   MICRO/MINIBUS      0    0     0     0             0          1      0       0
##   Minibus            0    0     0     0             0          2      0       0
##   MINIBUS            0    0     0     2             0          5     10       0
##   MINIVAN            0    0     0     0             0          0      0       0
##   MPV                0    0     0     0             0          0      0       0
##   Sedan              0    0     0     0             0          0      0       0
##   SEDAN              1    0     0     0             7          0      0       0
##   SUV                0    0     1     2             0          3      1       1
##                id_merk
## model           Range Rover Suzuki Toyota VW Wuling
##   JEEP                    0      0      0  0      0
##   JEEP L.C.HDTP           0      0      3  1      0
##   JEEP S.C.HDTP           1      0      2  0      0
##   MICRO/MINIBUS           0      0      2  0      0
##   Minibus                 0      0     23  0      0
##   MINIBUS                 0      7     23  0      2
##   MINIVAN                 0      0      0  0      0
##   MPV                     0      0      4  0      0
##   Sedan                   0      0      0  0      0
##   SEDAN                   0      1      9  0      0
##   SUV                     1      0      2  0      0

Conclusion

As already known in Jakarta, popular models are minibus and sedan and the popular brands are Toyota, Honda, Daihatsu. Apparently the regency that has the most used car listed is Jakarta Timur.