Introduction

I am Briandamar Kencana, a data enthusiast. I graduated with a bachelor's degree in statistics. I have work experience in the data field, such as an internship at momobil.id, Independent Research and Advisory Indonesia, a data analyst staff role at Alif Aza Asia, and a research assistant role at Bank Indonesia. I then continued my career as an analyst at a commercial bank in Indonesia. I have also taken several data training courses at Algoritma Data Science School, Coursera, Udemy, and Dicoding.

Data Information

The data were obtained from Kaggle and contain car market prices in the US.

Problem Statement

Geely Auto, a Chinese car company, wants to enter the US market by setting up a manufacturing unit there and producing cars locally to compete with its US and European counterparts. It has contracted an automobile consulting company to understand the factors that determine car prices. In particular, it wants to understand the factors that affect car prices in the American market, since these may differ considerably from the Chinese market. The company wants to know:

  • Which variables are significant in predicting car prices,
  • How well those variables describe car prices.

Based on various market surveys, the consulting firm has gathered a large dataset of different types of cars across the American market.

Method and Objective

The method used in this article is regression analysis. Regression analysis is a method for measuring the effect of independent variables on a dependent variable. It can also be used to predict the dependent variable from the independent variables. Gujarati (2006) defines regression analysis as the study of the relationship between one variable, called the explained variable, and one or more explanatory variables.
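
As a minimal sketch of the idea, the snippet below fits a linear regression on R's built-in `mtcars` data. That dataset and the chosen columns are assumptions for illustration only; they are not the car price data analysed in this article. The sketch shows where the estimated effects, the significance of each variable, the goodness of fit, and a prediction can be read from a fitted model.

```r
# Illustration only: built-in mtcars data, not the car price dataset
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Coefficients: estimated effect of each independent variable
coef(fit)

# p-values: which variables are statistically significant
summary(fit)$coefficients[, "Pr(>|t|)"]

# R-squared: how well the variables explain the dependent variable
r2 <- summary(fit)$r.squared
r2

# Predict the dependent variable for a new observation
new_car <- data.frame(wt = 3, hp = 110)
pred <- predict(fit, newdata = new_car)
pred
```

The same three readings (coefficients, p-values, R-squared) map directly onto the two questions in the problem statement: which variables matter, and how well they describe the price.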

Figure: Regression

Objective:

We are required to model car prices using the available independent variables. Management will use the model to understand exactly how prices vary with those variables. They can then adjust car design, business strategy, and so on to meet a given price level. The model will also be a good way for management to understand the pricing dynamics of a new market.

Data Preparation

library(dplyr)
library(ggplot2)

carprice <- read.csv("CarPrice_Assignment.csv")

rmarkdown::paged_table(carprice)
unique(carprice$CarName)
#>   [1] "alfa-romero giulia"              "alfa-romero stelvio"            
#>   [3] "alfa-romero Quadrifoglio"        "audi 100 ls"                    
#>   [5] "audi 100ls"                      "audi fox"                       
#>   [7] "audi 5000"                       "audi 4000"                      
#>   [9] "audi 5000s (diesel)"             "bmw 320i"                       
#>  [11] "bmw x1"                          "bmw x3"                         
#>  [13] "bmw z4"                          "bmw x4"                         
#>  [15] "bmw x5"                          "chevrolet impala"               
#>  [17] "chevrolet monte carlo"           "chevrolet vega 2300"            
#>  [19] "dodge rampage"                   "dodge challenger se"            
#>  [21] "dodge d200"                      "dodge monaco (sw)"              
#>  [23] "dodge colt hardtop"              "dodge colt (sw)"                
#>  [25] "dodge coronet custom"            "dodge dart custom"              
#>  [27] "dodge coronet custom (sw)"       "honda civic"                    
#>  [29] "honda civic cvcc"                "honda accord cvcc"              
#>  [31] "honda accord lx"                 "honda civic 1500 gl"            
#>  [33] "honda accord"                    "honda civic 1300"               
#>  [35] "honda prelude"                   "honda civic (auto)"             
#>  [37] "isuzu MU-X"                      "isuzu D-Max "                   
#>  [39] "isuzu D-Max V-Cross"             "jaguar xj"                      
#>  [41] "jaguar xf"                       "jaguar xk"                      
#>  [43] "maxda rx3"                       "maxda glc deluxe"               
#>  [45] "mazda rx2 coupe"                 "mazda rx-4"                     
#>  [47] "mazda glc deluxe"                "mazda 626"                      
#>  [49] "mazda glc"                       "mazda rx-7 gs"                  
#>  [51] "mazda glc 4"                     "mazda glc custom l"             
#>  [53] "mazda glc custom"                "buick electra 225 custom"       
#>  [55] "buick century luxus (sw)"        "buick century"                  
#>  [57] "buick skyhawk"                   "buick opel isuzu deluxe"        
#>  [59] "buick skylark"                   "buick century special"          
#>  [61] "buick regal sport coupe (turbo)" "mercury cougar"                 
#>  [63] "mitsubishi mirage"               "mitsubishi lancer"              
#>  [65] "mitsubishi outlander"            "mitsubishi g4"                  
#>  [67] "mitsubishi mirage g4"            "mitsubishi montero"             
#>  [69] "mitsubishi pajero"               "Nissan versa"                   
#>  [71] "nissan gt-r"                     "nissan rogue"                   
#>  [73] "nissan latio"                    "nissan titan"                   
#>  [75] "nissan leaf"                     "nissan juke"                    
#>  [77] "nissan note"                     "nissan clipper"                 
#>  [79] "nissan nv200"                    "nissan dayz"                    
#>  [81] "nissan fuga"                     "nissan otti"                    
#>  [83] "nissan teana"                    "nissan kicks"                   
#>  [85] "peugeot 504"                     "peugeot 304"                    
#>  [87] "peugeot 504 (sw)"                "peugeot 604sl"                  
#>  [89] "peugeot 505s turbo diesel"       "plymouth fury iii"              
#>  [91] "plymouth cricket"                "plymouth satellite custom (sw)" 
#>  [93] "plymouth fury gran sedan"        "plymouth valiant"               
#>  [95] "plymouth duster"                 "porsche macan"                  
#>  [97] "porcshce panamera"               "porsche cayenne"                
#>  [99] "porsche boxter"                  "renault 12tl"                   
#> [101] "renault 5 gtl"                   "saab 99e"                       
#> [103] "saab 99le"                       "saab 99gle"                     
#> [105] "subaru"                          "subaru dl"                      
#> [107] "subaru brz"                      "subaru baja"                    
#> [109] "subaru r1"                       "subaru r2"                      
#> [111] "subaru trezia"                   "subaru tribeca"                 
#> [113] "toyota corona mark ii"           "toyota corona"                  
#> [115] "toyota corolla 1200"             "toyota corona hardtop"          
#> [117] "toyota corolla 1600 (sw)"        "toyota carina"                  
#> [119] "toyota mark ii"                  "toyota corolla"                 
#> [121] "toyota corolla liftback"         "toyota celica gt liftback"      
#> [123] "toyota corolla tercel"           "toyota corona liftback"         
#> [125] "toyota starlet"                  "toyota tercel"                  
#> [127] "toyota cressida"                 "toyota celica gt"               
#> [129] "toyouta tercel"                  "vokswagen rabbit"               
#> [131] "volkswagen 1131 deluxe sedan"    "volkswagen model 111"           
#> [133] "volkswagen type 3"               "volkswagen 411 (sw)"            
#> [135] "volkswagen super beetle"         "volkswagen dasher"              
#> [137] "vw dasher"                       "vw rabbit"                      
#> [139] "volkswagen rabbit"               "volkswagen rabbit custom"       
#> [141] "volvo 145e (sw)"                 "volvo 144ea"                    
#> [143] "volvo 244dl"                     "volvo 245"                      
#> [145] "volvo 264gl"                     "volvo diesel"                   
#> [147] "volvo 246"
# Fix misspelled brand names ("maxda", "porcshce", "toyouta", "vokswagen")
# and unify brand spellings ("nissan" -> "Nissan", "vw" -> "volkswagen")
carprice$CarName <- gsub("maxda", "mazda", carprice$CarName)
carprice$CarName <- gsub("nissan", "Nissan", carprice$CarName)
carprice$CarName <- gsub("porcshce", "porsche", carprice$CarName)
carprice$CarName <- gsub("toyouta", "toyota", carprice$CarName)
carprice$CarName <- gsub("vokswagen", "volkswagen", carprice$CarName)
carprice$CarName <- gsub("vw", "volkswagen", carprice$CarName)
unique(carprice$CarName) 
#>   [1] "alfa-romero giulia"              "alfa-romero stelvio"            
#>   [3] "alfa-romero Quadrifoglio"        "audi 100 ls"                    
#>   [5] "audi 100ls"                      "audi fox"                       
#>   [7] "audi 5000"                       "audi 4000"                      
#>   [9] "audi 5000s (diesel)"             "bmw 320i"                       
#>  [11] "bmw x1"                          "bmw x3"                         
#>  [13] "bmw z4"                          "bmw x4"                         
#>  [15] "bmw x5"                          "chevrolet impala"               
#>  [17] "chevrolet monte carlo"           "chevrolet vega 2300"            
#>  [19] "dodge rampage"                   "dodge challenger se"            
#>  [21] "dodge d200"                      "dodge monaco (sw)"              
#>  [23] "dodge colt hardtop"              "dodge colt (sw)"                
#>  [25] "dodge coronet custom"            "dodge dart custom"              
#>  [27] "dodge coronet custom (sw)"       "honda civic"                    
#>  [29] "honda civic cvcc"                "honda accord cvcc"              
#>  [31] "honda accord lx"                 "honda civic 1500 gl"            
#>  [33] "honda accord"                    "honda civic 1300"               
#>  [35] "honda prelude"                   "honda civic (auto)"             
#>  [37] "isuzu MU-X"                      "isuzu D-Max "                   
#>  [39] "isuzu D-Max V-Cross"             "jaguar xj"                      
#>  [41] "jaguar xf"                       "jaguar xk"                      
#>  [43] "mazda rx3"                       "mazda glc deluxe"               
#>  [45] "mazda rx2 coupe"                 "mazda rx-4"                     
#>  [47] "mazda 626"                       "mazda glc"                      
#>  [49] "mazda rx-7 gs"                   "mazda glc 4"                    
#>  [51] "mazda glc custom l"              "mazda glc custom"               
#>  [53] "buick electra 225 custom"        "buick century luxus (sw)"       
#>  [55] "buick century"                   "buick skyhawk"                  
#>  [57] "buick opel isuzu deluxe"         "buick skylark"                  
#>  [59] "buick century special"           "buick regal sport coupe (turbo)"
#>  [61] "mercury cougar"                  "mitsubishi mirage"              
#>  [63] "mitsubishi lancer"               "mitsubishi outlander"           
#>  [65] "mitsubishi g4"                   "mitsubishi mirage g4"           
#>  [67] "mitsubishi montero"              "mitsubishi pajero"              
#>  [69] "Nissan versa"                    "Nissan gt-r"                    
#>  [71] "Nissan rogue"                    "Nissan latio"                   
#>  [73] "Nissan titan"                    "Nissan leaf"                    
#>  [75] "Nissan juke"                     "Nissan note"                    
#>  [77] "Nissan clipper"                  "Nissan nv200"                   
#>  [79] "Nissan dayz"                     "Nissan fuga"                    
#>  [81] "Nissan otti"                     "Nissan teana"                   
#>  [83] "Nissan kicks"                    "peugeot 504"                    
#>  [85] "peugeot 304"                     "peugeot 504 (sw)"               
#>  [87] "peugeot 604sl"                   "peugeot 505s turbo diesel"      
#>  [89] "plymouth fury iii"               "plymouth cricket"               
#>  [91] "plymouth satellite custom (sw)"  "plymouth fury gran sedan"       
#>  [93] "plymouth valiant"                "plymouth duster"                
#>  [95] "porsche macan"                   "porsche panamera"               
#>  [97] "porsche cayenne"                 "porsche boxter"                 
#>  [99] "renault 12tl"                    "renault 5 gtl"                  
#> [101] "saab 99e"                        "saab 99le"                      
#> [103] "saab 99gle"                      "subaru"                         
#> [105] "subaru dl"                       "subaru brz"                     
#> [107] "subaru baja"                     "subaru r1"                      
#> [109] "subaru r2"                       "subaru trezia"                  
#> [111] "subaru tribeca"                  "toyota corona mark ii"          
#> [113] "toyota corona"                   "toyota corolla 1200"            
#> [115] "toyota corona hardtop"           "toyota corolla 1600 (sw)"       
#> [117] "toyota carina"                   "toyota mark ii"                 
#> [119] "toyota corolla"                  "toyota corolla liftback"        
#> [121] "toyota celica gt liftback"       "toyota corolla tercel"          
#> [123] "toyota corona liftback"          "toyota starlet"                 
#> [125] "toyota tercel"                   "toyota cressida"                
#> [127] "toyota celica gt"                "volkswagen rabbit"              
#> [129] "volkswagen 1131 deluxe sedan"    "volkswagen model 111"           
#> [131] "volkswagen type 3"               "volkswagen 411 (sw)"            
#> [133] "volkswagen super beetle"         "volkswagen dasher"              
#> [135] "volkswagen rabbit custom"        "volvo 145e (sw)"                
#> [137] "volvo 144ea"                     "volvo 244dl"                    
#> [139] "volvo 245"                       "volvo 264gl"                    
#> [141] "volvo diesel"                    "volvo 246"
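
The cleaning step can be checked on a small scale. The sketch below applies the same replacements to a toy vector of names taken from the listing above, then extracts the brand as the first word. The lookup vector `fixes` and the loop are an illustrative alternative, not the code used in this article.

```r
# Toy vector with misspellings taken from the listing above
car_names <- c("maxda rx3", "mazda 626", "porcshce panamera",
               "toyouta tercel", "vokswagen rabbit", "vw dasher",
               "Nissan versa", "nissan gt-r")

# Same replacements as above, stored as a named lookup vector (wrong = right)
fixes <- c(maxda = "mazda", nissan = "Nissan", porcshce = "porsche",
           toyouta = "toyota", vokswagen = "volkswagen", vw = "volkswagen")

for (wrong in names(fixes)) {
  car_names <- gsub(wrong, fixes[[wrong]], car_names, fixed = TRUE)
}

# The brand is the first word of each name
brands <- sub(" .*", "", car_names)
table(brands)
```

Tabulating the brands after cleaning is a quick way to spot any spelling variant that was missed.
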
summary(carprice)
#>      car_ID      symboling         CarName            fueltype        
#>  Min.   :  1   Min.   :-2.0000   Length:205         Length:205        
#>  1st Qu.: 52   1st Qu.: 0.0000   Class :character   Class :character  
#>  Median :103   Median : 1.0000   Mode  :character   Mode  :character  
#>  Mean   :103   Mean   : 0.8341                                        
#>  3rd Qu.:154   3rd Qu.: 2.0000                                        
#>  Max.   :205   Max.   : 3.0000                                        
#>   aspiration         doornumber          carbody           drivewheel       
#>  Length:205         Length:205         Length:205         Length:205        
#>  Class :character   Class :character   Class :character   Class :character  
#>  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
#>                                                                             
#>                                                                             
#>                                                                             
#>  enginelocation       wheelbase        carlength        carwidth    
#>  Length:205         Min.   : 86.60   Min.   :141.1   Min.   :60.30  
#>  Class :character   1st Qu.: 94.50   1st Qu.:166.3   1st Qu.:64.10  
#>  Mode  :character   Median : 97.00   Median :173.2   Median :65.50  
#>                     Mean   : 98.76   Mean   :174.0   Mean   :65.91  
#>                     3rd Qu.:102.40   3rd Qu.:183.1   3rd Qu.:66.90  
#>                     Max.   :120.90   Max.   :208.1   Max.   :72.30  
#>    carheight       curbweight    enginetype        cylindernumber    
#>  Min.   :47.80   Min.   :1488   Length:205         Length:205        
#>  1st Qu.:52.00   1st Qu.:2145   Class :character   Class :character  
#>  Median :54.10   Median :2414   Mode  :character   Mode  :character  
#>  Mean   :53.72   Mean   :2556                                        
#>  3rd Qu.:55.50   3rd Qu.:2935                                        
#>  Max.   :59.80   Max.   :4066                                        
#>    enginesize     fuelsystem          boreratio        stroke     
#>  Min.   : 61.0   Length:205         Min.   :2.54   Min.   :2.070  
#>  1st Qu.: 97.0   Class :character   1st Qu.:3.15   1st Qu.:3.110  
#>  Median :120.0   Mode  :character   Median :3.31   Median :3.290  
#>  Mean   :126.9                      Mean   :3.33   Mean   :3.255  
#>  3rd Qu.:141.0                      3rd Qu.:3.58   3rd Qu.:3.410  
#>  Max.   :326.0                      Max.   :3.94   Max.   :4.170  
#>  compressionratio   horsepower       peakrpm        citympg     
#>  Min.   : 7.00    Min.   : 48.0   Min.   :4150   Min.   :13.00  
#>  1st Qu.: 8.60    1st Qu.: 70.0   1st Qu.:4800   1st Qu.:19.00  
#>  Median : 9.00    Median : 95.0   Median :5200   Median :24.00  
#>  Mean   :10.14    Mean   :104.1   Mean   :5125   Mean   :25.22  
#>  3rd Qu.: 9.40    3rd Qu.:116.0   3rd Qu.:5500   3rd Qu.:30.00  
#>  Max.   :23.00    Max.   :288.0   Max.   :6600   Max.   :49.00  
#>    highwaympg        price      
#>  Min.   :16.00   Min.   : 5118  
#>  1st Qu.:25.00   1st Qu.: 7788  
#>  Median :30.00   Median :10295  
#>  Mean   :30.75   Mean   :13277  
#>  3rd Qu.:34.00   3rd Qu.:16503  
#>  Max.   :54.00   Max.   :45400
str(carprice)
#> 'data.frame':    205 obs. of  26 variables:
#>  $ car_ID          : int  1 2 3 4 5 6 7 8 9 10 ...
#>  $ symboling       : int  3 3 1 2 2 2 1 1 1 0 ...
#>  $ CarName         : chr  "alfa-romero giulia" "alfa-romero stelvio" "alfa-romero Quadrifoglio" "audi 100 ls" ...
#>  $ fueltype        : chr  "gas" "gas" "gas" "gas" ...
#>  $ aspiration      : chr  "std" "std" "std" "std" ...
#>  $ doornumber      : chr  "two" "two" "two" "four" ...
#>  $ carbody         : chr  "convertible" "convertible" "hatchback" "sedan" ...
#>  $ drivewheel      : chr  "rwd" "rwd" "rwd" "fwd" ...
#>  $ enginelocation  : chr  "front" "front" "front" "front" ...
#>  $ wheelbase       : num  88.6 88.6 94.5 99.8 99.4 ...
#>  $ carlength       : num  169 169 171 177 177 ...
#>  $ carwidth        : num  64.1 64.1 65.5 66.2 66.4 66.3 71.4 71.4 71.4 67.9 ...
#>  $ carheight       : num  48.8 48.8 52.4 54.3 54.3 53.1 55.7 55.7 55.9 52 ...
#>  $ curbweight      : int  2548 2548 2823 2337 2824 2507 2844 2954 3086 3053 ...
#>  $ enginetype      : chr  "dohc" "dohc" "ohcv" "ohc" ...
#>  $ cylindernumber  : chr  "four" "four" "six" "four" ...
#>  $ enginesize      : int  130 130 152 109 136 136 136 136 131 131 ...
#>  $ fuelsystem      : chr  "mpfi" "mpfi" "mpfi" "mpfi" ...
#>  $ boreratio       : num  3.47 3.47 2.68 3.19 3.19 3.19 3.19 3.19 3.13 3.13 ...
#>  $ stroke          : num  2.68 2.68 3.47 3.4 3.4 3.4 3.4 3.4 3.4 3.4 ...
#>  $ compressionratio: num  9 9 9 10 8 8.5 8.5 8.5 8.3 7 ...
#>  $ horsepower      : int  111 111 154 102 115 110 110 110 140 160 ...
#>  $ peakrpm         : int  5000 5000 5000 5500 5500 5500 5500 5500 5500 5500 ...
#>  $ citympg         : int  21 21 19 24 18 19 19 19 17 16 ...
#>  $ highwaympg      : int  27 27 26 30 22 25 25 25 20 22 ...
#>  $ price           : num  13495 16500 16500 13950 17450 ...
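library(tidyr)
# Convert character columns to factors, keep only the brand
# (first word of CarName), and drop the car_ID identifier
carprice <- carprice %>% 
  mutate_if(is.character, as.factor) %>% 
  separate(
    CarName,
    into = c("Brand", "Tipe"),
    sep = " ",
    extra = "drop",  # discard words after the second one without a warning
    fill = "right"   # names with a single word get NA for Tipe
  ) %>% 
  select(-Tipe) %>% 
  mutate(Brand = as.factor(Brand)) %>% 
  select(-car_ID)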

rmarkdown::paged_table(carprice)
# Check for NA values

colSums(is.na(carprice))
#>        symboling            Brand         fueltype       aspiration 
#>                0                0                0                0 
#>       doornumber          carbody       drivewheel   enginelocation 
#>                0                0                0                0 
#>        wheelbase        carlength         carwidth        carheight 
#>                0                0                0                0 
#>       curbweight       enginetype   cylindernumber       enginesize 
#>                0                0                0                0 
#>       fuelsystem        boreratio           stroke compressionratio 
#>                0                0                0                0 
#>       horsepower          peakrpm          citympg       highwaympg 
#>                0                0                0                0 
#>            price 
#>                0

Exploratory Data Analysis

carprice %>% 
  group_by(Brand) %>%    
  summarise(counts= n()) %>% 
  ggplot(aes(x=Brand, y=counts))+
   geom_bar(
    aes(x = reorder(Brand,-counts), y= counts,fill = Brand),
    stat = "identity", position = position_dodge(0.8),
    width = 0.7
  )+   theme(axis.text.x = element_text(face="bold", color="#993333", size=9, angle=45),panel.background = element_rect(fill = '#252a32'),
        panel.grid.minor = element_line(color = '#bab6b8'),
        panel.grid.major = element_line(color = '#696868'))+
  ggtitle("Brand") +
  geom_text(aes(label= counts),vjust=1.0, color="white", size=2.0)

Based on the chart above, Toyota has the most cars listed in the dataset (these are row counts, not sales figures), followed by Nissan, Mazda, and Honda.

carprice %>% 
  group_by(Brand) %>%    
  summarise(mean_prices= mean(price)) %>% 
  arrange(-mean_prices) %>%
  ggplot(aes(x = reorder(Brand,-mean_prices),y=mean_prices)) +
  geom_bar(
    aes(fill = Brand),
    stat = "identity", position = position_dodge(0.8),
    width = 0.4
  ) +theme(axis.text.x = element_text(face="bold", color="#993333", size=9, angle=90))+
  ggtitle("Average price by brand") + labs(x = "Brand", y = "Average price")

The highest average prices belong to Jaguar, followed by Buick, Porsche, and BMW.

carprice<-carprice %>% select(-c(Brand,symboling))
library(GGally)
ggcorr(carprice, label = TRUE, label_size = 2.9, hjust = 1, layout.exp = 2)

Based on the plot above, several predictor variables have a linear relationship with price, but some also show signs of multicollinearity.
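
One way to make the multicollinearity check concrete is to list the predictor pairs whose absolute correlation exceeds a threshold. The sketch below does this on the built-in `mtcars` data. The dataset, the selected columns, and the 0.8 cut-off are all assumptions for illustration; the same steps could be applied to the numeric columns of the car price data.

```r
# Illustration only: a few numeric predictors from the built-in mtcars data
num_data <- mtcars[, c("cyl", "disp", "hp", "wt", "qsec")]
cor_mat <- cor(num_data)

# Keep each pair once (upper triangle only)
cor_mat[lower.tri(cor_mat, diag = TRUE)] <- NA

# Flag pairs whose absolute correlation exceeds an assumed cut-off
threshold <- 0.8
high <- which(abs(cor_mat) > threshold, arr.ind = TRUE)

high_pairs <- data.frame(
  var1 = rownames(cor_mat)[high[, "row"]],
  var2 = colnames(cor_mat)[high[, "col"]],
  r    = round(cor_mat[high], 2)
)
high_pairs
```

Pairs that show up in such a table are candidates for dropping one of the two variables, or for letting a selection procedure choose between them.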

Data Modelling

Splitting and Modelling

I will split the data into a training set (80%) and a test set (20%) using the code below.

set.seed(1)
index <- sample(nrow(carprice), nrow(carprice)*0.80)

data_train <- carprice[index,]
data_test <- carprice[-index,]
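
A quick sanity check after splitting is to confirm the set sizes and to look for factor levels that occur only in the test set, since `predict()` on an `lm` fit fails when it meets a level it never saw during training. The sketch below uses a toy data frame (`toy`, with made-up columns `price` and `body`) rather than the real data.

```r
# Toy data frame standing in for the car data (illustration only)
set.seed(1)
toy <- data.frame(
  price = rnorm(205, mean = 13000, sd = 8000),
  body  = factor(sample(c("sedan", "hatchback", "wagon"), 205, replace = TRUE))
)

# Same splitting recipe as above
idx <- sample(nrow(toy), nrow(toy) * 0.80)
toy_train <- toy[idx, ]
toy_test  <- toy[-idx, ]

# Split sizes for 205 observations
c(train = nrow(toy_train), test = nrow(toy_test))

# Factor levels present in the test set but absent from the training set
unseen <- setdiff(unique(as.character(toy_test$body)),
                  unique(as.character(toy_train$body)))
unseen
```

With many-level factors such as `fuelsystem` or `enginetype`, a rare level can land entirely in the test set, so this check is worth running on the real split as well.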

Once the data have been split into training and test sets, I start by modelling the training data directly with stepwise feature selection.
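
`step()` compares candidate models by AIC. For `lm` fits, the value it reports comes from `extractAIC()`, which computes n * log(RSS / n) + 2 * edf, where edf is the number of estimated coefficients. The sketch below recomputes the first two AIC values from the RSS figures printed in the trace, assuming the training set has 164 rows (80% of 205). The helper name `step_aic` is hypothetical.

```r
# AIC as reported by step() for lm models: n * log(RSS / n) + 2 * edf
# (step_aic is a hypothetical helper name used only in this sketch)
step_aic <- function(rss, n, edf) n * log(rss / n) + 2 * edf

n_train <- 164  # assumed: 80% of the 205 rows

# Intercept-only model (price ~ 1): one estimated coefficient
aic_null <- step_aic(rss = 10939509213, n = n_train, edf = 1)

# price ~ enginesize: intercept plus one slope
aic_enginesize <- step_aic(rss = 2674518013, n = n_train, edf = 2)

round(c(null = aic_null, enginesize = aic_enginesize), 2)
```

The results should land close to the 2956.59 and 2727.58 shown at the first two steps of the trace. At each step, `step()` adds or removes the term that gives the lowest AIC and stops when no move improves it.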

set.seed(1)

lm.none <- lm(price~1, data_train)

lm.all <- lm(price~., data_train)

carlm_stepwise <- step(object = lm.none, 
                   scope = list(lower = lm.none, upper = lm.all), 
                   direction = "both")
#> Start:  AIC=2956.59
#> price ~ 1
#> 
#>                    Df  Sum of Sq         RSS    AIC
#> + enginesize        1 8264991200  2674518013 2727.6
#> + curbweight        1 7420830084  3518679129 2772.6
#> + horsepower        1 7135120700  3804388513 2785.4
#> + cylindernumber    6 6951498007  3988011207 2803.1
#> + carwidth          1 5930289491  5009219722 2830.5
#> + highwaympg        1 5252791711  5686717502 2851.3
#> + citympg           1 5118986835  5820522378 2855.1
#> + carlength         1 4746240093  6193269120 2865.3
#> + drivewheel        2 4414452931  6525056282 2875.8
#> + boreratio         1 4011037542  6928471672 2883.7
#> + fuelsystem        5 3787940791  7151568422 2896.9
#> + wheelbase         1 2949794228  7989714985 2907.1
#> + enginetype        6 2660975201  8278534012 2922.9
#> + carbody           4 1752989250  9186519963 2935.9
#> + enginelocation    1 1362582691  9576926522 2936.8
#> + aspiration        1  361543375 10577965838 2953.1
#> <none>                           10939509213 2956.6
#> + peakrpm           1  115685389 10823823824 2956.8
#> + fueltype          1   87488732 10852020481 2957.3
#> + carheight         1   50058822 10889450391 2957.8
#> + stroke            1   37942915 10901566298 2958.0
#> + compressionratio  1   33463605 10906045608 2958.1
#> + doornumber        1    1280342 10938228872 2958.6
#> 
#> Step:  AIC=2727.58
#> price ~ enginesize
#> 
#>                    Df  Sum of Sq         RSS    AIC
#> + enginetype        6  708852837  1965665176 2689.1
#> + cylindernumber    6  643904063  2030613950 2694.4
#> + drivewheel        2  407632835  2266885178 2704.5
#> + horsepower        1  364775892  2309742121 2705.5
#> + fuelsystem        5  473854295  2200663719 2705.6
#> + enginelocation    1  338507832  2336010181 2707.4
#> + curbweight        1  324692187  2349825827 2708.3
#> + carwidth          1  287345658  2387172356 2710.9
#> + citympg           1  274272252  2400245761 2711.8
#> + highwaympg        1  240226854  2434291159 2714.1
#> + peakrpm           1  181205153  2493312860 2718.1
#> + stroke            1  144555148  2529962865 2720.5
#> + carlength         1  122718621  2551799393 2721.9
#> + aspiration        1  115424191  2559093822 2722.3
#> + boreratio         1   79787935  2594730079 2724.6
#> + wheelbase         1   56633300  2617884713 2726.1
#> + carbody           4  150321412  2524196602 2726.1
#> <none>                            2674518013 2727.6
#> + fueltype          1   23315593  2651202420 2728.1
#> + carheight         1   21278201  2653239813 2728.3
#> + compressionratio  1   20503572  2654014442 2728.3
#> + doornumber        1    4794717  2669723297 2729.3
#> - enginesize        1 8264991200 10939509213 2956.6
#> 
#> Step:  AIC=2689.07
#> price ~ enginesize + enginetype
#> 
#>                    Df  Sum of Sq        RSS    AIC
#> + cylindernumber    5  396735336 1568929840 2662.1
#> + stroke            1  218197421 1747467755 2671.8
#> + enginelocation    1  173308915 1792356262 2675.9
#> + horsepower        1  168716802 1796948374 2676.4
#> + curbweight        1  153152636 1812512541 2677.8
#> + peakrpm           1  146310269 1819354908 2678.4
#> + carwidth          1  134479127 1831186049 2679.4
#> + fuelsystem        5  201114272 1764550905 2681.4
#> + drivewheel        2  112318730 1853346447 2683.4
#> + aspiration        1   86989469 1878675708 2683.7
#> + carbody           4  146895483 1818769693 2684.3
#> + citympg           1   55630079 1910035097 2686.4
#> + highwaympg        1   54556918 1911108259 2686.5
#> + carheight         1   54524478 1911140698 2686.5
#> + carlength         1   33000085 1932665092 2688.3
#> <none>                           1965665176 2689.1
#> + wheelbase         1   11837558 1953827618 2690.1
#> + boreratio         1   11643525 1954021652 2690.1
#> + fueltype          1    9803681 1955861495 2690.2
#> + compressionratio  1    8912271 1956752906 2690.3
#> + doornumber        1    2774973 1962890203 2690.8
#> - enginetype        6  708852837 2674518013 2727.6
#> - enginesize        1 6312868835 8278534012 2922.9
#> 
#> Step:  AIC=2662.1
#> price ~ enginesize + enginetype + cylindernumber
#> 
#>                    Df  Sum of Sq        RSS    AIC
#> + horsepower        1  273017595 1295912245 2632.8
#> + stroke            1  247760370 1321169470 2635.9
#> + curbweight        1  196273462 1372656378 2642.2
#> + peakrpm           1  138950003 1429979837 2648.9
#> + enginelocation    1  134563326 1434366514 2649.4
#> + drivewheel        2  125388591 1443541249 2652.4
#> + aspiration        1  106974563 1461955277 2652.5
#> + fuelsystem        5  174432069 1394497771 2652.8
#> + carwidth          1  100558558 1468371282 2653.2
#> + citympg           1   82192458 1486737382 2655.3
#> + highwaympg        1   58404436 1510525404 2657.9
#> + carbody           4  109474446 1459455394 2658.2
#> + boreratio         1   49715856 1519213984 2658.8
#> + carlength         1   37878000 1531051840 2660.1
#> + carheight         1   19707730 1549222110 2662.0
#> <none>                           1568929840 2662.1
#> + doornumber        1    4765451 1564164389 2663.6
#> + compressionratio  1    4155616 1564774224 2663.7
#> + fueltype          1    3255069 1565674771 2663.8
#> + wheelbase         1     494742 1568435098 2664.1
#> - cylindernumber    5  396735336 1965665176 2689.1
#> - enginetype        5  461684110 2030613950 2694.4
#> - enginesize        1 1577749669 3146679509 2774.2
#> 
#> Step:  AIC=2632.75
#> price ~ enginesize + enginetype + cylindernumber + horsepower
#> 
#>                    Df Sum of Sq        RSS    AIC
#> + fuelsystem        5 213863249 1082048996 2613.2
#> + stroke            1 157084595 1138827649 2613.6
#> + compressionratio  1 128390790 1167521455 2617.6
#> + curbweight        1 112572079 1183340166 2619.8
#> + fueltype          1 102141265 1193770980 2621.3
#> + carbody           4 132157108 1163755137 2623.1
#> + carwidth          1  59232649 1236679595 2627.1
#> + drivewheel        2  69146545 1226765700 2627.8
#> + enginelocation    1  47965307 1247946938 2628.6
#> + carheight         1  34710761 1261201484 2630.3
#> + carlength         1  30427266 1265484979 2630.8
#> <none>                          1295912245 2632.8
#> + wheelbase         1  12710719 1283201526 2633.1
#> + aspiration        1  12590352 1283321893 2633.2
#> + peakrpm           1  10379960 1285532285 2633.4
#> + highwaympg        1   9750803 1286161442 2633.5
#> + boreratio         1   5919919 1289992325 2634.0
#> + citympg           1   5853206 1290059038 2634.0
#> + doornumber        1   3991043 1291921202 2634.2
#> - horsepower        1 273017595 1568929840 2662.1
#> - enginesize        1 372840917 1668753162 2672.2
#> - cylindernumber    5 501036129 1796948374 2676.4
#> - enginetype        5 533819009 1829731254 2679.3
#> 
#> Step:  AIC=2613.17
#> price ~ enginesize + enginetype + cylindernumber + horsepower + 
#>     fuelsystem
#> 
#>                    Df Sum of Sq        RSS    AIC
#> + stroke            1 140678263  941370733 2592.3
#> + curbweight        1  50730963 1031318034 2607.3
#> + carbody           4  86890787  995158209 2607.4
#> + peakrpm           1  42534077 1039514919 2608.6
#> + enginelocation    1  27554626 1054494370 2610.9
#> + compressionratio  1  17272616 1064776381 2612.5
#> <none>                          1082048996 2613.2
#> + carwidth          1  12805578 1069243419 2613.2
#> + drivewheel        2  23886207 1058162790 2613.5
#> + citympg           1   4661662 1077387334 2614.5
#> + carlength         1   2509043 1079539954 2614.8
#> + carheight         1   1254573 1080794423 2615.0
#> + boreratio         1    588198 1081460798 2615.1
#> + highwaympg        1    480891 1081568106 2615.1
#> + aspiration        1    284797 1081764199 2615.1
#> + wheelbase         1      1632 1082047364 2615.2
#> + doornumber        1       177 1082048819 2615.2
#> - fuelsystem        5 213863249 1295912245 2632.8
#> - enginesize        1 196905072 1278954068 2638.6
#> - horsepower        1 312448775 1394497771 2652.8
#> - cylindernumber    5 487221643 1569270639 2664.1
#> - enginetype        5 560219831 1642268827 2671.6
#> 
#> Step:  AIC=2592.33
#> price ~ enginesize + enginetype + cylindernumber + horsepower + 
#>     fuelsystem + stroke
#> 
#>                    Df Sum of Sq        RSS    AIC
#> + peakrpm           1  61826746  879543987 2583.2
#> + carbody           4  72451418  868919315 2587.2
#> + enginelocation    1  30212317  911158416 2589.0
#> + curbweight        1  24528103  916842630 2590.0
#> <none>                           941370733 2592.3
#> + carwidth          1  10510072  930860661 2592.5
#> + aspiration        1   4296165  937074568 2593.6
#> + boreratio         1   2517163  938853570 2593.9
#> + compressionratio  1   2207814  939162919 2593.9
#> + citympg           1    610659  940760074 2594.2
#> + wheelbase         1    500410  940870323 2594.2
#> + doornumber        1    119544  941251189 2594.3
#> + carlength         1    107297  941263436 2594.3
#> + highwaympg        1     94986  941275747 2594.3
#> + carheight         1     68611  941302122 2594.3
#> + drivewheel        2   4023076  937347657 2595.6
#> - stroke            1 140678263 1082048996 2613.2
#> - fuelsystem        5 197456916 1138827649 2613.6
#> - horsepower        1 256720021 1198090754 2629.9
#> - enginesize        1 320826141 1262196874 2638.4
#> - cylindernumber    5 476157667 1417528400 2649.5
#> - enginetype        5 541651000 1483021733 2656.9
#> 
#> Step:  AIC=2583.19
#> price ~ enginesize + enginetype + cylindernumber + horsepower + 
#>     fuelsystem + stroke + peakrpm
#> 
#>                    Df Sum of Sq        RSS    AIC
#> + curbweight        1  45771282  833772705 2576.4
#> + carbody           4  68754502  810789485 2577.8
#> + aspiration        1  31857289  847686698 2579.1
#> + carwidth          1  19378638  860165349 2581.5
#> <none>                           879543987 2583.2
#> + enginelocation    1   8394366  871149621 2583.6
#> + compressionratio  1   2356091  877187896 2584.8
#> + doornumber        1   1454992  878088995 2584.9
#> + citympg           1   1037238  878506749 2585.0
#> + carheight         1    792329  878751658 2585.0
#> + carlength         1    654141  878889846 2585.1
#> + wheelbase         1    176886  879367101 2585.2
#> + highwaympg        1    112498  879431490 2585.2
#> + boreratio         1     29594  879514393 2585.2
#> + drivewheel        2    651626  878892361 2587.1
#> - peakrpm           1  61826746  941370733 2592.3
#> - horsepower        1 121787705 1001331692 2602.5
#> - stroke            1 159970932 1039514919 2608.6
#> - fuelsystem        5 237287221 1116831208 2612.4
#> - enginesize        1 379443776 1258987763 2640.0
#> - cylindernumber    5 445347305 1324891292 2640.4
#> - enginetype        5 544576119 1424120106 2652.2
#> 
#> Step:  AIC=2576.42
#> price ~ enginesize + enginetype + cylindernumber + horsepower + 
#>     fuelsystem + stroke + peakrpm + curbweight
#> 
#>                    Df Sum of Sq        RSS    AIC
#> + carbody           4  76409403  757363302 2568.7
#> + enginelocation    1  37599633  796173072 2570.9
#> + carlength         1  21724691  812048014 2574.1
#> + highwaympg        1  16537703  817235002 2575.1
#> + wheelbase         1  16503291  817269415 2575.1
#> + aspiration        1  16459528  817313178 2575.2
#> <none>                           833772705 2576.4
#> + citympg           1   6603893  827168812 2577.1
#> + carheight         1   5844545  827928160 2577.3
#> + doornumber        1   3695899  830076806 2577.7
#> + carwidth          1   2432602  831340103 2577.9
#> + drivewheel        2  11263994  822508711 2578.2
#> + boreratio         1   1085443  832687262 2578.2
#> + compressionratio  1    193412  833579293 2578.4
#> - curbweight        1  45771282  879543987 2583.2
#> - horsepower        1  81291920  915064625 2589.7
#> - peakrpm           1  83069925  916842630 2590.0
#> - stroke            1 127733323  961506028 2597.8
#> - fuelsystem        5 179555406 1013328111 2598.4
#> - enginesize        1 184792960 1018565665 2607.2
#> - cylindernumber    5 444245942 1278018648 2636.5
#> - enginetype        5 573696769 1407469474 2652.3
#> 
#> Step:  AIC=2568.66
#> price ~ enginesize + enginetype + cylindernumber + horsepower + 
#>     fuelsystem + stroke + peakrpm + curbweight + carbody
#> 
#>                    Df Sum of Sq        RSS    AIC
#> + enginelocation    1  23877034  733486268 2565.4
#> + aspiration        1  21275290  736088013 2566.0
#> + highwaympg        1  17415047  739948255 2566.8
#> + citympg           1  11411423  745951879 2568.2
#> + carlength         1  10517499  746845804 2568.4
#> <none>                           757363302 2568.7
#> + carwidth          1   9163362  748199940 2568.7
#> + doornumber        1   4836478  752526824 2569.6
#> + boreratio         1   2390917  754972385 2570.1
#> + compressionratio  1    930531  756432771 2570.5
#> + wheelbase         1    720387  756642916 2570.5
#> + carheight         1     40741  757322561 2570.7
#> + drivewheel        2   6107167  751256135 2571.3
#> - carbody           4  76409403  833772705 2576.4
#> - curbweight        1  53426183  810789485 2577.8
#> - horsepower        1  73542247  830905549 2581.9
#> - peakrpm           1  80620847  837984149 2583.2
#> - fuelsystem        5 144197733  901561035 2587.2
#> - stroke            1 102731259  860094561 2587.5
#> - enginesize        1 118416249  875779551 2590.5
#> - cylindernumber    5 393442907 1150806209 2627.3
#> - enginetype        5 529436585 1286799887 2645.6
#> 
#> Step:  AIC=2565.41
#> price ~ enginesize + enginetype + cylindernumber + horsepower + 
#>     fuelsystem + stroke + peakrpm + curbweight + carbody + enginelocation
#> 
#>                    Df Sum of Sq        RSS    AIC
#> + aspiration        1  35698921  697787347 2559.2
#> + carwidth          1  13731321  719754947 2564.3
#> + highwaympg        1  12287973  721198295 2564.6
#> + citympg           1  10442137  723044131 2565.1
#> + carlength         1  10236669  723249599 2565.1
#> <none>                           733486268 2565.4
#> + doornumber        1   8297544  725188723 2565.5
#> + compressionratio  1   6033311  727452957 2566.1
#> + carheight         1   1098016  732388252 2567.2
#> + boreratio         1    861583  732624685 2567.2
#> + wheelbase         1       759  733485509 2567.4
#> + drivewheel        2   3645441  729840827 2568.6
#> - enginelocation    1  23877034  757363302 2568.7
#> - carbody           4  62686805  796173072 2570.9
#> - horsepower        1  40790727  774276995 2572.3
#> - peakrpm           1  48012429  781498697 2573.8
#> - fuelsystem        5  89838187  823324455 2574.4
#> - curbweight        1  71098725  804584993 2578.6
#> - stroke            1  98126452  831612720 2584.0
#> - enginesize        1 101762823  835249091 2584.7
#> - cylindernumber    5 366118397 1099604665 2621.8
#> - enginetype        5 398339258 1131825526 2626.6
#> 
#> Step:  AIC=2559.22
#> price ~ enginesize + enginetype + cylindernumber + horsepower + 
#>     fuelsystem + stroke + peakrpm + curbweight + carbody + enginelocation + 
#>     aspiration
#> 
#>                    Df Sum of Sq        RSS    AIC
#> + highwaympg        1  19859654  677927693 2556.5
#> + carwidth          1  17847579  679939768 2557.0
#> - horsepower        1    351796  698139143 2557.3
#> + citympg           1  10235364  687551983 2558.8
#> <none>                           697787347 2559.2
#> + doornumber        1   5715287  692072060 2559.9
#> + carlength         1   3457215  694330132 2560.4
#> + compressionratio  1    639151  697148196 2561.1
#> + carheight         1    561863  697225484 2561.1
#> + wheelbase         1    166574  697620773 2561.2
#> + boreratio         1        24  697787323 2561.2
#> + drivewheel        2   6213609  691573738 2561.8
#> - fuelsystem        5  56972545  754759892 2562.1
#> - carbody           4  62100903  759888250 2565.2
#> - aspiration        1  35698921  733486268 2565.4
#> - enginelocation    1  38300666  736088013 2566.0
#> - curbweight        1  54780788  752568135 2569.6
#> - peakrpm           1  68903103  766690450 2572.7
#> - stroke            1 120127519  817914865 2583.3
#> - enginesize        1 137326654  835114001 2586.7
#> - cylindernumber    5 309305195 1007092542 2609.4
#> - enginetype        5 392490089 1090277436 2622.4
#> 
#> Step:  AIC=2556.49
#> price ~ enginesize + enginetype + cylindernumber + horsepower + 
#>     fuelsystem + stroke + peakrpm + curbweight + carbody + enginelocation + 
#>     aspiration + highwaympg
#> 
#>                    Df Sum of Sq        RSS    AIC
#> - fuelsystem        5  30996655  708924348 2553.8
#> - horsepower        1   1160518  679088211 2554.8
#> + carwidth          1  15101336  662826356 2554.8
#> <none>                           677927693 2556.5
#> + doornumber        1   6134196  671793497 2557.0
#> + compressionratio  1   4698942  673228751 2557.3
#> + citympg           1   1464255  676463438 2558.1
#> + carlength         1    694005  677233687 2558.3
#> + boreratio         1    180279  677747414 2558.4
#> + carheight         1     20895  677906797 2558.5
#> + wheelbase         1       599  677927094 2558.5
#> + drivewheel        2   6962027  670965666 2558.8
#> - highwaympg        1  19859654  697787347 2559.2
#> - enginelocation    1  32312933  710240625 2562.1
#> - carbody           4  63140267  741067959 2563.1
#> - aspiration        1  43270602  721198295 2564.6
#> - curbweight        1  72927964  750855657 2571.2
#> - peakrpm           1  76961677  754889370 2572.1
#> - stroke            1 121744957  799672650 2581.6
#> - enginesize        1 138576327  816504020 2585.0
#> - cylindernumber    5 305807534  983735227 2607.6
#> - enginetype        5 389141577 1067069269 2620.9
#> 
#> Step:  AIC=2553.82
#> price ~ enginesize + enginetype + cylindernumber + horsepower + 
#>     stroke + peakrpm + curbweight + carbody + enginelocation + 
#>     aspiration + highwaympg
#> 
#>                    Df Sum of Sq        RSS    AIC
#> + carwidth          1  22865581  686058767 2550.4
#> - horsepower        1    983309  709907657 2552.1
#> <none>                           708924348 2553.8
#> + doornumber        1   5502922  703421426 2554.5
#> + carheight         1   2602552  706321796 2555.2
#> + wheelbase         1   2325378  706598970 2555.3
#> + fueltype          1   2271928  706652420 2555.3
#> + compressionratio  1   2266198  706658150 2555.3
#> + boreratio         1   1586958  707337390 2555.4
#> + citympg           1    109545  708814803 2555.8
#> + carlength         1    101289  708823059 2555.8
#> + drivewheel        2   7101361  701822987 2556.2
#> + fuelsystem        5  30996655  677927693 2556.5
#> - enginelocation    1  40211349  749135697 2560.9
#> - highwaympg        1  45835544  754759892 2562.1
#> - carbody           4  75240207  784164555 2562.4
#> - aspiration        1  57048982  765973330 2564.5
#> - peakrpm           1  88798200  797722548 2571.2
#> - curbweight        1 115605233  824529581 2576.6
#> - enginesize        1 141124970  850049318 2581.6
#> - stroke            1 153242774  862167122 2583.9
#> - cylindernumber    5 329923738 1038848086 2606.5
#> - enginetype        5 381019239 1089943587 2614.4
#> 
#> Step:  AIC=2550.44
#> price ~ enginesize + enginetype + cylindernumber + horsepower + 
#>     stroke + peakrpm + curbweight + carbody + enginelocation + 
#>     aspiration + highwaympg + carwidth
#> 
#>                    Df Sum of Sq        RSS    AIC
#> - horsepower        1     92418  686151185 2548.5
#> <none>                           686058767 2550.4
#> + doornumber        1   6595626  679463140 2550.9
#> + carlength         1   3811778  682246988 2551.5
#> + carheight         1   1165546  684893220 2552.2
#> + fueltype          1    816471  685242296 2552.2
#> + boreratio         1    472922  685585845 2552.3
#> + wheelbase         1    403908  685654858 2552.3
#> + compressionratio  1    355416  685703350 2552.4
#> + citympg           1    348838  685709928 2552.4
#> + drivewheel        2   2815501  683243266 2553.8
#> - carwidth          1  22865581  708924348 2553.8
#> + fuelsystem        5  23232410  662826356 2554.8
#> - highwaympg        1  36800737  722859504 2557.0
#> - enginelocation    1  48431713  734490480 2559.6
#> - curbweight        1  48644219  734702986 2559.7
#> - carbody           4  79287534  765346301 2560.4
#> - aspiration        1  63803023  749861789 2563.0
#> - peakrpm           1  91382621  777441388 2568.9
#> - enginesize        1 145264872  831323638 2579.9
#> - stroke            1 155310978  841369744 2581.9
#> - cylindernumber    5 313744258  999803024 2602.2
#> - enginetype        5 390727930 1076786697 2614.4
#> 
#> Step:  AIC=2548.47
#> price ~ enginesize + enginetype + cylindernumber + stroke + peakrpm + 
#>     curbweight + carbody + enginelocation + aspiration + highwaympg + 
#>     carwidth
#> 
#>                    Df Sum of Sq        RSS    AIC
#> <none>                           686151185 2548.5
#> + doornumber        1   6655136  679496049 2548.9
#> + carlength         1   3813832  682337353 2549.6
#> + carheight         1   1140456  685010728 2550.2
#> + boreratio         1    556023  685595162 2550.3
#> + wheelbase         1    474898  685676287 2550.3
#> + fueltype          1    454811  685696374 2550.4
#> + citympg           1    434647  685716538 2550.4
#> + compressionratio  1    147894  686003290 2550.4
#> + horsepower        1     92418  686058767 2550.4
#> + drivewheel        2   2838321  683312863 2551.8
#> - carwidth          1  23756472  709907657 2552.1
#> + fuelsystem        5  22976766  663174418 2552.9
#> - highwaympg        1  45947909  732099093 2557.1
#> - curbweight        1  48556451  734707635 2557.7
#> - carbody           4  79204795  765355980 2558.4
#> - enginelocation    1  54507088  740658273 2559.0
#> - aspiration        1  92388525  778539710 2567.2
#> - peakrpm           1 122853569  809004753 2573.5
#> - stroke            1 169740570  855891755 2582.7
#> - enginesize        1 192847357  878998542 2587.1
#> - cylindernumber    5 320464121 1006615306 2601.3
#> - enginetype        5 391148835 1077300019 2612.4
summary(carlm_stepwise)
#> 
#> Call:
#> lm(formula = price ~ enginesize + enginetype + cylindernumber + 
#>     stroke + peakrpm + curbweight + carbody + enginelocation + 
#>     aspiration + highwaympg + carwidth, data = data_train)
#> 
#> Residuals:
#>    Min     1Q Median     3Q    Max 
#>  -5623  -1095      0   1124  10118 
#> 
#> Coefficients: (1 not defined because of singularities)
#>                         Estimate  Std. Error t value      Pr(>|t|)    
#> (Intercept)          -37509.8120  13875.8552  -2.703      0.007717 ** 
#> enginesize              126.1726     20.1143   6.273 0.00000000414 ***
#> enginetypedohcv       -8632.6313   3139.2174  -2.750      0.006748 ** 
#> enginetypel             495.5490   1249.4501   0.397      0.692256    
#> enginetypeohc          3240.0641    852.9449   3.799      0.000216 ***
#> enginetypeohcf          304.0710   1372.3137   0.222      0.824967    
#> enginetypeohcv        -6940.6205   1198.3983  -5.792 0.00000004402 ***
#> enginetyperotor       -1844.8098   3520.2444  -0.524      0.601067    
#> cylindernumberfive   -10609.8721   2673.8108  -3.968      0.000115 ***
#> cylindernumberfour   -12573.3466   2610.1193  -4.817 0.00000373654 ***
#> cylindernumbersix     -8336.3775   2121.5934  -3.929      0.000133 ***
#> cylindernumberthree   -5684.2351   3911.4000  -1.453      0.148393    
#> cylindernumbertwelve -13382.3405   2904.4368  -4.608 0.00000907684 ***
#> cylindernumbertwo             NA          NA      NA            NA    
#> stroke                -5005.8464    850.6102  -5.885 0.00000002805 ***
#> peakrpm                   2.6347      0.5262   5.007 0.00000163876 ***
#> curbweight                4.7769      1.5176   3.148      0.002012 ** 
#> carbodyhardtop        -3108.0866   1330.5722  -2.336      0.020916 *  
#> carbodyhatchback      -3723.8762   1134.3907  -3.283      0.001298 ** 
#> carbodysedan          -2784.2683   1114.1853  -2.499      0.013613 *  
#> carbodywagon          -3788.1228   1213.5771  -3.121      0.002187 ** 
#> enginelocationrear     7476.3848   2241.8732   3.335      0.001092 ** 
#> aspirationturbo        2547.3549    586.7138   4.342 0.00002693543 ***
#> highwaympg              173.5071     56.6670   3.062      0.002638 ** 
#> carwidth                485.9678    220.7305   2.202      0.029328 *  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 2214 on 140 degrees of freedom
#> Multiple R-squared:  0.9373, Adjusted R-squared:  0.927 
#> F-statistic: 90.96 on 23 and 140 DF,  p-value: < 0.00000000000000022
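
As a side check, the AIC values in the stepwise trace can be reproduced from the RSS column. For a linear model, `step()` relies on `extractAIC()`, which computes `n * log(RSS / n) + 2 * edf`. The sketch below is only an arithmetic check: it assumes n = 164 training rows and edf = 24 estimated coefficients (both inferred from the 23 and 140 degrees of freedom in the summary, not printed directly), and plugs in the RSS of the final model from the trace.

```r
# Values read from the output above; n and edf are inferred, not printed directly.
n   <- 164          # training rows: 140 residual df + 24 estimated coefficients
edf <- 24           # intercept + 23 estimable slopes (cylindernumbertwo is NA)
rss <- 686151185    # RSS of the final model in the stepwise trace

# Same formula that extractAIC() applies to lm fits with penalty k = 2
aic_step <- n * log(rss / n) + 2 * edf
round(aic_step, 2)
```

The result should agree with the last "Step: AIC=2548.47" line of the trace, up to rounding.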

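The model summary above contains a coefficient reported as NA. As the note "1 not defined because of singularities" indicates, cylindernumbertwo could not be estimated: its dummy column is an exact linear combination of other columns in the design matrix, so the cross-product matrix is singular (its determinant is zero) and the least-squares solution is not unique. In other words, there is perfect multicollinearity between this level and other predictors. I will try removing the cylindernumber variable.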
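
To illustrate how such an NA arises, here is a minimal sketch on toy data. The data frame `toy` and all its values are hypothetical and are not taken from the car data set. It is built so that every "rotor" engine has "two" cylinders and no other engine does, which makes two dummy columns identical. One plausible explanation for the NA in the car data, not verified here, is a similar overlap between the rotor engine type and the two-cylinder level.

```r
# Toy data (hypothetical values, not the car data set):
# every "rotor" engine has "two" cylinders, and no other engine does.
toy <- data.frame(
  enginetype = factor(c("ohc", "ohc", "ohc", "dohc", "dohc",
                        "dohc", "rotor", "rotor", "ohc", "dohc")),
  cylinders  = factor(c("four", "six", "four", "four", "six",
                        "six", "two", "two", "six", "four")),
  price      = c(9000, 15000, 8500, 12000, 18000,
                 17500, 11000, 13500, 16000, 12500)
)

# The cross-tabulation shows the overlap between the two factors
table(toy$enginetype, toy$cylinders)

# The dummy columns enginetyperotor and cylinderstwo are identical,
# so lm() can estimate only one of them and reports the other as NA
fit_toy <- lm(price ~ enginetype + cylinders, data = toy)
coef(fit_toy)

# alias() lists which coefficients are linear combinations of others
alias(fit_toy)
```

Running the same kind of cross-tabulation on the original enginetype and cylindernumber columns would show whether this is what happens in the car data.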

# Keep only the predictors retained by the stepwise search, minus cylindernumber
carprice <- carprice %>%
  select(enginesize, enginetype, stroke, peakrpm, curbweight, carbody,
         enginelocation, aspiration, highwaympg, carwidth, price)

# Split 80/20 into training and test sets with a fixed seed
set.seed(1)
index <- sample(nrow(carprice), nrow(carprice)*0.80)

data_train <- carprice[index,]
data_test <- carprice[-index,]
set.seed(1)
carlm_stepwise2<-lm(formula = price ~ ., data = data_train)
summary(carlm_stepwise2)
#> 
#> Call:
#> lm(formula = price ~ ., data = data_train)
#> 
#> Residuals:
#>    Min     1Q Median     3Q    Max 
#>  -9297  -1348    -94   1252  12161 
#> 
#> Coefficients:
#>                       Estimate  Std. Error t value             Pr(>|t|)    
#> (Intercept)        -54317.8101  13694.8150  -3.966             0.000114 ***
#> enginesize            152.9313     15.9287   9.601 < 0.0000000000000002 ***
#> enginetypedohcv       544.7787   3041.2348   0.179             0.858085    
#> enginetypel          -205.6431   1314.8680  -0.156             0.875937    
#> enginetypeohc        2126.7167    958.2692   2.219             0.028017 *  
#> enginetypeohcf       -766.9703   1580.5946  -0.485             0.628237    
#> enginetypeohcv      -4613.1635   1328.6839  -3.472             0.000681 ***
#> enginetyperotor     10729.9489   1865.8970   5.751         0.0000000508 ***
#> stroke              -4804.8970    888.5169  -5.408         0.0000002569 ***
#> peakrpm                 2.3892      0.6105   3.914             0.000139 ***
#> curbweight              4.0570      1.6188   2.506             0.013307 *  
#> carbodyhardtop      -3136.4699   1559.2275  -2.012             0.046120 *  
#> carbodyhatchback    -4195.9154   1253.7311  -3.347             0.001042 ** 
#> carbodysedan        -3238.5381   1229.3446  -2.634             0.009345 ** 
#> carbodywagon        -4290.4251   1380.4301  -3.108             0.002267 ** 
#> enginelocationrear   9422.0032   2423.3404   3.888             0.000153 ***
#> aspirationturbo      2201.2510    668.3047   3.294             0.001242 ** 
#> highwaympg            183.1757     64.7844   2.827             0.005356 ** 
#> carwidth              563.5526    222.5759   2.532             0.012408 *  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 2635 on 145 degrees of freedom
#> Multiple R-squared:  0.908,  Adjusted R-squared:  0.8966 
#> F-statistic: 79.49 on 18 and 145 DF,  p-value: < 0.00000000000000022

The output above no longer contains any NA coefficients. Next, I check the variance inflation factor (VIF) to see whether strong multicollinearity remains:

library(car)
vif(carlm_stepwise2)
#>                     GVIF Df GVIF^(1/(2*Df))
#> enginesize     11.411133  1        3.378037
#> enginetype     15.468897  6        1.256382
#> stroke          1.757303  1        1.325633
#> peakrpm         2.034334  1        1.426301
#> curbweight     17.380957  1        4.169048
#> carbody         2.959622  4        1.145261
#> enginelocation  2.491361  1        1.578404
#> aspiration      1.657042  1        1.287262
#> highwaympg      4.859232  1        2.204367
#> carwidth        5.365915  1        2.316444
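
For reference, the VIF of a numeric predictor is 1 / (1 - R²), where R² comes from regressing that predictor on all the other predictors. The sketch below computes it by hand in base R. The helper `vif_manual` is a hypothetical name, and the built-in `mtcars` data with four of its columns is a stand-in, not the car price data.

```r
# Manual VIF for numeric predictors: VIF_j = 1 / (1 - R2_j), where R2_j is the
# R-squared from regressing predictor j on all the other predictors.
# mtcars and the chosen columns are stand-ins, not the car price data.
vif_manual <- function(data, predictors) {
  sapply(predictors, function(v) {
    others <- setdiff(predictors, v)
    r2 <- summary(lm(reformulate(others, response = v), data = data))$r.squared
    1 / (1 - r2)
  })
}

vif_values <- vif_manual(mtcars, c("wt", "hp", "disp", "qsec"))
round(vif_values, 2)
```

For multi-level factors such as enginetype, `car::vif()` reports a generalized VIF instead. The GVIF^(1/(2*Df)) column is scaled to be comparable across terms with different degrees of freedom; for a term with one degree of freedom it equals the square root of the GVIF (for example, 3.378 for enginesize is the square root of 11.411).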

The output above shows that some GVIF values still exceed 10 (curbweight at 17.4, enginetype at 15.5 and enginesize at 11.4). I will remove variables starting with the one that has the largest value, curbweight.

carlm_stepwise3<-lm(formula = price ~ enginesize + enginetype + stroke + enginelocation + 
    carwidth + aspiration + carbody + peakrpm + highwaympg, data = data_train)

summary(carlm_stepwise3)
#> 
#> Call:
#> lm(formula = price ~ enginesize + enginetype + stroke + enginelocation + 
#>     carwidth + aspiration + carbody + peakrpm + highwaympg, data = data_train)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -10258.3  -1444.1   -261.4   1111.2  12622.1 
#> 
#> Coefficients:
#>                       Estimate  Std. Error t value             Pr(>|t|)    
#> (Intercept)        -62130.2817  13574.3705  -4.577         0.0000100078 ***
#> enginesize            175.7998     13.2902  13.228 < 0.0000000000000002 ***
#> enginetypedohcv       -33.5250   3086.8276  -0.011             0.991349    
#> enginetypel            24.9066   1335.1583   0.019             0.985142    
#> enginetypeohc        1552.0835    947.1139   1.639             0.103418    
#> enginetypeohcf      -2005.6214   1528.2512  -1.312             0.191457    
#> enginetypeohcv      -5505.2896   1303.0607  -4.225         0.0000419059 ***
#> enginetyperotor     11017.1447   1895.7591   5.811         0.0000000374 ***
#> stroke              -4931.2368    902.9873  -5.461         0.0000001987 ***
#> enginelocationrear   9961.9746   2457.0118   4.055         0.0000814275 ***
#> carwidth              867.1053    190.0830   4.562         0.0000106679 ***
#> aspirationturbo      2731.6834    645.2697   4.233         0.0000405186 ***
#> carbodyhardtop      -3731.1246   1568.6935  -2.378             0.018678 *  
#> carbodyhatchback    -4532.3695   1268.8677  -3.572             0.000480 ***
#> carbodysedan        -3280.3077   1251.2664  -2.622             0.009679 ** 
#> carbodywagon        -3538.1621   1371.5554  -2.580             0.010877 *  
#> peakrpm                 2.0652      0.6074   3.400             0.000868 ***
#> highwaympg            113.5338     59.5715   1.906             0.058636 .  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 2682 on 146 degrees of freedom
#> Multiple R-squared:  0.904,  Adjusted R-squared:  0.8928 
#> F-statistic: 80.87 on 17 and 146 DF,  p-value: < 0.00000000000000022
vif(carlm_stepwise3)
#>                     GVIF Df GVIF^(1/(2*Df))
#> enginesize      7.666567  1        2.768857
#> enginetype     11.577440  6        1.226406
#> stroke          1.751647  1        1.323498
#> enginelocation  2.471668  1        1.572154
#> carwidth        3.776955  1        1.943439
#> aspiration      1.490853  1        1.221005
#> carbody         2.223776  4        1.105061
#> peakrpm         1.943091  1        1.393948
#> highwaympg      3.965257  1        1.991295

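Because the GVIF of enginetype is still above 10 (11.6), I will remove that variable as well. The formula below also leaves out highwaympg, which was not significant at the 5% level in the previous model (p about 0.059).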

carlm_stepwise4<-lm(formula = price ~ enginesize + stroke + enginelocation + 
    carwidth + aspiration + carbody + peakrpm, data = data_train)

summary(carlm_stepwise4)
#> 
#> Call:
#> lm(formula = price ~ enginesize + stroke + enginelocation + carwidth + 
#>     aspiration + carbody + peakrpm, data = data_train)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -6777.5 -1630.6  -309.3  1298.2 15998.5 
#> 
#> Coefficients:
#>                      Estimate Std. Error t value             Pr(>|t|)    
#> (Intercept)        -78418.081  12180.458  -6.438        0.00000000148 ***
#> enginesize            116.790      9.475  12.326 < 0.0000000000000002 ***
#> stroke              -2029.285    864.337  -2.348              0.02017 *  
#> enginelocationrear  11241.123   2285.116   4.919        0.00000222102 ***
#> carwidth             1164.625    187.348   6.216        0.00000000462 ***
#> aspirationturbo      1694.267    686.472   2.468              0.01469 *  
#> carbodyhardtop      -3094.881   1778.015  -1.741              0.08376 .  
#> carbodyhatchback    -4153.508   1431.839  -2.901              0.00427 ** 
#> carbodysedan        -3211.359   1401.371  -2.292              0.02329 *  
#> carbodywagon        -4405.455   1539.899  -2.861              0.00482 ** 
#> peakrpm                 1.914      0.571   3.353              0.00101 ** 
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 3120 on 153 degrees of freedom
#> Multiple R-squared:  0.8638, Adjusted R-squared:  0.8549 
#> F-statistic: 97.06 on 10 and 153 DF,  p-value: < 0.00000000000000022
vif(carlm_stepwise4)
#>                    GVIF Df GVIF^(1/(2*Df))
#> enginesize     2.879313  1        1.696854
#> stroke         1.185783  1        1.088936
#> enginelocation 1.579603  1        1.256822
#> carwidth       2.710869  1        1.646472
#> aspiration     1.246677  1        1.116547
#> carbody        1.636313  4        1.063490
#> peakrpm        1.269048  1        1.126520

Performance

library(MLmetrics)
data_train$pred <- predict(carlm_stepwise4, newdata = data_train)
evaluasi_data_train<-data_train %>% 
  select(price, pred)

RMSE(y_pred=evaluasi_data_train$pred, y_true =  evaluasi_data_train$price)
#> [1] 3013.768
MAPE(y_pred = evaluasi_data_train$pred, y_true =  evaluasi_data_train$price)
#> [1] 0.1688136
data_test$pred <- predict(carlm_stepwise4, newdata = data_test)
evaluasi_data_test<-data_test %>% 
  select(price, pred)

RMSE(y_pred=evaluasi_data_test$pred, y_true =  evaluasi_data_test$price)
#> [1] 2466.849
MAPE(y_pred = evaluasi_data_test$pred, y_true =  evaluasi_data_test$price)
#> [1] 0.1524715

The selected model has an adjusted R-squared of about 85% (0.8549). Its MAPE is about 17% (0.169) on the training data and about 15% (0.152) on the test data, with an RMSE of roughly 3,014 and 2,467 respectively.
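
To make the two metrics concrete, here is a small sketch that computes them by hand on made-up numbers. The function names `rmse_manual` and `mape_manual` and the vectors are hypothetical; the formulas follow the usual definitions of root mean squared error and mean absolute percentage error.

```r
# Hand-rolled versions of the two metrics, on made-up numbers
rmse_manual <- function(y_pred, y_true) sqrt(mean((y_true - y_pred)^2))
mape_manual <- function(y_pred, y_true) mean(abs((y_true - y_pred) / y_true))

y_true <- c(100, 200, 400)   # hypothetical actual prices
y_pred <- c(110, 180, 400)   # hypothetical predictions

rmse_manual(y_pred, y_true)  # sqrt((10^2 + 20^2 + 0^2) / 3)
mape_manual(y_pred, y_true)  # (0.10 + 0.10 + 0) / 3
```

RMSE is expressed in the same unit as price and penalizes large misses more heavily, while MAPE is a unit-free proportion, which is why the values above are read as percentages.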

Variable Importance

Next, I look at which variables in this model contribute most to explaining price, using the following function:

library(relaimpo)
ins_model4_shapley<-calc.relimp(carlm_stepwise4)
ins_model4_shapley
#> Response variable: price 
#> Total response variance: 67113553 
#> Analysis based on 164 observations 
#> 
#> 10 Regressors: 
#> Some regressors combined in groups: 
#>         Group  carbody : carbodyhardtop carbodyhatchback carbodysedan carbodywagon 
#> 
#>  Relative importance of 7 (groups of) regressors assessed: 
#>  carbody enginesize stroke enginelocation carwidth aspiration peakrpm 
#>  
#> Proportion of variance explained by model: 86.38%
#> Metrics are not normalized (rela=FALSE). 
#> 
#> Relative importance metrics: 
#> 
#>                        lmg
#> carbody        0.063542466
#> enginesize     0.421097664
#> stroke         0.005728808
#> enginelocation 0.070039328
#> carwidth       0.276153620
#> aspiration     0.016263642
#> peakrpm        0.011009459
#> 
#> Average coefficients for different model sizes: 
#> 
#>                         1group       2groups       3groups      4groups
#> enginesize          162.700455   156.6728650   149.1876355   140.980569
#> stroke             1566.967232   439.8045967  -422.9649738 -1068.149680
#> enginelocation    21509.474118 19874.5747269 18098.5121315 16311.580952
#> carwidth           2808.318905  2521.3993360  2245.8740246  1978.800531
#> aspiration         3746.624537  2750.8754147  2103.4901680  1722.263602
#> carbodyhardtop     1890.785714   240.7076402 -1047.4396351 -1988.911125
#> carbodyhatchback -11333.091948 -9874.4609392 -8309.5380607 -6868.928621
#> carbodysedan      -7578.980000 -7072.3112026 -6233.2027498 -5322.486735
#> carbodywagon      -9887.111111 -8978.8155487 -7850.5033758 -6736.726776
#> peakrpm              -1.747306    -0.5323476     0.4019635     1.074713
#>                      5groups      6groups      7groups
#> enginesize         132.57508   124.387167   116.790407
#> stroke           -1530.85799 -1840.322456 -2029.284602
#> enginelocation   14590.16580 12928.653388 11241.123500
#> carwidth          1715.18094  1447.326097  1164.625178
#> aspiration        1545.70391  1538.658055  1694.266897
#> carbodyhardtop   -2623.63612 -2986.344665 -3094.881470
#> carbodyhatchback -5662.40564 -4746.436528 -4153.508115
#> carbodysedan     -4472.50430 -3755.293299 -3211.358753
#> carbodywagon     -5753.76423 -4964.855624 -4405.454822
#> peakrpm              1.51928     1.780429     1.914434
nilai_importance<-ins_model4_shapley$lmg
as.data.frame(nilai_importance) %>%
  tibble::rownames_to_column("variabel") %>%
  ggplot(aes(x = reorder(variabel, -nilai_importance), y = nilai_importance)) +
  geom_bar(aes(fill = nilai_importance), stat = "identity") +
  labs(x = "variabel")

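Based on the results above, enginesize (lmg about 0.42) and carwidth (lmg about 0.28) are the most influential variables in determining car price. The metrics are not normalized, so each value is a share of the total variance in price, and the seven shares together add up to the model's R-squared of 86.38%.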
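
The lmg metric averages each regressor's contribution to R-squared over all orders in which the regressors can enter the model. With only two regressors there are just two orders, so the calculation can be sketched by hand. The example below uses the built-in `mtcars` data as a stand-in and is not the computation performed on the car price model.

```r
# LMG by hand for two regressors, using mtcars as a stand-in data set
r2 <- function(formula) summary(lm(formula, data = mtcars))$r.squared

r2_wt   <- r2(mpg ~ wt)
r2_hp   <- r2(mpg ~ hp)
r2_full <- r2(mpg ~ wt + hp)

# Each share averages the R-squared gain over both orders of entry:
# entering first (alone) and entering second (after the other regressor)
lmg_manual <- c(
  wt = mean(c(r2_wt, r2_full - r2_hp)),
  hp = mean(c(r2_hp, r2_full - r2_wt))
)

lmg_manual
all.equal(sum(lmg_manual), r2_full)  # the shares add up to the full R-squared
```

The same additivity is why the lmg column in the output above sums to the proportion of variance explained by the model.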

Conclusion

In this article I analyzed car price data using regression analysis and variable importance. The data contained multicollinearity, which initially produced an NA coefficient. I then removed the affected variables one at a time and arrived at a final model with a good adjusted R-squared of about 85% and a prediction error (MAPE) of about 15-17%. I limited the assumption checks to multicollinearity because it was the main problem encountered with this data at the outset. From the final model, enginesize and carwidth are the dominant variables affecting car prices in the US market. I hope this is useful :)

 

A work by Briandamar Kencana

damarbrian@gmail.com