I am Briandamar Kencana, a data enthusiast. I hold a bachelor's degree in statistics. I have several years of work experience in data, including internships at momobil.id and Independent Research and Advisory Indonesia, a data analyst position at Alif Aza Asia and a research assistant position at Bank Indonesia. I then continued my career as an analyst at a commercial bank in Indonesia. I have also taken several data training courses at Algoritma Data Science School, Coursera, Udemy and Dicoding.
The data used in this article come from Kaggle and contain car market prices in the US.
Chinese automobile company Geely Auto wants to enter the US market by setting up a manufacturing unit there and producing cars locally to compete with its US and European counterparts. It has contracted an automobile consulting firm to understand the factors on which car prices depend. Specifically, it wants to understand the factors affecting car prices in the American market, since these may be very different from those in the Chinese market. The company wants to know which variables are significant in predicting the price of a car, and how well those variables explain the price.
Based on various market surveys, the consulting firm has gathered a large dataset of different types of cars across the American market.
The method used in this article is regression analysis. Regression analysis measures the effect of independent variables on a dependent variable, and it can also be used to predict the dependent variable from the independent variables. Gujarati (2006) defines regression analysis as the study of the relationship between one variable, called the explained variable, and one or more explanatory variables.
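To make the idea concrete, here is a minimal sketch in R on simulated data; the predictors `x1` and `x2` and their true coefficients are hypothetical and are not taken from the car dataset. It estimates a multiple linear regression y = b0 + b1 * x1 + b2 * x2 + e by solving the least-squares normal equations, beta_hat = (X'X)^(-1) X'y, checks that `lm()` returns the same coefficients, and then uses the fitted model to predict a new observation.

```r
# Minimal sketch: ordinary least squares on simulated (hypothetical) data
set.seed(42)
n  <- 200
x1 <- rnorm(n, mean = 120, sd = 40)    # hypothetical predictor, e.g. an engine size
x2 <- rnorm(n, mean = 2500, sd = 500)  # hypothetical predictor, e.g. a curb weight
y  <- 1000 + 150 * x1 + 3 * x2 + rnorm(n, sd = 2000)  # true model plus noise

# Design matrix: a column of ones for the intercept, then the predictors
X <- cbind("(Intercept)" = 1, x1 = x1, x2 = x2)

# Normal equations: solve (X'X) beta = X'y for beta
beta_hat <- drop(solve(crossprod(X), crossprod(X, y)))

# The same model fitted with lm(), which uses a QR decomposition internally
fit <- lm(y ~ x1 + x2)
print(rbind(normal_equations = beta_hat, lm = coef(fit)))

# Predict the response for a new observation
new_obs <- data.frame(x1 = 130, x2 = 2400)
predict(fit, newdata = new_obs)
```

Both rows of the printed matrix should agree up to floating-point error; `lm()` is preferred in practice because the QR route is numerically more stable than inverting X'X.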
Regression
Objective:
We need to model car prices using the available independent variables. Management will use the model to understand how exactly prices vary with the independent variables, so that they can adjust car design, business strategy and so on to reach particular price levels. The model will also be a good way for management to understand the pricing dynamics of a new market.
library(dplyr)
library(ggplot2)
carprice <- read.csv("CarPrice_Assignment.csv")
rmarkdown::paged_table(carprice)
unique(carprice$CarName)
#> [1] "alfa-romero giulia" "alfa-romero stelvio"
#> [3] "alfa-romero Quadrifoglio" "audi 100 ls"
#> [5] "audi 100ls" "audi fox"
#> [7] "audi 5000" "audi 4000"
#> [9] "audi 5000s (diesel)" "bmw 320i"
#> [11] "bmw x1" "bmw x3"
#> [13] "bmw z4" "bmw x4"
#> [15] "bmw x5" "chevrolet impala"
#> [17] "chevrolet monte carlo" "chevrolet vega 2300"
#> [19] "dodge rampage" "dodge challenger se"
#> [21] "dodge d200" "dodge monaco (sw)"
#> [23] "dodge colt hardtop" "dodge colt (sw)"
#> [25] "dodge coronet custom" "dodge dart custom"
#> [27] "dodge coronet custom (sw)" "honda civic"
#> [29] "honda civic cvcc" "honda accord cvcc"
#> [31] "honda accord lx" "honda civic 1500 gl"
#> [33] "honda accord" "honda civic 1300"
#> [35] "honda prelude" "honda civic (auto)"
#> [37] "isuzu MU-X" "isuzu D-Max "
#> [39] "isuzu D-Max V-Cross" "jaguar xj"
#> [41] "jaguar xf" "jaguar xk"
#> [43] "maxda rx3" "maxda glc deluxe"
#> [45] "mazda rx2 coupe" "mazda rx-4"
#> [47] "mazda glc deluxe" "mazda 626"
#> [49] "mazda glc" "mazda rx-7 gs"
#> [51] "mazda glc 4" "mazda glc custom l"
#> [53] "mazda glc custom" "buick electra 225 custom"
#> [55] "buick century luxus (sw)" "buick century"
#> [57] "buick skyhawk" "buick opel isuzu deluxe"
#> [59] "buick skylark" "buick century special"
#> [61] "buick regal sport coupe (turbo)" "mercury cougar"
#> [63] "mitsubishi mirage" "mitsubishi lancer"
#> [65] "mitsubishi outlander" "mitsubishi g4"
#> [67] "mitsubishi mirage g4" "mitsubishi montero"
#> [69] "mitsubishi pajero" "Nissan versa"
#> [71] "nissan gt-r" "nissan rogue"
#> [73] "nissan latio" "nissan titan"
#> [75] "nissan leaf" "nissan juke"
#> [77] "nissan note" "nissan clipper"
#> [79] "nissan nv200" "nissan dayz"
#> [81] "nissan fuga" "nissan otti"
#> [83] "nissan teana" "nissan kicks"
#> [85] "peugeot 504" "peugeot 304"
#> [87] "peugeot 504 (sw)" "peugeot 604sl"
#> [89] "peugeot 505s turbo diesel" "plymouth fury iii"
#> [91] "plymouth cricket" "plymouth satellite custom (sw)"
#> [93] "plymouth fury gran sedan" "plymouth valiant"
#> [95] "plymouth duster" "porsche macan"
#> [97] "porcshce panamera" "porsche cayenne"
#> [99] "porsche boxter" "renault 12tl"
#> [101] "renault 5 gtl" "saab 99e"
#> [103] "saab 99le" "saab 99gle"
#> [105] "subaru" "subaru dl"
#> [107] "subaru brz" "subaru baja"
#> [109] "subaru r1" "subaru r2"
#> [111] "subaru trezia" "subaru tribeca"
#> [113] "toyota corona mark ii" "toyota corona"
#> [115] "toyota corolla 1200" "toyota corona hardtop"
#> [117] "toyota corolla 1600 (sw)" "toyota carina"
#> [119] "toyota mark ii" "toyota corolla"
#> [121] "toyota corolla liftback" "toyota celica gt liftback"
#> [123] "toyota corolla tercel" "toyota corona liftback"
#> [125] "toyota starlet" "toyota tercel"
#> [127] "toyota cressida" "toyota celica gt"
#> [129] "toyouta tercel" "vokswagen rabbit"
#> [131] "volkswagen 1131 deluxe sedan" "volkswagen model 111"
#> [133] "volkswagen type 3" "volkswagen 411 (sw)"
#> [135] "volkswagen super beetle" "volkswagen dasher"
#> [137] "vw dasher" "vw rabbit"
#> [139] "volkswagen rabbit" "volkswagen rabbit custom"
#> [141] "volvo 145e (sw)" "volvo 144ea"
#> [143] "volvo 244dl" "volvo 245"
#> [145] "volvo 264gl" "volvo diesel"
#> [147] "volvo 246"
# Fix misspelled brand names; the ^ anchor restricts each fix to the start of the name (the brand)
carprice$CarName <- gsub("^maxda", "mazda", carprice$CarName)
# The data contain both "nissan" and "Nissan"; unify them as "Nissan"
carprice$CarName <- gsub("^nissan", "Nissan", carprice$CarName)
carprice$CarName <- gsub("^porcshce", "porsche", carprice$CarName)
carprice$CarName <- gsub("^toyouta", "toyota", carprice$CarName)
carprice$CarName <- gsub("^vokswagen", "volkswagen", carprice$CarName)
carprice$CarName <- gsub("^vw ", "volkswagen ", carprice$CarName)
unique(carprice$CarName)
#> [1] "alfa-romero giulia" "alfa-romero stelvio"
#> [3] "alfa-romero Quadrifoglio" "audi 100 ls"
#> [5] "audi 100ls" "audi fox"
#> [7] "audi 5000" "audi 4000"
#> [9] "audi 5000s (diesel)" "bmw 320i"
#> [11] "bmw x1" "bmw x3"
#> [13] "bmw z4" "bmw x4"
#> [15] "bmw x5" "chevrolet impala"
#> [17] "chevrolet monte carlo" "chevrolet vega 2300"
#> [19] "dodge rampage" "dodge challenger se"
#> [21] "dodge d200" "dodge monaco (sw)"
#> [23] "dodge colt hardtop" "dodge colt (sw)"
#> [25] "dodge coronet custom" "dodge dart custom"
#> [27] "dodge coronet custom (sw)" "honda civic"
#> [29] "honda civic cvcc" "honda accord cvcc"
#> [31] "honda accord lx" "honda civic 1500 gl"
#> [33] "honda accord" "honda civic 1300"
#> [35] "honda prelude" "honda civic (auto)"
#> [37] "isuzu MU-X" "isuzu D-Max "
#> [39] "isuzu D-Max V-Cross" "jaguar xj"
#> [41] "jaguar xf" "jaguar xk"
#> [43] "mazda rx3" "mazda glc deluxe"
#> [45] "mazda rx2 coupe" "mazda rx-4"
#> [47] "mazda 626" "mazda glc"
#> [49] "mazda rx-7 gs" "mazda glc 4"
#> [51] "mazda glc custom l" "mazda glc custom"
#> [53] "buick electra 225 custom" "buick century luxus (sw)"
#> [55] "buick century" "buick skyhawk"
#> [57] "buick opel isuzu deluxe" "buick skylark"
#> [59] "buick century special" "buick regal sport coupe (turbo)"
#> [61] "mercury cougar" "mitsubishi mirage"
#> [63] "mitsubishi lancer" "mitsubishi outlander"
#> [65] "mitsubishi g4" "mitsubishi mirage g4"
#> [67] "mitsubishi montero" "mitsubishi pajero"
#> [69] "Nissan versa" "Nissan gt-r"
#> [71] "Nissan rogue" "Nissan latio"
#> [73] "Nissan titan" "Nissan leaf"
#> [75] "Nissan juke" "Nissan note"
#> [77] "Nissan clipper" "Nissan nv200"
#> [79] "Nissan dayz" "Nissan fuga"
#> [81] "Nissan otti" "Nissan teana"
#> [83] "Nissan kicks" "peugeot 504"
#> [85] "peugeot 304" "peugeot 504 (sw)"
#> [87] "peugeot 604sl" "peugeot 505s turbo diesel"
#> [89] "plymouth fury iii" "plymouth cricket"
#> [91] "plymouth satellite custom (sw)" "plymouth fury gran sedan"
#> [93] "plymouth valiant" "plymouth duster"
#> [95] "porsche macan" "porsche panamera"
#> [97] "porsche cayenne" "porsche boxter"
#> [99] "renault 12tl" "renault 5 gtl"
#> [101] "saab 99e" "saab 99le"
#> [103] "saab 99gle" "subaru"
#> [105] "subaru dl" "subaru brz"
#> [107] "subaru baja" "subaru r1"
#> [109] "subaru r2" "subaru trezia"
#> [111] "subaru tribeca" "toyota corona mark ii"
#> [113] "toyota corona" "toyota corolla 1200"
#> [115] "toyota corona hardtop" "toyota corolla 1600 (sw)"
#> [117] "toyota carina" "toyota mark ii"
#> [119] "toyota corolla" "toyota corolla liftback"
#> [121] "toyota celica gt liftback" "toyota corolla tercel"
#> [123] "toyota corona liftback" "toyota starlet"
#> [125] "toyota tercel" "toyota cressida"
#> [127] "toyota celica gt" "volkswagen rabbit"
#> [129] "volkswagen 1131 deluxe sedan" "volkswagen model 111"
#> [131] "volkswagen type 3" "volkswagen 411 (sw)"
#> [133] "volkswagen super beetle" "volkswagen dasher"
#> [135] "volkswagen rabbit custom" "volvo 145e (sw)"
#> [137] "volvo 144ea" "volvo 244dl"
#> [139] "volvo 245" "volvo 264gl"
#> [141] "volvo diesel" "volvo 246"
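The `gsub()` calls above fix each misspelling one at a time. A minimal alternative sketch in base R is to take the brand as the first word of the name, lower-case it, and correct known misspellings through a single named lookup vector. The sample names are copied from the output above, and the lookup table covers only the misspellings visible there. Note that lower-casing turns `Nissan` into `nissan`, which differs from the article's choice of `Nissan`.

```r
# A few raw names taken from the unique() output above
raw_names <- c("maxda rx3", "porcshce panamera", "toyouta tercel",
               "vokswagen rabbit", "vw dasher", "nissan gt-r",
               "Nissan versa", "subaru", "mazda 626")

# Brand = first space-delimited word, lower-cased so "Nissan" and "nissan" merge
brand <- tolower(sub(" .*$", "", raw_names))

# Named lookup vector: misspelled brand -> canonical brand
brand_fixes <- c(maxda = "mazda", porcshce = "porsche", toyouta = "toyota",
                 vokswagen = "volkswagen", vw = "volkswagen")

# Replace only the brands found in the lookup; all others are left unchanged
needs_fix <- brand %in% names(brand_fixes)
brand[needs_fix] <- unname(brand_fixes[brand[needs_fix]])

print(table(brand))
```

Because the match is made on the whole first word, a short key such as `vw` cannot accidentally rewrite a model name that happens to contain those letters.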
summary(carprice)
#> car_ID symboling CarName fueltype
#> Min. : 1 Min. :-2.0000 Length:205 Length:205
#> 1st Qu.: 52 1st Qu.: 0.0000 Class :character Class :character
#> Median :103 Median : 1.0000 Mode :character Mode :character
#> Mean :103 Mean : 0.8341
#> 3rd Qu.:154 3rd Qu.: 2.0000
#> Max. :205 Max. : 3.0000
#> aspiration doornumber carbody drivewheel
#> Length:205 Length:205 Length:205 Length:205
#> Class :character Class :character Class :character Class :character
#> Mode :character Mode :character Mode :character Mode :character
#>
#>
#>
#> enginelocation wheelbase carlength carwidth
#> Length:205 Min. : 86.60 Min. :141.1 Min. :60.30
#> Class :character 1st Qu.: 94.50 1st Qu.:166.3 1st Qu.:64.10
#> Mode :character Median : 97.00 Median :173.2 Median :65.50
#> Mean : 98.76 Mean :174.0 Mean :65.91
#> 3rd Qu.:102.40 3rd Qu.:183.1 3rd Qu.:66.90
#> Max. :120.90 Max. :208.1 Max. :72.30
#> carheight curbweight enginetype cylindernumber
#> Min. :47.80 Min. :1488 Length:205 Length:205
#> 1st Qu.:52.00 1st Qu.:2145 Class :character Class :character
#> Median :54.10 Median :2414 Mode :character Mode :character
#> Mean :53.72 Mean :2556
#> 3rd Qu.:55.50 3rd Qu.:2935
#> Max. :59.80 Max. :4066
#> enginesize fuelsystem boreratio stroke
#> Min. : 61.0 Length:205 Min. :2.54 Min. :2.070
#> 1st Qu.: 97.0 Class :character 1st Qu.:3.15 1st Qu.:3.110
#> Median :120.0 Mode :character Median :3.31 Median :3.290
#> Mean :126.9 Mean :3.33 Mean :3.255
#> 3rd Qu.:141.0 3rd Qu.:3.58 3rd Qu.:3.410
#> Max. :326.0 Max. :3.94 Max. :4.170
#> compressionratio horsepower peakrpm citympg
#> Min. : 7.00 Min. : 48.0 Min. :4150 Min. :13.00
#> 1st Qu.: 8.60 1st Qu.: 70.0 1st Qu.:4800 1st Qu.:19.00
#> Median : 9.00 Median : 95.0 Median :5200 Median :24.00
#> Mean :10.14 Mean :104.1 Mean :5125 Mean :25.22
#> 3rd Qu.: 9.40 3rd Qu.:116.0 3rd Qu.:5500 3rd Qu.:30.00
#> Max. :23.00 Max. :288.0 Max. :6600 Max. :49.00
#> highwaympg price
#> Min. :16.00 Min. : 5118
#> 1st Qu.:25.00 1st Qu.: 7788
#> Median :30.00 Median :10295
#> Mean :30.75 Mean :13277
#> 3rd Qu.:34.00 3rd Qu.:16503
#> Max. :54.00 Max. :45400
str(carprice)
#> 'data.frame': 205 obs. of 26 variables:
#> $ car_ID : int 1 2 3 4 5 6 7 8 9 10 ...
#> $ symboling : int 3 3 1 2 2 2 1 1 1 0 ...
#> $ CarName : chr "alfa-romero giulia" "alfa-romero stelvio" "alfa-romero Quadrifoglio" "audi 100 ls" ...
#> $ fueltype : chr "gas" "gas" "gas" "gas" ...
#> $ aspiration : chr "std" "std" "std" "std" ...
#> $ doornumber : chr "two" "two" "two" "four" ...
#> $ carbody : chr "convertible" "convertible" "hatchback" "sedan" ...
#> $ drivewheel : chr "rwd" "rwd" "rwd" "fwd" ...
#> $ enginelocation : chr "front" "front" "front" "front" ...
#> $ wheelbase : num 88.6 88.6 94.5 99.8 99.4 ...
#> $ carlength : num 169 169 171 177 177 ...
#> $ carwidth : num 64.1 64.1 65.5 66.2 66.4 66.3 71.4 71.4 71.4 67.9 ...
#> $ carheight : num 48.8 48.8 52.4 54.3 54.3 53.1 55.7 55.7 55.9 52 ...
#> $ curbweight : int 2548 2548 2823 2337 2824 2507 2844 2954 3086 3053 ...
#> $ enginetype : chr "dohc" "dohc" "ohcv" "ohc" ...
#> $ cylindernumber : chr "four" "four" "six" "four" ...
#> $ enginesize : int 130 130 152 109 136 136 136 136 131 131 ...
#> $ fuelsystem : chr "mpfi" "mpfi" "mpfi" "mpfi" ...
#> $ boreratio : num 3.47 3.47 2.68 3.19 3.19 3.19 3.19 3.19 3.13 3.13 ...
#> $ stroke : num 2.68 2.68 3.47 3.4 3.4 3.4 3.4 3.4 3.4 3.4 ...
#> $ compressionratio: num 9 9 9 10 8 8.5 8.5 8.5 8.3 7 ...
#> $ horsepower : int 111 111 154 102 115 110 110 110 140 160 ...
#> $ peakrpm : int 5000 5000 5000 5500 5500 5500 5500 5500 5500 5500 ...
#> $ citympg : int 21 21 19 24 18 19 19 19 17 16 ...
#> $ highwaympg : int 27 27 26 30 22 25 25 25 20 22 ...
#> $ price : num 13495 16500 16500 13950 17450 ...
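library(tidyr)
carprice <- carprice %>%
  mutate_if(is.character, as.factor) %>%   # character columns -> factors
  separate(
    CarName,
    into = c("Brand", "Tipe"),   # Brand = first word, Tipe = model name
    sep = " ",
    extra = "drop",              # names with 3+ words: discard pieces after the second (silences the warning)
    fill = "right"               # one-word names such as "subaru": Tipe = NA
  ) %>%
  select(-Tipe) %>%              # the model name is not used further
  mutate(Brand = as.factor(Brand)) %>%
  select(-car_ID)                # row identifier, not a predictor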
rmarkdown::paged_table(carprice)
# Check for missing (NA) values in each column
colSums(is.na(carprice))
#> symboling Brand fueltype aspiration
#> 0 0 0 0
#> doornumber carbody drivewheel enginelocation
#> 0 0 0 0
#> wheelbase carlength carwidth carheight
#> 0 0 0 0
#> curbweight enginetype cylindernumber enginesize
#> 0 0 0 0
#> fuelsystem boreratio stroke compressionratio
#> 0 0 0 0
#> horsepower peakrpm citympg highwaympg
#> 0 0 0 0
#> price
#> 0
carprice %>%
  group_by(Brand) %>%
  summarise(counts = n()) %>%
  # reorder() in the top-level mapping so the bars and the labels share the same sorted x-axis
  ggplot(aes(x = reorder(Brand, -counts), y = counts)) +
  geom_col(aes(fill = Brand), width = 0.7) +
  geom_text(aes(label = counts), vjust = 1.0, color = "white", size = 2.0) +
  theme(
    axis.text.x = element_text(face = "bold", color = "#993333", size = 9, angle = 45),
    panel.background = element_rect(fill = "#252a32"),
    panel.grid.minor = element_line(color = "#bab6b8"),
    panel.grid.major = element_line(color = "#696868")
  ) +
  ggtitle("Brand") +
  labs(x = "Brand", y = "Number of cars")
Based on the plot above, Toyota is the brand with the most cars in the dataset, followed by Nissan, Mazda and Honda. (The plot counts rows in the dataset, not units sold.)
carprice %>%
group_by(Brand) %>%
summarise(mean_prices= mean(price)) %>%
arrange(-mean_prices) %>%
ggplot(aes(x = reorder(Brand,-mean_prices),y=mean_prices)) +
geom_bar(
aes(fill = Brand),
stat = "identity", position = position_dodge(0.8),
width = 0.4
) +theme(axis.text.x = element_text(face="bold", color="#993333", size=9, angle=90))+
ggtitle("Plot Rata-rata harga berdasarkan brand")+labs(x="Brand", y="Rata-rata Harga")
Jaguar has the highest average car price, followed by Buick, Porsche and BMW.
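Average prices for brands with only a handful of cars can shift a lot when a single car is added or removed, so it can help to read each mean alongside its count. A minimal base R sketch on a small hypothetical data frame (the prices are made up) computes both per brand and orders the brands by mean price, mirroring the `group_by()`/`summarise()` step above.

```r
# Hypothetical toy data, not the car dataset
cars <- data.frame(
  Brand = c("toyota", "toyota", "toyota", "jaguar", "jaguar", "honda"),
  price = c(7000, 8000, 9000, 35000, 36000, 9000),
  stringsAsFactors = FALSE
)

# Per-brand mean price and number of cars (tapply groups by Brand)
mean_price <- tapply(cars$price, cars$Brand, mean)
counts     <- tapply(cars$price, cars$Brand, length)

by_brand <- data.frame(
  Brand      = names(mean_price),
  counts     = as.integer(counts),
  mean_price = as.numeric(mean_price),
  stringsAsFactors = FALSE
)

# Most expensive brand first
by_brand <- by_brand[order(-by_brand$mean_price), ]
print(by_brand, row.names = FALSE)
```

In this toy example honda ranks above toyota on the strength of a single car, which is the kind of ranking worth cross-checking against the counts plot.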
carprice<-carprice %>% select(-c(Brand,symboling))
library(GGally)
ggcorr(carprice, label = TRUE, label_size = 2.9, hjust = 1, layout.exp = 2)
The correlation matrix above shows that several predictors have a strong linear relationship with price, but some predictors are also strongly correlated with one another, which indicates possible multicollinearity.
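A correlation matrix only shows pairwise relationships. A common complementary check for multicollinearity is the variance inflation factor, VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-squared from regressing predictor j on all the other predictors. The sketch below computes it by hand on simulated, hypothetical predictors (`x2` is built to be nearly a copy of `x1`), and `vif_manual()` is a helper written for illustration. It handles numeric predictors only; for factor predictors such as `enginetype`, the `vif()` function in the car package reports a generalized VIF instead. Values above roughly 5 to 10 are often read as a warning sign, though that cut-off is only a rule of thumb.

```r
# Simulated (hypothetical) predictors
set.seed(1)
n  <- 300
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.2)   # strongly correlated with x1
x3 <- rnorm(n)                  # independent of x1 and x2
X  <- data.frame(x1, x2, x3)

# Hypothetical helper: VIF_j = 1 / (1 - R_j^2) for every column of X
vif_manual <- function(X) {
  sapply(names(X), function(v) {
    others <- setdiff(names(X), v)
    # Auxiliary regression of one predictor on all the others
    aux <- lm(reformulate(others, response = v), data = X)
    1 / (1 - summary(aux)$r.squared)
  })
}

vifs <- vif_manual(X)
print(round(vifs, 2))
```

Here `x1` and `x2` should show large VIFs while `x3` stays close to 1, which is the pattern to look for among the car predictors flagged in the correlation plot.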
I split the data into a training set (80%) and a test set (20%) with the code below.
set.seed(1)
index <- sample(nrow(carprice), nrow(carprice)*0.80)
data_train <- carprice[index,]
data_test <- carprice[-index,]
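One pitfall of a purely random split is that a factor level occurring only once or twice in the data may land only in the test set; `predict()` on an `lm` fit then stops with an error about new factor levels. The sketch below uses toy data and a hypothetical helper, `unseen_levels()`, that lists for each factor column the levels appearing in the test set but never in the training set.

```r
# Hypothetical toy data: the level "twelve" occurs only once
set.seed(1)
toy <- data.frame(
  cyl   = factor(c(rep("four", 15), rep("six", 4), "twelve")),
  price = round(runif(20, 5000, 40000))
)

# Same 80/20 split as above
index     <- sample(nrow(toy), nrow(toy) * 0.80)
toy_train <- toy[index, ]
toy_test  <- toy[-index, ]

# Hypothetical helper: for each factor column, the levels present in `test`
# that never occur in `train`
unseen_levels <- function(train, test) {
  fac_cols <- names(test)[vapply(test, is.factor, logical(1))]
  out <- lapply(fac_cols, function(col) {
    setdiff(unique(as.character(test[[col]])),
            unique(as.character(train[[col]])))
  })
  names(out) <- fac_cols
  out
}

unseen_levels(toy_train, toy_test)
```

If a level turns up, options include merging rare levels before splitting or using a stratified split; which one fits depends on the analysis.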
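With the data split into training and test sets, I model the training data directly using stepwise feature selection: starting from the intercept-only model, step() adds or removes one predictor at a time, within the bounds of the model with all predictors, keeping the change that lowers AIC the most.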
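set.seed(1)
# Lower bound of the search: intercept-only model
lm.none <- lm(price ~ 1, data_train)
# Upper bound of the search: model with every predictor
lm.all <- lm(price ~ ., data_train)
# Stepwise selection in both directions, starting from the intercept-only model;
# at each step the single addition or removal that lowers AIC the most is taken,
# and the search stops when no change lowers AIC
carlm_stepwise <- step(object = lm.none,
                       scope = list(lower = lm.none, upper = lm.all),
                       direction = "both")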
#> Start: AIC=2956.59
#> price ~ 1
#>
#> Df Sum of Sq RSS AIC
#> + enginesize 1 8264991200 2674518013 2727.6
#> + curbweight 1 7420830084 3518679129 2772.6
#> + horsepower 1 7135120700 3804388513 2785.4
#> + cylindernumber 6 6951498007 3988011207 2803.1
#> + carwidth 1 5930289491 5009219722 2830.5
#> + highwaympg 1 5252791711 5686717502 2851.3
#> + citympg 1 5118986835 5820522378 2855.1
#> + carlength 1 4746240093 6193269120 2865.3
#> + drivewheel 2 4414452931 6525056282 2875.8
#> + boreratio 1 4011037542 6928471672 2883.7
#> + fuelsystem 5 3787940791 7151568422 2896.9
#> + wheelbase 1 2949794228 7989714985 2907.1
#> + enginetype 6 2660975201 8278534012 2922.9
#> + carbody 4 1752989250 9186519963 2935.9
#> + enginelocation 1 1362582691 9576926522 2936.8
#> + aspiration 1 361543375 10577965838 2953.1
#> <none> 10939509213 2956.6
#> + peakrpm 1 115685389 10823823824 2956.8
#> + fueltype 1 87488732 10852020481 2957.3
#> + carheight 1 50058822 10889450391 2957.8
#> + stroke 1 37942915 10901566298 2958.0
#> + compressionratio 1 33463605 10906045608 2958.1
#> + doornumber 1 1280342 10938228872 2958.6
#>
#> Step: AIC=2727.58
#> price ~ enginesize
#>
#> Df Sum of Sq RSS AIC
#> + enginetype 6 708852837 1965665176 2689.1
#> + cylindernumber 6 643904063 2030613950 2694.4
#> + drivewheel 2 407632835 2266885178 2704.5
#> + horsepower 1 364775892 2309742121 2705.5
#> + fuelsystem 5 473854295 2200663719 2705.6
#> + enginelocation 1 338507832 2336010181 2707.4
#> + curbweight 1 324692187 2349825827 2708.3
#> + carwidth 1 287345658 2387172356 2710.9
#> + citympg 1 274272252 2400245761 2711.8
#> + highwaympg 1 240226854 2434291159 2714.1
#> + peakrpm 1 181205153 2493312860 2718.1
#> + stroke 1 144555148 2529962865 2720.5
#> + carlength 1 122718621 2551799393 2721.9
#> + aspiration 1 115424191 2559093822 2722.3
#> + boreratio 1 79787935 2594730079 2724.6
#> + wheelbase 1 56633300 2617884713 2726.1
#> + carbody 4 150321412 2524196602 2726.1
#> <none> 2674518013 2727.6
#> + fueltype 1 23315593 2651202420 2728.1
#> + carheight 1 21278201 2653239813 2728.3
#> + compressionratio 1 20503572 2654014442 2728.3
#> + doornumber 1 4794717 2669723297 2729.3
#> - enginesize 1 8264991200 10939509213 2956.6
#>
#> Step: AIC=2689.07
#> price ~ enginesize + enginetype
#>
#> Df Sum of Sq RSS AIC
#> + cylindernumber 5 396735336 1568929840 2662.1
#> + stroke 1 218197421 1747467755 2671.8
#> + enginelocation 1 173308915 1792356262 2675.9
#> + horsepower 1 168716802 1796948374 2676.4
#> + curbweight 1 153152636 1812512541 2677.8
#> + peakrpm 1 146310269 1819354908 2678.4
#> + carwidth 1 134479127 1831186049 2679.4
#> + fuelsystem 5 201114272 1764550905 2681.4
#> + drivewheel 2 112318730 1853346447 2683.4
#> + aspiration 1 86989469 1878675708 2683.7
#> + carbody 4 146895483 1818769693 2684.3
#> + citympg 1 55630079 1910035097 2686.4
#> + highwaympg 1 54556918 1911108259 2686.5
#> + carheight 1 54524478 1911140698 2686.5
#> + carlength 1 33000085 1932665092 2688.3
#> <none> 1965665176 2689.1
#> + wheelbase 1 11837558 1953827618 2690.1
#> + boreratio 1 11643525 1954021652 2690.1
#> + fueltype 1 9803681 1955861495 2690.2
#> + compressionratio 1 8912271 1956752906 2690.3
#> + doornumber 1 2774973 1962890203 2690.8
#> - enginetype 6 708852837 2674518013 2727.6
#> - enginesize 1 6312868835 8278534012 2922.9
#>
#> Step: AIC=2662.1
#> price ~ enginesize + enginetype + cylindernumber
#>
#> Df Sum of Sq RSS AIC
#> + horsepower 1 273017595 1295912245 2632.8
#> + stroke 1 247760370 1321169470 2635.9
#> + curbweight 1 196273462 1372656378 2642.2
#> + peakrpm 1 138950003 1429979837 2648.9
#> + enginelocation 1 134563326 1434366514 2649.4
#> + drivewheel 2 125388591 1443541249 2652.4
#> + aspiration 1 106974563 1461955277 2652.5
#> + fuelsystem 5 174432069 1394497771 2652.8
#> + carwidth 1 100558558 1468371282 2653.2
#> + citympg 1 82192458 1486737382 2655.3
#> + highwaympg 1 58404436 1510525404 2657.9
#> + carbody 4 109474446 1459455394 2658.2
#> + boreratio 1 49715856 1519213984 2658.8
#> + carlength 1 37878000 1531051840 2660.1
#> + carheight 1 19707730 1549222110 2662.0
#> <none> 1568929840 2662.1
#> + doornumber 1 4765451 1564164389 2663.6
#> + compressionratio 1 4155616 1564774224 2663.7
#> + fueltype 1 3255069 1565674771 2663.8
#> + wheelbase 1 494742 1568435098 2664.1
#> - cylindernumber 5 396735336 1965665176 2689.1
#> - enginetype 5 461684110 2030613950 2694.4
#> - enginesize 1 1577749669 3146679509 2774.2
#>
#> Step: AIC=2632.75
#> price ~ enginesize + enginetype + cylindernumber + horsepower
#>
#> Df Sum of Sq RSS AIC
#> + fuelsystem 5 213863249 1082048996 2613.2
#> + stroke 1 157084595 1138827649 2613.6
#> + compressionratio 1 128390790 1167521455 2617.6
#> + curbweight 1 112572079 1183340166 2619.8
#> + fueltype 1 102141265 1193770980 2621.3
#> + carbody 4 132157108 1163755137 2623.1
#> + carwidth 1 59232649 1236679595 2627.1
#> + drivewheel 2 69146545 1226765700 2627.8
#> + enginelocation 1 47965307 1247946938 2628.6
#> + carheight 1 34710761 1261201484 2630.3
#> + carlength 1 30427266 1265484979 2630.8
#> <none> 1295912245 2632.8
#> + wheelbase 1 12710719 1283201526 2633.1
#> + aspiration 1 12590352 1283321893 2633.2
#> + peakrpm 1 10379960 1285532285 2633.4
#> + highwaympg 1 9750803 1286161442 2633.5
#> + boreratio 1 5919919 1289992325 2634.0
#> + citympg 1 5853206 1290059038 2634.0
#> + doornumber 1 3991043 1291921202 2634.2
#> - horsepower 1 273017595 1568929840 2662.1
#> - enginesize 1 372840917 1668753162 2672.2
#> - cylindernumber 5 501036129 1796948374 2676.4
#> - enginetype 5 533819009 1829731254 2679.3
#>
#> Step: AIC=2613.17
#> price ~ enginesize + enginetype + cylindernumber + horsepower +
#> fuelsystem
#>
#> Df Sum of Sq RSS AIC
#> + stroke 1 140678263 941370733 2592.3
#> + curbweight 1 50730963 1031318034 2607.3
#> + carbody 4 86890787 995158209 2607.4
#> + peakrpm 1 42534077 1039514919 2608.6
#> + enginelocation 1 27554626 1054494370 2610.9
#> + compressionratio 1 17272616 1064776381 2612.5
#> <none> 1082048996 2613.2
#> + carwidth 1 12805578 1069243419 2613.2
#> + drivewheel 2 23886207 1058162790 2613.5
#> + citympg 1 4661662 1077387334 2614.5
#> + carlength 1 2509043 1079539954 2614.8
#> + carheight 1 1254573 1080794423 2615.0
#> + boreratio 1 588198 1081460798 2615.1
#> + highwaympg 1 480891 1081568106 2615.1
#> + aspiration 1 284797 1081764199 2615.1
#> + wheelbase 1 1632 1082047364 2615.2
#> + doornumber 1 177 1082048819 2615.2
#> - fuelsystem 5 213863249 1295912245 2632.8
#> - enginesize 1 196905072 1278954068 2638.6
#> - horsepower 1 312448775 1394497771 2652.8
#> - cylindernumber 5 487221643 1569270639 2664.1
#> - enginetype 5 560219831 1642268827 2671.6
#>
#> Step: AIC=2592.33
#> price ~ enginesize + enginetype + cylindernumber + horsepower +
#> fuelsystem + stroke
#>
#> Df Sum of Sq RSS AIC
#> + peakrpm 1 61826746 879543987 2583.2
#> + carbody 4 72451418 868919315 2587.2
#> + enginelocation 1 30212317 911158416 2589.0
#> + curbweight 1 24528103 916842630 2590.0
#> <none> 941370733 2592.3
#> + carwidth 1 10510072 930860661 2592.5
#> + aspiration 1 4296165 937074568 2593.6
#> + boreratio 1 2517163 938853570 2593.9
#> + compressionratio 1 2207814 939162919 2593.9
#> + citympg 1 610659 940760074 2594.2
#> + wheelbase 1 500410 940870323 2594.2
#> + doornumber 1 119544 941251189 2594.3
#> + carlength 1 107297 941263436 2594.3
#> + highwaympg 1 94986 941275747 2594.3
#> + carheight 1 68611 941302122 2594.3
#> + drivewheel 2 4023076 937347657 2595.6
#> - stroke 1 140678263 1082048996 2613.2
#> - fuelsystem 5 197456916 1138827649 2613.6
#> - horsepower 1 256720021 1198090754 2629.9
#> - enginesize 1 320826141 1262196874 2638.4
#> - cylindernumber 5 476157667 1417528400 2649.5
#> - enginetype 5 541651000 1483021733 2656.9
#>
#> Step: AIC=2583.19
#> price ~ enginesize + enginetype + cylindernumber + horsepower +
#> fuelsystem + stroke + peakrpm
#>
#> Df Sum of Sq RSS AIC
#> + curbweight 1 45771282 833772705 2576.4
#> + carbody 4 68754502 810789485 2577.8
#> + aspiration 1 31857289 847686698 2579.1
#> + carwidth 1 19378638 860165349 2581.5
#> <none> 879543987 2583.2
#> + enginelocation 1 8394366 871149621 2583.6
#> + compressionratio 1 2356091 877187896 2584.8
#> + doornumber 1 1454992 878088995 2584.9
#> + citympg 1 1037238 878506749 2585.0
#> + carheight 1 792329 878751658 2585.0
#> + carlength 1 654141 878889846 2585.1
#> + wheelbase 1 176886 879367101 2585.2
#> + highwaympg 1 112498 879431490 2585.2
#> + boreratio 1 29594 879514393 2585.2
#> + drivewheel 2 651626 878892361 2587.1
#> - peakrpm 1 61826746 941370733 2592.3
#> - horsepower 1 121787705 1001331692 2602.5
#> - stroke 1 159970932 1039514919 2608.6
#> - fuelsystem 5 237287221 1116831208 2612.4
#> - enginesize 1 379443776 1258987763 2640.0
#> - cylindernumber 5 445347305 1324891292 2640.4
#> - enginetype 5 544576119 1424120106 2652.2
#>
#> Step: AIC=2576.42
#> price ~ enginesize + enginetype + cylindernumber + horsepower +
#> fuelsystem + stroke + peakrpm + curbweight
#>
#> Df Sum of Sq RSS AIC
#> + carbody 4 76409403 757363302 2568.7
#> + enginelocation 1 37599633 796173072 2570.9
#> + carlength 1 21724691 812048014 2574.1
#> + highwaympg 1 16537703 817235002 2575.1
#> + wheelbase 1 16503291 817269415 2575.1
#> + aspiration 1 16459528 817313178 2575.2
#> <none> 833772705 2576.4
#> + citympg 1 6603893 827168812 2577.1
#> + carheight 1 5844545 827928160 2577.3
#> + doornumber 1 3695899 830076806 2577.7
#> + carwidth 1 2432602 831340103 2577.9
#> + drivewheel 2 11263994 822508711 2578.2
#> + boreratio 1 1085443 832687262 2578.2
#> + compressionratio 1 193412 833579293 2578.4
#> - curbweight 1 45771282 879543987 2583.2
#> - horsepower 1 81291920 915064625 2589.7
#> - peakrpm 1 83069925 916842630 2590.0
#> - stroke 1 127733323 961506028 2597.8
#> - fuelsystem 5 179555406 1013328111 2598.4
#> - enginesize 1 184792960 1018565665 2607.2
#> - cylindernumber 5 444245942 1278018648 2636.5
#> - enginetype 5 573696769 1407469474 2652.3
#>
#> Step: AIC=2568.66
#> price ~ enginesize + enginetype + cylindernumber + horsepower +
#> fuelsystem + stroke + peakrpm + curbweight + carbody
#>
#> Df Sum of Sq RSS AIC
#> + enginelocation 1 23877034 733486268 2565.4
#> + aspiration 1 21275290 736088013 2566.0
#> + highwaympg 1 17415047 739948255 2566.8
#> + citympg 1 11411423 745951879 2568.2
#> + carlength 1 10517499 746845804 2568.4
#> <none> 757363302 2568.7
#> + carwidth 1 9163362 748199940 2568.7
#> + doornumber 1 4836478 752526824 2569.6
#> + boreratio 1 2390917 754972385 2570.1
#> + compressionratio 1 930531 756432771 2570.5
#> + wheelbase 1 720387 756642916 2570.5
#> + carheight 1 40741 757322561 2570.7
#> + drivewheel 2 6107167 751256135 2571.3
#> - carbody 4 76409403 833772705 2576.4
#> - curbweight 1 53426183 810789485 2577.8
#> - horsepower 1 73542247 830905549 2581.9
#> - peakrpm 1 80620847 837984149 2583.2
#> - fuelsystem 5 144197733 901561035 2587.2
#> - stroke 1 102731259 860094561 2587.5
#> - enginesize 1 118416249 875779551 2590.5
#> - cylindernumber 5 393442907 1150806209 2627.3
#> - enginetype 5 529436585 1286799887 2645.6
#>
#> Step: AIC=2565.41
#> price ~ enginesize + enginetype + cylindernumber + horsepower +
#> fuelsystem + stroke + peakrpm + curbweight + carbody + enginelocation
#>
#> Df Sum of Sq RSS AIC
#> + aspiration 1 35698921 697787347 2559.2
#> + carwidth 1 13731321 719754947 2564.3
#> + highwaympg 1 12287973 721198295 2564.6
#> + citympg 1 10442137 723044131 2565.1
#> + carlength 1 10236669 723249599 2565.1
#> <none> 733486268 2565.4
#> + doornumber 1 8297544 725188723 2565.5
#> + compressionratio 1 6033311 727452957 2566.1
#> + carheight 1 1098016 732388252 2567.2
#> + boreratio 1 861583 732624685 2567.2
#> + wheelbase 1 759 733485509 2567.4
#> + drivewheel 2 3645441 729840827 2568.6
#> - enginelocation 1 23877034 757363302 2568.7
#> - carbody 4 62686805 796173072 2570.9
#> - horsepower 1 40790727 774276995 2572.3
#> - peakrpm 1 48012429 781498697 2573.8
#> - fuelsystem 5 89838187 823324455 2574.4
#> - curbweight 1 71098725 804584993 2578.6
#> - stroke 1 98126452 831612720 2584.0
#> - enginesize 1 101762823 835249091 2584.7
#> - cylindernumber 5 366118397 1099604665 2621.8
#> - enginetype 5 398339258 1131825526 2626.6
#>
#> Step: AIC=2559.22
#> price ~ enginesize + enginetype + cylindernumber + horsepower +
#> fuelsystem + stroke + peakrpm + curbweight + carbody + enginelocation +
#> aspiration
#>
#> Df Sum of Sq RSS AIC
#> + highwaympg 1 19859654 677927693 2556.5
#> + carwidth 1 17847579 679939768 2557.0
#> - horsepower 1 351796 698139143 2557.3
#> + citympg 1 10235364 687551983 2558.8
#> <none> 697787347 2559.2
#> + doornumber 1 5715287 692072060 2559.9
#> + carlength 1 3457215 694330132 2560.4
#> + compressionratio 1 639151 697148196 2561.1
#> + carheight 1 561863 697225484 2561.1
#> + wheelbase 1 166574 697620773 2561.2
#> + boreratio 1 24 697787323 2561.2
#> + drivewheel 2 6213609 691573738 2561.8
#> - fuelsystem 5 56972545 754759892 2562.1
#> - carbody 4 62100903 759888250 2565.2
#> - aspiration 1 35698921 733486268 2565.4
#> - enginelocation 1 38300666 736088013 2566.0
#> - curbweight 1 54780788 752568135 2569.6
#> - peakrpm 1 68903103 766690450 2572.7
#> - stroke 1 120127519 817914865 2583.3
#> - enginesize 1 137326654 835114001 2586.7
#> - cylindernumber 5 309305195 1007092542 2609.4
#> - enginetype 5 392490089 1090277436 2622.4
#>
#> Step: AIC=2556.49
#> price ~ enginesize + enginetype + cylindernumber + horsepower +
#> fuelsystem + stroke + peakrpm + curbweight + carbody + enginelocation +
#> aspiration + highwaympg
#>
#> Df Sum of Sq RSS AIC
#> - fuelsystem 5 30996655 708924348 2553.8
#> - horsepower 1 1160518 679088211 2554.8
#> + carwidth 1 15101336 662826356 2554.8
#> <none> 677927693 2556.5
#> + doornumber 1 6134196 671793497 2557.0
#> + compressionratio 1 4698942 673228751 2557.3
#> + citympg 1 1464255 676463438 2558.1
#> + carlength 1 694005 677233687 2558.3
#> + boreratio 1 180279 677747414 2558.4
#> + carheight 1 20895 677906797 2558.5
#> + wheelbase 1 599 677927094 2558.5
#> + drivewheel 2 6962027 670965666 2558.8
#> - highwaympg 1 19859654 697787347 2559.2
#> - enginelocation 1 32312933 710240625 2562.1
#> - carbody 4 63140267 741067959 2563.1
#> - aspiration 1 43270602 721198295 2564.6
#> - curbweight 1 72927964 750855657 2571.2
#> - peakrpm 1 76961677 754889370 2572.1
#> - stroke 1 121744957 799672650 2581.6
#> - enginesize 1 138576327 816504020 2585.0
#> - cylindernumber 5 305807534 983735227 2607.6
#> - enginetype 5 389141577 1067069269 2620.9
#>
#> Step: AIC=2553.82
#> price ~ enginesize + enginetype + cylindernumber + horsepower +
#> stroke + peakrpm + curbweight + carbody + enginelocation +
#> aspiration + highwaympg
#>
#> Df Sum of Sq RSS AIC
#> + carwidth 1 22865581 686058767 2550.4
#> - horsepower 1 983309 709907657 2552.1
#> <none> 708924348 2553.8
#> + doornumber 1 5502922 703421426 2554.5
#> + carheight 1 2602552 706321796 2555.2
#> + wheelbase 1 2325378 706598970 2555.3
#> + fueltype 1 2271928 706652420 2555.3
#> + compressionratio 1 2266198 706658150 2555.3
#> + boreratio 1 1586958 707337390 2555.4
#> + citympg 1 109545 708814803 2555.8
#> + carlength 1 101289 708823059 2555.8
#> + drivewheel 2 7101361 701822987 2556.2
#> + fuelsystem 5 30996655 677927693 2556.5
#> - enginelocation 1 40211349 749135697 2560.9
#> - highwaympg 1 45835544 754759892 2562.1
#> - carbody 4 75240207 784164555 2562.4
#> - aspiration 1 57048982 765973330 2564.5
#> - peakrpm 1 88798200 797722548 2571.2
#> - curbweight 1 115605233 824529581 2576.6
#> - enginesize 1 141124970 850049318 2581.6
#> - stroke 1 153242774 862167122 2583.9
#> - cylindernumber 5 329923738 1038848086 2606.5
#> - enginetype 5 381019239 1089943587 2614.4
#>
#> Step: AIC=2550.44
#> price ~ enginesize + enginetype + cylindernumber + horsepower +
#> stroke + peakrpm + curbweight + carbody + enginelocation +
#> aspiration + highwaympg + carwidth
#>
#> Df Sum of Sq RSS AIC
#> - horsepower 1 92418 686151185 2548.5
#> <none> 686058767 2550.4
#> + doornumber 1 6595626 679463140 2550.9
#> + carlength 1 3811778 682246988 2551.5
#> + carheight 1 1165546 684893220 2552.2
#> + fueltype 1 816471 685242296 2552.2
#> + boreratio 1 472922 685585845 2552.3
#> + wheelbase 1 403908 685654858 2552.3
#> + compressionratio 1 355416 685703350 2552.4
#> + citympg 1 348838 685709928 2552.4
#> + drivewheel 2 2815501 683243266 2553.8
#> - carwidth 1 22865581 708924348 2553.8
#> + fuelsystem 5 23232410 662826356 2554.8
#> - highwaympg 1 36800737 722859504 2557.0
#> - enginelocation 1 48431713 734490480 2559.6
#> - curbweight 1 48644219 734702986 2559.7
#> - carbody 4 79287534 765346301 2560.4
#> - aspiration 1 63803023 749861789 2563.0
#> - peakrpm 1 91382621 777441388 2568.9
#> - enginesize 1 145264872 831323638 2579.9
#> - stroke 1 155310978 841369744 2581.9
#> - cylindernumber 5 313744258 999803024 2602.2
#> - enginetype 5 390727930 1076786697 2614.4
#>
#> Step: AIC=2548.47
#> price ~ enginesize + enginetype + cylindernumber + stroke + peakrpm +
#> curbweight + carbody + enginelocation + aspiration + highwaympg +
#> carwidth
#>
#> Df Sum of Sq RSS AIC
#> <none> 686151185 2548.5
#> + doornumber 1 6655136 679496049 2548.9
#> + carlength 1 3813832 682337353 2549.6
#> + carheight 1 1140456 685010728 2550.2
#> + boreratio 1 556023 685595162 2550.3
#> + wheelbase 1 474898 685676287 2550.3
#> + fueltype 1 454811 685696374 2550.4
#> + citympg 1 434647 685716538 2550.4
#> + compressionratio 1 147894 686003290 2550.4
#> + horsepower 1 92418 686058767 2550.4
#> + drivewheel 2 2838321 683312863 2551.8
#> - carwidth 1 23756472 709907657 2552.1
#> + fuelsystem 5 22976766 663174418 2552.9
#> - highwaympg 1 45947909 732099093 2557.1
#> - curbweight 1 48556451 734707635 2557.7
#> - carbody 4 79204795 765355980 2558.4
#> - enginelocation 1 54507088 740658273 2559.0
#> - aspiration 1 92388525 778539710 2567.2
#> - peakrpm 1 122853569 809004753 2573.5
#> - stroke 1 169740570 855891755 2582.7
#> - enginesize 1 192847357 878998542 2587.1
#> - cylindernumber 5 320464121 1006615306 2601.3
#> - enginetype 5 391148835 1077300019 2612.4
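The AIC column in the trace above is the quantity `step()` minimizes. For an `lm` fit, `step()` calls `extractAIC()`, which computes n * log(RSS / n) + 2 * edf (additive constants dropped), where edf is the number of estimated coefficients. The base-R check below reproduces the last two step values from their RSS. It assumes n = 164 training rows: the 140 residual degrees of freedom plus the 24 coefficients actually estimated in the summary that follows.

```r
# AIC as used by step()/extractAIC() for a linear model with unknown variance:
# n * log(RSS / n) + 2 * edf, with additive constants dropped
aic_lm <- function(rss, n, edf) n * log(rss / n) + 2 * edf

n <- 164  # assumed training-set size (140 residual df + 24 coefficients)

# Final step: RSS of the chosen model, 24 estimated coefficients
aic_final <- aic_lm(686151185, n, 24)
# Previous step: the same model plus horsepower, 25 coefficients
aic_with_hp <- aic_lm(686058767, n, 25)

round(c(final = aic_final, with_horsepower = aic_with_hp), 2)
```

Dropping horsepower barely increases RSS but saves one parameter, so AIC falls from about 2550.44 to 2548.47. That is why the search removed it.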
summary(carlm_stepwise)
#>
#> Call:
#> lm(formula = price ~ enginesize + enginetype + cylindernumber +
#> stroke + peakrpm + curbweight + carbody + enginelocation +
#> aspiration + highwaympg + carwidth, data = data_train)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -5623 -1095 0 1124 10118
#>
#> Coefficients: (1 not defined because of singularities)
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -37509.8120 13875.8552 -2.703 0.007717 **
#> enginesize 126.1726 20.1143 6.273 0.00000000414 ***
#> enginetypedohcv -8632.6313 3139.2174 -2.750 0.006748 **
#> enginetypel 495.5490 1249.4501 0.397 0.692256
#> enginetypeohc 3240.0641 852.9449 3.799 0.000216 ***
#> enginetypeohcf 304.0710 1372.3137 0.222 0.824967
#> enginetypeohcv -6940.6205 1198.3983 -5.792 0.00000004402 ***
#> enginetyperotor -1844.8098 3520.2444 -0.524 0.601067
#> cylindernumberfive -10609.8721 2673.8108 -3.968 0.000115 ***
#> cylindernumberfour -12573.3466 2610.1193 -4.817 0.00000373654 ***
#> cylindernumbersix -8336.3775 2121.5934 -3.929 0.000133 ***
#> cylindernumberthree -5684.2351 3911.4000 -1.453 0.148393
#> cylindernumbertwelve -13382.3405 2904.4368 -4.608 0.00000907684 ***
#> cylindernumbertwo NA NA NA NA
#> stroke -5005.8464 850.6102 -5.885 0.00000002805 ***
#> peakrpm 2.6347 0.5262 5.007 0.00000163876 ***
#> curbweight 4.7769 1.5176 3.148 0.002012 **
#> carbodyhardtop -3108.0866 1330.5722 -2.336 0.020916 *
#> carbodyhatchback -3723.8762 1134.3907 -3.283 0.001298 **
#> carbodysedan -2784.2683 1114.1853 -2.499 0.013613 *
#> carbodywagon -3788.1228 1213.5771 -3.121 0.002187 **
#> enginelocationrear 7476.3848 2241.8732 3.335 0.001092 **
#> aspirationturbo 2547.3549 586.7138 4.342 0.00002693543 ***
#> highwaympg 173.5071 56.6670 3.062 0.002638 **
#> carwidth 485.9678 220.7305 2.202 0.029328 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 2214 on 140 degrees of freedom
#> Multiple R-squared: 0.9373, Adjusted R-squared: 0.927
#> F-statistic: 90.96 on 23 and 140 DF, p-value: < 0.00000000000000022
The output above contains a coefficient equal to NA. The header "(1 not defined because of singularities)" explains why: cylindernumbertwo cannot be estimated because the design matrix is singular (X'X has a determinant of zero). This happens when that dummy column is an exact linear combination of other columns in the model, which is perfect multicollinearity between that variable and the others. I will try removing the cylindernumber variable.
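The next sketch shows how a singularity like this arises, using hypothetical toy data rather than the car price data. It assumes, for illustration only, that every "rotor" engine is also the only "two"-cylinder car. In that case the `cyltwo` dummy duplicates the `enginerotor` dummy, `lm()` reports NA for it, and base R's `alias()` shows which columns it depends on. Running `alias(carlm_stepwise)` on the article's model would identify the columns actually involved there.

```r
# Hypothetical toy data: every "rotor" engine is also the only "two"-cylinder car,
# so the cyltwo dummy is identical to the enginerotor dummy
set.seed(1)
engine <- factor(rep(c("dohc", "ohc", "ohcv", "rotor"), each = 6))
cyl <- factor(ifelse(engine == "rotor", "two",
                     rep(c("four", "six"), length.out = 24)))
price <- 15000 + 2000 * (engine == "rotor") + 1000 * (cyl == "six") +
  rnorm(24, sd = 300)

fit <- lm(price ~ engine + cyl)
coef(fit)            # cyltwo is NA: it cannot be estimated separately
alias(fit)$Complete  # expresses the aliased cyltwo in terms of the other columns
```

Dropping either of the two duplicated variables removes the NA, which is the route taken next with cylindernumber.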
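# Drop cylindernumber and keep the remaining variables chosen by the stepwise search
carprice <- carprice %>%
  select(enginesize, enginetype, stroke, peakrpm, curbweight, carbody,
         enginelocation, aspiration, highwaympg, carwidth, price)
# Split the data 80/20 into training and test sets with a fixed seed
set.seed(1)
index <- sample(nrow(carprice), nrow(carprice) * 0.80)
data_train <- carprice[index, ]
data_test <- carprice[-index, ]
# Refit the linear model on all remaining variables
carlm_stepwise2 <- lm(formula = price ~ ., data = data_train)
summary(carlm_stepwise2)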
#>
#> Call:
#> lm(formula = price ~ ., data = data_train)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -9297 -1348 -94 1252 12161
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -54317.8101 13694.8150 -3.966 0.000114 ***
#> enginesize 152.9313 15.9287 9.601 < 0.0000000000000002 ***
#> enginetypedohcv 544.7787 3041.2348 0.179 0.858085
#> enginetypel -205.6431 1314.8680 -0.156 0.875937
#> enginetypeohc 2126.7167 958.2692 2.219 0.028017 *
#> enginetypeohcf -766.9703 1580.5946 -0.485 0.628237
#> enginetypeohcv -4613.1635 1328.6839 -3.472 0.000681 ***
#> enginetyperotor 10729.9489 1865.8970 5.751 0.0000000508 ***
#> stroke -4804.8970 888.5169 -5.408 0.0000002569 ***
#> peakrpm 2.3892 0.6105 3.914 0.000139 ***
#> curbweight 4.0570 1.6188 2.506 0.013307 *
#> carbodyhardtop -3136.4699 1559.2275 -2.012 0.046120 *
#> carbodyhatchback -4195.9154 1253.7311 -3.347 0.001042 **
#> carbodysedan -3238.5381 1229.3446 -2.634 0.009345 **
#> carbodywagon -4290.4251 1380.4301 -3.108 0.002267 **
#> enginelocationrear 9422.0032 2423.3404 3.888 0.000153 ***
#> aspirationturbo 2201.2510 668.3047 3.294 0.001242 **
#> highwaympg 183.1757 64.7844 2.827 0.005356 **
#> carwidth 563.5526 222.5759 2.532 0.012408 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 2635 on 145 degrees of freedom
#> Multiple R-squared: 0.908, Adjusted R-squared: 0.8966
#> F-statistic: 79.49 on 18 and 145 DF, p-value: < 0.00000000000000022
The output above no longer contains any NA coefficients. Next, I check the variance inflation factor (VIF) to confirm that multicollinearity has been dealt with:
library(car)
vif(carlm_stepwise2)
#> GVIF Df GVIF^(1/(2*Df))
#> enginesize 11.411133 1 3.378037
#> enginetype 15.468897 6 1.256382
#> stroke 1.757303 1 1.325633
#> peakrpm 2.034334 1 1.426301
#> curbweight 17.380957 1 4.169048
#> carbody 2.959622 4 1.145261
#> enginelocation 2.491361 1 1.578404
#> aspiration 1.657042 1 1.287262
#> highwaympg 4.859232 1 2.204367
#> carwidth 5.365915 1 2.316444
The output above shows that some VIF values are still above 10, so I will remove variables one at a time, starting with the one that has the largest VIF (curbweight, 17.38).
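The rule in the previous paragraph can be sketched in base R. For a numeric predictor j, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all the other predictors. The loop then drops the predictor with the largest VIF until every VIF is at most 10. This is a minimal sketch that handles numeric predictors only and uses the built-in `mtcars` data as a stand-in, not the car price data. `car::vif()` additionally handles factor terms through the generalized VIF. The cutoff of 10 is the rule of thumb used in this article, not a universal threshold.

```r
# VIF_j = 1 / (1 - R_j^2): regress predictor j on the other predictors
vif_manual <- function(X) {
  if (ncol(X) == 1) return(setNames(1, colnames(X)))
  sapply(colnames(X), function(j) {
    others <- X[, colnames(X) != j, drop = FALSE]
    r2 <- summary(lm(X[, j] ~ others))$r.squared
    1 / (1 - r2)
  })
}

# Repeatedly drop the predictor with the largest VIF until all are <= threshold
drop_high_vif <- function(X, threshold = 10) {
  dropped <- character(0)
  repeat {
    v <- vif_manual(X)
    if (max(v) <= threshold || ncol(X) <= 2) break
    worst <- names(which.max(v))
    dropped <- c(dropped, worst)
    X <- X[, colnames(X) != worst, drop = FALSE]
  }
  list(kept = colnames(X), dropped = dropped, vif = v)
}

X <- as.matrix(mtcars[, c("cyl", "disp", "hp", "drat", "wt", "qsec")])
res <- drop_high_vif(X)
res$dropped        # predictors removed, largest VIF first
round(res$vif, 2)  # VIFs of the predictors that remain
```

Recomputing after every removal matters: once one collinear variable is gone, the VIFs of its partners usually drop too. The article follows the same one-at-a-time approach.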
# Model 3: drop curbweight, which had the largest GVIF (17.38)
carlm_stepwise3 <- lm(formula = price ~ enginesize + enginetype + stroke + enginelocation +
                        carwidth + aspiration + carbody + peakrpm + highwaympg, data = data_train)
summary(carlm_stepwise3)
#>
#> Call:
#> lm(formula = price ~ enginesize + enginetype + stroke + enginelocation +
#> carwidth + aspiration + carbody + peakrpm + highwaympg, data = data_train)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -10258.3 -1444.1 -261.4 1111.2 12622.1
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -62130.2817 13574.3705 -4.577 0.0000100078 ***
#> enginesize 175.7998 13.2902 13.228 < 0.0000000000000002 ***
#> enginetypedohcv -33.5250 3086.8276 -0.011 0.991349
#> enginetypel 24.9066 1335.1583 0.019 0.985142
#> enginetypeohc 1552.0835 947.1139 1.639 0.103418
#> enginetypeohcf -2005.6214 1528.2512 -1.312 0.191457
#> enginetypeohcv -5505.2896 1303.0607 -4.225 0.0000419059 ***
#> enginetyperotor 11017.1447 1895.7591 5.811 0.0000000374 ***
#> stroke -4931.2368 902.9873 -5.461 0.0000001987 ***
#> enginelocationrear 9961.9746 2457.0118 4.055 0.0000814275 ***
#> carwidth 867.1053 190.0830 4.562 0.0000106679 ***
#> aspirationturbo 2731.6834 645.2697 4.233 0.0000405186 ***
#> carbodyhardtop -3731.1246 1568.6935 -2.378 0.018678 *
#> carbodyhatchback -4532.3695 1268.8677 -3.572 0.000480 ***
#> carbodysedan -3280.3077 1251.2664 -2.622 0.009679 **
#> carbodywagon -3538.1621 1371.5554 -2.580 0.010877 *
#> peakrpm 2.0652 0.6074 3.400 0.000868 ***
#> highwaympg 113.5338 59.5715 1.906 0.058636 .
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 2682 on 146 degrees of freedom
#> Multiple R-squared: 0.904, Adjusted R-squared: 0.8928
#> F-statistic: 80.87 on 17 and 146 DF, p-value: < 0.00000000000000022
vif(carlm_stepwise3)
#> GVIF Df GVIF^(1/(2*Df))
#> enginesize 7.666567 1 2.768857
#> enginetype 11.577440 6 1.226406
#> stroke 1.751647 1 1.323498
#> enginelocation 2.471668 1 1.572154
#> carwidth 3.776955 1 1.943439
#> aspiration 1.490853 1 1.221005
#> carbody 2.223776 4 1.105061
#> peakrpm 1.943091 1 1.393948
#> highwaympg 3.965257 1 1.991295
Because enginetype still has a VIF above 10 (GVIF 11.58), I will remove that variable. The next model also drops highwaympg, which was only marginally significant (p ≈ 0.059) in the previous model.
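enginetype is a factor with six dummy columns, so `car::vif()` reports one generalized VIF (GVIF) for the whole term instead of one VIF per dummy. The GVIF is computed from determinants as det(R11) * det(R22) / det(R). Here R is the correlation matrix of the non-intercept model-matrix columns, R11 is the block for the term's own columns, and R22 is the block for all the other columns. The sketch below is a minimal base-R version of that formula. It uses `mtcars` with `factor(gear)` as a stand-in multi-level factor. The column GVIF^(1/(2*Df)) rescales the value so that terms with different Df are easier to compare. For a one-column term, the GVIF reduces to the ordinary VIF.

```r
# Generalized VIF per model term: det(R11) * det(R22) / det(R)
gvif <- function(formula, data) {
  mm <- model.matrix(formula, data)
  assign <- attr(mm, "assign")[-1]   # term index of each non-intercept column
  term_labels <- attr(terms(formula), "term.labels")
  R <- cor(mm[, -1, drop = FALSE])   # correlations among predictor columns
  det_R <- det(R)
  out <- t(sapply(seq_along(term_labels), function(i) {
    cols <- which(assign == i)       # all dummy columns belonging to term i
    g <- det(R[cols, cols, drop = FALSE]) *
      det(R[-cols, -cols, drop = FALSE]) / det_R
    c(GVIF = g, Df = length(cols), `GVIF^(1/(2*Df))` = g^(1 / (2 * length(cols))))
  }))
  rownames(out) <- term_labels
  out
}

gv <- gvif(~ factor(gear) + wt + hp, mtcars)
round(gv, 3)
```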
# Model 4: drop enginetype and highwaympg
carlm_stepwise4 <- lm(formula = price ~ enginesize + stroke + enginelocation +
                        carwidth + aspiration + carbody + peakrpm, data = data_train)
summary(carlm_stepwise4)
#>
#> Call:
#> lm(formula = price ~ enginesize + stroke + enginelocation + carwidth +
#> aspiration + carbody + peakrpm, data = data_train)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -6777.5 -1630.6 -309.3 1298.2 15998.5
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -78418.081 12180.458 -6.438 0.00000000148 ***
#> enginesize 116.790 9.475 12.326 < 0.0000000000000002 ***
#> stroke -2029.285 864.337 -2.348 0.02017 *
#> enginelocationrear 11241.123 2285.116 4.919 0.00000222102 ***
#> carwidth 1164.625 187.348 6.216 0.00000000462 ***
#> aspirationturbo 1694.267 686.472 2.468 0.01469 *
#> carbodyhardtop -3094.881 1778.015 -1.741 0.08376 .
#> carbodyhatchback -4153.508 1431.839 -2.901 0.00427 **
#> carbodysedan -3211.359 1401.371 -2.292 0.02329 *
#> carbodywagon -4405.455 1539.899 -2.861 0.00482 **
#> peakrpm 1.914 0.571 3.353 0.00101 **
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 3120 on 153 degrees of freedom
#> Multiple R-squared: 0.8638, Adjusted R-squared: 0.8549
#> F-statistic: 97.06 on 10 and 153 DF, p-value: < 0.00000000000000022
vif(carlm_stepwise4)
#> GVIF Df GVIF^(1/(2*Df))
#> enginesize 2.879313 1 1.696854
#> stroke 1.185783 1 1.088936
#> enginelocation 1.579603 1 1.256822
#> carwidth 2.710869 1 1.646472
#> aspiration 1.246677 1 1.116547
#> carbody 1.636313 4 1.063490
#> peakrpm 1.269048 1 1.126520
library(MLmetrics)
data_train$pred <- predict(carlm_stepwise4, newdata = data_train)
evaluasi_data_train<-data_train %>%
select(price, pred)
RMSE(y_pred=evaluasi_data_train$pred, y_true = evaluasi_data_train$price)
#> [1] 3013.768
MAPE(y_pred = evaluasi_data_train$pred, y_true = evaluasi_data_train$price)
#> [1] 0.1688136
data_test$pred <- predict(carlm_stepwise4, newdata = data_test)
evaluasi_data_test<-data_test %>%
select(price, pred)
RMSE(y_pred=evaluasi_data_test$pred, y_true = evaluasi_data_test$price)
#> [1] 2466.849
MAPE(y_pred = evaluasi_data_test$pred, y_true = evaluasi_data_test$price)
#> [1] 0.1524715
The selected model above has an adjusted R-squared of about 85% (0.8549), a MAPE of about 16.9% (0.1688) on the training data, and a MAPE of about 15.2% (0.1525) on the test data.
Next, I determine which variables in this model have the most influence on price, using the following function:
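The two error metrics above can be written out directly. The sketch below follows the definitions used by `MLmetrics::RMSE()` and `MLmetrics::MAPE()`, applied to three made-up prices. RMSE is in the same units as price. MAPE is returned as a fraction, so the 0.1688 above corresponds to about 16.9%.

```r
# Root mean squared error, in the units of the target
rmse <- function(y_pred, y_true) sqrt(mean((y_true - y_pred)^2))
# Mean absolute percentage error, returned as a fraction (0.15 = 15%)
mape <- function(y_pred, y_true) mean(abs((y_true - y_pred) / y_true))

y_true <- c(100, 200, 400)
y_pred <- c(110, 190, 380)
rmse(y_pred, y_true)  # sqrt((100 + 100 + 400) / 3) = 14.14214
mape(y_pred, y_true)  # (0.10 + 0.05 + 0.05) / 3 = 0.06666667
```

Because MAPE divides by the true value, the same absolute error counts more on a cheap car than on an expensive one.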
library(relaimpo)
ins_model4_shapley<-calc.relimp(carlm_stepwise4)
ins_model4_shapley
#> Response variable: price
#> Total response variance: 67113553
#> Analysis based on 164 observations
#>
#> 10 Regressors:
#> Some regressors combined in groups:
#> Group carbody : carbodyhardtop carbodyhatchback carbodysedan carbodywagon
#>
#> Relative importance of 7 (groups of) regressors assessed:
#> carbody enginesize stroke enginelocation carwidth aspiration peakrpm
#>
#> Proportion of variance explained by model: 86.38%
#> Metrics are not normalized (rela=FALSE).
#>
#> Relative importance metrics:
#>
#> lmg
#> carbody 0.063542466
#> enginesize 0.421097664
#> stroke 0.005728808
#> enginelocation 0.070039328
#> carwidth 0.276153620
#> aspiration 0.016263642
#> peakrpm 0.011009459
#>
#> Average coefficients for different model sizes:
#>
#> 1group 2groups 3groups 4groups
#> enginesize 162.700455 156.6728650 149.1876355 140.980569
#> stroke 1566.967232 439.8045967 -422.9649738 -1068.149680
#> enginelocation 21509.474118 19874.5747269 18098.5121315 16311.580952
#> carwidth 2808.318905 2521.3993360 2245.8740246 1978.800531
#> aspiration 3746.624537 2750.8754147 2103.4901680 1722.263602
#> carbodyhardtop 1890.785714 240.7076402 -1047.4396351 -1988.911125
#> carbodyhatchback -11333.091948 -9874.4609392 -8309.5380607 -6868.928621
#> carbodysedan -7578.980000 -7072.3112026 -6233.2027498 -5322.486735
#> carbodywagon -9887.111111 -8978.8155487 -7850.5033758 -6736.726776
#> peakrpm -1.747306 -0.5323476 0.4019635 1.074713
#> 5groups 6groups 7groups
#> enginesize 132.57508 124.387167 116.790407
#> stroke -1530.85799 -1840.322456 -2029.284602
#> enginelocation 14590.16580 12928.653388 11241.123500
#> carwidth 1715.18094 1447.326097 1164.625178
#> aspiration 1545.70391 1538.658055 1694.266897
#> carbodyhardtop -2623.63612 -2986.344665 -3094.881470
#> carbodyhatchback -5662.40564 -4746.436528 -4153.508115
#> carbodysedan -4472.50430 -3755.293299 -3211.358753
#> carbodywagon -5753.76423 -4964.855624 -4405.454822
#> peakrpm 1.51928 1.780429 1.914434
nilai_importance<-ins_model4_shapley$lmg
# Bar chart of each variable's lmg value, sorted from largest to smallest
as.data.frame(nilai_importance) %>%
  tibble::rownames_to_column("variabel") %>%
  ggplot(aes(x = reorder(variabel, -nilai_importance), y = nilai_importance)) +
  geom_bar(aes(fill = nilai_importance), stat = "identity") +
  labs(x = "variabel")
Based on the results above, enginesize (lmg 0.421, about 42% of the variance in price) and carwidth (lmg 0.276, about 27.6%) are the most influential variables in determining car price.
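The lmg metric that `calc.relimp()` reports by default splits R-squared among the regressors. For each regressor, it averages the increase in R-squared that regressor adds, taken over every possible order in which the regressors can enter the model. Each ordering telescopes to the full-model R-squared, so the lmg values sum to it. Here they add up to about 0.8638, the 86.38% printed above. With rela=FALSE, each value is therefore a share of the total variance in price. The sketch below is a brute-force base-R version of that idea, using `mtcars` rather than the car price data. A factor term such as `factor(cyl)` is treated as one group, similar to how `calc.relimp()` grouped the carbody dummies. Enumerating every ordering is only practical for a handful of terms.

```r
# All orderings of a character vector (practical only for a few terms)
perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  }
  out
}

# LMG: average R^2 gain of each term over all entry orders
lmg <- function(response, terms, data) {
  r2 <- function(vars) {
    if (length(vars) == 0) return(0)
    summary(lm(reformulate(vars, response), data = data))$r.squared
  }
  orders <- perms(terms)
  gain <- setNames(numeric(length(terms)), terms)
  for (ord in orders) {
    for (k in seq_along(ord)) {
      # R^2 after adding the k-th term minus R^2 before adding it
      gain[ord[k]] <- gain[ord[k]] + r2(ord[seq_len(k)]) - r2(ord[seq_len(k - 1)])
    }
  }
  gain / length(orders)
}

imp <- lmg("mpg", c("wt", "hp", "factor(cyl)"), mtcars)
round(imp, 3)
sum(imp)  # equals the R^2 of the full model
```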
In this article, I analyzed car price data using regression analysis and variable importance. The data contained multicollinearity, which initially produced an NA coefficient. I then removed the variables involved in the multicollinearity one at a time. The final model has a good adjusted R-squared of about 85% and a prediction error (MAPE) of about 15-17%. I limited the assumption checks to multicollinearity, because that was the main problem with this data at the outset. The resulting model shows that enginesize and carwidth are the dominant variables influencing car prices in the US market. I hope this is useful :)
A work by Briandamar Kencana
damarbrian@gmail.com