Week 6 Assignment

Input Data

library(readxl)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
data <- read_excel("C:/Users/Admin/Downloads/PSD Kelompok 3 (3).xlsx")
names(data)[names(data) == "Jumlah Ulasan"] <- "J.Ulasan"
head(data)
## # A tibble: 6 × 6
##   Brand     HARGA LOKASI      J.Ulasan RAM   Penyimpanan
##   <chr>     <dbl> <chr>       <chr>    <chr> <chr>      
## 1 iphone  4388000 Jabodetabek 10       4     256        
## 2 iphone  9458000 Jabodetabek 3071     4     128        
## 3 iphone  9409000 Jabodetabek 2713     4     128        
## 4 iphone  5047000 Luar Jawa   87       4     64         
## 5 iphone 11287000 Jabodetabek 567      6     128        
## 6 iphone 21447000 Jabodetabek 732      8     256
data <- na.omit(data)

At this stage we load the data scraped in Week 3. It is a mobile phone sales dataset, with price (HARGA) as the response variable and five explanatory variables: Brand, location (LOKASI), RAM, storage (Penyimpanan), and number of reviews (Jumlah Ulasan, renamed J.Ulasan).

The store variable (Toko) is not included in the model for now, because grouping its values would be complicated. The model in this section therefore uses only the five explanatory variables.

data$Brand <- relevel(as.factor(data$Brand), ref="iphone")
data$LOKASI <- relevel(as.factor(data$LOKASI), ref="Jabodetabek")
data$RAM <- relevel(as.factor(as.numeric(data$RAM)), ref = "6")
data$Penyimpanan <- relevel(as.factor(as.numeric(data$Penyimpanan)), ref = "128")
data$J.Ulasan <- as.numeric(data$J.Ulasan)

In this step we re-level the categorical variables Brand, LOKASI, RAM, and Penyimpanan. RAM and Penyimpanan are stored as numbers, but in practice they only take a handful of fixed values (for example 4, 6, or 8 for RAM), so we treat them as categorical rather than numeric.

We also set the reference level for each factor: iphone for Brand, Jabodetabek for LOKASI, 6 for RAM, and 128 for Penyimpanan.

Finally, we convert the review count (J.Ulasan) to numeric. It is read in as character (`<chr>`), so this conversion makes sure the model treats it as a numeric predictor.
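The effect of `relevel()` can be checked on a small example. This is a minimal sketch on hypothetical toy vectors (the values are made up for illustration and are not taken from the phone data); it only shows how the reference level moves to the front of the level list.

```r
# Hypothetical toy data; the brand names are for illustration only.
brand <- c("samsung", "iphone", "xiaomi", "iphone", "samsung")

# By default, factor levels are sorted alphabetically,
# so "iphone" happens to come first here.
f_default <- factor(brand)
levels(f_default)

# relevel() moves the chosen level to the front, making it the baseline
# that lm() absorbs into the intercept.
f_ref <- relevel(f_default, ref = "samsung")
levels(f_ref)

# Numeric codes are converted to a factor the same way; note that
# ref must be given as a character string matching a level label.
ram <- relevel(as.factor(c(4, 6, 8, 6)), ref = "6")
levels(ram)
```

Every coefficient that `lm()` later reports for a factor is a difference from this first level.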

Basic Visualization

hist(data$HARGA, col = "#243b90", main = "Histogram Harga Asli", xlab = "Harga")

Before modelling, the histogram shows that the price distribution is skewed to the right. Most phones are priced at around 5 million, and the frequency drops off sharply toward prices in the tens of millions. The long right tail also suggests possible outliers, so we will run outlier detection after fitting the model.

library(MASS)
## 
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
## 
##     select
model <- lm(HARGA ~ Brand+LOKASI + RAM + Penyimpanan+J.Ulasan, data=data) 
summary(model)
## 
## Call:
## lm(formula = HARGA ~ Brand + LOKASI + RAM + Penyimpanan + J.Ulasan, 
##     data = data)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -12853595  -1607087   -586536   1435603  14613759 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          1.102e+07  1.678e+05  65.685  < 2e-16 ***
## Brandasus           -7.748e+06  4.688e+05 -16.529  < 2e-16 ***
## Brandinfinix        -9.997e+06  2.215e+05 -45.142  < 2e-16 ***
## Brandoppo           -8.896e+06  2.553e+05 -34.848  < 2e-16 ***
## BrandRealme         -8.830e+06  2.508e+05 -35.211  < 2e-16 ***
## BrandSamsung        -7.186e+06  1.737e+05 -41.364  < 2e-16 ***
## Brandvivo           -8.731e+06  1.980e+05 -44.103  < 2e-16 ***
## Brandxiaomi         -9.110e+06  1.778e+05 -51.228  < 2e-16 ***
## LOKASIBali           3.884e+05  4.681e+05   0.830 0.406761    
## LOKASIDI Yogyakarta -6.591e+05  7.483e+05  -0.881 0.378479    
## LOKASIJawa Barat    -1.784e+04  2.283e+05  -0.078 0.937735    
## LOKASIJawa Tengah   -1.905e+05  4.780e+05  -0.399 0.690263    
## LOKASIJawa Timur     3.799e+05  1.773e+05   2.142 0.032244 *  
## LOKASILuar Jawa      4.378e+05  1.671e+05   2.620 0.008841 ** 
## RAM1                -1.117e+07  1.327e+06  -8.413  < 2e-16 ***
## RAM2                -8.142e+06  5.753e+05 -14.153  < 2e-16 ***
## RAM3                -6.381e+06  3.622e+05 -17.618  < 2e-16 ***
## RAM4                -2.780e+06  1.822e+05 -15.259  < 2e-16 ***
## RAM8                 9.669e+05  1.661e+05   5.820 6.50e-09 ***
## RAM12                4.762e+06  2.669e+05  17.842  < 2e-16 ***
## RAM16                8.399e+06  6.345e+05  13.237  < 2e-16 ***
## RAM18                1.271e+07  2.100e+06   6.053 1.59e-09 ***
## RAM24                1.567e+06  1.742e+06   0.899 0.368505    
## Penyimpanan8        -3.033e+06  2.928e+06  -1.036 0.300349    
## Penyimpanan16        2.474e+06  1.156e+06   2.140 0.032437 *  
## Penyimpanan32        3.742e+06  6.041e+05   6.194 6.63e-10 ***
## Penyimpanan64       -5.358e+05  1.917e+05  -2.795 0.005226 ** 
## Penyimpanan256       5.122e+05  1.379e+05   3.714 0.000207 ***
## Penyimpanan512       2.456e+06  2.494e+05   9.848  < 2e-16 ***
## J.Ulasan            -3.379e+02  2.808e+02  -1.203 0.228918    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2870000 on 3091 degrees of freedom
## Multiple R-squared:  0.6273, Adjusted R-squared:  0.6239 
## F-statistic: 179.4 on 29 and 3091 DF,  p-value: < 2.2e-16
anova(model)
## Analysis of Variance Table
## 
## Response: HARGA
##               Df     Sum Sq    Mean Sq  F value  Pr(>F)    
## Brand          7 2.2663e+16 3.2376e+15 392.9550 < 2e-16 ***
## LOKASI         6 1.2717e+14 2.1196e+13   2.5726 0.01732 *  
## RAM            9 1.8693e+16 2.0770e+15 252.0915 < 2e-16 ***
## Penyimpanan    6 1.3774e+15 2.2957e+14  27.8633 < 2e-16 ***
## J.Ulasan       1 1.1932e+13 1.1932e+13   1.4481 0.22892    
## Residuals   3091 2.5467e+16 8.2392e+12                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The fitted model is shown above, with an R-squared of about 0.63 (adjusted R-squared 0.62). Most of the terms are significant at the 5% significance level (95% confidence). A few are not, for example the number of reviews (J.Ulasan, p-value 0.229).
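Because every factor has a reference level, each dummy coefficient is a price difference relative to that baseline. The sketch below works through this arithmetic with three coefficients copied from the summary output (rounded as printed); the variable names are introduced here for illustration only.

```r
# Coefficients copied from the summary output above (rounded as printed).
intercept     <- 1.102e7   # iphone, Jabodetabek, RAM 6, storage 128, J.Ulasan = 0
brand_samsung <- -7.186e6  # BrandSamsung
ram_8         <-  9.669e5  # RAM8

# Expected price of the reference phone
price_reference <- intercept

# Same configuration, but Samsung instead of iphone
price_samsung <- intercept + brand_samsung

# Samsung with RAM level 8 instead of 6
price_samsung_ram8 <- intercept + brand_samsung + ram_8

c(reference    = price_reference,
  samsung      = price_samsung,
  samsung_ram8 = price_samsung_ram8)
```

With these rounded values the Samsung phone in the reference configuration comes out at about 3.83 million, and about 4.80 million with RAM level 8.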

plot(model)

## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced

## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced

library(lmtest)
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
dwtest(model)
## 
##  Durbin-Watson test
## 
## data:  model
## DW = 1.9868, p-value = 0.3095
## alternative hypothesis: true autocorrelation is greater than 0

From the Residuals vs Fitted plot, the homoscedasticity assumption is not met, because many residuals are not spread evenly around the centre line y = 0. We conclude that heteroscedasticity is present, meaning the residual variance is not constant.

From the Q-Q plot of the residuals, the points in the middle follow the normal line, but most points at both ends do not. This suggests that the residuals are not normally distributed, although a formal test is still needed. The departure from normality may be caused by outliers. The upper tail bending away from the line indicates that the model residuals are skewed to the right.

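From the plot of fitted values against standardized residuals, the points spread out and form a pattern. We read this pattern as an indication that the residuals are not independent. Note, however, that the Durbin-Watson test above (DW = 1.9868, p-value = 0.3095) does not reject the null hypothesis of no first-order autocorrelation, so this visual impression should be treated with caution.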
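The plots can be backed up with formal tests. The following is a minimal sketch on simulated data (an assumption for illustration, not the phone data), using only base R. It runs a Shapiro-Wilk test for normality of the residuals and computes a studentized Breusch-Pagan statistic by hand for constant variance. `lmtest::bptest()` should report the same statistic with its default settings.

```r
set.seed(1)

# Simulated data (an assumption for illustration, not the phone data):
# the error standard deviation grows with x, so the variance is not constant.
n <- 500
x <- runif(n, min = 1, max = 10)
y <- 2 + 3 * x + rnorm(n, sd = x)
fit <- lm(y ~ x)

# Normality of residuals: Shapiro-Wilk test (base R, needs 3 <= n <= 5000)
sw <- shapiro.test(residuals(fit))

# Constant variance: studentized Breusch-Pagan statistic computed by hand.
# Regress the squared residuals on the predictors; n * R^2 of that auxiliary
# regression is compared with a chi-squared distribution whose degrees of
# freedom equal the number of predictors (here 1).
e2 <- residuals(fit)^2
aux <- lm(e2 ~ x)
bp_stat <- n * summary(aux)$r.squared
bp_p <- pchisq(bp_stat, df = 1, lower.tail = FALSE)

c(shapiro_p = sw$p.value, bp_statistic = bp_stat, bp_p = bp_p)
```

A small p-value in either test rejects the corresponding assumption. For the phone model the same calls would be applied to `model` and its own predictors.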

library(car)
## Loading required package: carData
## 
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
## 
##     recode
## The following object is masked from 'package:purrr':
## 
##     some
vif(model)
##                 GVIF Df GVIF^(1/(2*Df))
## Brand       2.494361  7        1.067466
## LOKASI      1.130748  6        1.010293
## RAM         7.462211  9        1.118131
## Penyimpanan 4.322606  6        1.129741
## J.Ulasan    1.067401  1        1.033151
library(leaps)

All GVIF values are below 10 (the largest is 7.46, for RAM), and the adjusted values GVIF^(1/(2*Df)) are all close to 1, so there is no indication of serious multicollinearity.
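For a single numeric predictor, the variance inflation factor is 1 / (1 - R²), where R² comes from regressing that predictor on all the others. The sketch below computes it by hand on the built-in `mtcars` data, which stands in for the phone data here; the choice of predictors is arbitrary. For factors with several levels, `car::vif()` reports the generalized version (GVIF) shown above instead.

```r
# mtcars ships with base R; it stands in for the phone data here.
predictors <- c("wt", "hp", "disp")

vif_manual <- sapply(predictors, function(v) {
  others <- setdiff(predictors, v)
  # Regress predictor v on all the other predictors
  aux <- lm(reformulate(others, response = v), data = mtcars)
  1 / (1 - summary(aux)$r.squared)
})

vif_manual
```

A value of 1 means the predictor is uncorrelated with the others; larger values mean more of its variance is explained by them.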

Variable Selection

Best Subset.

regfit.full <- regsubsets(HARGA ~ Brand+ LOKASI + RAM + Penyimpanan+ J.Ulasan, data=data , nvmax = 29)
reg.summary <-  summary(regfit.full)
summary(regfit.full)
## Subset selection object
## Call: regsubsets.formula(HARGA ~ Brand + LOKASI + RAM + Penyimpanan + 
##     J.Ulasan, data = data, nvmax = 29)
## 29 Variables  (and intercept)
##                     Forced in Forced out
## Brandasus               FALSE      FALSE
## Brandinfinix            FALSE      FALSE
## Brandoppo               FALSE      FALSE
## BrandRealme             FALSE      FALSE
## BrandSamsung            FALSE      FALSE
## Brandvivo               FALSE      FALSE
## Brandxiaomi             FALSE      FALSE
## LOKASIBali              FALSE      FALSE
## LOKASIDI Yogyakarta     FALSE      FALSE
## LOKASIJawa Barat        FALSE      FALSE
## LOKASIJawa Tengah       FALSE      FALSE
## LOKASIJawa Timur        FALSE      FALSE
## LOKASILuar Jawa         FALSE      FALSE
## RAM1                    FALSE      FALSE
## RAM2                    FALSE      FALSE
## RAM3                    FALSE      FALSE
## RAM4                    FALSE      FALSE
## RAM8                    FALSE      FALSE
## RAM12                   FALSE      FALSE
## RAM16                   FALSE      FALSE
## RAM18                   FALSE      FALSE
## RAM24                   FALSE      FALSE
## Penyimpanan8            FALSE      FALSE
## Penyimpanan16           FALSE      FALSE
## Penyimpanan32           FALSE      FALSE
## Penyimpanan64           FALSE      FALSE
## Penyimpanan256          FALSE      FALSE
## Penyimpanan512          FALSE      FALSE
## J.Ulasan                FALSE      FALSE
## 1 subsets of each size up to 29
## Selection Algorithm: exhaustive
##           Brandasus Brandinfinix Brandoppo BrandRealme BrandSamsung Brandvivo
## 1  ( 1 )  " "       " "          " "       " "         " "          " "      
## 2  ( 1 )  " "       " "          " "       " "         " "          " "      
## 3  ( 1 )  " "       "*"          " "       " "         " "          " "      
## 4  ( 1 )  " "       "*"          " "       " "         " "          "*"      
## 5  ( 1 )  " "       "*"          " "       "*"         " "          "*"      
## 6  ( 1 )  " "       "*"          "*"       "*"         "*"          "*"      
## 7  ( 1 )  " "       "*"          "*"       "*"         "*"          "*"      
## 8  ( 1 )  " "       "*"          "*"       "*"         "*"          "*"      
## 9  ( 1 )  " "       "*"          "*"       "*"         "*"          "*"      
## 10  ( 1 ) " "       "*"          "*"       "*"         "*"          "*"      
## 11  ( 1 ) " "       "*"          "*"       "*"         "*"          "*"      
## 12  ( 1 ) "*"       "*"          "*"       "*"         "*"          "*"      
## 13  ( 1 ) "*"       "*"          "*"       "*"         "*"          "*"      
## 14  ( 1 ) "*"       "*"          "*"       "*"         "*"          "*"      
## 15  ( 1 ) "*"       "*"          "*"       "*"         "*"          "*"      
## 16  ( 1 ) "*"       "*"          "*"       "*"         "*"          "*"      
## 17  ( 1 ) "*"       "*"          "*"       "*"         "*"          "*"      
## 18  ( 1 ) "*"       "*"          "*"       "*"         "*"          "*"      
## 19  ( 1 ) "*"       "*"          "*"       "*"         "*"          "*"      
## 20  ( 1 ) "*"       "*"          "*"       "*"         "*"          "*"      
## 21  ( 1 ) "*"       "*"          "*"       "*"         "*"          "*"      
## 22  ( 1 ) "*"       "*"          "*"       "*"         "*"          "*"      
## 23  ( 1 ) "*"       "*"          "*"       "*"         "*"          "*"      
## 24  ( 1 ) "*"       "*"          "*"       "*"         "*"          "*"      
## 25  ( 1 ) "*"       "*"          "*"       "*"         "*"          "*"      
## 26  ( 1 ) "*"       "*"          "*"       "*"         "*"          "*"      
## 27  ( 1 ) "*"       "*"          "*"       "*"         "*"          "*"      
## 28  ( 1 ) "*"       "*"          "*"       "*"         "*"          "*"      
## 29  ( 1 ) "*"       "*"          "*"       "*"         "*"          "*"      
##           Brandxiaomi LOKASIBali LOKASIDI Yogyakarta LOKASIJawa Barat
## 1  ( 1 )  " "         " "        " "                 " "             
## 2  ( 1 )  "*"         " "        " "                 " "             
## 3  ( 1 )  "*"         " "        " "                 " "             
## 4  ( 1 )  "*"         " "        " "                 " "             
## 5  ( 1 )  "*"         " "        " "                 " "             
## 6  ( 1 )  "*"         " "        " "                 " "             
## 7  ( 1 )  "*"         " "        " "                 " "             
## 8  ( 1 )  "*"         " "        " "                 " "             
## 9  ( 1 )  "*"         " "        " "                 " "             
## 10  ( 1 ) "*"         " "        " "                 " "             
## 11  ( 1 ) "*"         " "        " "                 " "             
## 12  ( 1 ) "*"         " "        " "                 " "             
## 13  ( 1 ) "*"         " "        " "                 " "             
## 14  ( 1 ) "*"         " "        " "                 " "             
## 15  ( 1 ) "*"         " "        " "                 " "             
## 16  ( 1 ) "*"         " "        " "                 " "             
## 17  ( 1 ) "*"         " "        " "                 " "             
## 18  ( 1 ) "*"         " "        " "                 " "             
## 19  ( 1 ) "*"         " "        " "                 " "             
## 20  ( 1 ) "*"         " "        " "                 " "             
## 21  ( 1 ) "*"         " "        " "                 " "             
## 22  ( 1 ) "*"         " "        " "                 " "             
## 23  ( 1 ) "*"         " "        " "                 " "             
## 24  ( 1 ) "*"         " "        " "                 " "             
## 25  ( 1 ) "*"         " "        " "                 " "             
## 26  ( 1 ) "*"         " "        "*"                 " "             
## 27  ( 1 ) "*"         "*"        "*"                 " "             
## 28  ( 1 ) "*"         "*"        "*"                 " "             
## 29  ( 1 ) "*"         "*"        "*"                 "*"             
##           LOKASIJawa Tengah LOKASIJawa Timur LOKASILuar Jawa RAM1 RAM2 RAM3
## 1  ( 1 )  " "               " "              " "             " "  " "  " " 
## 2  ( 1 )  " "               " "              " "             " "  " "  " " 
## 3  ( 1 )  " "               " "              " "             " "  " "  " " 
## 4  ( 1 )  " "               " "              " "             " "  " "  " " 
## 5  ( 1 )  " "               " "              " "             " "  " "  " " 
## 6  ( 1 )  " "               " "              " "             " "  " "  " " 
## 7  ( 1 )  " "               " "              " "             " "  " "  " " 
## 8  ( 1 )  " "               " "              " "             " "  " "  " " 
## 9  ( 1 )  " "               " "              " "             " "  " "  "*" 
## 10  ( 1 ) " "               " "              " "             " "  "*"  "*" 
## 11  ( 1 ) " "               " "              " "             " "  "*"  "*" 
## 12  ( 1 ) " "               " "              " "             " "  "*"  "*" 
## 13  ( 1 ) " "               " "              " "             " "  "*"  "*" 
## 14  ( 1 ) " "               " "              " "             "*"  "*"  "*" 
## 15  ( 1 ) " "               " "              " "             "*"  "*"  "*" 
## 16  ( 1 ) " "               " "              " "             "*"  "*"  "*" 
## 17  ( 1 ) " "               " "              " "             "*"  "*"  "*" 
## 18  ( 1 ) " "               " "              " "             "*"  "*"  "*" 
## 19  ( 1 ) " "               " "              " "             "*"  "*"  "*" 
## 20  ( 1 ) " "               " "              "*"             "*"  "*"  "*" 
## 21  ( 1 ) " "               "*"              "*"             "*"  "*"  "*" 
## 22  ( 1 ) " "               "*"              "*"             "*"  "*"  "*" 
## 23  ( 1 ) " "               "*"              "*"             "*"  "*"  "*" 
## 24  ( 1 ) " "               "*"              "*"             "*"  "*"  "*" 
## 25  ( 1 ) " "               "*"              "*"             "*"  "*"  "*" 
## 26  ( 1 ) " "               "*"              "*"             "*"  "*"  "*" 
## 27  ( 1 ) " "               "*"              "*"             "*"  "*"  "*" 
## 28  ( 1 ) "*"               "*"              "*"             "*"  "*"  "*" 
## 29  ( 1 ) "*"               "*"              "*"             "*"  "*"  "*" 
##           RAM4 RAM8 RAM12 RAM16 RAM18 RAM24 Penyimpanan8 Penyimpanan16
## 1  ( 1 )  " "  " "  " "   " "   " "   " "   " "          " "          
## 2  ( 1 )  " "  " "  " "   " "   " "   " "   " "          " "          
## 3  ( 1 )  " "  " "  " "   " "   " "   " "   " "          " "          
## 4  ( 1 )  " "  " "  " "   " "   " "   " "   " "          " "          
## 5  ( 1 )  " "  " "  " "   " "   " "   " "   " "          " "          
## 6  ( 1 )  " "  " "  " "   " "   " "   " "   " "          " "          
## 7  ( 1 )  " "  " "  " "   " "   " "   " "   " "          " "          
## 8  ( 1 )  " "  " "  "*"   " "   " "   " "   " "          " "          
## 9  ( 1 )  "*"  " "  "*"   " "   " "   " "   " "          " "          
## 10  ( 1 ) "*"  " "  "*"   " "   " "   " "   " "          " "          
## 11  ( 1 ) "*"  " "  "*"   " "   " "   " "   " "          " "          
## 12  ( 1 ) "*"  " "  "*"   "*"   " "   " "   " "          " "          
## 13  ( 1 ) "*"  " "  "*"   "*"   " "   " "   " "          " "          
## 14  ( 1 ) "*"  " "  "*"   "*"   " "   " "   " "          " "          
## 15  ( 1 ) "*"  "*"  "*"   "*"   " "   " "   " "          " "          
## 16  ( 1 ) "*"  "*"  "*"   "*"   " "   " "   " "          " "          
## 17  ( 1 ) "*"  "*"  "*"   "*"   "*"   " "   " "          " "          
## 18  ( 1 ) "*"  "*"  "*"   "*"   "*"   " "   " "          " "          
## 19  ( 1 ) "*"  "*"  "*"   "*"   "*"   " "   " "          " "          
## 20  ( 1 ) "*"  "*"  "*"   "*"   "*"   " "   " "          " "          
## 21  ( 1 ) "*"  "*"  "*"   "*"   "*"   " "   " "          " "          
## 22  ( 1 ) "*"  "*"  "*"   "*"   "*"   " "   " "          "*"          
## 23  ( 1 ) "*"  "*"  "*"   "*"   "*"   " "   " "          "*"          
## 24  ( 1 ) "*"  "*"  "*"   "*"   "*"   " "   "*"          "*"          
## 25  ( 1 ) "*"  "*"  "*"   "*"   "*"   "*"   "*"          "*"          
## 26  ( 1 ) "*"  "*"  "*"   "*"   "*"   "*"   "*"          "*"          
## 27  ( 1 ) "*"  "*"  "*"   "*"   "*"   "*"   "*"          "*"          
## 28  ( 1 ) "*"  "*"  "*"   "*"   "*"   "*"   "*"          "*"          
## 29  ( 1 ) "*"  "*"  "*"   "*"   "*"   "*"   "*"          "*"          
##           Penyimpanan32 Penyimpanan64 Penyimpanan256 Penyimpanan512 J.Ulasan
## 1  ( 1 )  " "           " "           " "            "*"            " "     
## 2  ( 1 )  " "           " "           " "            "*"            " "     
## 3  ( 1 )  " "           " "           " "            "*"            " "     
## 4  ( 1 )  " "           " "           " "            "*"            " "     
## 5  ( 1 )  " "           " "           " "            "*"            " "     
## 6  ( 1 )  " "           " "           " "            " "            " "     
## 7  ( 1 )  " "           "*"           " "            " "            " "     
## 8  ( 1 )  " "           "*"           " "            " "            " "     
## 9  ( 1 )  " "           " "           " "            " "            " "     
## 10  ( 1 ) " "           " "           " "            " "            " "     
## 11  ( 1 ) " "           " "           " "            "*"            " "     
## 12  ( 1 ) " "           " "           " "            " "            " "     
## 13  ( 1 ) " "           " "           " "            "*"            " "     
## 14  ( 1 ) " "           " "           " "            "*"            " "     
## 15  ( 1 ) " "           " "           " "            "*"            " "     
## 16  ( 1 ) "*"           " "           " "            "*"            " "     
## 17  ( 1 ) "*"           " "           " "            "*"            " "     
## 18  ( 1 ) "*"           " "           "*"            "*"            " "     
## 19  ( 1 ) "*"           "*"           "*"            "*"            " "     
## 20  ( 1 ) "*"           "*"           "*"            "*"            " "     
## 21  ( 1 ) "*"           "*"           "*"            "*"            " "     
## 22  ( 1 ) "*"           "*"           "*"            "*"            " "     
## 23  ( 1 ) "*"           "*"           "*"            "*"            "*"     
## 24  ( 1 ) "*"           "*"           "*"            "*"            "*"     
## 25  ( 1 ) "*"           "*"           "*"            "*"            "*"     
## 26  ( 1 ) "*"           "*"           "*"            "*"            "*"     
## 27  ( 1 ) "*"           "*"           "*"            "*"            "*"     
## 28  ( 1 ) "*"           "*"           "*"            "*"            "*"     
## 29  ( 1 ) "*"           "*"           "*"            "*"            "*"

For this best-subset search we set nvmax to 29 so that every term can enter, since most of our variables are dummy variables. The summary shows that model 29, the largest model, includes all of the dummy variables with none left out. This is one reason for us to consider this model, but it still needs to be checked against the selection criteria.

reg.summary$adjr2
##  [1] 0.09240237 0.13856492 0.17996271 0.22998662 0.27180073 0.33017864
##  [7] 0.40556025 0.46406155 0.51016061 0.53845534 0.56307456 0.58207056
## [13] 0.59511227 0.60330756 0.61001694 0.61525926 0.61958078 0.62152828
## [19] 0.62252194 0.62318433 0.62365766 0.62410094 0.62415064 0.62415792
## [25] 0.62413529 0.62410899 0.62407385 0.62397130 0.62385039
which.max(reg.summary$adjr2)
## [1] 24

By adjusted R-squared, model 24 scores highest at 0.624. Note, however, that this model does not include all of the dummy variables.
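Adjusted R-squared penalizes R-squared for the number of terms: 1 - (1 - R²)(n - 1)/(n - p - 1), with n observations and p terms besides the intercept. This is why it can peak before the largest model. The sketch below checks the formula against `summary()` on the built-in `mtcars` data, used here only as a stand-in.

```r
fit <- lm(mpg ~ wt + hp + factor(cyl), data = mtcars)
s <- summary(fit)

n <- nrow(mtcars)
p <- length(coef(fit)) - 1   # number of terms, excluding the intercept

adj_r2_manual <- 1 - (1 - s$r.squared) * (n - 1) / (n - p - 1)

c(manual = adj_r2_manual, from_summary = s$adj.r.squared)
```

The two numbers agree, so the values in `reg.summary$adjr2` can be read as R-squared with this size penalty applied.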

coef(regfit.full, 24)
##      (Intercept)        Brandasus     Brandinfinix        Brandoppo 
##     1.102538e+07    -7.643575e+06    -1.000860e+07    -8.898051e+06 
##      BrandRealme     BrandSamsung        Brandvivo      Brandxiaomi 
##    -8.830088e+06    -7.192492e+06    -8.728234e+06    -9.105534e+06 
## LOKASIJawa Timur  LOKASILuar Jawa             RAM1             RAM2 
##     3.830273e+05     4.408839e+05    -1.117701e+07    -8.169859e+06 
##             RAM3             RAM4             RAM8            RAM12 
##    -6.385813e+06    -2.787857e+06     9.568391e+05     4.737387e+06 
##            RAM16            RAM18     Penyimpanan8    Penyimpanan16 
##     8.300965e+06     1.256514e+07    -3.012885e+06     2.487205e+06 
##    Penyimpanan32    Penyimpanan64   Penyimpanan256   Penyimpanan512 
##     3.748907e+06    -5.397252e+05     5.161355e+05     2.492005e+06 
##         J.Ulasan 
##    -3.327357e+02
#Cp
reg.summary$cp 
##  [1] 4408.72093 4025.65492 3682.31799 3267.74290 2921.42153 2438.19702
##  [7] 1814.56096 1330.98154  950.28778  717.04507  514.33213  358.21288
## [13]  251.37716  184.62932  130.19089   87.89424   53.22162   38.14977
## [19]   30.95178   26.48923   23.58780   20.93643   21.52792   22.46879
## [25]   23.65580   24.87283   26.16250   28.00610   30.00000
which.min(reg.summary$cp)
## [1] 22

Mallows' Cp is lowest for model 22, at 20.936. As in the previous case, however, this model does not include all of the dummy variables.
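Mallows' Cp compares the RSS of a submodel with the error variance estimated from the full model: Cp = RSS / sigma² - n + 2k, where k counts the parameters including the intercept. For the full model this reduces to k, which is consistent with the value 30.00000 printed for the 29-term model above. The sketch below illustrates this on the built-in `mtcars` data; the model formulas are arbitrary and the helper `mallows_cp` is defined here for illustration.

```r
full <- lm(mpg ~ wt + hp + disp + qsec, data = mtcars)
sub  <- lm(mpg ~ wt + hp, data = mtcars)

n <- nrow(mtcars)
sigma2_full <- summary(full)$sigma^2   # error variance estimated from the full model

mallows_cp <- function(model) {
  rss <- sum(residuals(model)^2)
  k <- length(coef(model))             # parameters, including the intercept
  rss / sigma2_full - n + 2 * k
}

c(cp_sub = mallows_cp(sub), cp_full = mallows_cp(full))
```

The full model has five parameters here, so its Cp equals 5 by construction. Cp is therefore informative only for the smaller models.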

n <- nrow(data)
rss <- reg.summary$rss
aic <- n*log(rss/n) + 2*(10+1)
aic
##  [1] 95587.32 95423.40 95268.69 95071.25 94895.99 94634.19 94260.56 93936.23
##  [9] 93654.51 93467.82 93295.73 93156.00 93056.05 92991.22 92936.98 92893.74
## [17] 92857.48 92840.45 92831.24 92824.75 92819.82 92815.14 92813.72 92812.65
## [25] 92811.83 92811.04 92810.32 92810.16 92810.16
which.min(aic)
## [1] 29

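AIC is lowest for model 29, which includes every term, so this can be taken into account when choosing the subset. Note, however, that the penalty term in the code above, 2*(10+1), is the same for every model size, so this AIC depends only on the RSS and will always favour the largest model. A penalty that grows with the number of terms, such as 2*(k+1) for the model with k terms, is needed for a fair comparison.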
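A minimal sketch of how the penalty term affects the AIC ranking, using hypothetical RSS values (made up for illustration, not the phone data). It compares a constant penalty of 2*(10+1), as in the AIC line above, with a penalty of 2*(k+1) that grows with the number of terms k.

```r
# Hypothetical RSS values for best models of size 1 to 4 (illustration only)
n   <- 100
rss <- c(500, 300, 290, 289.9)
k   <- seq_along(rss)              # number of terms in each model

aic_const <- n * log(rss / n) + 2 * (10 + 1)  # constant penalty
aic_sized <- n * log(rss / n) + 2 * (k + 1)   # penalty grows with model size

which.min(aic_const)  # always the largest model, because RSS never increases
which.min(aic_sized)  # may stop earlier, once extra terms barely reduce RSS
```

With these toy values the constant penalty picks model 4, while the size-dependent penalty picks model 3, because the fourth term reduces the RSS by too little to pay for itself.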

coef(regfit.full, 29)
##         (Intercept)           Brandasus        Brandinfinix           Brandoppo 
##        1.102039e+07       -7.748241e+06       -9.996847e+06       -8.896092e+06 
##         BrandRealme        BrandSamsung           Brandvivo         Brandxiaomi 
##       -8.829891e+06       -7.185516e+06       -8.730847e+06       -9.109551e+06 
##          LOKASIBali LOKASIDI Yogyakarta    LOKASIJawa Barat   LOKASIJawa Tengah 
##        3.883858e+05       -6.590816e+05       -1.783686e+04       -1.904907e+05 
##    LOKASIJawa Timur     LOKASILuar Jawa                RAM1                RAM2 
##        3.798504e+05        4.377656e+05       -1.116853e+07       -8.141921e+06 
##                RAM3                RAM4                RAM8               RAM12 
##       -6.381449e+06       -2.780065e+06        9.668636e+05        4.762440e+06 
##               RAM16               RAM18               RAM24        Penyimpanan8 
##        8.398689e+06        1.271118e+07        1.567127e+06       -3.032647e+06 
##       Penyimpanan16       Penyimpanan32       Penyimpanan64      Penyimpanan256 
##        2.473774e+06        3.741696e+06       -5.357751e+05        5.121893e+05 
##      Penyimpanan512            J.Ulasan 
##        2.456009e+06       -3.379496e+02
reg.summary$bic
##  [1]  -287.5025  -443.3781  -590.0422  -779.4395  -946.6506 -1200.4091
##  [7] -1565.9870 -1882.2796 -2155.9450 -2334.5980 -2498.6373 -2630.3238
## [13] -2722.2266 -2779.0053 -2825.2022 -2860.4001 -2888.6141 -2897.5927
## [19] -2898.7579 -2897.2001 -2894.0840 -2890.7236 -2884.0979 -2877.1204
## [25] -2869.8948 -2862.6391 -2855.3103 -2847.4223 -2839.3826
which.min(reg.summary$bic)
## [1] 19
coef(regfit.full, 19)
##    (Intercept)      Brandasus   Brandinfinix      Brandoppo    BrandRealme 
##     11094447.8     -7667292.8     -9972189.6     -8951201.6     -8835170.6 
##   BrandSamsung      Brandvivo    Brandxiaomi           RAM1           RAM2 
##     -7169694.4     -8736884.5     -9083600.1    -10723015.7     -7959902.8 
##           RAM3           RAM4           RAM8          RAM12          RAM16 
##     -6375916.6     -2763403.3       959749.5      4713574.5      8300264.8 
##          RAM18  Penyimpanan32  Penyimpanan64 Penyimpanan256 Penyimpanan512 
##     12552487.1      3611103.6      -578264.1       495172.0      2434857.9

BIC is lowest for model 19. As with adjusted R-squared and Mallows' Cp, however, some dummy variables are left out of this model.

# Summary of the selected models
method.sub <- data.frame(kriteria_pemilihan_model = c("R2-adjusted", "Cp", "AIC", "BIC"), model_terpilih = c(24,22, 29, 19)) 
colnames(method.sub) <- c("Kriteria Pemilihan Model", "Model Terpilih")
method.sub
##   Kriteria Pemilihan Model Model Terpilih
## 1              R2-adjusted             24
## 2                       Cp             22
## 3                      AIC             29
## 4                      BIC             19

Based on these selection criteria, we therefore choose model 29, because it covers all of the dummy variables.

# Create the plots
par(mfrow=c(2,2))

#plot Adj R2
plot(reg.summary$adjr2 ,xlab="Model",
 ylab="R2-Adjusted",type="l", main="Plot R2-Adjusted")
points(24, reg.summary$adjr2[24], col="red",cex=2,pch=20)

#plot Cp
plot(reg.summary$cp ,xlab="Model",
 ylab="Cp",type="l", main="Plot Cp")
points(22, reg.summary$cp[22], col="red",cex=2,pch=20)

#plot AIC
plot(aic, xlab="Model",
 ylab="AIC",type="l", main="Plot AIC")
points(29, aic[29], col="red",cex=2,pch=20)

#plot BIC
plot(reg.summary$bic ,xlab="Model",
 ylab="BIC",type="l", main="Plot BIC")
points(19, reg.summary$bic[19], col="red",cex=2,pch=20)

The plots confirm visually that the model chosen under each criterion matches the point marked on its curve.

Forward

# Null model (intercept only)
null_model <- lm(HARGA ~ 1, data = data)

# Full model (all variables)
full_model <- lm(HARGA ~ ., data = data)

# Forward selection
forward_model <- step(null_model, 
                      scope = list(lower = null_model, upper = full_model), 
                      direction = "forward")
## Start:  AIC=95870.92
## HARGA ~ 1
## 
##               Df  Sum of Sq        RSS   AIC
## + Brand        7 2.2663e+16 4.5677e+16 94627
## + Penyimpanan  6 8.1557e+15 6.0185e+16 95486
## + RAM          9 7.5238e+15 6.0817e+16 95525
## + LOKASI       6 3.3614e+14 6.8004e+16 95868
## <none>                      6.8341e+16 95871
## + J.Ulasan     1 1.7083e+12 6.8339e+16 95873
## 
## Step:  AIC=94627.45
## HARGA ~ Brand
## 
##               Df  Sum of Sq        RSS   AIC
## + RAM          9 1.8744e+16 2.6933e+16 92997
## + Penyimpanan  6 1.0760e+16 3.4918e+16 93801
## + J.Ulasan     1 1.5941e+14 4.5518e+16 94619
## <none>                      4.5677e+16 94627
## + LOKASI       6 1.2717e+14 4.5550e+16 94631
## 
## Step:  AIC=92996.78
## HARGA ~ Brand + RAM
## 
##               Df  Sum of Sq        RSS   AIC
## + Penyimpanan  6 1.3425e+15 2.5590e+16 92849
## + J.Ulasan     1 3.7539e+13 2.6895e+16 92994
## <none>                      2.6933e+16 92997
## + LOKASI       6 7.6156e+13 2.6857e+16 93000
## 
## Step:  AIC=92849.2
## HARGA ~ Brand + RAM + Penyimpanan
## 
##            Df  Sum of Sq        RSS   AIC
## + LOKASI    6 1.1107e+14 2.5479e+16 92848
## + J.Ulasan  1 2.3375e+13 2.5567e+16 92848
## <none>                   2.5590e+16 92849
## 
## Step:  AIC=92847.62
## HARGA ~ Brand + RAM + Penyimpanan + LOKASI
## 
##            Df  Sum of Sq        RSS   AIC
## <none>                   2.5479e+16 92848
## + J.Ulasan  1 1.1932e+13 2.5467e+16 92848
summary(forward_model)
## 
## Call:
## lm(formula = HARGA ~ Brand + RAM + Penyimpanan + LOKASI, data = data)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -12755191  -1607112   -588418   1423541  14641855 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          10981145     164588  66.719  < 2e-16 ***
## Brandasus            -7725334     468403 -16.493  < 2e-16 ***
## Brandinfinix         -9990208     221399 -45.123  < 2e-16 ***
## Brandoppo            -8887418     255200 -34.825  < 2e-16 ***
## BrandRealme          -8808185     250139 -35.213  < 2e-16 ***
## BrandSamsung         -7177078     173586 -41.346  < 2e-16 ***
## Brandvivo            -8714458     197512 -44.121  < 2e-16 ***
## Brandxiaomi          -9096452     177503 -51.247  < 2e-16 ***
## RAM1                -11127642    1327145  -8.385  < 2e-16 ***
## RAM2                 -8146282     575314 -14.160  < 2e-16 ***
## RAM3                 -6411525     361381 -17.742  < 2e-16 ***
## RAM4                 -2785169     182153 -15.290  < 2e-16 ***
## RAM8                   963907     166128   5.802 7.21e-09 ***
## RAM12                 4753831     266839  17.815  < 2e-16 ***
## RAM16                 8396291     634543  13.232  < 2e-16 ***
## RAM18                12694810    2099945   6.045 1.67e-09 ***
## RAM24                 1562644    1742518   0.897 0.369910    
## Penyimpanan8         -3012127    2927824  -1.029 0.303657    
## Penyimpanan16         2410139    1154869   2.087 0.036976 *  
## Penyimpanan32         3757600     603965   6.222 5.59e-10 ***
## Penyimpanan64         -545256     191560  -2.846 0.004451 ** 
## Penyimpanan256         517825     137823   3.757 0.000175 ***
## Penyimpanan512        2463878     249320   9.882  < 2e-16 ***
## LOKASIBali             418587     467453   0.895 0.370610    
## LOKASIDI Yogyakarta   -625319     747780  -0.836 0.403088    
## LOKASIJawa Barat         5923     227479   0.026 0.979231    
## LOKASIJawa Tengah     -158271     477261  -0.332 0.740196    
## LOKASIJawa Timur       402264     176339   2.281 0.022604 *  
## LOKASILuar Jawa        466372     165410   2.819 0.004841 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2871000 on 3092 degrees of freedom
## Multiple R-squared:  0.6272, Adjusted R-squared:  0.6238 
## F-statistic: 185.8 on 28 and 3092 DF,  p-value: < 2.2e-16

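Forward selection yields a model with 28 dummy-variable coefficients drawn from four predictors (Brand, RAM, Penyimpanan, LOKASI); one predictor, the number of reviews (J.Ulasan), is left out. The adjusted R-squared is fairly high at 0.6238 (multiple R-squared 0.6272).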
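The AIC values in the trace can be reproduced by hand. This is a minimal sketch, assuming `step()` scores `lm` fits with `extractAIC()`, which for a linear model is n * log(RSS / n) + 2 * edf (this differs from `AIC()` by an additive constant). The helper name `step_aic` is ours, and the inputs are the rounded RSS values printed above, so the results should agree with the trace only up to rounding.

```r
# AIC as reported in the step() trace for lm fits:
# n * log(RSS / n) + 2 * edf, where edf counts the estimated coefficients.
step_aic <- function(rss, n, edf) {
  n * log(rss / n) + 2 * edf
}

n <- 3092 + 29  # residual df + number of coefficients in the final model

aic_null  <- step_aic(rss = 6.8341e16, n = n, edf = 1)   # HARGA ~ 1
aic_final <- step_aic(rss = 2.5479e16, n = n, edf = 29)  # Brand + RAM + Penyimpanan + LOKASI

round(c(null = aic_null, final = aic_final), 1)
```

Because every candidate is scored on the same n, only the trade-off between a lower RSS and the 2 * edf penalty decides which term is added next.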

Backward

# Full model
full_model <- lm(HARGA ~ ., data = data)

# Backward elimination
backward_model <- step(full_model, direction = "backward")
## Start:  AIC=92848.16
## HARGA ~ Brand + LOKASI + J.Ulasan + RAM + Penyimpanan
## 
##               Df  Sum of Sq        RSS   AIC
## - J.Ulasan     1 1.1932e+13 2.5479e+16 92848
## <none>                      2.5467e+16 92848
## - LOKASI       6 9.9628e+13 2.5567e+16 92848
## - Penyimpanan  6 1.3624e+15 2.6830e+16 92999
## - RAM          9 9.2964e+15 3.4764e+16 93801
## - Brand        7 3.1323e+16 5.6790e+16 95337
## 
## Step:  AIC=92847.62
## HARGA ~ Brand + LOKASI + RAM + Penyimpanan
## 
##               Df  Sum of Sq        RSS   AIC
## <none>                      2.5479e+16 92848
## - LOKASI       6 1.1107e+14 2.5590e+16 92849
## - Penyimpanan  6 1.3774e+15 2.6857e+16 93000
## - RAM          9 9.3082e+15 3.4788e+16 93801
## - Brand        7 3.1397e+16 5.6876e+16 95340
summary(backward_model)
## 
## Call:
## lm(formula = HARGA ~ Brand + LOKASI + RAM + Penyimpanan, data = data)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -12755191  -1607112   -588418   1423541  14641855 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          10981145     164588  66.719  < 2e-16 ***
## Brandasus            -7725334     468403 -16.493  < 2e-16 ***
## Brandinfinix         -9990208     221399 -45.123  < 2e-16 ***
## Brandoppo            -8887418     255200 -34.825  < 2e-16 ***
## BrandRealme          -8808185     250139 -35.213  < 2e-16 ***
## BrandSamsung         -7177078     173586 -41.346  < 2e-16 ***
## Brandvivo            -8714458     197512 -44.121  < 2e-16 ***
## Brandxiaomi          -9096452     177503 -51.247  < 2e-16 ***
## LOKASIBali             418587     467453   0.895 0.370610    
## LOKASIDI Yogyakarta   -625319     747780  -0.836 0.403088    
## LOKASIJawa Barat         5923     227479   0.026 0.979231    
## LOKASIJawa Tengah     -158271     477261  -0.332 0.740196    
## LOKASIJawa Timur       402264     176339   2.281 0.022604 *  
## LOKASILuar Jawa        466372     165410   2.819 0.004841 ** 
## RAM1                -11127642    1327145  -8.385  < 2e-16 ***
## RAM2                 -8146282     575314 -14.160  < 2e-16 ***
## RAM3                 -6411525     361381 -17.742  < 2e-16 ***
## RAM4                 -2785169     182153 -15.290  < 2e-16 ***
## RAM8                   963907     166128   5.802 7.21e-09 ***
## RAM12                 4753831     266839  17.815  < 2e-16 ***
## RAM16                 8396291     634543  13.232  < 2e-16 ***
## RAM18                12694810    2099945   6.045 1.67e-09 ***
## RAM24                 1562644    1742518   0.897 0.369910    
## Penyimpanan8         -3012127    2927824  -1.029 0.303657    
## Penyimpanan16         2410139    1154869   2.087 0.036976 *  
## Penyimpanan32         3757600     603965   6.222 5.59e-10 ***
## Penyimpanan64         -545256     191560  -2.846 0.004451 ** 
## Penyimpanan256         517825     137823   3.757 0.000175 ***
## Penyimpanan512        2463878     249320   9.882  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2871000 on 3092 degrees of freedom
## Multiple R-squared:  0.6272, Adjusted R-squared:  0.6238 
## F-statistic: 185.8 on 28 and 3092 DF,  p-value: < 2.2e-16

Backward elimination arrives at the same model as forward selection: 28 dummy-variable coefficients from four predictors, with one predictor, the number of reviews (J.Ulasan), removed. The adjusted R-squared is again 0.6238.
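The backward trace keeps LOKASI by a narrow AIC margin (92847.62 with it, 92849.2 without). As a cross-check, a partial F-test can be computed from the printed sums of squares. This is a sketch using the rounded values from the output above; the variable names are ours, and in practice `anova()` on the two fitted models would run the same test on unrounded numbers.

```r
# Partial F-test for dropping LOKASI from the selected model,
# using the rounded values printed in the backward trace.
ss_lokasi <- 1.1107e14  # increase in RSS when LOKASI is removed
df_lokasi <- 6          # number of LOKASI dummy coefficients
rss_full  <- 2.5479e16  # RSS of Brand + LOKASI + RAM + Penyimpanan
df_resid  <- 3092       # residual degrees of freedom of that model

f_stat  <- (ss_lokasi / df_lokasi) / (rss_full / df_resid)
p_value <- pf(f_stat, df_lokasi, df_resid, lower.tail = FALSE)

c(F = f_stat, p = p_value)
```

The F statistic comes out near 2.25 on 6 and 3092 degrees of freedom, which is consistent with the small AIC difference: LOKASI contributes, but only weakly.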

Stepwise

stepwise_model <- step(null_model, 
                       scope = list(lower = null_model, upper = full_model), 
                       direction = "both")
## Start:  AIC=95870.92
## HARGA ~ 1
## 
##               Df  Sum of Sq        RSS   AIC
## + Brand        7 2.2663e+16 4.5677e+16 94627
## + Penyimpanan  6 8.1557e+15 6.0185e+16 95486
## + RAM          9 7.5238e+15 6.0817e+16 95525
## + LOKASI       6 3.3614e+14 6.8004e+16 95868
## <none>                      6.8341e+16 95871
## + J.Ulasan     1 1.7083e+12 6.8339e+16 95873
## 
## Step:  AIC=94627.45
## HARGA ~ Brand
## 
##               Df  Sum of Sq        RSS   AIC
## + RAM          9 1.8744e+16 2.6933e+16 92997
## + Penyimpanan  6 1.0760e+16 3.4918e+16 93801
## + J.Ulasan     1 1.5941e+14 4.5518e+16 94619
## <none>                      4.5677e+16 94627
## + LOKASI       6 1.2717e+14 4.5550e+16 94631
## - Brand        7 2.2663e+16 6.8341e+16 95871
## 
## Step:  AIC=92996.78
## HARGA ~ Brand + RAM
## 
##               Df  Sum of Sq        RSS   AIC
## + Penyimpanan  6 1.3425e+15 2.5590e+16 92849
## + J.Ulasan     1 3.7539e+13 2.6895e+16 92994
## <none>                      2.6933e+16 92997
## + LOKASI       6 7.6156e+13 2.6857e+16 93000
## - RAM          9 1.8744e+16 4.5677e+16 94627
## - Brand        7 3.3884e+16 6.0817e+16 95525
## 
## Step:  AIC=92849.2
## HARGA ~ Brand + RAM + Penyimpanan
## 
##               Df  Sum of Sq        RSS   AIC
## + LOKASI       6 1.1107e+14 2.5479e+16 92848
## + J.Ulasan     1 2.3375e+13 2.5567e+16 92848
## <none>                      2.5590e+16 92849
## - Penyimpanan  6 1.3425e+15 2.6933e+16 92997
## - RAM          9 9.3273e+15 3.4918e+16 93801
## - Brand        7 3.1773e+16 5.7363e+16 95354
## 
## Step:  AIC=92847.62
## HARGA ~ Brand + RAM + Penyimpanan + LOKASI
## 
##               Df  Sum of Sq        RSS   AIC
## <none>                      2.5479e+16 92848
## + J.Ulasan     1 1.1932e+13 2.5467e+16 92848
## - LOKASI       6 1.1107e+14 2.5590e+16 92849
## - Penyimpanan  6 1.3774e+15 2.6857e+16 93000
## - RAM          9 9.3082e+15 3.4788e+16 93801
## - Brand        7 3.1397e+16 5.6876e+16 95340
summary(stepwise_model)
## 
## Call:
## lm(formula = HARGA ~ Brand + RAM + Penyimpanan + LOKASI, data = data)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -12755191  -1607112   -588418   1423541  14641855 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          10981145     164588  66.719  < 2e-16 ***
## Brandasus            -7725334     468403 -16.493  < 2e-16 ***
## Brandinfinix         -9990208     221399 -45.123  < 2e-16 ***
## Brandoppo            -8887418     255200 -34.825  < 2e-16 ***
## BrandRealme          -8808185     250139 -35.213  < 2e-16 ***
## BrandSamsung         -7177078     173586 -41.346  < 2e-16 ***
## Brandvivo            -8714458     197512 -44.121  < 2e-16 ***
## Brandxiaomi          -9096452     177503 -51.247  < 2e-16 ***
## RAM1                -11127642    1327145  -8.385  < 2e-16 ***
## RAM2                 -8146282     575314 -14.160  < 2e-16 ***
## RAM3                 -6411525     361381 -17.742  < 2e-16 ***
## RAM4                 -2785169     182153 -15.290  < 2e-16 ***
## RAM8                   963907     166128   5.802 7.21e-09 ***
## RAM12                 4753831     266839  17.815  < 2e-16 ***
## RAM16                 8396291     634543  13.232  < 2e-16 ***
## RAM18                12694810    2099945   6.045 1.67e-09 ***
## RAM24                 1562644    1742518   0.897 0.369910    
## Penyimpanan8         -3012127    2927824  -1.029 0.303657    
## Penyimpanan16         2410139    1154869   2.087 0.036976 *  
## Penyimpanan32         3757600     603965   6.222 5.59e-10 ***
## Penyimpanan64         -545256     191560  -2.846 0.004451 ** 
## Penyimpanan256         517825     137823   3.757 0.000175 ***
## Penyimpanan512        2463878     249320   9.882  < 2e-16 ***
## LOKASIBali             418587     467453   0.895 0.370610    
## LOKASIDI Yogyakarta   -625319     747780  -0.836 0.403088    
## LOKASIJawa Barat         5923     227479   0.026 0.979231    
## LOKASIJawa Tengah     -158271     477261  -0.332 0.740196    
## LOKASIJawa Timur       402264     176339   2.281 0.022604 *  
## LOKASILuar Jawa        466372     165410   2.819 0.004841 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2871000 on 3092 degrees of freedom
## Multiple R-squared:  0.6272, Adjusted R-squared:  0.6238 
## F-statistic: 185.8 on 28 and 3092 DF,  p-value: < 2.2e-16

Stepwise selection (both directions) also arrives at the same model: 28 dummy-variable coefficients from four predictors, with one predictor, the number of reviews (J.Ulasan), removed. The adjusted R-squared is again 0.6238.
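The three procedures list their terms in different orders (for example `Brand + RAM + Penyimpanan + LOKASI` versus `Brand + LOKASI + RAM + Penyimpanan`), so it helps to compare the selected term sets directly. This is a minimal sketch; `same_terms` is a hypothetical helper, and the built-in `mtcars` data stands in for the phone data so that the block runs on its own.

```r
# Hypothetical helper: TRUE when two fitted models contain the same terms,
# regardless of the order in which the terms were added.
same_terms <- function(m1, m2) {
  setequal(attr(terms(m1), "term.labels"),
           attr(terms(m2), "term.labels"))
}

# Stand-in data (mtcars), not the phone data used in this report.
fit_a <- lm(mpg ~ wt + hp, data = mtcars)
fit_b <- lm(mpg ~ hp + wt, data = mtcars)
fit_c <- lm(mpg ~ wt, data = mtcars)

same_terms(fit_a, fit_b)  # same terms, different order
same_terms(fit_a, fit_c)  # fit_c lacks hp
```

In this report, the same check could be run as `same_terms(forward_model, backward_model)` and `same_terms(forward_model, stepwise_model)`.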

library(readxl)
library(tidyverse)
data <- read_excel("C:/Users/Admin/Downloads/PSD Kelompok 3 (3).xlsx")
names(data)[names(data) == "Jumlah Ulasan"] <- "J.Ulasan"
head(data)
## # A tibble: 6 × 6
##   Brand     HARGA LOKASI      J.Ulasan RAM   Penyimpanan
##   <chr>     <dbl> <chr>       <chr>    <chr> <chr>      
## 1 iphone  4388000 Jabodetabek 10       4     256        
## 2 iphone  9458000 Jabodetabek 3071     4     128        
## 3 iphone  9409000 Jabodetabek 2713     4     128        
## 4 iphone  5047000 Luar Jawa   87       4     64         
## 5 iphone 11287000 Jabodetabek 567      6     128        
## 6 iphone 21447000 Jabodetabek 732      8     256
data$Brand_iphone <- ifelse(data$Brand == "iphone", 1, 0)
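# The brand is spelled "Samsung" in the data (see the BrandSamsung coefficient), so match that case
data$Brand_samsung <- ifelse(data$Brand == "Samsung", 1, 0)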
data$Brand_xiaomi <- ifelse(data$Brand == "xiaomi", 1, 0)
data$J.Ulasan <- as.numeric(data$J.Ulasan)

# Clean the location data
data$LOKASI <- gsub("^di ", "", data$LOKASI)

# Dummy for Jabodetabek vs. all other locations (coded 2 = Jabodetabek, 3 = other)
data$dummy <- ifelse(data$LOKASI == "Jabodetabek", 2, 3)

data$dummyram6 <- ifelse(data$RAM == 6, 1, 0)

data$Brand <- relevel(as.factor(data$Brand), ref="iphone")
data$LOKASI <- relevel(as.factor(data$dummy), ref= "2")
data$RAM <- relevel(as.factor(as.numeric(data$RAM)), ref = "1")
data$Penyimpanan <- relevel(as.factor(as.numeric(data$Penyimpanan)), ref = "128")
data$J.Ulasan <- as.numeric(data$J.Ulasan)

At this stage we try recoding the dummy variables for location and RAM: we keep only the reference category and collapse all other categories into a single group, which is encoded as a number.
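To illustrate this recoding, here is a small sketch on made-up values (the vectors `lokasi` and `harga` are invented for illustration and are not taken from the data set). It also shows that coding the two groups as 2 and 3, as in the code above, gives the same slope as the more common 0/1 coding; only the intercept changes.

```r
# Toy data, invented for illustration only.
lokasi <- c("Jabodetabek", "Jabodetabek", "Luar Jawa", "Bali")
harga  <- c(10, 12, 20, 22)

# Collapse every non-reference location into a single "other" group.
dummy01 <- ifelse(lokasi == "Jabodetabek", 0, 1)  # conventional 0/1 coding
dummy23 <- ifelse(lokasi == "Jabodetabek", 2, 3)  # coding used in this report

fit01 <- lm(harga ~ dummy01)
fit23 <- lm(harga ~ dummy23)

# The slope (difference between the two group means) is identical;
# the intercept differs because the 2/3 coding moves the zero point.
coef(fit01)
coef(fit23)
```

With 0/1 coding the intercept is the mean of the reference group, which is usually easier to interpret.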

model <- lm(HARGA ~ Brand + dummy + dummyram6 + Penyimpanan + J.Ulasan, data = data)
summary(model)
## 
## Call:
## lm(formula = HARGA ~ Brand + dummy + dummyram6 + Penyimpanan + 
##     J.Ulasan, data = data)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -11520286  -1620775   -447706    877252  18761573 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     8665481.7   361873.2  23.946  < 2e-16 ***
## Brandasus      -2193998.8   459488.7  -4.775 1.88e-06 ***
## Brandinfinix   -7863486.7   240782.0 -32.658  < 2e-16 ***
## Brandoppo      -6994171.3   287121.2 -24.360  < 2e-16 ***
## BrandRealme    -6834263.3   280762.4 -24.342  < 2e-16 ***
## BrandSamsung   -4840441.0   178770.5 -27.076  < 2e-16 ***
## Brandvivo      -6344122.5   212745.1 -29.820  < 2e-16 ***
## Brandxiaomi    -7298562.3   193542.3 -37.710  < 2e-16 ***
## dummy            130104.9   135962.0   0.957  0.33868    
## dummyram6        434934.9   172327.7   2.524  0.01166 *  
## Penyimpanan8   -8829886.1  3353000.3  -2.633  0.00849 ** 
## Penyimpanan16  -2875859.9  1273404.2  -2.258  0.02399 *  
## Penyimpanan32  -2731171.9   612496.6  -4.459 8.52e-06 ***
## Penyimpanan64  -2791235.0   197462.1 -14.136  < 2e-16 ***
## Penyimpanan256  1690847.0   150751.4  11.216  < 2e-16 ***
## Penyimpanan512  5209691.8   259351.3  20.087  < 2e-16 ***
## J.Ulasan           -455.2      326.5  -1.394  0.16340    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3349000 on 3104 degrees of freedom
##   (25 observations deleted due to missingness)
## Multiple R-squared:  0.4907, Adjusted R-squared:  0.4881 
## F-statistic: 186.9 on 16 and 3104 DF,  p-value: < 2.2e-16

This model keeps all predictors and fits without problems. However, its adjusted R-squared (0.4881) is well below that of the model chosen by the previous methods (0.6238), so we do not carry it forward to further testing.
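The comparison rests on adjusted R-squared, which can be recomputed from the printed multiple R-squared and degrees of freedom. This is a sketch using the rounded values from the two summaries; the helper name `adj_r2` is ours.

```r
# Adjusted R-squared from multiple R-squared:
# 1 - (1 - R2) * (n - 1) / df_resid
adj_r2 <- function(r2, n, df_resid) {
  1 - (1 - r2) * (n - 1) / df_resid
}

n <- 3121  # observations used in both fits (residual df + coefficients)

adj_selected <- adj_r2(0.6272, n, df_resid = 3092)  # forward/backward/stepwise model
adj_recoded  <- adj_r2(0.4907, n, df_resid = 3104)  # recoded-dummy model

round(c(selected = adj_selected, recoded = adj_recoded), 4)
```

Both values round to the figures printed in the summaries (0.6238 and 0.4881). Since both models are fitted to the same 3121 observations, the gap reflects the information lost by collapsing the RAM and location categories rather than a difference in sample.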