The goal of this project is to predict the sales of men's fashion stores in the Netherlands. The dataset contains sales data for Dutch men's fashion stores, with 400 observations.
The packages used are listed below; install them first if needed.
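The original package-loading chunk is not shown here, so the list below is an assumption inferred from the functions that appear later in this report (`%>%`, `filter()`, `select()`, `ggplot()`, `ggcorr()`, and the Breusch-Pagan and Anderson-Darling test outputs). A minimal sketch that reports which of these packages are missing and attaches them otherwise:

```r
# Packages inferred from the functions used in this report (an assumption):
#   dplyr   -> %>%, filter(), select()
#   ggplot2 -> ggplot(), geom_histogram(), geom_density()
#   GGally  -> ggcorr()
#   lmtest  -> bptest()
#   nortest -> ad.test()
required_pkgs <- c("dplyr", "ggplot2", "GGally", "lmtest", "nortest")

# TRUE for every package that is already installed.
is_installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required_pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  # Only report what is missing; installation is left to the reader.
  message("Install first: install.packages(c(",
          paste0('"', missing_pkgs, '"', collapse = ", "), "))")
} else {
  # Attach everything once all packages are available.
  invisible(lapply(required_pkgs, library, character.only = TRUE))
}
```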
The data used in this project is the Clothing dataset, which contains 400 observations of 13 variables.
Clothing <- read.csv("Clothing.csv")
data.frame("Total.Data" = dim(Clothing)[1],
           "Total.Variabel" = dim(Clothing)[2])
## Total.Data Total.Variabel
## 1 400 13
## [1] "tsale" "sales" "margin" "nown" "nfull" "npart" "naux"
## [8] "hoursw" "hourspw" "inv1" "inv2" "ssize" "start"
The first 10 rows of the data are shown below.
## tsale sales margin nown nfull npart naux hoursw hourspw
## 1 750000 4411.765 41 1.0000 1.0000 1.0000 1.5357 76 16.75596
## 2 1926395 4280.878 39 2.0000 2.0000 3.0000 1.5357 192 22.49376
## 3 1250000 4166.667 40 1.0000 2.0000 2.2222 1.4091 114 17.19120
## 4 694227 2670.104 40 1.0000 1.0000 1.2833 1.3673 100 21.50260
## 5 750000 15000.000 44 2.0000 1.9556 1.2833 1.3673 104 15.74279
## 6 400000 4444.444 41 2.0000 1.9556 1.2833 1.3673 72 10.89885
## 7 1300000 3250.000 39 1.2228 1.0000 3.0000 4.0000 161 17.45674
## 8 495340 4953.400 28 2.0000 1.9556 1.2833 1.3673 80 12.10984
## 9 1200000 2666.667 41 1.0000 3.0000 2.2222 1.4091 158 20.70420
## 10 495340 6604.533 37 1.0000 1.9556 1.2833 1.0000 87 16.60654
## inv1 inv2 ssize start
## 1 17166.67 27177.04 170 1984
## 2 17166.67 27177.04 450 1972
## 3 292857.20 71570.55 300 1952
## 4 22207.04 15000.00 260 1966
## 5 22207.04 10000.00 50 1996
## 6 22207.04 22859.85 90 1947
## 7 22207.04 22859.85 400 1993
## 8 22207.04 22859.85 100 1952
## 9 292857.20 5000.00 450 1954
## 10 22207.04 22859.85 75 1973
The variables in this dataset are described below:
| Variable | Description |
|---|---|
| tsale | Annual sales in Dutch guilders |
| sales | Sales per square meter |
| margin | Gross-profit-margin |
| nown | Number of owners (managers) |
| nfull | Number of full-timers |
| npart | Number of part-timers |
| naux | Number of helpers (temporary workers) |
| hoursw | Total number of hours worked |
| hourspw | Number of hours worked per worker |
| inv1 | Investment in shop-premises |
| inv2 | Investment in automation |
| ssize | Sales floorspace of the store |
| start | Year start of business |
The structure of the dataset is shown below.
## 'data.frame': 400 obs. of 13 variables:
## $ tsale : int 750000 1926395 1250000 694227 750000 400000 1300000 495340 1200000 495340 ...
## $ sales : num 4412 4281 4167 2670 15000 ...
## $ margin : num 41 39 40 40 44 41 39 28 41 37 ...
## $ nown : num 1 2 1 1 2 ...
## $ nfull : num 1 2 2 1 1.96 ...
## $ npart : num 1 3 2.22 1.28 1.28 ...
## $ naux : num 1.54 1.54 1.41 1.37 1.37 ...
## $ hoursw : int 76 192 114 100 104 72 161 80 158 87 ...
## $ hourspw: num 16.8 22.5 17.2 21.5 15.7 ...
## $ inv1 : num 17167 17167 292857 22207 22207 ...
## $ inv2 : num 27177 27177 71571 15000 10000 ...
## $ ssize : int 170 450 300 260 50 90 400 100 450 75 ...
## $ start : int 1984 1972 1952 1966 1996 1947 1993 1952 1954 1973 ...
In any data processing workflow, it is good practice to check for missing values before starting the analysis. The number of missing values per column is:
## tsale sales margin nown nfull npart naux hoursw hourspw inv1
## 0 0 0 0 0 0 0 0 0 0
## inv2 ssize start
## 0 0 0
The result shows that there are no missing values in this dataset.
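Per-column counts like the ones above are what `colSums(is.na(...))` returns. A minimal sketch, assuming the data frame is named `Clothing` as in this report; `count_missing` is a hypothetical helper name:

```r
# Count the missing values (NA) in every column of a data frame.
count_missing <- function(df) {
  colSums(is.na(df))
}

# Runs only if the Clothing data frame has already been loaded.
if (exists("Clothing")) print(count_missing(Clothing))
```

A column with a count above zero would need imputation or row removal before modeling.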
## jumlah.seluruh.data jumlah.data.unik
## 1 400 400
The total number of rows equals the number of unique rows, so there are no duplicate records in this dataset.
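One way to run such a duplicate check is to compare the total row count with the number of distinct rows. The sketch below uses base R; the helper name and its English column names are illustrative and differ from the column names in the output above:

```r
# Compare the total number of rows with the number of distinct rows.
count_duplicates <- function(df) {
  data.frame(total.rows  = nrow(df),
             unique.rows = nrow(unique(df)))
}

# Runs only if the Clothing data frame has already been loaded.
if (exists("Clothing")) print(count_duplicates(Clothing))
```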
A summary of the data is shown below.
## tsale sales margin nown
## Min. : 50000 Min. : 300 Min. :16.00 Min. : 1.000
## 1st Qu.: 495340 1st Qu.: 3904 1st Qu.:37.00 1st Qu.: 1.000
## Median : 694227 Median : 5279 Median :39.00 Median : 1.000
## Mean : 833584 Mean : 6335 Mean :38.77 Mean : 1.284
## 3rd Qu.: 976817 3rd Qu.: 7740 3rd Qu.:41.00 3rd Qu.: 1.295
## Max. :5000000 Max. :27000 Max. :66.00 Max. :10.000
## nfull npart naux hoursw
## Min. :1.000 Min. :1.000 Min. :1.000 Min. : 32.0
## 1st Qu.:1.923 1st Qu.:1.283 1st Qu.:1.333 1st Qu.: 80.0
## Median :1.956 Median :1.283 Median :1.367 Median :104.0
## Mean :2.069 Mean :1.566 Mean :1.390 Mean :121.1
## 3rd Qu.:2.066 3rd Qu.:2.000 3rd Qu.:1.367 3rd Qu.:145.2
## Max. :8.000 Max. :9.000 Max. :4.000 Max. :582.0
## hourspw inv1 inv2 ssize
## Min. : 5.708 Min. : 1000 Min. : 350 Min. : 16.0
## 1st Qu.:13.541 1st Qu.: 20000 1st Qu.: 10000 1st Qu.: 80.0
## Median :17.745 Median : 22207 Median : 22860 Median : 120.0
## Mean :18.955 Mean : 58257 Mean : 27829 Mean : 151.1
## 3rd Qu.:24.303 3rd Qu.: 62269 3rd Qu.: 22860 3rd Qu.: 190.0
## Max. :43.326 Max. :1500000 Max. :400000 Max. :1214.0
## start
## Min. :1945
## 1st Qu.:1959
## Median :1978
## Mean :1978
## 3rd Qu.:1996
## Max. :2015
ggcorr(Clothing, label = TRUE, size = 3, label_size = 3, hjust = 0.95) +
  labs(
    title = "Dataset Correlation Matrix"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5),
    axis.title = element_text(size = 8, face = "bold"),
    axis.text.y = element_blank()
  )
Based on the correlation matrix above, every variable is correlated with tsale to some degree. The variable with the highest correlation with tsale is hoursw.
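The ranking read off the correlation plot can also be checked numerically. The sketch below sorts the predictors by the absolute value of their Pearson correlation with the target; `rank_correlations` is a hypothetical helper, not part of the original analysis:

```r
# Rank numeric columns by absolute Pearson correlation with a target column.
rank_correlations <- function(df, target = "tsale") {
  num_df <- df[vapply(df, is.numeric, logical(1))]      # keep numeric columns only
  cors <- cor(num_df, use = "complete.obs")[, target]   # correlations with the target
  cors <- cors[names(cors) != target]                   # drop the target itself
  cors[order(abs(cors), decreasing = TRUE)]             # strongest first
}

# Runs only if the Clothing data frame has already been loaded.
if (exists("Clothing")) print(rank_correlations(Clothing))
```

The first element of the returned vector is the predictor most strongly correlated with the target.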
ggplot(Clothing, aes(x = tsale)) +
  # after_stat(density) replaces the deprecated ..density.. notation
  geom_histogram(aes(y = after_stat(density)), colour = "black", fill = "white") +
  geom_density(alpha = .2, fill = "blue") +
  labs(title = "Sales Distribution Before Outlier Removal") +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5),
    axis.title = element_text(size = 9, face = "bold")
  )
The histogram shows a number of stores whose tsale is far higher than the rest, which makes the distribution of tsale right-skewed rather than normal. I therefore decided to drop observations with tsale of 4 million or more, since very few stores reach that level.
Clothing_clean <- Clothing %>%
filter(tsale < 4000000)
ggplot(Clothing_clean, aes(x = tsale)) +
  # after_stat(density) replaces the deprecated ..density.. notation
  geom_histogram(aes(y = after_stat(density)), colour = "black", fill = "white", binwidth = 80000) +
  geom_density(alpha = .2, fill = "blue") +
  scale_x_continuous(name = "tsale") +
  labs(title = "Distribution of Sales Below 4 Million") +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5),
    axis.title = element_text(size = 9, face = "bold"),
    axis.text.y = element_text(margin = margin(l = 5)),
    axis.text.x.bottom = element_text(margin = margin(b = 5))
  )
Before building the linear regression model, I split the data into training and test sets in an 80:20 ratio, a proportion commonly used in machine learning modeling.
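The splitting code itself is not shown in this report. The following is a minimal sketch of one common way to do an 80:20 random split in base R. The seed, the sampling method, and the helper name `split_train_test` are assumptions, so the resulting row counts and model outputs may differ from those reported here:

```r
# Randomly split a data frame into training and test sets.
split_train_test <- function(df, train_frac = 0.8, seed = 123) {
  set.seed(seed)                                   # arbitrary seed, for reproducibility
  n_train   <- floor(train_frac * nrow(df))
  train_idx <- sample(seq_len(nrow(df)), size = n_train)
  list(train = df[train_idx, , drop = FALSE],
       test  = df[-train_idx, , drop = FALSE])
}

# Runs only if Clothing_clean has already been created.
if (exists("Clothing_clean")) {
  parts <- split_train_test(Clothing_clean)
  Clothing_train <- parts$train
  Clothing_test  <- parts$test
}
```

Fixing the seed keeps the split, and therefore every downstream result, reproducible between runs.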
According to the correlation matrix shown earlier, hoursw has the strongest correlation with tsale, so it is used as the single predictor in the first model.
##
## Call:
## lm(formula = tsale ~ hoursw, data = Clothing_train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1844749 -216720 -53231 151918 2036572
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 52567.4 42404.9 1.24 0.216
## hoursw 6389.3 306.4 20.85 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 359300 on 318 degrees of freedom
## Multiple R-squared: 0.5775, Adjusted R-squared: 0.5762
## F-statistic: 434.7 on 1 and 318 DF, p-value: < 2.2e-16
With a single predictor, the model has an adjusted R-squared of only 0.5762, meaning it captures only about 58% of the variance of the target.
A single-predictor linear regression is therefore not a good fit for predicting men's fashion sales in this dataset.
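The adjusted R-squared quoted above follows from the multiple R-squared, the sample size n, and the number of predictors p as 1 - (1 - R²)(n - 1)/(n - p - 1). The sketch below reproduces the reported value from the summary output (R-squared 0.5775, 318 residual degrees of freedom, one predictor, hence n = 320). The helper `adjusted_r2` and the model name `lm_Clothing_single` are illustrative:

```r
# Adjusted R-squared from R-squared, sample size n, and number of predictors p.
adjusted_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# Values taken from the single-predictor summary output in this report.
adjusted_r2(r2 = 0.5775, n = 320, p = 1)   # approximately 0.5762

# Hypothetical refit; runs only if Clothing_train exists in the session.
if (exists("Clothing_train")) {
  lm_Clothing_single <- lm(tsale ~ hoursw, data = Clothing_train)
  print(summary(lm_Clothing_single)$adj.r.squared)
}
```

Unlike the plain R-squared, the adjusted version penalizes extra predictors, which makes it the fairer number for comparing this model with the multi-predictor models.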
##
## Call:
## lm(formula = tsale ~ ., data = Clothing_train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1647613 -103129 -1013 92850 1484180
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.849e+05 1.370e+06 0.573 0.5671
## sales 7.591e+01 4.930e+00 15.398 <2e-16 ***
## margin -2.266e+02 2.853e+03 -0.079 0.9367
## nown -4.690e+04 4.165e+04 -1.126 0.2610
## nfull 9.169e+04 3.652e+04 2.511 0.0126 *
## npart 4.249e+04 3.720e+04 1.142 0.2542
## naux 9.222e+04 4.550e+04 2.027 0.0436 *
## hoursw 5.068e+02 1.285e+03 0.394 0.6936
## hourspw 1.801e+04 8.267e+03 2.178 0.0301 *
## inv1 -1.227e-01 1.493e-01 -0.821 0.4120
## inv2 2.292e-01 3.623e-01 0.633 0.5275
## ssize 2.265e+03 2.003e+02 11.306 <2e-16 ***
## start -7.559e+02 6.901e+02 -1.095 0.2742
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 251000 on 307 degrees of freedom
## Multiple R-squared: 0.8009, Adjusted R-squared: 0.7932
## F-statistic: 102.9 on 12 and 307 DF, p-value: < 2.2e-16
The adjusted R-squared is 0.793, which indicates that the lm_Clothing_all model explains about 79.3% of the variance in tsale.
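The trace that follows is the kind of output produced by backward stepwise selection with `step()`, which repeatedly drops the term whose removal lowers the AIC the most and stops when no removal helps. A minimal sketch, assuming the training data contains a `tsale` column; `backward_select` is a hypothetical wrapper and not necessarily how the original chunk was written:

```r
# Fit the full model, then drop predictors one at a time while the AIC improves.
backward_select <- function(train, trace = 1) {
  lm_all <- lm(tsale ~ ., data = train)                 # all predictors
  step(lm_all, direction = "backward", trace = trace)   # AIC-based elimination
}

# Runs only if Clothing_train exists in the session.
if (exists("Clothing_train")) {
  lm_Clothing_step <- backward_select(Clothing_train)   # illustrative name
  print(formula(lm_Clothing_step))
}
```

Setting `trace = 0` suppresses the step-by-step AIC tables and returns only the final model.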
## Start: AIC=7970.04
## tsale ~ sales + margin + nown + nfull + npart + naux + hoursw +
## hourspw + inv1 + inv2 + ssize + start
##
## Df Sum of Sq RSS AIC
## - margin 1 3.9767e+08 1.9345e+13 7968.0
## - hoursw 1 9.8003e+09 1.9355e+13 7968.2
## - inv2 1 2.5209e+10 1.9370e+13 7968.5
## - inv1 1 4.2522e+10 1.9388e+13 7968.7
## - start 1 7.5619e+10 1.9421e+13 7969.3
## - nown 1 7.9903e+10 1.9425e+13 7969.4
## - npart 1 8.2245e+10 1.9427e+13 7969.4
## <none> 1.9345e+13 7970.0
## - naux 1 2.5883e+11 1.9604e+13 7972.3
## - hourspw 1 2.9902e+11 1.9644e+13 7973.0
## - nfull 1 3.9722e+11 1.9742e+13 7974.5
## - ssize 1 8.0541e+12 2.7399e+13 8079.4
## - sales 1 1.4940e+13 3.4285e+13 8151.2
##
## Step: AIC=7968.05
## tsale ~ sales + nown + nfull + npart + naux + hoursw + hourspw +
## inv1 + inv2 + ssize + start
##
## Df Sum of Sq RSS AIC
## - hoursw 1 9.7477e+09 1.9355e+13 7966.2
## - inv2 1 2.5230e+10 1.9371e+13 7966.5
## - inv1 1 4.2180e+10 1.9388e+13 7966.7
## - start 1 7.5607e+10 1.9421e+13 7967.3
## - nown 1 7.9735e+10 1.9425e+13 7967.4
## - npart 1 8.1884e+10 1.9427e+13 7967.4
## <none> 1.9345e+13 7968.0
## - naux 1 2.6467e+11 1.9610e+13 7970.4
## - hourspw 1 2.9862e+11 1.9644e+13 7971.0
## - nfull 1 3.9755e+11 1.9743e+13 7972.6
## - ssize 1 8.0825e+12 2.7428e+13 8077.8
## - sales 1 1.5063e+13 3.4409e+13 8150.3
##
## Step: AIC=7966.21
## tsale ~ sales + nown + nfull + npart + naux + hourspw + inv1 +
## inv2 + ssize + start
##
## Df Sum of Sq RSS AIC
## - inv2 1 2.5949e+10 1.9381e+13 7964.6
## - inv1 1 4.4664e+10 1.9400e+13 7964.9
## - start 1 7.9809e+10 1.9435e+13 7965.5
## <none> 1.9355e+13 7966.2
## - nown 1 1.4669e+11 1.9502e+13 7966.6
## - npart 1 2.3619e+11 1.9591e+13 7968.1
## - naux 1 5.2588e+11 1.9881e+13 7972.8
## - nfull 1 2.3855e+12 2.1741e+13 8001.4
## - hourspw 1 4.6509e+12 2.4006e+13 8033.1
## - ssize 1 8.1207e+12 2.7476e+13 8076.3
## - sales 1 1.5068e+13 3.4424e+13 8148.5
##
## Step: AIC=7964.64
## tsale ~ sales + nown + nfull + npart + naux + hourspw + inv1 +
## ssize + start
##
## Df Sum of Sq RSS AIC
## - inv1 1 2.5289e+10 1.9406e+13 7963.1
## - start 1 7.4609e+10 1.9456e+13 7963.9
## <none> 1.9381e+13 7964.6
## - nown 1 1.3428e+11 1.9515e+13 7964.8
## - npart 1 2.2004e+11 1.9601e+13 7966.3
## - naux 1 5.2495e+11 1.9906e+13 7971.2
## - nfull 1 2.4031e+12 2.1784e+13 8000.0
## - hourspw 1 4.7241e+12 2.4105e+13 8032.4
## - ssize 1 8.2971e+12 2.7678e+13 8076.7
## - sales 1 1.5045e+13 3.4426e+13 8146.5
##
## Step: AIC=7963.06
## tsale ~ sales + nown + nfull + npart + naux + hourspw + ssize +
## start
##
## Df Sum of Sq RSS AIC
## - start 1 6.7331e+10 1.9474e+13 7962.2
## <none> 1.9406e+13 7963.1
## - nown 1 1.4030e+11 1.9547e+13 7963.4
## - npart 1 2.0605e+11 1.9612e+13 7964.4
## - naux 1 5.5560e+11 1.9962e+13 7970.1
## - nfull 1 2.3803e+12 2.1787e+13 7998.1
## - hourspw 1 4.7005e+12 2.4107e+13 8030.5
## - ssize 1 8.2800e+12 2.7686e+13 8074.8
## - sales 1 1.5091e+13 3.4497e+13 8145.1
##
## Step: AIC=7962.16
## tsale ~ sales + nown + nfull + npart + naux + hourspw + ssize
##
## Df Sum of Sq RSS AIC
## <none> 1.9474e+13 7962.2
## - nown 1 1.3095e+11 1.9605e+13 7962.3
## - npart 1 1.8971e+11 1.9663e+13 7963.3
## - naux 1 5.4986e+11 2.0024e+13 7969.1
## - nfull 1 2.4645e+12 2.1938e+13 7998.3
## - hourspw 1 4.7089e+12 2.4183e+13 8029.5
## - ssize 1 8.2229e+12 2.7697e+13 8072.9
## - sales 1 1.5100e+13 3.4574e+13 8143.9
##
## Call:
## lm(formula = tsale ~ sales + nown + nfull + npart + naux + hourspw +
## ssize, data = Clothing_train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1634385 -99151 -3269 97041 1491549
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -787834.86 76363.79 -10.317 < 2e-16 ***
## sales 75.90 4.88 15.554 < 2e-16 ***
## nown -30712.84 21204.06 -1.448 0.14850
## nfull 104779.18 16674.71 6.284 1.11e-09 ***
## npart 45967.90 26366.81 1.743 0.08225 .
## naux 105213.32 35447.96 2.968 0.00323 **
## hourspw 21131.21 2432.83 8.686 < 2e-16 ***
## ssize 2257.65 196.69 11.478 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 249800 on 312 degrees of freedom
## Multiple R-squared: 0.7996, Adjusted R-squared: 0.7951
## F-statistic: 177.9 on 7 and 312 DF, p-value: < 2.2e-16
Based on the backward stepwise selection, the suggested formula for the multiple linear regression is:
lm_Clothing_model <- lm(formula = tsale ~ sales + nown + nfull + npart + naux + hourspw +
                          ssize, data = Clothing_train)
# Plot the residuals against the model's fitted values
linearity <- data.frame(residual = lm_Clothing_model$residuals,
                        fitted = lm_Clothing_model$fitted.values)
linearity %>%
  ggplot(aes(fitted, residual)) +
  geom_point() +
  geom_smooth(method = lm) +
  geom_hline(aes(yintercept = 0)) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5),
    axis.title = element_text(size = 9, face = "bold"),
    axis.text.y = element_text(margin = margin(l = 5)),
    axis.text.x.bottom = element_text(margin = margin(b = 5)))
Based on this plot, the model does not yet capture the variation in the data well: many residuals still lie far from zero.
##
## Shapiro-Wilk normality test
##
## data: lm_Clothing_model$residuals[0:5000]
## W = 0.86235, p-value = 2.986e-16
##
## Anderson-Darling normality test
##
## data: lm_Clothing_model$residuals
## A = 8.5716, p-value < 2.2e-16
Hypotheses:
H0: the residuals are normally distributed.
H1: the residuals are not normally distributed.
Because the p-value is below 0.05, H0 is rejected in favor of H1: the model's residuals are not normally distributed.
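The decision rule above can be written as a small helper around `shapiro.test()` from base R. This is a sketch under stated assumptions: `check_normality` is a hypothetical name and the 0.05 threshold is the significance level used in this report. The Anderson-Darling variant shown in the output appears to come from `nortest::ad.test()`, which is called on the residuals in the same way.

```r
# Shapiro-Wilk test on model residuals; H0: residuals are normally distributed.
check_normality <- function(model, alpha = 0.05) {
  res <- residuals(model)
  sw  <- shapiro.test(res)            # accepts between 3 and 5000 observations
  list(statistic = unname(sw$statistic),
       p_value   = sw$p.value,
       reject_h0 = sw$p.value < alpha)  # TRUE means normality is rejected
}

# Runs only if lm_Clothing_model exists in the session.
if (exists("lm_Clothing_model")) print(check_normality(lm_Clothing_model))
```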
##
## studentized Breusch-Pagan test
##
## data: lm_Clothing_model
## BP = 140.31, df = 7, p-value < 2.2e-16
Hypotheses:
H0: the residuals are homoscedastic (constant variance).
H1: the residuals are heteroscedastic.
Because the p-value is below 0.05, H0 is rejected in favor of H1: the model's residuals are heteroscedastic, which can also be seen in the pattern they form.
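To make the test above less of a black box, the studentized Breusch-Pagan statistic can be computed by hand: regress the squared residuals on the model's predictors and multiply the R-squared of that auxiliary regression by n. Under H0 the statistic follows a chi-squared distribution with one degree of freedom per predictor. The sketch below assumes a model with an intercept and no collinear columns; `bp_test_manual` is a hypothetical helper, and in practice `lmtest::bptest()` would normally be used instead:

```r
# Studentized (Koenker) Breusch-Pagan test computed from first principles.
bp_test_manual <- function(model) {
  e2  <- residuals(model)^2                # squared residuals
  X   <- model.matrix(model)               # predictors, including the intercept column
  aux <- lm.fit(X, e2)                     # auxiliary regression of e^2 on the predictors
  r2_aux <- 1 - sum(aux$residuals^2) / sum((e2 - mean(e2))^2)
  stat <- length(e2) * r2_aux              # BP = n * R^2 of the auxiliary regression
  df   <- ncol(X) - 1                      # number of predictors, intercept excluded
  list(statistic = stat, df = df,
       p_value = pchisq(stat, df, lower.tail = FALSE))
}

# Runs only if lm_Clothing_model exists in the session.
if (exists("lm_Clothing_model")) print(bp_test_manual(lm_Clothing_model))
```

A small p-value means the residual variance changes with the predictors, which is the heteroscedasticity conclusion drawn above.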
predict_price <- predict(
lm_Clothing_model,
Clothing_test,
interval = "confidence",
level = 0.95
)
Clothing_predict <- cbind(Clothing_test,predict_price)
Clothing_predict %>%
  select(
    tsale, predict.fit=fit, predict.lower=lwr, predict.upper=upr,
    sales, margin, nown, nfull, npart, naux, hoursw, hourspw, inv1,
    inv2, ssize)
## tsale predict.fit predict.lower predict.upper sales margin nown
## 1 750000 566511.13 509927.43 623094.8 4411.765 41 1.0000
## 3 1250000 998241.05 938778.42 1057703.7 4166.667 40 1.0000
## 12 231000 263618.08 206543.47 320692.7 5775.000 30 1.0000
## 15 330000 662292.54 603891.69 720693.4 8250.000 34 1.0000
## 20 330000 350067.46 306919.45 393215.5 4714.286 41 1.0000
## 25 471000 581879.69 523037.56 640721.8 4957.895 39 1.0000
## 30 694227 790844.75 700236.18 881453.3 5785.225 44 1.0000
## 32 495340 446500.52 393553.66 499447.4 4953.400 41 2.0000
## 39 876000 673050.73 609977.62 736123.8 6257.143 40 1.0000
## 59 549000 399529.22 336477.74 462580.7 4575.000 41 2.0000
## 63 976817 1157535.44 1061895.96 1253174.9 15027.950 43 1.0000
## 64 450000 611885.23 565559.62 658210.8 5000.000 39 1.0000
## 75 301133 483346.77 423212.11 543481.4 8603.800 37 1.0000
## 76 156168 341222.56 282670.94 399774.2 1382.018 37 1.0000
## 78 301133 215570.06 164135.32 267004.8 3011.330 35 1.0000
## 83 1343500 1768292.22 1661012.04 1875572.4 19192.860 42 1.2228
## 84 1290500 1676427.42 1570167.44 1782687.4 18435.710 42 1.2228
## 94 777362 699383.02 643974.35 754791.7 5182.414 32 1.0000
## 95 976817 791963.22 751220.08 832706.4 4884.085 39 1.0000
## 103 495340 472597.92 419507.47 525688.4 4503.091 31 1.0000
## 105 876845 864103.36 801282.84 926923.9 5480.281 42 1.0000
## 107 694227 631676.57 579872.39 683480.8 5833.840 41 2.0000
## 110 694227 642700.13 591986.09 693414.2 4338.919 39 2.0000
## 116 156168 38603.63 -24139.97 101347.2 1952.100 25 1.0000
## 118 976817 783417.62 719462.35 847372.9 9768.170 44 1.0000
## 127 301133 369388.82 310549.59 428228.0 2007.553 28 2.0000
## 134 495340 418265.18 353908.59 482621.8 4953.400 39 2.0000
## 136 976817 1099072.68 1005138.46 1193006.9 13954.530 41 1.0000
## 140 976817 789714.22 748338.52 831089.9 5280.092 41 1.0000
## 141 1378586 1214669.14 1151839.17 1277499.1 7877.634 44 1.0000
## 151 390000 498955.51 456311.92 541599.1 5571.429 42 1.0000
## 152 535000 482557.84 438656.87 526458.8 4458.333 34 1.0000
## 154 1926395 1936548.98 1838243.72 2034854.2 17512.680 41 2.0000
## 187 976817 1685055.27 1564739.14 1805371.4 2382.480 66 1.0000
## 189 694227 1151951.19 1055904.70 1247997.7 9917.528 39 1.0000
## 200 694227 597192.14 508909.94 685474.3 4958.764 44 1.0000
## 201 694227 564645.38 508588.00 620702.8 4958.764 44 1.0000
## 203 301133 394759.47 350304.54 439214.4 3764.163 41 1.0000
## 219 1475000 1280945.06 1220201.68 1341688.4 4402.985 41 1.0000
## 220 795655 1219494.96 1121736.74 1317253.2 1768.122 41 2.0000
## 221 570000 484691.60 437106.89 532276.3 5700.000 40 1.0000
## 230 770200 982477.29 899575.00 1065379.6 2026.842 41 2.0000
## 237 495340 523057.96 481325.79 564790.1 2896.725 40 1.0000
## 245 495340 606476.68 562384.68 650568.7 7620.615 41 1.0000
## 246 1400000 1040360.15 959534.85 1121185.5 7777.778 41 2.0000
## 251 971500 925507.25 847969.18 1003045.3 3736.539 41 2.0000
## 260 976817 1094600.96 1004091.68 1185110.2 2790.906 44 2.0000
## 262 301133 302903.96 257693.17 348114.8 4182.403 31 1.0000
## 270 301133 238725.02 186441.94 291008.1 3764.163 39 1.0000
## 271 750000 857358.82 770717.79 943999.9 3750.000 36 2.0000
## 272 301133 371463.26 326598.63 416327.9 3011.330 39 1.0000
## 274 301133 389601.33 340467.89 438734.8 2737.573 35 1.0000
## 278 976817 985752.63 918060.78 1053444.5 12210.210 39 2.0000
## 283 800000 939434.26 850168.86 1028699.6 5333.333 37 2.0000
## 289 976817 702568.64 628540.89 776596.4 3907.268 41 1.0000
## 290 1076500 1069649.19 983471.73 1155826.6 5053.991 39 1.0000
## 305 694227 494188.19 430892.04 557484.3 6942.270 35 2.0000
## 308 984717 1035699.40 983551.33 1087847.5 5182.721 37 1.0000
## 317 495340 547287.52 492557.94 602017.1 8255.667 38 1.0000
## 320 797825 746569.02 681818.78 811319.3 3989.125 42 1.0000
## 322 355724 324224.29 274312.87 374135.7 5081.771 39 1.0000
## 330 575000 423213.55 367413.29 479013.8 4107.143 35 1.0000
## 331 976817 927080.87 805773.47 1048388.3 3907.268 33 1.0000
## 334 976817 866734.01 812592.36 920875.6 4440.077 37 1.0000
## 341 495340 372992.50 318335.42 427649.6 6191.750 33 1.2228
## 342 694227 662858.89 607400.83 718317.0 6942.270 33 2.0000
## 350 301133 275225.14 223896.67 326553.6 5018.883 33 1.0000
## 352 1286703 1587243.24 1478920.68 1695565.8 3216.758 39 1.0000
## 356 930000 892757.90 832821.41 952694.4 8086.957 43 2.0000
## 357 565400 1370993.92 1289163.29 1452824.5 14135.000 44 1.0000
## 368 732981 916881.99 856914.24 976849.7 5235.579 40 1.0000
## 377 580000 1059197.56 996243.85 1122151.3 1933.333 39 1.0000
## 381 955867 866896.41 784771.01 949021.8 6827.622 43 1.0000
## 384 772665 701151.18 655424.45 746877.9 5519.036 39 1.0000
## 385 1200000 1366090.66 1283232.64 1448948.7 9230.770 42 2.0000
## 393 1800000 1448019.78 1332230.78 1563808.8 3964.758 48 3.0000
## 395 500000 598359.19 554856.96 641861.4 5000.000 37 1.0000
## 397 1252320 1319567.95 1244003.56 1395132.3 3339.520 36 1.0000
## nfull npart naux hoursw hourspw inv1 inv2 ssize
## 1 1.0000 1.0000 1.5357 76 16.755960 17166.67 27177.04 170
## 3 2.0000 2.2222 1.4091 114 17.191200 292857.20 71570.55 300
## 12 1.9556 1.0000 1.3673 40 7.514700 22207.04 5000.00 40
## 15 2.2656 2.0741 1.3333 92 13.786900 62269.23 16624.89 40
## 20 1.9556 1.2833 1.3673 65 11.594310 22207.04 22859.85 70
## 25 1.9556 1.2833 2.0000 99 15.868180 22207.04 22859.85 95
## 30 4.3590 2.0000 1.4091 84 9.580183 292857.20 71570.55 120
## 32 1.9231 1.5946 1.5357 86 12.192700 17166.67 27177.04 100
## 39 1.0000 2.2222 1.0000 96 18.383060 292857.20 71570.55 140
## 59 1.0000 2.0741 1.3333 88 13.734120 25000.00 16624.89 120
## 63 1.9556 1.0000 1.0000 78 15.739770 22207.04 5000.00 65
## 64 2.0000 1.5946 1.5357 117 19.085530 17166.67 27177.04 90
## 75 1.9556 1.2833 1.3673 43 7.670080 22207.04 22859.85 35
## 76 1.9556 1.2833 1.3673 104 18.550890 22207.04 22859.85 113
## 78 1.0000 1.2833 1.0000 63 14.708290 22207.04 1000.00 100
## 83 3.0000 2.0000 1.3673 154 20.289590 22207.04 22859.85 70
## 84 3.0000 2.0000 1.0000 148 20.490670 22207.04 15000.00 70
## 94 1.9556 1.2833 2.0000 92 14.746190 22207.04 22859.85 150
## 95 1.9556 2.0000 1.3673 104 16.448150 22207.04 22859.85 200
## 103 2.2656 2.0741 1.3333 72 10.789750 62269.23 16624.89 110
## 105 1.0000 1.0000 1.3333 126 29.077150 350000.00 16624.89 160
## 107 2.2656 1.0000 1.3333 108 16.366360 62269.23 16624.89 119
## 110 1.9556 2.0000 1.3673 125 17.069740 100000.00 10000.00 160
## 116 1.9556 1.2833 1.3673 32 5.707966 2500.00 22859.85 80
## 118 1.0000 1.2833 1.3673 72 15.481870 22207.04 22859.85 100
## 127 1.9556 1.2833 1.3673 100 15.137300 22207.04 22859.85 150
## 134 2.2656 2.0741 1.3333 70 9.122898 230000.00 3500.00 100
## 136 1.0000 2.2222 1.4091 92 16.337260 292857.20 71570.55 70
## 140 2.2656 2.0000 1.3333 100 15.154040 2000.00 2000.00 185
## 141 2.0000 2.0000 2.0000 175 25.000000 292857.20 25000.00 175
## 151 1.9231 1.5946 1.5357 86 14.206890 17166.67 27177.04 70
## 152 1.9556 1.2833 1.0000 80 15.270380 22207.04 6500.00 120
## 154 4.0000 1.2833 1.3673 240 27.743740 22207.04 30000.00 110
## 187 4.0000 3.0000 1.0000 313 34.777780 17166.67 72000.00 410
## 189 4.3590 2.2222 1.4091 150 16.684650 292857.20 71570.55 70
## 200 1.0000 3.0000 1.3333 102 16.105350 62269.23 16624.89 140
## 201 1.0000 2.0000 1.0000 92 18.400000 62269.23 5000.00 140
## 203 1.9556 1.2833 1.3673 90 16.053660 22207.04 22859.85 80
## 219 2.0000 2.0000 1.3333 170 26.842250 62269.23 16624.89 335
## 220 2.0000 1.5946 1.5357 160 22.439450 17166.67 27177.04 450
## 221 1.9556 1.0000 1.3673 63 11.835650 22207.04 22859.85 100
## 230 2.2656 2.0000 1.3333 126 16.581350 62269.23 16624.89 380
## 237 1.9556 1.2833 1.3673 87 15.518530 22207.04 22859.85 171
## 245 1.9556 1.2833 1.0000 82 15.652140 6000.00 5000.00 65
## 246 1.9556 3.0000 1.3673 160 19.224070 22207.04 5000.00 180
## 251 1.0000 2.0000 1.0000 171 28.500000 292857.20 1483.00 260
## 260 2.0000 1.0000 1.0000 165 27.500000 292857.20 71570.55 350
## 262 1.9556 1.2833 1.3673 62 11.059180 22207.04 1500.00 72
## 270 1.9556 1.2833 1.0000 55 10.498390 40000.00 48450.00 80
## 271 1.0000 1.0000 2.0000 173 28.833330 62269.23 16624.89 200
## 272 1.9556 1.2833 1.3673 87 15.518530 22207.04 22859.85 100
## 274 1.9556 1.0000 1.3673 90 16.908080 22207.04 22859.85 110
## 278 1.9556 1.2833 1.3673 100 15.137300 22207.04 5000.00 80
## 283 1.0000 1.0000 1.5357 192 34.683960 17166.67 2000.00 150
## 289 1.0000 2.2222 1.0000 86 16.468150 292857.20 71570.55 250
## 290 2.0000 3.0000 1.3333 186 25.363750 62269.23 16624.89 213
## 305 1.9231 1.5946 1.0000 65 9.972843 17166.67 10000.00 100
## 308 3.0000 1.2833 1.3673 162 24.358700 22207.04 22859.85 190
## 317 1.9556 1.2833 1.3673 52 9.275445 22207.04 22859.85 60
## 320 2.2656 1.0000 2.0000 94 15.002550 62269.23 1900.00 200
## 322 1.9556 1.2833 1.0000 57 10.880150 22207.04 22859.85 70
## 330 1.0000 2.0741 1.3333 70 12.945220 62269.23 16624.89 140
## 331 1.0000 1.5946 3.0000 122 18.499980 17166.67 27177.04 250
## 334 1.9231 1.5946 2.0000 113 17.337400 17166.67 5000.00 220
## 341 1.0000 1.0000 1.3673 55 11.982310 22207.04 5000.00 80
## 342 2.2656 2.0741 1.3333 104 13.554020 62269.23 16624.89 100
## 350 1.9556 1.2833 1.3673 45 8.026828 22207.04 22859.85 60
## 352 4.0000 1.0000 1.0000 228 32.571430 17166.67 27177.04 400
## 356 1.0000 1.2833 1.3673 150 26.545850 10000.00 22859.85 115
## 357 3.0000 2.0741 1.3333 167 22.545020 62269.23 5000.00 40
## 368 2.0000 1.2833 1.3673 163 28.846490 22207.04 22859.85 140
## 377 3.0000 2.0000 1.3333 176 24.000110 60000.00 30000.00 300
## 381 2.0000 3.0000 1.5357 122 16.189600 9000.00 7600.00 140
## 384 1.0000 1.2833 1.3673 105 22.577730 20000.00 22859.85 140
## 385 3.0000 1.0000 1.3673 250 33.933730 22207.04 22500.00 130
## 393 2.2656 3.0000 1.3333 221 23.023470 62269.23 2450.00 454
## 395 1.9231 1.0000 1.5357 104 19.051810 17166.67 80000.00 100
## 397 2.0000 1.5946 2.0000 170 25.778670 17166.67 27177.04 375
In conclusion, the model has an adjusted R-squared of 0.795, so it explains about 80% of the variance in tsale. It is still not good enough for predicting men's fashion sales in this dataset, since its residuals are neither normally distributed nor homoscedastic. Other methods could be tried to obtain a better model.