This dataset contains house sale prices for King County, which includes Seattle. It includes homes sold between May 2014 and May 2015.
## 'data.frame': 21613 obs. of 21 variables:
## $ id : num 7129300520 6414100192 5631500400 2487200875 1954400510 ...
## $ date : Factor w/ 372 levels "20140502T000000",..: 165 221 291 221 284 11 57 252 340 306 ...
## $ price : num 221900 538000 180000 604000 510000 ...
## $ bedrooms : int 3 3 2 4 3 4 3 3 3 3 ...
## $ bathrooms : num 1 2.25 1 3 2 4.5 2.25 1.5 1 2.5 ...
## $ sqft_living : int 1180 2570 770 1960 1680 5420 1715 1060 1780 1890 ...
## $ sqft_lot : int 5650 7242 10000 5000 8080 101930 6819 9711 7470 6560 ...
## $ floors : num 1 2 1 1 1 1 2 1 1 2 ...
## $ waterfront : int 0 0 0 0 0 0 0 0 0 0 ...
## $ view : int 0 0 0 0 0 0 0 0 0 0 ...
## $ condition : int 3 3 3 5 3 3 3 3 3 3 ...
## $ grade : int 7 7 6 7 8 11 7 7 7 7 ...
## $ sqft_above : int 1180 2170 770 1050 1680 3890 1715 1060 1050 1890 ...
## $ sqft_basement: int 0 400 0 910 0 1530 0 0 730 0 ...
## $ yr_built : int 1955 1951 1933 1965 1987 2001 1995 1963 1960 2003 ...
## $ yr_renovated : int 0 1991 0 0 0 0 0 0 0 0 ...
## $ zipcode : int 98178 98125 98028 98136 98074 98053 98003 98198 98146 98038 ...
## $ lat : num 47.5 47.7 47.7 47.5 47.6 ...
## $ long : num -122 -122 -122 -122 -122 ...
## $ sqft_living15: int 1340 1690 2720 1360 1800 4760 2238 1650 1780 2390 ...
## $ sqft_lot15 : int 5650 7639 8062 5000 7503 101930 6819 9711 8113 7570 ...
# without predictor
modelwithout <- lm(formula = price~1, data = dfh)
#check summary model
summary(modelwithout)
##
## Call:
## lm(formula = price ~ 1, data = dfh)
##
## Residuals:
## Min 1Q Median 3Q Max
## -465088 -218138 -90088 104912 7159912
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 540088 2497 216.3 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 367100 on 21612 degrees of freedom
## [1] 540088.1
When a model is fit without any predictor, the intercept equals the mean of the target variable, here the mean price of 540088.1.
$price = 540088$
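The claim that an intercept-only model reproduces the mean can be checked on any numeric vector. The following is a minimal sketch on simulated data; the vector `y` and the name `fit0` are illustrative assumptions and are not part of the original analysis.

```r
# Simulated target values; illustrative only, not the King County data.
set.seed(1)
y <- rnorm(50, mean = 500000, sd = 100000)

# Intercept-only model: no predictors on the right-hand side.
fit0 <- lm(y ~ 1)

# The single estimated coefficient is the sample mean of y.
intercept <- unname(coef(fit0)[1])
all.equal(intercept, mean(y))  # TRUE
```

This follows from least squares: the constant that minimizes the sum of squared deviations is the sample mean.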
# model bathrooms
model_bathrooms <- lm(formula = price~bathrooms, data = dfh)
summary(model_bathrooms)
##
## Call:
## lm(formula = price ~ bathrooms, data = dfh)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1438157 -184525 -41525 113220 5925322
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10708 6211 1.724 0.0847 .
## bathrooms 250327 2760 90.714 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 312400 on 21611 degrees of freedom
## Multiple R-squared: 0.2758, Adjusted R-squared: 0.2757
## F-statistic: 8229 on 1 and 21611 DF, p-value: < 0.00000000000000022
price = 10708 + 250327 ∗ bathrooms
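To read the fitted equation, it can help to plug in a value by hand. The sketch below uses the two coefficients reported in the summary above; the helper `predict_price` is a hypothetical name introduced only for illustration.

```r
# Coefficients copied from the summary output above.
b0 <- 10708    # intercept
b1 <- 250327   # slope for bathrooms

# Hypothetical helper: predicted price for a given number of bathrooms.
predict_price <- function(bathrooms) b0 + b1 * bathrooms

predict_price(2)                         # 511362
predict_price(2.5) - predict_price(1.5)  # 250327, i.e. the slope
```

The second line illustrates the usual interpretation of the slope: under this model, one additional bathroom is associated with a predicted price that is higher by 250327.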
# model sqft_living
model_s_living <- lm(formula = price~sqft_living, data = dfh)
summary(model_s_living)
##
## Call:
## lm(formula = price ~ sqft_living, data = dfh)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1476062 -147486 -24043 106182 4362067
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -43580.743 4402.690 -9.899 <0.0000000000000002 ***
## sqft_living 280.624 1.936 144.920 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 261500 on 21611 degrees of freedom
## Multiple R-squared: 0.4929, Adjusted R-squared: 0.4928
## F-statistic: 2.1e+04 on 1 and 21611 DF, p-value: < 0.00000000000000022
price = −43580.743 + 280.624 ∗ sqft_living
##
## Call:
## lm(formula = price ~ sqft_above, data = dfh)
##
## Residuals:
## Min 1Q Median 3Q Max
## -913132 -165624 -41468 109327 5339232
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 59953.2 4729.8 12.68 <0.0000000000000002 ***
## sqft_above 268.5 2.4 111.87 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 292200 on 21611 degrees of freedom
## Multiple R-squared: 0.3667, Adjusted R-squared: 0.3667
## F-statistic: 1.251e+04 on 1 and 21611 DF, p-value: < 0.00000000000000022
price = 59953.2 + 268.5 ∗ sqft_above
##
## Call:
## lm(formula = price ~ sqft_living15, data = dfh)
##
## Residuals:
## Min 1Q Median 3Q Max
## -842924 -162153 -43709 95431 6547397
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -82807.195 6208.029 -13.34 <0.0000000000000002 ***
## sqft_living15 313.556 2.954 106.14 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 297700 on 21611 degrees of freedom
## Multiple R-squared: 0.3427, Adjusted R-squared: 0.3426
## F-statistic: 1.127e+04 on 1 and 21611 DF, p-value: < 0.00000000000000022
price = −82807.195 + 313.556 ∗ sqft_living15
# multiple predictor
model_multi <- lm(formula = price~ bathrooms + sqft_living + sqft_above + sqft_living15, data = dfh)
summary(model_multi)
##
## Call:
## lm(formula = price ~ bathrooms + sqft_living + sqft_above + sqft_living15,
## data = dfh)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1261636 -145800 -23541 106514 4596663
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -97255.673 6100.663 -15.942 < 0.0000000000000002 ***
## bathrooms -2616.552 3501.065 -0.747 0.455
## sqft_living 269.116 4.700 57.263 < 0.0000000000000002 ***
## sqft_above -37.074 4.549 -8.149 0.000000000000000386 ***
## sqft_living15 75.229 4.033 18.655 < 0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 259300 on 21608 degrees of freedom
## Multiple R-squared: 0.5013, Adjusted R-squared: 0.5012
## F-statistic: 5431 on 4 and 21608 DF, p-value: < 0.00000000000000022
From the summary above, one predictor is not significant in this model: bathrooms (p-value = 0.455 > 0.05). The other three predictors, sqft_living, sqft_above and sqft_living15, are significant (p-value < 0.05).
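Reading p-values off a printed summary can also be done programmatically. The sketch below is an assumption-laden illustration on simulated data (the variables `x1`, `x2`, `y` are made up); it only shows how the `Pr(>|t|)` column can be extracted and compared with the 0.05 threshold.

```r
# Simulated data: x1 drives y, x2 is unrelated noise (illustrative only).
set.seed(4)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 3 * x1 + rnorm(n)

fit <- lm(y ~ x1 + x2)

# The coefficient table of summary() has a "Pr(>|t|)" column.
pvals <- coef(summary(fit))[, "Pr(>|t|)"]
pvals <- pvals[names(pvals) != "(Intercept)"]

names(pvals)[pvals < 0.05]   # predictors flagged as significant
names(pvals)[pvals >= 0.05]  # predictors not flagged
```

A p-value above 0.05 in a multiple regression does not mean the predictor is unrelated to the target on its own; it means it adds little once the other predictors are in the model.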
##
## Call:
## lm(formula = price ~ ., data = dfh)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1329723 -107135 -10257 83039 4161420
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -51820690.20472 1631235.72956 -31.768 < 0.0000000000000002 ***
## bedrooms -45061.50431 1973.54316 -22.833 < 0.0000000000000002 ***
## bathrooms 53904.80545 3414.75936 15.786 < 0.0000000000000002 ***
## sqft_living 183.14124 4.53440 40.389 < 0.0000000000000002 ***
## sqft_lot 0.17084 0.05045 3.386 0.000710 ***
## floors 14544.76601 3756.31925 3.872 0.000108 ***
## waterfront 569227.68416 18273.35744 31.151 < 0.0000000000000002 ***
## view 57215.00632 2236.84375 25.578 < 0.0000000000000002 ***
## condition 34249.71009 2459.48103 13.926 < 0.0000000000000002 ***
## sqft_above 61.30804 4.54229 13.497 < 0.0000000000000002 ***
## sqft_basement NA NA NA NA
## yr_built -1713.19198 73.88326 -23.188 < 0.0000000000000002 ***
## yr_renovated 27.64023 3.84548 7.188 0.00000000000068 ***
## lat 637283.46123 10883.25939 58.556 < 0.0000000000000002 ***
## long -201913.59418 12385.91912 -16.302 < 0.0000000000000002 ***
## sqft_living15 72.48159 3.45852 20.957 < 0.0000000000000002 ***
## sqft_lot15 -0.48238 0.07712 -6.255 0.00000000040604 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 211900 on 21597 degrees of freedom
## Multiple R-squared: 0.667, Adjusted R-squared: 0.6668
## F-statistic: 2884 on 15 and 21597 DF, p-value: < 0.00000000000000022
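The summary above shows that bathrooms, when modeled together with all the other predictors, has a p-value < 0.05. The coefficient for sqft_basement is NA, meaning it could not be estimated because of a singularity. This suggests multicollinearity: sqft_basement is probably an exact linear combination of other predictors.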
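An NA coefficient of this kind can be reproduced whenever one predictor is an exact linear combination of others. The following sketch uses simulated columns (`living`, `above`, `basement` are assumed names, built so that `living = above + basement`); it does not claim that this is the exact dependency in the original data.

```r
# Simulated data with an exact linear dependency between predictors.
set.seed(2)
n        <- 100
above    <- runif(n, 500, 3000)
basement <- runif(n, 0, 1000)
living   <- above + basement
price    <- 50000 + 200 * living + rnorm(n, sd = 20000)

fit <- lm(price ~ living + above + basement)

coef(fit)            # the last term of the dependency is reported as NA
alias(fit)$Complete  # shows how the dropped term depends on the others
```

`lm()` drops the redundant column instead of failing, which is why the summary reports "1 not defined because of singularities".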
## Start: AIC=553875.8
## price ~ 1
##
## Df Sum of Sq RSS AIC
## + sqft_living 1 1435640399598809 1477276362322490 539204
## + sqft_above 1 1068200811636164 1844715950285135 544004
## + sqft_living15 1 998164703117973 1914752058803326 544810
## + bathrooms 1 803293306497671 2109623455423628 546904
## + view 1 459780944970999 2453135816950300 550165
## + sqft_basement 1 305439174800922 2607477587120376 551484
## + bedrooms 1 276958595500073 2635958166421226 551718
## + lat 1 274545716008578 2638371045912720 551738
## + waterfront 1 206679237434408 2706237524486890 552287
## + floors 1 192086763313773 2720829998607526 552403
## + yr_renovated 1 46564442910138 2866352319011162 553529
## + sqft_lot 1 23417141523777 2889499620397522 553703
## + sqft_lot15 1 19800647694735 2893116114226564 553730
## + yr_built 1 8497693415832 2904419068505468 553815
## + condition 1 3851399435633 2909065362485666 553849
## + long 1 1362354570271 2911554407351028 553868
## <none> 2912916761921299 553876
##
## Step: AIC=539203.5
## price ~ sqft_living
##
## Df Sum of Sq RSS AIC
## + lat 1 213137924966058 1264138437356432 535838
## + view 1 123620038415268 1353656323907222 537317
## + waterfront 1 110238185400763 1367038176921727 537529
## + yr_built 1 92854405407200 1384421956915290 537802
## + long 1 66817275084005 1410459087238485 538205
## + bedrooms 1 40635382190095 1436640980132395 538603
## + yr_renovated 1 22404898015578 1454871464306912 538875
## + sqft_living15 1 20108677275202 1457167685047288 538909
## + condition 1 17605348260420 1459671014062070 538946
## + sqft_lot15 1 6440740801824 1470835621520666 539111
## + sqft_lot 1 3011349102420 1474265013220070 539161
## + sqft_above 1 1216499294160 1476059863028330 539188
## + sqft_basement 1 1216499294160 1476059863028330 539188
## + floors 1 229913654973 1477046448667517 539202
## + bathrooms 1 147193010785 1477129169311705 539203
## <none> 1477276362322490 539204
##
## Step: AIC=535838
## price ~ sqft_living + lat
##
## Df Sum of Sq RSS AIC
## + view 1 126630803103906 1137507634252527 533559
## + waterfront 1 116457216515170 1147681220841262 533751
## + yr_built 1 51903847813610 1212234589542822 534934
## + long 1 36167023609974 1227971413746458 535213
## + bedrooms 1 32254157675667 1231884279680765 535281
## + condition 1 19095077947046 1245043359409386 535511
## + yr_renovated 1 18896934952437 1245241502403995 535515
## + sqft_living15 1 18324966511740 1245813470844692 535524
## + sqft_lot15 1 1242925578859 1262895511777573 535819
## <none> 1264138437356432 535838
## + sqft_lot 1 109125131288 1264029312225144 535838
## + sqft_above 1 103865920728 1264034571435704 535838
## + sqft_basement 1 103865920728 1264034571435704 535838
## + bathrooms 1 2294249118 1264136143107314 535840
## + floors 1 29322202 1264138408034231 535840
##
## Step: AIC=533558.7
## price ~ sqft_living + lat + view
##
## Df Sum of Sq RSS AIC
## + waterfront 1 48301384600438 1089206249652089 532623
## + yr_built 1 29685166215947 1107822468036580 532989
## + bedrooms 1 20105107128838 1117402527123689 533175
## + long 1 18126102858463 1119381531394064 533214
## + condition 1 13259272107760 1124248362144767 533307
## + yr_renovated 1 11033349370024 1126474284882502 533350
## + sqft_living15 1 9777260266168 1127730373986359 533374
## + sqft_above 1 5649302765515 1131858331487012 533453
## + sqft_basement 1 5649302765514 1131858331487012 533453
## + sqft_lot15 1 1822194970820 1135685439281707 533526
## + floors 1 790838989632 1136716795262894 533546
## + sqft_lot 1 392067203463 1137115567049064 533553
## + bathrooms 1 192700592756 1137314933659771 533557
## <none> 1137507634252527 533559
##
## Step: AIC=532622.9
## price ~ sqft_living + lat + view + waterfront
##
## Df Sum of Sq RSS AIC
## + yr_built 1 29367526832499 1059838722819591 532034
## + bedrooms 1 17478610279237 1071727639372852 532275
## + long 1 17464543757804 1071741705894285 532276
## + condition 1 13417265900615 1075788983751474 532357
## + sqft_living15 1 11169376379955 1078036873272134 532402
## + yr_renovated 1 8586367580730 1080619882071360 532454
## + sqft_above 1 4670597698794 1084535651953295 532532
## + sqft_basement 1 4670597698794 1084535651953295 532532
## + sqft_lot15 1 1861231101284 1087345018550805 532588
## + floors 1 572161272649 1088634088379440 532614
## + sqft_lot 1 316492359462 1088889757292628 532619
## + bathrooms 1 234351649857 1088971898002233 532620
## <none> 1089206249652089 532623
##
## Step: AIC=532034.2
## price ~ sqft_living + lat + view + waterfront + yr_built
##
## Df Sum of Sq RSS AIC
## + bedrooms 1 20688901922724 1039149820896867 531610
## + sqft_living15 1 18321662372237 1041517060447353 531659
## + sqft_above 1 15037471026572 1044801251793019 531727
## + sqft_basement 1 15037471026571 1044801251793019 531727
## + floors 1 11793327955809 1048045394863782 531794
## + bathrooms 1 9687073571957 1050151649247634 531838
## + long 1 6492502179478 1053346220640113 531903
## + condition 1 3277674240866 1056561048578725 531969
## + yr_renovated 1 2732915035417 1057105807784173 531980
## + sqft_lot15 1 1862994537472 1057975728282119 531998
## + sqft_lot 1 415711282743 1059423011536848 532028
## <none> 1059838722819591 532034
##
## Step: AIC=531610.1
## price ~ sqft_living + lat + view + waterfront + yr_built + bedrooms
##
## Df Sum of Sq RSS AIC
## + bathrooms 1 16549684599818 1022600136297049 531265
## + sqft_living15 1 15977295635070 1023172525261797 531277
## + sqft_above 1 12393157291328 1026756663605539 531353
## + sqft_basement 1 12393157291328 1026756663605539 531353
## + floors 1 11258034321641 1027891786575226 531377
## + long 1 6897822190854 1032251998706014 531468
## + condition 1 4437764244632 1034712056652235 531520
## + sqft_lot15 1 3328710661702 1035821110235166 531543
## + yr_renovated 1 2454309574146 1036695511322721 531561
## + sqft_lot 1 1110404179464 1038039416717403 531589
## <none> 1039149820896867 531610
##
## Step: AIC=531265.2
## price ~ sqft_living + lat + view + waterfront + yr_built + bedrooms +
## bathrooms
##
## Df Sum of Sq RSS AIC
## + sqft_living15 1 18350055736756 1004250080560294 530876
## + sqft_above 1 13904485810531 1008695650486518 530971
## + sqft_basement 1 13904485810531 1008695650486518 530971
## + floors 1 5885883452763 1016714252844286 531142
## + long 1 5034210071147 1017565926225902 531160
## + condition 1 3931668236404 1018668468060645 531184
## + sqft_lot15 1 2353625870920 1020246510426129 531217
## + yr_renovated 1 950579379520 1021649556917529 531247
## + sqft_lot 1 713608944368 1021886527352680 531252
## <none> 1022600136297049 531265
##
## Step: AIC=530875.8
## price ~ sqft_living + lat + view + waterfront + yr_built + bedrooms +
## bathrooms + sqft_living15
##
## Df Sum of Sq RSS AIC
## + long 1 10805533494283 993444547066010 530644
## + sqft_above 1 8499381288515 995750699271779 530694
## + sqft_basement 1 8499381288515 995750699271779 530694
## + floors 1 6586050836779 997664029723515 530736
## + condition 1 4245848049028 1000004232511265 530786
## + sqft_lot15 1 3215905185938 1001034175374355 530808
## + yr_renovated 1 1227639962617 1003022440597676 530851
## + sqft_lot 1 817312968376 1003432767591917 530860
## <none> 1004250080560294 530876
##
## Step: AIC=530644
## price ~ sqft_living + lat + view + waterfront + yr_built + bedrooms +
## bathrooms + sqft_living15 + long
##
## Df Sum of Sq RSS AIC
## + sqft_above 1 11430787297210 982013759768800 530396
## + sqft_basement 1 11430787297210 982013759768800 530396
## + floors 1 5113171922797 988331375143213 530534
## + condition 1 4867275002312 988577272063699 530540
## + yr_renovated 1 1510892433283 991933654632728 530613
## + sqft_lot15 1 1242605733929 992201941332081 530619
## <none> 993444547066010 530644
## + sqft_lot 1 54136120758 993390410945252 530645
##
## Step: AIC=530395.9
## price ~ sqft_living + lat + view + waterfront + yr_built + bedrooms +
## bathrooms + sqft_living15 + long + sqft_above
##
## Df Sum of Sq RSS AIC
## + condition 1 7030802787234 974982956981566 530243
## + sqft_lot15 1 1394571936760 980619187832040 530367
## + yr_renovated 1 1154163783763 980859595985037 530372
## + floors 1 589657336434 981424102432366 530385
## + sqft_lot 1 123985408301 981889774360499 530395
## <none> 982013759768800 530396
##
## Step: AIC=530242.6
## price ~ sqft_living + lat + view + waterfront + yr_built + bedrooms +
## bathrooms + sqft_living15 + long + sqft_above + condition
##
## Df Sum of Sq RSS AIC
## + yr_renovated 1 2427753578407 972555203403159 530191
## + sqft_lot15 1 1448160668390 973534796313176 530212
## + floors 1 944508155898 974038448825669 530224
## + sqft_lot 1 114782670508 974868174311059 530242
## <none> 974982956981566 530243
##
## Step: AIC=530190.7
## price ~ sqft_living + lat + view + waterfront + yr_built + bedrooms +
## bathrooms + sqft_living15 + long + sqft_above + condition +
## yr_renovated
##
## Df Sum of Sq RSS AIC
## + sqft_lot15 1 1454398952337 971100804450822 530160
## + floors 1 789954019031 971765249384128 530175
## + sqft_lot 1 105333375183 972449870027976 530190
## <none> 972555203403159 530191
##
## Step: AIC=530160.3
## price ~ sqft_living + lat + view + waterfront + yr_built + bedrooms +
## bathrooms + sqft_living15 + long + sqft_above + condition +
## yr_renovated + sqft_lot15
##
## Df Sum of Sq RSS AIC
## + floors 1 649871043338 970450933407484 530148
## + sqft_lot 1 491469266990 970609335183832 530151
## <none> 971100804450822 530160
##
## Step: AIC=530147.9
## price ~ sqft_living + lat + view + waterfront + yr_built + bedrooms +
## bathrooms + sqft_living15 + long + sqft_above + condition +
## yr_renovated + sqft_lot15 + floors
##
## Df Sum of Sq RSS AIC
## + sqft_lot 1 514943987546 969935989419938 530138
## <none> 970450933407484 530148
##
## Step: AIC=530138.4
## price ~ sqft_living + lat + view + waterfront + yr_built + bedrooms +
## bathrooms + sqft_living15 + long + sqft_above + condition +
## yr_renovated + sqft_lot15 + floors + sqft_lot
##
## Df Sum of Sq RSS AIC
## <none> 969935989419938 530138
##
## Call:
## lm(formula = price ~ sqft_living + lat + view + waterfront +
## yr_built + bedrooms + bathrooms + sqft_living15 + long +
## sqft_above + condition + yr_renovated + sqft_lot15 + floors +
## sqft_lot, data = dfh)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1329723 -107135 -10257 83039 4161420
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -51820690.20472 1631235.72956 -31.768 < 0.0000000000000002 ***
## sqft_living 183.14124 4.53440 40.389 < 0.0000000000000002 ***
## lat 637283.46123 10883.25939 58.556 < 0.0000000000000002 ***
## view 57215.00632 2236.84375 25.578 < 0.0000000000000002 ***
## waterfront 569227.68416 18273.35744 31.151 < 0.0000000000000002 ***
## yr_built -1713.19198 73.88326 -23.188 < 0.0000000000000002 ***
## bedrooms -45061.50431 1973.54316 -22.833 < 0.0000000000000002 ***
## bathrooms 53904.80545 3414.75936 15.786 < 0.0000000000000002 ***
## sqft_living15 72.48159 3.45852 20.957 < 0.0000000000000002 ***
## long -201913.59418 12385.91912 -16.302 < 0.0000000000000002 ***
## sqft_above 61.30804 4.54229 13.497 < 0.0000000000000002 ***
## condition 34249.71009 2459.48103 13.926 < 0.0000000000000002 ***
## yr_renovated 27.64023 3.84548 7.188 0.00000000000068 ***
## sqft_lot15 -0.48238 0.07712 -6.255 0.00000000040604 ***
## floors 14544.76601 3756.31925 3.872 0.000108 ***
## sqft_lot 0.17084 0.05045 3.386 0.000710 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 211900 on 21597 degrees of freedom
## Multiple R-squared: 0.667, Adjusted R-squared: 0.6668
## F-statistic: 2884 on 15 and 21597 DF, p-value: < 0.00000000000000022
## Start: AIC=530138.4
## price ~ bedrooms + bathrooms + sqft_living + sqft_lot + floors +
## waterfront + view + condition + sqft_above + sqft_basement +
## yr_built + yr_renovated + lat + long + sqft_living15 + sqft_lot15
##
##
## Step: AIC=530138.4
## price ~ bedrooms + bathrooms + sqft_living + sqft_lot + floors +
## waterfront + view + condition + sqft_above + yr_built + yr_renovated +
## lat + long + sqft_living15 + sqft_lot15
##
## Df Sum of Sq RSS AIC
## <none> 969935989419937 530138
## - sqft_lot 1 514943987546 970450933407484 530148
## - floors 1 673345763894 970609335183832 530151
## - sqft_lot15 1 1756900127612 971692889547549 530176
## - yr_renovated 1 2320239916195 972256229336132 530188
## - sqft_above 1 8181532252160 978117521672097 530318
## - condition 1 8709164479210 978645153899148 530330
## - bathrooms 1 11191399594868 981127389014805 530384
## - long 1 11935051171834 981871040591772 530401
## - sqft_living15 1 19725372733472 989661362153409 530572
## - bedrooms 1 23413576500218 993349565920156 530652
## - yr_built 1 24147354886282 994083344306220 530668
## - view 1 29383142925346 999319132345283 530781
## - waterfront 1 43579774079066 1013515763499003 531086
## - sqft_living 1 73262717718286 1043198707138223 531710
## - lat 1 153991582273224 1123927571693161 533321
## [1] 0.6667911
## [1] 0.6667911
The adjusted R-squared of the two stepwise models (forward and backward) is identical.
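The code that produced the stepwise traces is not shown, so the following is only a sketch of how such a comparison might be run with `step()` from base R. It uses the built-in `mtcars` data as a stand-in for `dfh`, and the name `model_forward` and the predictor subset are assumptions.

```r
# mtcars ships with R and stands in for dfh here.
null_model <- lm(mpg ~ 1, data = mtcars)
full_model <- lm(mpg ~ cyl + disp + hp + wt + qsec, data = mtcars)

# Forward selection: start from the null model and add terms by AIC.
model_forward <- step(null_model,
                      scope = list(lower = ~ 1,
                                   upper = ~ cyl + disp + hp + wt + qsec),
                      direction = "forward", trace = 0)

# Backward elimination: start from the full model and drop terms by AIC.
model_backward <- step(full_model, direction = "backward", trace = 0)

# Compare the adjusted R-squared of the two selected models.
summary(model_forward)$adj.r.squared
summary(model_backward)$adj.r.squared
```

Forward and backward searches do not always end at the same model. In the output above they do, which is why the two adjusted R-squared values match.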
dfh$price_s_living <- predict(object = model_s_living, newdata = dfh)
dfh$priceback <- predict(object = model_backward, newdata = dfh)
## [1] 211842.9
## [1] 261440.8
## [1] 44877434388
## [1] 68351286833
Based on the predictions above and the error evaluation with RMSE and MSE, the multiple linear regression (backward) model is the best: its error is lower than that of the simple linear regression.
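The calls that produced the four error values are not shown. As a minimal sketch, RMSE and MSE can be computed with two small helpers; the function names `mse` and `rmse` and the toy vectors are assumptions for illustration.

```r
# Mean squared error and its square root.
mse  <- function(actual, predicted) mean((actual - predicted)^2)
rmse <- function(actual, predicted) sqrt(mse(actual, predicted))

# Toy values, not taken from the data set.
actual    <- c(200, 300, 400)
predicted <- c(210, 290, 430)

mse(actual, predicted)   # (100 + 100 + 900) / 3
rmse(actual, predicted)  # square root of the MSE
```

The reported numbers are consistent with this relationship: 211842.9 squared is approximately 44877434388, and 261440.8 squared is approximately 68351286833.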
##
## studentized Breusch-Pagan test
##
## data: model_backward
## BP = 3117.3, df = 15, p-value < 0.00000000000000022
Decision rule: reject H0 (homoscedastic errors) if the p-value < alpha (0.05). The p-value above is far below 0.05, so H0 is rejected: the test indicates heteroscedasticity, and the model does not meet the homoscedasticity assumption.
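To make the decision rule concrete, the studentized Breusch-Pagan statistic can be sketched in base R: regress the squared residuals on the predictors and use n times the R-squared of that auxiliary regression as a chi-square statistic. The data below are simulated so that the error spread grows with `x`; all names are illustrative assumptions.

```r
# Simulated data with error variance that increases with x.
set.seed(3)
n <- 500
x <- runif(n, 1, 10)
y <- 2 + 3 * x + rnorm(n, sd = x)

fit <- lm(y ~ x)

# Auxiliary regression of squared residuals on the predictor(s).
aux     <- lm(residuals(fit)^2 ~ x)
bp_stat <- n * summary(aux)$r.squared
bp_df   <- length(coef(aux)) - 1
p_value <- pchisq(bp_stat, df = bp_df, lower.tail = FALSE)

# H0 is homoscedasticity; a small p-value rejects it.
p_value < 0.05
```

A p-value below 0.05 therefore points to heteroscedasticity, not to a satisfied assumption.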
## bedrooms bathrooms sqft_living sqft_lot floors
## 1.621296 3.328365 8.346160 2.101672 1.979885
## waterfront view condition sqft_above yr_built
## 1.202782 1.413950 1.232683 6.808499 2.266450
## yr_renovated lat long sqft_living15 sqft_lot15
## 1.148164 1.094365 1.464138 2.703973 2.133977
VIF > 10: multicollinearity. VIF < 10: no multicollinearity.
The conclusion is that the selected model, model_backward, shows no problematic multicollinearity among its predictors: all VIF values are below 10, the largest being 8.35 for sqft_living.
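For reference, the VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on all the others. The sketch below computes it by hand in base R on the built-in `mtcars` data; the helper `vif_manual` is a hypothetical name and is not the function used to produce the table above.

```r
# Hypothetical helper: VIF of each predictor from auxiliary regressions.
vif_manual <- function(data, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    # Regress predictor p on the remaining predictors.
    aux <- lm(reformulate(others, response = p), data = data)
    1 / (1 - summary(aux)$r.squared)
  })
}

vifs <- vif_manual(mtcars, c("wt", "hp", "disp", "qsec"))
vifs
vifs[vifs > 10]  # predictors above the threshold used in this report
```

A VIF of 1 means a predictor is uncorrelated with the others. Values approaching the threshold, such as those for sqft_living and sqft_above above, indicate noticeable but tolerated overlap.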
From the regression results, the model obtained with backward stepwise selection is the best, because it has the largest adjusted R-squared. The predictors that affect house price are:
bedrooms, bathrooms, sqft_living, sqft_lot, floors, waterfront, view, condition, sqft_above, yr_built, yr_renovated, lat, long, sqft_living15, sqft_lot15
The remaining error is still large, which may be caused by the many outliers in the data: they increase the distance between the fitted regression line and those observations. The model was accepted after checking the assumptions of error normality, homoscedasticity and no multicollinearity, although the Breusch-Pagan p-value reported above is below 0.05, so the homoscedasticity assumption is in doubt.