created by Reza Lutfi Ismail
This project predicts house prices with a regression model, using house sales in King County, USA. Regression is the appropriate approach because the target (y) in this case is numeric. The data come from Kaggle: https://www.kaggle.com/harlfoxem/housesalesprediction
#> Rows: 21,613
#> Columns: 21
#> $ id <chr> "7129300520", "6414100192", "5631500400", "2487200875...
#> $ date <dttm> 2014-10-13, 2014-12-09, 2015-02-25, 2014-12-09, 2015...
#> $ price <dbl> 221900, 538000, 180000, 604000, 510000, 1225000, 2575...
#> $ bedrooms <dbl> 3, 3, 2, 4, 3, 4, 3, 3, 3, 3, 3, 2, 3, 3, 5, 4, 3, 4,...
#> $ bathrooms <dbl> 1.00, 2.25, 1.00, 3.00, 2.00, 4.50, 2.25, 1.50, 1.00,...
#> $ sqft_living <dbl> 1180, 2570, 770, 1960, 1680, 5420, 1715, 1060, 1780, ...
#> $ sqft_lot <dbl> 5650, 7242, 10000, 5000, 8080, 101930, 6819, 9711, 74...
#> $ floors <dbl> 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 2.0, 1.0...
#> $ waterfront <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
#> $ view <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0,...
#> $ condition <dbl> 3, 3, 3, 5, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 3, 3, 3, 4,...
#> $ grade <dbl> 7, 7, 6, 7, 8, 11, 7, 7, 7, 7, 8, 7, 7, 7, 7, 9, 7, 7...
#> $ sqft_above <dbl> 1180, 2170, 770, 1050, 1680, 3890, 1715, 1060, 1050, ...
#> $ sqft_basement <dbl> 0, 400, 0, 910, 0, 1530, 0, 0, 730, 0, 1700, 300, 0, ...
#> $ yr_built <dbl> 1955, 1951, 1933, 1965, 1987, 2001, 1995, 1963, 1960,...
#> $ yr_renovated <dbl> 0, 1991, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
#> $ zipcode <dbl> 98178, 98125, 98028, 98136, 98074, 98053, 98003, 9819...
#> $ lat <dbl> 47.5112, 47.7210, 47.7379, 47.5208, 47.6168, 47.6561,...
#> $ long <dbl> -122.257, -122.319, -122.233, -122.393, -122.045, -12...
#> $ sqft_living15 <dbl> 1340, 1690, 2720, 1360, 1800, 4760, 2238, 1650, 1780,...
#> $ sqft_lot15 <dbl> 5650, 7639, 8062, 5000, 7503, 101930, 6819, 9711, 811...
Columns removed:
- id
- date
- zipcode
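A minimal sketch of this step in base R. The data frame `house` is a hypothetical name for the raw data, filled here with only the first three rows shown in the glimpse output; `house_new` is the name used in the model calls, assumed to be the data after the columns are removed.

```r
# Small stand-in for the raw data: first three rows from the glimpse output.
house <- data.frame(
  id          = c("7129300520", "6414100192", "5631500400"),
  date        = as.Date(c("2014-10-13", "2014-12-09", "2015-02-25")),
  price       = c(221900, 538000, 180000),
  sqft_living = c(1180, 2570, 770),
  zipcode     = c(98178, 98125, 98028)
)

# Drop the columns that should not enter the regression.
drop_cols <- c("id", "date", "zipcode")
house_new <- house[, !(names(house) %in% drop_cols), drop = FALSE]

names(house_new)
```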
Build linear regression models without a predictor and with predictors
#>
#> Call:
#> lm(formula = price ~ 1, data = house_new)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -465088 -218138 -90088 104912 7159912
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 540088 2497 216.3 <0.0000000000000002 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 367100 on 21612 degrees of freedom
Based on the output above, we obtain the formula
\[y = b_0\] \[price = 540088\]
When the model has no predictor, the intercept is simply the mean of the target, which matches the mean price:
#> [1] 540088.1
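The equality between the intercept and the mean can be checked on any numeric vector. The sketch below uses only the first six prices visible in the glimpse output, so its mean is not the 540088.1 of the full data; `fit_null` and `b0` are illustrative names.

```r
# First six prices shown in the glimpse output.
price <- c(221900, 538000, 180000, 604000, 510000, 1225000)

fit_null <- lm(price ~ 1)          # model with no predictor
b0 <- unname(coef(fit_null)[1])    # estimated intercept

# The least-squares intercept of an intercept-only model is the sample mean.
all.equal(b0, mean(price))
```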
## Model with sqft_living predictor
\[price = -43580.743 + 280.64 \times \mathrm{sqft\_living}\]
## Model with sqft_above predictor
\[price = 59953.2 + 268.5 \times \mathrm{sqft\_above}\]
## Model with sqft_living15 predictor
\[price = -82807.195 + 313.556 \times \mathrm{sqft\_living15}\]
#> [1] 0.4928532
#> [1] 0.3667118
#> [1] 0.3426685
Of the three models above, the one with sqft_living as predictor fits best, with an R-squared of 0.49, compared with 0.37 for sqft_above and 0.34 for sqft_living15. We therefore choose sqft_living as the predictor for the single-predictor model.
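The comparison of single-predictor models by R-squared could be sketched as follows. The data are simulated and the columns `x1`, `x2`, `x3` and `y` are placeholders, not columns of the house data.

```r
set.seed(1)
n <- 200
# Simulated stand-in data: x1 drives y strongly, x2 weakly, x3 not at all.
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
toy$y <- 3 * toy$x1 + 1 * toy$x2 + rnorm(n)

# Fit one simple regression per predictor and collect its R-squared.
predictors <- c("x1", "x2", "x3")
r2 <- sapply(predictors, function(p) {
  fit <- lm(reformulate(p, response = "y"), data = toy)
  summary(fit)$r.squared
})

r2
names(which.max(r2))  # predictor with the best single-variable fit
```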
From the correlation plot (ggcorr), the predictors with the highest correlation with price are:
- sqft_living
- sqft_above
- sqft_living15
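The same ranking can be read from plain correlation coefficients instead of a plot. This is a base R sketch on simulated data; `size`, `rooms` and `noise` are made-up column names, and `cor()` stands in for the `ggcorr` plot used in the analysis.

```r
set.seed(2)
n <- 100
toy <- data.frame(size = rnorm(n), rooms = rnorm(n), noise = rnorm(n))
toy$price <- 2 * toy$size + 0.5 * toy$rooms + rnorm(n)

# Correlation of every predictor with the target, as a named vector.
cors <- cor(toy[, c("size", "rooms", "noise")], toy$price)[, 1]

# Rank predictors by the absolute size of their correlation with price.
sort(abs(cors), decreasing = TRUE)
```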
# multiple predictor
model_multiple <- lm(formula = price ~ sqft_living + sqft_above + sqft_living15, data = house_new)
#>
#> Call:
#> lm(formula = price ~ sqft_living + sqft_above + sqft_living15,
#> data = house_new)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -1258201 -146093 -23313 106499 4597726
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -99360.984 5411.285 -18.362 <0.0000000000000002 ***
#> sqft_living 267.633 4.260 62.821 <0.0000000000000002 ***
#> sqft_above -37.340 4.535 -8.233 <0.0000000000000002 ***
#> sqft_living15 75.295 4.031 18.677 <0.0000000000000002 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 259300 on 21609 degrees of freedom
#> Multiple R-squared: 0.5013, Adjusted R-squared: 0.5013
#> F-statistic: 7241 on 3 and 21609 DF, p-value: < 0.00000000000000022
We get the formula
\[price = -99360.984 + 267.633 \times \mathrm{sqft\_living} - 37.340 \times \mathrm{sqft\_above} + 75.295 \times \mathrm{sqft\_living15}\]
From the summary, every predictor has a statistically significant relationship with the target, price (p-value < 0.05).
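As a worked example of how this formula is used, the sketch below plugs in the first house from the glimpse output (sqft_living = 1180, sqft_above = 1180, sqft_living15 = 1340). The coefficients are copied from the summary; the vector names `b` and `x` are illustrative.

```r
# Coefficients copied from the summary of the three-predictor model.
b <- c(intercept = -99360.984, sqft_living = 267.633,
       sqft_above = -37.340, sqft_living15 = 75.295)

# Predictor values of the first house in the glimpse output.
x <- c(sqft_living = 1180, sqft_above = 1180, sqft_living15 = 1340)

# Prediction = intercept + sum of coefficient * value.
pred <- unname(b["intercept"] + sum(b[names(x)] * x))
pred  # about 273280, against a recorded price of 221900
```

With a fitted model object, `predict()` on a data frame holding the same three columns would give the same number up to the rounding of the printed coefficients.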
#>
#> Call:
#> lm(formula = price ~ ., data = house_new)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -1329723 -107135 -10257 83039 4161420
#>
#> Coefficients: (1 not defined because of singularities)
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -51820690.20472 1631235.72956 -31.768 < 0.0000000000000002 ***
#> bedrooms -45061.50431 1973.54316 -22.833 < 0.0000000000000002 ***
#> bathrooms 53904.80545 3414.75936 15.786 < 0.0000000000000002 ***
#> sqft_living 183.14124 4.53440 40.389 < 0.0000000000000002 ***
#> sqft_lot 0.17084 0.05045 3.386 0.000710 ***
#> floors 14544.76601 3756.31925 3.872 0.000108 ***
#> waterfront 569227.68416 18273.35744 31.151 < 0.0000000000000002 ***
#> view 57215.00632 2236.84375 25.578 < 0.0000000000000002 ***
#> condition 34249.71009 2459.48103 13.926 < 0.0000000000000002 ***
#> sqft_above 61.30804 4.54229 13.497 < 0.0000000000000002 ***
#> sqft_basement NA NA NA NA
#> yr_built -1713.19198 73.88326 -23.188 < 0.0000000000000002 ***
#> yr_renovated 27.64023 3.84548 7.188 0.00000000000068 ***
#> lat 637283.46123 10883.25939 58.556 < 0.0000000000000002 ***
#> long -201913.59418 12385.91912 -16.302 < 0.0000000000000002 ***
#> sqft_living15 72.48159 3.45852 20.957 < 0.0000000000000002 ***
#> sqft_lot15 -0.48238 0.07712 -6.255 0.00000000040604 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 211900 on 21597 degrees of freedom
#> Multiple R-squared: 0.667, Adjusted R-squared: 0.6668
#> F-statistic: 2884 on 15 and 21597 DF, p-value: < 0.00000000000000022
#> Start: AIC=530138.4
#> price ~ bedrooms + bathrooms + sqft_living + sqft_lot + floors +
#> waterfront + view + condition + sqft_above + sqft_basement +
#> yr_built + yr_renovated + lat + long + sqft_living15 + sqft_lot15
#>
#>
#> Step: AIC=530138.4
#> price ~ bedrooms + bathrooms + sqft_living + sqft_lot + floors +
#> waterfront + view + condition + sqft_above + yr_built + yr_renovated +
#> lat + long + sqft_living15 + sqft_lot15
#>
#> Df Sum of Sq RSS AIC
#> <none> 969935989419937 530138
#> - sqft_lot 1 514943987546 970450933407484 530148
#> - floors 1 673345763894 970609335183832 530151
#> - sqft_lot15 1 1756900127612 971692889547549 530176
#> - yr_renovated 1 2320239916195 972256229336132 530188
#> - sqft_above 1 8181532252160 978117521672097 530318
#> - condition 1 8709164479210 978645153899148 530330
#> - bathrooms 1 11191399594868 981127389014805 530384
#> - long 1 11935051171834 981871040591772 530401
#> - sqft_living15 1 19725372733472 989661362153409 530572
#> - bedrooms 1 23413576500218 993349565920156 530652
#> - yr_built 1 24147354886282 994083344306220 530668
#> - view 1 29383142925346 999319132345283 530781
#> - waterfront 1 43579774079066 1013515763499003 531086
#> - sqft_living 1 73262717718286 1043198707138223 531710
#> - lat 1 153991582273224 1123927571693161 533321
#>
#> Call:
#> lm(formula = price ~ bedrooms + bathrooms + sqft_living + sqft_lot +
#> floors + waterfront + view + condition + sqft_above + yr_built +
#> yr_renovated + lat + long + sqft_living15 + sqft_lot15, data = house_new)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -1329723 -107135 -10257 83039 4161420
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -51820690.20472 1631235.72956 -31.768 < 0.0000000000000002 ***
#> bedrooms -45061.50431 1973.54316 -22.833 < 0.0000000000000002 ***
#> bathrooms 53904.80545 3414.75936 15.786 < 0.0000000000000002 ***
#> sqft_living 183.14124 4.53440 40.389 < 0.0000000000000002 ***
#> sqft_lot 0.17084 0.05045 3.386 0.000710 ***
#> floors 14544.76601 3756.31925 3.872 0.000108 ***
#> waterfront 569227.68416 18273.35744 31.151 < 0.0000000000000002 ***
#> view 57215.00632 2236.84375 25.578 < 0.0000000000000002 ***
#> condition 34249.71009 2459.48103 13.926 < 0.0000000000000002 ***
#> sqft_above 61.30804 4.54229 13.497 < 0.0000000000000002 ***
#> yr_built -1713.19198 73.88326 -23.188 < 0.0000000000000002 ***
#> yr_renovated 27.64023 3.84548 7.188 0.00000000000068 ***
#> lat 637283.46123 10883.25939 58.556 < 0.0000000000000002 ***
#> long -201913.59418 12385.91912 -16.302 < 0.0000000000000002 ***
#> sqft_living15 72.48159 3.45852 20.957 < 0.0000000000000002 ***
#> sqft_lot15 -0.48238 0.07712 -6.255 0.00000000040604 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 211900 on 21597 degrees of freedom
#> Multiple R-squared: 0.667, Adjusted R-squared: 0.6668
#> F-statistic: 2884 on 15 and 21597 DF, p-value: < 0.00000000000000022
forward <- step(object = model, direction = "forward", scope = list(lower = model, upper = model_all))
#> Start: AIC=553875.8
#> price ~ 1
#>
#> Df Sum of Sq RSS AIC
#> + sqft_living 1 1435640399598809 1477276362322490 539204
#> + sqft_above 1 1068200811636164 1844715950285135 544004
#> + sqft_living15 1 998164703117973 1914752058803326 544810
#> + bathrooms 1 803293306497671 2109623455423628 546904
#> + view 1 459780944970999 2453135816950300 550165
#> + sqft_basement 1 305439174800922 2607477587120376 551484
#> + bedrooms 1 276958595500073 2635958166421226 551718
#> + lat 1 274545716008578 2638371045912720 551738
#> + waterfront 1 206679237434408 2706237524486890 552287
#> + floors 1 192086763313773 2720829998607526 552403
#> + yr_renovated 1 46564442910138 2866352319011162 553529
#> + sqft_lot 1 23417141523777 2889499620397522 553703
#> + sqft_lot15 1 19800647694735 2893116114226564 553730
#> + yr_built 1 8497693415832 2904419068505468 553815
#> + condition 1 3851399435633 2909065362485666 553849
#> + long 1 1362354570271 2911554407351028 553868
#> <none> 2912916761921299 553876
#>
#> Step: AIC=539203.5
#> price ~ sqft_living
#>
#> Df Sum of Sq RSS AIC
#> + lat 1 213137924966058 1264138437356432 535838
#> + view 1 123620038415268 1353656323907222 537317
#> + waterfront 1 110238185400763 1367038176921727 537529
#> + yr_built 1 92854405407200 1384421956915290 537802
#> + long 1 66817275084005 1410459087238485 538205
#> + bedrooms 1 40635382190095 1436640980132395 538603
#> + yr_renovated 1 22404898015578 1454871464306912 538875
#> + sqft_living15 1 20108677275202 1457167685047288 538909
#> + condition 1 17605348260420 1459671014062070 538946
#> + sqft_lot15 1 6440740801824 1470835621520666 539111
#> + sqft_lot 1 3011349102420 1474265013220070 539161
#> + sqft_above 1 1216499294160 1476059863028330 539188
#> + sqft_basement 1 1216499294160 1476059863028330 539188
#> + floors 1 229913654973 1477046448667517 539202
#> + bathrooms 1 147193010785 1477129169311705 539203
#> <none> 1477276362322490 539204
#>
#> Step: AIC=535838
#> price ~ sqft_living + lat
#>
#> Df Sum of Sq RSS AIC
#> + view 1 126630803103906 1137507634252527 533559
#> + waterfront 1 116457216515170 1147681220841262 533751
#> + yr_built 1 51903847813610 1212234589542822 534934
#> + long 1 36167023609974 1227971413746458 535213
#> + bedrooms 1 32254157675667 1231884279680765 535281
#> + condition 1 19095077947046 1245043359409386 535511
#> + yr_renovated 1 18896934952437 1245241502403995 535515
#> + sqft_living15 1 18324966511740 1245813470844692 535524
#> + sqft_lot15 1 1242925578859 1262895511777573 535819
#> <none> 1264138437356432 535838
#> + sqft_lot 1 109125131288 1264029312225144 535838
#> + sqft_above 1 103865920728 1264034571435704 535838
#> + sqft_basement 1 103865920728 1264034571435704 535838
#> + bathrooms 1 2294249118 1264136143107314 535840
#> + floors 1 29322202 1264138408034231 535840
#>
#> Step: AIC=533558.7
#> price ~ sqft_living + lat + view
#>
#> Df Sum of Sq RSS AIC
#> + waterfront 1 48301384600438 1089206249652089 532623
#> + yr_built 1 29685166215947 1107822468036580 532989
#> + bedrooms 1 20105107128838 1117402527123689 533175
#> + long 1 18126102858463 1119381531394064 533214
#> + condition 1 13259272107760 1124248362144767 533307
#> + yr_renovated 1 11033349370024 1126474284882502 533350
#> + sqft_living15 1 9777260266168 1127730373986359 533374
#> + sqft_above 1 5649302765515 1131858331487012 533453
#> + sqft_basement 1 5649302765514 1131858331487012 533453
#> + sqft_lot15 1 1822194970820 1135685439281707 533526
#> + floors 1 790838989632 1136716795262894 533546
#> + sqft_lot 1 392067203463 1137115567049064 533553
#> + bathrooms 1 192700592756 1137314933659771 533557
#> <none> 1137507634252527 533559
#>
#> Step: AIC=532622.9
#> price ~ sqft_living + lat + view + waterfront
#>
#> Df Sum of Sq RSS AIC
#> + yr_built 1 29367526832499 1059838722819591 532034
#> + bedrooms 1 17478610279237 1071727639372852 532275
#> + long 1 17464543757804 1071741705894285 532276
#> + condition 1 13417265900615 1075788983751474 532357
#> + sqft_living15 1 11169376379955 1078036873272134 532402
#> + yr_renovated 1 8586367580730 1080619882071360 532454
#> + sqft_above 1 4670597698794 1084535651953295 532532
#> + sqft_basement 1 4670597698794 1084535651953295 532532
#> + sqft_lot15 1 1861231101284 1087345018550805 532588
#> + floors 1 572161272649 1088634088379440 532614
#> + sqft_lot 1 316492359462 1088889757292628 532619
#> + bathrooms 1 234351649857 1088971898002233 532620
#> <none> 1089206249652089 532623
#>
#> Step: AIC=532034.2
#> price ~ sqft_living + lat + view + waterfront + yr_built
#>
#> Df Sum of Sq RSS AIC
#> + bedrooms 1 20688901922724 1039149820896867 531610
#> + sqft_living15 1 18321662372237 1041517060447353 531659
#> + sqft_above 1 15037471026572 1044801251793019 531727
#> + sqft_basement 1 15037471026571 1044801251793019 531727
#> + floors 1 11793327955809 1048045394863782 531794
#> + bathrooms 1 9687073571957 1050151649247634 531838
#> + long 1 6492502179478 1053346220640113 531903
#> + condition 1 3277674240866 1056561048578725 531969
#> + yr_renovated 1 2732915035417 1057105807784173 531980
#> + sqft_lot15 1 1862994537472 1057975728282119 531998
#> + sqft_lot 1 415711282743 1059423011536848 532028
#> <none> 1059838722819591 532034
#>
#> Step: AIC=531610.1
#> price ~ sqft_living + lat + view + waterfront + yr_built + bedrooms
#>
#> Df Sum of Sq RSS AIC
#> + bathrooms 1 16549684599818 1022600136297049 531265
#> + sqft_living15 1 15977295635070 1023172525261797 531277
#> + sqft_above 1 12393157291328 1026756663605539 531353
#> + sqft_basement 1 12393157291328 1026756663605539 531353
#> + floors 1 11258034321641 1027891786575226 531377
#> + long 1 6897822190854 1032251998706014 531468
#> + condition 1 4437764244632 1034712056652235 531520
#> + sqft_lot15 1 3328710661702 1035821110235166 531543
#> + yr_renovated 1 2454309574146 1036695511322721 531561
#> + sqft_lot 1 1110404179464 1038039416717403 531589
#> <none> 1039149820896867 531610
#>
#> Step: AIC=531265.2
#> price ~ sqft_living + lat + view + waterfront + yr_built + bedrooms +
#> bathrooms
#>
#> Df Sum of Sq RSS AIC
#> + sqft_living15 1 18350055736756 1004250080560294 530876
#> + sqft_above 1 13904485810531 1008695650486518 530971
#> + sqft_basement 1 13904485810531 1008695650486518 530971
#> + floors 1 5885883452763 1016714252844286 531142
#> + long 1 5034210071147 1017565926225902 531160
#> + condition 1 3931668236404 1018668468060645 531184
#> + sqft_lot15 1 2353625870920 1020246510426129 531217
#> + yr_renovated 1 950579379520 1021649556917529 531247
#> + sqft_lot 1 713608944368 1021886527352680 531252
#> <none> 1022600136297049 531265
#>
#> Step: AIC=530875.8
#> price ~ sqft_living + lat + view + waterfront + yr_built + bedrooms +
#> bathrooms + sqft_living15
#>
#> Df Sum of Sq RSS AIC
#> + long 1 10805533494283 993444547066010 530644
#> + sqft_above 1 8499381288515 995750699271779 530694
#> + sqft_basement 1 8499381288515 995750699271779 530694
#> + floors 1 6586050836779 997664029723515 530736
#> + condition 1 4245848049028 1000004232511265 530786
#> + sqft_lot15 1 3215905185938 1001034175374355 530808
#> + yr_renovated 1 1227639962617 1003022440597676 530851
#> + sqft_lot 1 817312968376 1003432767591917 530860
#> <none> 1004250080560294 530876
#>
#> Step: AIC=530644
#> price ~ sqft_living + lat + view + waterfront + yr_built + bedrooms +
#> bathrooms + sqft_living15 + long
#>
#> Df Sum of Sq RSS AIC
#> + sqft_above 1 11430787297210 982013759768800 530396
#> + sqft_basement 1 11430787297210 982013759768800 530396
#> + floors 1 5113171922797 988331375143213 530534
#> + condition 1 4867275002312 988577272063699 530540
#> + yr_renovated 1 1510892433283 991933654632728 530613
#> + sqft_lot15 1 1242605733929 992201941332081 530619
#> <none> 993444547066010 530644
#> + sqft_lot 1 54136120758 993390410945252 530645
#>
#> Step: AIC=530395.9
#> price ~ sqft_living + lat + view + waterfront + yr_built + bedrooms +
#> bathrooms + sqft_living15 + long + sqft_above
#>
#> Df Sum of Sq RSS AIC
#> + condition 1 7030802787234 974982956981566 530243
#> + sqft_lot15 1 1394571936760 980619187832040 530367
#> + yr_renovated 1 1154163783763 980859595985037 530372
#> + floors 1 589657336434 981424102432366 530385
#> + sqft_lot 1 123985408301 981889774360499 530395
#> <none> 982013759768800 530396
#>
#> Step: AIC=530242.6
#> price ~ sqft_living + lat + view + waterfront + yr_built + bedrooms +
#> bathrooms + sqft_living15 + long + sqft_above + condition
#>
#> Df Sum of Sq RSS AIC
#> + yr_renovated 1 2427753578407 972555203403159 530191
#> + sqft_lot15 1 1448160668390 973534796313176 530212
#> + floors 1 944508155898 974038448825669 530224
#> + sqft_lot 1 114782670508 974868174311059 530242
#> <none> 974982956981566 530243
#>
#> Step: AIC=530190.7
#> price ~ sqft_living + lat + view + waterfront + yr_built + bedrooms +
#> bathrooms + sqft_living15 + long + sqft_above + condition +
#> yr_renovated
#>
#> Df Sum of Sq RSS AIC
#> + sqft_lot15 1 1454398952337 971100804450822 530160
#> + floors 1 789954019031 971765249384128 530175
#> + sqft_lot 1 105333375183 972449870027976 530190
#> <none> 972555203403159 530191
#>
#> Step: AIC=530160.3
#> price ~ sqft_living + lat + view + waterfront + yr_built + bedrooms +
#> bathrooms + sqft_living15 + long + sqft_above + condition +
#> yr_renovated + sqft_lot15
#>
#> Df Sum of Sq RSS AIC
#> + floors 1 649871043338 970450933407484 530148
#> + sqft_lot 1 491469266990 970609335183832 530151
#> <none> 971100804450822 530160
#>
#> Step: AIC=530147.9
#> price ~ sqft_living + lat + view + waterfront + yr_built + bedrooms +
#> bathrooms + sqft_living15 + long + sqft_above + condition +
#> yr_renovated + sqft_lot15 + floors
#>
#> Df Sum of Sq RSS AIC
#> + sqft_lot 1 514943987546 969935989419938 530138
#> <none> 970450933407484 530148
#>
#> Step: AIC=530138.4
#> price ~ sqft_living + lat + view + waterfront + yr_built + bedrooms +
#> bathrooms + sqft_living15 + long + sqft_above + condition +
#> yr_renovated + sqft_lot15 + floors + sqft_lot
#>
#> Df Sum of Sq RSS AIC
#> <none> 969935989419938 530138
#>
#> Call:
#> lm(formula = price ~ sqft_living + lat + view + waterfront +
#> yr_built + bedrooms + bathrooms + sqft_living15 + long +
#> sqft_above + condition + yr_renovated + sqft_lot15 + floors +
#> sqft_lot, data = house_new)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -1329723 -107135 -10257 83039 4161420
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -51820690.20472 1631235.72956 -31.768 < 0.0000000000000002 ***
#> sqft_living 183.14124 4.53440 40.389 < 0.0000000000000002 ***
#> lat 637283.46123 10883.25939 58.556 < 0.0000000000000002 ***
#> view 57215.00632 2236.84375 25.578 < 0.0000000000000002 ***
#> waterfront 569227.68416 18273.35744 31.151 < 0.0000000000000002 ***
#> yr_built -1713.19198 73.88326 -23.188 < 0.0000000000000002 ***
#> bedrooms -45061.50431 1973.54316 -22.833 < 0.0000000000000002 ***
#> bathrooms 53904.80545 3414.75936 15.786 < 0.0000000000000002 ***
#> sqft_living15 72.48159 3.45852 20.957 < 0.0000000000000002 ***
#> long -201913.59418 12385.91912 -16.302 < 0.0000000000000002 ***
#> sqft_above 61.30804 4.54229 13.497 < 0.0000000000000002 ***
#> condition 34249.71009 2459.48103 13.926 < 0.0000000000000002 ***
#> yr_renovated 27.64023 3.84548 7.188 0.00000000000068 ***
#> sqft_lot15 -0.48238 0.07712 -6.255 0.00000000040604 ***
#> floors 14544.76601 3756.31925 3.872 0.000108 ***
#> sqft_lot 0.17084 0.05045 3.386 0.000710 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 211900 on 21597 degrees of freedom
#> Multiple R-squared: 0.667, Adjusted R-squared: 0.6668
#> F-statistic: 2884 on 15 and 21597 DF, p-value: < 0.00000000000000022
#> [1] 0.6667911
#> [1] 0.6667911
The two stepwise models, backward and forward, show no meaningful difference: both end with the same 15 predictors and the same adjusted R-squared (0.6668).
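A compact sketch of backward and forward selection with `step()` on simulated data. The names `model_none`, `backward_toy` and `forward_toy` are illustrative, and the scope is passed as formulas taken from the two boundary models.

```r
set.seed(3)
n <- 300
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
toy$y <- 2 * toy$x1 - 1.5 * toy$x2 + rnorm(n)   # x3 is pure noise

model_none <- lm(y ~ 1, data = toy)   # no predictor
model_all  <- lm(y ~ ., data = toy)   # every predictor

# Backward: start from the full model and drop terms while AIC improves.
backward_toy <- step(model_all, direction = "backward", trace = 0)

# Forward: start from the empty model and add terms while AIC improves.
forward_toy <- step(model_none, direction = "forward",
                    scope = list(lower = formula(model_none),
                                 upper = formula(model_all)),
                    trace = 0)

# Compare the selected terms and the adjusted R-squared of both results.
sort(attr(terms(backward_toy), "term.labels"))
sort(attr(terms(forward_toy), "term.labels"))
c(backward = summary(backward_toy)$adj.r.squared,
  forward  = summary(forward_toy)$adj.r.squared)
```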
#> [1] 211842.9
#> [1] 261440.8
#> [1] 44877434388
#> [1] 68351286833
Based on the predictions and the error measures RMSE and MSE, the multiple linear regression model (backward) is the better one: its RMSE is 211842.9, against 261440.8 for the single linear regression model (model_s_living), and its MSE is correspondingly lower.
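For reference, both error measures are short to write by hand. The helper names `mse` and `rmse` and the toy vectors are assumptions for illustration; the analysis may have used a package function instead.

```r
# Mean squared error and its square root.
mse  <- function(actual, predicted) mean((actual - predicted)^2)
rmse <- function(actual, predicted) sqrt(mse(actual, predicted))

# Toy values: errors are -2, 2 and -3.
actual    <- c(10, 20, 30)
predicted <- c(12, 18, 33)

mse(actual, predicted)    # (4 + 4 + 9) / 3
rmse(actual, predicted)

# RMSE is the square root of MSE, which links the reported figures:
sqrt(44877434388)         # about 211842.9
```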
### Homoscedasticity
#>
#> studentized Breusch-Pagan test
#>
#> data: backward
#> BP = 3117.3, df = 15, p-value < 0.00000000000000022
Reject H0 (the error variance is constant) if p-value < alpha (0.05). The p-value obtained here is far below 0.05, so H0 is rejected: the residuals show heteroscedasticity, and the homoscedasticity assumption is not met.
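To make the decision rule concrete, here is a base R sketch of the studentized (Koenker) form of the Breusch-Pagan statistic on simulated data whose error spread grows with the predictor. It is a hand-rolled illustration, not the function that produced the output above, and all names are placeholders.

```r
set.seed(4)
n <- 500
x <- runif(n, 1, 10)
# Error standard deviation grows with x, so the data are heteroscedastic.
y <- 2 + 3 * x + rnorm(n, sd = x)
fit <- lm(y ~ x)

# Regress the squared residuals on the predictors; the statistic is
# n * R-squared, compared to chi-squared with df = number of predictors.
aux  <- lm(resid(fit)^2 ~ x)
bp   <- n * summary(aux)$r.squared
df   <- length(coef(aux)) - 1
pval <- pchisq(bp, df = df, lower.tail = FALSE)

c(BP = bp, df = df, p.value = pval)
pval < 0.05   # TRUE means H0 (constant variance) is rejected
```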
#> bedrooms bathrooms sqft_living sqft_lot floors
#> 1.621296 3.328365 8.346160 2.101672 1.979885
#> waterfront view condition sqft_above yr_built
#> 1.202782 1.413950 1.232683 6.808499 2.266450
#> yr_renovated lat long sqft_living15 sqft_lot15
#> 1.148164 1.094365 1.464138 2.703973 2.133977
No VIF value exceeds 10 (the largest are 8.35 for sqft_living and 6.81 for sqft_above), so the no-multicollinearity assumption is met.
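The VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on all the others. A base R sketch on simulated data, with `vif_manual` as an illustrative name rather than the function used above:

```r
set.seed(5)
n <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.3)   # strongly related to x1
x3 <- rnorm(n)                  # unrelated to the others
toy <- data.frame(x1, x2, x3)

# For each predictor, regress it on the remaining ones and convert
# the resulting R-squared into a variance inflation factor.
vif_manual <- sapply(names(toy), function(p) {
  others <- setdiff(names(toy), p)
  r2 <- summary(lm(reformulate(others, response = p), data = toy))$r.squared
  1 / (1 - r2)
})

vif_manual   # x1 and x2 are inflated, x3 stays close to 1
```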
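From the linear regression modelling, stepwise selection yields a model in which the following predictors affect house price: sqft_living, lat, view, waterfront, yr_built, bedrooms, bathrooms, sqft_living15, long, sqft_above, condition, yr_renovated, sqft_lot15, floors and sqft_lot.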
The adjusted R-squared is 0.67. This moderate value is likely due to the many outliers in the data, which leave large distances between the fitted line and those observations. Of the assumption checks (normality of errors, homoscedasticity and no-multicollinearity), the VIF values show no multicollinearity, but the Breusch-Pagan p-value is below 0.05, which indicates heteroscedasticity, so the model should be used with that limitation in mind.