LBB 4
BACKGROUND
LBB Requirements
In making a report, don’t forget to cover the following:
- Selection of the target variable depends on the perspective of the case you want to take
- Data analysis and the process of selecting predictor variables (feature selection)
- Testing the validity of the model
- Model interpretation and recommendations related to the initial case
Case Study
I conducted a regression analysis and an exploratory data analysis to gain insight into how housing prices relate to other attributes. The dataset, “Housing Prices”, comes from Kaggle.
Insight
The aim is to analyze which variables affect home prices. These variables are treated as the factors that can drive the price.
DATA PREPARATION
Packages
Data Input
## Area Garage FirePlace Baths White.Marble Black.Marble Indian.Marble Floors
## 1 164 2 0 2 0 1 0 0
## 2 84 2 0 4 0 0 1 1
## 3 190 2 4 4 1 0 0 0
## 4 75 2 4 4 0 0 1 1
## 5 148 1 4 2 1 0 0 1
## 6 124 3 3 3 0 1 0 1
## City Solar Electric Fiber Glass.Doors Swiming.Pool Garden Prices
## 1 3 1 1 1 1 0 0 43800
## 2 2 0 0 0 1 1 1 37550
## 3 2 0 0 1 0 0 0 49500
## 4 1 1 1 1 1 1 1 50075
## 5 2 1 0 0 1 1 1 52400
## 6 1 0 0 1 1 1 1 54300
Colnames
## [1] "Area" "Garage" "FirePlace" "Baths"
## [5] "White.Marble" "Black.Marble" "Indian.Marble" "Floors"
## [9] "City" "Solar" "Electric" "Fiber"
## [13] "Glass.Doors" "Swiming.Pool" "Garden" "Prices"
Chunk Commentary: distinct values of each column:
- Area: 249 distinct values, every integer from 1 to 249
- Garage: 1, 2, 3
- FirePlace: 0, 1, 2, 3, 4
- Baths: 1, 2, 3, 4, 5
- White.Marble, Black.Marble, Indian.Marble, Floors, Solar, Electric, Fiber, Glass.Doors, Swiming.Pool, Garden: 0, 1
- City: 1, 2, 3
- Prices: 2,714 distinct values
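Counting the distinct values per column gives the same overview without printing every value. A minimal base-R sketch; `house_toy` is a hypothetical stand-in holding four columns of the first six rows from the Data Input output, so on the real data you would pass `house` instead.

```r
# Hypothetical stand-in for `house`: first six rows of four columns
# (values copied from the Data Input output above)
house_toy <- data.frame(
  Area   = c(164L, 84L, 190L, 75L, 148L, 124L),
  Garage = c(2L, 2L, 2L, 2L, 1L, 3L),
  City   = c(3L, 2L, 2L, 1L, 2L, 1L),
  Prices = c(43800L, 37550L, 49500L, 50075L, 52400L, 54300L)
)

# Number of distinct values in each column
n_distinct_values <- sapply(house_toy, function(x) length(unique(x)))
print(n_distinct_values)
```

Columns with only two distinct values (0/1) are the natural candidates for conversion to factors.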
Structure
## 'data.frame': 500000 obs. of 16 variables:
## $ Area : int 164 84 190 75 148 124 58 249 243 242 ...
## $ Garage : int 2 2 2 2 1 3 1 2 1 1 ...
## $ FirePlace : int 0 0 4 4 4 3 0 1 0 2 ...
## $ Baths : int 2 4 4 4 2 3 2 1 2 4 ...
## $ White.Marble : int 0 0 1 0 1 0 0 1 0 0 ...
## $ Black.Marble : int 1 0 0 0 0 1 0 0 0 0 ...
## $ Indian.Marble: int 0 1 0 1 0 0 1 0 1 1 ...
## $ Floors : int 0 1 0 1 1 1 0 1 1 0 ...
## $ City : int 3 2 2 1 2 1 3 1 1 2 ...
## $ Solar : int 1 0 0 1 1 0 0 0 0 1 ...
## $ Electric : int 1 0 0 1 0 0 1 1 0 0 ...
## $ Fiber : int 1 0 1 1 0 1 1 0 0 0 ...
## $ Glass.Doors : int 1 1 0 1 1 1 1 1 0 0 ...
## $ Swiming.Pool : int 0 1 0 1 1 1 0 1 1 1 ...
## $ Garden : int 0 1 0 1 1 1 1 0 0 0 ...
## $ Prices : int 43800 37550 49500 50075 52400 54300 34400 50425 29575 22300 ...
Chunk Commentary: The data has 500,000 rows and 16 columns. Our target variable is Prices; the remaining 15 columns are candidate predictors.
Variable Description
The following is an explanation of the variables and their data types as loaded:
- Area: What is the area of the unit? | data type: integer
- Garage: How many garages does the unit have (1 to 3)? | data type: integer
- FirePlace: How many fireplaces are in the unit (0 to 4)? | data type: integer
- Baths: How many bathrooms does the unit have (1 to 5)? | data type: integer
- White.Marble: Does the unit use white marble? | data type: integer 0/1 flag (categorical)
- Black.Marble: Does the unit use black marble? | data type: integer 0/1 flag (categorical)
- Indian.Marble: Does the unit use Indian marble? | data type: integer 0/1 flag (categorical)
- Floors: Floors indicator, coded 0/1 | data type: integer 0/1 flag, converted to a no/yes factor below
- City: Which city is the unit in (coded 1, 2, 3)? | data type: integer, converted to a factor below
- Solar: Does the unit use solar power? | data type: integer 0/1 (boolean), converted to a no/yes factor below
- Electric: Does the unit use electricity? | data type: integer 0/1 (boolean), converted to a no/yes factor below
- Fiber: Does the unit have fiber? | data type: integer 0/1 (boolean), converted to a no/yes factor below
- Glass.Doors: Does the unit have glass doors? | data type: integer 0/1 (boolean), converted to a no/yes factor below
- Swiming.Pool: Does the unit have a swimming pool? | data type: integer 0/1 (boolean), converted to a no/yes factor below
- Garden: Is there a garden in the unit? | data type: integer 0/1 (boolean), converted to a no/yes factor below
- Prices: What is the unit price? | data type: integer
# Convert Solar to a factor with levels no/yes
house$Solar <- as.factor(house$Solar)
levels(house$Solar) <- list("no" = "0", "yes" = "1")
# Convert Electric to a factor with levels no/yes
house$Electric <- as.factor(house$Electric)
levels(house$Electric) <- list("no" = "0", "yes" = "1")
# Convert Fiber to a factor with levels no/yes
house$Fiber <- as.factor(house$Fiber)
levels(house$Fiber) <- list("no" = "0", "yes" = "1")
# Convert Glass.Doors to a factor with levels no/yes
house$Glass.Doors <- as.factor(house$Glass.Doors)
levels(house$Glass.Doors) <- list("no" = "0", "yes" = "1")
# Convert Swiming.Pool to a factor with levels no/yes
house$Swiming.Pool <- as.factor(house$Swiming.Pool)
levels(house$Swiming.Pool) <- list("no" = "0", "yes" = "1")
# Convert Garden to a factor with levels no/yes
house$Garden <- as.factor(house$Garden)
levels(house$Garden) <- list("no" = "0", "yes" = "1")
# Convert Floors to a factor with levels no/yes
house$Floors <- as.factor(house$Floors)
levels(house$Floors) <- list("no" = "0", "yes" = "1")
Let's check the converted data types:
## 'data.frame': 500000 obs. of 16 variables:
## $ Area : int 164 84 190 75 148 124 58 249 243 242 ...
## $ Garage : int 2 2 2 2 1 3 1 2 1 1 ...
## $ FirePlace : int 0 0 4 4 4 3 0 1 0 2 ...
## $ Baths : int 2 4 4 4 2 3 2 1 2 4 ...
## $ White.Marble : int 0 0 1 0 1 0 0 1 0 0 ...
## $ Black.Marble : int 1 0 0 0 0 1 0 0 0 0 ...
## $ Indian.Marble: int 0 1 0 1 0 0 1 0 1 1 ...
## $ Floors : Factor w/ 2 levels "no","yes": 1 2 1 2 2 2 1 2 2 1 ...
## $ City : int 3 2 2 1 2 1 3 1 1 2 ...
## $ Solar : Factor w/ 2 levels "no","yes": 2 1 1 2 2 1 1 1 1 2 ...
## $ Electric : Factor w/ 2 levels "no","yes": 2 1 1 2 1 1 2 2 1 1 ...
## $ Fiber : Factor w/ 2 levels "no","yes": 2 1 2 2 1 2 2 1 1 1 ...
## $ Glass.Doors : Factor w/ 2 levels "no","yes": 2 2 1 2 2 2 2 2 1 1 ...
## $ Swiming.Pool : Factor w/ 2 levels "no","yes": 1 2 1 2 2 2 1 2 2 2 ...
## $ Garden : Factor w/ 2 levels "no","yes": 1 2 1 2 2 2 2 1 1 1 ...
## $ Prices : int 43800 37550 49500 50075 52400 54300 34400 50425 29575 22300 ...
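The seven recodings above repeat the same two lines per column. A more compact sketch, assuming the same 0 = "no", 1 = "yes" coding, loops over the column names and uses `factor()` with explicit `levels` and `labels`; `house_toy` and `binary_cols` are hypothetical names, and the toy values are the first six rows of three 0/1 columns.

```r
# Hypothetical stand-in for `house`: 0/1 columns from the first six rows above
house_toy <- data.frame(
  Solar  = c(1L, 0L, 0L, 1L, 1L, 0L),
  Garden = c(0L, 1L, 0L, 1L, 1L, 1L),
  Floors = c(0L, 1L, 0L, 1L, 1L, 1L)
)

# Columns to recode (hypothetical list; the report recodes seven of them)
binary_cols <- c("Solar", "Garden", "Floors")

# factor(levels = c(0, 1), labels = ...) maps 0 -> "no" and 1 -> "yes" in one step
for (col in binary_cols) {
  house_toy[[col]] <- factor(house_toy[[col]], levels = c(0, 1), labels = c("no", "yes"))
}
str(house_toy)
```

Listing the levels explicitly also keeps "no" before "yes" even if a column happens to contain only one of the two values.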
## Area Garage FirePlace Baths White.Marble
## 0 0 0 0 0
## Black.Marble Indian.Marble Floors City Solar
## 0 0 0 0 0
## Electric Fiber Glass.Doors Swiming.Pool Garden
## 0 0 0 0 0
## Prices
## 0
Chunk Commentary: the counts above are the number of missing values per column; there are none.
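A count like the one above can be produced with `is.na()` and `colSums()`. A minimal sketch; `house_toy` is a hypothetical frame in which one Prices value is set to NA on purpose so that a nonzero count shows up.

```r
# Hypothetical toy frame; the second Prices value is deliberately missing
house_toy <- data.frame(
  Area   = c(164L, 84L, 190L),
  Prices = c(43800L, NA, 49500L)
)

# is.na() returns a logical matrix; colSums() counts the TRUE values per column
na_counts <- colSums(is.na(house_toy))
print(na_counts)
```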
# Collapse the three 0/1 marble flags into one "marbles" column
house <- house %>%
  mutate(marbles = case_when(White.Marble == 1 ~ "White",
                             Black.Marble == 1 ~ "Black",
                             Indian.Marble == 1 ~ "Indian")) %>%
  select(-c(Black.Marble, White.Marble, Indian.Marble)) %>%
  # Note: Solar is not listed here, so this select() also drops it
  select(Floors, Fiber, marbles, Prices, Glass.Doors, City, Baths, FirePlace, Garage, Area, Electric, Swiming.Pool, Garden)
# Treat City as categorical
house$City <- as.factor(house$City)
head(house)
## Floors Fiber marbles Prices Glass.Doors City Baths FirePlace Garage Area
## 1 no yes Black 43800 yes 3 2 0 2 164
## 2 yes no Indian 37550 yes 2 4 0 2 84
## 3 no yes White 49500 no 2 4 4 2 190
## 4 yes yes Indian 50075 yes 1 4 4 2 75
## 5 yes no White 52400 yes 2 2 4 1 148
## 6 yes yes Black 54300 yes 1 3 3 3 124
## Electric Swiming.Pool Garden
## 1 yes no no
## 2 no yes yes
## 3 no no no
## 4 yes yes yes
## 5 no yes yes
## 6 no yes yes
EXPLORATORY DATA ANALYSIS
Linearity Test
Exploratory data analysis is the phase where we explore the variables and look for patterns that may indicate correlation between them.
We compute the Pearson correlation between the numeric features.
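The ggcorr() call itself is not shown. As a sketch of the underlying computation (not the GGally implementation), base R's cor() should give the same Pearson coefficients as ggcorr's default when there are no missing values, once the non-numeric columns are dropped; `house_toy` is a hypothetical six-row stand-in taken from the first rows of the data.

```r
# Hypothetical stand-in for `house`: first six rows, numeric and factor columns
house_toy <- data.frame(
  Area   = c(164L, 84L, 190L, 75L, 148L, 124L),
  Baths  = c(2L, 4L, 4L, 4L, 2L, 3L),
  Fiber  = factor(c("yes", "no", "yes", "yes", "no", "yes")),
  Prices = c(43800L, 37550L, 49500L, 50075L, 52400L, 54300L)
)

# Keep only numeric columns (ggcorr warns about and ignores the rest)
num_cols <- sapply(house_toy, is.numeric)
cor_matrix <- cor(house_toy[, num_cols], method = "pearson")

# Correlation of each numeric column with the target
round(cor_matrix["Prices", ], 2)
```

Six rows are far too few for meaningful coefficients; the sketch only shows the mechanics.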
## Warning in ggcorr(house, label = T, hjust = 1, layout.exp = 1): data in
## column(s) 'Floors', 'Fiber', 'marbles', 'Glass.Doors', 'City', 'Electric',
## 'Swiming.Pool', 'Garden' are not numeric and were ignored
- Prices has only weak correlations (around 0.1) with the numeric predictors Baths, FirePlace, Garage, and Area.
- Indian.Marble, Black.Marble, and White.Marble together describe one categorical attribute, which we argue affects price from a business perspective, so we replace the three 0/1 columns with a single marbles column.
Handling marbles:
## 'data.frame': 500000 obs. of 13 variables:
## $ Floors : Factor w/ 2 levels "no","yes": 1 2 1 2 2 2 1 2 2 1 ...
## $ Fiber : Factor w/ 2 levels "no","yes": 2 1 2 2 1 2 2 1 1 1 ...
## $ marbles : chr "Black" "Indian" "White" "Indian" ...
## $ Prices : int 43800 37550 49500 50075 52400 54300 34400 50425 29575 22300 ...
## $ Glass.Doors : Factor w/ 2 levels "no","yes": 2 2 1 2 2 2 2 2 1 1 ...
## $ City : Factor w/ 3 levels "1","2","3": 3 2 2 1 2 1 3 1 1 2 ...
## $ Baths : int 2 4 4 4 2 3 2 1 2 4 ...
## $ FirePlace : int 0 0 4 4 4 3 0 1 0 2 ...
## $ Garage : int 2 2 2 2 1 3 1 2 1 1 ...
## $ Area : int 164 84 190 75 148 124 58 249 243 242 ...
## $ Electric : Factor w/ 2 levels "no","yes": 2 1 1 2 1 1 2 2 1 1 ...
## $ Swiming.Pool: Factor w/ 2 levels "no","yes": 1 2 1 2 2 2 1 2 2 2 ...
## $ Garden : Factor w/ 2 levels "no","yes": 1 2 1 2 2 2 2 1 1 1 ...
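The case_when() collapse assumes each unit has exactly one marble flag set: a row with no flag gets NA, and a row with several flags silently takes the first match (White, then Black, then Indian). The model summaries further down hint that this matters: 399,991 residual degrees of freedom plus 8 coefficients is 399,999 rows, one fewer than the nominal 400,000-row training set, which suggests lm() dropped one row, plausibly because its marbles value is NA. A minimal base-R sketch of the check and an equivalent collapse; `marble_flags` and `marble_names` are hypothetical names, and the flags are the first six rows of the data.

```r
# Hypothetical stand-in: the three marble flags for the first six rows above
marble_flags <- data.frame(
  White.Marble  = c(0L, 0L, 1L, 0L, 1L, 0L),
  Black.Marble  = c(1L, 0L, 0L, 0L, 0L, 1L),
  Indian.Marble = c(0L, 1L, 0L, 1L, 0L, 0L)
)

# How many flags are set per row; every row should have exactly 1
flags_per_row <- rowSums(marble_flags)
print(table(flags_per_row))

# Base-R collapse: position of the first 1 in each row -> marble name.
# Rows without exactly one flag become NA instead of a misleading name.
marble_names <- c("White", "Black", "Indian")
first_flag <- max.col(as.matrix(marble_flags), ties.method = "first")
marbles <- ifelse(flags_per_row == 1, marble_names[first_flag], NA)
print(marbles)
```

On the full data, any count other than 1 in the table points at the rows that end up with a missing marbles value.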
Data With Outliers
Chunk commentary:
- Prices appears approximately normally distributed: the histogram shows a bell-shaped curve.
- A curve like this is typical of a quantity built from many independent random contributions.
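Data Without Outliers
# Values flagged by the boxplot rule (beyond 1.5 * IQR from the hinges)
outlier <- boxplot(house$Prices, plot = FALSE)$out
# %in% drops every flagged value; != would recycle `outlier` element by element
house.without.outlier <- house %>%
  filter(!Prices %in% outlier)
hist(house.without.outlier$Prices)
Chunk Commentary:
- The price distribution looks the same with or without outliers, so we keep the full data set.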
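Filtering with `%in%` rather than `Prices != outlier` matters when there is more than one outlier: `!=` compares element by element and recycles `outlier`, so it only behaves as intended for exactly one flagged value, while `%in%` tests membership against the whole set (and keeps every row when nothing is flagged). A minimal sketch on hypothetical toy values, using `boxplot.stats()`, which computes the same `out` component as `boxplot(..., plot = FALSE)`:

```r
prices <- c(10, 12, 11, 13, 12, 100)  # hypothetical toy values

# boxplot.stats() flags points beyond 1.5 * IQR from the hinges
outlier <- boxplot.stats(prices)$out

# %in% checks each value against every outlier at once
prices_clean <- prices[!prices %in% outlier]
print(prices_clean)
```

Here the hinges are 11 and 13, so anything above 13 + 1.5 * 2 = 16 is flagged, which removes only 100.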
MODELING
Train-Test Splitting
set.seed(100)
# Randomly pick 80% of the row indices for training
index <- sample(nrow(house), nrow(house) * 0.8)
house_train <- house[index, ]
house_test <- house[-index, ]
Chunk commentary:
- 80% of the rows are stored in house_train and the remaining 20% in house_test.
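The split above relies on `house[-index, ]` keeping exactly the rows that were not sampled. A minimal sketch with a hypothetical row count of 10 (the report uses `nrow(house)`) makes the two index sets explicit and checks that they do not overlap.

```r
set.seed(100)
n <- 10  # hypothetical row count; nrow(house) in the report

# Draw 80% of the row indices without replacement for training
index <- sample(n, n * 0.8)
test_rows <- setdiff(seq_len(n), index)  # the rows house[-index, ] keeps

print(sort(index))
print(test_rows)
```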
Chosen predictors
our.model <- lm(formula = Prices ~ Area + Floors + marbles + Fiber + City, data = house_train)
summary(our.model)
##
## Call:
## lm(formula = Prices ~ Area + Floors + marbles + Fiber + City,
## data = house_train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8506.2 -2475.2 3.3 2469.5 8496.1
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 20742.38484 16.68364 1243.3 <0.0000000000000002 ***
## Area 24.92336 0.07331 340.0 <0.0000000000000002 ***
## Floorsyes 14992.14027 10.52528 1424.4 <0.0000000000000002 ***
## marblesIndian -4997.37808 12.89166 -387.6 <0.0000000000000002 ***
## marblesWhite 9010.52589 12.90038 698.5 <0.0000000000000002 ***
## Fiberyes 11737.26117 10.52526 1115.2 <0.0000000000000002 ***
## City2 3501.06027 12.89352 271.5 <0.0000000000000002 ***
## City3 7003.32070 12.89307 543.2 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3328 on 399991 degrees of freedom
## Multiple R-squared: 0.9245, Adjusted R-squared: 0.9245
## F-statistic: 7.001e+05 on 7 and 399991 DF, p-value: < 0.00000000000000022
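Coefficient interpretation (holding the other predictors constant):
- Each additional unit of Area increases the price by about 24.92.
- A unit with Floors = yes costs about 14,992.14 more than one without.
- Relative to Black marble (the baseline level), Indian marble lowers the price by about 4,997.38 and White marble raises it by about 9,010.53.
- A unit with Fiber = yes costs about 11,737.26 more than one without.
- Relative to City 1, units in City 2 and City 3 cost about 3,501.06 and 7,003.32 more, respectively.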
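Predicting one unit by hand makes the coefficients concrete. The sketch below plugs the first row of the data (Area 164, Floors "no", marbles "Black", Fiber "yes", City 3) into the estimates printed by summary(our.model); baseline levels contribute 0. `coefs` is a hypothetical name, and predict(our.model, newdata = ...) should give the same value up to the rounding of the printed coefficients.

```r
# Coefficients copied from summary(our.model) above (only those row 1 needs)
coefs <- c(
  "(Intercept)" = 20742.38484,
  Area          = 24.92336,
  Fiberyes      = 11737.26117,
  City3         = 7003.32070
)

# Row 1: Floors "no" and marbles "Black" are baseline levels, so they add 0
predicted <- coefs[["(Intercept)"]] +
  coefs[["Area"]] * 164 +
  coefs[["Fiberyes"]] +
  coefs[["City3"]]

round(predicted, 1)  # 43570.4, versus an observed price of 43800
```

The residual of about 230 sits well inside the residual standard error of 3,328.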
Stepwise predictor selection
All-predictor and no-predictor models
The model with all predictors is stored in all.model, and the intercept-only model (no predictors) is stored in none.model.
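The code that builds all.model, none.model and the two searches is not shown. The Backward model output below drops Swiming.Pool (p = 0.93) and keeps Garden (p = 0.0016), which is consistent with an AIC-based search such as `stats::step()`. A minimal sketch of that setup, assuming step() was used; the simulated data frame `toy` and its columns `x1`, `x2`, `y` are hypothetical.

```r
set.seed(1)
n <- 200
# Hypothetical simulated data: y depends on x1 only; x2 is pure noise
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$y <- 3 * toy$x1 + rnorm(n)

all.model  <- lm(y ~ ., data = toy)  # every predictor
none.model <- lm(y ~ 1, data = toy)  # intercept only

# Backward: start from all predictors, drop terms while AIC improves
backward.model <- step(all.model, direction = "backward", trace = 0)

# Forward: start from the intercept, add terms up to the all.model formula
forward.model <- step(none.model,
                      scope = list(lower = formula(none.model),
                                   upper = formula(all.model)),
                      direction = "forward", trace = 0)

formula(backward.model)
formula(forward.model)
```

Both directions search the same range of models, so on well-behaved data they often end at the same formula, as the Backward and Forward outputs below do (the terms are only listed in a different order).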
##
## Call:
## lm(formula = Prices ~ ., data = house_train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -127.1 -124.7 -122.6 125.3 127.7
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9625.454893 1.018735 9448.441 < 0.0000000000000002 ***
## Floorsyes 14999.402578 0.395286 37945.693 < 0.0000000000000002 ***
## Fiberyes 11750.084214 0.395292 29725.078 < 0.0000000000000002 ***
## marblesIndian -5000.565594 0.484159 -10328.358 < 0.0000000000000002 ***
## marblesWhite 8999.135046 0.484486 18574.622 < 0.0000000000000002 ***
## Glass.Doorsyes 4450.043568 0.395291 11257.642 < 0.0000000000000002 ***
## City2 3500.104138 0.484229 7228.205 < 0.0000000000000002 ***
## City3 6999.621004 0.484211 14455.729 < 0.0000000000000002 ***
## Baths 1249.945704 0.139769 8942.942 < 0.0000000000000002 ***
## FirePlace 749.999688 0.139780 5365.575 < 0.0000000000000002 ***
## Garage 1500.253660 0.241904 6201.856 < 0.0000000000000002 ***
## Area 25.000557 0.002753 9080.103 < 0.0000000000000002 ***
## Electricyes 1250.542534 0.395284 3163.660 < 0.0000000000000002 ***
## Swiming.Poolyes -0.034082 0.395289 -0.086 0.93129
## Gardenyes -1.246035 0.395291 -3.152 0.00162 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 125 on 399984 degrees of freedom
## Multiple R-squared: 0.9999, Adjusted R-squared: 0.9999
## F-statistic: 2.684e+08 on 14 and 399984 DF, p-value: < 0.00000000000000022
Backward model
##
## Call:
## lm(formula = Prices ~ Floors + Fiber + marbles + Glass.Doors +
## City + Baths + FirePlace + Garage + Area + Electric + Garden,
## data = house_train)
##
## Coefficients:
## (Intercept) Floorsyes Fiberyes marblesIndian marblesWhite
## 9625.438 14999.403 11750.084 -5000.566 8999.135
## Glass.Doorsyes City2 City3 Baths FirePlace
## 4450.044 3500.104 6999.621 1249.946 750.000
## Garage Area Electricyes Gardenyes
## 1500.254 25.001 1250.543 -1.246
backward.model <- lm(formula = Prices ~ Floors + Fiber + marbles + Glass.Doors +
                       City + Baths + FirePlace + Garage + Area + Electric + Garden,
                     data = house_train)
summary(backward.model)
##
## Call:
## lm(formula = Prices ~ Floors + Fiber + marbles + Glass.Doors +
## City + Baths + FirePlace + Garage + Area + Electric + Garden,
## data = house_train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -127.1 -124.7 -122.6 125.3 127.7
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9625.438006 0.999730 9628.042 < 0.0000000000000002 ***
## Floorsyes 14999.402624 0.395285 37945.775 < 0.0000000000000002 ***
## Fiberyes 11750.084045 0.395287 29725.479 < 0.0000000000000002 ***
## marblesIndian -5000.565586 0.484158 -10328.371 < 0.0000000000000002 ***
## marblesWhite 8999.135104 0.484484 18574.663 < 0.0000000000000002 ***
## Glass.Doorsyes 4450.043544 0.395290 11257.659 < 0.0000000000000002 ***
## City2 3500.104127 0.484228 7228.214 < 0.0000000000000002 ***
## City3 6999.620964 0.484210 14455.753 < 0.0000000000000002 ***
## Baths 1249.945686 0.139769 8942.962 < 0.0000000000000002 ***
## FirePlace 749.999688 0.139780 5365.581 < 0.0000000000000002 ***
## Garage 1500.253647 0.241904 6201.865 < 0.0000000000000002 ***
## Area 25.000557 0.002753 9080.118 < 0.0000000000000002 ***
## Electricyes 1250.542530 0.395283 3163.664 < 0.0000000000000002 ***
## Gardenyes -1.246040 0.395290 -3.152 0.00162 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 125 on 399985 degrees of freedom
## Multiple R-squared: 0.9999, Adjusted R-squared: 0.9999
## F-statistic: 2.891e+08 on 13 and 399985 DF, p-value: < 0.00000000000000022
Forward model
##
## Call:
## lm(formula = Prices ~ Floors + Fiber + marbles + City + Glass.Doors +
## Area + Baths + Garage + FirePlace + Electric + Garden, data = house_train)
##
## Coefficients:
## (Intercept) Floorsyes Fiberyes marblesIndian marblesWhite
## 9625.438 14999.403 11750.084 -5000.566 8999.135
## City2 City3 Glass.Doorsyes Area Baths
## 3500.104 6999.621 4450.044 25.001 1249.946
## Garage FirePlace Electricyes Gardenyes
## 1500.254 750.000 1250.543 -1.246
forward.model <- lm(formula = Prices ~ Floors + Fiber + marbles + City + Glass.Doors +
                      Area + Baths + Garage + FirePlace + Electric + Garden, data = house_train)
summary(forward.model)
##
## Call:
## lm(formula = Prices ~ Floors + Fiber + marbles + City + Glass.Doors +
## Area + Baths + Garage + FirePlace + Electric + Garden, data = house_train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -127.1 -124.7 -122.6 125.3 127.7
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9625.438006 0.999730 9628.042 < 0.0000000000000002 ***
## Floorsyes 14999.402624 0.395285 37945.775 < 0.0000000000000002 ***
## Fiberyes 11750.084045 0.395287 29725.479 < 0.0000000000000002 ***
## marblesIndian -5000.565586 0.484158 -10328.371 < 0.0000000000000002 ***
## marblesWhite 8999.135104 0.484484 18574.663 < 0.0000000000000002 ***
## City2 3500.104127 0.484228 7228.214 < 0.0000000000000002 ***
## City3 6999.620964 0.484210 14455.753 < 0.0000000000000002 ***
## Glass.Doorsyes 4450.043544 0.395290 11257.659 < 0.0000000000000002 ***
## Area 25.000557 0.002753 9080.118 < 0.0000000000000002 ***
## Baths 1249.945686 0.139769 8942.962 < 0.0000000000000002 ***
## Garage 1500.253647 0.241904 6201.865 < 0.0000000000000002 ***
## FirePlace 749.999688 0.139780 5365.581 < 0.0000000000000002 ***
## Electricyes 1250.542530 0.395283 3163.664 < 0.0000000000000002 ***
## Gardenyes -1.246040 0.395290 -3.152 0.00162 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 125 on 399985 degrees of freedom
## Multiple R-squared: 0.9999, Adjusted R-squared: 0.9999
## F-statistic: 2.891e+08 on 13 and 399985 DF, p-value: < 0.00000000000000022
Both model
##
## Call:
## lm(formula = Prices ~ Floors + Fiber + marbles + Glass.Doors +
## City + Baths + FirePlace + Garage + Area + Electric + Garden,
## data = house_train)
##
## Coefficients:
## (Intercept) Floorsyes Fiberyes marblesIndian marblesWhite
## 9625.438 14999.403 11750.084 -5000.566 8999.135
## Glass.Doorsyes City2 City3 Baths FirePlace
## 4450.044 3500.104 6999.621 1249.946 750.000
## Garage Area Electricyes Gardenyes
## 1500.254 25.001 1250.543 -1.246
both.model <- lm(formula = Prices ~ Floors + Fiber + marbles + Glass.Doors +
                   City + Baths + FirePlace + Garage + Area + Electric + Garden,
                 data = house_train)
summary(both.model)
##
## Call:
## lm(formula = Prices ~ Floors + Fiber + marbles + Glass.Doors +
## City + Baths + FirePlace + Garage + Area + Electric + Garden,
## data = house_train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -127.1 -124.7 -122.6 125.3 127.7
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9625.438006 0.999730 9628.042 < 0.0000000000000002 ***
## Floorsyes 14999.402624 0.395285 37945.775 < 0.0000000000000002 ***
## Fiberyes 11750.084045 0.395287 29725.479 < 0.0000000000000002 ***
## marblesIndian -5000.565586 0.484158 -10328.371 < 0.0000000000000002 ***
## marblesWhite 8999.135104 0.484484 18574.663 < 0.0000000000000002 ***
## Glass.Doorsyes 4450.043544 0.395290 11257.659 < 0.0000000000000002 ***
## City2 3500.104127 0.484228 7228.214 < 0.0000000000000002 ***
## City3 6999.620964 0.484210 14455.753 < 0.0000000000000002 ***
## Baths 1249.945686 0.139769 8942.962 < 0.0000000000000002 ***
## FirePlace 749.999688 0.139780 5365.581 < 0.0000000000000002 ***
## Garage 1500.253647 0.241904 6201.865 < 0.0000000000000002 ***
## Area 25.000557 0.002753 9080.118 < 0.0000000000000002 ***
## Electricyes 1250.542530 0.395283 3163.664 < 0.0000000000000002 ***
## Gardenyes -1.246040 0.395290 -3.152 0.00162 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 125 on 399985 degrees of freedom
## Multiple R-squared: 0.9999, Adjusted R-squared: 0.9999
## F-statistic: 2.891e+08 on 13 and 399985 DF, p-value: < 0.00000000000000022
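All three search directions ended at the same eleven predictors. The sketch below shows how such models are typically produced with base R's step() and why the directions can coincide. It runs on a small simulated data frame with hypothetical columns (area, fiber, noise), not on house_train, and it is an illustration under those assumptions rather than the report's own step() calls, which are not shown in this section.

```r
# Hypothetical stand-in for house_train: two real price drivers and one pure-noise column
set.seed(42)
n <- 500
sim <- data.frame(
  area  = runif(n, 1, 250),
  fiber = factor(sample(c("no", "yes"), n, replace = TRUE)),
  noise = rnorm(n)
)
sim$price <- 10000 + 25 * sim$area + 11750 * (sim$fiber == "yes") +
  rnorm(n, sd = 125)

full_model  <- lm(price ~ area + fiber + noise, data = sim)
null_model  <- lm(price ~ 1, data = sim)
upper_scope <- list(lower = ~ 1, upper = ~ area + fiber + noise)

# trace = 0 suppresses the AIC table step() would otherwise print at each iteration
backward_fit <- step(full_model, direction = "backward", trace = 0)
forward_fit  <- step(null_model, scope = upper_scope, direction = "forward", trace = 0)
both_fit     <- step(null_model, scope = upper_scope, direction = "both", trace = 0)

# Compare the selected terms, sorted so the order in which terms entered does not matter
selected <- lapply(
  list(backward = backward_fit, forward = forward_fit, both = both_fit),
  function(m) sort(attr(terms(m), "term.labels"))
)
print(selected)
```

Here every direction ends up making the same keep-or-drop comparison for the noise column (the model with area and fiber versus the model with all three), so they agree. With many correlated predictors, the directions can stop at different models.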
Prediction
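Based on the evaluation tests, and before continuing to “Checking Assumptions”, note that the chosen-predictor model has a much higher error than the stepwise model on RMSE, MSE, and MAE (for each model, the three values printed are RMSE, MSE, and MAE, in that order).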
Chosen Predictor RMSE
# point predictions only: interval = "confidence" would return a 3-column matrix (fit, lwr, upr)
our.pred <- predict(object = our.model, newdata = house_test)
RMSE(our.pred, house_test$Prices)
## [1] 3329.968
## [1] 11088690
## [1] 2725.801
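The three metrics are closely related: MSE is the mean squared error, RMSE is its square root (3329.968² ≈ 11,088,690), and MAE is the mean absolute error. The base-R sketch below writes them out on toy numbers (hypothetical, not the housing data) and also shows why only the "fit" column of predict(..., interval = "confidence") should be scored. The report's own RMSE() helper comes from a package loaded in the Packages section, which is not shown here; these functions are illustrative equivalents, not that package's code.

```r
# Base-R versions of the three error metrics (illustrative, standard definitions)
rmse <- function(pred, actual) sqrt(mean((actual - pred)^2))
mse  <- function(pred, actual) mean((actual - pred)^2)
mae  <- function(pred, actual) mean(abs(actual - pred))

# Toy numbers (hypothetical): the errors are -10, 10, -30 and 0
actual <- c(100, 200, 300, 400)
pred   <- c(110, 190, 330, 400)
errs <- c(RMSE = rmse(pred, actual), MSE = mse(pred, actual), MAE = mae(pred, actual))
print(errs)

# With interval = "confidence", predict() returns a matrix with columns fit, lwr, upr;
# scoring the whole matrix would mix the interval bounds into the error
fit <- lm(dist ~ speed, data = cars)
p <- predict(fit, newdata = cars, interval = "confidence", level = 0.95)
print(dim(p))
print(rmse(p[, "fit"], cars$dist))
```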
Stepwise Predictor RMSE
Backward RMSE
All three stepwise models (backward, forward, and both) produce the same coefficients, so we pick the backward model.
# point predictions only: interval = "confidence" would return a 3-column matrix (fit, lwr, upr)
backward.pred <- predict(object = backward.model, newdata = house_test)
RMSE(backward.pred, house_test$Prices)
## [1] 125.0077
## [1] 15626.93
## [1] 124.9986
Comparing Adjusted R-squared and RMSE/MAE/MSE
Adjusted R-squared
By adjusted R-squared, the stepwise model is the better model (0.9999 versus 0.9245 for the chosen-predictor model).
Chosen Predictor Model
## [1] 0.9245396
backward.model
## [1] 0.9998936
forward.model
## [1] 0.9998936
both.model
## [1] 0.9998936
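Adjusted R-squared penalizes R-squared for the number of predictors p: adj R² = 1 − (1 − R²)(n − 1)/(n − p − 1). The sketch below checks this formula against summary() on R's built-in mtcars data (an illustration, not the housing data). With roughly 400,000 training rows and 13 stepwise coefficients, the penalty is negligible, which is why R-squared and adjusted R-squared both print as 0.9999 for the stepwise models.

```r
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
s <- summary(fit)
n <- nobs(fit)               # number of observations (32 rows in mtcars)
p <- length(coef(fit)) - 1   # number of predictors, excluding the intercept

# Adjusted R-squared computed by hand from the plain R-squared
adj_manual <- 1 - (1 - s$r.squared) * (n - 1) / (n - p - 1)
print(c(manual = adj_manual, from_summary = s$adj.r.squared))
```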
CHECKING ASSUMPTIONS
Normality
What if the residuals are not normally distributed, as in the stepwise models?
- Find a new model based on business insight, and re-check it until it passes the assumptions and the residuals are normally distributed
- Add more data
When building a linear regression model, we expect the errors to be normally distributed, meaning that most errors cluster around 0. One way to check this assumption is to plot a histogram of the residuals with the hist() function.
The Shapiro-Wilk test (shapiro.test()) cannot be used here because it accepts at most 5000 observations, so the Kolmogorov-Smirnov test (ks.test()) is used instead.
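The sketch below demonstrates the 5000-observation limit on a simulated residual vector (hypothetical data) and runs a one-sample Kolmogorov-Smirnov test instead. The ks.test() call later in this section is truncated in its warning message, so passing sd = sd(...) here is an assumption about how it was called.

```r
set.seed(1)
big_resid <- rnorm(6000)   # hypothetical stand-in for a large residual vector

# shapiro.test() stops with an error when n > 5000; capture the message instead
sw_result <- tryCatch(shapiro.test(big_resid),
                      error = function(e) conditionMessage(e))
print(sw_result)

# One-sample KS test against a normal distribution with the residuals' own mean and sd
ks_result <- ks.test(big_resid, "pnorm", mean = mean(big_resid), sd = sd(big_resid))
print(ks_result$statistic)
print(ks_result$p.value)
```

Because the mean and sd are estimated from the same data, the standard KS p-value is conservative; nortest::lillie.test() applies the Lilliefors correction for this case.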
Backward model
H0: Residuals are normally distributed. H1: Residuals are not normally distributed.
If p-value < alpha (0.05), reject H0. Conclusion: p-value < 0.05, so H0 is rejected and the residuals are declared not normal (assumption not met).
Forward model
H0: Residuals are normally distributed. H1: Residuals are not normally distributed.
If p-value < alpha (0.05), reject H0. Conclusion: p-value < 0.05, so H0 is rejected and the residuals are declared not normal (assumption not met).
Both model
H0: Residuals are normally distributed. H1: Residuals are not normally distributed.
If p-value < alpha (0.05), reject H0. Conclusion: p-value < 0.05, so H0 is rejected and the residuals are declared not normal (assumption not met).
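Chosen Predictor
H0: Residuals are normally distributed. H1: Residuals are not normally distributed.
If p-value < alpha (0.05), reject H0.
Conclusion: the Kolmogorov-Smirnov output below gives p-value < 2.2e-16, which is below 0.05, so H0 is rejected and these residuals are also declared not normal (assumption not met). With roughly 400,000 residuals, the test flags even a small deviation (D = 0.027) as significant.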
## Warning in ks.test(our.model$residuals, "pnorm", mean =
## mean(our.model$residuals), : ties should not be present for the Kolmogorov-
## Smirnov test
##
## One-sample Kolmogorov-Smirnov test
##
## data: our.model$residuals
## D = 0.026941, p-value < 0.00000000000000022
## alternative hypothesis: two-sided
Homoscedasticity
We use the Breusch-Pagan test from the lmtest package, where the expectation is p-value > alpha. H0: the error variance is constant (homoscedasticity). H1: the error variance is not constant or forms a pattern (heteroscedasticity).
Conclusion: every model fails to reject H0 (p-value = 0.8916 for the chosen-predictor model and 0.6563 for each stepwise model), so the homoscedasticity assumption is met for all of them.
##
## studentized Breusch-Pagan test
##
## data: our.model
## BP = 2.9273, df = 7, p-value = 0.8916
##
## studentized Breusch-Pagan test
##
## data: backward.model
## BP = 10.455, df = 13, p-value = 0.6563
##
## studentized Breusch-Pagan test
##
## data: forward.model
## BP = 10.455, df = 13, p-value = 0.6563
##
## studentized Breusch-Pagan test
##
## data: both.model
## BP = 10.455, df = 13, p-value = 0.6563
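The studentized Breusch-Pagan statistic is n times the R-squared of an auxiliary regression of the squared residuals on the model's regressors, with df equal to the number of regressors (7 for the chosen model and 13 for the stepwise models, counting factor dummies). The base-R sketch below computes it by hand on simulated data (hypothetical x1, x2, y_homo, y_hetero) so the mechanics are visible. To our understanding lmtest::bptest() with its default studentize = TRUE returns the same statistic, but treat this function as an illustration rather than a replacement.

```r
set.seed(7)
n  <- 300
x1 <- runif(n)
x2 <- runif(n)
y_homo   <- 1 + 2 * x1 - x2 + rnorm(n, sd = 1)               # constant error variance
y_hetero <- 1 + 2 * x1 - x2 + rnorm(n, sd = 0.2 + 3 * x1)    # error sd grows with x1

# Studentized (Koenker) Breusch-Pagan: regress squared residuals on the regressors,
# statistic = n * R^2, compared with a chi-squared distribution on df = #regressors
bp_studentized <- function(model) {
  e2   <- residuals(model)^2
  X    <- model.matrix(model)
  aux  <- lm(e2 ~ X[, -1, drop = FALSE])   # drop the intercept column; lm() adds its own
  stat <- nobs(model) * summary(aux)$r.squared
  df   <- ncol(X) - 1
  c(BP = stat, df = df, p.value = pchisq(stat, df, lower.tail = FALSE))
}

bp_homo   <- bp_studentized(lm(y_homo ~ x1 + x2))
bp_hetero <- bp_studentized(lm(y_hetero ~ x1 + x2))
print(bp_homo)
print(bp_hetero)
```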
Multicollinearity
Multicollinearity exists whenever an independent variable is highly correlated with one or more of the other independent variables in a multiple regression equation.
A VIF above 10 indicates multicollinearity, so we want every VIF to be below 10.
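The VIF of predictor j is 1/(1 − R_j²), where R_j² comes from regressing predictor j on all the other predictors. The tables below appear to come from car::vif(), which reports a generalized VIF (GVIF) for multi-level factors such as marbles and City; that is an assumption, since the call is not shown. For single-df terms the GVIF equals the ordinary VIF. The base-R sketch computes plain VIFs on simulated numeric columns (hypothetical a, b, c_near), where c_near is built as a near copy of a.

```r
# VIF for every column of a numeric data frame: 1 / (1 - R^2 of that column on the rest)
vif_manual <- function(df) {
  sapply(names(df), function(v) {
    others <- setdiff(names(df), v)
    r2 <- summary(lm(reformulate(others, response = v), data = df))$r.squared
    1 / (1 - r2)
  })
}

set.seed(3)
n <- 1000
a <- rnorm(n)
b <- rnorm(n)
c_near <- a + rnorm(n, sd = 0.1)   # almost a copy of a, so a and c_near are collinear

vifs_indep  <- vif_manual(data.frame(a, b))
vifs_collin <- vif_manual(data.frame(a, b, c_near))
print(round(vifs_indep, 3))
print(round(vifs_collin, 1))
```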
Chosen Model
## GVIF Df GVIF^(1/(2*Df))
## Area 1.000022 1 1.000011
## Floors 1.000013 1 1.000007
## marbles 1.000010 2 1.000003
## Fiber 1.000012 1 1.000006
## City 1.000030 2 1.000007
Stepwise models
## GVIF Df GVIF^(1/(2*Df))
## Floors 1.000019 1 1.000009
## Fiber 1.000028 1 1.000014
## marbles 1.000028 2 1.000007
## City 1.000050 2 1.000013
## Glass.Doors 1.000046 1 1.000023
## Area 1.000032 1 1.000016
## Baths 1.000044 1 1.000022
## Garage 1.000026 1 1.000013
## FirePlace 1.000015 1 1.000007
## Electric 1.000009 1 1.000005
## Garden 1.000043 1 1.000021
## GVIF Df GVIF^(1/(2*Df))
## Floors 1.000019 1 1.000009
## Fiber 1.000028 1 1.000014
## marbles 1.000028 2 1.000007
## Glass.Doors 1.000046 1 1.000023
## City 1.000050 2 1.000013
## Baths 1.000044 1 1.000022
## FirePlace 1.000015 1 1.000007
## Garage 1.000026 1 1.000013
## Area 1.000032 1 1.000016
## Electric 1.000009 1 1.000005
## Garden 1.000043 1 1.000021
## GVIF Df GVIF^(1/(2*Df))
## Floors 1.000019 1 1.000009
## Fiber 1.000028 1 1.000014
## marbles 1.000028 2 1.000007
## City 1.000050 2 1.000013
## Glass.Doors 1.000046 1 1.000023
## Area 1.000032 1 1.000016
## Baths 1.000044 1 1.000022
## Garage 1.000026 1 1.000013
## FirePlace 1.000015 1 1.000007
## Electric 1.000009 1 1.000005
## Garden 1.000043 1 1.000021
Chunk Commentary:
- There is no multicollinearity: every GVIF is about 1.00, far below 10.
CONCLUSION
From the models built and the evaluation tests performed, the models produced by backward, forward, and both-direction stepwise regression pass the multicollinearity and homoscedasticity tests but fail the normality test. For the chosen model, we examined each predictor's correlation with price in the linearity check and combined that with our business insight, arriving at predictors such as marbles, Fiber, and Floors (the VIF table lists the full chosen set: Area, Floors, marbles, Fiber, and City).
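Some business recommendations from the chosen model
- Indian marble makes a house cheaper than the other marble types.
- Floors and Fiber raise the price far more than the other variables (about 15,000 and 11,750 in the stepwise model); Electric also raises it, but by only about 1,250.
- A fireplace adds less to the price than most other variables (about 750 per fireplace in the stepwise model).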