In this problem you will develop a model to predict whether a given car gets high or low gas mileage based on the Auto data set from the ISLR2 package.
Create a binary variable, mpg01, that equals 1 if mpg is above its median and 0 if mpg is below its median. You can compute the median using the median() function. Add this variable to the Auto data set using the $ operator.
library(ISLR2)  # provides the Auto data set
Auto$mpg
## [1] 18.0 15.0 18.0 16.0 17.0 15.0 14.0 14.0 14.0 15.0 15.0 14.0 15.0 14.0 24.0
## [16] 22.0 18.0 21.0 27.0 26.0 25.0 24.0 25.0 26.0 21.0 10.0 10.0 11.0 9.0 27.0
## [31] 28.0 25.0 19.0 16.0 17.0 19.0 18.0 14.0 14.0 14.0 14.0 12.0 13.0 13.0 18.0
## [46] 22.0 19.0 18.0 23.0 28.0 30.0 30.0 31.0 35.0 27.0 26.0 24.0 25.0 23.0 20.0
## [61] 21.0 13.0 14.0 15.0 14.0 17.0 11.0 13.0 12.0 13.0 19.0 15.0 13.0 13.0 14.0
## [76] 18.0 22.0 21.0 26.0 22.0 28.0 23.0 28.0 27.0 13.0 14.0 13.0 14.0 15.0 12.0
## [91] 13.0 13.0 14.0 13.0 12.0 13.0 18.0 16.0 18.0 18.0 23.0 26.0 11.0 12.0 13.0
## [106] 12.0 18.0 20.0 21.0 22.0 18.0 19.0 21.0 26.0 15.0 16.0 29.0 24.0 20.0 19.0
## [121] 15.0 24.0 20.0 11.0 20.0 19.0 15.0 31.0 26.0 32.0 25.0 16.0 16.0 18.0 16.0
## [136] 13.0 14.0 14.0 14.0 29.0 26.0 26.0 31.0 32.0 28.0 24.0 26.0 24.0 26.0 31.0
## [151] 19.0 18.0 15.0 15.0 16.0 15.0 16.0 14.0 17.0 16.0 15.0 18.0 21.0 20.0 13.0
## [166] 29.0 23.0 20.0 23.0 24.0 25.0 24.0 18.0 29.0 19.0 23.0 23.0 22.0 25.0 33.0
## [181] 28.0 25.0 25.0 26.0 27.0 17.5 16.0 15.5 14.5 22.0 22.0 24.0 22.5 29.0 24.5
## [196] 29.0 33.0 20.0 18.0 18.5 17.5 29.5 32.0 28.0 26.5 20.0 13.0 19.0 19.0 16.5
## [211] 16.5 13.0 13.0 13.0 31.5 30.0 36.0 25.5 33.5 17.5 17.0 15.5 15.0 17.5 20.5
## [226] 19.0 18.5 16.0 15.5 15.5 16.0 29.0 24.5 26.0 25.5 30.5 33.5 30.0 30.5 22.0
## [241] 21.5 21.5 43.1 36.1 32.8 39.4 36.1 19.9 19.4 20.2 19.2 20.5 20.2 25.1 20.5
## [256] 19.4 20.6 20.8 18.6 18.1 19.2 17.7 18.1 17.5 30.0 27.5 27.2 30.9 21.1 23.2
## [271] 23.8 23.9 20.3 17.0 21.6 16.2 31.5 29.5 21.5 19.8 22.3 20.2 20.6 17.0 17.6
## [286] 16.5 18.2 16.9 15.5 19.2 18.5 31.9 34.1 35.7 27.4 25.4 23.0 27.2 23.9 34.2
## [301] 34.5 31.8 37.3 28.4 28.8 26.8 33.5 41.5 38.1 32.1 37.2 28.0 26.4 24.3 19.1
## [316] 34.3 29.8 31.3 37.0 32.2 46.6 27.9 40.8 44.3 43.4 36.4 30.0 44.6 33.8 29.8
## [331] 32.7 23.7 35.0 32.4 27.2 26.6 25.8 23.5 30.0 39.1 39.0 35.1 32.3 37.0 37.7
## [346] 34.1 34.7 34.4 29.9 33.0 33.7 32.4 32.9 31.6 28.1 30.7 25.4 24.2 22.4 26.6
## [361] 20.2 17.6 28.0 27.0 34.0 31.0 29.0 27.0 24.0 36.0 37.0 31.0 38.0 36.0 36.0
## [376] 36.0 34.0 38.0 32.0 38.0 25.0 38.0 26.0 22.0 32.0 36.0 27.0 27.0 44.0 32.0
## [391] 28.0 31.0
Auto$mpg01 <- ifelse(Auto$mpg > median(Auto$mpg), 1, 0)
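One detail worth checking in the line above is how ties are handled: with a strict `>` comparison, any observation exactly equal to the median is coded 0, so the two classes are only guaranteed to be balanced when no value sits on the median. A minimal sketch on a made-up vector (the values in `mpg_toy` are illustrative and are not taken from Auto):

```r
# Toy vector (hypothetical values) with two observations exactly at the median
mpg_toy <- c(14, 18, 22.75, 22.75, 27, 31)

med <- median(mpg_toy)                    # midpoint of the 3rd and 4th sorted values
mpg01_toy <- ifelse(mpg_toy > med, 1, 0)  # strict '>' sends ties to class 0

# Class counts: the ties make class 0 larger than class 1
table(mpg01_toy)
```

Running `table(Auto$mpg01)` after creating the variable is a quick way to confirm how the real data split.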
Create a training and testing set by dividing the data into two parts. There are 392 total observations. Designate the training set as the odd observations, and the testing set as even observations.
# Data Split
idx = seq(1,392,by=2)
idx
## [1] 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35
## [19] 37 39 41 43 45 47 49 51 53 55 57 59 61 63 65 67 69 71
## [37] 73 75 77 79 81 83 85 87 89 91 93 95 97 99 101 103 105 107
## [55] 109 111 113 115 117 119 121 123 125 127 129 131 133 135 137 139 141 143
## [73] 145 147 149 151 153 155 157 159 161 163 165 167 169 171 173 175 177 179
## [91] 181 183 185 187 189 191 193 195 197 199 201 203 205 207 209 211 213 215
## [109] 217 219 221 223 225 227 229 231 233 235 237 239 241 243 245 247 249 251
## [127] 253 255 257 259 261 263 265 267 269 271 273 275 277 279 281 283 285 287
## [145] 289 291 293 295 297 299 301 303 305 307 309 311 313 315 317 319 321 323
## [163] 325 327 329 331 333 335 337 339 341 343 345 347 349 351 353 355 357 359
## [181] 361 363 365 367 369 371 373 375 377 379 381 383 385 387 389 391
training = Auto[idx,]
testing = Auto[-idx,]
training
## mpg cylinders displacement horsepower weight acceleration year origin
## 1 18.0 8 307 130 3504 12.0 70 1
## 3 18.0 8 318 150 3436 11.0 70 1
## 5 17.0 8 302 140 3449 10.5 70 1
## 7 14.0 8 454 220 4354 9.0 70 1
## 9 14.0 8 455 225 4425 10.0 70 1
## 11 15.0 8 383 170 3563 10.0 70 1
## 13 15.0 8 400 150 3761 9.5 70 1
## 15 24.0 4 113 95 2372 15.0 70 3
## 17 18.0 6 199 97 2774 15.5 70 1
## 19 27.0 4 97 88 2130 14.5 70 3
## 21 25.0 4 110 87 2672 17.5 70 2
## 23 25.0 4 104 95 2375 17.5 70 2
## 25 21.0 6 199 90 2648 15.0 70 1
## 27 10.0 8 307 200 4376 15.0 70 1
## 29 9.0 8 304 193 4732 18.5 70 1
## 31 28.0 4 140 90 2264 15.5 71 1
## 34 19.0 6 232 100 2634 13.0 71 1
## 36 17.0 6 250 100 3329 15.5 71 1
## 38 18.0 6 232 100 3288 15.5 71 1
## 40 14.0 8 400 175 4464 11.5 71 1
## 42 14.0 8 318 150 4096 13.0 71 1
## 44 13.0 8 400 170 4746 12.0 71 1
## 46 18.0 6 258 110 2962 13.5 71 1
## 48 19.0 6 250 100 3282 15.0 71 1
## 50 23.0 4 122 86 2220 14.0 71 1
## 52 30.0 4 79 70 2074 19.5 71 2
## 54 31.0 4 71 65 1773 19.0 71 3
## 56 27.0 4 97 60 1834 19.0 71 2
## 58 24.0 4 113 95 2278 15.5 72 3
## 60 23.0 4 97 54 2254 23.5 72 2
## 62 21.0 4 122 86 2226 16.5 72 1
## 64 14.0 8 400 175 4385 12.0 72 1
## 66 14.0 8 351 153 4129 13.0 72 1
## 68 11.0 8 429 208 4633 11.0 72 1
## 70 12.0 8 350 160 4456 13.5 72 1
## 72 19.0 3 70 97 2330 13.5 72 3
## 74 13.0 8 307 130 4098 14.0 72 1
## 76 14.0 8 318 150 4077 14.0 72 1
## 78 22.0 4 121 76 2511 18.0 72 2
## 80 26.0 4 96 69 2189 18.0 72 2
## 82 28.0 4 97 92 2288 17.0 72 3
## 84 28.0 4 98 80 2164 15.0 72 1
## 86 13.0 8 350 175 4100 13.0 73 1
## 88 13.0 8 350 145 3988 13.0 73 1
## 90 15.0 8 318 150 3777 12.5 73 1
## 92 13.0 8 400 150 4464 12.0 73 1
## 94 14.0 8 318 150 4237 14.5 73 1
## 96 12.0 8 455 225 4951 11.0 73 1
## 98 18.0 6 225 105 3121 16.5 73 1
## 100 18.0 6 232 100 2945 16.0 73 1
## 102 23.0 6 198 95 2904 16.0 73 1
## 104 11.0 8 400 150 4997 14.0 73 1
## 106 13.0 8 360 170 4654 13.0 73 1
## 108 18.0 6 232 100 2789 15.0 73 1
## 110 21.0 4 140 72 2401 19.5 73 1
## 112 18.0 3 70 90 2124 13.5 73 3
## 114 21.0 6 155 107 2472 14.0 73 1
## 116 15.0 8 350 145 4082 13.0 73 1
## 118 29.0 4 68 49 1867 19.5 73 2
## 120 20.0 4 114 91 2582 14.0 73 2
## 122 15.0 8 318 150 3399 11.0 73 1
## 124 20.0 6 156 122 2807 13.5 73 3
## 126 20.0 6 198 95 3102 16.5 74 1
## 129 15.0 6 250 100 3336 17.0 74 1
## 131 26.0 4 122 80 2451 16.5 74 1
## 133 25.0 4 140 75 2542 17.0 74 1
## 135 16.0 6 258 110 3632 18.0 74 1
## 137 16.0 8 302 140 4141 14.0 74 1
## 139 14.0 8 318 150 4457 13.5 74 1
## 141 14.0 8 304 150 4257 15.5 74 1
## 143 26.0 4 79 67 1963 15.5 74 2
## 145 31.0 4 76 52 1649 16.5 74 3
## 147 28.0 4 90 75 2125 14.5 74 1
## 149 26.0 4 116 75 2246 14.0 74 2
## 151 26.0 4 108 93 2391 15.5 74 3
## 153 19.0 6 225 95 3264 16.0 75 1
## 155 15.0 6 250 72 3432 21.0 75 1
## 157 16.0 8 400 170 4668 11.5 75 1
## 159 16.0 8 318 150 4498 14.5 75 1
## 161 17.0 6 231 110 3907 21.0 75 1
## 163 15.0 6 258 110 3730 19.0 75 1
## 165 21.0 6 231 110 3039 15.0 75 1
## 167 13.0 8 302 129 3169 12.0 75 1
## 169 23.0 4 140 83 2639 17.0 75 1
## 171 23.0 4 140 78 2592 18.5 75 1
## 173 25.0 4 90 71 2223 16.5 75 2
## 175 18.0 6 171 97 2984 14.5 75 1
## 177 19.0 6 232 90 3211 17.0 75 1
## 179 23.0 4 120 88 2957 17.0 75 2
## 181 25.0 4 121 115 2671 13.5 75 2
## 183 28.0 4 107 86 2464 15.5 76 2
## 185 25.0 4 140 92 2572 14.9 76 1
## 187 27.0 4 101 83 2202 15.3 76 2
## 189 16.0 8 318 150 4190 13.0 76 1
## 191 14.5 8 351 152 4215 12.8 76 1
## 193 22.0 6 250 105 3353 14.5 76 1
## 195 22.5 6 232 90 3085 17.6 76 1
## 197 24.5 4 98 60 2164 22.1 76 1
## 199 33.0 4 91 53 1795 17.4 76 3
## 201 18.0 6 250 78 3574 21.0 76 1
## 203 17.5 6 258 95 3193 17.8 76 1
## 205 32.0 4 85 70 1990 17.0 76 3
## 207 26.5 4 140 72 2565 13.6 76 1
## 209 13.0 8 318 150 3940 13.2 76 1
## 211 19.0 6 156 108 2930 15.5 76 3
## 213 16.5 8 350 180 4380 12.1 76 1
## 215 13.0 8 302 130 3870 15.0 76 1
## 217 31.5 4 98 68 2045 18.5 77 3
## 219 36.0 4 79 58 1825 18.6 77 2
## 221 33.5 4 85 70 1945 16.8 77 3
## 223 17.0 8 260 110 4060 19.0 77 1
## 225 15.0 8 302 130 4295 14.9 77 1
## 227 20.5 6 231 105 3425 16.9 77 1
## 229 18.5 6 250 98 3525 19.0 77 1
## 231 15.5 8 350 170 4165 11.4 77 1
## 233 16.0 8 351 149 4335 14.5 77 1
## 235 24.5 4 151 88 2740 16.0 77 1
## 237 25.5 4 140 89 2755 15.8 77 1
## 239 33.5 4 98 83 2075 15.9 77 1
## 241 30.5 4 97 78 2190 14.1 77 2
## 243 21.5 4 121 110 2600 12.8 77 2
## 245 43.1 4 90 48 1985 21.5 78 2
## 247 32.8 4 78 52 1985 19.4 78 3
## 249 36.1 4 91 60 1800 16.4 78 3
## 251 19.4 8 318 140 3735 13.2 78 1
## 253 19.2 6 231 105 3535 19.2 78 1
## 255 20.2 6 200 85 2965 15.8 78 1
## 257 20.5 6 225 100 3430 17.2 78 1
## 259 20.6 6 231 105 3380 15.8 78 1
## 261 18.6 6 225 110 3620 18.7 78 1
## 263 19.2 8 305 145 3425 13.2 78 1
## 265 18.1 8 302 139 3205 11.2 78 1
## 267 30.0 4 98 68 2155 16.5 78 1
## 269 27.2 4 119 97 2300 14.7 78 3
## 271 21.1 4 134 95 2515 14.8 78 3
## 273 23.8 4 151 85 2855 17.6 78 1
## 275 20.3 5 131 103 2830 15.9 78 2
## 277 21.6 4 121 115 2795 15.7 78 2
## 279 31.5 4 89 71 1990 14.9 78 2
## 281 21.5 6 231 115 3245 15.4 79 1
## 283 22.3 4 140 88 2890 17.3 79 1
## 285 20.6 6 225 110 3360 16.6 79 1
## 287 17.6 8 302 129 3725 13.4 79 1
## 289 18.2 8 318 135 3830 15.2 79 1
## 291 15.5 8 351 142 4054 14.3 79 1
## 293 18.5 8 360 150 3940 13.0 79 1
## 295 34.1 4 86 65 1975 15.2 79 3
## 297 27.4 4 121 80 2670 15.0 79 1
## 299 23.0 8 350 125 3900 17.4 79 1
## 301 23.9 8 260 90 3420 22.2 79 1
## 303 34.5 4 105 70 2150 14.9 79 1
## 305 37.3 4 91 69 2130 14.7 79 2
## 307 28.8 6 173 115 2595 11.3 79 1
## 309 33.5 4 151 90 2556 13.2 79 1
## 311 38.1 4 89 60 1968 18.8 80 3
## 313 37.2 4 86 65 2019 16.4 80 3
## 315 26.4 4 140 88 2870 18.1 80 1
## 317 19.1 6 225 90 3381 18.7 80 1
## 319 29.8 4 134 90 2711 15.5 80 3
## 321 37.0 4 119 92 2434 15.0 80 3
## 323 46.6 4 86 65 2110 17.9 80 3
## 325 40.8 4 85 65 2110 19.2 80 3
## 327 43.4 4 90 48 2335 23.7 80 2
## 329 30.0 4 146 67 3250 21.8 80 2
## 332 33.8 4 97 67 2145 18.0 80 3
## 334 32.7 6 168 132 2910 11.4 80 3
## 336 35.0 4 122 88 2500 15.1 80 2
## 339 27.2 4 135 84 2490 15.7 81 1
## 341 25.8 4 156 92 2620 14.4 81 1
## 343 30.0 4 135 84 2385 12.9 81 1
## 345 39.0 4 86 64 1875 16.4 81 1
## 347 32.3 4 97 67 2065 17.8 81 3
## 349 37.7 4 89 62 2050 17.3 81 3
## 351 34.7 4 105 63 2215 14.9 81 1
## 353 29.9 4 98 65 2380 20.7 81 1
## 356 33.7 4 107 75 2210 14.4 81 3
## 358 32.9 4 119 100 2615 14.8 81 3
## 360 28.1 4 141 80 3230 20.4 81 2
## 362 25.4 6 168 116 2900 12.6 81 3
## 364 22.4 6 231 110 3415 15.8 81 1
## 366 20.2 6 200 88 3060 17.1 81 1
## 368 28.0 4 112 88 2605 19.6 82 1
## 370 34.0 4 112 88 2395 18.0 82 1
## 372 29.0 4 135 84 2525 16.0 82 1
## 374 24.0 4 140 92 2865 16.4 82 1
## 376 37.0 4 91 68 2025 18.2 82 3
## 378 38.0 4 105 63 2125 14.7 82 1
## 380 36.0 4 120 88 2160 14.5 82 3
## 382 34.0 4 108 70 2245 16.9 82 3
## 384 32.0 4 91 67 1965 15.7 82 3
## 386 25.0 6 181 110 2945 16.4 82 1
## 388 26.0 4 156 92 2585 14.5 82 1
## 390 32.0 4 144 96 2665 13.9 82 3
## 392 27.0 4 151 90 2950 17.3 82 1
## 394 44.0 4 97 52 2130 24.6 82 2
## 396 28.0 4 120 79 2625 18.6 82 1
## name mpg01
## 1 chevrolet chevelle malibu 0
## 3 plymouth satellite 0
## 5 ford torino 0
## 7 chevrolet impala 0
## 9 pontiac catalina 0
## 11 dodge challenger se 0
## 13 chevrolet monte carlo 0
## 15 toyota corona mark ii 1
## 17 amc hornet 0
## 19 datsun pl510 1
## 21 peugeot 504 1
## 23 saab 99e 1
## 25 amc gremlin 0
## 27 chevy c20 0
## 29 hi 1200d 0
## 31 chevrolet vega 2300 1
## 34 amc gremlin 0
## 36 chevrolet chevelle malibu 0
## 38 amc matador 0
## 40 pontiac catalina brougham 0
## 42 plymouth fury iii 0
## 44 ford country squire (sw) 0
## 46 amc hornet sportabout (sw) 0
## 48 pontiac firebird 0
## 50 mercury capri 2000 1
## 52 peugeot 304 1
## 54 toyota corolla 1200 1
## 56 volkswagen model 111 1
## 58 toyota corona hardtop 1
## 60 volkswagen type 3 1
## 62 ford pinto runabout 0
## 64 pontiac catalina 0
## 66 ford galaxie 500 0
## 68 mercury marquis 0
## 70 oldsmobile delta 88 royale 0
## 72 mazda rx2 coupe 0
## 74 chevrolet chevelle concours (sw) 0
## 76 plymouth satellite custom (sw) 0
## 78 volkswagen 411 (sw) 0
## 80 renault 12 (sw) 1
## 82 datsun 510 (sw) 1
## 84 dodge colt (sw) 1
## 86 buick century 350 0
## 88 chevrolet malibu 0
## 90 dodge coronet custom 0
## 92 chevrolet caprice classic 0
## 94 plymouth fury gran sedan 0
## 96 buick electra 225 custom 0
## 98 plymouth valiant 0
## 100 amc hornet 0
## 102 plymouth duster 1
## 104 chevrolet impala 0
## 106 plymouth custom suburb 0
## 108 amc gremlin 0
## 110 chevrolet vega 0
## 112 maxda rx3 0
## 114 mercury capri v6 0
## 116 chevrolet monte carlo s 0
## 118 fiat 128 1
## 120 audi 100ls 0
## 122 dodge dart custom 0
## 124 toyota mark ii 0
## 126 plymouth duster 0
## 129 chevrolet nova 0
## 131 ford pinto 1
## 133 chevrolet vega 1
## 135 amc matador 0
## 137 ford gran torino 0
## 139 dodge coronet custom (sw) 0
## 141 amc matador (sw) 0
## 143 volkswagen dasher 1
## 145 toyota corona 1
## 147 dodge colt 1
## 149 fiat 124 tc 1
## 151 subaru 1
## 153 plymouth valiant custom 0
## 155 mercury monarch 0
## 157 pontiac catalina 0
## 159 plymouth grand fury 0
## 161 buick century 0
## 163 amc matador 0
## 165 buick skyhawk 0
## 167 ford mustang ii 0
## 169 ford pinto 1
## 171 pontiac astro 1
## 173 volkswagen dasher 1
## 175 ford pinto 0
## 177 amc pacer 0
## 179 peugeot 504 1
## 181 saab 99le 1
## 183 fiat 131 1
## 185 capri ii 1
## 187 renault 12tl 1
## 189 dodge coronet brougham 0
## 191 ford gran torino 0
## 193 chevrolet nova 0
## 195 amc hornet 0
## 197 chevrolet woody 1
## 199 honda civic 1
## 201 ford granada ghia 0
## 203 amc pacer d/l 0
## 205 datsun b-210 1
## 207 ford pinto 1
## 209 plymouth volare premier v8 0
## 211 toyota mark ii 0
## 213 cadillac seville 0
## 215 ford f108 0
## 217 honda accord cvcc 1
## 219 renault 5 gtl 1
## 221 datsun f-10 hatchback 1
## 223 oldsmobile cutlass supreme 0
## 225 mercury cougar brougham 0
## 227 buick skylark 0
## 229 ford granada 0
## 231 chevrolet monte carlo landau 0
## 233 ford thunderbird 0
## 235 pontiac sunbird coupe 1
## 237 ford mustang ii 2+2 1
## 239 dodge colt m/m 1
## 241 volkswagen dasher 1
## 243 bmw 320i 0
## 245 volkswagen rabbit custom diesel 1
## 247 mazda glc deluxe 1
## 249 honda civic cvcc 1
## 251 dodge diplomat 0
## 253 pontiac phoenix lj 0
## 255 ford fairmont (auto) 0
## 257 plymouth volare 0
## 259 buick century special 0
## 261 dodge aspen 0
## 263 chevrolet monte carlo landau 0
## 265 ford futura 0
## 267 chevrolet chevette 1
## 269 datsun 510 1
## 271 toyota celica gt liftback 0
## 273 oldsmobile starfire sx 1
## 275 audi 5000 0
## 277 saab 99gle 0
## 279 volkswagen scirocco 1
## 281 pontiac lemans v6 0
## 283 ford fairmont 4 0
## 285 dodge aspen 6 0
## 287 ford ltd landau 0
## 289 dodge st. regis 0
## 291 ford country squire (sw) 0
## 293 chrysler lebaron town @ country (sw) 0
## 295 maxda glc deluxe 1
## 297 amc spirit dl 1
## 299 cadillac eldorado 1
## 301 oldsmobile cutlass salon brougham 1
## 303 plymouth horizon tc3 1
## 305 fiat strada custom 1
## 307 chevrolet citation 1
## 309 pontiac phoenix 1
## 311 toyota corolla tercel 1
## 313 datsun 310 1
## 315 ford fairmont 1
## 317 dodge aspen 0
## 319 toyota corona liftback 1
## 321 datsun 510 hatchback 1
## 323 mazda glc 1
## 325 datsun 210 1
## 327 vw dasher (diesel) 1
## 329 mercedes-benz 240d 1
## 332 subaru dl 1
## 334 datsun 280-zx 1
## 336 triumph tr7 coupe 1
## 339 plymouth reliant 1
## 341 dodge aries wagon (sw) 1
## 343 plymouth reliant 1
## 345 plymouth champ 1
## 347 subaru 1
## 349 toyota tercel 1
## 351 plymouth horizon 4 1
## 353 ford escort 2h 1
## 356 honda prelude 1
## 358 datsun 200sx 1
## 360 peugeot 505s turbo diesel 1
## 362 toyota cressida 1
## 364 buick century 0
## 366 ford granada gl 0
## 368 chevrolet cavalier 1
## 370 chevrolet cavalier 2-door 1
## 372 dodge aries se 1
## 374 ford fairmont futura 1
## 376 mazda glc custom l 1
## 378 plymouth horizon miser 1
## 380 nissan stanza xe 1
## 382 toyota corolla 1
## 384 honda civic (auto) 1
## 386 buick century limited 1
## 388 chrysler lebaron medallion 1
## 390 toyota celica gt 1
## 392 chevrolet camaro 1
## 394 vw pickup 1
## 396 ford ranger 1
testing
## mpg cylinders displacement horsepower weight acceleration year origin
## 2 15.0 8 350.0 165 3693 11.5 70 1
## 4 16.0 8 304.0 150 3433 12.0 70 1
## 6 15.0 8 429.0 198 4341 10.0 70 1
## 8 14.0 8 440.0 215 4312 8.5 70 1
## 10 15.0 8 390.0 190 3850 8.5 70 1
## 12 14.0 8 340.0 160 3609 8.0 70 1
## 14 14.0 8 455.0 225 3086 10.0 70 1
## 16 22.0 6 198.0 95 2833 15.5 70 1
## 18 21.0 6 200.0 85 2587 16.0 70 1
## 20 26.0 4 97.0 46 1835 20.5 70 2
## 22 24.0 4 107.0 90 2430 14.5 70 2
## 24 26.0 4 121.0 113 2234 12.5 70 2
## 26 10.0 8 360.0 215 4615 14.0 70 1
## 28 11.0 8 318.0 210 4382 13.5 70 1
## 30 27.0 4 97.0 88 2130 14.5 71 3
## 32 25.0 4 113.0 95 2228 14.0 71 3
## 35 16.0 6 225.0 105 3439 15.5 71 1
## 37 19.0 6 250.0 88 3302 15.5 71 1
## 39 14.0 8 350.0 165 4209 12.0 71 1
## 41 14.0 8 351.0 153 4154 13.5 71 1
## 43 12.0 8 383.0 180 4955 11.5 71 1
## 45 13.0 8 400.0 175 5140 12.0 71 1
## 47 22.0 4 140.0 72 2408 19.0 71 1
## 49 18.0 6 250.0 88 3139 14.5 71 1
## 51 28.0 4 116.0 90 2123 14.0 71 2
## 53 30.0 4 88.0 76 2065 14.5 71 2
## 55 35.0 4 72.0 69 1613 18.0 71 3
## 57 26.0 4 91.0 70 1955 20.5 71 1
## 59 25.0 4 97.5 80 2126 17.0 72 1
## 61 20.0 4 140.0 90 2408 19.5 72 1
## 63 13.0 8 350.0 165 4274 12.0 72 1
## 65 15.0 8 318.0 150 4135 13.5 72 1
## 67 17.0 8 304.0 150 3672 11.5 72 1
## 69 13.0 8 350.0 155 4502 13.5 72 1
## 71 13.0 8 400.0 190 4422 12.5 72 1
## 73 15.0 8 304.0 150 3892 12.5 72 1
## 75 13.0 8 302.0 140 4294 16.0 72 1
## 77 18.0 4 121.0 112 2933 14.5 72 2
## 79 21.0 4 120.0 87 2979 19.5 72 2
## 81 22.0 4 122.0 86 2395 16.0 72 1
## 83 23.0 4 120.0 97 2506 14.5 72 3
## 85 27.0 4 97.0 88 2100 16.5 72 3
## 87 14.0 8 304.0 150 3672 11.5 73 1
## 89 14.0 8 302.0 137 4042 14.5 73 1
## 91 12.0 8 429.0 198 4952 11.5 73 1
## 93 13.0 8 351.0 158 4363 13.0 73 1
## 95 13.0 8 440.0 215 4735 11.0 73 1
## 97 13.0 8 360.0 175 3821 11.0 73 1
## 99 16.0 6 250.0 100 3278 18.0 73 1
## 101 18.0 6 250.0 88 3021 16.5 73 1
## 103 26.0 4 97.0 46 1950 21.0 73 2
## 105 12.0 8 400.0 167 4906 12.5 73 1
## 107 12.0 8 350.0 180 4499 12.5 73 1
## 109 20.0 4 97.0 88 2279 19.0 73 3
## 111 22.0 4 108.0 94 2379 16.5 73 3
## 113 19.0 4 122.0 85 2310 18.5 73 1
## 115 26.0 4 98.0 90 2265 15.5 73 2
## 117 16.0 8 400.0 230 4278 9.5 73 1
## 119 24.0 4 116.0 75 2158 15.5 73 2
## 121 19.0 4 121.0 112 2868 15.5 73 2
## 123 24.0 4 121.0 110 2660 14.0 73 2
## 125 11.0 8 350.0 180 3664 11.0 73 1
## 128 19.0 6 232.0 100 2901 16.0 74 1
## 130 31.0 4 79.0 67 1950 19.0 74 3
## 132 32.0 4 71.0 65 1836 21.0 74 3
## 134 16.0 6 250.0 100 3781 17.0 74 1
## 136 18.0 6 225.0 105 3613 16.5 74 1
## 138 13.0 8 350.0 150 4699 14.5 74 1
## 140 14.0 8 302.0 140 4638 16.0 74 1
## 142 29.0 4 98.0 83 2219 16.5 74 2
## 144 26.0 4 97.0 78 2300 14.5 74 2
## 146 32.0 4 83.0 61 2003 19.0 74 3
## 148 24.0 4 90.0 75 2108 15.5 74 2
## 150 24.0 4 120.0 97 2489 15.0 74 3
## 152 31.0 4 79.0 67 2000 16.0 74 2
## 154 18.0 6 250.0 105 3459 16.0 75 1
## 156 15.0 6 250.0 72 3158 19.5 75 1
## 158 15.0 8 350.0 145 4440 14.0 75 1
## 160 14.0 8 351.0 148 4657 13.5 75 1
## 162 16.0 6 250.0 105 3897 18.5 75 1
## 164 18.0 6 225.0 95 3785 19.0 75 1
## 166 20.0 8 262.0 110 3221 13.5 75 1
## 168 29.0 4 97.0 75 2171 16.0 75 3
## 170 20.0 6 232.0 100 2914 16.0 75 1
## 172 24.0 4 134.0 96 2702 13.5 75 3
## 174 24.0 4 119.0 97 2545 17.0 75 3
## 176 29.0 4 90.0 70 1937 14.0 75 2
## 178 23.0 4 115.0 95 2694 15.0 75 2
## 180 22.0 4 121.0 98 2945 14.5 75 2
## 182 33.0 4 91.0 53 1795 17.5 75 3
## 184 25.0 4 116.0 81 2220 16.9 76 2
## 186 26.0 4 98.0 79 2255 17.7 76 1
## 188 17.5 8 305.0 140 4215 13.0 76 1
## 190 15.5 8 304.0 120 3962 13.9 76 1
## 192 22.0 6 225.0 100 3233 15.4 76 1
## 194 24.0 6 200.0 81 3012 17.6 76 1
## 196 29.0 4 85.0 52 2035 22.2 76 1
## 198 29.0 4 90.0 70 1937 14.2 76 2
## 200 20.0 6 225.0 100 3651 17.7 76 1
## 202 18.5 6 250.0 110 3645 16.2 76 1
## 204 29.5 4 97.0 71 1825 12.2 76 2
## 206 28.0 4 97.0 75 2155 16.4 76 3
## 208 20.0 4 130.0 102 3150 15.7 76 2
## 210 19.0 4 120.0 88 3270 21.9 76 2
## 212 16.5 6 168.0 120 3820 16.7 76 2
## 214 13.0 8 350.0 145 4055 12.0 76 1
## 216 13.0 8 318.0 150 3755 14.0 76 1
## 218 30.0 4 111.0 80 2155 14.8 77 1
## 220 25.5 4 122.0 96 2300 15.5 77 1
## 222 17.5 8 305.0 145 3880 12.5 77 1
## 224 15.5 8 318.0 145 4140 13.7 77 1
## 226 17.5 6 250.0 110 3520 16.4 77 1
## 228 19.0 6 225.0 100 3630 17.7 77 1
## 230 16.0 8 400.0 180 4220 11.1 77 1
## 232 15.5 8 400.0 190 4325 12.2 77 1
## 234 29.0 4 97.0 78 1940 14.5 77 2
## 236 26.0 4 97.0 75 2265 18.2 77 3
## 238 30.5 4 98.0 63 2051 17.0 77 1
## 240 30.0 4 97.0 67 1985 16.4 77 3
## 242 22.0 6 146.0 97 2815 14.5 77 3
## 244 21.5 3 80.0 110 2720 13.5 77 3
## 246 36.1 4 98.0 66 1800 14.4 78 1
## 248 39.4 4 85.0 70 2070 18.6 78 3
## 250 19.9 8 260.0 110 3365 15.5 78 1
## 252 20.2 8 302.0 139 3570 12.8 78 1
## 254 20.5 6 200.0 95 3155 18.2 78 1
## 256 25.1 4 140.0 88 2720 15.4 78 1
## 258 19.4 6 232.0 90 3210 17.2 78 1
## 260 20.8 6 200.0 85 3070 16.7 78 1
## 262 18.1 6 258.0 120 3410 15.1 78 1
## 264 17.7 6 231.0 165 3445 13.4 78 1
## 266 17.5 8 318.0 140 4080 13.7 78 1
## 268 27.5 4 134.0 95 2560 14.2 78 3
## 270 30.9 4 105.0 75 2230 14.5 78 1
## 272 23.2 4 156.0 105 2745 16.7 78 1
## 274 23.9 4 119.0 97 2405 14.9 78 3
## 276 17.0 6 163.0 125 3140 13.6 78 2
## 278 16.2 6 163.0 133 3410 15.8 78 2
## 280 29.5 4 98.0 68 2135 16.6 78 3
## 282 19.8 6 200.0 85 2990 18.2 79 1
## 284 20.2 6 232.0 90 3265 18.2 79 1
## 286 17.0 8 305.0 130 3840 15.4 79 1
## 288 16.5 8 351.0 138 3955 13.2 79 1
## 290 16.9 8 350.0 155 4360 14.9 79 1
## 292 19.2 8 267.0 125 3605 15.0 79 1
## 294 31.9 4 89.0 71 1925 14.0 79 2
## 296 35.7 4 98.0 80 1915 14.4 79 1
## 298 25.4 5 183.0 77 3530 20.1 79 2
## 300 27.2 4 141.0 71 3190 24.8 79 2
## 302 34.2 4 105.0 70 2200 13.2 79 1
## 304 31.8 4 85.0 65 2020 19.2 79 3
## 306 28.4 4 151.0 90 2670 16.0 79 1
## 308 26.8 6 173.0 115 2700 12.9 79 1
## 310 41.5 4 98.0 76 2144 14.7 80 2
## 312 32.1 4 98.0 70 2120 15.5 80 1
## 314 28.0 4 151.0 90 2678 16.5 80 1
## 316 24.3 4 151.0 90 3003 20.1 80 1
## 318 34.3 4 97.0 78 2188 15.8 80 2
## 320 31.3 4 120.0 75 2542 17.5 80 3
## 322 32.2 4 108.0 75 2265 15.2 80 3
## 324 27.9 4 156.0 105 2800 14.4 80 1
## 326 44.3 4 90.0 48 2085 21.7 80 2
## 328 36.4 5 121.0 67 2950 19.9 80 2
## 330 44.6 4 91.0 67 1850 13.8 80 3
## 333 29.8 4 89.0 62 1845 15.3 80 2
## 335 23.7 3 70.0 100 2420 12.5 80 3
## 338 32.4 4 107.0 72 2290 17.0 80 3
## 340 26.6 4 151.0 84 2635 16.4 81 1
## 342 23.5 6 173.0 110 2725 12.6 81 1
## 344 39.1 4 79.0 58 1755 16.9 81 3
## 346 35.1 4 81.0 60 1760 16.1 81 3
## 348 37.0 4 85.0 65 1975 19.4 81 3
## 350 34.1 4 91.0 68 1985 16.0 81 3
## 352 34.4 4 98.0 65 2045 16.2 81 1
## 354 33.0 4 105.0 74 2190 14.2 81 2
## 357 32.4 4 108.0 75 2350 16.8 81 3
## 359 31.6 4 120.0 74 2635 18.3 81 3
## 361 30.7 6 145.0 76 3160 19.6 81 2
## 363 24.2 6 146.0 120 2930 13.8 81 3
## 365 26.6 8 350.0 105 3725 19.0 81 1
## 367 17.6 6 225.0 85 3465 16.6 81 1
## 369 27.0 4 112.0 88 2640 18.6 82 1
## 371 31.0 4 112.0 85 2575 16.2 82 1
## 373 27.0 4 151.0 90 2735 18.0 82 1
## 375 36.0 4 105.0 74 1980 15.3 82 2
## 377 31.0 4 91.0 68 1970 17.6 82 3
## 379 36.0 4 98.0 70 2125 17.3 82 1
## 381 36.0 4 107.0 75 2205 14.5 82 3
## 383 38.0 4 91.0 67 1965 15.0 82 3
## 385 38.0 4 91.0 67 1995 16.2 82 3
## 387 38.0 6 262.0 85 3015 17.0 82 1
## 389 22.0 6 232.0 112 2835 14.7 82 1
## 391 36.0 4 135.0 84 2370 13.0 82 1
## 393 27.0 4 140.0 86 2790 15.6 82 1
## 395 32.0 4 135.0 84 2295 11.6 82 1
## 397 31.0 4 119.0 82 2720 19.4 82 1
## name mpg01
## 2 buick skylark 320 0
## 4 amc rebel sst 0
## 6 ford galaxie 500 0
## 8 plymouth fury iii 0
## 10 amc ambassador dpl 0
## 12 plymouth 'cuda 340 0
## 14 buick estate wagon (sw) 0
## 16 plymouth duster 0
## 18 ford maverick 0
## 20 volkswagen 1131 deluxe sedan 1
## 22 audi 100 ls 1
## 24 bmw 2002 1
## 26 ford f250 0
## 28 dodge d200 0
## 30 datsun pl510 1
## 32 toyota corona 1
## 35 plymouth satellite custom 0
## 37 ford torino 500 0
## 39 chevrolet impala 0
## 41 ford galaxie 500 0
## 43 dodge monaco (sw) 0
## 45 pontiac safari (sw) 0
## 47 chevrolet vega (sw) 0
## 49 ford mustang 0
## 51 opel 1900 1
## 53 fiat 124b 1
## 55 datsun 1200 1
## 57 plymouth cricket 1
## 59 dodge colt hardtop 1
## 61 chevrolet vega 0
## 63 chevrolet impala 0
## 65 plymouth fury iii 0
## 67 amc ambassador sst 0
## 69 buick lesabre custom 0
## 71 chrysler newport royal 0
## 73 amc matador (sw) 0
## 75 ford gran torino (sw) 0
## 77 volvo 145e (sw) 0
## 79 peugeot 504 (sw) 0
## 81 ford pinto (sw) 0
## 83 toyouta corona mark ii (sw) 1
## 85 toyota corolla 1600 (sw) 1
## 87 amc matador 0
## 89 ford gran torino 0
## 91 mercury marquis brougham 0
## 93 ford ltd 0
## 95 chrysler new yorker brougham 0
## 97 amc ambassador brougham 0
## 99 chevrolet nova custom 0
## 101 ford maverick 0
## 103 volkswagen super beetle 1
## 105 ford country 0
## 107 oldsmobile vista cruiser 0
## 109 toyota carina 0
## 111 datsun 610 0
## 113 ford pinto 0
## 115 fiat 124 sport coupe 1
## 117 pontiac grand prix 0
## 119 opel manta 1
## 121 volvo 144ea 0
## 123 saab 99le 1
## 125 oldsmobile omega 0
## 128 amc hornet 0
## 130 datsun b210 1
## 132 toyota corolla 1200 1
## 134 chevrolet chevelle malibu classic 0
## 136 plymouth satellite sebring 0
## 138 buick century luxus (sw) 0
## 140 ford gran torino (sw) 0
## 142 audi fox 1
## 144 opel manta 1
## 146 datsun 710 1
## 148 fiat 128 1
## 150 honda civic 1
## 152 fiat x1.9 1
## 154 chevrolet nova 0
## 156 ford maverick 0
## 158 chevrolet bel air 0
## 160 ford ltd 0
## 162 chevroelt chevelle malibu 0
## 164 plymouth fury 0
## 166 chevrolet monza 2+2 0
## 168 toyota corolla 1
## 170 amc gremlin 0
## 172 toyota corona 1
## 174 datsun 710 1
## 176 volkswagen rabbit 1
## 178 audi 100ls 1
## 180 volvo 244dl 0
## 182 honda civic cvcc 1
## 184 opel 1900 1
## 186 dodge colt 1
## 188 chevrolet chevelle malibu classic 0
## 190 amc matador 0
## 192 plymouth valiant 0
## 194 ford maverick 1
## 196 chevrolet chevette 1
## 198 vw rabbit 1
## 200 dodge aspen se 0
## 202 pontiac ventura sj 0
## 204 volkswagen rabbit 1
## 206 toyota corolla 1
## 208 volvo 245 0
## 210 peugeot 504 0
## 212 mercedes-benz 280s 0
## 214 chevy c10 0
## 216 dodge d100 0
## 218 buick opel isuzu deluxe 1
## 220 plymouth arrow gs 1
## 222 chevrolet caprice classic 0
## 224 dodge monaco brougham 0
## 226 chevrolet concours 0
## 228 plymouth volare custom 0
## 230 pontiac grand prix lj 0
## 232 chrysler cordoba 0
## 234 volkswagen rabbit custom 1
## 236 toyota corolla liftback 1
## 238 chevrolet chevette 1
## 240 subaru dl 1
## 242 datsun 810 0
## 244 mazda rx-4 0
## 246 ford fiesta 1
## 248 datsun b210 gx 1
## 250 oldsmobile cutlass salon brougham 0
## 252 mercury monarch ghia 0
## 254 chevrolet malibu 0
## 256 ford fairmont (man) 1
## 258 amc concord 0
## 260 mercury zephyr 0
## 262 amc concord d/l 0
## 264 buick regal sport coupe (turbo) 0
## 266 dodge magnum xe 0
## 268 toyota corona 1
## 270 dodge omni 1
## 272 plymouth sapporo 1
## 274 datsun 200-sx 1
## 276 volvo 264gl 0
## 278 peugeot 604sl 0
## 280 honda accord lx 1
## 282 mercury zephyr 6 0
## 284 amc concord dl 6 0
## 286 chevrolet caprice classic 0
## 288 mercury grand marquis 0
## 290 buick estate wagon (sw) 0
## 292 chevrolet malibu classic (sw) 0
## 294 vw rabbit custom 1
## 296 dodge colt hatchback custom 1
## 298 mercedes benz 300d 1
## 300 peugeot 504 1
## 302 plymouth horizon 1
## 304 datsun 210 1
## 306 buick skylark limited 1
## 308 oldsmobile omega brougham 1
## 310 vw rabbit 1
## 312 chevrolet chevette 1
## 314 chevrolet citation 1
## 316 amc concord 1
## 318 audi 4000 1
## 320 mazda 626 1
## 322 toyota corolla 1
## 324 dodge colt 1
## 326 vw rabbit c (diesel) 1
## 328 audi 5000s (diesel) 1
## 330 honda civic 1500 gl 1
## 333 vokswagen rabbit 1
## 335 mazda rx-7 gs 1
## 338 honda accord 1
## 340 buick skylark 1
## 342 chevrolet citation 1
## 344 toyota starlet 1
## 346 honda civic 1300 1
## 348 datsun 210 mpg 1
## 350 mazda glc 4 1
## 352 ford escort 4w 1
## 354 volkswagen jetta 1
## 357 toyota corolla 1
## 359 mazda 626 1
## 361 volvo diesel 1
## 363 datsun 810 maxima 1
## 365 oldsmobile cutlass ls 1
## 367 chrysler lebaron salon 0
## 369 chevrolet cavalier wagon 1
## 371 pontiac j2000 se hatchback 1
## 373 pontiac phoenix 1
## 375 volkswagen rabbit l 1
## 377 mazda glc custom 1
## 379 mercury lynx l 1
## 381 honda accord 1
## 383 honda civic 1
## 385 datsun 310 gx 1
## 387 oldsmobile cutlass ciera (diesel) 1
## 389 ford granada l 0
## 391 dodge charger 2.2 1
## 393 ford mustang gl 1
## 395 dodge rampage 1
## 397 chevy s-10 1
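Note that in the printed training set above the row labels run 1, 3, ..., 31 and then continue with 34, 36, and so on. This suggests that the row names of Auto are not consecutive, so `Auto[idx,]` selects odd row positions rather than odd row labels. The sketch below illustrates a positional odd/even split with basic sanity checks on a small made-up data frame (`df`, `train_toy`, and `test_toy` are hypothetical names, not part of the original analysis):

```r
# Toy data frame standing in for Auto (hypothetical values)
df <- data.frame(id = 1:7, x = c(2.1, 3.4, 1.8, 5.0, 4.2, 3.9, 2.7))

# Odd row positions go to training; all remaining rows go to testing.
# Using nrow(df) avoids hard-coding the number of observations.
odd <- seq(1, nrow(df), by = 2)
train_toy <- df[odd, ]
test_toy  <- df[-odd, ]

# Sanity checks: the two parts are disjoint and together cover every row
stopifnot(nrow(train_toy) + nrow(test_toy) == nrow(df))
stopifnot(length(intersect(train_toy$id, test_toy$id)) == 0)
```

Printing `dim(training)` and `head(training)` instead of the full data frames would give the same confirmation with far less output.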
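Using logistic regression, predict mpg01 using three models: model A with horsepower, cylinders, and weight; model B, which adds year; and model C, which adds all pairwise interactions among those four predictors.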
modela = glm(mpg01 ~ horsepower+cylinders+weight, data=training, family="binomial")
modelb = glm(mpg01 ~ horsepower+cylinders+weight+year, data=training, family="binomial")
training$hpcyl = training$horsepower * training$cylinders
training$hpw = training$horsepower * training$weight
training$hpy = training$horsepower * training$year
training$cylw = training$cylinders * training$weight
training$cyly = training$cylinders * training$year
training$wy = training$weight * training$year
modelc = glm(mpg01 ~ horsepower+cylinders+weight+year+hpcyl+hpw+hpy+cylw+cyly+wy, data=training, family="binomial")
# Summaries of the three logistic regression models
summary(modela)
##
## Call:
## glm(formula = mpg01 ~ horsepower + cylinders + weight, family = "binomial",
## data = training)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 12.3665998 1.9360643 6.387 1.69e-10 ***
## horsepower -0.0358216 0.0197315 -1.815 0.06945 .
## cylinders -0.5519014 0.3326000 -1.659 0.09704 .
## weight -0.0021753 0.0008164 -2.664 0.00771 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 271.63 on 195 degrees of freedom
## Residual deviance: 111.40 on 192 degrees of freedom
## AIC: 119.4
##
## Number of Fisher Scoring iterations: 7
summary(modelb)
##
## Call:
## glm(formula = mpg01 ~ horsepower + cylinders + weight + year,
## family = "binomial", data = training)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -12.576602 6.063947 -2.074 0.038080 *
## horsepower -0.041059 0.021660 -1.896 0.058008 .
## cylinders -0.063915 0.377010 -0.170 0.865380
## weight -0.004056 0.001126 -3.601 0.000317 ***
## year 0.371326 0.093733 3.962 7.45e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 271.632 on 195 degrees of freedom
## Residual deviance: 89.971 on 191 degrees of freedom
## AIC: 99.971
##
## Number of Fisher Scoring iterations: 7
summary(modelc)
##
## Call:
## glm(formula = mpg01 ~ horsepower + cylinders + weight + year +
## hpcyl + hpw + hpy + cylw + cyly + wy, family = "binomial",
## data = training)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.940e+01 6.401e+01 0.616 0.538
## horsepower -1.837e-01 6.275e-01 -0.293 0.770
## cylinders -8.481e+00 9.940e+00 -0.853 0.394
## weight -6.296e-03 2.497e-02 -0.252 0.801
## year -2.045e-01 8.777e-01 -0.233 0.816
## hpcyl 2.357e-02 1.787e-02 1.319 0.187
## hpw -3.967e-05 7.453e-05 -0.532 0.595
## hpy 1.655e-03 8.941e-03 0.185 0.853
## cylw 4.552e-04 9.802e-04 0.464 0.642
## cyly 6.148e-02 1.403e-01 0.438 0.661
## wy 5.124e-05 3.221e-04 0.159 0.874
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 271.632 on 195 degrees of freedom
## Residual deviance: 85.378 on 185 degrees of freedom
## AIC: 107.38
##
## Number of Fisher Scoring iterations: 9
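The interaction columns for model C were built by hand as products of the predictors. R's formula syntax can express the same pairwise interactions directly: `(a + b + c)^2` expands to the main effects plus every two-way interaction. The sketch below, on simulated data (the variables `x1`, `x2`, `x3`, and `y` are made up for illustration), checks that the two approaches give the same fit:

```r
# Simulated data (hypothetical; not the Auto data)
set.seed(1)
n <- 200
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim$y <- rbinom(n, 1, plogis(0.5 * sim$x1 - 0.8 * sim$x2 + 0.3 * sim$x1 * sim$x3))

# Hand-built product columns, mirroring the approach used for model C
sim$x1x2 <- sim$x1 * sim$x2
sim$x1x3 <- sim$x1 * sim$x3
sim$x2x3 <- sim$x2 * sim$x3
fit_manual <- glm(y ~ x1 + x2 + x3 + x1x2 + x1x3 + x2x3,
                  data = sim, family = "binomial")

# Formula shorthand: main effects plus all pairwise interactions
fit_formula <- glm(y ~ (x1 + x2 + x3)^2, data = sim, family = "binomial")

# Both fits use the same design matrix, so the deviances should agree
all.equal(deviance(fit_manual), deviance(fit_formula))
```

One practical advantage of the formula version is that `predict()` rebuilds the interaction terms from the raw predictors, so a new data frame does not need the product columns added beforehand. With hand-built columns, every data frame passed to `predict()` must contain them.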
For each model, calculate the misclassification rate and confusion matrix on the training data, assuming a threshold of 0.5. Which model would you prefer?
# Calculate the misclassification rate (MCR) for model A
phata = predict(modela, newdata = training, type = "response")
yhata = ifelse(phata > 0.5, 1, 0)  # classify using the predicted probabilities
yhata
## 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 34 36 38 40
## 0 0 0 0 0 0 0 1 0 1 1 1 1 0 0 1 0 0 0 0
## 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74 76 78 80
## 0 0 0 0 1 1 1 1 1 1 1 0 0 0 0 1 0 0 1 1
## 82 84 86 88 90 92 94 96 98 100 102 104 106 108 110 112 114 116 118 120
## 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 1
## 122 124 126 129 131 133 135 137 139 141 143 145 147 149 151 153 155 157 159 161
## 0 0 0 0 1 1 0 0 0 0 1 1 1 1 1 0 0 0 0 0
## 163 165 167 169 171 173 175 177 179 181 183 185 187 189 191 193 195 197 199 201
## 0 0 0 1 1 1 0 0 1 1 1 1 1 0 0 0 0 1 1 0
## 203 205 207 209 211 213 215 217 219 221 223 225 227 229 231 233 235 237 239 241
## 0 1 1 0 0 0 0 1 1 1 0 0 0 0 0 0 1 1 1 1
## 243 245 247 249 251 253 255 257 259 261 263 265 267 269 271 273 275 277 279 281
## 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 0 0 1 0
## 283 285 287 289 291 293 295 297 299 301 303 305 307 309 311 313 315 317 319 321
## 1 0 0 0 0 0 1 1 0 0 1 1 0 1 1 1 1 0 1 1
## 323 325 327 329 332 334 336 339 341 343 345 347 349 351 353 356 358 360 362 364
## 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0
## 366 368 370 372 374 376 378 380 382 384 386 388 390 392 394 396
## 0 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1
MCRa = mean(training$mpg01 != yhata)
MCRa
## [1] 0.08673469
table(training$mpg01, yhata)
## yhata
## 0 1
## 0 90 10
## 1 7 89
# Same steps for model B: training probabilities, then a 0.5 threshold
phatb = predict(modelb, newdata = training, type = "response")
yhatb = ifelse(phatb > 0.5, 1, 0)
MCRb = mean(training$mpg01 != yhatb)
MCRb
## [1] 0.1122449
table(training$mpg01, yhatb)
## yhatb
## 0 1
## 0 87 13
## 1 9 87
# Same steps for model C: training probabilities, then a 0.5 threshold
phatc = predict(modelc, newdata = training, type = "response")
yhatc = ifelse(phatc > 0.5, 1, 0)
MCRc = mean(training$mpg01 != yhatc)
MCRc
## [1] 0.1020408
table(training$mpg01, yhatc)
## yhatc
## 0 1
## 0 89 11
## 1 9 87
*Answer: Model A has the lowest training misclassification rate (0.087, versus 0.112 for Model B and 0.102 for Model C), so it would be preferred on the training data. Its confusion matrix shows 179 of the 196 training observations classified correctly.
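The three training calculations above repeat the same steps, so they could be wrapped in one function. The sketch below is only illustrative: `classify_summary` is a hypothetical helper that does not appear in the original analysis, and the labels and probabilities are made-up toy values rather than Auto data.

```r
# Hypothetical helper (not part of the original analysis): classify at a
# probability threshold and return the error rate and confusion matrix.
classify_summary <- function(y, phat, threshold = 0.5) {
  yhat <- ifelse(phat > threshold, 1, 0)
  list(
    mcr = mean(y != yhat),
    # Fixing the factor levels keeps the table 2 x 2 even if one class
    # is never predicted.
    confusion = table(truth = factor(y, levels = c(0, 1)),
                      predicted = factor(yhat, levels = c(0, 1)))
  )
}

# Toy labels and probabilities, made up for illustration
y    <- c(0, 0, 0, 1, 1, 1)
phat <- c(0.10, 0.60, 0.30, 0.80, 0.40, 0.90)

res <- classify_summary(y, phat)
res$mcr        # 2 of the 6 toy observations are misclassified
res$confusion
```

With the objects from this analysis, a call such as `classify_summary(training$mpg01, phata)` would be expected to return the same rate and table as the step-by-step code.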
For each model, calculate the misclassification rate and confusion matrix on the testing data, assuming a threshold of 0.5. Which model would you prefer? Is this the same as the prior step?
# Calculate MCR
# Classify the test rows from their own predicted probabilities; the
# models' fitted.values hold training-set fits and do not apply to testing.
# The values printed below came from the fitted.values version and need re-knitting.
phatat = predict(modela, testing, type = "response")
yhatat = ifelse(phatat > 0.5, 1, 0)
MCRat = mean(yhatat != testing$mpg01)
MCRat
## [1] 0.2295918
table(testing$mpg01, yhatat)
## yhata
## 0 1
## 0 74 22
## 1 23 77
phatbt = predict(modelb, testing, type = "response")
yhatbt = ifelse(phatbt > 0.5, 1, 0)
MCRbt = mean(yhatbt != testing$mpg01)
MCRbt
## [1] 0.2244898
table(testing$mpg01, yhatbt)
## yhatb
## 0 1
## 0 74 22
## 1 22 78
phatct = predict(modelc, testing, type = "response")
yhatct = ifelse(phatct > 0.5, 1, 0)
MCRct = mean(yhatct != testing$mpg01)
MCRct
## [1] 0.2142857
table(testing$mpg01, yhatct)
## yhatc
## 0 1
## 0 76 20
## 1 22 78
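*Answer: Model C has the lowest misclassification rate on the testing data (0.214, versus 0.224 for Model B and 0.230 for Model A), so it would be preferred here. This is not the same as the prior step, where Model A had the lowest misclassification rate on the training data.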
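Test-set error should be computed from probabilities predicted for the test rows themselves. The sketch below uses simulated data (the data frame, the odd/even split, and every `_sim` name are assumptions made for illustration, not the Auto split used here). It shows that `fitted.values` covers only the rows the model was fit on, while `predict(..., newdata = ...)` scores held-out rows.

```r
set.seed(1)

# Simulated binary outcome driven by a single predictor (illustration only)
n   <- 200
dat <- data.frame(x = rnorm(n))
dat$y <- rbinom(n, 1, plogis(2 * dat$x))

# Hypothetical split: odd rows for training, even rows for testing
train_idx    <- seq(1, n, by = 2)
training_sim <- dat[train_idx, ]
testing_sim  <- dat[-train_idx, ]

fit <- glm(y ~ x, family = "binomial", data = training_sim)

# fitted.values holds one probability per TRAINING row
phat_train <- fit$fitted.values

# Probabilities for held-out rows have to come from predict() with newdata
phat_test <- predict(fit, newdata = testing_sim, type = "response")

mcr_train <- mean(training_sim$y != ifelse(phat_train > 0.5, 1, 0))
mcr_test  <- mean(testing_sim$y  != ifelse(phat_test  > 0.5, 1, 0))
c(train = mcr_train, test = mcr_test)
```

Because each set has its own rows, comparing `fit$fitted.values` with `testing_sim$y` would pair unrelated observations even when the two sets happen to have the same length.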
Report the true positive (true value 1, predicted value 1) and false positive (true value 0, predicted value 1) numbers for the testing data.
# TP/FP values
truePosa = sum(testing$mpg01 ==1 & yhatat == 1)
truePosa
## [1] 77
falsePosa = sum(testing$mpg01 == 0 & yhatat == 1)
falsePosa
## [1] 22
truePosb = sum(testing$mpg01 ==1 & yhatbt == 1)
truePosb
## [1] 78
falsePosb = sum(testing$mpg01 == 0 & yhatbt == 1)
falsePosb
## [1] 22
truePosc = sum(testing$mpg01 ==1 & yhatct == 1)
truePosc
## [1] 78
falsePosc = sum(testing$mpg01 == 0 & yhatct == 1)
falsePosc
## [1] 20
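The true positive and false positive counts can also be read from a confusion matrix and converted to rates, which are the quantities an ROC curve plots. A minimal sketch with made-up labels and predictions (none of these values come from the Auto data):

```r
# Toy truth and predictions, made up for illustration
y_true <- c(1, 1, 1, 0, 0, 0, 0, 1)
y_pred <- c(1, 0, 1, 0, 1, 0, 0, 1)

# Rows are the true class, columns the predicted class
cm <- table(truth = factor(y_true, levels = c(0, 1)),
            predicted = factor(y_pred, levels = c(0, 1)))

tp <- cm["1", "1"]   # true value 1, predicted value 1
fp <- cm["0", "1"]   # true value 0, predicted value 1

tpr <- tp / sum(cm["1", ])   # true positive rate (sensitivity)
fpr <- fp / sum(cm["0", ])   # false positive rate (1 - specificity)

c(TP = tp, FP = fp, TPR = tpr, FPR = fpr)
```

The counts taken from the table match the direct sums used above, for example `sum(y_true == 1 & y_pred == 1)` for the true positives.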
Provide ROC curves for the three methods (ideally in one graph) on the testing data. Print out the AUCs. Which method has the lowest AUC?
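# Create ROC curves with AUC from the test-set probabilities
# (modela$fitted.values holds training fits; the calls and AUCs printed below came from that version and need re-knitting)
AUCa = plot.roc(ifelse(testing$mpg01==1,1,0)~phatat, print.auc=TRUE)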
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
AUCa
##
## Call:
## plot.roc.formula(x = ifelse(testing$mpg01 == 1, 1, 0) ~ modela$fitted.values, print.auc = TRUE)
##
## Data: (unknown) in 96 controls (ifelse(testing$mpg01 == 1, 1, 0) ~ modela$fitted.values 0) < 100 cases (ifelse(testing$mpg01 == 1, 1, 0) ~ modela$fitted.values 1).
## Area under the curve: 0.817
# Use the test-set probabilities rather than modelb$fitted.values (training fits)
AUCb = plot.roc(ifelse(testing$mpg01==1,1,0)~phatbt, print.auc=TRUE)
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
AUCb
##
## Call:
## plot.roc.formula(x = ifelse(testing$mpg01 == 1, 1, 0) ~ modelb$fitted.values, print.auc = TRUE)
##
## Data: (unknown) in 96 controls (ifelse(testing$mpg01 == 1, 1, 0) ~ modelb$fitted.values 0) < 100 cases (ifelse(testing$mpg01 == 1, 1, 0) ~ modelb$fitted.values 1).
## Area under the curve: 0.8583
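# Use the test-set probabilities rather than modelc$fitted.values (training fits)
phatct = predict(modelc, testing, type = "response")
AUCc = plot.roc(ifelse(testing$mpg01==1,1,0)~phatct, print.auc=TRUE)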
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
AUCc
##
## Call:
## plot.roc.formula(x = ifelse(testing$mpg01 == 1, 1, 0) ~ modelc$fitted.values, print.auc = TRUE)
##
## Data: (unknown) in 96 controls (ifelse(testing$mpg01 == 1, 1, 0) ~ modelc$fitted.values 0) < 100 cases (ifelse(testing$mpg01 == 1, 1, 0) ~ modelc$fitted.values 1).
## Area under the curve: 0.8535
*Answer: Model A has the lowest AUC (0.817), compared with 0.8583 for Model B and 0.8535 for Model C.
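For intuition about what the AUC measures: it is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The base R sketch below computes it from ranks (the Mann-Whitney form). `auc_rank` is a hypothetical helper written for illustration, not a pROC function, and the labels and scores are toy values.

```r
# Hypothetical helper: rank-based (Mann-Whitney) AUC in base R
auc_rank <- function(y, score) {
  r  <- rank(score)        # average ranks, so tied scores count as half
  n1 <- sum(y == 1)        # number of positive cases
  n0 <- sum(y == 0)        # number of negative cases
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Toy labels and scores, made up for illustration
y     <- c(0, 0, 1, 1)
score <- c(0.10, 0.40, 0.35, 0.80)

# 3 of the 4 positive/negative pairs are ordered correctly
auc_rank(y, score)
```

An AUC of 0.5 corresponds to random ordering and 1 to perfect separation, which is the scale on which the three models' AUCs are compared.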
Based on the results of steps C-F, are there any characteristics of the data you can infer?
*Answer: All three models have higher misclassification rates on the testing data (about 0.21 to 0.23) than on the training data (about 0.09 to 0.11), which suggests some overfitting. The AUCs between 0.8 and 0.9 indicate that the models discriminate well between positive (mpg01 = 1) and negative (mpg01 = 0) cases.