Assignment 8

Author

Jonathan McCanlas

Homework 8

Problem 5

Part A

set.seed(123)  # For reproducibility

# Generate predictors
x1 <- runif(500) - 0.5
x2 <- runif(500) - 0.5

# Generate binary class labels with a quadratic decision boundary
y <- 1 * (x1^2 - x2^2 > 0)  # y = 1 if x1^2 > x2^2, else 0

# Create a data frame
df <- data.frame(x1 = x1, x2 = x2, y = as.factor(y))

# Preview the first few rows (printing all 500 rows is unwieldy)
head(df)

              x1           x2 y
1   -0.212422480 -0.146393924 1
2    0.288305135 -0.133558555 1
3   -0.091023078 -0.212899869 0
4    0.383017404 -0.420027088 0
5    0.440467284 -0.134545730 1
6   -0.454443501 -0.321986185 1
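
A quick sanity check, not part of the original output, is to confirm the two classes are roughly balanced before modeling (they should be, since x1^2 > x2^2 holds with probability one half under the uniform design):

# Tabulate the class labels (output omitted here)
table(df$y)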

Part B

library(ggplot2)

ggplot(df, aes(x = x1, y = x2, color = y)) +
  geom_point(alpha = 0.7) +
  labs(title = "Data Colored by Class",
       x = "X1", y = "X2", color = "Class") +
  theme_minimal()

Part C

# Fit logistic regression using x1 and x2 as predictors
logit_model <- glm(y ~ x1 + x2, data = df, family = binomial)

# Summary of the model
summary(logit_model)

Call:
glm(formula = y ~ x1 + x2, family = binomial, data = df)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.04792    0.08949   0.535    0.592
x1          -0.03999    0.31516  -0.127    0.899
x2           0.11509    0.30829   0.373    0.709

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 692.86  on 499  degrees of freedom
Residual deviance: 692.71  on 497  degrees of freedom
AIC: 698.71

Number of Fisher Scoring iterations: 3

Part D

# Predict probabilities using the logistic regression model
prob_pred <- predict(logit_model, type = "response")

# Convert probabilities to class predictions (0 or 1)
y_pred <- ifelse(prob_pred > 0.5, 1, 0)

# Add predictions to the data frame
df$y_pred <- as.factor(y_pred)

# Plot predicted class labels
ggplot(df, aes(x = x1, y = x2, color = y_pred)) +
  geom_point(alpha = 0.7) +
  labs(title = "Logistic Regression Predictions (Linear Boundary)",
       x = "X1", y = "X2", color = "Predicted Class") +
  theme_minimal()

Part E

# Create engineered features
df$x1_sq <- df$x1^2
df$x2_sq <- df$x2^2
df$x1_x2 <- df$x1 * df$x2

# Fit logistic regression with non-linear terms
logit_nl <- glm(y ~ x1 + x2 + x1_sq + x2_sq + x1_x2, data = df, family = binomial)
Warning: glm.fit: algorithm did not converge
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
# View model summary
summary(logit_nl)

Call:
glm(formula = y ~ x1 + x2 + x1_sq + x2_sq + x1_x2, family = binomial, 
    data = df)

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)     -7.12    1170.02  -0.006    0.995
x1            -146.56   10983.22  -0.013    0.989
x2              55.75   11290.45   0.005    0.996
x1_sq        12828.87  479505.56   0.027    0.979
x2_sq       -12842.99  481258.81  -0.027    0.979
x1_x2          566.19   36927.06   0.015    0.988

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 6.9286e+02  on 499  degrees of freedom
Residual deviance: 2.3990e-06  on 494  degrees of freedom
AIC: 12

Number of Fisher Scoring iterations: 25
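
The convergence warnings are expected: because the model includes x1_sq and x2_sq, the linear predictor can reproduce the true rule x1^2 - x2^2 > 0 exactly, so the two classes are perfectly separable in the engineered feature space. The coefficient estimates therefore blow up, the fitted probabilities are driven to numerical 0 or 1, and the residual deviance is essentially zero. A quick check of the fitted values (a small sketch, not part of the original output) makes this visible:

# Fitted probabilities collapse to (numerically) 0 or 1 under perfect separation
range(fitted(logit_nl))

# Cross-tabulate the implied class predictions against the true labels
table(fitted(logit_nl) > 0.5, df$y)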

Part F

# Predict probabilities using the non-linear logistic regression model
prob_pred_nl <- predict(logit_nl, type = "response")

# Convert probabilities to class predictions
y_pred_nl <- ifelse(prob_pred_nl > 0.5, 1, 0)

# Add predictions to the data frame
df$y_pred_nl <- as.factor(y_pred_nl)

# Plot predicted class labels (non-linear decision boundary)
ggplot(df, aes(x = x1, y = x2, color = y_pred_nl)) +
  geom_point(alpha = 0.7) +
  labs(title = "Non-Linear Logistic Regression Predictions",
       x = "X1", y = "X2", color = "Predicted Class") +
  theme_minimal()

# (Exploratory) log-magnitude features; these are not used in the models that follow
df$log_x1 <- log(abs(df$x1) + 1e-5)
df$log_x2 <- log(abs(df$x2) + 1e-5)

Part G

library(e1071)

# Fit support vector classifier (linear kernel is default)
svc_model <- svm(y ~ x1 + x2, data = df, kernel = "linear", cost = 1, scale = FALSE)

# Predict class labels on training data
svc_pred <- predict(svc_model, df)

# Add predictions to the data frame
df$svc_pred <- svc_pred
ggplot(df, aes(x = x1, y = x2, color = svc_pred)) +
  geom_point(alpha = 0.7) +
  labs(title = "Support Vector Classifier (Linear Kernel)",
       x = "X1", y = "X2", color = "Predicted Class") +
  theme_minimal()

Part H

# Fit SVM with radial kernel
svm_rbf <- svm(y ~ x1 + x2, data = df, kernel = "radial", cost = 1, gamma = 1, scale = FALSE)

# Predict class labels
rbf_pred <- predict(svm_rbf, df)

# Add predictions to the data frame
df$svm_rbf_pred <- rbf_pred
ggplot(df, aes(x = x1, y = x2, color = svm_rbf_pred)) +
  geom_point(alpha = 0.7) +
  labs(title = "SVM with Radial Kernel (RBF)",
       x = "X1", y = "X2", color = "Predicted Class") +
  theme_minimal()

The support vector machine with a radial basis function (RBF) kernel produced a clearly non-linear decision boundary that closely reflects the true structure of the data. Unlike the linear logistic regression model and the linear support vector classifier, the RBF SVM successfully captured the underlying quadratic relationship between the predictors. The curved decision boundary effectively separates the two classes, with relatively few misclassifications, most of which occur near the boundary where the classification is naturally more uncertain. This result demonstrates the strength of kernel-based SVMs in handling non-linear classification problems without requiring explicit feature engineering.
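
As a numerical complement to the plots, a quick sketch (not part of the original output) compares the training-set misclassification rates of the four Problem 5 models, using the prediction columns already stored in df:

# Training misclassification rate for each model fit above
model_errors <- c(
  logistic_linear    = mean(df$y_pred       != df$y),
  logistic_nonlinear = mean(df$y_pred_nl    != df$y),
  svc_linear         = mean(df$svc_pred     != df$y),
  svm_radial         = mean(df$svm_rbf_pred != df$y)
)
round(model_errors, 3)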

Problem 7

Part A

library(ISLR2)
data(Auto)
# Remove missing values just in case
Auto <- na.omit(Auto)

# Compute the median mpg
mpg_median <- median(Auto$mpg)

# Create binary variable: 1 if mpg > median, else 0
Auto$mpg_high <- as.factor(ifelse(Auto$mpg > mpg_median, 1, 0))

Part B

# Remove the original mpg variable
auto_data <- Auto[, !(names(Auto) %in% c("mpg"))]

# Make sure 'mpg_high' is the response and a factor
auto_data$mpg_high <- as.factor(auto_data$mpg_high)
set.seed(1)  # for reproducibility

# Perform cross-validation with different cost values
tune_out <- tune(
  svm,
  mpg_high ~ .,
  data = auto_data,
  kernel = "linear",
  ranges = list(cost = c(0.01, 0.1, 1, 10, 100))
)

# Show best model and cross-validation performance
summary(tune_out)

Parameter tuning of 'svm':

- sampling method: 10-fold cross validation 

- best parameters:
 cost
  0.1

- best performance: 0.08673077 

- Detailed performance results:
   cost      error dispersion
1 1e-02 0.08923077 0.04698309
2 1e-01 0.08673077 0.04040897
3 1e+00 0.09961538 0.04923181
4 1e+01 0.11237179 0.05701890
5 1e+02 0.11750000 0.06208951

The cross-validation results indicate that smaller values of the cost parameter yield better classification performance, with cost = 0.1 giving the lowest CV error (about 8.7%) and cost = 0.01 close behind. This suggests that allowing a wider margin and tolerating some misclassification of the training data helps the model generalize. As the cost increases, the margin narrows and the classifier fits the training data more rigidly, and performance degrades. Therefore, a small cost such as the CV-selected value of 0.1 is preferred for this dataset.
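
The selected cost and the full cross-validation table are stored in the tune object, and plotting it gives a quick view of error versus cost (a brief sketch; output and plot not reproduced here):

# Inspect the CV-selected parameter and the full results table
tune_out$best.parameters
tune_out$performances

# Plot CV error as a function of cost
plot(tune_out)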

Part C

set.seed(1)

# Tune SVM with radial kernel
tune_radial <- tune(
  svm,
  mpg_high ~ .,
  data = auto_data,
  kernel = "radial",
  ranges = list(
    cost = c(0.1, 1, 10),
    gamma = c(0.5, 1, 2)
  )
)

summary(tune_radial)

Parameter tuning of 'svm':

- sampling method: 10-fold cross validation 

- best parameters:
 cost gamma
   10     1

- best performance: 0.07897436 

- Detailed performance results:
  cost gamma      error dispersion
1  0.1   0.5 0.08410256 0.04164179
2  1.0   0.5 0.08673077 0.04708817
3 10.0   0.5 0.09173077 0.04008042
4  0.1   1.0 0.55115385 0.04366593
5  1.0   1.0 0.07903846 0.04891067
6 10.0   1.0 0.07897436 0.04869339
7  0.1   2.0 0.55115385 0.04366593
8  1.0   2.0 0.13769231 0.06926822
9 10.0   2.0 0.13512821 0.06692968
set.seed(1)

# Tune SVM with polynomial kernel
tune_poly <- tune(
  svm,
  mpg_high ~ .,
  data = auto_data,
  kernel = "polynomial",
  ranges = list(
    cost = c(0.1, 1, 10),
    degree = c(2, 3, 4)
  )
)

summary(tune_poly)

Parameter tuning of 'svm':

- sampling method: 10-fold cross validation 

- best parameters:
 cost degree
   10      2

- best performance: 0.520641 

- Detailed performance results:
  cost degree     error dispersion
1  0.1      2 0.5511538 0.04366593
2  1.0      2 0.5511538 0.04366593
3 10.0      2 0.5206410 0.08505283
4  0.1      3 0.5511538 0.04366593
5  1.0      3 0.5511538 0.04366593
6 10.0      3 0.5511538 0.04366593
7  0.1      4 0.5511538 0.04366593
8  1.0      4 0.5511538 0.04366593
9 10.0      4 0.5511538 0.04366593

The polynomial kernel SVM consistently performed poorly, with cross-validation error rates around 55% regardless of cost or degree. This indicates that polynomial transformations of the input features failed to capture the underlying patterns in the data. In contrast, the radial kernel SVM achieved the best performance, particularly with cost = 10 and gamma = 1.0, yielding a cross-validation error of about 7.9%. These results suggest that a flexible, non-linear boundary such as that provided by the radial kernel is more appropriate for this classification task.
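
To make the comparison concrete, the best cross-validated error from each tuning run can be pulled directly from the tune objects defined above (a small sketch; output not reproduced here):

# Best 10-fold CV error for each kernel
data.frame(
  kernel        = c("linear", "radial", "polynomial"),
  best_cv_error = c(tune_out$best.performance,
                    tune_radial$best.performance,
                    tune_poly$best.performance)
)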

Part D

# Example: Use the best radial kernel model
svm_rbf_best <- tune_radial$best.model

# Plot decision boundary between horsepower and weight
plot(svm_rbf_best, auto_data, horsepower ~ weight)

# Try a few more pairs
plot(svm_rbf_best, auto_data, acceleration ~ weight)

plot(svm_rbf_best, auto_data, displacement ~ cylinders)

# Linear SVM from earlier
svm_linear <- tune_out$best.model

# Same variable pairs for consistency
plot(svm_linear, auto_data, horsepower ~ weight)

To support the findings from Parts (b) and (c), we compared SVM models using both linear and radial kernels. The radial SVM (with cost = 10, gamma = 1) produced curved, adaptive decision boundaries that effectively separated high and low MPG vehicles—particularly in plots like weight vs. horsepower—aligning with its low cross-validation error. Other variable pairs, such as weight vs. acceleration and cylinders vs. displacement, showed less clear separation due to overlapping features or discrete groupings.

In contrast, the linear SVM produced a straight boundary that worked moderately well but struggled in regions where classes overlap, especially in the mid-range of weight and horsepower. This limitation reflects its higher error rate compared to the radial model. Overall, the radial kernel’s flexibility allows it to better capture complex, nonlinear patterns in the data, leading to superior classification performance.
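
As a rough numerical complement to the plots (not part of the original output), the in-sample error of the two tuned models can be compared directly; note these are optimistic training-set figures, not cross-validation estimates:

# Training-set error rates of the tuned linear and radial models
mean(predict(svm_linear, auto_data)   != auto_data$mpg_high)
mean(predict(svm_rbf_best, auto_data) != auto_data$mpg_high)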

Problem 8

Part A

library(ISLR2)
data(OJ)

# Set seed for reproducibility
set.seed(123)

# Randomly select 800 indices for the training set
train_index <- sample(1:nrow(OJ), 800)

# Create training and test sets
train_data <- OJ[train_index, ]
test_data <- OJ[-train_index, ]

Part B

# Fit the support vector classifier with cost = 0.01
svm_fit <- svm(Purchase ~ ., data = train_data, kernel = "linear", cost = 0.01, scale = TRUE)

# Summary of the fitted model
summary(svm_fit)

Call:
svm(formula = Purchase ~ ., data = train_data, kernel = "linear", 
    cost = 0.01, scale = TRUE)


Parameters:
   SVM-Type:  C-classification 
 SVM-Kernel:  linear 
       cost:  0.01 

Number of Support Vectors:  442

 ( 220 222 )


Number of Classes:  2 

Levels: 
 CH MM
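
The summary reports 442 support vectors, split almost evenly between the two classes (220 for CH, 222 for MM); with such a small cost the margin is wide, so more than half of the 800 training observations act as support vectors. The per-class counts can also be read off the fitted object (a small check, not part of the original output):

# Number of support vectors per class; should match the 220 / 222 above
svm_fit$nSV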

Part C

# 1. Predict on training data
train_pred <- predict(svm_fit, train_data)

# 2. Calculate training error rate
train_error <- mean(train_pred != train_data$Purchase)
print(paste("Training Error Rate:", round(train_error, 4)))
[1] "Training Error Rate: 0.165"
# 3. Predict on test data
test_pred <- predict(svm_fit, test_data)

# 4. Calculate test error rate
test_error <- mean(test_pred != test_data$Purchase)
print(paste("Test Error Rate:", round(test_error, 4)))
[1] "Test Error Rate: 0.1778"

Part D

# Load e1071 if not already loaded
library(e1071)

# Set seed for reproducibility
set.seed(123)

# Tune SVM using 10-fold CV with linear kernel and various cost values
tuned_svm <- tune(
  svm,
  Purchase ~ .,
  data = train_data,
  kernel = "linear",
  ranges = list(cost = c(0.01, 0.1, 0.5, 1, 5, 10))
)

# View results
summary(tuned_svm)

Parameter tuning of 'svm':

- sampling method: 10-fold cross validation 

- best parameters:
 cost
    1

- best performance: 0.16875 

- Detailed performance results:
   cost   error dispersion
1  0.01 0.17375 0.04910660
2  0.10 0.17500 0.04823265
3  0.50 0.17000 0.04216370
4  1.00 0.16875 0.03963812
5  5.00 0.17250 0.04241004
6 10.00 0.17000 0.04005205

Part E

best_svm <- tuned_svm$best.model
train_pred_best <- predict(best_svm, train_data)
train_error_best <- mean(train_pred_best != train_data$Purchase)
print(paste("Training Error Rate (Best Model):", round(train_error_best, 4)))
[1] "Training Error Rate (Best Model): 0.16"
test_pred_best <- predict(best_svm, test_data)
test_error_best <- mean(test_pred_best != test_data$Purchase)
print(paste("Test Error Rate (Best Model):", round(test_error_best, 4)))
[1] "Test Error Rate (Best Model): 0.1556"

Part F

# Fit radial SVM with cost = 0.01 and default gamma
svm_radial <- svm(Purchase ~ ., data = train_data, kernel = "radial", cost = 0.01)

# View model summary
summary(svm_radial)

Call:
svm(formula = Purchase ~ ., data = train_data, kernel = "radial", 
    cost = 0.01)


Parameters:
   SVM-Type:  C-classification 
 SVM-Kernel:  radial 
       cost:  0.01 

Number of Support Vectors:  629

 ( 313 316 )


Number of Classes:  2 

Levels: 
 CH MM
# Predict on training set
train_pred_radial <- predict(svm_radial, train_data)
train_error_radial <- mean(train_pred_radial != train_data$Purchase)
print(paste("Radial SVM Training Error (cost = 0.01):", round(train_error_radial, 4)))
[1] "Radial SVM Training Error (cost = 0.01): 0.3912"
# Predict on test set
test_pred_radial <- predict(svm_radial, test_data)
test_error_radial <- mean(test_pred_radial != test_data$Purchase)
print(paste("Radial SVM Test Error (cost = 0.01):", round(test_error_radial, 4)))
[1] "Radial SVM Test Error (cost = 0.01): 0.3852"
# Tune radial SVM with several cost values
set.seed(123)
tuned_radial <- tune(
  svm,
  Purchase ~ .,
  data = train_data,
  kernel = "radial",
  ranges = list(cost = c(0.01, 0.1, 0.5, 1, 5, 10))
)

# View best model summary
summary(tuned_radial)

Parameter tuning of 'svm':

- sampling method: 10-fold cross validation 

- best parameters:
 cost
    1

- best performance: 0.16125 

- Detailed performance results:
   cost   error dispersion
1  0.01 0.39125 0.04411554
2  0.10 0.17625 0.06108112
3  0.50 0.16875 0.05472469
4  1.00 0.16125 0.04875178
5  5.00 0.16500 0.03717451
6 10.00 0.17000 0.04794383
# Extract best model
best_radial <- tuned_radial$best.model

# Training error
train_pred_best_radial <- predict(best_radial, train_data)
train_error_best_radial <- mean(train_pred_best_radial != train_data$Purchase)
print(paste("Radial SVM Training Error (Best Model):", round(train_error_best_radial, 4)))
[1] "Radial SVM Training Error (Best Model): 0.1388"
# Test error
test_pred_best_radial <- predict(best_radial, test_data)
test_error_best_radial <- mean(test_pred_best_radial != test_data$Purchase)
print(paste("Radial SVM Test Error (Best Model):", round(test_error_best_radial, 4)))
[1] "Radial SVM Test Error (Best Model): 0.1889"

Part G

# Fit polynomial kernel SVM
svm_poly <- svm(Purchase ~ ., data = train_data, kernel = "polynomial", cost = 0.01, degree = 2)

# Model summary
summary(svm_poly)

Call:
svm(formula = Purchase ~ ., data = train_data, kernel = "polynomial", 
    cost = 0.01, degree = 2)


Parameters:
   SVM-Type:  C-classification 
 SVM-Kernel:  polynomial 
       cost:  0.01 
     degree:  2 
     coef.0:  0 

Number of Support Vectors:  631

 ( 313 318 )


Number of Classes:  2 

Levels: 
 CH MM
# Predict on training set
train_pred_poly <- predict(svm_poly, train_data)
train_error_poly <- mean(train_pred_poly != train_data$Purchase)
print(paste("Polynomial SVM Training Error (cost = 0.01):", round(train_error_poly, 4)))
[1] "Polynomial SVM Training Error (cost = 0.01): 0.3725"
# Predict on test set
test_pred_poly <- predict(svm_poly, test_data)
test_error_poly <- mean(test_pred_poly != test_data$Purchase)
print(paste("Polynomial SVM Test Error (cost = 0.01):", round(test_error_poly, 4)))
[1] "Polynomial SVM Test Error (cost = 0.01): 0.3741"
# Tune polynomial kernel SVM
set.seed(123)
tuned_poly <- tune(
  svm,
  Purchase ~ .,
  data = train_data,
  kernel = "polynomial",
  degree = 2,
  ranges = list(cost = c(0.01, 0.1, 0.5, 1, 5, 10))
)

# Summary of tuning results
summary(tuned_poly)

Parameter tuning of 'svm':

- sampling method: 10-fold cross validation 

- best parameters:
 cost
    5

- best performance: 0.16875 

- Detailed performance results:
   cost   error dispersion
1  0.01 0.39000 0.04281744
2  0.10 0.31000 0.05361903
3  0.50 0.20000 0.06291529
4  1.00 0.19250 0.06645801
5  5.00 0.16875 0.05958479
6 10.00 0.17125 0.06010696
# Extract best model
best_poly <- tuned_poly$best.model

# Predict and calculate training error
train_pred_best_poly <- predict(best_poly, train_data)
train_error_best_poly <- mean(train_pred_best_poly != train_data$Purchase)
print(paste("Polynomial SVM Training Error (Best Model):", round(train_error_best_poly, 4)))
[1] "Polynomial SVM Training Error (Best Model): 0.1462"
# Predict and calculate test error
test_pred_best_poly <- predict(best_poly, test_data)
test_error_best_poly <- mean(test_pred_best_poly != test_data$Purchase)
print(paste("Polynomial SVM Test Error (Best Model):", round(test_error_best_poly, 4)))
[1] "Polynomial SVM Test Error (Best Model): 0.2037"

We compared linear, radial, and polynomial (degree = 2) SVMs on the OJ dataset. The tuned linear SVM (cost = 1) achieved the lowest test error (15.56%), indicating that the decision boundary in this dataset is approximately linear. The radial SVM fit the training data better but generalized slightly worse, with a test error of 18.89%. The polynomial SVM performed worst, with a test error of 20.37%. Overall, the linear kernel provided the best balance of accuracy and generalization.
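
The comparison can be collected into one small table from the error rates computed above (a sketch; output not reproduced here; the test set is the 270 held-out observations):

# Test error of the tuned model for each kernel
data.frame(
  kernel     = c("linear", "radial", "polynomial (degree 2)"),
  test_error = c(test_error_best, test_error_best_radial, test_error_best_poly)
)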

Part H

Based on the results from all models, the linear SVM provides the best overall performance on the OJ dataset. It achieved the lowest test error rate (15.56%), suggesting it generalizes best to unseen data. Although the radial and polynomial kernels offered more flexibility, they resulted in higher test error rates due to overfitting or model mismatch. Therefore, the linear kernel is the most effective and reliable choice for this classification task.