R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

Chapter 1:

1.1: Recognizing a road sign with kNN

After several trips with a human behind the wheel, it is time for the self-driving car to attempt the test course alone.

As it begins to drive away, its camera captures the following image:

[Image: Stop Sign]

Can you apply a kNN classifier to help the car recognize this sign?

Instructions

The dataset signs is loaded in your workspace along with the data frame next_sign, which holds the observation you want to classify.

Load the class package.

Create a vector of sign labels to use with kNN by extracting the column sign_type from signs.

Identify the next_sign using the knn() function.

Set the train argument equal to the signs data frame without the first column.

Set the test argument equal to the data frame next_sign.

Use the vector of labels you created as the cl argument.
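
The steps above can be sketched on a small stand-in data set. This is a minimal illustration, not the exercise solution: `toy_signs` and `toy_next_sign` are hypothetical names with invented values, and they only mimic the layout of `signs` (label column first, colour features after it) and `next_sign` (features only).

```r
# Load the 'class' package, which provides knn()
library(class)

# Hypothetical stand-in for `signs`: label column first, then colour features.
# These values are invented for illustration and are not taken from signs.csv.
toy_signs <- data.frame(
  sign_type = c("pedestrian", "pedestrian", "speed", "speed", "stop", "stop"),
  r1 = c(200, 210, 100, 110, 180, 190),
  g1 = c(220, 230, 110, 120,  40,  50),
  b1 = c( 30,  40, 120, 130,  45,  55)
)

# Hypothetical stand-in for `next_sign`: one unlabelled observation
toy_next_sign <- data.frame(r1 = 182, g1 = 42, b1 = 47)

# Vector of labels, extracted from the sign_type column
sign_types <- toy_signs$sign_type

# train drops the first (label) column; knn() uses k = 1 unless told otherwise
prediction <- knn(train = toy_signs[-1], test = toy_next_sign, cl = sign_types)
print(prediction)
```

Because the toy observation lies closest to one of the invented stop rows, the single nearest neighbour labels it stop. With the real data, the same call shape would take `signs[-1]`, `next_sign`, and the label vector built from `signs$sign_type`.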

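# Load the 'class' package (uncomment the next line to install it first)
# install.packages("class")
library(class)

# Read the sign data from CSV and print the full data frame
signs <- read.csv("signs.csv")
signs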
##      sign_type  r1  g1  b1  r2  g2  b2  r3  g3  b3  r4  g4  b4  r5  g5  b5
## 1   pedestrian 155 228 251 135 188 101 156 227 245 145 211 228 166 233 245
## 2   pedestrian 142 217 242 166 204  44 142 217 242 147 219 242 164 228 229
## 3   pedestrian  57  54  50 187 201  68  51  51  45  59  62  65 156 171  50
## 4   pedestrian  22  35  41 171 178  26  19  27  29  19  27  29  42  37   3
## 5   pedestrian 169 179 170 231 254  27  97 107  99 123 147 152 221 236 117
## 6   pedestrian  75  67  60 131  89  53 214 144  75 156 169 190  67  50  36
## 7   pedestrian 136 149 157 200 203 107 150 167 134 171 218 252 171 158 108
## 8   pedestrian 149 225 241  34  45   1 155 226 238 147 222 242 170 191 113
## 9   pedestrian  13  34  28   5  21  11 123 154 140  21  46  41  36  60  26
## 10  pedestrian 123 124 107  83  61  26 116 124 115  67  67  52  70  53  26
## 11  pedestrian 129 141 137  35  42  37  36  28  12  44  53  49 138 148 141
## 12  pedestrian 131 148 140  61  42  10  93 114 108  27  38  34  52  41  20
## 13  pedestrian 122 141 133  12  28  11 163 188 178 133 154 146  86 113  80
## 14  pedestrian 171 193 181  30  49  26  93 119  98 179 201 188  59  84  44
## 15  pedestrian  53  66  58   1  13   2  82  99  67 117 131 123  12  27   5
## 16  pedestrian 211 235 226  37  50  45 146 164 156 211 235 226  37  41  28
## 17  pedestrian  21  34  28  10  29   4 146 171 145  60  74  68  68  93  52
## 18  pedestrian  86 105  98  83  87  71  54  70  65  60  74  68 110 119 109
## 19  pedestrian 171 197 187  60  83  60 163 186 173  39  52  48  75 109  83
## 20  pedestrian  99  92  75 196 205  85 133 139  41  83  76  60 226 235 109
## 21  pedestrian  26  37  28  45  67  19  53  66  58  91 107  99  11  28   5
## 22  pedestrian  60  79  74 135 172  70 100 123  81 106 124 117  92 123  50
## 23  pedestrian  65  79  72  60  89  36  17  35  12  99 108  92 170 204 124
## 24  pedestrian  75  84  76  82 116  52 119 112  89 126 120  96  50  68  44
## 25  pedestrian  42  45  37 182 226  77 166 204  80  54  57  49 187 230  76
## 26  pedestrian 219 229 223 148 187  55 181 221  66 178 187 180 181 221  66
## 27  pedestrian 155 170 164 152 123  76 165 148 118  38  41  36  66  60  43
## 28  pedestrian  57  62  54  88 124  28 173 206 132 124 139 131  62  99   7
## 29  pedestrian  53  67  57  97 131  36  96 122  61 106 125 115 133 172  40
## 30  pedestrian  52  58  45  37  60   4 209 238 151  46  52  42  42  64   5
## 31  pedestrian 221 241 238  38  49  36  33  44  30  92 105  93  57  67  54
## 32  pedestrian  50  69  60  54  90  13 197 235 236 201 238 237 124 162  59
## 33  pedestrian  35  43  37  60  74  51 120 147  84  51  59  52  92 130  44
## 34  pedestrian  37  43  36 107 137  68  74  99  39  69  76  66  20  51   1
## 35  pedestrian  49  62  52 193 228 122  84 110  51  45  59  50  26  51   5
## 36  pedestrian  44  52  42 177 220 100  66  85  29 151 165 156 157 196  84
## 37  pedestrian 191 189 186 131  92  43  60  44  27  61  58  51 171 126  69
## 38  pedestrian 147 170 164  52  75  22 209 236 233 205 233 229 157 190 108
## 39  pedestrian 193 222 217  90 124  43 166 194 186 188 218 212 209 237 227
## 40  pedestrian  53  53  42 190 196 172 173 183 107  81  78  65 148 156  84
## 41  pedestrian  35  43  35 123 139  76 169 187 100  43  51  44 155 172  99
## 42  pedestrian  84  99  90  61  49  13 126 133 117 194 220 212  99  86  53
## 43  pedestrian  67  68  59 144 149 132 108 125 115  52  59  50  75  70  50
## 44  pedestrian 146 157 147  50  35  11 116 100  69  51  60  51  76  69  45
## 45  pedestrian  56  62  54  55  81  19 140 163 100  26  35  21  53  76  19
## 46  pedestrian 188 212 205 148 173 156 133 162 124  46  60  50 202 224 212
## 47       speed  76  82  60 204 212 187 226 229 196  99 105  83 186 197 179
## 48       speed 163 123  76 254 236 195 204 171 124 195 155 100 253 229 187
## 49       speed 102 113 118  94 105 110  83  93  98  83  81  76 109 121 126
## 50       speed  88  93 106  85  90 102  85  90 102 158 198 250 117 124 154
## 51       speed  98 110 121  98 110 121  30  41  50  28  29  29  83  94 105
## 52       speed 196 176 155 180 170 155 172 166 154 148 126 107  68  68  67
## 53       speed 139 125 116 187 181 178 204 194 187  58  54  52  51  42  36
## 54       speed  83  92 100 106 125 140 115 135 153  89  94  98  89 100 109
## 55       speed  99 105 118  94 100 113  86  92 106  41  38  42  44  51  67
## 56       speed  52  66  76  99 117 132 101 121 134 171 129  85  25  38  49
## 57       speed  29  33  36  94 106 116 107 116 125  36  28  23  27  30  34
## 58       speed  76  99 124 110 138 165 114 142 170  75  76  77  84 107 131
## 59       speed 234 254 236 234 254 236 228 251 229  86  99  90 222 243 226
## 60       speed   6  14  17  44  61  65  39  56  60  92 108 105  26  38  41
## 61       speed  68  91  90  68  91  90  77  98  98  29  33  30  27  41  38
## 62       speed  24  38  38  59  77  76  59  77  76 156 180 170   5  18  18
## 63       speed  44  60  59 109 132 130 109 132 130 202 230 227 109 132 130
## 64       speed  29  44  43 140 162 154 122 140 133 205 231 226  20  37  35
## 65       speed  42  46  37 187 196 186 155 164 155 187 196 186 140 148 139
## 66       speed 125 145 141 121 142 138 118 140 136 137 160 148  82  95  89
## 67       speed 108 131 126 116 137 133 118 140 137  36  52  50 161 180 173
## 68       speed   3  12  12  74  93  92  74  93  92  36  45  43  12  21  20
## 69       speed  36  51  51  68  90  90  66  87  88  52  65  61  52  65  61
## 70       speed  44  60  58  66  84  83  66  84  83  30  41  38  52  68  67
## 71       speed 170 207 208 166 185 173 113 127 115 179 228 234 162 196 196
## 72       speed 100 114 108  75  94  90 146 166 161  84  91  83  84  91  83
## 73       speed 116 122 108 122 129 116  60  65  54  85  97  86 179 189 178
## 74       speed  43  58  60  82 109 115  78 108 113  25  36  35  89 103 104
## 75       speed  68  82  84 114 140 148 109 138 147  75  82  77  37  43  42
## 76       speed  36  44  42  86 104 102  75  93  91  41  49  46  50  58  54
## 77       speed  44  44  43 127 136 140 117 126 129  82  82  77  52  53  50
## 78       speed  77  69  58 131 133 130 123 125 123  84  77  66 131 125 115
## 79       speed 156 170 165 229 234 218 234 237 220  59  58  45 130 135 123
## 80       speed  20  35  36 147 173 173 148 175 176  67  78  74  38  48  46
## 81       speed  27  36  35 117 134 136 123 138 139  91 101 100 106 117 116
## 82       speed  52  66  61 108 129 124 106 126 122  27  37  34 172 197 195
## 83       speed  67  82  77 117 138 132 110 131 125 195 217 213  13  28  27
## 84       speed  67  75  75  92 101 101  94 105 105  28  36  35  59  68  68
## 85       speed  59  77  75  94 115 113  97 117 115 166 196 196  12  27  27
## 86       speed  58  67  62 107 124 116 102 118 113  59  69  65  58  67  62
## 87       speed 203 202 197 210 213 211 213 217 214 109  92  82 211 210 205
## 88       speed 210 229 227 101 121 118 101 121 118 236 255 254 236 255 254
## 89       speed 214 233 234  92 107 107  46  58  58  65  77  77 140 155 155
## 90       speed  52  58  52 125 149 145 123 147 141  60  66  60  37  44  41
## 91       speed  67  78  73 148 178 172 148 178 172  35  46  43  57  67  61
## 92       speed 100 121 115 175 209 204 173 203 198  29  42  34 193 214 210
## 93       speed  53  66  59 142 170 169 159 189 184  53  65  54  43  52  43
## 94       speed  85  97  93 156 180 179 156 180 179  84  90  84 122 134 130
## 95       speed 141 162 157 149 173 169 146 170 164 125 145 141 100 114 109
## 96        stop 226 244 251  77  41  50  75  35  43 216 220 220  97 110 129
## 97        stop 163 164 165  62  19  28  62  19  28  43  46  50  76  94 115
## 98        stop 187 172 156 140  49  50 132  44  44 188 185 181 144  83  77
## 99        stop 160 164 163 163  42  45 178  49  51 190 218 244 188 172 154
## 100       stop 149 208 250 178  58  61 171  57  59 147 206 250 188 181 170
## 101       stop 148 179 186 195  60  53 180  58  51 128 172 188 216 160 140
## 102       stop 147 218 237 205  45  44 211  44  45 140 219 242 169  75  53
## 103       stop 195 242 230  46  37  34  46  37  34 188 237 228  58  70  65
## 104       stop  43  44  45  81  37  42  81  37  42 157 158 146  84  98 120
## 105       stop 149 180 162  46  54  57  44  50  52 124 155 140  59  83  83
## 106       stop  26  38  42  37  33  37  40  35  41  30  46  49  36  30  35
## 107       stop 146 137 132  92  37  49  99  41  52  84  69  59 131 141 158
## 108       stop  18  21  20  41  13  15  43  14  17  13  19  18  62  76  74
## 109       stop  36  43  28 213  89  85 227  76  76  72  77  53 236 254 252
## 110       stop 188 217 221  68  22  25  68  22  25  60  68  59  81  54  54
## 111       stop 191 218 221  82  26  27  82  26  27 106 114 110 115 133 131
## 112       stop 147 180 186  30  14  18  32  14  18 164 186 180  50  67  69
## 113       stop  67  68  43 218  96  87 229  82  75  83  80  51 242 196 184
## 114       stop  91 172 227 202  91  92 157 205 229  89 166 221 234  99  99
## 115       stop  35  45  42  77  20  21  74  15  17  91 108 107  92  87  88
## 116       stop 171 194 189  91  58  51  91  58  51 195 222 217 101 121 117
## 117       stop  46  54  49  83  34  36  83  34  36  42  44  38 114 126 121
## 118       stop  25  34  30  57  34  34  57  34  34  25  34  30  57  44  44
## 119       stop  90  98  92 116  37  38 116  37  38 186 205 202 106  30  29
## 120       stop 195 230 233  52  22  22  60  30  32 194 228 230 194 228 230
## 121       stop  44  50  44  85  38  40  85  41  42  60  66  60 132 146 140
## 122       stop  33  45  41  94  39  41  93  41  42  62  75  73 124 145 142
## 123       stop  91 106 100  77  42  43  77  42  43 122 141 139 107 125 123
## 124       stop  42  43  37  77  29  29  77  29  29  46  49  43 110 121 121
## 125       stop  21  33  28  91  42  43  85  35  36 203 226 222 164 186 181
## 126       stop 116 114 108 125  40  39 123  38  37  33  30  26 123  38  37
## 127       stop 180 182 185 133  33  33 109  28  27 139 137 134 131  60  60
## 128       stop  33  35  30  83  27  22  83  27  22  35  37  33 116  99  92
## 129       stop  27  28  20  61  11  11  53  10  10  92 106 101  66  14  16
## 130       stop  50  52  45  89  30  28  85  27  26  20  27  21 118 138 131
## 131       stop  55  64  55  83  25  23  83  25  23 163 185 180  92  74  69
## 132       stop 117 114 106 147  43  44 147  43  44  94  89  77 196 188 187
## 133       stop 179 195 188  67  22  24  65  21  23  28  35  28 106 115 109
## 134       stop 109 129 124  76  27  28  76  27  28 117 137 131 123 138 132
## 135       stop  26  30  21  93  29  27  93  29  27 147 163 156 139 156 149
## 136       stop 203 202 197 172  68  67 172  68  67 211 213 211 187 146 141
## 137       stop 130 131 108 236  69  75 236  75  82 194 210 204 236  69  75
## 138       stop 188 211 203  85  34  30  84  30  28 151 176 171 148 171 165
## 139       stop 173 206 211 149  76  68 156  83  75 168 198 201 141 159 148
## 140       stop 154 178 173  68  28  27  68  28  27 179 204 195  90 101  93
## 141       stop 217 238 234 101  37  36 124  41  38 213 237 233 147 158 149
## 142       stop 167 190 185 186  42  41 180  37  36 186 209 198 187  41  37
## 143       stop  83  98  84 171  70  51 172  74  53 179 204 197 173 196 187
## 144       stop 130 155 155  65  30  29  65  30  29 139 164 163  61  49  44
## 145       stop  74  78  65 145  56  55 124  57  54 204 235 234 125  34  34
## 146       stop 162 187 181 114  47  38 115  51  41 174 201 194 140 132 124
##      r6  g6  b6  r7  g7  b7  r8  g8  b8  r9  g9  b9 r10 g10 b10 r11 g11
## 1   212 254  52 212 254  11 188 229 117 170 216 120 211 254   3 212 254
## 2    84 116  17 217 254  26 155 203 128 213 253  51 217 255  21 217 255
## 3   254 255  36 211 226  70  78  73  64 220 234  59 254 255  51 253 255
## 4   217 228  19 221 235  20 181 183  73 237 234  44 251 254   2 235 243
## 5   205 225  80 235 254  60  90 110   9 216 236  66 229 255  12 235 254
## 6    37  36  42  44  42  44 192 131  73 123  74  22  36  34  37  44  42
## 7   157 186  11  26  35  10 180 211 236 129 109  73 161 190  10 161 190
## 8    26  37  12  34  45  19 221 249 184 226 246  59  30  40  34  34  44
## 9    75 108  44  13  27  25 133 163 126  83 125  19  13  27  25   9  23
## 10   26  26  21  52  45  27 117 109  83 110  74  12  98  70  26  20  21
## 11   60  45  18   9  13  17  29  37  33  59  42  12  20  19  11  28  28
## 12    4   9  10  44  33  12  58  70  65  61  42  10  20  18  10  11  12
## 13   69 106  27  76 112  30 116 138 126 122 140 125  69 106  27  74 109
## 14   60  92  26  60  92  26  51  76  35  27  44  26  60  92  26  14  26
## 15   76 100  35  76  99  43  60  76  50  26  43  14  68  94  20  70  97
## 16   21  27  22  85  66  28 211 235 226  66  76  69  90  67  26  76  61
## 17   85 124  34  10  29   4  67  83  69  50  62  57  82 123  27  82 123
## 18  100  78  20  35  35  19  25  34  30  33  42  38 122 101  43  41  38
## 19   77 115  34  39  65  26  21  33  29 101 132 121  77 116  27  76 116
## 20   66  77  36  67  78  44 140 132 114 238 255  25  66  75  53  67  78
## 21  163 204   4 146 179  59  60  75  67  94 121  46 163 204   4 166 209
## 22   27  42  29  28  51  12  44  58  44 147 196  12  28  43  33  33  53
## 23  174 226  43 165 220  12 131 141 124  34  45  37 166 222   6 165 220
## 24  148 195  66 155 213  14 127 121  94 139 172  98 160 217  27 155 213
## 25   13  25  11   6  25   1  98 124  45 179 229   2  13  22  17   5  14
## 26   25  35  27  26  45   3 148 180  54 198 249   2  22  31  26  21  34
## 27   76  66  44 178 126  50  38  41  36  78  90  84 180 126  44 172 121
## 28  178 227  10 169 214  27 116 131 123  65  73  62 175 225   5 178 227
## 29   87 115  46 190 233  35  44  58  45  36  50  38 186 228  43 188 233
## 30  198 241  51  55  76  18  90 115  35  45  58  27 202 246  18 202 246
## 31  181 229   1 181 229   1  27  37  26  38  49  36 185 233   1 181 229
## 32  156 204  12 156 204  12 201 238 237 108 140 138 156 204  12 156 204
## 33  149 203  20  18  36  21  49  60  45  43  51  45 146 197  21 149 203
## 34  187 236  36 188 236  44  43  50  37  38  50  28 186 234  19 179 229
## 35  177 221  35 158 195  66  41  60  28  59  74  54 171 215  35 173 218
## 36  188 237  42 188 236  59  50  59  45  36  44  35 188 237  36 188 237
## 37   33  30  32  43  36  27  77  73  67 179 118  30  29  28  28  29  28
## 38  150 189  28  21  35  20 201 227 222  41  60  25 153 194  19 153 194
## 39   41  58  45 149 193  44 188 218 212 168 202 155  44  59  51 145 190
## 40   30  36  11  60  68   2 125 132  55 195 206  20  25  29  19  36  43
## 41   11  21   2  13  27   1 171 188 107 170 194  12  20  27  20  13  27
## 42   52  46  27  99  76  35 178 203 197 131  94  27  99  80  43 134 104
## 43   38  44  37  45  33   5  42  45  37 158 113  21  27  32  29  36  26
## 44   60  46  19  28  26  13  92  83  58  85  58  13 132 107  59 109  89
## 45  175 209  68 152 181  74 101 122  67 140 163 100 171 208  36 162 197
## 46   27  45  12  30  52   4  85 108  51 146 180  35  30  41  34  27  38
## 47   69  73  44 253 254 219  99 105  83  60  68  58 123 133 115  35  44
## 48  254 235 187 254 243 189 163 123  76 155 123  84 254 236 195  60  52
## 49   45  57  62 117 128 134  75  85  90  44  53  58  21  29  33 118 130
## 50   58  60  69  77  82  93 163 201 250 140 138 156  61  66  76  85  90
## 51   28  37  43 101 114 124  43  41  38  28  37  43  46  57  66  28  37
## 52   99 101  99 173 177 173  75  68  61 162 166 163 173 177 173 164 172
## 53   99  93  91 202 190 180  35  26  20 178 174 172 189 190 194 138 133
## 54   84  98 108 117 137 151 167 167 161  84  98 108  69  81  91  91 109
## 55   27  30  41  97 102 115  52  49  52 107 108 117  90  94 106  35  36
## 56   21  34  43  99 117 132 173 146 119  41  55  65  99 117 132  28  42
## 57   10  14  19 109 123 133  36  28  23  18  22  27  68  77  85 110 125
## 58   84 110 137 122 147 174  53  54  57  92 113 134 122 140 158 122 147
## 59   61  84  66 238 255 241  82  84  68 211 229 211 122 140 124  50  60
## 60   41  58  61  41  58  61  43  52  52  19  21  20  41  58  61   6  14
## 61   85 106 105   5  18  18  34  36  34  60  74  68   5  18  18  13  26
## 62    1  13  12  59  77  76 171 187 171  34  49  45  74  90  90  17  28
## 63   83 101  98 105 126 124 196 221 218 173 197 194 114 134 131  13  26
## 64   59  78  74 139 160 151 203 228 220  77  97  92  29  44  43  29  44
## 65   20  27  20 190 200 190  26  30  26 172 180 171  35  42  36  29  37
## 66  113 134 131 116 137 133  51  65  62  83  98  93 121 142 138  27  45
## 67   85 108 106 113 134 130  45  58  52  25  38  37 114 132 125  93 113
## 68   13  23  24  74  93  92  20  28  28  26  33  30  34  47  48  20  28
## 69   12  27  27  66  86  85  30  40  37  66  86  85  66  86  85  18  29
## 70   46  64  63  62  82  81  30  41  38  42  53  50  66  84  83  20  33
## 71  133 154 147 181 203 188 181 218 219  75  90  77 189 210 196 186 206
## 72  141 162 155 141 162 155  61  66  59  84  91  83  75  94  90  60  84
## 73  226 236 222 203 213 202 164 174 162 203 213 202 226 237 226 235 251
## 74   22  34  34  84 113 118  22  34  34  97 114 116  14  33  37  84 113
## 75   12  27  34 122 145 150  83  90  85  13  25  29  45  66  74 108 133
## 76   78  97  94  89 105 103  36  44  42  50  58  54  13  28  27  75  93
## 77  117 126 129 119 128 131  52  53  50  61  61  58 121 130 133  73  78
## 78   27  28  27 131 133 130 100  93  82  12  13  11  20  21  19  94  98
## 79  229 234 218 234 237 220  45  49  38 164 173 162 250 252 235 234 237
## 80  154 176 174 141 167 169  50  58  54 196 213 210 147 173 173 117 142
## 81   27  36  35 110 129 129  91 101 100  98 109 108  10  19  21  14  25
## 82  121 138 134 105 123 118  60  74  69 131 149 146  21  33  29 105 123
## 83  114 134 129 117 138 132  35  46  41  17  30  29  17  30  29 110 131
## 84   14  23  25  12  20  20  20  25  22  41  46  42  20  27  26  12  20
## 85    5  20  20  93 113 111 157 186 187  12  27  27  12  27  27  94 115
## 86  117 134 129 107 124 116  66  68  62  59  69  65 118 137 130 107 124
## 87  130 126 121 213 217 214  92  73  62 196 194 189 156 162 162 206 211
## 88   18  29  29  37  50  50 236 255 254 236 255 254  33  44  45 110 131
## 89   92 107 107  61  74  75  85  99  99 214 233 234  26  36  37  90 104
## 90  125 149 145 131 154 149  57  62  57  46  53  49   5  20  18  36  52
## 91  148 178 172 157 185 180  52  62  57  44  53  50  37  49  46  19  36
## 92  132 153 148 178 205 202  49  62  51 132 161 157  28  52  50  12  36
## 93  157 186 179 157 186 179  59  69  59 113 127 121  78  99  98  59  77
## 94  124 147 148 163 187 187  41  44  38  92 105 101  29  46  49  14  26
## 95  149 173 169 146 170 164 115 134 129 100 114 109 125 148 145  42  53
## 96   91 102 121  90  99 117  83  94 114  93 106 124  85  98 116  84  77
## 97   77  91 108  76  94 115  83 100 121  90 102 121  83 100 121  62  19
## 98  187 172 156 163 100  91 179 157 140 194 173 155 203 186 164 172 118
## 99  179 155 141 188 172 154 179 155 141 164 154 146 180 163 148 172 156
## 100 203 188 177 190 185 171 194 156 147 202 173 163 187 179 166 163  83
## 101 185 190 172 250 236 210 180  58  51 219  69  57 206 118  98 196  66
## 102 244 252 228 252 253 235 211  44  45 179  74  59 253 235 212 252 252
## 103  54  69  65  59  67  61  59  67  61  59  73  67  53  62  57  46  37
## 104  82  94 114  83  91 109  82  94 114  76  50  60  82  94 114  77  36
## 105  52  67  68  46  75  73  59  83  83  52  67  68  57  68  68  44  50
## 106  37  53  58  50  74  82  37  33  37  36  30  35  37  33  37  37  33
## 107 127 142 161 130 142 161 107  73  83  93  34  43 124  99 109 130 142
## 108  67  77  75  67  69  67  51  43  43  45  35  35  70  83  81  37  12
## 109 227 250 244 244 255 253 245 226 221 233 179 172 220 246 241 220 156
## 110  91 107 101  99 114 109  60  27  28  68  19  21  92  90  86  60  19
## 111 107 117 114  88  63  64  78  25  26 100  85  83 115 133 131  85  65
## 112  50  67  69  50  67  69  30  14  18  33  26  29  50  67  69  30  14
## 113 227 241 229 242 204 191 229  91  83 229  82  75 227 219 205 220  75
## 114 228 252 243 234  99  99  98 180 234 101 187 238 194  84  85 244 235
## 115 107 126 130  93  82  83  61  27  28  90  67  69 109 129 131  81  22
## 116 100 118 115 101 121 117 107 118 114  91  78  73 107 118 114 106 122
## 117 109 122 117 114 126 121 109 122 117  77  52  50 109 122 117 130 131
## 118  67  84  82  67  77  75  57  34  34  57  44  44  63  82  80  52  40
## 119 163 181 179 165 187 185 116  40  38 116  40  38 170 157 156 163 181
## 120  59  28  28  52  22  22 190 225 228 194 228 230  59  28  28  74  92
## 121 124 138 133 110 106 102  76  43  43  85  38  40 124 133 130  77  35
## 122 123 141 139 140 141 138  83  33  31  84  35  36 131 141 138  84  35
## 123  99 117 116 114 132 131 122 141 139  68  35  36 107 125 123 106 108
## 124 110 120 118 116 123 123  69  26  27 121 125 125 116 123 123 115 116
## 125  85  35  36  89  39  40  93 113 109 195 218 213  82  28  29  82  28
## 126 173 173 172 163 130 126 125  41  41 125  41  41 164 134 133 149 115
## 127 124  67  67 115  29  28 114 113 109 125  34  34 139  92  93 133  33
## 128 132 139 132 132 123 116  99  77  73 132 141 137 132 139 132  84  52
## 129  59  20  19  52   4   4  60  28  27  52   4   4  73  36  35  77  61
## 130 131 142 137 123 139 133 117 135 129  75  36  34 125 131 124  85  27
## 131 118 139 132 131 147 140  85  27  25  85  27  25 122 140 133  96  70
## 132 189 189 188 196 194 194 188 156 154 196 188 187 189 189 188 188 164
## 133  99 115 109 102 121 115  91  77  74  91  90  85 107 118 113  85  78
## 134 123 138 132 115 131 125  99  85  82  75  49  46 116 134 129 109 129
## 135 139 158 153 163 171 164 165 169 158  75  28  26 156 170 164  93  75
## 136 203 197 195 203 189 187 164  98  92 178  76  73 210 196 188 211 189
## 137 243 220 219 236  73  78 162 174 163 243 225 222 236  69  75 245 245
## 138  92  67  59  90  37  34 163 187 180 155 179 173  90  37  34  85  34
## 139 157 174 162 162 174 162 163 178 164 163 178 164 149 163 149 163 171
## 140  93 107  98  93 107  98  99 109 100  99 100  92  86 100  91  97  92
## 141 164 181 170 164 181 170 164 140 130 187 181 170 164 181 170 166 185
## 142 235 246 234 244 243 229 156  61  50 193  43  38 250 246 233 201 150
## 143 187 198 187 180 197 187 180 195 182 204 211 199 203 204 194 171 131
## 144  70  82  73  75  83  75  60  36  33  75  83  75  70  82  73  60  36
## 145 166 193 187 130  36  36 147 164 156 147 164 156 123  29  30 116  27
## 146 146 149 140 111  81  68 109  49  37 109  49  37 153 165 153 106  58
##     b11 r12 g12 b12 r13 g13 b13 r14 g14 b14 r15 g15 b15 r16 g16 b16
## 1    19 172 235 244 172 235 244 172 228 235 177 235 244  22  52  53
## 2    21 158 225 237 164 227 237 182 228 143 171 228 196 164 227 237
## 3    44  66  68  68  69  65  59  76  84  22  82  93  17  58  60  60
## 4    12  19  27  29  20  29  34  64  61   4 211 222  78  19  27  29
## 5    60 163 168 152 124 117  91 188 205  78 125 147  20 160 183 187
## 6    44 197 114  21 171 102  26 197 114  21 123  74  22 180 107  26
## 7     6 187 215 236 141 142 140 189 171 140 214 221 201 188 211 227
## 8    35 241 255  54 205 229  46 226 246  59 235 252  67 237 254  53
## 9    18  85 128  21  83 125  19  85 128  21  85 128  21  83 125  19
## 10   20 113  76  14 106  69   9 102  67   6 106  69   9  43  29  11
## 11   19  59  42  12  59  42  12  59  42  12  55  41  11  60  45  18
## 12    9  61  42  10  61  42  10  61  42  10  58  39   6  58  39   6
## 13   28 178 198 186 125 146 139 145 166 156 133 153 141  25  37  29
## 14   26 147 174 169 157 180 170 164 188 178 132 156 133 157 180 170
## 15   25  77  90  82   9  21  13  17  28  22  20  35  11 115 129 118
## 16   29  38  54  49  50  59  54  60  65  54  85  99  92  50  59  54
## 17   27 109 131 123  41  53  45  37  50  43  21  34  28  65  77  69
## 18   19  25  34  30  33  42  38  51  62  57  27  37  34  22  33  29
## 19   22  21  33  29  59  75  69  49  69  50  27  44  27  21  33  29
## 20   44 244 255  18 244 255  52 242 254  27 244 255  18 248 255  44
## 21    3  37  53  28  38  49  41 173 203 115  19  36  10  45  58  50
## 22   18 147 196  12 149 196  35 147 196  12 147 196  12 146 195   5
## 23   12 106 115  99  38  50  44  29  57  18 134 146 122  38  50  44
## 24   19 115 108  85  83 107 100  98 126  78  60  80  47  77  83  67
## 25    9 175 227   2 184 233   2 179 229   2 179 229   2 179 229   2
## 26   20 209 253  20 198 247   2 198 249   2 203 250   4 204 250  11
## 27   45  51  45  27 179 194 188 171 154 123  46  49  44  59  57  45
## 28    5 107 125  97 108 123 115 119 146  72  42  56  36 108 123 115
## 29   18  48  67  36  34  45  37  75  90  76  41  53  43  34  45  37
## 30   18  44  50  38  46  52  42  59  75  27  46  57  36  49  54  42
## 31    1  33  44  30  43  53  42  41  52  38 123 139 132  27  37  26
## 32    6 197 235 236 154 189 186  53  77  74  44  65  54  92 124 122
## 33   13  35  53  28  43  51  45  53  69  44  54  64  58  45  54  49
## 34   12  44  55  28  69  76  66 107 137  68  43  50  37  49  54  45
## 35   34  58  84  28  33  46  38  52  75  27  51  65  54  41  54  44
## 36   36  50  59  45  28  36  29  79 100  34  35  42  30  58  67  52
## 37   28 180 120  35 179 118  30 173 115  29 173 115  29 180 120  35
## 38   19 135 158 153  44  58  51  26  45  11 164 197 131  53  72  66
## 39   36 188 218 212 188 218 212 149 193  44 185 214 209 188 218 212
## 40    2 196 210  14 197 210  19 197 211  27 201 214  36 201 214  27
## 41    1 165 189  11 170 194  12 165 188  19 170 194  19 170 194  12
## 42   43 139 101  34 131  94  27 140 105  42 141 105  36 139  99  29
## 43    3 162 115  21 162 115  21 162 115  21 158 114  26 162 115  26
## 44   52 118  89  27 123  90  27 123  90  27 123  92  34 118  85  21
## 45   26  19  35   3  60  67  59  56  76  26  28  36  28  28  36  28
## 46   29 146 181  29 146 179  50 146 180  43 146 180  43 146 180  43
## 47   36 100  97  62  27  38  33 252 253 213 212 219 189 124 122  92
## 48   35 189 155 107 107  75  35 244 213 172 253 227 179 228 201 163
## 49  137  44  53  58  27  34  37 102 113 118  99 109 115  59  66  69
## 50  102 156 186 228 147 165 203  86  92 105  85  90 102  30  33  42
## 51   43  92  90  85  82  84  84 106 118 129 106 118 129  84  89  92
## 52  171 123 126 123  91  89  85 164 172 171 173 177 173 130 125 116
## 53  131 158 153 149  76  75  75 172 173 177 172 171 173 139 138 139
## 54  124 108 109 107  91 109 124 122 141 156 125 146 161  91  84  69
## 55   44 213 212 218  44  41  44  99 105 118 105 110 123  50  54  66
## 56   51  99  91  83 107 107 104 101 121 134 101 121 134 123 115 104
## 57  137  53  42  35  12  20  27 115 129 140 115 129 140 116 104  91
## 58  174  74  82  92  67  83 100 122 149 178 125 153 179  89 106 124
## 59   51  44  50  37 204 225 205 228 251 229 228 251 229  43  53  44
## 60   17  19  21  20  20  27  27  36  50  52  41  58  61  11  18  20
## 61   26  34  36  34 170 189 174  68  91  90  85 106 105  29  33  30
## 62   28 177 190 173 163 182 170  59  77  76  59  77  76 181 198 186
## 63   27  99 117 115  58  70  66 101 124 122 101 124 122  53  65  61
## 64   43 203 228 220 137 157 150 138 159 152 140 162 154 172 194 188
## 65   34  90  93  85  50  53  45 185 194 182 179 189 179  98 101  93
## 66   43 102 124 121  35  52  50 106 126 123 108 130 126  82  95  89
## 67  109  36  50  45  34  45  37 110 132 129 108 131 126  49  61  52
## 68   28  28  36  34  44  57  54  82 101  99  74  93  92  24  30  29
## 69   30  34  45  42  68  90  90  68  90  90  68  90  90  27  37  34
## 70   31  34  45  42  34  45  42  66  84  83  62  81  78  34  45  42
## 71  193 130 157 155  57  70  54 185 205 190 181 203 188  67  79  72
## 72   82  86  96  87  99 108 100 141 162 155 141 162 155  66  69  61
## 73  244 205 217 206 156 169 158 226 237 226 226 237 226 202 210 197
## 74  118  29  42  43  50  66  68  82 109 115  78 108 113  25  36  35
## 75  141  41  46  44  33  38  36 114 140 148 114 140 148  44  49  46
## 76   91  41  49  46  56  63  59  75  93  91  75  93  91  36  44  42
## 77   81  61  61  58  69  69  66 114 123 125 111 120 123  68  66  61
## 78   97  74  65  54 147 149 146 138 138 134 138 138 134  82  73  62
## 79  220  27  30  20 204 210 196 228 233 214 234 237 220  83  82  69
## 80  144  58  66  62 162 181 180 147 173 173 147 173 173 139 157 155
## 81   27  33  41  38  62  72  70 108 125 125 114 130 131  66  73  70
## 82  118  37  49  45  52  66  61 108 129 124 108 129 124  67  78  74
## 83  125  89 103  98  27  38  33 110 131 125 110 133 129  36  50  44
## 84   20  37  41  37  36  44  43 106 116 116 106 116 116  29  33  29
## 85  113 190 218 217  46  65  63  93 113 111  93 113 111  36  49  45
## 86  116  74  83  75  58  67  62 114 131 124 107 124 116  69  82  75
## 87  210 100  84  74 147 139 131 202 206 203 202 206 203  92  76  66
## 88  128 226 250 246 226 250 246 105 125 123 105 125 123 204 233 230
## 89  103  59  66  62  59  66  62  36  47  48  92 107 107 173 190 192
## 90   50  49  54  49  28  36  34 123 147 141 123 147 141  37  44  41
## 91   35  49  59  54  44  53  50 142 171 164 139 164 157  49  59  54
## 92   34  33  45  36  20  44  42 173 203 198 178 205 202  19  30  25
## 93   74  51  61  51 206 221 213 157 186 179 157 186 179  51  61  51
## 94   27  45  50  43  61  66  59 156 180 179 156 180 179  84  90  84
## 95   52  82  94  89  73  83  77 146 170 164 155 178 172  85  98  92
## 96   90  68  30  38 219 218 211  75  35  43  75  35  43  50  49  45
## 97   28  68  26  33  58  53  51  65  19  28  65  19  28  38  41  45
## 98  107 124  43  44 189 154 113 156  45  45 117  38  40 147 122  99
## 99  145 156  37  42 202 166 125  76  24  30  91  25  30 166 171 173
## 100  79 171  51  52 163 217 252 164  43  45 171  43  46 201 163 113
## 101  56 156  53  47 157 210 228 235  76  59 196  66  56 116 131 128
## 102 229 211  44  45 162 219 227 211  44  45 211  44  45 124 154 150
## 103  34  46  37  34 171 204 180  46  37  34  46  37  34 218 251 228
## 104  41  75  29  34  51  51  51  74  26  30  74  26  30  51  51  51
## 105  52  49  51  52 124 140 115  43  43  44  42  37  37  79  96  77
## 106  37  36  30  35  28  43  45  36  30  35  36  30  35  28  43  45
## 107 161  98  29  34  67  53  44  98  29  34  97  27  30  59  45  36
## 108  14  44  17  19  52  52  51  44  17  19  44  17  19  18  21  20
## 109 151 227  83  83  66  70  58 210  81  77 227  76  76  42  46  34
## 110  20  68  19  21  92  99  93  68  19  21  68  19  21 135 169 177
## 111  62  82  26  27 140 157 155  76  21  22  82  26  27  67  75  68
## 112  18  30  14  18  44  49  38  30  14  18  30  14  18 101 122 116
## 113  68 228  76  69  67  68  52 198  68  65 226  60  58  36  38  26
## 114 228 225 142 138 116 193 238 245 251 244 221  85  85 100 168 212
## 115  22  83  26  26  75  86  82  82  25  23  83  26  26  75  83  76
## 116 118  90  53  46  67  85  82  90  53  46  90  53  46 125 145 142
## 117 126  77  28  29  36  42  36  77  28  29  77  28  29  28  34  29
## 118  38  57  34  34  25  34  30  57  31  30  57  34  34 123 151 145
## 119 179 116  37  38  33  38  34 116  37  38 116  37  38  38  45  41
## 120  91 186 220 222 205 237 236  66  30  29  67  32  30 205 237 236
## 121  36  83  36  37 132 146 140  83  36  37  83  36  37 116 123 117
## 122  36  97  41  43  93  89  75  93  36  37  93  36  37 125 147 146
## 123 107  77  42  43  76  90  76  75  37  38  75  37  38  44  52  44
## 124 116  81  33  34  38  41  35  77  29  29  77  29  29  51  52  44
## 125  29  29  38  33  75  86  81  85  35  36  91  42  43 115 131 126
## 126 115 123  38  37  82  76  68 125  41  41 123  38  37 136 134 129
## 127  33  51  51  43  50  50  38 137  34  34 133  33  33  42  41  30
## 128  50  85  29  26  27  30  26  92  35  33  91  33  29  30  33  28
## 129  59  77  57  55  27  28  20  65  12  12  65  12  12  13  18   9
## 130  26  85  27  26  25  30  25  83  25  23  82  23  21  52  49  37
## 131  65  85  27  25  46  58  50  88  29  27  85  27  25 107 124 116
## 132 162 147  43  44  94  89  77 147  43  44 147  43  44  86  81  69
## 133  73  67  22  24 146 158 150  68  25  26  68  25  26  68  82  76
## 134 124  76  27  28  36  43  35  76  27  28  76  27  28 178 203 197
## 135  71  93  29  27  45  51  42  98  32  30  97  30  29  61  67  58
## 136 187 172  68  67  99  99  83 165  61  60 177  74  70 117 117 107
## 137 244 236 241 237 115 114  85 233  65  70 230  58  63 115 116  92
## 138  30  99 115 108 107 126 121  97  43  36  88  35  31  99 118 114
## 139 157 154  80  71 129 131 110 155  89  79 156 100  90 195 197 171
## 140  84  68  28  27  76  88  79  68  28  27  68  28  27  82  93  84
## 141 172 129  44  42 179 205 202 124  41  38 124  41  38  50  58  46
## 142 138 180  37  36 163 187 181 194  44  40 186  38  36 188 211 202
## 143 115 171  70  51  59  61  43 177  76  54 172  74  53  51  53  35
## 144  33  62  27  26  52  57  44  65  30  29  65  30  29  92 101  81
## 145  26 174 206 209  74  78  65 125  34  34 130  36  36 109 118 106
## 146  46 114  49  38 198 220 209 115  51  41 115  51  41 218 238 226
next_sign<-read.csv("next_sign.csv")
next_sign
##    r1  g1  b1  r2 g2 b2  r3 g3 b3  r4  g4  b4  r5  g5  b5  r6  g6  b6  r7
## 1 204 227 220 196 59 51 202 67 59 204 227 220 236 250 234 242 252 235 205
##    g7  b7  r8 g8 b8  r9 g9 b9 r10 g10 b10 r11 g11 b11 r12 g12 b12 r13 g13
## 1 148 131 190 50 43 179 70 57 242 229 212 190  50  43 193  51  44 170 197
##   b13 r14 g14 b14 r15 g15 b15 r16 g16 b16
## 1 196 190  50  43 190  47  41 165 195 196
# Create a vector of labels
sign_types <- signs$sign_type

# Classify the next sign observed
knn(train = signs[,-1], test = next_sign, cl = sign_types)
## [1] stop
## Levels: pedestrian speed stop

1.2: Thinking like kNN

With your help, the test car successfully identified the sign and stopped safely at the intersection.

How did the knn() function correctly classify the stop sign?

Answer the question

50 XP

Possible Answers

It learned that stop signs are red

press 1

The sign was in some way similar to another stop sign

press 2 [ans]

Stop signs have eight sides

press 3

The other types of signs were less likely

press 4
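
To make the idea of "similarity" concrete, here is a minimal sketch of a single-nearest-neighbor lookup done by hand with Euclidean distance, the distance measure documented for knn() in the class package. The `toy_signs` data frame, the `new_sign` vector, and all of their colour values are invented for illustration and are not taken from signs.csv.

```r
# Toy training data: one pixel (r, g, b) per sign. Values are made up.
toy_signs <- data.frame(
  sign_type = c("stop", "speed", "pedestrian"),
  r = c(200, 60, 150),
  g = c(50, 60, 220),
  b = c(45, 65, 240)
)

# A new observation to classify (also made up)
new_sign <- c(r = 190, g = 60, b = 50)

# Euclidean distance from the new observation to every training row
features <- as.matrix(toy_signs[, c("r", "g", "b")])
distances <- sqrt(rowSums(sweep(features, 2, new_sign)^2))

# The label of the closest training example is the 1-NN prediction
nearest_label <- toy_signs$sign_type[which.min(distances)]
nearest_label
```

The classifier never learns a rule such as "stop signs are red". It only looks up which stored example is closest to the new one and borrows its label.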

1.3: Exploring the traffic sign dataset

To better understand how the knn() function was able to classify the stop sign, it may help to examine the training dataset it used.

Each previously observed street sign was divided into a 4x4 grid, and the red, green, and blue level for each of the 16 center pixels is recorded as illustrated here.

[Figure: Stop Sign Data Encoding]

The result is a dataset that records the sign_type as well as 16 x 3 = 48 color properties of each sign.

Instructions

100 XP

Use the str() function to examine the signs dataset.

Use table() to count the number of observations of each sign type by passing it the column containing the labels.

Run the provided aggregate() command to see whether the average red level might vary by sign type.

# Examine the structure of the signs dataset
str(signs)
## 'data.frame':    146 obs. of  49 variables:
##  $ sign_type: Factor w/ 3 levels "pedestrian","speed",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ r1       : int  155 142 57 22 169 75 136 149 13 123 ...
##  $ g1       : int  228 217 54 35 179 67 149 225 34 124 ...
##  $ b1       : int  251 242 50 41 170 60 157 241 28 107 ...
##  $ r2       : int  135 166 187 171 231 131 200 34 5 83 ...
##  $ g2       : int  188 204 201 178 254 89 203 45 21 61 ...
##  $ b2       : int  101 44 68 26 27 53 107 1 11 26 ...
##  $ r3       : int  156 142 51 19 97 214 150 155 123 116 ...
##  $ g3       : int  227 217 51 27 107 144 167 226 154 124 ...
##  $ b3       : int  245 242 45 29 99 75 134 238 140 115 ...
##  $ r4       : int  145 147 59 19 123 156 171 147 21 67 ...
##  $ g4       : int  211 219 62 27 147 169 218 222 46 67 ...
##  $ b4       : int  228 242 65 29 152 190 252 242 41 52 ...
##  $ r5       : int  166 164 156 42 221 67 171 170 36 70 ...
##  $ g5       : int  233 228 171 37 236 50 158 191 60 53 ...
##  $ b5       : int  245 229 50 3 117 36 108 113 26 26 ...
##  $ r6       : int  212 84 254 217 205 37 157 26 75 26 ...
##  $ g6       : int  254 116 255 228 225 36 186 37 108 26 ...
##  $ b6       : int  52 17 36 19 80 42 11 12 44 21 ...
##  $ r7       : int  212 217 211 221 235 44 26 34 13 52 ...
##  $ g7       : int  254 254 226 235 254 42 35 45 27 45 ...
##  $ b7       : int  11 26 70 20 60 44 10 19 25 27 ...
##  $ r8       : int  188 155 78 181 90 192 180 221 133 117 ...
##  $ g8       : int  229 203 73 183 110 131 211 249 163 109 ...
##  $ b8       : int  117 128 64 73 9 73 236 184 126 83 ...
##  $ r9       : int  170 213 220 237 216 123 129 226 83 110 ...
##  $ g9       : int  216 253 234 234 236 74 109 246 125 74 ...
##  $ b9       : int  120 51 59 44 66 22 73 59 19 12 ...
##  $ r10      : int  211 217 254 251 229 36 161 30 13 98 ...
##  $ g10      : int  254 255 255 254 255 34 190 40 27 70 ...
##  $ b10      : int  3 21 51 2 12 37 10 34 25 26 ...
##  $ r11      : int  212 217 253 235 235 44 161 34 9 20 ...
##  $ g11      : int  254 255 255 243 254 42 190 44 23 21 ...
##  $ b11      : int  19 21 44 12 60 44 6 35 18 20 ...
##  $ r12      : int  172 158 66 19 163 197 187 241 85 113 ...
##  $ g12      : int  235 225 68 27 168 114 215 255 128 76 ...
##  $ b12      : int  244 237 68 29 152 21 236 54 21 14 ...
##  $ r13      : int  172 164 69 20 124 171 141 205 83 106 ...
##  $ g13      : int  235 227 65 29 117 102 142 229 125 69 ...
##  $ b13      : int  244 237 59 34 91 26 140 46 19 9 ...
##  $ r14      : int  172 182 76 64 188 197 189 226 85 102 ...
##  $ g14      : int  228 228 84 61 205 114 171 246 128 67 ...
##  $ b14      : int  235 143 22 4 78 21 140 59 21 6 ...
##  $ r15      : int  177 171 82 211 125 123 214 235 85 106 ...
##  $ g15      : int  235 228 93 222 147 74 221 252 128 69 ...
##  $ b15      : int  244 196 17 78 20 22 201 67 21 9 ...
##  $ r16      : int  22 164 58 19 160 180 188 237 83 43 ...
##  $ g16      : int  52 227 60 27 183 107 211 254 125 29 ...
##  $ b16      : int  53 237 60 29 187 26 227 53 19 11 ...
# Count the number of signs of each type
table(signs$sign_type)
## 
## pedestrian      speed       stop 
##         46         49         51
# Check r10's average red level by sign type
aggregate(r10 ~ sign_type, data = signs, mean)
##    sign_type       r10
## 1 pedestrian 113.71739
## 2      speed  80.63265
## 3       stop 132.39216

1.4: Classifying a collection of road signs

Now that the autonomous vehicle has successfully stopped on its own, your team feels confident allowing the car to continue the test course.

The test course includes 59 additional road signs divided into three types:

Stop Sign

Pedestrian Sign

Speed Limit Sign

At the conclusion of the trial, you are asked to measure the car’s overall performance at recognizing these signs.

Instructions

100 XP

The class package and the dataset signs are already loaded in your workspace. So is the dataframe test_signs, which holds a set of observations you’ll test your model on.

Classify the test_signs data using knn().

Set train equal to the observations in signs without labels.

Use test_signs for the test argument, again without labels.

For the cl argument, use the vector of labels provided for you.

Use table() to explore the classifier’s performance at identifying the three sign types.

Create the vector signs_actual by extracting the labels from test_signs.

Pass the vector of predictions and the vector of actual signs to table() to cross tabulate them.

Compute the overall accuracy of the kNN learner using the mean() function.

# Loading test_signs dataframe
test_signs<-read.csv("test_signs.csv")
# Use kNN to identify the test road signs
sign_types <- signs$sign_type
signs_pred <- knn(train = signs[,-1], test = test_signs[,-1], cl = sign_types)

# Create a confusion matrix of the actual versus predicted values
signs_actual <- test_signs$sign_type
table(signs_pred, signs_actual )
##             signs_actual
## signs_pred   pedestrian speed stop
##   pedestrian         19     2    0
##   speed               0    17    0
##   stop                0     2   19
# Compute the accuracy
mean(signs_pred == signs_actual)
## [1] 0.9322034
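
The accuracy figure can also be read directly off the confusion matrix, since correct predictions sit on its diagonal. The sketch below re-enters the counts from the table() output above by hand; the object names `conf_mat`, `accuracy`, and `recall` are chosen here for illustration.

```r
# Confusion matrix copied from the table() output above
# (rows = predicted, columns = actual)
conf_mat <- matrix(
  c(19,  2,  0,
     0, 17,  0,
     0,  2, 19),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    predicted = c("pedestrian", "speed", "stop"),
    actual    = c("pedestrian", "speed", "stop")
  )
)

# Correct predictions are on the diagonal: 55 of 59 signs
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
accuracy

# Per-class recall: share of each actual class that was predicted correctly
recall <- diag(conf_mat) / colSums(conf_mat)
recall
```

Looking at recall per class shows that all four errors involve speed limit signs, which the overall accuracy alone does not reveal.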

1.5: Understanding the impact of ‘k’

There is a complex relationship between k and classification accuracy. Bigger is not always better.

Which of these is a valid reason for keeping k as small as possible (but no smaller)?

Answer the question

50 XP

Possible Answers

A smaller k requires less processing power

press 1

A smaller k reduces the impact of noisy data

press 2

A smaller k minimizes the chance of a tie vote

press 3

A smaller k may utilize more subtle patterns

press 4 [ans]

1.6: Testing other ‘k’ values

By default, the knn() function in the class package uses only the single nearest neighbor.

Setting a k parameter allows the algorithm to consider additional nearby neighbors. This enlarges the collection of neighbors which will vote on the predicted class.

Compare k values of 1, 7, and 15 to examine the impact on traffic sign classification accuracy.

Instructions

100 XP

The class package is already loaded in your workspace along with the datasets signs, signs_test, and sign_types. The object signs_actual holds the true values of the signs.

Compute the accuracy of the default k = 1 model using the given code.

Modify the knn() function call by setting k = 7.

Revise the code once more by setting k = 15 and compare the three accuracy values.

# Load the test data again under the name signs_test
signs_test<-read.csv("test_signs.csv")

# Compute the accuracy of the baseline model (default k = 1)
k_1 <- knn(train = signs[,-1], test = signs_test[,-1], cl = sign_types)
k_1
##  [1] pedestrian pedestrian pedestrian pedestrian pedestrian pedestrian
##  [7] pedestrian pedestrian pedestrian pedestrian pedestrian pedestrian
## [13] pedestrian pedestrian pedestrian pedestrian pedestrian pedestrian
## [19] pedestrian stop       pedestrian speed      speed      speed     
## [25] speed      speed      speed      stop       pedestrian speed     
## [31] speed      speed      speed      speed      speed      speed     
## [37] speed      speed      speed      speed      stop       stop      
## [43] stop       stop       stop       stop       stop       stop      
## [49] stop       stop       stop       stop       stop       stop      
## [55] stop       stop       stop       stop       stop      
## Levels: pedestrian speed stop
mean(k_1 == signs_test[,1])
## [1] 0.9322034
# Create a confusion matrix of the actual versus predicted values
signs_actual <- test_signs$sign_type
table(k_1, signs_actual)
##             signs_actual
## k_1          pedestrian speed stop
##   pedestrian         19     2    0
##   speed               0    17    0
##   stop                0     2   19
# Modify the above to set k = 7
k_7 <- knn(train = signs[,-1], test = signs_test[,-1], cl = sign_types,k=7)
k_7
##  [1] pedestrian pedestrian pedestrian stop       pedestrian pedestrian
##  [7] pedestrian pedestrian speed      pedestrian pedestrian pedestrian
## [13] pedestrian pedestrian pedestrian pedestrian pedestrian pedestrian
## [19] pedestrian speed      speed      speed      speed      speed     
## [25] speed      speed      speed      stop       speed      speed     
## [31] speed      speed      speed      speed      speed      speed     
## [37] speed      speed      speed      speed      stop       stop      
## [43] stop       stop       stop       stop       stop       stop      
## [49] stop       stop       stop       stop       stop       stop      
## [55] stop       stop       stop       stop       stop      
## Levels: pedestrian speed stop
mean(k_7 == signs_test[,1])
## [1] 0.9491525
table(k_7, signs_actual)
##             signs_actual
## k_7          pedestrian speed stop
##   pedestrian         17     0    0
##   speed               1    20    0
##   stop                1     1   19
# Set k = 15 and compare to the above
k_15 <-  knn(train = signs[,-1], test = signs_test[,-1], cl = sign_types,k=15)
k_15
##  [1] pedestrian stop       pedestrian stop       pedestrian stop      
##  [7] pedestrian pedestrian pedestrian pedestrian pedestrian pedestrian
## [13] pedestrian speed      pedestrian pedestrian pedestrian pedestrian
## [19] stop       speed      speed      speed      speed      speed     
## [25] speed      speed      speed      stop       speed      speed     
## [31] speed      speed      speed      speed      speed      speed     
## [37] speed      speed      speed      speed      stop       stop      
## [43] stop       stop       stop       stop       stop       stop      
## [49] stop       stop       stop       stop       stop       stop      
## [55] stop       stop       stop       stop       stop      
## Levels: pedestrian speed stop
mean(k_15 == signs_test[,1])
## [1] 0.8983051
table(k_15, signs_actual)
##             signs_actual
## k_15         pedestrian speed stop
##   pedestrian         14     0    0
##   speed               1    20    0
##   stop                4     1   19
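
Rather than copying the knn() call once per k, the comparison can be wrapped in a loop. The sketch below is one possible way to do that. It assumes the class package is installed and uses R's built-in iris data as a stand-in, because signs.csv is not available outside the course workspace. With the course data, `train_x`, `test_x`, `train_y`, and `test_y` would be `signs[,-1]`, `signs_test[,-1]`, `sign_types`, and `signs_actual`.

```r
library(class)

set.seed(42)  # knn() breaks ties at random, so fix the seed

# Stand-in data: split iris into training and test rows
test_idx <- sample(nrow(iris), 50)
train_x <- iris[-test_idx, 1:4]
test_x  <- iris[test_idx, 1:4]
train_y <- iris$Species[-test_idx]
test_y  <- iris$Species[test_idx]

# Accuracy for each candidate k
k_values <- c(1, 7, 15)
accuracy_by_k <- sapply(k_values, function(k) {
  pred <- knn(train = train_x, test = test_x, cl = train_y, k = k)
  mean(pred == test_y)
})
names(accuracy_by_k) <- paste0("k = ", k_values)
accuracy_by_k
```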

1.7: Seeing how the neighbors voted

When multiple nearest neighbors hold a vote, it can sometimes be useful to examine whether the voters were unanimous or widely separated.

For example, knowing more about the voters’ confidence in the classification could allow an autonomous vehicle to use caution in the case there is any chance at all that a stop sign is ahead.

In this exercise, you will learn how to obtain the voting results from the knn() function.

Instructions

100 XP

The class package has already been loaded in your workspace along with the dataset signs.

Build a kNN model with the prob = TRUE parameter to compute the vote proportions. Set k = 7.

Use the attr() function to obtain the vote proportions for the predicted class. These are stored in the attribute “prob”.

Examine the first several vote outcomes and percentages using the head() function to see how the confidence varies from sign to sign.
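
The steps above can be sketched as follows. This is an illustration rather than the course solution: it assumes the class package is installed and uses the built-in iris data as a stand-in for signs and signs_test, which are only available in the course workspace.

```r
library(class)

set.seed(1)
test_idx <- sample(nrow(iris), 30)

# prob = TRUE attaches the winning class's vote share to the result
pred <- knn(
  train = iris[-test_idx, 1:4],
  test  = iris[test_idx, 1:4],
  cl    = iris$Species[-test_idx],
  k     = 7,
  prob  = TRUE
)

# Extract the vote proportions from the "prob" attribute
vote_share <- attr(pred, "prob")

# First few predictions and how strongly the neighbors agreed
head(pred)
head(vote_share)
```

A vote share of 1 means the neighbors were unanimous; values closer to 0.5 flag predictions that a cautious system might want to treat as uncertain.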

1.8: Why normalize data?

Before applying kNN to a classification task, it is common practice to rescale the data using a technique like min-max normalization. What is the purpose of this step?

Answer the question

50 XP

Possible Answers

To ensure all data elements may contribute equal shares to distance.

press 1 [ans]

To help the kNN algorithm converge on a solution faster.

press 2

To convert all of the data elements to numbers.

press 3

To redistribute the data as a normal bell curve.

press 4
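
Min-max normalization itself takes only a few lines. The sketch below defines a helper named `normalize()` (a name chosen here, not a built-in function) and applies it to a small, hypothetical data frame whose two columns are on very different scales.

```r
# Rescale a numeric vector to the range [0, 1]
normalize <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Hypothetical features on very different scales
example <- data.frame(
  red_level  = c(0, 128, 255),
  distance_m = c(5, 50, 500)
)

# Apply the rescaling to every column
example_scaled <- as.data.frame(lapply(example, normalize))
example_scaled
```

Without rescaling, `distance_m` would dominate any Euclidean distance simply because its values are larger. Note that this simple helper divides by zero when a column is constant, so such columns would need separate handling.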

Chapter 2: Naive Bayes

Naive Bayes uses principles from the field of statistics to make predictions. This chapter will introduce the basics of Bayesian methods while exploring how to apply these techniques to iPhone-like destination suggestions.

2.1: Computing probabilities

The where9am data frame contains 91 days (thirteen weeks) of data in which Brett recorded his location at 9am each day as well as whether the daytype was a weekend or weekday.

Using the conditional probability formula below, you can compute the probability that Brett is working in the office, given that it is a weekday.

$$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}$$

Calculations like these are the basis of the Naive Bayes destination prediction model you’ll develop in later exercises.

Instructions

100 XP

Find P(office) using nrow() and subset() to count rows in the dataset and save the result as p_A.

Find P(weekday), using nrow() and subset() again, and save the result as p_B.

Use nrow() and subset() a final time to find P(office and weekday). Save the result as p_AB.

Compute P(office | weekday) and save the result as p_A_given_B.

Print the value of p_A_given_B.

# Load the where9am data frame
where9am<-read.csv("where9am.csv")
where9am$daytype <- factor(where9am$daytype)
# Compute P(A) 
p_A <- nrow(subset(where9am, location == "office")) / 91

# Compute P(B)
p_B <- nrow(subset(where9am, daytype == "weekday")) / 91

# Compute the observed P(A and B)
p_AB <- nrow(subset(where9am, where9am$location == "office" & where9am$daytype == "weekday")) / 91

# Compute P(A | B) and print its value
p_A_given_B <- p_AB / p_B
p_A_given_B
## [1] 0.6
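
The same proportions can be computed without hard-coding 91, because the mean of a logical vector is the share of TRUE values. The data frame `where9am_demo` below is a stand-in, not the original where9am.csv: it is rebuilt from the counts implied by the model printout in section 2.4 (39 office, 10 campus, and 1 appointment mornings, all on weekdays; 41 home mornings, 15 of them on weekdays).

```r
# Stand-in for where9am, rebuilt from the counts implied by the model output
where9am_demo <- data.frame(
  location = rep(c("office", "campus", "appointment", "home", "home"),
                 times = c(39, 10, 1, 15, 26)),
  daytype  = rep(c("weekday", "weekday", "weekday", "weekday", "weekend"),
                 times = c(39, 10, 1, 15, 26))
)

# The mean of a logical vector is the proportion of TRUE values
p_B  <- mean(where9am_demo$daytype == "weekday")
p_AB <- mean(where9am_demo$location == "office" &
             where9am_demo$daytype == "weekday")

# P(office | weekday)
p_AB / p_B
```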

2.2: Understanding dependent events

In the previous exercise, you found that there is a 60% chance Brett is in the office at 9am given that it is a weekday. On the other hand, if Brett is never in the office on a weekend, which of the following is/are true?

Answer the question

50 XP

Possible Answers

P(office and weekend) = 0.

press 1

P(office | weekend) = 0.

press 2

Brett’s location is dependent on the day of the week.

press 3

All of the above.

press 4 [ans]

2.3: A simple Naive Bayes location model

The previous exercises showed that the probability that Brett is at work or at home at 9am is highly dependent on whether it is the weekend or a weekday.

To see this finding in action, use the where9am data frame to build a Naive Bayes model on the same data.

You can then use this model to predict the future: where does the model think that Brett will be at 9am on Thursday and at 9am on Saturday?

Instructions

100 XP

The dataframe where9am is available in your workspace. This dataset contains information about Brett’s location at 9am on different days.

Load the naivebayes package.

Use naive_bayes() with a formula like y ~ x to build a model of location as a function of daytype.

Forecast the Thursday 9am location using predict() with the thursday9am object as the newdata argument.

Do the same for predicting the saturday9am location.

# Load the e1071 package, which provides naiveBayes()
# (the course itself uses naive_bayes() from the naivebayes package)
# install.packages("e1071", repos = "https://cran.rstudio.com")
library(e1071)
thursday9am<-read.csv("thursday9am.csv")
thursday9am$daytype <- factor(thursday9am$daytype)
thursday9am
##   X daytype
## 1 1 weekday
saturday9am<-read.csv("saturday9am.csv")
saturday9am$daytype <- factor(saturday9am$daytype)
saturday9am
##   X daytype
## 1 1 weekend
# Build the location prediction model
locmodel <- naiveBayes(location ~ daytype, data = where9am)

# Predict Thursday's 9am location
predict(locmodel, thursday9am)
## [1] office
## Levels: appointment campus home office
# Predict Saturday's 9am location
predict(locmodel, saturday9am)
## [1] home
## Levels: appointment campus home office

2.4: Examining “raw” probabilities

The naivebayes package offers several ways to peek inside a Naive Bayes model.

Typing the name of the model object provides the a priori (overall) and conditional probabilities of each of the model’s predictors. If you were so inclined, you could use these to calculate posterior (predicted) probabilities by hand.

Alternatively, R will compute the posterior probabilities for you if the type = “prob” parameter is supplied to the predict() function.

Using these methods, examine how the model’s predicted 9am location probability varies from day-to-day.

Instructions

100 XP

The model locmodel that you fit in the previous exercise is in your workspace.

Print the locmodel object to the console to view the computed a priori and conditional probabilities.

Use the predict() function similarly to the previous exercise, but with type = “prob” to see the predicted probabilities for Thursday at 9am.

Compare these to the predicted probabilities for Saturday at 9am.

# The e1071 package (used here in place of 'naivebayes') is loaded into the workspace
# and the Naive Bayes 'locmodel' has been built

# Examine the location prediction model
print(locmodel)
## 
## Naive Bayes Classifier for Discrete Predictors
## 
## Call:
## naiveBayes.default(x = X, y = Y, laplace = laplace)
## 
## A-priori probabilities:
## Y
## appointment      campus        home      office 
##  0.01098901  0.10989011  0.45054945  0.42857143 
## 
## Conditional probabilities:
##              daytype
## Y               weekday   weekend
##   appointment 1.0000000 0.0000000
##   campus      1.0000000 0.0000000
##   home        0.3658537 0.6341463
##   office      1.0000000 0.0000000
# Obtain the predicted probabilities for Thursday at 9am
predict(locmodel, thursday9am, type ="raw") 
##      appointment    campus      home office
## [1,]  0.01538462 0.1538462 0.2307692    0.6
# Note: predict() for e1071's naiveBayes accepts type = "class" (the default) or "raw"; DataCamp's naivebayes package uses type = "prob" instead


# Obtain the predicted probabilities for Saturday at 9am
predict(locmodel, saturday9am, type = "raw")
##       appointment       campus      home      office
## [1,] 3.838772e-05 0.0003838772 0.9980806 0.001497121
# Note: type = "raw" in e1071 corresponds to type = "prob" in the naivebayes package used on DataCamp
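
The posterior probabilities can be reproduced by hand from the printed model, which makes the calculation explicit. The sketch below copies the a priori and conditional probabilities from print(locmodel) above. The 0.001 replacement for zero probabilities reflects the default `threshold` argument of predict() for e1071's naiveBayes; treat that detail as an assumption about the installed package version.

```r
# Values copied from print(locmodel) above
prior <- c(appointment = 0.01098901, campus = 0.10989011,
           home = 0.45054945, office = 0.42857143)
p_weekday <- c(appointment = 1, campus = 1, home = 0.3658537, office = 1)
p_weekend <- c(appointment = 0, campus = 0, home = 0.6341463, office = 0)

# Bayes' rule: posterior is proportional to prior * likelihood,
# then divide by the total so the probabilities sum to 1
thursday <- prior * p_weekday
thursday <- thursday / sum(thursday)
round(thursday, 4)

# Assumed e1071 behaviour: zero probabilities are replaced by a small
# threshold (0.001 by default) before multiplying
p_weekend_adj <- ifelse(p_weekend == 0, 0.001, p_weekend)
saturday <- prior * p_weekend_adj
saturday <- saturday / sum(saturday)
signif(saturday, 4)
```

This also explains why the Saturday probabilities for office, campus, and appointment are tiny but not exactly zero, even though no Laplace correction was used.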

2.5: Understanding independence

Understanding the idea of event independence will become important as you learn more about how “naive” Bayes got its name. Which of the following is true about independent events?

Answer the question

50 XP

Possible Answers

The events cannot occur at the same time.

press 1

A Venn diagram will always show no intersection.

press 2

Knowing the outcome of one event does not help predict the other.

press 3 [ans]

At least one of the events is completely random.

press 4

2.6: Who are you calling naive?

The Naive Bayes algorithm got its name because it makes a “naive” assumption about event independence.

What is the purpose of making this assumption?

Answer the question

50 XP

Possible Answers

Independent events can never have a joint probability of zero.

press 1

The joint probability calculation is simpler for independent events.

press 2 [ans]

Conditional probability is undefined for dependent events.

press 3

Dependent events cannot be used to make predictions.

press 4
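
The simplification is easy to see in code. Under the naive assumption, the likelihood of a combination of predictor values is the product of the per-predictor conditional probabilities, so the model never needs a table covering every combination. The probabilities below are hypothetical and chosen only for illustration.

```r
# Hypothetical conditional probabilities for one class (say, "office")
p_weekday_given_office   <- 0.9
p_afternoon_given_office <- 0.4

# Naive assumption: treat the predictors as independent given the class,
# so the joint likelihood is a simple product
p_joint_given_office <- p_weekday_given_office * p_afternoon_given_office
p_joint_given_office
```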

2.7: A more sophisticated location model

The locations dataset records Brett’s location every hour for 13 weeks. Each hour, the tracking information includes the daytype (weekend or weekday) as well as the hourtype (morning, afternoon, evening, or night).

Using this data, build a more sophisticated model to see how Brett’s predicted location not only varies by the day of week but also by the time of day.

Instructions

100 XP

The dataset locations is already loaded in your workspace.

Use the R formula interface to build a model where location depends on both daytype and hourtype. Recall that the function naive_bayes() takes 2 arguments: formula and data.

Predict Brett’s location on a weekday afternoon using the dataframe weekday_afternoon and the predict() function.

Do the same for a weekday_evening.

# The e1071 package (used here in place of 'naivebayes') is already loaded

weekday_afternoon<-read.csv("weekday_afternoon.csv")
# weekday_afternoon$daytype <- factor(thursday9am$daytype)
weekday_afternoon
##   X daytype  hourtype location
## 1 1 weekday afternoon   office
weekday_evening<-read.csv("weekday_evening.csv")
#saturday9am$daytype <- factor(saturday9am$daytype)
weekday_evening
##   X daytype  hourtype location
## 1 1 weekday afternoon   office
locations<-read.csv("locations.csv")
# Build a NB model of location
locmodel <- naiveBayes(location ~ daytype + hourtype, data = locations)

# Predict Brett's location on a weekday afternoon
predict(locmodel, weekday_afternoon)
## [1] office
## Levels: appointment campus home office restaurant store theater
# Predict Brett's location on a weekday evening

predict(locmodel, weekday_evening, type ="raw")
##      appointment     campus      home    office restaurant       store
## [1,] 0.004300045 0.08385089 0.2482618 0.5848062 0.07304768 0.005733394
##           theater
## [1,] 1.290014e-08

2.8: Preparing for unforeseen circumstances

While Brett was tracking his location over 13 weeks, he never went into the office during the weekend. Consequently, the joint probability P(office and weekend) = 0.

Explore how this impacts the predicted probability that Brett may go to work on the weekend in the future. Additionally, you can see how using the Laplace correction will allow a small chance for these types of unforeseen circumstances.

Instructions

100 XP

The model locmodel is already in your workspace, along with the dataframe weekend_afternoon.

Use the locmodel to output predicted probabilities for a weekend afternoon by using the predict() function. Remember to set the type argument.

Create a new naive Bayes model with the Laplace smoothing parameter set to 1. You can do this by setting the laplace argument in your call to naive_bayes(). Save this as locmodel2.

See how the new predicted probabilities compare by using the predict() function on your new model.

# The e1071 package (used here in place of 'naivebayes') is already loaded
# The Naive Bayes location model (locmodel) has already been built
weekend_afternoon<-read.csv("weekend_afternoon.csv")
# Observe the predicted probabilities for a weekend afternoon
predict(locmodel,weekend_afternoon,type="raw")
##      appointment       campus      home      office restaurant      store
## [1,]  0.02462883 0.0004802622 0.8439145 0.003349521  0.1111338 0.01641922
##          theater
## [1,] 7.38865e-05
# Build a new model using the Laplace correction
locmodel2 <- naiveBayes(location ~ daytype + hourtype, data = locations,laplace=1)

# Observe the new predicted probabilities for a weekend afternoon
predict(locmodel2,weekend_afternoon,type="raw")
##      appointment      campus      home      office restaurant      store
## [1,]  0.02013872 0.006187715 0.8308154 0.007929249  0.1098743 0.01871085
##          theater
## [1,] 0.006343697

2.9: Understanding the Laplace correction

By default, the naive_bayes() function in the naivebayes package does not use the Laplace correction. What is the risk of leaving this parameter unset?

Answer the question

50 XP

Possible Answers

Some potential outcomes may be predicted to be impossible.

press 1 [ans]

The algorithm may have a divide by zero error.

press 2

Naive Bayes will ignore features with zero values.

press 3

The model may not estimate probabilities for some cases.

press 4

2.10: Handling numeric predictors

Numeric data is often binned before it is used with Naive Bayes. Which of these is not an example of binning?

Answer the question

50 XP

Possible Answers

age values recoded as ‘child’ or ‘adult’ categories

press 1

geographic coordinates recoded into geographic regions (West, East, etc.)

press 2

test scores divided into four groups by percentile

press 3

income values standardized to follow a normal bell curve

press 4 [ans]

Chapter 3: Logistic Regression

Logistic regression involves fitting a curve to numeric data to make predictions about binary events. It is arguably one of the most widely used machine learning methods, and this chapter provides an overview of the technique while illustrating how to apply it to fundraising data.

3.1: Building simple logistic regression models

The donors dataset contains 93,462 examples of people who were mailed a fundraising solicitation for paralyzed military veterans. The donated column is 1 if the person made a donation in response to the mailing and 0 otherwise. This binary outcome will be the dependent variable for the logistic regression model.

The remaining columns are features of the prospective donors that may influence their donation behavior. These are the model’s independent variables.

When building a regression model, it is often helpful to form a hypothesis about which independent variables will be predictive of the dependent variable. The bad_address column, which is set to 1 for an invalid mailing address and 0 otherwise, seems like it might reduce the chances of a donation. Similarly, one might suspect that religious interest (interest_religion) and interest in veterans affairs (interest_veterans) would be associated with greater charitable giving.

In this exercise, you will use these three factors to create a simple model of donation behavior.

Instructions

100 XP

The dataset donors is available in your workspace.

Examine donors using the str() function.

Count the number of occurrences of each level of the donated variable using the table() function.

Fit a logistic regression model using the formula interface and the three independent variables described above.

Call glm() with the formula as its first argument and the dataframe as the data argument.

Save the result as donation_model.

Summarize the model object with summary().

donors<-read.csv("donors.csv")
# Examine the dataset to identify potential independent variables
str(donors)
## 'data.frame':    93462 obs. of  13 variables:
##  $ donated          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ veteran          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ bad_address      : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ age              : int  60 46 NA 70 78 NA 38 NA NA 65 ...
##  $ has_children     : int  0 1 0 0 1 0 1 0 0 0 ...
##  $ wealth_rating    : int  0 3 1 2 1 0 2 3 1 0 ...
##  $ interest_veterans: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ interest_religion: int  0 0 0 0 1 0 0 0 0 0 ...
##  $ pet_owner        : int  0 0 0 0 0 0 1 0 0 0 ...
##  $ catalog_shopper  : int  0 0 0 0 1 0 0 0 0 0 ...
##  $ recency          : Factor w/ 2 levels "CURRENT","LAPSED": 1 1 1 1 1 1 1 1 1 1 ...
##  $ frequency        : Factor w/ 2 levels "FREQUENT","INFREQUENT": 1 1 1 1 1 2 2 1 2 2 ...
##  $ money            : Factor w/ 2 levels "HIGH","MEDIUM": 2 1 2 2 2 2 2 2 2 2 ...
# Explore the dependent variable
table(donors$donated)
## 
##     0     1 
## 88751  4711
# Build the donation model
donation_model <- glm(donated~bad_address+interest_religion+interest_veterans, data = donors, family = "binomial")

# Summarize the model results
summary(donation_model)
## 
## Call:
## glm(formula = donated ~ bad_address + interest_religion + interest_veterans, 
##     family = "binomial", data = donors)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.3480  -0.3192  -0.3192  -0.3192   2.5678  
## 
## Coefficients:
##                   Estimate Std. Error  z value Pr(>|z|)    
## (Intercept)       -2.95139    0.01652 -178.664   <2e-16 ***
## bad_address       -0.30780    0.14348   -2.145   0.0319 *  
## interest_religion  0.06724    0.05069    1.327   0.1847    
## interest_veterans  0.11009    0.04676    2.354   0.0186 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 37330  on 93461  degrees of freedom
## Residual deviance: 37316  on 93458  degrees of freedom
## AIC: 37324
## 
## Number of Fisher Scoring iterations: 5
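
The coefficients in the summary are on the log-odds scale, which makes them hard to read directly. As a rough sketch, the lines below copy the estimates from the output above and convert them to odds ratios with exp() and to probabilities with plogis(), the inverse logit function in base R. The object names are chosen here for illustration.

```r
# Coefficients copied from summary(donation_model) above
coefs <- c(intercept = -2.95139, bad_address = -0.30780,
           interest_religion = 0.06724, interest_veterans = 0.11009)

# Odds ratios: multiplicative change in the odds of donating
odds_ratios <- exp(coefs[-1])
round(odds_ratios, 3)

# Predicted probability when all three predictors are 0
p_baseline <- plogis(coefs[["intercept"]])

# Predicted probability for a bad address (other predictors 0)
p_bad_address <- plogis(coefs[["intercept"]] + coefs[["bad_address"]])

c(baseline = p_baseline, bad_address = p_bad_address)
```

The baseline probability comes out at roughly 5%, in line with the 4,711 donors among 93,462 people shown by table(donors$donated). An odds ratio below 1 for bad_address matches the hypothesis that an invalid address reduces the chance of a donation.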