This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
After several trips with a human behind the wheel, it is time for the self-driving car to attempt the test course alone.
As it begins to drive away, its camera captures the following image:
Stop Sign
Can you apply a kNN classifier to help the car recognize this sign?
Instructions
100 XP
The dataset signs is loaded in your workspace along with the dataframe next_sign, which holds the observation you want to classify.
Load the class package.
Create a vector of sign labels to use with kNN by extracting the column sign_type from signs.
Identify the next_sign using the knn() function.
Set the train argument equal to the signs data frame without the first column.
Set the test argument equal to the data frame next_sign.
Use the vector of labels you created as the cl argument.
# Load the 'class' package
# install.packages("class")
library(class)
signs <- read.csv("signs.csv")
head(signs)  # preview only; printing all 146 rows and 48 color columns is unwieldy
## sign_type r1 g1 b1 r2 g2 b2 r3 g3 b3 r4 g4 b4 r5 g5 b5
## 1 pedestrian 155 228 251 135 188 101 156 227 245 145 211 228 166 233 245
## 2 pedestrian 142 217 242 166 204 44 142 217 242 147 219 242 164 228 229
## 3 pedestrian 57 54 50 187 201 68 51 51 45 59 62 65 156 171 50
## 4 pedestrian 22 35 41 171 178 26 19 27 29 19 27 29 42 37 3
## 5 pedestrian 169 179 170 231 254 27 97 107 99 123 147 152 221 236 117
## 6 pedestrian 75 67 60 131 89 53 214 144 75 156 169 190 67 50 36
## (columns r6 through b16 omitted here; str(signs) below lists the full set of 48 color features)
next_sign<-read.csv("next_sign.csv")
next_sign
## r1 g1 b1 r2 g2 b2 r3 g3 b3 r4 g4 b4 r5 g5 b5 r6 g6 b6 r7
## 1 204 227 220 196 59 51 202 67 59 204 227 220 236 250 234 242 252 235 205
## g7 b7 r8 g8 b8 r9 g9 b9 r10 g10 b10 r11 g11 b11 r12 g12 b12 r13 g13
## 1 148 131 190 50 43 179 70 57 242 229 212 190 50 43 193 51 44 170 197
## b13 r14 g14 b14 r15 g15 b15 r16 g16 b16
## 1 196 190 50 43 190 47 41 165 195 196
# Create a vector of labels
sign_types <- signs$sign_type
# Classify the next sign observed
knn(train = signs[,-1], test = next_sign, cl = sign_types)
## [1] stop
## Levels: pedestrian speed stop
With your help, the test car successfully identified the sign and stopped safely at the intersection.
How did the knn() function correctly classify the stop sign?
Answer the question
50 XP
Possible Answers
It learned that stop signs are red
press 1
The sign was in some way similar to another stop sign
press 2 [ans]
Stop signs have eight sides
press 3
The other types of signs were less likely
press 4
To better understand how the knn() function was able to classify the stop sign, it may help to examine the training dataset it used.
Each previously observed street sign was divided into a 4x4 grid, and the red, green, and blue levels for each of the 16 center pixels are recorded, as illustrated here.
Stop Sign Data Encoding
The result is a dataset that records the sign_type as well as 16 x 3 = 48 color properties of each sign.
Instructions
100 XP
Use the str() function to examine the signs dataset.
Use table() to count the number of observations of each sign type by passing it the column containing the labels.
Run the provided aggregate() command to see whether the average red level might vary by sign type.
# Examine the structure of the signs dataset
str(signs)
## 'data.frame': 146 obs. of 49 variables:
## $ sign_type: Factor w/ 3 levels "pedestrian","speed",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ r1 : int 155 142 57 22 169 75 136 149 13 123 ...
## $ g1 : int 228 217 54 35 179 67 149 225 34 124 ...
## $ b1 : int 251 242 50 41 170 60 157 241 28 107 ...
## $ r2 : int 135 166 187 171 231 131 200 34 5 83 ...
## $ g2 : int 188 204 201 178 254 89 203 45 21 61 ...
## $ b2 : int 101 44 68 26 27 53 107 1 11 26 ...
## $ r3 : int 156 142 51 19 97 214 150 155 123 116 ...
## $ g3 : int 227 217 51 27 107 144 167 226 154 124 ...
## $ b3 : int 245 242 45 29 99 75 134 238 140 115 ...
## $ r4 : int 145 147 59 19 123 156 171 147 21 67 ...
## $ g4 : int 211 219 62 27 147 169 218 222 46 67 ...
## $ b4 : int 228 242 65 29 152 190 252 242 41 52 ...
## $ r5 : int 166 164 156 42 221 67 171 170 36 70 ...
## $ g5 : int 233 228 171 37 236 50 158 191 60 53 ...
## $ b5 : int 245 229 50 3 117 36 108 113 26 26 ...
## $ r6 : int 212 84 254 217 205 37 157 26 75 26 ...
## $ g6 : int 254 116 255 228 225 36 186 37 108 26 ...
## $ b6 : int 52 17 36 19 80 42 11 12 44 21 ...
## $ r7 : int 212 217 211 221 235 44 26 34 13 52 ...
## $ g7 : int 254 254 226 235 254 42 35 45 27 45 ...
## $ b7 : int 11 26 70 20 60 44 10 19 25 27 ...
## $ r8 : int 188 155 78 181 90 192 180 221 133 117 ...
## $ g8 : int 229 203 73 183 110 131 211 249 163 109 ...
## $ b8 : int 117 128 64 73 9 73 236 184 126 83 ...
## $ r9 : int 170 213 220 237 216 123 129 226 83 110 ...
## $ g9 : int 216 253 234 234 236 74 109 246 125 74 ...
## $ b9 : int 120 51 59 44 66 22 73 59 19 12 ...
## $ r10 : int 211 217 254 251 229 36 161 30 13 98 ...
## $ g10 : int 254 255 255 254 255 34 190 40 27 70 ...
## $ b10 : int 3 21 51 2 12 37 10 34 25 26 ...
## $ r11 : int 212 217 253 235 235 44 161 34 9 20 ...
## $ g11 : int 254 255 255 243 254 42 190 44 23 21 ...
## $ b11 : int 19 21 44 12 60 44 6 35 18 20 ...
## $ r12 : int 172 158 66 19 163 197 187 241 85 113 ...
## $ g12 : int 235 225 68 27 168 114 215 255 128 76 ...
## $ b12 : int 244 237 68 29 152 21 236 54 21 14 ...
## $ r13 : int 172 164 69 20 124 171 141 205 83 106 ...
## $ g13 : int 235 227 65 29 117 102 142 229 125 69 ...
## $ b13 : int 244 237 59 34 91 26 140 46 19 9 ...
## $ r14 : int 172 182 76 64 188 197 189 226 85 102 ...
## $ g14 : int 228 228 84 61 205 114 171 246 128 67 ...
## $ b14 : int 235 143 22 4 78 21 140 59 21 6 ...
## $ r15 : int 177 171 82 211 125 123 214 235 85 106 ...
## $ g15 : int 235 228 93 222 147 74 221 252 128 69 ...
## $ b15 : int 244 196 17 78 20 22 201 67 21 9 ...
## $ r16 : int 22 164 58 19 160 180 188 237 83 43 ...
## $ g16 : int 52 227 60 27 183 107 211 254 125 29 ...
## $ b16 : int 53 237 60 29 187 26 227 53 19 11 ...
# Count the number of signs of each type
table(signs$sign_type)
##
## pedestrian speed stop
## 46 49 51
# Check r10's average red level by sign type
aggregate(r10 ~ sign_type, data = signs, mean)
## sign_type r10
## 1 pedestrian 113.71739
## 2 speed 80.63265
## 3 stop 132.39216
Now that the autonomous vehicle has successfully stopped on its own, your team feels confident allowing the car to continue the test course.
The test course includes 59 additional road signs divided into three types:
Stop Sign
Pedestrian Sign
Speed Limit Sign
At the conclusion of the trial, you are asked to measure the car’s overall performance at recognizing these signs.
Instructions
100 XP
The class package and the dataset signs are already loaded in your workspace. So is the dataframe test_signs, which holds a set of observations you’ll test your model on.
Classify the test_signs data using knn().
Set train equal to the observations in signs without labels.
Use test_signs for the test argument, again without labels.
For the cl argument, use the vector of labels provided for you.
Use table() to explore the classifier’s performance at identifying the three sign types.
Create the vector signs_actual by extracting the labels from test_signs.
Pass the vector of predictions and the vector of actual signs to table() to cross tabulate them.
Compute the overall accuracy of the kNN learner using the mean() function.
# Load the test_signs data frame
test_signs <- read.csv("test_signs.csv")
# Use kNN to identify the test road signs
sign_types <- signs$sign_type
signs_pred <- knn(train = signs[,-1], test = test_signs[,-1], cl = sign_types)
# Create a confusion matrix of the actual versus predicted values
signs_actual <- test_signs$sign_type
table(signs_pred, signs_actual)
## signs_actual
## signs_pred pedestrian speed stop
## pedestrian 19 2 0
## speed 0 17 0
## stop 0 2 19
# Compute the accuracy
mean(signs_pred == signs_actual)
## [1] 0.9322034
There is a complex relationship between k and classification accuracy. Bigger is not always better.
Which of these is a valid reason for keeping k as small as possible (but no smaller)?
Answer the question
50 XP
Possible Answers
A smaller k requires less processing power
press 1
A smaller k reduces the impact of noisy data
press 2
A smaller k minimizes the chance of a tie vote
press 3
A smaller k may utilize more subtle patterns
press 4 [ans]
By default, the knn() function in the class package uses only the single nearest neighbor.
Setting a k parameter allows the algorithm to consider additional nearby neighbors. This enlarges the collection of neighbors which will vote on the predicted class.
Compare k values of 1, 7, and 15 to examine the impact on traffic sign classification accuracy.
Instructions
100 XP
The class package is already loaded in your workspace along with the datasets signs and signs_test, and the vector of labels sign_types. The object signs_actual holds the true values of the signs.
Compute the accuracy of the default k = 1 model using the given code.
Modify the knn() function call by setting k = 7.
Revise the code once more by setting k = 15 and compare the three accuracy values.
# Load the test data frame again, this time as signs_test (the name used in this exercise)
signs_test <- read.csv("test_signs.csv")
# Compute the accuracy of the baseline model (default k = 1)
k_1 <- knn(train = signs[,-1], test = signs_test[,-1], cl = sign_types)
k_1
## [1] pedestrian pedestrian pedestrian pedestrian pedestrian pedestrian
## [7] pedestrian pedestrian pedestrian pedestrian pedestrian pedestrian
## [13] pedestrian pedestrian pedestrian pedestrian pedestrian pedestrian
## [19] pedestrian stop pedestrian speed speed speed
## [25] speed speed speed stop pedestrian speed
## [31] speed speed speed speed speed speed
## [37] speed speed speed speed stop stop
## [43] stop stop stop stop stop stop
## [49] stop stop stop stop stop stop
## [55] stop stop stop stop stop
## Levels: pedestrian speed stop
mean(k_1 == signs_test[,1])
## [1] 0.9322034
# Create a confusion matrix of the actual versus predicted values
signs_actual <- test_signs$sign_type
table(k_1, signs_actual)
## signs_actual
## k_1 pedestrian speed stop
## pedestrian 19 2 0
## speed 0 17 0
## stop 0 2 19
# Modify the above to set k = 7
k_7 <- knn(train = signs[,-1], test = signs_test[,-1], cl = sign_types, k = 7)
k_7
## [1] pedestrian pedestrian pedestrian stop pedestrian pedestrian
## [7] pedestrian pedestrian speed pedestrian pedestrian pedestrian
## [13] pedestrian pedestrian pedestrian pedestrian pedestrian pedestrian
## [19] pedestrian speed speed speed speed speed
## [25] speed speed speed stop speed speed
## [31] speed speed speed speed speed speed
## [37] speed speed speed speed stop stop
## [43] stop stop stop stop stop stop
## [49] stop stop stop stop stop stop
## [55] stop stop stop stop stop
## Levels: pedestrian speed stop
mean(k_7 == signs_test[,1])
## [1] 0.9491525
table(k_7, signs_actual)
## signs_actual
## k_7 pedestrian speed stop
## pedestrian 17 0 0
## speed 1 20 0
## stop 1 1 19
# Set k = 15 and compare to the above
k_15 <- knn(train = signs[,-1], test = signs_test[,-1], cl = sign_types, k = 15)
k_15
## [1] pedestrian stop pedestrian stop pedestrian stop
## [7] pedestrian pedestrian pedestrian pedestrian pedestrian pedestrian
## [13] pedestrian speed pedestrian pedestrian pedestrian pedestrian
## [19] stop speed speed speed speed speed
## [25] speed speed speed stop speed speed
## [31] speed speed speed speed speed speed
## [37] speed speed speed speed stop stop
## [43] stop stop stop stop stop stop
## [49] stop stop stop stop stop stop
## [55] stop stop stop stop stop
## Levels: pedestrian speed stop
mean(k_15 == signs_test[,1])
## [1] 0.8983051
table(k_15, signs_actual)
## signs_actual
## k_15 pedestrian speed stop
## pedestrian 14 0 0
## speed 1 20 0
## stop 4 1 19
When multiple nearest neighbors hold a vote, it can sometimes be useful to examine whether the voters were unanimous or widely separated.
For example, knowing more about the voters’ confidence in the classification could allow an autonomous vehicle to use caution in the case there is any chance at all that a stop sign is ahead.
In this exercise, you will learn how to obtain the voting results from the knn() function.
Instructions
100 XP
The class package has already been loaded in your workspace along with the dataset signs.
Build a kNN model with the prob = TRUE parameter to compute the vote proportions. Set k = 7.
Use the attr() function to obtain the vote proportions for the predicted class. These are stored in the attribute “prob”.
Examine the first several vote outcomes and percentages using the head() function to see how the confidence varies from sign to sign, as shown in the sketch below.
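The code for this exercise was not reproduced here, so below is a minimal sketch of what it might look like, assuming the signs, signs_test, and sign_types objects from the earlier exercises are still in the workspace:
# Build a kNN model that also reports the winning vote proportion
sign_pred <- knn(train = signs[,-1], test = signs_test[,-1],
                 cl = sign_types, k = 7, prob = TRUE)
# Extract the proportion of neighbor votes received by the predicted class
sign_prob <- attr(sign_pred, "prob")
# Examine the first several predictions and their vote proportions
head(sign_pred)
head(sign_prob)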
Before applying kNN to a classification task, it is common practice to rescale the data using a technique like min-max normalization. What is the purpose of this step?
Answer the question
50 XP
Possible Answers
To ensure all data elements may contribute equal shares to distance.
press 1 [ans]
To help the kNN algorithm converge on a solution faster.
press 2
To convert all of the data elements to numbers.
press 3
To redistribute the data as a normal bell curve.
press 4
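As a point of reference, min-max normalization can be written as a small helper function. This is an illustrative sketch only (the normalize() helper is not part of the course code), applied here to the color columns of signs:
# Min-max normalization: rescale a numeric vector to the [0, 1] range
normalize <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}
# Rescale every color column so each feature can contribute an equal share to distance
signs_normalized <- as.data.frame(lapply(signs[,-1], normalize))
summary(signs_normalized$r1)  # values now fall between 0 and 1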
Naive Bayes uses principles from the field of statistics to make predictions. This chapter will introduce the basics of Bayesian methods while exploring how to apply these techniques to iPhone-like destination suggestions.
The where9am data frame contains 91 days (thirteen weeks) worth of data in which Brett recorded his location at 9am each day as well as whether the daytype was a weekend or weekday.
Using the conditional probability formula below, you can compute the probability that Brett is working in the office, given that it is a weekday.
P(A | B) = P(A and B) / P(B)
Calculations like these are the basis of the Naive Bayes destination prediction model you’ll develop in later exercises.
Instructions
100 XP
Find P(office) using nrow() and subset() to count rows in the dataset and save the result as p_A.
Find P(weekday), using nrow() and subset() again, and save the result as p_B.
Use nrow() and subset() a final time to find P(office and weekday). Save the result as p_AB.
Compute P(office | weekday) and save the result as p_A_given_B.
Print the value of p_A_given_B.
# Load the where9am data frame
where9am <- read.csv("where9am.csv")
where9am$daytype <- factor(where9am$daytype)
# Compute P(A)
p_A <- nrow(subset(where9am, location == "office")) / 91
# Compute P(B)
p_B <- nrow(subset(where9am, daytype == "weekday")) / 91
# Compute the observed P(A and B)
p_AB <- nrow(subset(where9am, where9am$location == "office" & where9am$daytype == "weekday")) / 91
# Compute P(A | B) and print its value
p_A_given_B <- p_AB / p_B
p_A_given_B
## [1] 0.6
In the previous exercise, you found that there is a 60% chance Brett is in the office at 9am given that it is a weekday. On the other hand, if Brett is never in the office on a weekend, which of the following is/are true?
Answer the question
50 XP
Possible Answers
P(office and weekend) = 0.
press 1
P(office | weekend) = 0.
press 2
Brett’s location is dependent on the day of the week.
press 3
All of the above.
press 4
The previous exercises showed that the probability that Brett is at work or at home at 9am is highly dependent on whether it is the weekend or a weekday.
To see this finding in action, use the where9am data frame to build a Naive Bayes model on the same data.
You can then use this model to predict the future: where does the model think that Brett will be at 9am on Thursday and at 9am on Saturday?
Instructions
100 XP
The dataframe where9am is available in your workspace. This dataset contains information about Brett’s location at 9am on different days.
Load the naivebayes package.
Use naive_bayes() with a formula like y ~ x to build a model of location as a function of daytype.
Forecast the Thursday 9am location using predict() with the thursday9am object as the newdata argument.
Do the same for predicting the saturday9am location.
# Load the e1071 package (used here in place of the course's naivebayes package)
# install.packages("e1071", repos = "https://cran.rstudio.com")
library(e1071)
thursday9am <- read.csv("thursday9am.csv")
thursday9am$daytype <- factor(thursday9am$daytype)
thursday9am
## X daytype
## 1 1 weekday
saturday9am<-read.csv("saturday9am.csv")
saturday9am$daytype <- factor(saturday9am$daytype)
saturday9am
## X daytype
## 1 1 weekend
# Build the location prediction model
locmodel <- naiveBayes(location ~ daytype, data = where9am)
# Predict Thursday's 9am location
predict(locmodel, thursday9am)
## [1] office
## Levels: appointment campus home office
# Predict Saturday's 9am location
predict(locmodel, saturday9am)
## [1] home
## Levels: appointment campus home office
The naivebayes package offers several ways to peek inside a Naive Bayes model.
Typing the name of the model object provides the a priori (overall) and conditional probabilities of each of the model’s predictors. If you were so inclined, you could use these to calculate posterior (predicted) probabilities by hand (a worked check appears after the output below).
Alternatively, R will compute the posterior probabilities for you if the type = “prob” parameter is supplied to the predict() function.
Using these methods, examine how the model’s predicted 9am location probability varies from day-to-day.
Instructions
100 XP
The model locmodel that you fit in the previous exercise is in your workspace.
Print the locmodel object to the console to view the computed a priori and conditional probabilities.
Use the predict() function similarly to the previous exercise, but with type = “prob” to see the predicted probabilities for Thursday at 9am.
Compare these to the predicted probabilities for Saturday at 9am.
# The 'e1071' package is loaded into the workspace
# and the Naive Bayes 'locmodel' has been built
# Examine the location prediction model
print(locmodel)
##
## Naive Bayes Classifier for Discrete Predictors
##
## Call:
## naiveBayes.default(x = X, y = Y, laplace = laplace)
##
## A-priori probabilities:
## Y
## appointment campus home office
## 0.01098901 0.10989011 0.45054945 0.42857143
##
## Conditional probabilities:
## daytype
## Y weekday weekend
## appointment 1.0000000 0.0000000
## campus 1.0000000 0.0000000
## home 0.3658537 0.6341463
## office 1.0000000 0.0000000
# Obtain the predicted probabilities for Thursday at 9am
predict(locmodel, thursday9am, type ="raw")
## appointment campus home office
## [1,] 0.01538462 0.1538462 0.2307692 0.6
# Note: e1071's predict() for naiveBayes models accepts type = "class" (the default) or "raw";
# the naivebayes package used on DataCamp expects type = "prob"
# Obtain the predicted probabilities for Saturday at 9am
predict(locmodel, saturday9am, type = "raw")
## appointment campus home office
## [1,] 3.838772e-05 0.0003838772 0.9980806 0.001497121
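As a check, the a priori and conditional probabilities printed above can be combined by hand to reproduce the Thursday posteriors; a quick sketch (the numbers are copied from the model output shown earlier):
# Bayes' rule numerators: P(weekday | location) * P(location), one per location
num <- c(appointment = 1.0000000 * 0.01098901,
         campus      = 1.0000000 * 0.10989011,
         home        = 0.3658537 * 0.45054945,
         office      = 1.0000000 * 0.42857143)
# Normalizing the numerators gives the same posteriors as predict(..., type = "raw");
# office works out to 0.6, matching the prediction above
num / sum(num)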
Understanding the idea of event independence will become important as you learn more about how “naive” Bayes got its name. Which of the following is true about independent events?
Answer the question
50 XP
Possible Answers
The events cannot occur at the same time.
press 1
A Venn diagram will always show no intersection.
press 2
Knowing the outcome of one event does not help predict the other.
press 3
At least one of the events is completely random.
press 4
The Naive Bayes algorithm got its name because it makes a “naive” assumption about event independence.
What is the purpose of making this assumption?
Answer the question
50 XP
Possible Answers
Independent events can never have a joint probability of zero.
press 1
The joint probability calculation is simpler for independent events.
press 2 [ans]
Conditional probability is undefined for dependent events.
press 3
Dependent events cannot be used to make predictions.
press 4
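To make the role of the independence assumption concrete, here is a small sketch using the where9am quantities computed earlier (it assumes p_A, p_B, and p_AB are still in the workspace):
# Under the naive assumption of independence, the joint probability is just the product
p_A * p_B   # P(office) * P(weekday): roughly 0.31
# The observed joint probability computed earlier is noticeably larger,
# because location and day type are in fact dependent
p_AB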
The locations dataset records Brett’s location every hour for 13 weeks. Each hour, the tracking information includes the daytype (weekend or weekday) as well as the hourtype (morning, afternoon, evening, or night).
Using this data, build a more sophisticated model to see how Brett’s predicted location not only varies by the day of week but also by the time of day.
Instructions
100 XP
The dataset locations is already loaded in your workspace.
Use the R formula interface to build a model where location depends on both daytype and hourtype. Recall that the function naive_bayes() takes 2 arguments: formula and data.
Predict Brett’s location on a weekday afternoon using the dataframe weekday_afternoon and the predict() function.
Do the same for a weekday_evening.
# The 'e1071' package is loaded into the workspace already
weekday_afternoon<-read.csv("weekday_afternoon.csv")
weekday_afternoon
## X daytype hourtype location
## 1 1 weekday afternoon office
weekday_evening<-read.csv("weekday_evening.csv")
weekday_evening
## X daytype hourtype location
## 1 1 weekday afternoon office
locations<-read.csv("locations.csv")
# Build a NB model of location
locmodel <- naiveBayes(location ~ daytype + hourtype, data = locations)
# Predict Brett's location on a weekday afternoon
predict(locmodel, weekday_afternoon)
## [1] office
## Levels: appointment campus home office restaurant store theater
# Predict Brett's location on a weekday evening
predict(locmodel, weekday_evening, type ="raw")
## appointment campus home office restaurant store
## [1,] 0.004300045 0.08385089 0.2482618 0.5848062 0.07304768 0.005733394
## theater
## [1,] 1.290014e-08
While Brett was tracking his location over 13 weeks, he never went into the office during the weekend. Consequently, the joint probability of P(office and weekend) = 0.
Explore how this impacts the predicted probability that Brett may go to work on the weekend in the future. Additionally, you can see how using the Laplace correction will allow a small chance for these types of unforeseen circumstances.
Instructions
100 XP
The model locmodel is already in your workspace, along with the dataframe weekend_afternoon.
Use the locmodel to output predicted probabilities for a weekend afternoon by using the predict() function. Remember to set the type argument.
Create a new naive Bayes model with the Laplace smoothing parameter set to 1. You can do this by setting the laplace argument in your call to naive_bayes(). Save this as locmodel2.
See how the new predicted probabilities compare by using the predict() function on your new model.
# The 'e1071' package is loaded into the workspace already
# The Naive Bayes location model (locmodel) has already been built
weekend_afternoon<-read.csv("weekend_afternoon.csv")
# Observe the predicted probabilities for a weekend afternoon
predict(locmodel,weekend_afternoon,type="raw")
## appointment campus home office restaurant store
## [1,] 0.02462883 0.0004802622 0.8439145 0.003349521 0.1111338 0.01641922
## theater
## [1,] 7.38865e-05
# Build a new model using the Laplace correction
locmodel2 <- naiveBayes(location ~ daytype + hourtype, data = locations, laplace = 1)
# Observe the new predicted probabilities for a weekend afternoon
predict(locmodel2,weekend_afternoon,type="raw")
## appointment campus home office restaurant store
## [1,] 0.02013872 0.006187715 0.8308154 0.007929249 0.1098743 0.01871085
## theater
## [1,] 0.006343697
By default, the naive_bayes() function in the naivebayes package does not use the Laplace correction. What is the risk of leaving this parameter unset?
Answer the question
50 XP
Possible Answers
Some potential outcomes may be predicted to be impossible.
press 1 [ans]
The algorithm may have a divide by zero error.
press 2
Naive Bayes will ignore features with zero values.
press 3
The model may not estimate probabilities for some cases.
press 4
Numeric data is often binned before it is used with Naive Bayes. Which of these is not an example of binning?
Answer the question
50 XP
Possible Answers
age values recoded as ‘child’ or ‘adult’ categories
press 1
geographic coordinates recoded into geographic regions (West, East, etc.)
press 2
test scores divided into four groups by percentile
press 3
income values standardized to follow a normal bell curve
press 4 [ans]
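For reference, binning (the first three answer options) is typically done in R with cut(); a small illustrative sketch with made-up ages and break points:
# Bin a numeric age variable into 'child'/'adult' categories
age <- c(8, 15, 34, 62, 41)
age_group <- cut(age, breaks = c(0, 17, Inf), labels = c("child", "adult"))
age_group  # a factor that a Naive Bayes model can use directly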
Logistic regression involves fitting a curve to numeric data to make predictions about binary events. Arguably one of the most widely used machine learning methods, this chapter will provide an overview of the technique while illustrating how to apply it to fundraising data.
The donors dataset contains 93,462 examples of people who were mailed a fundraising solicitation for paralyzed military veterans. The donated column is 1 if the person made a donation in response to the mailing and 0 otherwise. This binary outcome will be the dependent variable for the logistic regression model.
The remaining columns are features of the prospective donors that may influence their donation behavior. These are the model’s independent variables.
When building a regression model, it is often helpful to form a hypothesis about which independent variables will be predictive of the dependent variable. The bad_address column, which is set to 1 for an invalid mailing address and 0 otherwise, seems like it might reduce the chances of a donation. Similarly, one might suspect that religious interest (interest_religion) and interest in veterans affairs (interest_veterans) would be associated with greater charitable giving.
In this exercise, you will use these three factors to create a simple model of donation behavior.
Instructions
100 XP
The dataset donors is available in your workspace.
Examine donors using the str() function.
Count the number of occurrences of each level of the donated variable using the table() function.
Fit a logistic regression model using the formula interface and the three independent variables described above.
Call glm() with the formula as its first argument and the dataframe as the data argument.
Save the result as donation_model.
Summarize the model object with summary().
donors<-read.csv("donors.csv")
# Examine the dataset to identify potential independent variables
str(donors)
## 'data.frame': 93462 obs. of 13 variables:
## $ donated : int 0 0 0 0 0 0 0 0 0 0 ...
## $ veteran : int 0 0 0 0 0 0 0 0 0 0 ...
## $ bad_address : int 0 0 0 0 0 0 0 0 0 0 ...
## $ age : int 60 46 NA 70 78 NA 38 NA NA 65 ...
## $ has_children : int 0 1 0 0 1 0 1 0 0 0 ...
## $ wealth_rating : int 0 3 1 2 1 0 2 3 1 0 ...
## $ interest_veterans: int 0 0 0 0 0 0 0 0 0 0 ...
## $ interest_religion: int 0 0 0 0 1 0 0 0 0 0 ...
## $ pet_owner : int 0 0 0 0 0 0 1 0 0 0 ...
## $ catalog_shopper : int 0 0 0 0 1 0 0 0 0 0 ...
## $ recency : Factor w/ 2 levels "CURRENT","LAPSED": 1 1 1 1 1 1 1 1 1 1 ...
## $ frequency : Factor w/ 2 levels "FREQUENT","INFREQUENT": 1 1 1 1 1 2 2 1 2 2 ...
## $ money : Factor w/ 2 levels "HIGH","MEDIUM": 2 1 2 2 2 2 2 2 2 2 ...
# Explore the dependent variable
table(donors$donated)
##
## 0 1
## 88751 4711
# Build the donation model
donation_model <- glm(donated ~ bad_address + interest_religion + interest_veterans, data = donors, family = "binomial")
# Summarize the model results
summary(donation_model)
##
## Call:
## glm(formula = donated ~ bad_address + interest_religion + interest_veterans,
## family = "binomial", data = donors)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.3480 -0.3192 -0.3192 -0.3192 2.5678
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.95139 0.01652 -178.664 <2e-16 ***
## bad_address -0.30780 0.14348 -2.145 0.0319 *
## interest_religion 0.06724 0.05069 1.327 0.1847
## interest_veterans 0.11009 0.04676 2.354 0.0186 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 37330 on 93461 degrees of freedom
## Residual deviance: 37316 on 93458 degrees of freedom
## AIC: 37324
##
## Number of Fisher Scoring iterations: 5