Previously….
A brave fellow student gave a presentation using the KNN matching algorithm to predict the class of an unknown (and possibly alien) species….
Zach left us with a few questions at the end of the presentation, including:
- How to measure success [of the knn model]?
- How to empirically establish k?
In the comments section during the presentation, Prof. Catlin suggested using a confusion matrix to evaluate the models with different k values. In this example, I will present the concept of the confusion matrix and apply it to Zach’s KNN models.
Ewoks and Humans and Wookies (minus the Ewoks)
For Zach’s presentation, he created a data set containing 200 Ewoks, 200 humans, and 200 Wookies, then built a data frame with simulated height and weight data for each case. Then, he used a training set of 399 observations from his data set to build two models with different \(k\) values (\(k=3\) and \(k=9\)), which were then applied to the remaining 201 observations (the test data set).
To simplify the example today, I’ve reduced the problem to only two classes: Humans and Wookies. Their scatterplot of Height vs. Weight is shown here:
library(ggplot2)  # ggplot(), aes(), geom_point()
ggplot(SWSpecies, aes(x=Height, y=Weight, color=Species, shape=Species)) +
  geom_point()

Running the process with two different \(k\) values
Code from Zach Dravis to apply the two models:
\(k=3\)
library(class)       # knn()
library(dplyr)       # select(), %>%
library(knitr)       # kable()
library(kableExtra)  # kable_styling(), scroll_box()
SpeciesPrediction <- knn(train = select(TrainingData, Height, Weight),
                         test = select(TestData, Height, Weight),
                         cl = TrainingData$Species,
                         k = 3)
FirstTest <- cbind(TestData, SpeciesPrediction)
FirstTest %>% kable("html", caption = "Test1: k = 3") %>%
  kable_styling(bootstrap_options = c("striped")) %>%
  scroll_box(height = "500px")

| Species | Height | Weight | SpeciesPrediction |
|---|---|---|---|
| Human | 5.742613 | 119.28202 | Human |
| Human | 5.848384 | 108.37048 | Human |
| Human | 5.592757 | 148.52337 | Human |
| Human | 5.850367 | 204.32882 | Wookie |
| Human | 5.655841 | 147.01470 | Human |
| Human | 5.880231 | 173.31711 | Wookie |
| Human | 6.421232 | 116.92211 | Human |
| Human | 6.056181 | 143.35066 | Human |
| Human | 5.516332 | 166.98285 | Human |
| Human | 4.942775 | 139.35122 | Human |
| Human | 5.709029 | 173.55788 | Human |
| Human | 5.299882 | 170.82142 | Human |
| Human | 6.246747 | 131.01810 | Human |
| Human | 4.696459 | 126.84124 | Human |
| Human | 5.292124 | 220.73452 | Wookie |
| Human | 5.711004 | 144.22352 | Human |
| Human | 5.424132 | 150.19858 | Human |
| Human | 5.196924 | 148.36362 | Human |
| Human | 5.347639 | 132.95379 | Human |
| Human | 5.814768 | 122.19072 | Human |
| Human | 5.947586 | 115.92794 | Human |
| Human | 5.830106 | 177.41887 | Wookie |
| Human | 6.636742 | 120.41812 | Human |
| Human | 6.086749 | 170.02456 | Human |
| Human | 5.643855 | 192.19786 | Human |
| Human | 5.170115 | 184.73337 | Human |
| Human | 6.959570 | 145.83740 | Human |
| Human | 5.838708 | 133.82034 | Human |
| Human | 5.157840 | 117.63320 | Human |
| Human | 5.593246 | 149.44314 | Human |
| Human | 5.337803 | 143.02153 | Human |
| Human | 5.362648 | 154.68881 | Human |
| Human | 5.033248 | 131.87971 | Human |
| Human | 5.558423 | 168.87105 | Human |
| Human | 5.659580 | 171.42509 | Human |
| Human | 4.961229 | 182.27658 | Human |
| Human | 3.883424 | 217.49267 | Wookie |
| Human | 5.372563 | 155.93672 | Human |
| Human | 5.514759 | 132.73361 | Human |
| Human | 5.797137 | 147.68037 | Human |
| Human | 5.529568 | 101.16240 | Human |
| Human | 5.706699 | 153.89769 | Human |
| Human | 4.951114 | 137.22981 | Human |
| Human | 5.855588 | 162.18035 | Human |
| Human | 5.859444 | 141.93569 | Human |
| Human | 5.625826 | 108.57046 | Human |
| Human | 6.178637 | 151.09750 | Wookie |
| Human | 5.702234 | 109.40266 | Human |
| Human | 5.632182 | 176.91019 | Human |
| Human | 5.634022 | 95.11921 | Human |
| Human | 5.718465 | 145.74369 | Human |
| Human | 6.030062 | 163.73945 | Wookie |
| Human | 5.726095 | 104.41147 | Human |
| Human | 5.831599 | 191.97166 | Human |
| Human | 4.931813 | 123.43982 | Human |
| Human | 5.314751 | 134.81009 | Human |
| Human | 6.238485 | 154.89996 | Human |
| Human | 4.888048 | 139.88568 | Human |
| Human | 5.629034 | 118.92145 | Human |
| Human | 5.702501 | 162.28014 | Human |
| Human | 5.987902 | 134.27330 | Human |
| Human | 5.325562 | 165.19501 | Human |
| Human | 5.579313 | 134.85831 | Human |
| Human | 4.618373 | 121.28249 | Human |
| Human | 5.669298 | 147.16097 | Human |
| Human | 5.166717 | 169.48318 | Human |
| Human | 5.380677 | 157.26965 | Human |
| Human | 4.906117 | 148.23198 | Human |
| Human | 5.692468 | 91.56618 | Human |
| Human | 5.833290 | 193.54331 | Human |
| Human | 5.347693 | 148.80181 | Human |
| Human | 6.412505 | 124.08007 | Human |
| Human | 5.835280 | 143.94929 | Human |
| Human | 5.974316 | 157.83767 | Wookie |
| Human | 6.524701 | 104.44946 | Human |
| Human | 5.174443 | 171.51926 | Human |
| Human | 5.904310 | 157.40498 | Wookie |
| Human | 5.993290 | 186.08301 | Human |
| Human | 5.496915 | 179.58118 | Human |
| Human | 5.659526 | 133.86178 | Human |
| Human | 4.994089 | 135.29152 | Human |
| Human | 5.735084 | 120.13451 | Human |
| Human | 5.149515 | 139.14891 | Human |
| Human | 5.906841 | 178.33064 | Wookie |
| Human | 5.094285 | 203.76621 | Human |
| Human | 5.659699 | 177.85228 | Wookie |
| Human | 5.076739 | 95.10187 | Human |
| Human | 5.377118 | 174.65634 | Human |
| Human | 4.723571 | 181.08806 | Human |
| Human | 5.564217 | 168.52260 | Human |
| Human | 5.992722 | 124.35209 | Human |
| Human | 5.591624 | 181.93532 | Human |
| Human | 4.616885 | 171.88444 | Human |
| Human | 5.189733 | 169.87970 | Human |
| Human | 6.328022 | 138.55778 | Wookie |
| Human | 6.404903 | 175.37675 | Human |
| Human | 4.912482 | 153.34776 | Human |
| Human | 5.316648 | 147.96689 | Human |
| Human | 5.676813 | 194.61294 | Human |
| Human | 5.659578 | 172.71670 | Wookie |
| Wookie | 7.738585 | 205.66270 | Wookie |
| Wookie | 6.081447 | 171.74523 | Human |
| Wookie | 7.532295 | 204.03502 | Wookie |
| Wookie | 6.918085 | 165.21143 | Wookie |
| Wookie | 8.336956 | 145.18989 | Wookie |
| Wookie | 6.817417 | 240.22203 | Wookie |
| Wookie | 5.854670 | 232.83221 | Wookie |
| Wookie | 7.368876 | 171.59847 | Wookie |
| Wookie | 7.265878 | 263.63929 | Wookie |
| Wookie | 6.986780 | 190.37280 | Wookie |
| Wookie | 6.207587 | 185.81674 | Wookie |
| Wookie | 6.370814 | 152.31641 | Human |
| Wookie | 6.990531 | 198.78247 | Wookie |
| Wookie | 7.778583 | 193.15070 | Wookie |
| Wookie | 6.726657 | 203.37592 | Wookie |
| Wookie | 6.346413 | 222.25842 | Wookie |
| Wookie | 4.971848 | 245.33002 | Wookie |
| Wookie | 7.370687 | 216.37053 | Wookie |
| Wookie | 7.320505 | 249.62808 | Wookie |
| Wookie | 7.981893 | 253.27003 | Wookie |
| Wookie | 8.100042 | 53.47590 | Wookie |
| Wookie | 6.892708 | 181.86131 | Wookie |
| Wookie | 7.973885 | 228.49663 | Wookie |
| Wookie | 7.000985 | 144.01823 | Human |
| Wookie | 6.492692 | 151.96748 | Wookie |
| Wookie | 6.363853 | 213.68682 | Wookie |
| Wookie | 6.739794 | 177.92884 | Wookie |
| Wookie | 7.170610 | 113.42561 | Wookie |
| Wookie | 6.979449 | 245.29274 | Wookie |
| Wookie | 7.150658 | 157.95802 | Wookie |
| Wookie | 7.149207 | 219.31739 | Wookie |
| Wookie | 7.535950 | 220.62151 | Wookie |
| Wookie | 6.795114 | 276.06343 | Wookie |
| Wookie | 8.226511 | 209.82821 | Wookie |
| Wookie | 7.328270 | 233.28052 | Wookie |
| Wookie | 6.104848 | 257.02955 | Wookie |
| Wookie | 6.792513 | 179.68094 | Wookie |
| Wookie | 7.386912 | 117.22193 | Human |
| Wookie | 6.691708 | 207.14650 | Wookie |
| Wookie | 7.258746 | 266.29572 | Wookie |
| Wookie | 7.325797 | 129.68612 | Human |
| Wookie | 7.977608 | 250.60658 | Wookie |
| Wookie | 6.511228 | 196.99346 | Wookie |
| Wookie | 6.987339 | 204.97407 | Wookie |
| Wookie | 6.574231 | 182.53891 | Wookie |
| Wookie | 6.154650 | 251.84266 | Wookie |
| Wookie | 5.613272 | 271.20782 | Wookie |
| Wookie | 7.146647 | 278.54244 | Wookie |
| Wookie | 7.451174 | 151.62996 | Wookie |
| Wookie | 7.479369 | 237.96429 | Wookie |
| Wookie | 5.957505 | 181.19052 | Human |
| Wookie | 6.152742 | 150.82953 | Human |
| Wookie | 6.144134 | 219.49476 | Wookie |
| Wookie | 6.483496 | 199.10246 | Human |
| Wookie | 7.127648 | 239.58623 | Wookie |
| Wookie | 7.445863 | 289.61989 | Wookie |
| Wookie | 6.388725 | 251.83584 | Wookie |
| Wookie | 4.851740 | 165.21572 | Human |
| Wookie | 7.715368 | 206.98113 | Wookie |
| Wookie | 7.461464 | 270.66227 | Wookie |
| Wookie | 6.474347 | 258.72188 | Wookie |
| Wookie | 6.194754 | 120.76092 | Human |
| Wookie | 6.775007 | 247.32150 | Wookie |
| Wookie | 7.081080 | 182.97430 | Wookie |
| Wookie | 7.542401 | 168.73187 | Wookie |
| Wookie | 8.042571 | 144.93989 | Wookie |
| Wookie | 6.363338 | 184.41023 | Human |
| Wookie | 7.218042 | 234.43283 | Wookie |
| Wookie | 8.000632 | 271.76359 | Wookie |
| Wookie | 6.323993 | 206.94535 | Wookie |
| Wookie | 6.832927 | 196.34738 | Wookie |
| Wookie | 5.671399 | 189.88320 | Wookie |
| Wookie | 7.215567 | 247.68997 | Wookie |
| Wookie | 6.519106 | 145.94033 | Human |
| Wookie | 6.306297 | 212.50094 | Wookie |
| Wookie | 6.415017 | 251.08110 | Wookie |
| Wookie | 7.333378 | 253.09258 | Wookie |
| Wookie | 5.378148 | 220.85351 | Wookie |
| Wookie | 6.680867 | 141.88169 | Human |
| Wookie | 6.239712 | 158.94576 | Wookie |
| Wookie | 6.278411 | 163.97559 | Human |
| Wookie | 7.355976 | 127.83640 | Human |
| Wookie | 7.263816 | 202.00102 | Wookie |
| Wookie | 6.919735 | 261.14733 | Wookie |
| Wookie | 7.403138 | 77.66530 | Human |
| Wookie | 6.548421 | 237.91939 | Wookie |
| Wookie | 5.482527 | 189.86692 | Wookie |
| Wookie | 7.865816 | 161.54331 | Wookie |
| Wookie | 6.826980 | 238.50044 | Wookie |
| Wookie | 6.726183 | 232.41926 | Wookie |
| Wookie | 7.118138 | 204.81087 | Wookie |
| Wookie | 7.574449 | 154.01783 | Wookie |
| Wookie | 7.319565 | 164.75691 | Wookie |
| Wookie | 7.695449 | 119.13830 | Human |
| Wookie | 7.034601 | 294.04470 | Wookie |
| Wookie | 6.621786 | 192.22028 | Human |
| Wookie | 6.774663 | 162.27431 | Wookie |
| Wookie | 7.617187 | 91.95238 | Human |
| Wookie | 6.363048 | 178.52692 | Wookie |
| Wookie | 6.284792 | 172.54360 | Wookie |
\(k=9\)
SpeciesPrediction <- knn(train = select(TrainingData, Height, Weight),
                         test = select(TestData, Height, Weight),
                         cl = TrainingData$Species,
                         k = 9)
SecondTest <- cbind(TestData, SpeciesPrediction)
SecondTest %>% kable("html", caption = "Test2: k = 9") %>%
  kable_styling(bootstrap_options = c("striped")) %>%
  scroll_box(height = "500px")

| Species | Height | Weight | SpeciesPrediction |
|---|---|---|---|
| Human | 5.742613 | 119.28202 | Human |
| Human | 5.848384 | 108.37048 | Human |
| Human | 5.592757 | 148.52337 | Human |
| Human | 5.850367 | 204.32882 | Wookie |
| Human | 5.655841 | 147.01470 | Human |
| Human | 5.880231 | 173.31711 | Wookie |
| Human | 6.421232 | 116.92211 | Human |
| Human | 6.056181 | 143.35066 | Human |
| Human | 5.516332 | 166.98285 | Human |
| Human | 4.942775 | 139.35122 | Human |
| Human | 5.709029 | 173.55788 | Wookie |
| Human | 5.299882 | 170.82142 | Human |
| Human | 6.246747 | 131.01810 | Human |
| Human | 4.696459 | 126.84124 | Human |
| Human | 5.292124 | 220.73452 | Wookie |
| Human | 5.711004 | 144.22352 | Human |
| Human | 5.424132 | 150.19858 | Human |
| Human | 5.196924 | 148.36362 | Human |
| Human | 5.347639 | 132.95379 | Human |
| Human | 5.814768 | 122.19072 | Human |
| Human | 5.947586 | 115.92794 | Human |
| Human | 5.830106 | 177.41887 | Wookie |
| Human | 6.636742 | 120.41812 | Human |
| Human | 6.086749 | 170.02456 | Human |
| Human | 5.643855 | 192.19786 | Human |
| Human | 5.170115 | 184.73337 | Human |
| Human | 6.959570 | 145.83740 | Human |
| Human | 5.838708 | 133.82034 | Human |
| Human | 5.157840 | 117.63320 | Human |
| Human | 5.593246 | 149.44314 | Human |
| Human | 5.337803 | 143.02153 | Human |
| Human | 5.362648 | 154.68881 | Human |
| Human | 5.033248 | 131.87971 | Human |
| Human | 5.558423 | 168.87105 | Human |
| Human | 5.659580 | 171.42509 | Human |
| Human | 4.961229 | 182.27658 | Human |
| Human | 3.883424 | 217.49267 | Wookie |
| Human | 5.372563 | 155.93672 | Human |
| Human | 5.514759 | 132.73361 | Human |
| Human | 5.797137 | 147.68037 | Human |
| Human | 5.529568 | 101.16240 | Human |
| Human | 5.706699 | 153.89769 | Human |
| Human | 4.951114 | 137.22981 | Human |
| Human | 5.855588 | 162.18035 | Human |
| Human | 5.859444 | 141.93569 | Human |
| Human | 5.625826 | 108.57046 | Human |
| Human | 6.178637 | 151.09750 | Human |
| Human | 5.702234 | 109.40266 | Human |
| Human | 5.632182 | 176.91019 | Wookie |
| Human | 5.634022 | 95.11921 | Human |
| Human | 5.718465 | 145.74369 | Human |
| Human | 6.030062 | 163.73945 | Human |
| Human | 5.726095 | 104.41147 | Human |
| Human | 5.831599 | 191.97166 | Human |
| Human | 4.931813 | 123.43982 | Human |
| Human | 5.314751 | 134.81009 | Human |
| Human | 6.238485 | 154.89996 | Human |
| Human | 4.888048 | 139.88568 | Human |
| Human | 5.629034 | 118.92145 | Human |
| Human | 5.702501 | 162.28014 | Human |
| Human | 5.987902 | 134.27330 | Human |
| Human | 5.325562 | 165.19501 | Human |
| Human | 5.579313 | 134.85831 | Human |
| Human | 4.618373 | 121.28249 | Human |
| Human | 5.669298 | 147.16097 | Human |
| Human | 5.166717 | 169.48318 | Human |
| Human | 5.380677 | 157.26965 | Human |
| Human | 4.906117 | 148.23198 | Human |
| Human | 5.692468 | 91.56618 | Human |
| Human | 5.833290 | 193.54331 | Human |
| Human | 5.347693 | 148.80181 | Human |
| Human | 6.412505 | 124.08007 | Human |
| Human | 5.835280 | 143.94929 | Human |
| Human | 5.974316 | 157.83767 | Human |
| Human | 6.524701 | 104.44946 | Human |
| Human | 5.174443 | 171.51926 | Human |
| Human | 5.904310 | 157.40498 | Human |
| Human | 5.993290 | 186.08301 | Wookie |
| Human | 5.496915 | 179.58118 | Human |
| Human | 5.659526 | 133.86178 | Human |
| Human | 4.994089 | 135.29152 | Human |
| Human | 5.735084 | 120.13451 | Human |
| Human | 5.149515 | 139.14891 | Human |
| Human | 5.906841 | 178.33064 | Wookie |
| Human | 5.094285 | 203.76621 | Wookie |
| Human | 5.659699 | 177.85228 | Wookie |
| Human | 5.076739 | 95.10187 | Human |
| Human | 5.377118 | 174.65634 | Human |
| Human | 4.723571 | 181.08806 | Human |
| Human | 5.564217 | 168.52260 | Human |
| Human | 5.992722 | 124.35209 | Human |
| Human | 5.591624 | 181.93532 | Human |
| Human | 4.616885 | 171.88444 | Human |
| Human | 5.189733 | 169.87970 | Human |
| Human | 6.328022 | 138.55778 | Human |
| Human | 6.404903 | 175.37675 | Wookie |
| Human | 4.912482 | 153.34776 | Human |
| Human | 5.316648 | 147.96689 | Human |
| Human | 5.676813 | 194.61294 | Wookie |
| Human | 5.659578 | 172.71670 | Wookie |
| Wookie | 7.738585 | 205.66270 | Wookie |
| Wookie | 6.081447 | 171.74523 | Wookie |
| Wookie | 7.532295 | 204.03502 | Wookie |
| Wookie | 6.918085 | 165.21143 | Human |
| Wookie | 8.336956 | 145.18989 | Human |
| Wookie | 6.817417 | 240.22203 | Wookie |
| Wookie | 5.854670 | 232.83221 | Wookie |
| Wookie | 7.368876 | 171.59847 | Wookie |
| Wookie | 7.265878 | 263.63929 | Wookie |
| Wookie | 6.986780 | 190.37280 | Wookie |
| Wookie | 6.207587 | 185.81674 | Wookie |
| Wookie | 6.370814 | 152.31641 | Human |
| Wookie | 6.990531 | 198.78247 | Wookie |
| Wookie | 7.778583 | 193.15070 | Wookie |
| Wookie | 6.726657 | 203.37592 | Wookie |
| Wookie | 6.346413 | 222.25842 | Wookie |
| Wookie | 4.971848 | 245.33002 | Wookie |
| Wookie | 7.370687 | 216.37053 | Wookie |
| Wookie | 7.320505 | 249.62808 | Wookie |
| Wookie | 7.981893 | 253.27003 | Wookie |
| Wookie | 8.100042 | 53.47590 | Human |
| Wookie | 6.892708 | 181.86131 | Wookie |
| Wookie | 7.973885 | 228.49663 | Wookie |
| Wookie | 7.000985 | 144.01823 | Human |
| Wookie | 6.492692 | 151.96748 | Human |
| Wookie | 6.363853 | 213.68682 | Wookie |
| Wookie | 6.739794 | 177.92884 | Wookie |
| Wookie | 7.170610 | 113.42561 | Human |
| Wookie | 6.979449 | 245.29274 | Wookie |
| Wookie | 7.150658 | 157.95802 | Wookie |
| Wookie | 7.149207 | 219.31739 | Wookie |
| Wookie | 7.535950 | 220.62151 | Wookie |
| Wookie | 6.795114 | 276.06343 | Wookie |
| Wookie | 8.226511 | 209.82821 | Wookie |
| Wookie | 7.328270 | 233.28052 | Wookie |
| Wookie | 6.104848 | 257.02955 | Wookie |
| Wookie | 6.792513 | 179.68094 | Wookie |
| Wookie | 7.386912 | 117.22193 | Human |
| Wookie | 6.691708 | 207.14650 | Wookie |
| Wookie | 7.258746 | 266.29572 | Wookie |
| Wookie | 7.325797 | 129.68612 | Human |
| Wookie | 7.977608 | 250.60658 | Wookie |
| Wookie | 6.511228 | 196.99346 | Wookie |
| Wookie | 6.987339 | 204.97407 | Wookie |
| Wookie | 6.574231 | 182.53891 | Wookie |
| Wookie | 6.154650 | 251.84266 | Wookie |
| Wookie | 5.613272 | 271.20782 | Wookie |
| Wookie | 7.146647 | 278.54244 | Wookie |
| Wookie | 7.451174 | 151.62996 | Wookie |
| Wookie | 7.479369 | 237.96429 | Wookie |
| Wookie | 5.957505 | 181.19052 | Human |
| Wookie | 6.152742 | 150.82953 | Human |
| Wookie | 6.144134 | 219.49476 | Wookie |
| Wookie | 6.483496 | 199.10246 | Wookie |
| Wookie | 7.127648 | 239.58623 | Wookie |
| Wookie | 7.445863 | 289.61989 | Wookie |
| Wookie | 6.388725 | 251.83584 | Wookie |
| Wookie | 4.851740 | 165.21572 | Human |
| Wookie | 7.715368 | 206.98113 | Wookie |
| Wookie | 7.461464 | 270.66227 | Wookie |
| Wookie | 6.474347 | 258.72188 | Wookie |
| Wookie | 6.194754 | 120.76092 | Human |
| Wookie | 6.775007 | 247.32150 | Wookie |
| Wookie | 7.081080 | 182.97430 | Wookie |
| Wookie | 7.542401 | 168.73187 | Human |
| Wookie | 8.042571 | 144.93989 | Human |
| Wookie | 6.363338 | 184.41023 | Wookie |
| Wookie | 7.218042 | 234.43283 | Wookie |
| Wookie | 8.000632 | 271.76359 | Wookie |
| Wookie | 6.323993 | 206.94535 | Wookie |
| Wookie | 6.832927 | 196.34738 | Wookie |
| Wookie | 5.671399 | 189.88320 | Wookie |
| Wookie | 7.215567 | 247.68997 | Wookie |
| Wookie | 6.519106 | 145.94033 | Human |
| Wookie | 6.306297 | 212.50094 | Wookie |
| Wookie | 6.415017 | 251.08110 | Wookie |
| Wookie | 7.333378 | 253.09258 | Wookie |
| Wookie | 5.378148 | 220.85351 | Wookie |
| Wookie | 6.680867 | 141.88169 | Human |
| Wookie | 6.239712 | 158.94576 | Human |
| Wookie | 6.278411 | 163.97559 | Human |
| Wookie | 7.355976 | 127.83640 | Human |
| Wookie | 7.263816 | 202.00102 | Wookie |
| Wookie | 6.919735 | 261.14733 | Wookie |
| Wookie | 7.403138 | 77.66530 | Human |
| Wookie | 6.548421 | 237.91939 | Wookie |
| Wookie | 5.482527 | 189.86692 | Wookie |
| Wookie | 7.865816 | 161.54331 | Wookie |
| Wookie | 6.826980 | 238.50044 | Wookie |
| Wookie | 6.726183 | 232.41926 | Wookie |
| Wookie | 7.118138 | 204.81087 | Wookie |
| Wookie | 7.574449 | 154.01783 | Human |
| Wookie | 7.319565 | 164.75691 | Wookie |
| Wookie | 7.695449 | 119.13830 | Human |
| Wookie | 7.034601 | 294.04470 | Wookie |
| Wookie | 6.621786 | 192.22028 | Human |
| Wookie | 6.774663 | 162.27431 | Wookie |
| Wookie | 7.617187 | 91.95238 | Human |
| Wookie | 6.363048 | 178.52692 | Wookie |
| Wookie | 6.284792 | 172.54360 | Wookie |
Baseline: Random simulation
With the two models from Zach’s presentation in hand, I wanted a baseline model to serve as a control. I simulated predictions by randomly assigning a species label to each of the 200 test observations (100 Human labels and 100 Wookie labels, in random order).
From Data Science for Business: “Comparison against a random model establishes that there is some information to be extracted from the data.”
If our models perform better than a random simulation, the features we’re using carry real information about the species.
simSpeciesPrediction <- c(rep("Wookie", 100), rep("Human", 100))
randOrder <- sample(1:200, 200, replace = FALSE)
simSpeciesPrediction <- simSpeciesPrediction[randOrder]
BaseTest <- cbind(TestData, simSpeciesPrediction)
BaseTest %>% kable("html", caption = "Baseline: Random") %>%
  kable_styling(bootstrap_options = c("striped")) %>%
  scroll_box(height = "500px")

| Species | Height | Weight | simSpeciesPrediction |
|---|---|---|---|
| Human | 5.742613 | 119.28202 | Human |
| Human | 5.848384 | 108.37048 | Wookie |
| Human | 5.592757 | 148.52337 | Wookie |
| Human | 5.850367 | 204.32882 | Wookie |
| Human | 5.655841 | 147.01470 | Human |
| Human | 5.880231 | 173.31711 | Wookie |
| Human | 6.421232 | 116.92211 | Wookie |
| Human | 6.056181 | 143.35066 | Wookie |
| Human | 5.516332 | 166.98285 | Human |
| Human | 4.942775 | 139.35122 | Wookie |
| Human | 5.709029 | 173.55788 | Wookie |
| Human | 5.299882 | 170.82142 | Wookie |
| Human | 6.246747 | 131.01810 | Human |
| Human | 4.696459 | 126.84124 | Human |
| Human | 5.292124 | 220.73452 | Wookie |
| Human | 5.711004 | 144.22352 | Wookie |
| Human | 5.424132 | 150.19858 | Human |
| Human | 5.196924 | 148.36362 | Wookie |
| Human | 5.347639 | 132.95379 | Human |
| Human | 5.814768 | 122.19072 | Wookie |
| Human | 5.947586 | 115.92794 | Wookie |
| Human | 5.830106 | 177.41887 | Human |
| Human | 6.636742 | 120.41812 | Human |
| Human | 6.086749 | 170.02456 | Wookie |
| Human | 5.643855 | 192.19786 | Wookie |
| Human | 5.170115 | 184.73337 | Wookie |
| Human | 6.959570 | 145.83740 | Human |
| Human | 5.838708 | 133.82034 | Wookie |
| Human | 5.157840 | 117.63320 | Human |
| Human | 5.593246 | 149.44314 | Wookie |
| Human | 5.337803 | 143.02153 | Wookie |
| Human | 5.362648 | 154.68881 | Wookie |
| Human | 5.033248 | 131.87971 | Human |
| Human | 5.558423 | 168.87105 | Human |
| Human | 5.659580 | 171.42509 | Wookie |
| Human | 4.961229 | 182.27658 | Wookie |
| Human | 3.883424 | 217.49267 | Wookie |
| Human | 5.372563 | 155.93672 | Human |
| Human | 5.514759 | 132.73361 | Wookie |
| Human | 5.797137 | 147.68037 | Wookie |
| Human | 5.529568 | 101.16240 | Human |
| Human | 5.706699 | 153.89769 | Wookie |
| Human | 4.951114 | 137.22981 | Human |
| Human | 5.855588 | 162.18035 | Human |
| Human | 5.859444 | 141.93569 | Human |
| Human | 5.625826 | 108.57046 | Human |
| Human | 6.178637 | 151.09750 | Wookie |
| Human | 5.702234 | 109.40266 | Wookie |
| Human | 5.632182 | 176.91019 | Human |
| Human | 5.634022 | 95.11921 | Wookie |
| Human | 5.718465 | 145.74369 | Human |
| Human | 6.030062 | 163.73945 | Wookie |
| Human | 5.726095 | 104.41147 | Human |
| Human | 5.831599 | 191.97166 | Wookie |
| Human | 4.931813 | 123.43982 | Wookie |
| Human | 5.314751 | 134.81009 | Wookie |
| Human | 6.238485 | 154.89996 | Human |
| Human | 4.888048 | 139.88568 | Human |
| Human | 5.629034 | 118.92145 | Wookie |
| Human | 5.702501 | 162.28014 | Human |
| Human | 5.987902 | 134.27330 | Human |
| Human | 5.325562 | 165.19501 | Human |
| Human | 5.579313 | 134.85831 | Human |
| Human | 4.618373 | 121.28249 | Wookie |
| Human | 5.669298 | 147.16097 | Human |
| Human | 5.166717 | 169.48318 | Human |
| Human | 5.380677 | 157.26965 | Wookie |
| Human | 4.906117 | 148.23198 | Human |
| Human | 5.692468 | 91.56618 | Wookie |
| Human | 5.833290 | 193.54331 | Human |
| Human | 5.347693 | 148.80181 | Human |
| Human | 6.412505 | 124.08007 | Wookie |
| Human | 5.835280 | 143.94929 | Human |
| Human | 5.974316 | 157.83767 | Human |
| Human | 6.524701 | 104.44946 | Human |
| Human | 5.174443 | 171.51926 | Wookie |
| Human | 5.904310 | 157.40498 | Human |
| Human | 5.993290 | 186.08301 | Human |
| Human | 5.496915 | 179.58118 | Wookie |
| Human | 5.659526 | 133.86178 | Wookie |
| Human | 4.994089 | 135.29152 | Human |
| Human | 5.735084 | 120.13451 | Human |
| Human | 5.149515 | 139.14891 | Human |
| Human | 5.906841 | 178.33064 | Wookie |
| Human | 5.094285 | 203.76621 | Wookie |
| Human | 5.659699 | 177.85228 | Human |
| Human | 5.076739 | 95.10187 | Wookie |
| Human | 5.377118 | 174.65634 | Human |
| Human | 4.723571 | 181.08806 | Wookie |
| Human | 5.564217 | 168.52260 | Human |
| Human | 5.992722 | 124.35209 | Human |
| Human | 5.591624 | 181.93532 | Human |
| Human | 4.616885 | 171.88444 | Wookie |
| Human | 5.189733 | 169.87970 | Human |
| Human | 6.328022 | 138.55778 | Human |
| Human | 6.404903 | 175.37675 | Human |
| Human | 4.912482 | 153.34776 | Human |
| Human | 5.316648 | 147.96689 | Wookie |
| Human | 5.676813 | 194.61294 | Human |
| Human | 5.659578 | 172.71670 | Wookie |
| Wookie | 7.738585 | 205.66270 | Wookie |
| Wookie | 6.081447 | 171.74523 | Human |
| Wookie | 7.532295 | 204.03502 | Human |
| Wookie | 6.918085 | 165.21143 | Wookie |
| Wookie | 8.336956 | 145.18989 | Wookie |
| Wookie | 6.817417 | 240.22203 | Wookie |
| Wookie | 5.854670 | 232.83221 | Wookie |
| Wookie | 7.368876 | 171.59847 | Wookie |
| Wookie | 7.265878 | 263.63929 | Human |
| Wookie | 6.986780 | 190.37280 | Human |
| Wookie | 6.207587 | 185.81674 | Human |
| Wookie | 6.370814 | 152.31641 | Human |
| Wookie | 6.990531 | 198.78247 | Human |
| Wookie | 7.778583 | 193.15070 | Human |
| Wookie | 6.726657 | 203.37592 | Human |
| Wookie | 6.346413 | 222.25842 | Human |
| Wookie | 4.971848 | 245.33002 | Wookie |
| Wookie | 7.370687 | 216.37053 | Wookie |
| Wookie | 7.320505 | 249.62808 | Human |
| Wookie | 7.981893 | 253.27003 | Wookie |
| Wookie | 8.100042 | 53.47590 | Wookie |
| Wookie | 6.892708 | 181.86131 | Human |
| Wookie | 7.973885 | 228.49663 | Human |
| Wookie | 7.000985 | 144.01823 | Wookie |
| Wookie | 6.492692 | 151.96748 | Wookie |
| Wookie | 6.363853 | 213.68682 | Wookie |
| Wookie | 6.739794 | 177.92884 | Human |
| Wookie | 7.170610 | 113.42561 | Wookie |
| Wookie | 6.979449 | 245.29274 | Wookie |
| Wookie | 7.150658 | 157.95802 | Wookie |
| Wookie | 7.149207 | 219.31739 | Wookie |
| Wookie | 7.535950 | 220.62151 | Human |
| Wookie | 6.795114 | 276.06343 | Wookie |
| Wookie | 8.226511 | 209.82821 | Human |
| Wookie | 7.328270 | 233.28052 | Wookie |
| Wookie | 6.104848 | 257.02955 | Human |
| Wookie | 6.792513 | 179.68094 | Wookie |
| Wookie | 7.386912 | 117.22193 | Wookie |
| Wookie | 6.691708 | 207.14650 | Wookie |
| Wookie | 7.258746 | 266.29572 | Human |
| Wookie | 7.325797 | 129.68612 | Wookie |
| Wookie | 7.977608 | 250.60658 | Human |
| Wookie | 6.511228 | 196.99346 | Human |
| Wookie | 6.987339 | 204.97407 | Human |
| Wookie | 6.574231 | 182.53891 | Wookie |
| Wookie | 6.154650 | 251.84266 | Human |
| Wookie | 5.613272 | 271.20782 | Human |
| Wookie | 7.146647 | 278.54244 | Wookie |
| Wookie | 7.451174 | 151.62996 | Wookie |
| Wookie | 7.479369 | 237.96429 | Human |
| Wookie | 5.957505 | 181.19052 | Wookie |
| Wookie | 6.152742 | 150.82953 | Human |
| Wookie | 6.144134 | 219.49476 | Wookie |
| Wookie | 6.483496 | 199.10246 | Human |
| Wookie | 7.127648 | 239.58623 | Wookie |
| Wookie | 7.445863 | 289.61989 | Human |
| Wookie | 6.388725 | 251.83584 | Human |
| Wookie | 4.851740 | 165.21572 | Human |
| Wookie | 7.715368 | 206.98113 | Human |
| Wookie | 7.461464 | 270.66227 | Wookie |
| Wookie | 6.474347 | 258.72188 | Wookie |
| Wookie | 6.194754 | 120.76092 | Wookie |
| Wookie | 6.775007 | 247.32150 | Wookie |
| Wookie | 7.081080 | 182.97430 | Human |
| Wookie | 7.542401 | 168.73187 | Wookie |
| Wookie | 8.042571 | 144.93989 | Human |
| Wookie | 6.363338 | 184.41023 | Human |
| Wookie | 7.218042 | 234.43283 | Human |
| Wookie | 8.000632 | 271.76359 | Wookie |
| Wookie | 6.323993 | 206.94535 | Wookie |
| Wookie | 6.832927 | 196.34738 | Human |
| Wookie | 5.671399 | 189.88320 | Wookie |
| Wookie | 7.215567 | 247.68997 | Wookie |
| Wookie | 6.519106 | 145.94033 | Wookie |
| Wookie | 6.306297 | 212.50094 | Wookie |
| Wookie | 6.415017 | 251.08110 | Human |
| Wookie | 7.333378 | 253.09258 | Wookie |
| Wookie | 5.378148 | 220.85351 | Human |
| Wookie | 6.680867 | 141.88169 | Wookie |
| Wookie | 6.239712 | 158.94576 | Wookie |
| Wookie | 6.278411 | 163.97559 | Human |
| Wookie | 7.355976 | 127.83640 | Human |
| Wookie | 7.263816 | 202.00102 | Human |
| Wookie | 6.919735 | 261.14733 | Wookie |
| Wookie | 7.403138 | 77.66530 | Wookie |
| Wookie | 6.548421 | 237.91939 | Wookie |
| Wookie | 5.482527 | 189.86692 | Wookie |
| Wookie | 7.865816 | 161.54331 | Human |
| Wookie | 6.826980 | 238.50044 | Human |
| Wookie | 6.726183 | 232.41926 | Wookie |
| Wookie | 7.118138 | 204.81087 | Wookie |
| Wookie | 7.574449 | 154.01783 | Human |
| Wookie | 7.319565 | 164.75691 | Human |
| Wookie | 7.695449 | 119.13830 | Human |
| Wookie | 7.034601 | 294.04470 | Human |
| Wookie | 6.621786 | 192.22028 | Wookie |
| Wookie | 6.774663 | 162.27431 | Human |
| Wookie | 7.617187 | 91.95238 | Human |
| Wookie | 6.363048 | 178.52692 | Wookie |
| Wookie | 6.284792 | 172.54360 | Human |
Is accuracy sufficient?
An intuitive first impulse to evaluate the model is to divide the number of correctly predicted results from the test data by the total number of observations in the test data. This is known as Accuracy.
#Accuracy of the Base model
(AccuracyBase <- sum(BaseTest$simSpeciesPrediction == BaseTest$Species)/length(BaseTest$simSpeciesPrediction))
## [1] 0.51
#Accuracy of the FirstTest model (k = 3)
(AccuracyFirst <- sum(FirstTest$SpeciesPrediction == FirstTest$Species)/length(FirstTest$SpeciesPrediction))
## [1] 0.84
#Accuracy of the SecondTest model (k = 9)
(AccuracySecond <- sum(SecondTest$SpeciesPrediction == SecondTest$Species)/length(SecondTest$SpeciesPrediction))
## [1] 0.805
The test models have much greater accuracy than the random assignment simulation, so it seems like we’re on the right track. But accuracy alone doesn’t tell the complete story of a model, and may in fact hide its flaws.
Defining the Confusion Matrix
Instead of relying on the single metric of accuracy, we can create a confusion matrix to examine the different types of results the model generated.
While accuracy gives us the proportion of correct results, the confusion matrix separates the correct results into two sets:
- True Positive (TP): the model correctly predicted YES
- True Negative (TN): the model correctly predicted NO
The incorrect results are also divided in two:
- False Positive (FP): the model incorrectly predicted YES (actual NO) AKA Type I Error
- False Negative (FN): the model incorrectly predicted NO (actual YES) AKA Type II Error
The correct predictions fall along the main diagonal of the matrix.
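As a minimal sketch of these four definitions, the counts can be tallied by hand in base R. The labels below are made up for illustration (they are not Zach’s data), and treating "Human" as the YES class is an assumption of this example.

```r
# Toy labels, invented for illustration only
actual    <- factor(c("Human", "Human", "Human", "Wookie",
                      "Wookie", "Wookie", "Human", "Wookie"),
                    levels = c("Human", "Wookie"))
predicted <- factor(c("Human", "Wookie", "Human", "Wookie",
                      "Human", "Wookie", "Human", "Wookie"),
                    levels = c("Human", "Wookie"))

# Treat "Human" as the positive (YES) class
TP <- sum(predicted == "Human"  & actual == "Human")   # correctly predicted YES
TN <- sum(predicted == "Wookie" & actual == "Wookie")  # correctly predicted NO
FP <- sum(predicted == "Human"  & actual == "Wookie")  # predicted YES, actual NO
FN <- sum(predicted == "Wookie" & actual == "Human")   # predicted NO, actual YES

# Cross-tabulation: rows are predictions, columns are actual classes
cm <- table(Predicted = predicted, Actual = actual)
print(cm)

# Accuracy is the share of observations on the main diagonal
accuracy <- (TP + TN) / (TP + TN + FP + FN)
print(accuracy)
```

The diagonal of `cm` holds TP and TN, so `sum(diag(cm))` equals `TP + TN`.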
From the confusion matrix, many helpful statistics can be calculated to aid in analyzing the model. The figure below gives a visual representation of the different statistics.
This figure also shows how simple accuracy can be misleading. The accuracy is \(\frac{20 + 1820}{2030} \approx 90.6\%\); however, the confusion matrix gives better information. For example, the Positive predictive value is only 10%.
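The arithmetic behind that example can be sketched in R. The true positive count (20), true negative count (1820), and total (2030) come from the accuracy formula above; the false positive and false negative counts are not stated in the text and are inferred here from the 10% positive predictive value, so treat them as an assumption.

```r
# Counts taken from the accuracy formula in the text
TP    <- 20
TN    <- 1820
total <- 2030

# Inferred, not stated in the text: PPV = TP / (TP + FP) = 0.10 implies FP = 180
FP <- 180
FN <- total - TP - TN - FP  # the remaining observations

accuracy    <- (TP + TN) / total
ppv         <- TP / (TP + FP)   # positive predictive value
npv         <- TN / (TN + FN)   # negative predictive value
sensitivity <- TP / (TP + FN)   # true positive rate
specificity <- TN / (TN + FP)   # true negative rate

print(round(c(accuracy = accuracy, ppv = ppv, npv = npv,
              sensitivity = sensitivity, specificity = specificity), 3))
```

Under these assumed counts, the high accuracy is driven almost entirely by the large number of true negatives.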
The Data School is a good resource for defining the statistics evaluated from the confusion matrix.
Back to our Star Wars example
In R, the confusionMatrix function from the caret package returns the confusion matrix as well as several of the summary statistics we reviewed in the previous section.
We can create confusion matrices for our three species prediction models (the baseline, \(k=3\), and \(k=9\)) to evaluate their performance. Since our reduced data has only two levels, confusionMatrix reports a single set of summary statistics, computed with the first level (Human) as the ‘positive’ class.
Comparing the three models, we see that the two knn models fare significantly better than the baseline model based on random assignment.
library(caret)
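# Note: caret's signature is confusionMatrix(data, reference), with predictions as
# `data` and true labels as `reference`. Species is passed first here, so in the
# printed tables the rows are the actual species and the columns are the predictions.
confusionMatrix(BaseTest$Species, BaseTest$simSpeciesPrediction, dnn = c("Prediction", "Actual"))
## Confusion Matrix and Statistics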
##
## Actual
## Prediction Human Wookie
## Human 51 49
## Wookie 49 51
##
## Accuracy : 0.51
## 95% CI : (0.4385, 0.5812)
## No Information Rate : 0.5
## P-Value [Acc > NIR] : 0.416
##
## Kappa : 0.02
## Mcnemar's Test P-Value : 1.000
##
## Sensitivity : 0.510
## Specificity : 0.510
## Pos Pred Value : 0.510
## Neg Pred Value : 0.510
## Prevalence : 0.500
## Detection Rate : 0.255
## Detection Prevalence : 0.500
## Balanced Accuracy : 0.510
##
## 'Positive' Class : Human
##
confusionMatrix(FirstTest$Species, FirstTest$SpeciesPrediction, dnn = c("Prediction", "Actual"))
## Confusion Matrix and Statistics
##
## Actual
## Prediction Human Wookie
## Human 87 13
## Wookie 19 81
##
## Accuracy : 0.84
## 95% CI : (0.7817, 0.8879)
## No Information Rate : 0.53
## P-Value [Acc > NIR] : <2e-16
##
## Kappa : 0.68
## Mcnemar's Test P-Value : 0.3768
##
## Sensitivity : 0.8208
## Specificity : 0.8617
## Pos Pred Value : 0.8700
## Neg Pred Value : 0.8100
## Prevalence : 0.5300
## Detection Rate : 0.4350
## Detection Prevalence : 0.5000
## Balanced Accuracy : 0.8412
##
## 'Positive' Class : Human
##
confusionMatrix(SecondTest$Species, SecondTest$SpeciesPrediction, dnn = c("Prediction", "Actual"))
## Confusion Matrix and Statistics
##
## Actual
## Prediction Human Wookie
## Human 86 14
## Wookie 25 75
##
## Accuracy : 0.805
## 95% CI : (0.7432, 0.8575)
## No Information Rate : 0.555
## P-Value [Acc > NIR] : 9.481e-14
##
## Kappa : 0.61
## Mcnemar's Test P-Value : 0.1093
##
## Sensitivity : 0.7748
## Specificity : 0.8427
## Pos Pred Value : 0.8600
## Neg Pred Value : 0.7500
## Prevalence : 0.5550
## Detection Rate : 0.4300
## Detection Prevalence : 0.5000
## Balanced Accuracy : 0.8087
##
## 'Positive' Class : Human
##
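The statistics that confusionMatrix prints can be reproduced by hand from the four cell counts. The sketch below uses base R only and copies the counts from the \(k=3\) output above; the variable and function names are illustrative. caret reads the rows of the table as predictions, while the calls above put the true species in the rows, so the sketch also shows which values change once the table is transposed so that predictions really are in the rows.

```r
# Counts copied from the k = 3 output. Rows are the true species here,
# because Species was passed as the first argument to confusionMatrix.
tab <- matrix(c(87, 19, 13, 81), nrow = 2,
              dimnames = list(Truth = c("Human", "Wookie"),
                              Predicted = c("Human", "Wookie")))

n <- sum(tab)
accuracy <- sum(diag(tab)) / n

# Cohen's kappa: agreement beyond what the row and column totals
# would produce by chance
expected    <- sum(rowSums(tab) * colSums(tab)) / n^2
cohen_kappa <- (accuracy - expected) / (1 - expected)

# Statistics computed the way caret does: rows are read as predictions
# and the first level (Human) is the positive class
class_stats <- function(m) c(
  sensitivity = m[1, 1] / sum(m[, 1]),
  specificity = m[2, 2] / sum(m[, 2]),
  ppv         = m[1, 1] / sum(m[1, ]),
  npv         = m[2, 2] / sum(m[2, ])
)

as_printed <- class_stats(tab)     # reproduces the printed values
transposed <- class_stats(t(tab))  # predictions in the rows

round(rbind(as_printed, transposed), 4)
```

Accuracy and Kappa are symmetric in the two arguments, so they are the same either way. With predictions in the rows, the Human sensitivity becomes 0.87 and the positive predictive value becomes 0.8208, the reverse of the printed output.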
On this test set, the \(k=3\) model (accuracy 0.84) performs slightly better than the \(k=9\) model (accuracy 0.805). How do you choose the correct \(k\)?
Values for \(k\)
Several online sources suggest a rule of thumb: choose \(k\) as the square root of the number of observations in the training set. With our training set of 400 observations, how would the knn model fare with \(k=20\)?
## square root of 400 is 20
k_pref <- TrainingData$Species %>%
length() %>%
sqrt() %>%
round()
SpeciesPrediction <- knn(train = select(TrainingData, Height, Weight),
test = select(TestData, Height, Weight),
cl = TrainingData$Species,
k = k_pref)
ThirdTest <- cbind(TestData, SpeciesPrediction)
ThirdTest %>% kable("html", caption = "Test3: k = 20") %>%
kable_styling(bootstrap_options = c("striped")) %>%
scroll_box(height = "500px")

| Species | Height | Weight | SpeciesPrediction |
|---|---|---|---|
| Human | 5.742613 | 119.28202 | Human |
| Human | 5.848384 | 108.37048 | Human |
| Human | 5.592757 | 148.52337 | Human |
| Human | 5.850367 | 204.32882 | Wookie |
| Human | 5.655841 | 147.01470 | Human |
| Human | 5.880231 | 173.31711 | Human |
| Human | 6.421232 | 116.92211 | Human |
| Human | 6.056181 | 143.35066 | Human |
| Human | 5.516332 | 166.98285 | Human |
| Human | 4.942775 | 139.35122 | Human |
| Human | 5.709029 | 173.55788 | Human |
| Human | 5.299882 | 170.82142 | Human |
| Human | 6.246747 | 131.01810 | Human |
| Human | 4.696459 | 126.84124 | Human |
| Human | 5.292124 | 220.73452 | Wookie |
| Human | 5.711004 | 144.22352 | Human |
| Human | 5.424132 | 150.19858 | Human |
| Human | 5.196924 | 148.36362 | Human |
| Human | 5.347639 | 132.95379 | Human |
| Human | 5.814768 | 122.19072 | Human |
| Human | 5.947586 | 115.92794 | Human |
| Human | 5.830106 | 177.41887 | Wookie |
| Human | 6.636742 | 120.41812 | Human |
| Human | 6.086749 | 170.02456 | Human |
| Human | 5.643855 | 192.19786 | Wookie |
| Human | 5.170115 | 184.73337 | Human |
| Human | 6.959570 | 145.83740 | Human |
| Human | 5.838708 | 133.82034 | Human |
| Human | 5.157840 | 117.63320 | Human |
| Human | 5.593246 | 149.44314 | Human |
| Human | 5.337803 | 143.02153 | Human |
| Human | 5.362648 | 154.68881 | Human |
| Human | 5.033248 | 131.87971 | Human |
| Human | 5.558423 | 168.87105 | Human |
| Human | 5.659580 | 171.42509 | Human |
| Human | 4.961229 | 182.27658 | Human |
| Human | 3.883424 | 217.49267 | Wookie |
| Human | 5.372563 | 155.93672 | Human |
| Human | 5.514759 | 132.73361 | Human |
| Human | 5.797137 | 147.68037 | Human |
| Human | 5.529568 | 101.16240 | Human |
| Human | 5.706699 | 153.89769 | Human |
| Human | 4.951114 | 137.22981 | Human |
| Human | 5.855588 | 162.18035 | Human |
| Human | 5.859444 | 141.93569 | Human |
| Human | 5.625826 | 108.57046 | Human |
| Human | 6.178637 | 151.09750 | Human |
| Human | 5.702234 | 109.40266 | Human |
| Human | 5.632182 | 176.91019 | Wookie |
| Human | 5.634022 | 95.11921 | Human |
| Human | 5.718465 | 145.74369 | Human |
| Human | 6.030062 | 163.73945 | Human |
| Human | 5.726095 | 104.41147 | Human |
| Human | 5.831599 | 191.97166 | Wookie |
| Human | 4.931813 | 123.43982 | Human |
| Human | 5.314751 | 134.81009 | Human |
| Human | 6.238485 | 154.89996 | Human |
| Human | 4.888048 | 139.88568 | Human |
| Human | 5.629034 | 118.92145 | Human |
| Human | 5.702501 | 162.28014 | Human |
| Human | 5.987902 | 134.27330 | Human |
| Human | 5.325562 | 165.19501 | Human |
| Human | 5.579313 | 134.85831 | Human |
| Human | 4.618373 | 121.28249 | Human |
| Human | 5.669298 | 147.16097 | Human |
| Human | 5.166717 | 169.48318 | Human |
| Human | 5.380677 | 157.26965 | Human |
| Human | 4.906117 | 148.23198 | Human |
| Human | 5.692468 | 91.56618 | Human |
| Human | 5.833290 | 193.54331 | Wookie |
| Human | 5.347693 | 148.80181 | Human |
| Human | 6.412505 | 124.08007 | Human |
| Human | 5.835280 | 143.94929 | Human |
| Human | 5.974316 | 157.83767 | Human |
| Human | 6.524701 | 104.44946 | Human |
| Human | 5.174443 | 171.51926 | Human |
| Human | 5.904310 | 157.40498 | Human |
| Human | 5.993290 | 186.08301 | Wookie |
| Human | 5.496915 | 179.58118 | Wookie |
| Human | 5.659526 | 133.86178 | Human |
| Human | 4.994089 | 135.29152 | Human |
| Human | 5.735084 | 120.13451 | Human |
| Human | 5.149515 | 139.14891 | Human |
| Human | 5.906841 | 178.33064 | Wookie |
| Human | 5.094285 | 203.76621 | Wookie |
| Human | 5.659699 | 177.85228 | Wookie |
| Human | 5.076739 | 95.10187 | Human |
| Human | 5.377118 | 174.65634 | Wookie |
| Human | 4.723571 | 181.08806 | Human |
| Human | 5.564217 | 168.52260 | Human |
| Human | 5.992722 | 124.35209 | Human |
| Human | 5.591624 | 181.93532 | Human |
| Human | 4.616885 | 171.88444 | Human |
| Human | 5.189733 | 169.87970 | Human |
| Human | 6.328022 | 138.55778 | Human |
| Human | 6.404903 | 175.37675 | Wookie |
| Human | 4.912482 | 153.34776 | Human |
| Human | 5.316648 | 147.96689 | Human |
| Human | 5.676813 | 194.61294 | Wookie |
| Human | 5.659578 | 172.71670 | Human |
| Wookie | 7.738585 | 205.66270 | Wookie |
| Wookie | 6.081447 | 171.74523 | Human |
| Wookie | 7.532295 | 204.03502 | Wookie |
| Wookie | 6.918085 | 165.21143 | Human |
| Wookie | 8.336956 | 145.18989 | Human |
| Wookie | 6.817417 | 240.22203 | Wookie |
| Wookie | 5.854670 | 232.83221 | Wookie |
| Wookie | 7.368876 | 171.59847 | Human |
| Wookie | 7.265878 | 263.63929 | Wookie |
| Wookie | 6.986780 | 190.37280 | Wookie |
| Wookie | 6.207587 | 185.81674 | Wookie |
| Wookie | 6.370814 | 152.31641 | Human |
| Wookie | 6.990531 | 198.78247 | Wookie |
| Wookie | 7.778583 | 193.15070 | Wookie |
| Wookie | 6.726657 | 203.37592 | Wookie |
| Wookie | 6.346413 | 222.25842 | Wookie |
| Wookie | 4.971848 | 245.33002 | Wookie |
| Wookie | 7.370687 | 216.37053 | Wookie |
| Wookie | 7.320505 | 249.62808 | Wookie |
| Wookie | 7.981893 | 253.27003 | Wookie |
| Wookie | 8.100042 | 53.47590 | Human |
| Wookie | 6.892708 | 181.86131 | Human |
| Wookie | 7.973885 | 228.49663 | Wookie |
| Wookie | 7.000985 | 144.01823 | Human |
| Wookie | 6.492692 | 151.96748 | Human |
| Wookie | 6.363853 | 213.68682 | Wookie |
| Wookie | 6.739794 | 177.92884 | Wookie |
| Wookie | 7.170610 | 113.42561 | Human |
| Wookie | 6.979449 | 245.29274 | Wookie |
| Wookie | 7.150658 | 157.95802 | Human |
| Wookie | 7.149207 | 219.31739 | Wookie |
| Wookie | 7.535950 | 220.62151 | Wookie |
| Wookie | 6.795114 | 276.06343 | Wookie |
| Wookie | 8.226511 | 209.82821 | Wookie |
| Wookie | 7.328270 | 233.28052 | Wookie |
| Wookie | 6.104848 | 257.02955 | Wookie |
| Wookie | 6.792513 | 179.68094 | Wookie |
| Wookie | 7.386912 | 117.22193 | Human |
| Wookie | 6.691708 | 207.14650 | Wookie |
| Wookie | 7.258746 | 266.29572 | Wookie |
| Wookie | 7.325797 | 129.68612 | Human |
| Wookie | 7.977608 | 250.60658 | Wookie |
| Wookie | 6.511228 | 196.99346 | Wookie |
| Wookie | 6.987339 | 204.97407 | Wookie |
| Wookie | 6.574231 | 182.53891 | Wookie |
| Wookie | 6.154650 | 251.84266 | Wookie |
| Wookie | 5.613272 | 271.20782 | Wookie |
| Wookie | 7.146647 | 278.54244 | Wookie |
| Wookie | 7.451174 | 151.62996 | Human |
| Wookie | 7.479369 | 237.96429 | Wookie |
| Wookie | 5.957505 | 181.19052 | Wookie |
| Wookie | 6.152742 | 150.82953 | Human |
| Wookie | 6.144134 | 219.49476 | Wookie |
| Wookie | 6.483496 | 199.10246 | Wookie |
| Wookie | 7.127648 | 239.58623 | Wookie |
| Wookie | 7.445863 | 289.61989 | Wookie |
| Wookie | 6.388725 | 251.83584 | Wookie |
| Wookie | 4.851740 | 165.21572 | Human |
| Wookie | 7.715368 | 206.98113 | Wookie |
| Wookie | 7.461464 | 270.66227 | Wookie |
| Wookie | 6.474347 | 258.72188 | Wookie |
| Wookie | 6.194754 | 120.76092 | Human |
| Wookie | 6.775007 | 247.32150 | Wookie |
| Wookie | 7.081080 | 182.97430 | Human |
| Wookie | 7.542401 | 168.73187 | Human |
| Wookie | 8.042571 | 144.93989 | Human |
| Wookie | 6.363338 | 184.41023 | Human |
| Wookie | 7.218042 | 234.43283 | Wookie |
| Wookie | 8.000632 | 271.76359 | Wookie |
| Wookie | 6.323993 | 206.94535 | Wookie |
| Wookie | 6.832927 | 196.34738 | Wookie |
| Wookie | 5.671399 | 189.88320 | Wookie |
| Wookie | 7.215567 | 247.68997 | Wookie |
| Wookie | 6.519106 | 145.94033 | Human |
| Wookie | 6.306297 | 212.50094 | Wookie |
| Wookie | 6.415017 | 251.08110 | Wookie |
| Wookie | 7.333378 | 253.09258 | Wookie |
| Wookie | 5.378148 | 220.85351 | Wookie |
| Wookie | 6.680867 | 141.88169 | Human |
| Wookie | 6.239712 | 158.94576 | Human |
| Wookie | 6.278411 | 163.97559 | Wookie |
| Wookie | 7.355976 | 127.83640 | Human |
| Wookie | 7.263816 | 202.00102 | Wookie |
| Wookie | 6.919735 | 261.14733 | Wookie |
| Wookie | 7.403138 | 77.66530 | Human |
| Wookie | 6.548421 | 237.91939 | Wookie |
| Wookie | 5.482527 | 189.86692 | Wookie |
| Wookie | 7.865816 | 161.54331 | Human |
| Wookie | 6.826980 | 238.50044 | Wookie |
| Wookie | 6.726183 | 232.41926 | Wookie |
| Wookie | 7.118138 | 204.81087 | Wookie |
| Wookie | 7.574449 | 154.01783 | Human |
| Wookie | 7.319565 | 164.75691 | Human |
| Wookie | 7.695449 | 119.13830 | Human |
| Wookie | 7.034601 | 294.04470 | Wookie |
| Wookie | 6.621786 | 192.22028 | Wookie |
| Wookie | 6.774663 | 162.27431 | Human |
| Wookie | 7.617187 | 91.95238 | Human |
| Wookie | 6.363048 | 178.52692 | Wookie |
| Wookie | 6.284792 | 172.54360 | Human |
(AccuracyThird <- sum(ThirdTest$SpeciesPrediction == ThirdTest$Species)/length(ThirdTest$SpeciesPrediction))
## [1] 0.755
confusionMatrix(ThirdTest$Species, ThirdTest$SpeciesPrediction, dnn = c("Prediction", "Actual"))
## Confusion Matrix and Statistics
##
## Actual
## Prediction Human Wookie
## Human 84 16
## Wookie 33 67
##
## Accuracy : 0.755
## 95% CI : (0.6894, 0.8129)
## No Information Rate : 0.585
## P-Value [Acc > NIR] : 3.611e-07
##
## Kappa : 0.51
## Mcnemar's Test P-Value : 0.02227
##
## Sensitivity : 0.7179
## Specificity : 0.8072
## Pos Pred Value : 0.8400
## Neg Pred Value : 0.6700
## Prevalence : 0.5850
## Detection Rate : 0.4200
## Detection Prevalence : 0.5000
## Balanced Accuracy : 0.7626
##
## 'Positive' Class : Human
##
(https://cran.r-project.org/web/packages/heuristica/heuristica.pdf)
AccuracyVector <- numeric()
# One row of summary statistics per value of k
Stats_df <- data.frame(Accuracy = double(),
                       Sensitivy = double(),
                       Specificity = double(),
                       Precision = double(),
                       stringsAsFactors=FALSE)
for(i in 1:30) {
  # Fit a knn model with k = i and predict the test set
  SpeciesPrediction <- knn(train = select(TrainingData, Height, Weight),
                           test = select(TestData, Height, Weight),
                           cl = TrainingData$Species,
                           k = i)
  Test <- cbind(TestData, SpeciesPrediction)
  # Rows are the true species, columns the predicted species
  TestTable <- table(Test$Species, Test$SpeciesPrediction)
  # Accuracy is the share of test cases predicted correctly
  Stats_df[i,1] <- sum(Test$SpeciesPrediction == Test$Species)/length(Test$SpeciesPrediction)
  Stats_df[i,2] <- sensitivity(TestTable)
  Stats_df[i,3] <- specificity(TestTable)
  Stats_df[i,4] <- precision(TestTable)
}
Stats_df
## Accuracy Sensitivy Specificity Precision
## 1 0.830 0.8300000 0.8300000 0.83
## 2 0.810 0.7980769 0.8229167 0.83
## 3 0.840 0.8207547 0.8617021 0.87
## 4 0.795 0.7706422 0.8241758 0.84
## 5 0.810 0.7767857 0.8522727 0.87
## 6 0.810 0.7870370 0.8369565 0.85
## 7 0.805 0.7798165 0.8351648 0.85
## 8 0.810 0.7818182 0.8444444 0.86
## 9 0.805 0.7747748 0.8426966 0.86
## 10 0.790 0.7685185 0.8152174 0.83
## 11 0.790 0.7500000 0.8452381 0.87
## 12 0.780 0.7456140 0.8255814 0.85
## 13 0.780 0.7333333 0.8500000 0.88
## 14 0.770 0.7213115 0.8461538 0.88
## 15 0.765 0.7226891 0.8271605 0.86
## 16 0.765 0.7226891 0.8271605 0.86
## 17 0.755 0.7179487 0.8072289 0.84
## 18 0.750 0.7118644 0.8048780 0.84
## 19 0.745 0.7058824 0.8024691 0.84
## 20 0.750 0.7155172 0.7976190 0.83
## 21 0.740 0.7105263 0.7790698 0.81
## 22 0.725 0.6991150 0.7586207 0.79
## 23 0.735 0.7079646 0.7701149 0.80
## 24 0.740 0.7142857 0.7727273 0.80
## 25 0.725 0.7027027 0.7528090 0.78
## 26 0.740 0.7142857 0.7727273 0.80
## 27 0.735 0.7155963 0.7582418 0.78
## 28 0.740 0.7181818 0.7666667 0.79
## 29 0.735 0.7155963 0.7582418 0.78
## 30 0.735 0.7155963 0.7582418 0.78
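One simple way to read the table above is to pick the \(k\) with the highest test accuracy. The sketch below copies the Accuracy column into a vector so that it runs without the original data; with the Stats_df data frame in memory, which.max(Stats_df$Accuracy) would give the same answer. The vector name is illustrative.

```r
# Accuracy column copied from the Stats_df output above, for k = 1..30
acc <- c(0.830, 0.810, 0.840, 0.795, 0.810, 0.810, 0.805, 0.810, 0.805, 0.790,
         0.790, 0.780, 0.780, 0.770, 0.765, 0.765, 0.755, 0.750, 0.745, 0.750,
         0.740, 0.725, 0.735, 0.740, 0.725, 0.740, 0.735, 0.740, 0.735, 0.735)

best_k <- which.max(acc)   # index of the first maximum, i.e. the best k
best_k
acc[best_k]

# Accuracy in this run at the square-root rule-of-thumb value k = 20
acc[20]
```

Keep in mind that choosing \(k\) on the same test set used to report accuracy tends to give an optimistic estimate, and that knn breaks ties at random, so a rerun can shift individual values slightly.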
# Plot each statistic against k
# (black = accuracy, red = sensitivity, green = specificity, blue = precision)
k <- 1:30
ggplot(Stats_df, aes(x = k)) +
  geom_line(aes(y = Accuracy), colour="black") +
  geom_line(aes(y = Sensitivy), colour="red") +
  geom_line(aes(y = Specificity), colour="green") +
  geom_line(aes(y = Precision), colour="blue") +
  ylab(label="Statistic") +
  xlab("k")
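An alternative to tuning \(k\) against the test set is to choose it by cross-validation on the training data alone. The sketch below is only an illustration under stated assumptions: it uses knn.cv from the class package (leave-one-out cross-validation), and because the original TrainingData is not available here it simulates a stand-in training set whose means and standard deviations are made up, not taken from Zach's data.

```r
library(class)

set.seed(42)
n <- 100  # simulated cases per species (assumed)

# Hypothetical stand-in for TrainingData; all distribution parameters are invented
sim_train <- data.frame(
  Height  = c(rnorm(n, mean = 5.6, sd = 0.5), rnorm(n, mean = 6.9, sd = 0.7)),
  Weight  = c(rnorm(n, mean = 150, sd = 30),  rnorm(n, mean = 200, sd = 45)),
  Species = factor(rep(c("Human", "Wookie"), each = n))
)

ks <- 1:30

# For each k, predict every training case from all the others
# and record the share predicted correctly
cv_acc <- sapply(ks, function(k) {
  pred <- knn.cv(train = sim_train[, c("Height", "Weight")],
                 cl = sim_train$Species, k = k)
  mean(pred == sim_train$Species)
})

best_k <- ks[which.max(cv_acc)]
best_k
```

The test set then stays untouched until the final model is evaluated, so its accuracy remains an honest estimate.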