library(readODS)
library(ggbiplot)
library(plotly)
library(tidyr)
library(cowplot)
options(width = 120)
set.seed(357)
xy <- read_ods("./data/2019-06-29-crnotici-idpa.ods")
xy$team[is.na(xy$team)] <- "NO TEAM"
xy$kategorija <- NULL
stages <- xy[, grepl("stage", names(xy))]
Yesterday (2019-06-29) we attended a national level competition in sports shooting (IDPA like competition). There was 30-ish competitors, the six stages were fun and judges were strict but fair. If I recall correctly, there was only one disqualification. A person is disqualified when they break very strict safety protocols, e.g. point their weapon outside the strict safety zone during reloading.
This is my short analysis of the results.
I try to see how alike are competitors based on their final score using principal component analysis.
pca <- prcomp(stages,
center = TRUE,
scale. = TRUE)
print(pca)
## Standard deviations (1, .., p=6):
## [1] 1.9552623 0.9219291 0.7165578 0.5817851 0.5435593 0.4238046
##
## Rotation (n x k) = (6 x 6):
## PC1 PC2 PC3 PC4 PC5 PC6
## stage_1 0.4067072 -0.45323134 0.04797856 -0.6190155 -0.39834849 -0.2915596
## stage_2 0.4462812 -0.05200627 0.12896922 0.5452837 -0.58860334 0.3710891
## stage_3 0.4608912 -0.06952155 -0.06586136 -0.2869898 0.49751990 0.6697157
## stage_4 0.3731726 0.37364225 -0.80968436 0.0500116 -0.03765967 -0.2482448
## stage_5 0.3090213 0.74632286 0.52415229 -0.2147462 -0.01803815 -0.1622687
## stage_6 0.4338775 -0.30072764 0.21541332 0.4341719 0.49557534 -0.4907241
ggbiplot(pca,
groups = xy$team,
labels = xy$place,
var.axes = FALSE
) +
theme_bw() +
scale_color_brewer(palette = "Set1") +
coord_fixed(ratio = 0.7) +
theme(legend.text = element_text(size = 8),
legend.title = element_text(size = 9))
Notice that first component (x axis) is ranking players based on their final score or place. Top players on the left and as we scan to the right, ranking is increasing with the last shooter on the far right in 29th place. Second component is probably because stages 4 and 5. We’ll see what is going on at those stages in the next section below. A lot of variance is explained by the first two components (nearly 80%), meaning this representation of the data is probably reflecting well the mutidimensional nature of the results.
There is a hint of a few groups, going from left to right, with top ~8 shooters in one group, a big group of mediocre shooters and a few limping behind.
Let’s see if can classify them using k-means clustering.
group.vector <- 2:5
clusters <- sapply(group.vector, FUN = function(x) {
kmeans(x = stages, center = x)
}, simplify = FALSE)
plot(x = group.vector,
y = sapply(clusters, FUN = function(x) {
sum(x$withinss)
})
)
gg.plots <- list()
for (i in seq_along(group.vector)) {
gg.plots[[i]] <- ggbiplot(pca,
groups = as.factor(clusters[[i]]$cluster),
labels = xy$place,
var.axes = FALSE
) +
theme_bw() +
theme(legend.position = "none",
axis.title = element_blank(),
axis.text = element_blank()) +
scale_color_brewer(palette = "Set1") +
coord_fixed(ratio = 0.5)
}
plot_grid(plotlist = gg.plots,
rel_widths = 0.5, rel_heights = 0.5,
labels = group.vector)
Three clusters (upper right corner) appears to work for our data. Personally, I would not believe five clusters just because it’s grouping 1., 6. and 8. shooter into one group. This is expected as k-means doesn’t assume any underlaying ordering.
xy$class <- factor(clusters[[2]]$cluster, levels = c(1, 3, 2), labels = c("fast", "mid", "slow"))
xy.long <- gather(xy[, !(names(xy) %in% c("total_time"))], key = stage, value = time,
-registration_num, -place, -name, -surname, -team, -class)
ggplotly(
ggplot(xy.long, aes(x = stage, y = time, group = place, color = class)) +
theme_bw() +
scale_color_brewer(palette = "Set1") +
geom_line()
)
This figure gives us insight into how well certain shooters performed throughout the stages. Notice that some of the lagging shooters performed surprisingly well on certain stages and some high stakes shooters performed badly (e.g. stage 5).