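# Load the men's track records (one row per nation, one column per event)
track=read.csv("C:/Users/Prokarso/Downloads/mens_track.csv")
track
# Column 9 holds the row labels: use it as row names, then drop it
rownames(track)=track[,9]
track[,9]=NULL
colnames(track)=c('100m','200m','400m','800m','1500m','5000m','10000m','Marathon')
# PCA on standardized variables, i.e. on the correlation matrix
track_pca=prcomp(track, scale. = TRUE)
track_pca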
## Standard deviations (1, .., p=8):
## [1] 2.5733531 0.9368128 0.3991505 0.3522065 0.2826310 0.2607013 0.2154519
## [8] 0.1503333
##
## Rotation (n x k) = (8 x 8):
##                PC1         PC2        PC3         PC4        PC5        PC6
## 100m     0.3175565  0.56687750  0.3322620 -0.12762827  0.2625555 -0.5937042
## 200m     0.3369792  0.46162589  0.3606567  0.25911576 -0.1539571  0.6561367
## 400m     0.3556454  0.24827331 -0.5604674 -0.65234077 -0.2183229  0.1566252
## 800m     0.3686841  0.01242993 -0.5324823  0.47999895  0.5400528 -0.0146918
## 1500m    0.3728099 -0.13979665 -0.1534427  0.40451039 -0.4877151 -0.1578430
## 5000m    0.3643741 -0.31203045  0.1897643 -0.02958755 -0.2539792 -0.1412987
## 10000m   0.3667726 -0.30685985  0.1817517 -0.08006862 -0.1331764 -0.2190168
## Marathon 0.3419261 -0.43896267  0.2632087 -0.29951213  0.4979283  0.3152849
##                   PC7           PC8
## 100m      0.136241260 -0.1055416752
## 200m     -0.112639528  0.0960543222
## 400m     -0.002853707  0.0001272032
## 800m     -0.238016094  0.0381651151
## 1500m     0.610011482 -0.1392909844
## 5000m    -0.591298850 -0.5466969221
## 10000m   -0.176871021  0.7967952190
## Marathon  0.398822209 -0.1581638575
summary(track_pca)
## Importance of components:
##                           PC1    PC2     PC3     PC4     PC5    PC6    PC7
## Standard deviation     2.5734 0.9368 0.39915 0.35221 0.28263 0.2607 0.2155
## Proportion of Variance 0.8278 0.1097 0.01992 0.01551 0.00999 0.0085 0.0058
## Cumulative Proportion  0.8278 0.9375 0.95739 0.97289 0.98288 0.9914 0.9972
##                            PC8
## Standard deviation     0.15033
## Proportion of Variance 0.00283
## Cumulative Proportion  1.00000
biplot(track_pca, scale = 0,cex=c(0.5,0.9))
screeplot(track_pca,type="l", main="Scree plot")
1. The first principal component alone explains about 83% of the variance in the data, and the second explains about 11%. Together, the first two principal components account for about 94% of the total variance.
2. The scree plot shows an ‘elbow’ after the second component: the first two components capture most of the variance, while each of the remaining six explains less than 2% of the total (about 6% combined). These components can therefore be dropped with little loss of information.
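3. The first loading vector places roughly equal positive weight on all eight events (loadings between about 0.32 and 0.37), with {400m, 800m, 1500m, 5000m, 10000m} weighted slightly more than {100m, 200m, Marathon}. So this component roughly corresponds to the overall athletic excellence of a given nation, with a mild emphasis on medium to longer distance races. The second loading vector places most of its weight on {100m, 200m, Marathon}, but with opposite signs: the loadings are positive for the sprints (100m, 200m, 400m), close to zero for 800m, and negative for the longer races (1500m through Marathon). Hence this component contrasts performance in sprints with performance in long-distance races, rather than measuring a shared ability in sprints and marathons. This suggests that, once overall strength is accounted for, {100m, 200m} tend to vary together, as do {5000m, 10000m, Marathon}, and that the two groups move in opposite directions along this component.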
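The variance shares quoted in the notes above, and the signs of the second loading vector, can be checked directly from the printed output. The following is a minimal sketch that retypes the standard deviations and the PC2 loadings from the output instead of refitting the model; the variable names (`sdev`, `eig`, `pc2`, and so on) are illustrative only.

```r
# Standard deviations of the eight components, copied from the prcomp output
sdev <- c(2.5733531, 0.9368128, 0.3991505, 0.3522065,
          0.2826310, 0.2607013, 0.2154519, 0.1503333)

# A component's variance is its squared standard deviation; with standardized
# variables the variances sum to the number of variables (8)
eig <- sdev^2
prop_var <- eig / sum(eig)
cum_var <- cumsum(prop_var)
round(prop_var, 4)
round(cum_var, 4)

# PC2 loadings, copied from the Rotation matrix
pc2 <- c("100m" = 0.56687750, "200m" = 0.46162589, "400m" = 0.24827331,
         "800m" = 0.01242993, "1500m" = -0.13979665, "5000m" = -0.31203045,
         "10000m" = -0.30685985, "Marathon" = -0.43896267)

# Split the events by the sign of their PC2 loading
pc2_positive <- names(pc2)[pc2 > 0]
pc2_negative <- names(pc2)[pc2 < 0]
pc2_positive
pc2_negative
```

With the fitted object at hand, `track_pca$sdev` and `track_pca$rotation[, "PC2"]` hold the same quantities at full precision.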
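# Standardize the variables (factanal works from the correlation matrix, so this does not change the fit)
track=scale(track)
# Maximum-likelihood factor analysis with 2 factors and varimax rotation
factanal(track, factors=2, rotation="varimax")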
##
## Call:
## factanal(x = track, factors = 2, rotation = "varimax")
##
## Uniquenesses:
##     100m     200m     400m     800m    1500m    5000m   10000m Marathon
##    0.081    0.076    0.151    0.135    0.082    0.034    0.018    0.086
##
## Loadings:
##          Factor1 Factor2
## 100m     0.291   0.914
## 200m     0.382   0.882
## 400m     0.543   0.744
## 800m     0.691   0.622
## 1500m    0.799   0.530
## 5000m    0.901   0.394
## 10000m   0.907   0.399
## Marathon 0.915   0.278
##
##                Factor1 Factor2
## SS loadings      4.112   3.225
## Proportion Var   0.514   0.403
## Cumulative Var   0.514   0.917
##
## Test of the hypothesis that 2 factors are sufficient.
## The chi square statistic is 16.36 on 13 degrees of freedom.
## The p-value is 0.23
1. {1500m, 5000m, 10000m, Marathon} define Factor 1, while {100m, 200m, 400m} define Factor 2. ‘800m’ loads on both factors (0.691 on Factor 1 and 0.622 on Factor 2) and is slightly more closely aligned with Factor 1. So we suspect that the events in {1500m, 5000m, 10000m, Marathon} are correlated with one another, and similarly those in {100m, 200m, 400m}. The first and second factors may be named the “Endurance Factor” and the “Sprint Factor” respectively.
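2. Factor 1 explains about 51% of the variation in the data and Factor 2 about 40%, so the two factors together explain about 92% of the variation. In addition, the test of the hypothesis that 2 factors are sufficient (chi-square statistic 16.36 on 13 degrees of freedom, p-value 0.23) does not reject that hypothesis. Both results suggest that 2 factors are likely sufficient in this context.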
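The summary rows of the factor analysis output can be reproduced from the loadings table. The sketch below retypes the rounded loadings and uniquenesses printed above, so its results agree with the output only to about two decimals; the object names (`L`, `dominant`, `ss_loadings`, and so on) are illustrative. It also shows which factor each event loads on most heavily, and recomputes the degrees of freedom and p-value of the sufficiency test from the reported chi-square statistic.

```r
events <- c("100m", "200m", "400m", "800m", "1500m", "5000m", "10000m", "Marathon")

# Rounded varimax loadings, copied from the factanal output
L <- cbind(
  Factor1 = c(0.291, 0.382, 0.543, 0.691, 0.799, 0.901, 0.907, 0.915),
  Factor2 = c(0.914, 0.882, 0.744, 0.622, 0.530, 0.394, 0.399, 0.278)
)
rownames(L) <- events

# Factor on which each event has its largest loading
dominant <- colnames(L)[max.col(L, ties.method = "first")]
names(dominant) <- events
dominant

# SS loadings = column sums of squared loadings; dividing by the number of
# variables gives the proportion of variance attributed to each factor
ss_loadings <- colSums(L^2)
prop_var <- ss_loadings / nrow(L)
round(ss_loadings, 3)
round(prop_var, 3)

# Communality = row sum of squared loadings, which should be close to
# 1 - uniqueness for standardized variables
communality <- rowSums(L^2)
uniqueness <- c(0.081, 0.076, 0.151, 0.135, 0.082, 0.034, 0.018, 0.086)
round(communality + uniqueness, 2)

# Degrees of freedom for p variables and k factors, and the upper-tail
# chi-square probability of the reported statistic
p <- nrow(L)
k <- ncol(L)
df <- ((p - k)^2 - (p + k)) / 2
p_value <- pchisq(16.36, df = df, lower.tail = FALSE)
df
round(p_value, 2)
```

Working from the fitted object instead (for example by assigning the `factanal()` result to a variable and reading its `loadings` and `uniquenesses` components) would avoid the rounding error introduced by retyping.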