MEN'S TRACK DATA (PCA AND FA)

PRINCIPAL COMPONENT ANALYSIS

track=read.csv("C:/Users/Prokarso/Downloads/mens_track.csv")
track

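# Use column 9 as the row labels, then drop it so only the eight event columns remain
rownames(track)=track[,9]
track[,9]=NULL
colnames(track)=c('100m','200m','400m','800m','1500m','5000m','10000m','Marathon')
# PCA on centered variables, each scaled to unit variance
track_pca=prcomp(track, scale = TRUE)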
track_pca

## Standard deviations (1, .., p=8):
## [1] 2.5733531 0.9368128 0.3991505 0.3522065 0.2826310 0.2607013 0.2154519
## [8] 0.1503333
## 
## Rotation (n x k) = (8 x 8):
##                PC1         PC2        PC3         PC4        PC5        PC6
## 100m     0.3175565  0.56687750  0.3322620 -0.12762827  0.2625555 -0.5937042
## 200m     0.3369792  0.46162589  0.3606567  0.25911576 -0.1539571  0.6561367
## 400m     0.3556454  0.24827331 -0.5604674 -0.65234077 -0.2183229  0.1566252
## 800m     0.3686841  0.01242993 -0.5324823  0.47999895  0.5400528 -0.0146918
## 1500m    0.3728099 -0.13979665 -0.1534427  0.40451039 -0.4877151 -0.1578430
## 5000m    0.3643741 -0.31203045  0.1897643 -0.02958755 -0.2539792 -0.1412987
## 10000m   0.3667726 -0.30685985  0.1817517 -0.08006862 -0.1331764 -0.2190168
## Marathon 0.3419261 -0.43896267  0.2632087 -0.29951213  0.4979283  0.3152849
##                   PC7           PC8
## 100m      0.136241260 -0.1055416752
## 200m     -0.112639528  0.0960543222
## 400m     -0.002853707  0.0001272032
## 800m     -0.238016094  0.0381651151
## 1500m     0.610011482 -0.1392909844
## 5000m    -0.591298850 -0.5466969221
## 10000m   -0.176871021  0.7967952190
## Marathon  0.398822209 -0.1581638575

SUMMARY

summary(track_pca)

## Importance of components:
##                           PC1    PC2     PC3     PC4     PC5    PC6    PC7
## Standard deviation     2.5734 0.9368 0.39915 0.35221 0.28263 0.2607 0.2155
## Proportion of Variance 0.8278 0.1097 0.01992 0.01551 0.00999 0.0085 0.0058
## Cumulative Proportion  0.8278 0.9375 0.95739 0.97289 0.98288 0.9914 0.9972
##                            PC8
## Standard deviation     0.15033
## Proportion of Variance 0.00283
## Cumulative Proportion  1.00000
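
The rows of this table follow directly from the standard deviations: each component's variance is its squared standard deviation, and its proportion is that variance divided by the total. The sketch below hard-codes the standard deviations from the output above (with the fitted object at hand, `track_pca$sdev` holds the same values). The variable names and the 90% retention threshold are illustrative assumptions, not part of the original analysis.

```r
# Standard deviations copied from the prcomp output above
sdev <- c(2.5733531, 0.9368128, 0.3991505, 0.3522065,
          0.2826310, 0.2607013, 0.2154519, 0.1503333)

variances <- sdev^2                      # variance of each component
prop_var  <- variances / sum(variances)  # "Proportion of Variance" row
cum_var   <- cumsum(prop_var)            # "Cumulative Proportion" row

print(round(rbind(prop_var, cum_var), 4))

# Smallest number of components whose cumulative share reaches the
# (assumed) 90% threshold
n_keep <- which(cum_var >= 0.90)[1]
print(n_keep)
```

Because the variables were standardized, the component variances sum to the number of variables (8), so each proportion is simply the variance divided by 8.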

Biplot of the first two principal components

biplot(track_pca, scale = 0,cex=c(0.5,0.9))

Scree plot showing the variance of each of the eight principal components

screeplot(track_pca,type="l", main="Scree plot")

Interpretation:

1. The first principal component alone explains about 83% of the variance in the data, and the second explains around 11%. So the first two principal components together account for about 94% of the variance present in the data.

2. By inspecting the scree plot, we might conclude that most of the variance is explained by the first two principal components, and that there is an ‘elbow’ after the second component. The remaining six components each explain less than 2% of the total variation (about 6% combined), so they can be dropped with little loss of information.

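3. The first loading vector places roughly equal positive weight on all eight events (0.32 to 0.37), with the largest weights on {400m, 800m, 1500m, 5000m, 10000m} and slightly lower weights on {100m, 200m, Marathon}. So this component roughly corresponds to the overall athletic excellence of a given nation, with a mild emphasis on medium to longer distance races. The second loading vector places most of its weight on {100m, 200m, Marathon}, but with opposite signs: the loadings are positive for 100m (0.57) and 200m (0.46) and negative for Marathon (-0.44), 5000m (-0.31) and 10000m (-0.31). Hence this component is best read as a contrast between sprint ability and long-distance endurance, not as a shared ability in sprints and marathon races. This suggests that {100m, 200m} vary together, that {5000m, 10000m, Marathon} vary together, and that, once overall performance is accounted for, the two groups move in opposite directions.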
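
The sign pattern of the first two loading vectors can be checked directly. This is a minimal sketch with the PC1 and PC2 loadings copied from the rotation matrix above (with the fitted object, `track_pca$rotation[, 1:2]` holds the same values); the object names are illustrative. The overall sign of a principal component is arbitrary, so only the split into two opposing groups is meaningful, not which group is positive.

```r
events <- c("100m", "200m", "400m", "800m",
            "1500m", "5000m", "10000m", "Marathon")

# Loadings copied from the Rotation matrix printed above
pc1 <- c(0.3175565, 0.3369792, 0.3556454, 0.3686841,
         0.3728099, 0.3643741, 0.3667726, 0.3419261)
pc2 <- c(0.56687750, 0.46162589, 0.24827331, 0.01242993,
         -0.13979665, -0.31203045, -0.30685985, -0.43896267)
names(pc1) <- events
names(pc2) <- events

# Loading vectors have unit length and are orthogonal to each other
print(c(norm_pc1 = sum(pc1^2), norm_pc2 = sum(pc2^2), dot = sum(pc1 * pc2)))

# PC1: every event has the same sign, i.e. an overall performance component
same_sign_pc1 <- all(pc1 > 0)
print(same_sign_pc1)

# PC2: events split into two groups with opposite signs
pos_events <- names(pc2)[pc2 > 0]
neg_events <- names(pc2)[pc2 < 0]
print(pos_events)
print(neg_events)
```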

FACTOR ANALYSIS

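# Standardize the variables. factanal() works from the correlation matrix, so this step does not change the fit.
track=scale(track)
factanal(track, factors=2, rotation="varimax")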

## 
## Call:
## factanal(x = track, factors = 2, rotation = "varimax")
## 
## Uniquenesses:
##     100m     200m     400m     800m    1500m    5000m   10000m Marathon 
##    0.081    0.076    0.151    0.135    0.082    0.034    0.018    0.086 
## 
## Loadings:
##          Factor1 Factor2
## 100m     0.291   0.914  
## 200m     0.382   0.882  
## 400m     0.543   0.744  
## 800m     0.691   0.622  
## 1500m    0.799   0.530  
## 5000m    0.901   0.394  
## 10000m   0.907   0.399  
## Marathon 0.915   0.278  
## 
##                Factor1 Factor2
## SS loadings      4.112   3.225
## Proportion Var   0.514   0.403
## Cumulative Var   0.514   0.917
## 
## Test of the hypothesis that 2 factors are sufficient.
## The chi square statistic is 16.36 on 13 degrees of freedom.
## The p-value is 0.23
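
The summary rows of this output can be reproduced from the loadings. The "SS loadings" entry is the column sum of squared loadings, "Proportion Var" divides it by the number of variables, and each uniqueness is one minus the row sum of squared loadings (the communality). The sketch below uses the rounded loadings printed above, so it agrees with the output only to about two or three decimals; the object names are illustrative.

```r
events <- c("100m", "200m", "400m", "800m",
            "1500m", "5000m", "10000m", "Marathon")

# Rounded loadings copied from the factanal output above
fa_loadings <- cbind(
  Factor1 = c(0.291, 0.382, 0.543, 0.691, 0.799, 0.901, 0.907, 0.915),
  Factor2 = c(0.914, 0.882, 0.744, 0.622, 0.530, 0.394, 0.399, 0.278)
)
rownames(fa_loadings) <- events

ss_loadings <- colSums(fa_loadings^2)            # "SS loadings" row
prop_var    <- ss_loadings / nrow(fa_loadings)   # "Proportion Var" row
communality <- rowSums(fa_loadings^2)            # variance shared with the factors
uniqueness  <- 1 - communality                   # "Uniquenesses" block

print(round(rbind(ss_loadings, prop_var), 3))
print(round(uniqueness, 3))
```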

Interpretation:

1. {1500m, 5000m, 10000m, Marathon} define Factor 1 (loadings of 0.80 to 0.92), while {100m, 200m, 400m} define Factor 2 (loadings of 0.74 to 0.91). ‘800m’ loads on both factors (0.691 on Factor 1, 0.622 on Factor 2) and is only marginally more closely aligned with Factor 1. So we suspect there is some correlation between {1500m, 5000m, 10000m, Marathon} and similarly {100m, 200m, 400m} are also correlated. The 1st and 2nd factors may be named the “Endurance Factor” and the “Sprint Factor” respectively.

2. Factor 1 explains about 51% of the variation in the data, whereas Factor 2 explains around 40%. So these two factors together explain about 92% of the variation in the data. The chi-square test also does not reject the hypothesis that 2 factors are sufficient (p-value 0.23). Together this suggests that the choice of 2 factors is likely sufficient in this context.
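
The figures reported for the sufficiency test can be verified by hand. For p observed variables and k factors, the likelihood ratio test has ((p - k)^2 - (p + k)) / 2 degrees of freedom, and the p-value is the upper tail of the chi-square distribution at the reported statistic. A minimal sketch using the values from the factanal output (variable names are illustrative):

```r
p <- 8   # number of observed variables (events)
k <- 2   # number of factors fitted

# Degrees of freedom of the test that k factors are sufficient
df <- ((p - k)^2 - (p + k)) / 2
print(df)

# Upper-tail chi-square probability at the reported (rounded) statistic
chi_sq  <- 16.36
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)
print(round(p_value, 2))
```

A p-value well above the usual 0.05 level means the data give no evidence against the two-factor model, which is consistent with the variance figures above.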