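# Read the data; the first column holds the row labels
wrun=read.csv("C:/Users/Prokarso/Desktop/WM.csv",header=TRUE)
wrun
# Move the first column into the row names, then drop it from the data
rownames(wrun)=wrun[,1]
wrun[,1]=NULL
wrun
colnames(wrun)=c('100m','200m','400m','800m','1500m','3000m','Marathon')
# Standardize each event to mean 0 and variance 1, so PCA works on the correlation structure
wrun=scale(wrun)
# PCA on the standardized data (stats is attached by default, so library(stats) is optional)
library(stats)
pca_wrun=prcomp(wrun)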
pca_wrun
## Standard deviations (1, .., p=7):
## [1] 2.4094991 0.8084835 0.5476152 0.3542280 0.2319847 0.1976089 0.1498085
## 
## Rotation (n x k) = (7 x 7):
##                PC1        PC2         PC3         PC4         PC5          PC6
## 100m     0.3683561  0.4900597 -0.28601157  0.31938631  0.23116950  0.619825234
## 200m     0.3653642  0.5365800 -0.22981913 -0.08330196  0.04145457 -0.710764580
## 400m     0.3816103  0.2465377  0.51536655 -0.34737748 -0.57217791  0.190945970
## 800m     0.3845592 -0.1554023  0.58452608 -0.04207636  0.62032379 -0.019089032
## 1500m    0.3891040 -0.3604093  0.01291198  0.42953873  0.03026144 -0.231248381
## 3000m    0.3888661 -0.3475394 -0.15272772  0.36311995 -0.46335476  0.009277159
## Marathon 0.3670038 -0.3692076 -0.48437037 -0.67249685  0.13053590  0.142280558
##                  PC7
## 100m      0.05217655
## 200m     -0.10922503
## 400m      0.20849691
## 800m     -0.31520972
## 1500m     0.69256151
## 3000m    -0.59835943
## Marathon  0.06959828
summary(pca_wrun)
## Importance of components:
##                           PC1     PC2     PC3     PC4     PC5     PC6     PC7
## Standard deviation     2.4095 0.80848 0.54762 0.35423 0.23198 0.19761 0.14981
## Proportion of Variance 0.8294 0.09338 0.04284 0.01793 0.00769 0.00558 0.00321
## Cumulative Proportion  0.8294 0.92276 0.96560 0.98353 0.99122 0.99679 1.00000
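
The "Proportion of Variance" row can be reproduced by hand from the standard deviations printed above. The following is a minimal sketch that copies those seven values from the output rather than reading WM.csv; the variable names (`sdev`, `variances`, `prop_var`, `cum_prop`) are chosen here for illustration only.

```r
# Standard deviations copied from the prcomp output above
sdev <- c(2.4094991, 0.8084835, 0.5476152, 0.3542280,
          0.2319847, 0.1976089, 0.1498085)

# The variance of each component is the squared standard deviation
variances <- sdev^2

# Share of the total variance carried by each component, and the running total
prop_var <- variances / sum(variances)
cum_prop <- cumsum(prop_var)

round(rbind(prop_var, cum_prop), 4)

# Because the seven variables were standardized, the variances sum to about 7
sum(variances)
```

Since each standardized variable contributes variance 1, the total is the number of variables, and each proportion is simply the component variance divided by 7.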

BIPLOT OF THE FIRST TWO PRINCIPAL COMPONENTS

biplot(pca_wrun, scale = 0,cex=c(0.5,0.7))
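
With `scale = 0`, the points in the biplot are the principal component scores and the arrows follow the loading vectors. The sketch below shows how the scores relate to the loadings. It uses the built-in `USArrests` data as a stand-in, on the assumption that WM.csv is not available to the reader; the same steps would apply to `wrun`.

```r
# Stand-in data (assumption): USArrests ships with R
X <- scale(USArrests)          # standardize, as was done for wrun
pca <- prcomp(X)

# Scores are the standardized data projected onto the loading vectors
scores_manual <- X %*% pca$rotation
scores_match <- isTRUE(all.equal(unname(scores_manual), unname(pca$x)))
scores_match

# The loading vectors are orthonormal: t(R) %*% R is the identity matrix
rot_orthonormal <- isTRUE(all.equal(unname(crossprod(pca$rotation)),
                                    diag(ncol(X))))
rot_orthonormal

# First two score columns: the coordinates plotted by biplot(pca, scale = 0)
head(pca$x[, 1:2])
```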

screeplot(pca_wrun, type="l", main="Scree plot showing the variance of each of the seven principal components", cex.main=0.8)

INTERPRETATIONS:

1. The 1st principal component explains about 82.9% of the variability in the data and the 2nd principal component explains about 9.3%. Together they account for approximately 92.3% of the variability present in the data, which is a desirable proportion, so retaining 2 principal components is enough. The scree plot leads to the same conclusion: the variance drops sharply after the first component and is small from the third component onward.
2. The first loading vector places nearly equal positive weight on all seven events (roughly 0.37 to 0.39), with marginally higher weights on {400m, 800m, 1500m, 3000m}. This component therefore roughly corresponds to the overall athletic excellence of a given nation across all distances. The second loading vector has positive weights on {100m, 200m, 400m} and negative weights on {800m, 1500m, 3000m, Marathon}, with the largest magnitudes on {100m, 200m} and on {1500m, 3000m, Marathon}. This component therefore contrasts performance in the sprints with performance in the longer distances. It suggests that the sprint events are correlated with each other beyond the overall level captured by the first component, and similarly for the longer-distance events.
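
One way to read the loading vectors is to look at their signs and magnitudes directly. The sketch below copies the PC1 and PC2 loadings from the `prcomp` output above (rounded to four decimals); the names `pc1`, `pc2` and `events` are introduced here for illustration only.

```r
events <- c("100m", "200m", "400m", "800m", "1500m", "3000m", "Marathon")

# Loadings copied (rounded) from the Rotation matrix printed above
pc1 <- c(0.3684, 0.3654, 0.3816, 0.3846, 0.3891, 0.3889, 0.3670)
pc2 <- c(0.4901, 0.5366, 0.2465, -0.1554, -0.3604, -0.3475, -0.3692)
names(pc1) <- names(pc2) <- events

# Each loading vector has unit length, and the two are orthogonal
sum(pc1^2)
sum(pc2^2)
sum(pc1 * pc2)

# A perfectly equal-weight vector over 7 variables would have entries 1/sqrt(7)
1 / sqrt(7)
range(pc1)

# Group the events by the sign of their PC2 loading
split(events, ifelse(pc2 > 0, "positive", "negative"))
```

All PC1 entries lie close to 1/sqrt(7), which is what an equal-weight average of the seven standardized events would give, while the PC2 signs separate the shorter events from the longer ones.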

FACTOR ANALYSIS

factanal(wrun, factors=2)
## 
## Call:
## factanal(x = wrun, factors = 2)
## 
## Uniquenesses:
##     100m     200m     400m     800m    1500m    3000m Marathon 
##    0.081    0.005    0.190    0.168    0.014    0.046    0.203 
## 
## Loadings:
##          Factor1 Factor2
## 100m     0.441   0.851  
## 200m     0.379   0.923  
## 400m     0.574   0.693  
## 800m     0.785   0.463  
## 1500m    0.917   0.380  
## 3000m    0.889   0.405  
## Marathon 0.789   0.417  
## 
##                Factor1 Factor2
## SS loadings      3.538   2.754
## Proportion Var   0.505   0.393
## Cumulative Var   0.505   0.899
## 
## Test of the hypothesis that 2 factors are sufficient.
## The chi square statistic is 48.76 on 8 degrees of freedom.
## The p-value is 7.07e-08
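
The degrees of freedom and p-value of this test can be checked by hand. The sketch below uses the standard degrees-of-freedom formula for a maximum-likelihood factor model with `p` variables and `k` factors, and the chi-square statistic as printed above (rounded to 48.76, so the p-value agrees only approximately); the variable names are illustrative.

```r
p <- 7   # number of observed variables
k <- 2   # number of factors

# Degrees of freedom of the likelihood-ratio test for k factors
df <- ((p - k)^2 - (p + k)) / 2
df

# Upper-tail probability of the printed chi-square statistic
chisq_stat <- 48.76
p_value <- pchisq(chisq_stat, df = df, lower.tail = FALSE)
p_value
```

A p-value this small means the hypothesis that 2 factors are sufficient is rejected at conventional significance levels, which is worth keeping in mind when reading the variance-explained figures.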

INTERPRETATIONS:

1. The 1st factor explains 50.5% of the total variance and the 2nd factor explains 39.3%. Together they account for 89.9% of the total variation in the data, which is quite satisfactory. Note, however, that the chi-square test above rejects the hypothesis that 2 factors are sufficient (p-value 7.07e-08), so the 2-factor model is best read as a useful descriptive summary rather than a statistically adequate fit.
2. The loadings on the 2nd factor are larger for the variables {100m, 200m, 400m}, whereas the loadings on the 1st factor are larger for the variables {800m, 1500m, 3000m, Marathon}. This points to two distinct qualities of the performers in the different races. It also suggests that the variables {800m, 1500m, 3000m, Marathon} are correlated with one another, and similarly {100m, 200m, 400m}. We suggest naming the 1st factor the ‘Endurance factor’ and the 2nd factor the ‘Sprint factor’.
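
The summary rows of the `factanal` output follow directly from the loading matrix. The sketch below copies the printed loadings (three decimals, so results agree up to rounding) and recomputes the SS loadings, the proportion of variance, and the uniquenesses; the object names are illustrative.

```r
events <- c("100m", "200m", "400m", "800m", "1500m", "3000m", "Marathon")

# Loadings copied from the factanal output above
L <- cbind(Factor1 = c(0.441, 0.379, 0.574, 0.785, 0.917, 0.889, 0.789),
           Factor2 = c(0.851, 0.923, 0.693, 0.463, 0.380, 0.405, 0.417))
rownames(L) <- events

# SS loadings: column sums of squared loadings
ss_loadings <- colSums(L^2)
round(ss_loadings, 3)

# Proportion of variance: SS loadings divided by the number of variables
prop_var <- ss_loadings / nrow(L)
round(prop_var, 3)

# Communality is the row sum of squared loadings; uniqueness is 1 minus that
uniqueness <- 1 - rowSums(L^2)
round(uniqueness, 3)

# Which factor carries the larger loading for each event
dominant <- ifelse(L[, "Factor2"] > L[, "Factor1"], "Factor2", "Factor1")
dominant
```

The `dominant` vector makes the sprint versus endurance split explicit: the three shortest events load more heavily on the 2nd factor and the four longer events on the 1st.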