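# Read the data; the first column holds the row labels
wrun <- read.csv("C:/Users/Prokarso/Desktop/WM.csv", header = TRUE)
wrun
# Use the first column as row names, then drop it from the data
rownames(wrun) <- wrun[, 1]
wrun[, 1] <- NULL
wrun
colnames(wrun) <- c('100m', '200m', '400m', '800m', '1500m', '3000m', 'Marathon')
# Standardise every event to mean 0 and variance 1 so no event dominates
wrun <- scale(wrun)
# PCA on the standardised data
library(stats)  # attached by default; loaded explicitly here
pca_wrun <- prcomp(wrun)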
pca_wrun
## Standard deviations (1, .., p=7):
## [1] 2.4094991 0.8084835 0.5476152 0.3542280 0.2319847 0.1976089 0.1498085
##
## Rotation (n x k) = (7 x 7):
##                PC1        PC2         PC3         PC4         PC5          PC6
## 100m     0.3683561  0.4900597 -0.28601157  0.31938631  0.23116950  0.619825234
## 200m     0.3653642  0.5365800 -0.22981913 -0.08330196  0.04145457 -0.710764580
## 400m     0.3816103  0.2465377  0.51536655 -0.34737748 -0.57217791  0.190945970
## 800m     0.3845592 -0.1554023  0.58452608 -0.04207636  0.62032379 -0.019089032
## 1500m    0.3891040 -0.3604093  0.01291198  0.42953873  0.03026144 -0.231248381
## 3000m    0.3888661 -0.3475394 -0.15272772  0.36311995 -0.46335476  0.009277159
## Marathon 0.3670038 -0.3692076 -0.48437037 -0.67249685  0.13053590  0.142280558
##                  PC7
## 100m      0.05217655
## 200m     -0.10922503
## 400m      0.20849691
## 800m     -0.31520972
## 1500m     0.69256151
## 3000m    -0.59835943
## Marathon  0.06959828
summary(pca_wrun)
## Importance of components:
##                           PC1     PC2     PC3     PC4     PC5     PC6     PC7
## Standard deviation     2.4095 0.80848 0.54762 0.35423 0.23198 0.19761 0.14981
## Proportion of Variance 0.8294 0.09338 0.04284 0.01793 0.00769 0.00558 0.00321
## Cumulative Proportion  0.8294 0.92276 0.96560 0.98353 0.99122 0.99679 1.00000
biplot(pca_wrun, scale = 0,cex=c(0.5,0.7))
screeplot(pca_wrun, type = "l", main = "Scree plot of the variances of the seven principal components", cex.main = 0.8)
1. The first principal component explains about 82.9% of the variability in the data and the second about 9.3%. Together they account for approximately 92.3% of the variability, which is a desirable proportion, so retaining two principal components is enough. The scree plot leads to the same conclusion, since the variance falls steeply after the first component.
2. The first loading vector places nearly equal positive weight on all seven events (0.365 to 0.389), with marginally higher weights on {400m, 800m, 1500m, 3000m}. This component therefore roughly corresponds to the overall athletic level of a given nation across all distances. The second loading vector has positive weights on {100m, 200m, 400m} and negative weights on {800m, 1500m, 3000m, Marathon}, with the largest magnitudes on 100m and 200m. It is therefore a contrast between sprint performance and distance performance rather than a sum of the two. This suggests that, once the overall level is accounted for, the sprint events {100m, 200m, 400m} vary together and the distance events {800m, 1500m, 3000m, Marathon} vary together.
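The proportions quoted above can be reproduced directly from the standard deviations printed by `prcomp()`. The following is a minimal sketch: the values are copied from the output rather than read from the data file, and both the helper `n_components()` and its 90% target are illustrative assumptions, not part of the original analysis.

```r
# Standard deviations of the seven components, copied from the prcomp() output
sdev <- c(2.4094991, 0.8084835, 0.5476152, 0.3542280,
          0.2319847, 0.1976089, 0.1498085)
names(sdev) <- paste0("PC", seq_along(sdev))

# Component variances are the squared standard deviations (the eigenvalues)
eig <- sdev^2

# Share of the total variance per component, and the running total
prop_var <- eig / sum(eig)
cum_var  <- cumsum(prop_var)

round(rbind(Variance = eig, Proportion = prop_var, Cumulative = cum_var), 4)

# Hypothetical helper: smallest number of components whose cumulative
# share reaches a chosen target (the 0.90 default is an assumption)
n_components <- function(cum, target = 0.90) which(cum >= target)[1]
n_components(cum_var)
```

Because the data were standardised, the variances sum to the number of variables (seven), so each proportion is simply the component variance divided by seven.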
factanal(wrun, factors=2)
##
## Call:
## factanal(x = wrun, factors = 2)
##
## Uniquenesses:
## 100m 200m 400m 800m 1500m 3000m Marathon
## 0.081 0.005 0.190 0.168 0.014 0.046 0.203
##
## Loadings:
## Factor1 Factor2
## 100m 0.441 0.851
## 200m 0.379 0.923
## 400m 0.574 0.693
## 800m 0.785 0.463
## 1500m 0.917 0.380
## 3000m 0.889 0.405
## Marathon 0.789 0.417
##
## Factor1 Factor2
## SS loadings 3.538 2.754
## Proportion Var 0.505 0.393
## Cumulative Var 0.505 0.899
##
## Test of the hypothesis that 2 factors are sufficient.
## The chi square statistic is 48.76 on 8 degrees of freedom.
## The p-value is 7.07e-08
1. The first factor explains 50.5% of the total variance and the second factor explains 39.3%. Together they account for 89.9% of the total variation in the data, which is quite satisfactory as a descriptive summary. Note, however, that the likelihood-ratio test in the output rejects the hypothesis that two factors are sufficient (chi-square 48.76 on 8 degrees of freedom, p-value 7.07e-08), so the two-factor model is a convenient simplification rather than a complete description of the correlation structure.
2. The loadings shown are the varimax-rotated loadings that `factanal` reports by default. Loadings on the second factor are higher for {100m, 200m, 400m}, whereas loadings on the first factor are higher for {800m, 1500m, 3000m, Marathon}; 400m also loads substantially on the first factor (0.574), so it sits between the two groups. This pattern suggests that the variables {800m, 1500m, 3000m, Marathon} are correlated with each other, and similarly {100m, 200m, 400m}. We may suggest naming the first factor the ‘Endurance factor’ and the second the ‘Sprint factor’.
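The summary rows of the `factanal` output can be rebuilt from the printed loadings, which helps when checking the figures quoted above. This is a rough sketch using the loadings rounded to three decimals as printed, so the results agree with the output only up to rounding. The degrees-of-freedom expression is the standard one for a maximum-likelihood factor model with p variables and k factors.

```r
# Rotated loadings copied (rounded) from the factanal() output
events <- c("100m", "200m", "400m", "800m", "1500m", "3000m", "Marathon")
L <- cbind(
  Factor1 = c(0.441, 0.379, 0.574, 0.785, 0.917, 0.889, 0.789),
  Factor2 = c(0.851, 0.923, 0.693, 0.463, 0.380, 0.405, 0.417)
)
rownames(L) <- events

# "SS loadings": column sums of the squared loadings
ss_loadings <- colSums(L^2)

# "Proportion Var": divide by the number of standardised variables
prop_var <- ss_loadings / nrow(L)

# Communality is the row sum of squared loadings; uniqueness is the rest
uniqueness <- 1 - rowSums(L^2)

# Degrees of freedom of the sufficiency test for p variables and k factors
p  <- nrow(L)
k  <- ncol(L)
df <- ((p - k)^2 - (p + k)) / 2

# Upper-tail chi-square probability for the reported statistic
p_value <- pchisq(48.76, df = df, lower.tail = FALSE)

round(rbind(ss_loadings, prop_var), 3)
round(uniqueness, 3)
c(df = df, p_value = p_value)
```

A very small p-value here means the data reject the hypothesis that two factors reproduce the correlation matrix, which is why the 89.9% figure alone does not establish that the model fits.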