wrun=read.csv("C:/Users/Prokarso/Desktop/WM.csv",header=TRUE)
wrun

rownames(wrun)=wrun[,1]
wrun[,1]=NULL
wrun

colnames(wrun)=c('100m','200m','400m','800m','1500m','3000m','Marathon')
wrun=scale(wrun)
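# PCA on the standardized data (equivalent to PCA on the correlation matrix)
library(stats)   # prcomp() is in stats, which R attaches by default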
pca_wrun=prcomp(wrun)
pca_wrun

## Standard deviations (1, .., p=7):
## [1] 2.4094991 0.8084835 0.5476152 0.3542280 0.2319847 0.1976089 0.1498085
## 
## Rotation (n x k) = (7 x 7):
##                PC1        PC2         PC3         PC4         PC5          PC6
## 100m     0.3683561  0.4900597 -0.28601157  0.31938631  0.23116950  0.619825234
## 200m     0.3653642  0.5365800 -0.22981913 -0.08330196  0.04145457 -0.710764580
## 400m     0.3816103  0.2465377  0.51536655 -0.34737748 -0.57217791  0.190945970
## 800m     0.3845592 -0.1554023  0.58452608 -0.04207636  0.62032379 -0.019089032
## 1500m    0.3891040 -0.3604093  0.01291198  0.42953873  0.03026144 -0.231248381
## 3000m    0.3888661 -0.3475394 -0.15272772  0.36311995 -0.46335476  0.009277159
## Marathon 0.3670038 -0.3692076 -0.48437037 -0.67249685  0.13053590  0.142280558
##                  PC7
## 100m      0.05217655
## 200m     -0.10922503
## 400m      0.20849691
## 800m     -0.31520972
## 1500m     0.69256151
## 3000m    -0.59835943
## Marathon  0.06959828
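Because `wrun` was standardized with `scale()` before calling `prcomp()`, the PCA above is equivalent to an eigen-decomposition of the correlation matrix of the seven records. The squared standard deviations are its eigenvalues (they sum to 7, the number of variables), the rotation columns are its eigenvectors, and the scores are the standardized data multiplied by the rotation matrix. The sketch below checks this on simulated data. The matrix `X` and the shared column `level` are hypothetical stand-ins, not the WM.csv records.

```r
set.seed(42)

# Hypothetical stand-in for the records: 50 simulated "nations" x 7 events.
# A shared column 'level' makes all events positively correlated.
level <- rnorm(50)
X <- sapply(1:7, function(j) level + rnorm(50, sd = 0.5))
colnames(X) <- c("100m", "200m", "400m", "800m", "1500m", "3000m", "Marathon")

Z <- scale(X)       # standardize, as done with wrun
p <- prcomp(Z)      # PCA on the standardized data
e <- eigen(cor(X))  # eigen-decomposition of the correlation matrix

# PC variances (sdev^2) are the eigenvalues of the correlation matrix
all.equal(p$sdev^2, e$values)
# Loadings are the eigenvectors, up to an arbitrary sign per column
all.equal(abs(unname(p$rotation)), abs(e$vectors), tolerance = 1e-6)
# Scores (the points drawn in a biplot) are the standardized data times the loadings
all.equal(unname(p$x), unname(Z %*% p$rotation))
```

The sign of each component is arbitrary, which is why the loadings are compared in absolute value.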

summary(pca_wrun)

## Importance of components:
##                           PC1     PC2     PC3     PC4     PC5     PC6     PC7
## Standard deviation     2.4095 0.80848 0.54762 0.35423 0.23198 0.19761 0.14981
## Proportion of Variance 0.8294 0.09338 0.04284 0.01793 0.00769 0.00558 0.00321
## Cumulative Proportion  0.8294 0.92276 0.96560 0.98353 0.99122 0.99679 1.00000

BIPLOT OF THE FIRST TWO PRINCIPAL COMPONENTS

biplot(pca_wrun, scale = 0,cex=c(0.5,0.7))

screeplot(pca_wrun, type="l", main="Scree plot of the variances of the seven principal components", cex.main=0.8)

INTERPRETATIONS:

1. The 1st principal component explains about 82.9% of the variability in the data and the 2nd principal component explains about 9.3%. Together they explain approximately 92.3% of the variability, so the first two components retain almost all of the information in the seven standardized variables, and taking 2 principal components is enough. The scree plot is consistent with this: the variance drops sharply after PC1 and the curve is almost flat beyond PC2.
2. The first loading vector places almost equal weight (between 0.365 and 0.389) on all seven events, so this component is essentially an average of the standardized records and measures a nation's overall athletic performance across all distances, from the sprints to the marathon. Since all its loadings have the same sign, a nation with slower records in every event (assuming the records are times) receives a larger PC1 score. The second loading vector contrasts the sprints, 100m (0.49), 200m (0.54) and, to a lesser extent, 400m (0.25), with the longer races 1500m, 3000m and Marathon (about -0.35 to -0.37), while 800m receives a small weight (-0.16). Hence this component separates nations that are relatively stronger in the sprints from those relatively stronger in the distance events. This suggests that the sprint records {100m, 200m, 400m} are correlated with one another, as are the distance records {1500m, 3000m, Marathon}, beyond the overall level of performance captured by PC1.
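The "Proportion of Variance" row of `summary(pca_wrun)` is each component's variance (its squared standard deviation) divided by the total variance, which equals 7 for standardized data. The sketch below recomputes it from the standard deviations printed by `prcomp()`. The 90% cutoff is an assumed threshold chosen for illustration; the analysis itself does not fix one.

```r
# Standard deviations printed by prcomp(wrun) above
sdev <- c(2.4094991, 0.8084835, 0.5476152, 0.3542280, 0.2319847, 0.1976089, 0.1498085)

eig     <- sdev^2             # component variances (eigenvalues)
prop    <- eig / sum(eig)     # "Proportion of Variance"
cumprop <- cumsum(prop)       # "Cumulative Proportion"
round(rbind(prop, cumprop), 4)

# Smallest number of components whose cumulative share reaches the assumed 90% cutoff
k <- which(cumprop >= 0.90)[1]
k   # 2
```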

FACTOR ANALYSIS

factanal(wrun, factors=2)

## 
## Call:
## factanal(x = wrun, factors = 2)
## 
## Uniquenesses:
##     100m     200m     400m     800m    1500m    3000m Marathon 
##    0.081    0.005    0.190    0.168    0.014    0.046    0.203 
## 
## Loadings:
##          Factor1 Factor2
## 100m     0.441   0.851  
## 200m     0.379   0.923  
## 400m     0.574   0.693  
## 800m     0.785   0.463  
## 1500m    0.917   0.380  
## 3000m    0.889   0.405  
## Marathon 0.789   0.417  
## 
##                Factor1 Factor2
## SS loadings      3.538   2.754
## Proportion Var   0.505   0.393
## Cumulative Var   0.505   0.899
## 
## Test of the hypothesis that 2 factors are sufficient.
## The chi square statistic is 48.76 on 8 degrees of freedom.
## The p-value is 7.07e-08
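The "Uniquenesses", "SS loadings" and "Proportion Var" entries above follow directly from the loadings matrix: each variable's communality is the sum of its squared loadings, its uniqueness is 1 minus that communality, and each factor's SS loading is the column sum of squared loadings, divided by the 7 variables to give its proportion of variance. A sketch using the rounded loadings copied from the output; because of that rounding, the recomputed values match the printed ones only to about ±0.002.

```r
# Varimax-rotated loadings copied (rounded) from the factanal() output above
L <- matrix(c(0.441, 0.851,
              0.379, 0.923,
              0.574, 0.693,
              0.785, 0.463,
              0.917, 0.380,
              0.889, 0.405,
              0.789, 0.417),
            ncol = 2, byrow = TRUE,
            dimnames = list(c("100m", "200m", "400m", "800m", "1500m", "3000m", "Marathon"),
                            c("Factor1", "Factor2")))

communality <- rowSums(L^2)          # variance of each event shared with the factors
uniqueness  <- 1 - communality       # "Uniquenesses" in the output
ss_loadings <- colSums(L^2)          # "SS loadings" row
prop_var    <- ss_loadings / nrow(L) # "Proportion Var": divide by p = 7 variables

round(uniqueness, 3)
round(rbind(ss_loadings, prop_var, cum_var = cumsum(prop_var)), 3)
```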

INTERPRETATIONS:

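1. The 1st factor explains 50.5% of the total (standardized) variance and the 2nd factor explains 39.3%, so together they account for 89.9% of the total variation in the data. However, the likelihood-ratio test of the hypothesis that 2 factors are sufficient gives a chi-square statistic of 48.76 on 8 degrees of freedom with a p-value of 7.07e-08, so this hypothesis is rejected at any conventional significance level. Two factors therefore capture most of the variance but do not fully reproduce the correlations among the seven events; the 2-factor solution is a useful summary rather than an exact fit, and a 3-factor model (which still leaves 3 degrees of freedom for the test) could be examined.
2. The loadings shown are varimax-rotated, which is the factanal() default. Loadings on the 2nd factor are higher for {100m, 200m, 400m}, whereas loadings on the 1st factor are higher for {800m, 1500m, 3000m, Marathon}. This suggests that the records in {800m, 1500m, 3000m, Marathon} are correlated with one another, and similarly for {100m, 200m, 400m}, reflecting two distinct abilities of the athletes. We may therefore name the 1st factor the ‘Endurance factor’ and the 2nd factor the ‘Sprint factor’.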
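The degrees of freedom of the goodness-of-fit test reported by `factanal()` for p variables and k factors are ((p - k)^2 - (p + k)) / 2, which gives 8 for p = 7 and k = 2, matching the output. The sketch below evaluates this formula and the upper-tail chi-square probability with `pchisq()`. The helper `fa_dof` is hypothetical, written only for this illustration, and is not part of R.

```r
# Hypothetical helper: degrees of freedom of the factor-model chi-square test
fa_dof <- function(p, k) ((p - k)^2 - (p + k)) / 2

fa_dof(7, 2)   # 8, as in the output
fa_dof(7, 3)   # 3: a 3-factor model can still be tested
fa_dof(7, 4)   # -1: too many factors for 7 variables

# Upper-tail probability for the printed (rounded) statistic
pchisq(48.76, df = fa_dof(7, 2), lower.tail = FALSE)  # about 7.07e-08
```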

PRINCIPAL COMPONENT ANALYSIS ON WOMEN MARATHON TRACK RECORD DATA

BIPLOT OF THE FIRST TWO PRINCIPAL COMPONENTS

INTERPRETATIONS:

FACTOR ANALYSIS

INTERPRETATIONS: