PRINCIPAL COMPONENT ANALYSIS ON STOCK MARKET DATA

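# Read the stock data from a local CSV file
smpr=read.csv('C:/Users/Prokarso/Downloads/STOCK (1).csv')
smpr
colnames(smpr)=c('Allied Chemical','Du Pont','Union Carbide','Exxon','Texaco')
# Standardise each column to mean 0 and variance 1, so the PCA below works on the correlation scale
smpr=scale(smpr)
library(stats) # attached by default in R; kept for explicitness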

#pca
pca_smpr=prcomp(smpr)
pca_smpr
## Standard deviations (1, .., p=5):
## [1] 1.6907217 0.8988477 0.7345337 0.6715794 0.5856402
## 
## Rotation (n x k) = (5 x 5):
##                       PC1        PC2        PC3        PC4        PC5
## Allied Chemical 0.4633838 -0.2400729  0.6163899 -0.3774384 -0.4530388
## Du Pont         0.4569441 -0.5099083 -0.1755200 -0.2115318  0.6750131
## Union Carbide   0.4698269 -0.2607096 -0.3387787  0.6636332 -0.3951116
## Exxon           0.4223661  0.5223300 -0.5381847 -0.4759962 -0.1804525
## Texaco          0.4211260  0.5844317  0.4299331  0.3818549  0.3878620
summary(pca_smpr)
## Importance of components:
##                           PC1    PC2    PC3    PC4     PC5
## Standard deviation     1.6907 0.8988 0.7345 0.6716 0.58564
## Proportion of Variance 0.5717 0.1616 0.1079 0.0902 0.06859
## Cumulative Proportion  0.5717 0.7333 0.8412 0.9314 1.00000
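
The "Proportion of Variance" row can be reproduced by hand from the standard deviations. The following is a minimal sketch that copies the printed standard deviations into a vector (the variable names `sdev`, `variances`, `prop_var` and `cum_prop` are illustrative and are not part of the original analysis):

```r
# Standard deviations copied from the prcomp output above
sdev <- c(1.6907217, 0.8988477, 0.7345337, 0.6715794, 0.5856402)

# The variance of each component is its squared standard deviation
variances <- sdev^2

# Proportion of variance = component variance / total variance
prop_var <- variances / sum(variances)
cum_prop <- cumsum(prop_var)

round(rbind(prop_var, cum_prop), 4)
```

Because the data were standardised with scale(), the total variance equals the number of variables (5), up to rounding in the printed values.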

BIPLOT OF THE FIRST TWO PRINCIPAL COMPONENTS

biplot(pca_smpr, scale = 0,cex=c(0.5,0.9))

Scree plot showing the variance (eigenvalue) of each of the five principal components

plot(pca_smpr,type="l", main="Scree Plot")

INTERPRETATIONS:

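1. The 1st principal component explains 57.17%, the 2nd explains 16.16% and the 3rd explains 10.79% of the total variation in the data. Together they account for 84.12% of the total variation, so three principal components are enough if the aim is to retain more than 80% of the variation.

2. The scree plot shows an ‘elbow’ at the 2nd principal component, which suggests that two principal components (73.33% of the variation) may already be adequate. The choice between two and three components depends on how much of the variation has to be retained.

3. From the loading vectors, all five stocks load on the 1st principal component with nearly equal positive weights (0.42 to 0.47), so it reflects a movement common to all stocks. On the 2nd principal component the three variables {Allied Chemical, Du Pont, Union Carbide} have negative loadings and the two variables {Exxon, Texaco} have positive loadings, so it contrasts the two groups. This suggests that the first three variables are correlated among themselves and that the last two are correlated with each other.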
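
The grouping read off the loading vectors can be checked directly from the printed Rotation matrix. The sketch below copies the PC1 and PC2 columns by hand (the object names `loadings`, `gram` and `pc2_groups` are illustrative assumptions) and splits the stocks by the sign of their PC2 loading:

```r
# PC1 and PC2 loadings copied from the printed Rotation matrix
loadings <- matrix(
  c(0.4633838, -0.2400729,
    0.4569441, -0.5099083,
    0.4698269, -0.2607096,
    0.4223661,  0.5223300,
    0.4211260,  0.5844317),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("Allied Chemical", "Du Pont", "Union Carbide", "Exxon", "Texaco"),
    c("PC1", "PC2")
  )
)

# Loading vectors have unit length and are mutually orthogonal,
# so this product is close to the 2 x 2 identity matrix
gram <- crossprod(loadings)

# Group the stocks by the sign of their PC2 loading
pc2_groups <- split(rownames(loadings), sign(loadings[, "PC2"]))
pc2_groups
```

The sign of a principal component is arbitrary, so only the contrast between the two groups is meaningful, not which group is negative.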

FACTOR ANALYSIS

factanal(smpr, factors=2)
## 
## Call:
## factanal(x = smpr, factors = 2)
## 
## Uniquenesses:
## Allied Chemical         Du Pont   Union Carbide           Exxon          Texaco 
##           0.496           0.251           0.474           0.607           0.179 
## 
## Loadings:
##                 Factor1 Factor2
## Allied Chemical 0.600   0.378  
## Du Pont         0.849   0.165  
## Union Carbide   0.642   0.337  
## Exxon           0.366   0.509  
## Texaco          0.207   0.882  
## 
##                Factor1 Factor2
## SS loadings      1.671   1.321
## Proportion Var   0.334   0.264
## Cumulative Var   0.334   0.598
## 
## Test of the hypothesis that 2 factors are sufficient.
## The chi square statistic is 0.58 on 1 degree of freedom.
## The p-value is 0.448
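
The summary rows of the factanal output follow from the loadings themselves. As a rough sketch, using the loadings as printed to three decimals (so the results agree with the output only up to rounding; the object names are illustrative):

```r
# Loadings copied from the factanal output above
L <- matrix(
  c(0.600, 0.378,
    0.849, 0.165,
    0.642, 0.337,
    0.366, 0.509,
    0.207, 0.882),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("Allied Chemical", "Du Pont", "Union Carbide", "Exxon", "Texaco"),
    c("Factor1", "Factor2")
  )
)

# SS loadings: column sums of the squared loadings
ss_loadings <- colSums(L^2)

# Proportion Var: SS loadings divided by the number of (standardised) variables
prop_var <- ss_loadings / nrow(L)

# Communality: row sums of squared loadings; uniqueness is what is left over
communality <- rowSums(L^2)
uniqueness <- 1 - communality

round(ss_loadings, 3)
round(uniqueness, 3)
```

A low uniqueness (Texaco, Du Pont) means the two factors account for most of that stock's variance; a high one (Exxon) means much of its variance is specific to that stock.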

INTERPRETATIONS:

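1. The 1st factor explains 33.4% and the 2nd factor explains 26.4% of the total variation in the data. Together they explain almost 60% of the variation. This is a modest share, but the test reported above (chi square statistic 0.58 on 1 degree of freedom, p-value 0.448) does not reject the hypothesis that 2 factors are sufficient. In addition, factanal() cannot fit more than 2 factors to 5 variables, because a 3-factor model would have negative degrees of freedom.

2. We observe that the loadings on the 1st factor are larger for the variables {Allied Chemical, Du Pont, Union Carbide}, whereas the loadings on the 2nd factor are larger for the variables {Exxon, Texaco}. The factors therefore appear to reflect the industry to which each company belongs. We may name the 1st factor ‘Chemical manufacture’ and the 2nd factor ‘Oil manufacture’. Note that factanal() reports varimax-rotated loadings by default.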
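
The limit on the number of factors comes from the degrees of freedom of the maximum likelihood factor model, ((p - k)^2 - (p + k)) / 2 for p variables and k factors. A small sketch (the helper name `factanal_dof` is hypothetical, not a function from the stats package):

```r
# Degrees of freedom of the k-factor model for p variables
factanal_dof <- function(p, k) ((p - k)^2 - (p + k)) / 2

# Degrees of freedom for 1, 2 and 3 factors with p = 5 variables
dof <- sapply(1:3, factanal_dof, p = 5)
names(dof) <- paste0("k=", 1:3)
dof
```

For k = 2 this gives 1, matching the "1 degree of freedom" in the output above; for k = 3 it is negative, which is why a 3-factor model cannot be fitted to 5 variables.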