smpr=read.csv('C:/Users/Prokarso/Downloads/STOCK (1).csv')
smpr
colnames(smpr)=c('Allied Chemical','Du Pont','Union Carbide','Exxon','Texaco')
smpr=scale(smpr)
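library(stats)  # attached by default; loaded explicitly for clarity
# PCA on the standardized data, i.e. on the correlation matrix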
pca_smpr=prcomp(smpr)
pca_smpr
## Standard deviations (1, .., p=5):
## [1] 1.6907217 0.8988477 0.7345337 0.6715794 0.5856402
##
## Rotation (n x k) = (5 x 5):
##                       PC1        PC2        PC3        PC4        PC5
## Allied Chemical 0.4633838 -0.2400729  0.6163899 -0.3774384 -0.4530388
## Du Pont         0.4569441 -0.5099083 -0.1755200 -0.2115318  0.6750131
## Union Carbide   0.4698269 -0.2607096 -0.3387787  0.6636332 -0.3951116
## Exxon           0.4223661  0.5223300 -0.5381847 -0.4759962 -0.1804525
## Texaco          0.4211260  0.5844317  0.4299331  0.3818549  0.3878620
summary(pca_smpr)
## Importance of components:
##                           PC1    PC2    PC3    PC4     PC5
## Standard deviation     1.6907 0.8988 0.7345 0.6716 0.58564
## Proportion of Variance 0.5717 0.1616 0.1079 0.0902 0.06859
## Cumulative Proportion  0.5717 0.7333 0.8412 0.9314 1.00000
biplot(pca_smpr, scale = 0,cex=c(0.5,0.9))
plot(pca_smpr,type="l", main="Scree Plot")
1. The 1st principal component explains 57.17%, the 2nd 16.16%, and the 3rd 10.79% of the total variation in the data. Together they account for 84.12% of the total variation, so we retain 3 principal components as desired.
2. The scree plot shows an ‘elbow’ at the 2nd principal component, which suggests that two principal components would already capture most of the variation (73.33%).
3. The loading vectors suggest two groups of variables. All five stocks load on PC1 with nearly equal positive weights (0.42 to 0.47), so PC1 acts as an overall market component. PC2 has negative loadings for {Allied Chemical, Du Pont, Union Carbide} and positive loadings for {Exxon, Texaco}, so it contrasts the chemical stocks with the oil stocks. This is consistent with the first three variables being correlated among themselves and the two variables {Exxon, Texaco} being correlated with each other.
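The percentages in point 1 can be re-derived from the standard deviations printed by `prcomp()`. The following is a minimal sketch that copies those five values from the output above rather than re-reading the CSV; the variable names `sdev`, `prop` and `cum_prop` are illustrative only.

```r
# Standard deviations of the principal components, copied from the output above.
sdev <- c(1.6907217, 0.8988477, 0.7345337, 0.6715794, 0.5856402)

# Each component's variance is its squared standard deviation. Because the
# data were standardized, the variances sum to the number of variables (5).
pc_var <- sdev^2

# Proportion and cumulative proportion of total variance per component.
prop     <- pc_var / sum(pc_var)
cum_prop <- cumsum(prop)

round(rbind(Proportion = prop, Cumulative = cum_prop), 4)
```

The first three entries of `cum_prop` correspond to retaining one, two, or three components, which is the comparison made in points 1 and 2.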
factanal(smpr, factors=2)
##
## Call:
## factanal(x = smpr, factors = 2)
##
## Uniquenesses:
## Allied Chemical         Du Pont   Union Carbide           Exxon          Texaco
##           0.496           0.251           0.474           0.607           0.179
##
## Loadings:
##                 Factor1 Factor2
## Allied Chemical   0.600   0.378
## Du Pont           0.849   0.165
## Union Carbide     0.642   0.337
## Exxon             0.366   0.509
## Texaco            0.207   0.882
##
##                Factor1 Factor2
## SS loadings      1.671   1.321
## Proportion Var   0.334   0.264
## Cumulative Var   0.334   0.598
##
## Test of the hypothesis that 2 factors are sufficient.
## The chi square statistic is 0.58 on 1 degree of freedom.
## The p-value is 0.448
1. The 1st factor explains 33.4% and the 2nd factor 26.4% of the total variation in the data; together they explain almost 60%. This is a modest share, but factanal() cannot fit more than 2 factors to 5 variables, because a 3-factor model would have negative degrees of freedom. The likelihood ratio test (chi-square statistic 0.58 on 1 degree of freedom, p-value 0.448) gives no evidence against the hypothesis that 2 factors are sufficient.
2. Loadings on the 1st factor are larger for {Allied Chemical, Du Pont, Union Carbide}, whereas loadings on the 2nd factor are larger for {Exxon, Texaco}. The factors thus separate the companies by industry, so we may name the 1st factor ‘Chemical manufacture’ and the 2nd factor ‘Oil manufacture’.
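Two of the statements above can be checked with a few lines of arithmetic. The sketch below first computes the degrees of freedom of the sufficiency test for an m-factor model on p variables, ((p - m)^2 - (p + m)) / 2, and then recomputes the "SS loadings" and "Proportion Var" rows from the printed loadings. The helper `fa_df` and the matrix `L` are illustrative names, and the loadings are the 3-decimal values copied from the output, so results agree with the printout only up to rounding.

```r
# Degrees of freedom of the test that m factors suffice for p variables.
fa_df <- function(p, m) ((p - m)^2 - (p + m)) / 2

fa_df(5, 2)  # 1, matching "on 1 degree of freedom" in the output
fa_df(5, 3)  # -2, so a 3-factor model cannot be fitted to 5 variables

# Loadings copied from the printed factanal() output (rounded to 3 decimals).
L <- matrix(c(0.600, 0.849, 0.642, 0.366, 0.207,
              0.378, 0.165, 0.337, 0.509, 0.882),
            ncol = 2,
            dimnames = list(c("Allied Chemical", "Du Pont", "Union Carbide",
                              "Exxon", "Texaco"),
                            c("Factor1", "Factor2")))

# Sum of squared loadings per factor, and the share of total variance
# (the total is 5 because the variables are standardized).
ss       <- colSums(L^2)
prop_var <- ss / nrow(L)

# Uniqueness of each variable: 1 minus its communality.
uniqueness <- 1 - rowSums(L^2)

round(rbind(SS = ss, Proportion = prop_var), 3)
round(uniqueness, 3)
```

The recomputed uniquenesses differ from the printed ones in the third decimal place for some variables, which is expected given that the input loadings are already rounded.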