This is an assignment report for the course ‘IMS’ at NCKU.
The data consist of three measurements (length, breadth and depth, in cm) on 10 observations.
In R, the cov() function computes the sample covariance matrix of the data.
# (a) Estimate the covariance matrix Σ for these data
a.answer <- cov(data)
a.answer
##            Length.cm Breadth.cm  Depth.cm
## Length.cm   8.818432  1.2447833 2.3070256
## Breadth.cm  1.244783  0.9856722 0.5696944
## Depth.cm    2.307026  0.5696944 2.2084233
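To make explicit what cov() computes, the sketch below rebuilds the sample covariance matrix from its definition (centered cross-products divided by n - 1) and compares it with cov(). The matrix `X` is simulated stand-in data, not the report's measurements, and the names `X`, `Xc` and `S.manual` are introduced here only for illustration.

```r
# Hypothetical stand-in data: 10 observations of 3 measurements.
# These values are simulated and are NOT the report's data.
set.seed(1)
X <- matrix(rnorm(30), nrow = 10,
            dimnames = list(NULL, c("Length.cm", "Breadth.cm", "Depth.cm")))

n  <- nrow(X)
Xc <- sweep(X, 2, colMeans(X))        # subtract each column mean
S.manual <- crossprod(Xc) / (n - 1)   # t(Xc) %*% Xc / (n - 1)

all.equal(S.manual, cov(X))           # compare with the built-in result
```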
To obtain the eigenvalues, use the eigen() function.
# (b) What are the eigenvalues λi of Σ?
b.answer <- eigen(a.answer)$values
b.answer
## [1] 9.7544927 1.5206484 0.7373867
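As a quick sanity check, the eigen-decomposition can be reproduced from the covariance matrix printed in (a). The sketch below re-enters that matrix by hand (so it carries only the printed precision) and checks the defining relation Σv = λv for every eigenpair; `Sigma`, `e`, `lambda` and `resid` are names introduced here for illustration.

```r
# Covariance matrix copied from the printed output of (a).
Sigma <- matrix(c(8.818432,  1.2447833, 2.3070256,
                  1.2447833, 0.9856722, 0.5696944,
                  2.3070256, 0.5696944, 2.2084233),
                nrow = 3, byrow = TRUE)

e      <- eigen(Sigma, symmetric = TRUE)
lambda <- e$values                    # eigenvalues, largest first

# Residual of Sigma %*% V - V %*% diag(lambda); should be close to zero.
resid <- Sigma %*% e$vectors - e$vectors %*% diag(lambda)
max(abs(resid))
```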
The total variance is the sum of all the eigenvalues from (b).
# (c) What is the total variance?
c.answer <- sum(b.answer)
c.answer
## [1] 12.01253
Having computed the total variance, we now compute the trace of Σ.
# (d) Prove λ1 + λ2 + λ3 = tr(Σ), where the trace is the sum of the diagonal elements of Σ.
tr.sigma <- sum(diag(a.answer))
Comparing the total variance with the trace shows that the two agree, which verifies the identity numerically for these data.
c(c.answer, tr.sigma)
## [1] 12.01253 12.01253
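The comparison confirms the identity for this data set only. A general argument, sketched here under the standard assumption that Σ is a real symmetric matrix, uses the spectral decomposition together with the cyclic property of the trace. The symbols P and Λ are introduced here and do not appear in the report.

```latex
\begin{aligned}
\Sigma &= P \Lambda P^{\top},
  \qquad P^{\top} P = I,
  \qquad \Lambda = \operatorname{diag}(\lambda_1, \lambda_2, \lambda_3), \\
\operatorname{tr}(\Sigma)
  &= \operatorname{tr}\!\left(P \Lambda P^{\top}\right)
   = \operatorname{tr}\!\left(\Lambda P^{\top} P\right)
   = \operatorname{tr}(\Lambda)
   = \lambda_1 + \lambda_2 + \lambda_3 .
\end{aligned}
```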
Based on the scree plot, I would keep only one PC, because the variance drops sharply after PC1 (from 9.754 at PC1 to 1.521 at PC2) and changes little afterwards.
# (e) Draw a scree plot and justify how many PCs you would like to select?
pc.index <- seq_along(b.answer)
plot(x=pc.index, y=b.answer,
     xlab="PC", ylab="Variance", type="o", xaxt="n", pch=20)
axis(1, labels=paste("PC", pc.index, sep=""), at=pc.index) # x-axis setting
text(pc.index, b.answer-0.2,
     labels=round(b.answer, digits=3)) # label each point with its eigenvalue
# (f) What percentage of the total variance is accounted for by the first principal component?
f.answer <- b.answer[1]/c.answer
paste(round(f.answer*100,2), "%", sep="")
## [1] "81.2%"
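The same calculation extends to every component. The sketch below uses the eigenvalues printed in (b) to tabulate each PC's share of the total variance and the running total, which can support the choice made from the scree plot. The names `lambda`, `prop` and `cum.prop` are introduced here for illustration.

```r
# Eigenvalues copied from the printed output of (b).
lambda <- c(9.7544927, 1.5206484, 0.7373867)

prop     <- lambda / sum(lambda)   # share of total variance per PC
cum.prop <- cumsum(prop)           # running total of those shares

round(rbind(Proportion = prop, Cumulative = cum.prop), 4)
```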
# (g) What are the coefficients of the first principal component based on covariance matrix?
pc <- princomp(data)
pc$loadings
##
## Loadings:
##            Comp.1 Comp.2 Comp.3
## Length.cm   0.942  0.328
## Breadth.cm  0.153 -0.217 -0.964
## Depth.cm    0.299 -0.920  0.254
##
##                Comp.1 Comp.2 Comp.3
## SS loadings     1.000  1.000  1.000
## Proportion Var  0.333  0.333  0.333
## Cumulative Var  0.333  0.667  1.000
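The Comp.1 column should match the first eigenvector of Σ up to sign. Note that princomp() computes its covariance matrix with divisor n rather than n - 1, which rescales the component variances but leaves the loadings unchanged. The sketch below recomputes the first eigenvector from the covariance matrix printed in (a); `Sigma` and `v1` are names introduced here, and the sign is fixed by convention so that the Length.cm coefficient is positive.

```r
# Covariance matrix copied from the printed output of (a).
Sigma <- matrix(c(8.818432,  1.2447833, 2.3070256,
                  1.2447833, 0.9856722, 0.5696944,
                  2.3070256, 0.5696944, 2.2084233),
                nrow = 3, byrow = TRUE)

v1 <- eigen(Sigma, symmetric = TRUE)$vectors[, 1]
v1 <- v1 * sign(v1[1])   # eigenvector sign is arbitrary; make Length.cm positive
names(v1) <- c("Length.cm", "Breadth.cm", "Depth.cm")

round(v1, 3)             # compare with the Comp.1 column of pc$loadings
```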
We can use predict() to calculate the projection scores of the observations. The result below is the score of the first observation on the first PC.
# (h) Calculate the projection score of the first observation on the first principal component.
pred <- predict(pc, data)
pred[1,1]
## Comp.1
## -3.752602
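A projection score is the centered observation multiplied by the component's coefficient vector. The sketch below shows that calculation by hand and compares it with predict(). It runs on simulated stand-in data, not the report's measurements, so the resulting number will differ from the one above; `X`, `pc.demo`, `score.manual` and `score.predict` are names introduced here for illustration.

```r
# Hypothetical stand-in data: simulated, NOT the report's data.
set.seed(1)
X <- matrix(rnorm(30), nrow = 10,
            dimnames = list(NULL, c("Length.cm", "Breadth.cm", "Depth.cm")))

pc.demo <- princomp(X)

# Center the first observation, then take its inner product with the
# first column of the loadings matrix.
centered.first <- X[1, ] - pc.demo$center
score.manual   <- sum(centered.first * pc.demo$loadings[, 1])

score.predict  <- unname(predict(pc.demo, X)[1, 1])
all.equal(score.manual, score.predict)
```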
Checking the loadings of a PC is simple: inspect the loadings component of the princomp object.
The answer is 0.942 (Length.cm on Comp.1).
# (i) What are the loadings (vector correlations) of the first variable on the first principal component?
pc$loadings
##
## Loadings:
##            Comp.1 Comp.2 Comp.3
## Length.cm   0.942  0.328
## Breadth.cm  0.153 -0.217 -0.964
## Depth.cm    0.299 -0.920  0.254
##
##                Comp.1 Comp.2 Comp.3
## SS loadings     1.000  1.000  1.000
## Proportion Var  0.333  0.333  0.333
## Cumulative Var  0.333  0.667  1.000
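The values printed by pc$loadings are eigenvector coefficients. If "vector correlations" in the question is instead read as the correlation between each original variable and the first PC (an assumption about the question's intent, not something the report states), the coefficients can be rescaled as e_k * sqrt(λ1) / sqrt(σ_kk). The sketch below does this from the covariance matrix printed in (a); `Sigma`, `e`, `v1` and `cor.loadings` are names introduced here for illustration.

```r
# Covariance matrix copied from the printed output of (a).
Sigma <- matrix(c(8.818432,  1.2447833, 2.3070256,
                  1.2447833, 0.9856722, 0.5696944,
                  2.3070256, 0.5696944, 2.2084233),
                nrow = 3, byrow = TRUE)

e  <- eigen(Sigma, symmetric = TRUE)
v1 <- e$vectors[, 1] * sign(e$vectors[1, 1])   # fix the arbitrary sign

# Correlation between each variable and PC1:
# coefficient * sd of PC1 / sd of the variable.
cor.loadings <- v1 * sqrt(e$values[1]) / sqrt(diag(Sigma))
names(cor.loadings) <- c("Length.cm", "Breadth.cm", "Depth.cm")

round(cor.loadings, 3)
```

Under this reading the value for Length.cm is larger than the raw coefficient 0.942, because Length.cm has by far the largest variance and therefore dominates the first component.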