This is an assignment report for the course ‘IMS’ at NCKU.

1. Principal Component Analysis (50%)

The following data consist of three measurements (Length.cm, Breadth.cm, Depth.cm) on each of 10 observations.

(a) Estimate the covariance matrix Σ for these data.

In R, the cov() function computes the sample covariance matrix of the data (with divisor n - 1).

# (a) Estimate the covariance matrix Σ for these data
a.answer <- cov(data)
a.answer
##            Length.cm Breadth.cm  Depth.cm
## Length.cm   8.818432  1.2447833 2.3070256
## Breadth.cm  1.244783  0.9856722 0.5696944
## Depth.cm    2.307026  0.5696944 2.2084233
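
For reference, the result of cov() can be reproduced by hand: center each column, then divide the cross-product of the centered data by n - 1. The sketch below uses a small data frame named `toy` with made-up values, since it is only meant to illustrate the formula and not to reproduce the assignment numbers.

```r
# Toy data frame with made-up values (an assumption, NOT the assignment data)
toy <- data.frame(x1 = c(2, 4, 6, 8),
                  x2 = c(1, 3, 2, 5),
                  x3 = c(9, 7, 8, 4))

n <- nrow(toy)
# Subtract each column mean; the result is a numeric matrix
centered <- scale(as.matrix(toy), center = TRUE, scale = FALSE)
# Sample covariance: cross-product of the centered data divided by (n - 1)
manual.cov <- crossprod(centered) / (n - 1)

# Compare the manual result with cov()
all.equal(manual.cov, cov(toy), check.attributes = FALSE)
```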

(b) What are the eigenvalues λi of Σ?

The eigen() function returns both the eigenvalues and the eigenvectors of a matrix; here only the eigenvalues are needed.

# (b) What are the eigenvalues λi of Σ?
b.answer <- eigen(a.answer)$values
b.answer
## [1] 9.7544927 1.5206484 0.7373867
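
As a sanity check, each eigenpair should satisfy Σv = λv. The sketch below rebuilds Σ from the rounded values printed in (a) (so `sigma` is a copy of the printed output, not the original `a.answer` object) and measures how far ΣV is from V diag(λ).

```r
# Covariance matrix copied from the printed output of (a)
sigma <- matrix(c(8.818432,  1.2447833, 2.3070256,
                  1.2447833, 0.9856722, 0.5696944,
                  2.3070256, 0.5696944, 2.2084233),
                nrow = 3, byrow = TRUE)

eig <- eigen(sigma, symmetric = TRUE)
lambda <- eig$values   # eigenvalues, in decreasing order
V <- eig$vectors       # eigenvectors, one per column

# Residual of Sigma %*% V - V %*% diag(lambda); should be numerically zero
residual <- sigma %*% V - V %*% diag(lambda)
max(abs(residual))
```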

(c) What is the total variance?

The total variance is the sum of all eigenvalues obtained in (b).

# (c) What is the total variance?
c.answer <- sum(b.answer)
c.answer
## [1] 12.01253

(d) Prove λ1 + λ2 + λ3 = tr(Σ), where the trace is the sum of the diagonal elements of Σ.

The total variance from (c) is already the sum of the eigenvalues, so it remains to compute the trace of Σ.

# (d) Prove λ1 + λ2 + λ3 = tr(Σ), where trace is sum of diagonal elements in Σ.
tr.sigma <- sum(diag(a.answer))

Comparing the total variance with the trace shows that the two agree numerically.

c(c.answer, tr.sigma)
## [1] 12.01253 12.01253
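
The numerical comparison above is a check rather than a proof. A short argument, assuming only that Σ is symmetric (so it has a spectral decomposition Σ = QΛQᵀ with Q orthogonal and Λ = diag(λ1, λ2, λ3)) and that the trace is invariant under cyclic permutation, could run as follows:

```latex
\begin{aligned}
\operatorname{tr}(\Sigma)
  &= \operatorname{tr}\!\left(Q \Lambda Q^{\top}\right) \\
  &= \operatorname{tr}\!\left(\Lambda Q^{\top} Q\right)
     && \text{(cyclic property of the trace)} \\
  &= \operatorname{tr}(\Lambda)
     && \text{(since } Q^{\top} Q = I\text{)} \\
  &= \lambda_1 + \lambda_2 + \lambda_3 .
\end{aligned}
```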

(e) Draw a scree plot and justify how many PCs you would select.

Based on the scree plot, I would select only one PC: the variance drops sharply after PC1, and the curve is nearly flat from PC2 to PC3.

# (e) Draw a scree plot and justify how many PCs you would like to select?
pc.index <- seq_along(b.answer)
plot(x=pc.index, y=b.answer,
     xlab="PC", ylab="Variance", type="o", xaxt="n", pch=20)
axis(1, labels=paste("PC", pc.index, sep=""), at=pc.index) # x-axis setting

text(pc.index,
     b.answer-0.2,
     labels=round(b.answer, digits=3)) # label each point with its eigenvalue
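
A scree plot is often read together with the proportion of variance each PC explains. The following sketch, which simply re-enters the eigenvalues printed in (b) as a vector named `lambda`, tabulates the individual and cumulative proportions; it is one possible way to back up the choice, not a required step.

```r
# Eigenvalues copied from the printed output of (b)
lambda <- c(9.7544927, 1.5206484, 0.7373867)

prop <- lambda / sum(lambda)   # share of total variance per PC
cum.prop <- cumsum(prop)       # running total of those shares

# One row per quantity, one column per PC
tab <- rbind(Proportion = prop, Cumulative = cum.prop)
colnames(tab) <- paste("PC", seq_along(lambda), sep = "")
round(tab, 3)
```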

(f) What percentage of the total variance is accounted for by the first principal component?

# (f) What percentage of the total variance is accounted for by the first principal component?
f.answer <- b.answer[1]/c.answer
paste(round(f.answer*100,2), "%", sep="")
## [1] "81.2%"

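(g) What are the coefficients of the first principal component based on covariance matrix?

By default, princomp() works on the covariance matrix (cor = FALSE). The coefficients of the first PC are the Comp.1 column of the loadings: 0.942 (Length.cm), 0.153 (Breadth.cm) and 0.299 (Depth.cm).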

# (g) What are the coefficients of the first principal component based on covariance matrix?
pc <- princomp(data)
pc$loadings
## 
## Loadings:
##            Comp.1 Comp.2 Comp.3
## Length.cm   0.942  0.328       
## Breadth.cm  0.153 -0.217 -0.964
## Depth.cm    0.299 -0.920  0.254
## 
##                Comp.1 Comp.2 Comp.3
## SS loadings     1.000  1.000  1.000
## Proportion Var  0.333  0.333  0.333
## Cumulative Var  0.333  0.667  1.000
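
The Comp.1 column should match the first eigenvector of Σ up to sign. Note that princomp() uses divisor n for the covariance while cov() uses n - 1; this rescales the eigenvalues but leaves the eigenvectors unchanged. The sketch below recomputes the first eigenvector from the rounded Σ printed in (a); the name `sigma` and the sign convention are choices made here for illustration.

```r
# Covariance matrix copied from the printed output of (a)
sigma <- matrix(c(8.818432,  1.2447833, 2.3070256,
                  1.2447833, 0.9856722, 0.5696944,
                  2.3070256, 0.5696944, 2.2084233),
                nrow = 3, byrow = TRUE)

# First eigenvector = coefficients of the first principal component
v1 <- eigen(sigma, symmetric = TRUE)$vectors[, 1]
# Eigenvectors are defined only up to sign; flip so the first entry is positive
if (v1[1] < 0) v1 <- -v1
names(v1) <- c("Length.cm", "Breadth.cm", "Depth.cm")
round(v1, 3)
```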

(h) Calculate the projection score of the first observation on the first principal component.

We can use predict() to calculate the projection scores of the observations. The score of the first observation on the first PC is shown below.

# (h) Calculate the projection score of the first observation on the first principal component.
pred <- predict(pc, data)
pred[1,1]
##    Comp.1 
## -3.752602
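
The score is the centered observation projected onto the first loading vector. The sketch below illustrates this on a randomly generated data frame named `toy` (a stand-in with made-up values, not the assignment data) and compares the manual projection with what predict() returns.

```r
# Toy data with random values (an assumption, NOT the assignment data)
set.seed(1)
toy <- data.frame(x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10))

pc.toy <- princomp(toy)
# Subtract the column means stored by princomp()
centered <- sweep(as.matrix(toy), 2, pc.toy$center)
# Score of observation 1 on PC1: dot product with the first loading vector
manual.score <- sum(centered[1, ] * pc.toy$loadings[, 1])

# Compare with the score returned by predict()
all.equal(manual.score, unname(predict(pc.toy, toy)[1, 1]))
```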

(i) What is the loading (vector correlation) of the first variable on the first principal component?

The loadings are stored in the loadings component of the princomp object.
The value read from this table is 0.942 (Length.cm on Comp.1); the entries of pc$loadings are the eigenvector coefficients.

# (i) What is the loading (vector correlation) of the first variable on the first principal component?
pc$loadings
## 
## Loadings:
##            Comp.1 Comp.2 Comp.3
## Length.cm   0.942  0.328       
## Breadth.cm  0.153 -0.217 -0.964
## Depth.cm    0.299 -0.920  0.254
## 
##                Comp.1 Comp.2 Comp.3
## SS loadings     1.000  1.000  1.000
## Proportion Var  0.333  0.333  0.333
## Cumulative Var  0.333  0.667  1.000
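
If "loading" in the question is meant as the correlation between a variable and a component, the coefficient has to be rescaled: for a covariance-based PCA, the correlation between variable i and PC j is e_ij · sqrt(λj) / sqrt(σii). The sketch below applies this formula to the rounded Σ printed in (a); the variable names (`sigma`, `e11`, `r11`) are introduced here only for illustration, and the result is offered as an alternative reading of the question.

```r
# Covariance matrix copied from the printed output of (a)
sigma <- matrix(c(8.818432,  1.2447833, 2.3070256,
                  1.2447833, 0.9856722, 0.5696944,
                  2.3070256, 0.5696944, 2.2084233),
                nrow = 3, byrow = TRUE)

eig <- eigen(sigma, symmetric = TRUE)
e11 <- abs(eig$vectors[1, 1])  # coefficient of variable 1 on PC1 (sign removed)
lambda1 <- eig$values[1]       # variance of PC1
sigma11 <- sigma[1, 1]         # variance of variable 1

# Correlation between variable 1 and PC1
r11 <- e11 * sqrt(lambda1) / sqrt(sigma11)
r11
```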