Homework 6 Question 9

Bill

November 19, 2015

Question

8.27. The pulp and paper properties data is given in Table 7.7. Using the four paper variables, BL (breaking length), EM (elastic modulus), SF (Stress at failure) and BS (burst strength), perform a principal component analysis using the covariance matrix S and the correlation matrix R. Your analysis should include the following:

  1. Determine the appropriate number of components to effectively summarize variability. Construct a scree plot to aid in your determination.

  2. Interpret the sample principal components.

  3. Do you think it is possible to develop a “paper strength” index that effectively contains the information in the four paper variables? Explain.

  4. Using the values for the first two principal components, plot the data in a two dimensional space with \(\hat{y_1}\) along the vertical axis and \(\hat{y_2}\) along the horizontal axis. Identify any outliers in this data set.

Let’s look at the data:

##          BL    EM    SF    BS
## [1,] 21.312 7.039 5.326 0.932
## [2,] 21.206 6.979 5.237 0.871
## [3,] 20.709 6.779 5.060 0.742
## [4,] 19.542 6.601 4.479 0.513
## [5,] 20.449 6.795 4.912 0.577
## [6,] 20.841 6.919 5.108 0.784

Matrix S and Matrix R

The covariance matrix S is:

##          BL        EM        SF        BS
## BL 8.302871 1.8866363 4.1473181 1.9720562
## EM 1.886636 0.5133593 0.9875851 0.4343071
## SF 4.147318 0.9875851 2.1400458 0.9879663
## BS 1.972056 0.4343071 0.9879663 0.4802721

The correlation matrix R is:

##           BL        EM        SF        BS
## BL 1.0000000 0.9138256 0.9838790 0.9875554
## EM 0.9138256 1.0000000 0.9422199 0.8746665
## SF 0.9838790 0.9422199 1.0000000 0.9745114
## BS 0.9875554 0.8746665 0.9745114 1.0000000
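The link between the two matrices can be checked directly: each correlation is the covariance divided by the product of the two standard deviations, \(r_{ij} = s_{ij}/\sqrt{s_{ii}s_{jj}}\). The following is a minimal sketch in plain Python (not the code used to produce the output above); the function name `cov_to_corr` is chosen here for illustration.

```python
import math

# Covariance matrix S as printed above (variable order: BL, EM, SF, BS)
S = [
    [8.302871, 1.8866363, 4.1473181, 1.9720562],
    [1.886636, 0.5133593, 0.9875851, 0.4343071],
    [4.147318, 0.9875851, 2.1400458, 0.9879663],
    [1.972056, 0.4343071, 0.9879663, 0.4802721],
]

def cov_to_corr(cov):
    """Divide each covariance by the two standard deviations involved."""
    n = len(cov)
    sd = [math.sqrt(cov[i][i]) for i in range(n)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(n)] for i in range(n)]

R = cov_to_corr(S)
for name, row in zip(["BL", "EM", "SF", "BS"], R):
    print(name, [round(r, 4) for r in row])
```

Up to rounding in the printed values, the result should agree with the matrix R shown above.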

Eigenanalysis

Eigenvalues and eigenvectors of S:

## $values
## [1] 11.29500861  0.10362052  0.03186923  0.00604974
## 
## $vectors
##           [,1]       [,2]       [,3]       [,4]
## [1,] 0.8564777 -0.3639203  0.3315412 -0.1552046
## [2,] 0.1975731  0.7858627  0.4973116  0.3099449
## [3,] 0.4312707  0.4576816 -0.7330536 -0.2591634
## [4,] 0.2035103 -0.2012694 -0.3246446  0.9014877

Eigenvalues and eigenvectors of R:

## $values
## [1] 3.839505765 0.140304022 0.012603928 0.007586285
## 
## $vectors
##            [,1]        [,2]        [,3]       [,4]
## [1,] -0.5061685 -0.26110200  0.56517738  0.5968196
## [2,] -0.4854922  0.81904792  0.19350510 -0.2366720
## [3,] -0.5080684 -0.02020866 -0.80019598  0.3180323
## [4,] -0.4999573 -0.51046828  0.05307262 -0.6976017
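As a sanity check on the printed eigenanalysis, an eigenpair \((\lambda, e)\) of a matrix \(A\) should satisfy \(Ae = \lambda e\). The sketch below, in plain Python with a helper name (`eigen_residual`) introduced only for this illustration, measures how far the first eigenpair of each matrix is from satisfying that identity.

```python
def eigen_residual(matrix, value, vector):
    """Largest absolute entry of (matrix * vector - value * vector)."""
    products = [sum(a * v for a, v in zip(row, vector)) for row in matrix]
    return max(abs(p - value * v) for p, v in zip(products, vector))

# Matrices and first eigenpairs copied from the output above
S = [
    [8.302871, 1.8866363, 4.1473181, 1.9720562],
    [1.886636, 0.5133593, 0.9875851, 0.4343071],
    [4.147318, 0.9875851, 2.1400458, 0.9879663],
    [1.972056, 0.4343071, 0.9879663, 0.4802721],
]
R = [
    [1.0000000, 0.9138256, 0.9838790, 0.9875554],
    [0.9138256, 1.0000000, 0.9422199, 0.8746665],
    [0.9838790, 0.9422199, 1.0000000, 0.9745114],
    [0.9875554, 0.8746665, 0.9745114, 1.0000000],
]

res_S = eigen_residual(S, 11.29500861,
                       [0.8564777, 0.1975731, 0.4312707, 0.2035103])
res_R = eigen_residual(R, 3.839505765,
                       [-0.5061685, -0.4854922, -0.5080684, -0.4999573])
print(res_S, res_R)
```

Both residuals should be small, limited only by the rounding of the printed digits. The check is unaffected by the overall sign of an eigenvector, which is why the all-negative first eigenvector of R is as valid as its positive counterpart.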

Eigenanalysis

Using the given formula for the proportion of variance (each eigenvalue divided by the sum of all eigenvalues), the eigenanalysis of the covariance matrix S is:

##                  [,1]        [,2]        [,3]         [,4]
## Eigenvalue 11.2950086 0.103620523 0.031869230 0.0060497404
## Proportion  0.9876239 0.009060472 0.002786613 0.0005289831
## Cumulative  0.9876239 0.996684404 0.999471017 1.0000000000
##          PC1        PC2        PC3        PC4
## BL 0.8564777 -0.3639203  0.3315412 -0.1552046
## EM 0.1975731  0.7858627  0.4973116  0.3099449
## SF 0.4312707  0.4576816 -0.7330536 -0.2591634
## BS 0.2035103 -0.2012694 -0.3246446  0.9014877
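The Proportion and Cumulative rows follow from the eigenvalues alone: the proportion for component \(k\) is \(\lambda_k / \sum_j \lambda_j\), and the cumulative row is its running sum. A minimal sketch in plain Python, with the function name `variance_explained` assumed for illustration, is:

```python
def variance_explained(eigenvalues):
    """Return (proportion, cumulative proportion) for each eigenvalue."""
    total = sum(eigenvalues)
    proportion = [lam / total for lam in eigenvalues]
    cumulative = []
    running = 0.0
    for p in proportion:
        running += p
        cumulative.append(running)
    return proportion, cumulative

# Eigenvalues copied from the eigenanalysis output
eig_S = [11.29500861, 0.10362052, 0.03186923, 0.00604974]
eig_R = [3.839505765, 0.140304022, 0.012603928, 0.007586285]

prop_S, cum_S = variance_explained(eig_S)
prop_R, cum_R = variance_explained(eig_R)
print([round(p, 4) for p in prop_S])
print([round(p, 4) for p in prop_R])
```

For the correlation matrix the eigenvalues sum to 4, the number of variables, so each proportion is simply the eigenvalue divided by 4.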

Eigenanalysis

According to the results obtained from the covariance matrix, the first principal component explains approximately 98.8% of the total sample variance, which indicates that the data can be summarized in one dimension. The first principal component for the covariance matrix is given below:

\[\hat{y_1}=\hat{e_1}'\underline{X}=0.856BL+0.198EM+0.431SF+0.204BS\]
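To illustrate how the equation is applied, the sketch below evaluates the linear combination for the first observation in the data preview. It makes one assumption worth stating: principal component scores are usually computed from mean-centered data, but the sample means are not shown here, so this is the uncentered combination, which differs from the centered score only by a constant shift. The helper name `linear_combination` is hypothetical.

```python
# First-component loadings from the covariance analysis (order: BL, EM, SF, BS)
e1_S = [0.8564777, 0.1975731, 0.4312707, 0.2035103]

# First observation from the data preview (BL, EM, SF, BS)
x1 = [21.312, 7.039, 5.326, 0.932]

def linear_combination(weights, values):
    """Weighted sum of the variables, i.e. e' x for one observation."""
    return sum(w * x for w, x in zip(weights, values))

# Uncentered value; subtracting e' x-bar would give the centered score
y1 = linear_combination(e1_S, x1)
print(round(y1, 3))
```

Because BL has both the largest loading and the largest values, it contributes most of this sum, which is typical of a covariance-based analysis when one variable has a much larger variance than the others.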

Eigenanalysis

For the correlation matrix R, the eigenanalysis is:

##                 [,1]       [,2]        [,3]        [,4]
## Eigenvalue 3.8395058 0.14030402 0.012603928 0.007586285
## Proportion 0.9598764 0.03507601 0.003150982 0.001896571
## Cumulative 0.9598764 0.99495245 0.998103429 1.000000000
##           PC1         PC2         PC3        PC4
## BL -0.5061685 -0.26110200  0.56517738  0.5968196
## EM -0.4854922  0.81904792  0.19350510 -0.2366720
## SF -0.5080684 -0.02020866 -0.80019598  0.3180323
## BS -0.4999573 -0.51046828  0.05307262 -0.6976017

Eigenanalysis

According to the results obtained from the correlation matrix, the first principal component explains approximately 96.0% of the total sample variance, which again indicates that the data can be summarized in one dimension. The first principal component for the correlation matrix, written in terms of the standardized variables and with the sign of the eigenvector reversed (which does not change its interpretation), is given below:

\[\hat{y_1}=\hat{e_1}'\underline{X}=0.506BL+0.485EM+0.508SF+0.500BS\]

Find outliers

Based on the above analysis of the correlation and covariance matrices, the data can be summarized in two or fewer dimensions, each of which is a linear combination of the four variables. The scree plots show that the eigenvalues drop sharply after the first component, and in the correlation-matrix analysis all four variables load almost equally (about 0.5 each) on that component, so a single principal component can be used to summarize the paper properties data. Thus, the first principal component can be used as an index of paper strength.
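One way to quantify how evenly the variables load on the first component is the cosine between the first eigenvector and an equal-weight direction \((0.5, 0.5, 0.5, 0.5)\). The sketch below does this in plain Python for both analyses; the equal-weight comparison is an illustration added here, not part of the original analysis.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

equal_weights = [0.5, 0.5, 0.5, 0.5]

# First eigenvectors from the output (R's sign flipped to positive)
e1_R = [0.5061685, 0.4854922, 0.5080684, 0.4999573]
e1_S = [0.8564777, 0.1975731, 0.4312707, 0.2035103]

cos_R = cosine(e1_R, equal_weights)
cos_S = cosine(e1_S, equal_weights)
print(round(cos_R, 4), round(cos_S, 4))
```

A cosine very close to 1 for R means its first component is essentially the average of the four standardized variables. The lower value for S reflects the dominance of BL, whose variance is much larger than that of the other variables.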

Using the values of the first two principal components, the data are plotted as follows:

Matrix S

From the scatter plot obtained, it can be seen that two outliers exist.

Matrix R

From the scatter plot obtained, it can be seen that many outliers exist.
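Outliers read off a scatter plot can be cross-checked numerically. One common convention, assumed here rather than taken from the analysis above, is to divide each centered score by the square root of its eigenvalue and flag observations whose standardized score exceeds about 3 in absolute value. The scores in this sketch are invented toy values, not results from Table 7.7, and `flag_outliers` is a hypothetical helper.

```python
import math

def flag_outliers(scores, eigenvalues, cutoff=3.0):
    """Indices of observations whose standardized score on any
    component exceeds `cutoff` in absolute value."""
    flagged = []
    for i, row in enumerate(scores):
        standardized = [abs(y) / math.sqrt(lam)
                        for y, lam in zip(row, eigenvalues)]
        if any(z > cutoff for z in standardized):
            flagged.append(i)
    return flagged

# Toy centered scores (y1, y2), made up purely for illustration
toy_scores = [(1.2, 0.1), (-3.0, -0.2), (0.5, 1.4), (-11.0, 0.3)]

# First two eigenvalues of S from the eigenanalysis
eig_S = [11.29500861, 0.10362052]

outliers = flag_outliers(toy_scores, eig_S)
print(outliers)
```

The cutoff is a judgment call; a stricter or looser value changes how many points are flagged, which may be one reason the two plots suggest different numbers of outliers.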