For this exercise I use the R dataset mtcars.
Motor Trend Car Road Tests
Description
The data was extracted from the 1974 American magazine Motor Trend and includes fuel consumption and 10 aspects of automobile design and performance for 32 cars (1973–74 models).
Usage: mtcars
Format: A data frame with 32 observations on 11 (numeric) variables.
[, 1] mpg  Miles/(US) gallon
[, 2] cyl  Number of cylinders
[, 3] disp Displacement (cubic inches)
[, 4] hp   Gross horsepower
[, 5] drat Rear axle ratio
[, 6] wt   Weight (1000 lbs)
[, 7] qsec 1/4 mile time
[, 8] vs   Engine (0 = V-shaped, 1 = straight)
[, 9] am   Transmission (0 = automatic, 1 = manual)
[,10] gear Number of forward gears
[,11] carb Number of carburettors
Note: Henderson and Velleman (1981) comment in a footnote to Table 1: ‘Hocking [original transcriber]'s noncrucial coding of the Mazda's rotary engine as a straight six-cylinder engine and the Porsche's flat engine as a V engine, as well as the inclusion of the diesel Mercedes 240D, have been retained to enable direct comparisons to be made with previous analyses.’
Source: Henderson and Velleman (1981), Building Multiple Regression Models Interactively. Biometrics, 37, 391–411.
I load the data
library(bpca)
library(scatterplot3d)
library(rgl)
mtcars
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
## Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
## Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
## Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
## Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
## Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
## AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
## Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
prcomp(mtcars)
## Standard deviations (1, .., p=11):
## [1] 136.5330479 38.1480776 3.0710166 1.3066508 0.9064862 0.6635411
## [7] 0.3085791 0.2859604 0.2506973 0.2106519 0.1984238
##
## Rotation (n x k) = (11 x 11):
## PC1 PC2 PC3 PC4 PC5
## mpg -0.038118199 0.009184847 0.982070847 0.047634784 -0.08832843
## cyl 0.012035150 -0.003372487 -0.063483942 -0.227991962 0.23872590
## disp 0.899568146 0.435372320 0.031442656 -0.005086826 -0.01073597
## hp 0.434784387 -0.899307303 0.025093049 0.035715638 0.01655194
## drat -0.002660077 -0.003900205 0.039724928 -0.057129357 -0.13332765
## wt 0.006239405 0.004861023 -0.084910258 0.127962867 -0.24354296
## qsec -0.006671270 0.025011743 -0.071670457 0.886472188 -0.21416101
## vs -0.002729474 0.002198425 0.004203328 0.177123945 -0.01688851
## am -0.001962644 -0.005793760 0.054806391 -0.135658793 -0.06270200
## gear -0.002604768 -0.011272462 0.048524372 -0.129913811 -0.27616440
## carb 0.005766010 -0.027779208 -0.102897231 -0.268931427 -0.85520810
## PC6 PC7 PC8 PC9 PC10
## mpg -0.143790084 -0.039239174 2.271040e-02 -0.002790139 0.030630361
## cyl -0.793818050 0.425011021 -1.890403e-01 0.042677206 0.131718534
## disp 0.007424138 0.000582398 -5.841464e-04 0.003532713 -0.005399132
## hp 0.001653685 -0.002212538 4.748087e-06 -0.003734085 0.001862554
## drat 0.227229260 0.034847411 -9.385817e-01 -0.014131110 0.184102094
## wt -0.127142296 -0.186558915 1.561907e-01 -0.390600261 0.829886844
## qsec -0.189564973 0.254844548 -1.028515e-01 -0.095914479 -0.204240658
## vs 0.102619063 -0.080788938 -2.132903e-03 0.684043835 0.303060724
## am 0.205217266 0.200858874 -2.273255e-02 -0.572372433 -0.162808201
## gear 0.334971103 0.801625551 2.174878e-01 0.156118559 0.203540645
## carb -0.283788381 -0.165474186 3.972219e-03 0.127583043 -0.239954748
## PC11
## mpg -0.0158569365
## cyl 0.1454453628
## disp 0.0009420262
## hp -0.0021526102
## drat -0.0973818815
## wt -0.0198581635
## qsec 0.0110677880
## vs 0.6256900918
## am 0.7331658036
## gear -0.1909325849
## carb 0.0557957968
plot(prcomp(mtcars))
summary(prcomp(mtcars))
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 136.533 38.14808 3.07102 1.30665 0.90649 0.66354 0.3086
## Proportion of Variance 0.927 0.07237 0.00047 0.00008 0.00004 0.00002 0.0000
## Cumulative Proportion 0.927 0.99937 0.99984 0.99992 0.99996 0.99998 1.0000
## PC8 PC9 PC10 PC11
## Standard deviation 0.286 0.2507 0.2107 0.1984
## Proportion of Variance 0.000 0.0000 0.0000 0.0000
## Cumulative Proportion 1.000 1.0000 1.0000 1.0000
plot(prcomp(mtcars, scale. = TRUE))
# Biplot analysis
summary(prcomp(mtcars, scale. = TRUE))
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 2.5707 1.6280 0.79196 0.51923 0.47271 0.46000 0.3678
## Proportion of Variance 0.6008 0.2409 0.05702 0.02451 0.02031 0.01924 0.0123
## Cumulative Proportion 0.6008 0.8417 0.89873 0.92324 0.94356 0.96279 0.9751
## PC8 PC9 PC10 PC11
## Standard deviation 0.35057 0.2776 0.22811 0.1485
## Proportion of Variance 0.01117 0.0070 0.00473 0.0020
## Cumulative Proportion 0.98626 0.9933 0.99800 1.0000
prcomp(mtcars, scale. = TRUE)
## Standard deviations (1, .., p=11):
## [1] 2.5706809 1.6280258 0.7919579 0.5192277 0.4727061 0.4599958 0.3677798
## [8] 0.3505730 0.2775728 0.2281128 0.1484736
##
## Rotation (n x k) = (11 x 11):
## PC1 PC2 PC3 PC4 PC5 PC6
## mpg -0.3625305 0.01612440 -0.22574419 -0.022540255 0.10284468 -0.10879743
## cyl 0.3739160 0.04374371 -0.17531118 -0.002591838 0.05848381 0.16855369
## disp 0.3681852 -0.04932413 -0.06148414 0.256607885 0.39399530 -0.33616451
## hp 0.3300569 0.24878402 0.14001476 -0.067676157 0.54004744 0.07143563
## drat -0.2941514 0.27469408 0.16118879 0.854828743 0.07732727 0.24449705
## wt 0.3461033 -0.14303825 0.34181851 0.245899314 -0.07502912 -0.46493964
## qsec -0.2004563 -0.46337482 0.40316904 0.068076532 -0.16466591 -0.33048032
## vs -0.3065113 -0.23164699 0.42881517 -0.214848616 0.59953955 0.19401702
## am -0.2349429 0.42941765 -0.20576657 -0.030462908 0.08978128 -0.57081745
## gear -0.2069162 0.46234863 0.28977993 -0.264690521 0.04832960 -0.24356284
## carb 0.2140177 0.41357106 0.52854459 -0.126789179 -0.36131875 0.18352168
## PC7 PC8 PC9 PC10 PC11
## mpg 0.367723810 -0.754091423 0.235701617 0.13928524 -0.124895628
## cyl 0.057277736 -0.230824925 0.054035270 -0.84641949 -0.140695441
## disp 0.214303077 0.001142134 0.198427848 0.04937979 0.660606481
## hp -0.001495989 -0.222358441 -0.575830072 0.24782351 -0.256492062
## drat 0.021119857 0.032193501 -0.046901228 -0.10149369 -0.039530246
## wt -0.020668302 -0.008571929 0.359498251 0.09439426 -0.567448697
## qsec 0.050010522 -0.231840021 -0.528377185 -0.27067295 0.181361780
## vs -0.265780836 0.025935128 0.358582624 -0.15903909 0.008414634
## am -0.587305101 -0.059746952 -0.047403982 -0.17778541 0.029823537
## gear 0.605097617 0.336150240 -0.001735039 -0.21382515 -0.053507085
## carb -0.174603192 -0.395629107 0.170640677 0.07225950 0.319594676
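pca <- prcomp(mtcars, scale. = TRUE)   # fit the standardized PCA once and reuse it
plot(pca$x[, 1:2])                     # scores on the first two components
plot(pca$x[, 1:2], type = "n")         # empty frame, then label each car by name
text(pca$x[, 1:2], rownames(mtcars))
biplot(pca)                            # scores and variable loadings together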
Comments
The dataset covers fuel consumption and 10 aspects of car design and performance for 32 cars (1973–74 models). The standard deviations reported by prcomp are the square roots of the eigenvalues of the covariance matrix (or of the correlation matrix when the data are standardized), and they measure the variability captured by each component. In the unscaled analysis, the first component is dominated by disp (displacement in cubic inches), with a loading of 0.900, followed by hp with 0.435. The plot of the component variances shows that the first component has by far the greatest relative importance; numerically, PC1 has a standard deviation of 136.53.
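The link between the standard deviations printed by prcomp and the eigenvalues can be checked directly. The following is a minimal sketch for the unscaled analysis; the object names pca_raw and eig are my own and do not appear in the original code.

```r
# Unscaled PCA: prcomp centers the data but does not rescale it by default.
pca_raw <- prcomp(mtcars)

# Eigen-decomposition of the sample covariance matrix.
eig <- eigen(cov(mtcars), symmetric = TRUE)

# The component standard deviations should equal the square roots
# of the covariance eigenvalues (up to numerical tolerance).
all.equal(pca_raw$sdev, sqrt(eig$values))

# Loadings of disp and hp on PC1, to compare with the Rotation table.
# The sign of a component is arbitrary, so only the magnitudes matter.
round(pca_raw$rotation[c("disp", "hp"), "PC1"], 3)
```

For the standardized analysis the same check would use cor(mtcars) in place of cov(mtcars).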
In the first table of component importance, the proportion of variance explained by PC1 is 92.7%, so it is practically the only relevant component and the one with the most influence on the final result. The second most influential component is PC2 (7.2% of the variance), which depends mainly on hp (gross horsepower), with a loading of -0.899. This pattern arises because the variables are measured in different units, and disp and hp have by far the largest variances.
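One way to see why disp and hp dominate the unscaled components is to look at the raw variance of each column. This is a small sketch, not part of the original analysis; vars and share are illustrative names.

```r
# Sample variance of each variable, largest first.
vars <- sort(sapply(mtcars, var), decreasing = TRUE)
round(vars, 2)

# Share of the total variance contributed by disp and hp together.
share <- sum(vars[c("disp", "hp")]) / sum(vars)
share
```

Because unscaled PCA works on the covariance matrix, variables with large variances in their original units pull the leading components toward themselves, which motivates standardizing the data.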
I repeat the analysis with standardized data. The first two components now explain 84.2% of the variability (60.1% and 24.1% respectively), compared with 99.9% in the unscaled analysis. This is still enough for a plot of the first two principal components to be reasonably representative. From the loadings and the biplot, PC1 is a weighted combination of the original variables whose loadings range from cyl (0.374) to mpg (-0.363). PC2 contrasts gear (0.462) with qsec (-0.463).
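The proportions of variance in the standardized analysis can be reproduced from the component standard deviations. The sketch below assumes nothing beyond base R; pca_std, eigenvalues, prop_var and pc1_loadings are names I introduce for illustration.

```r
# Standardized PCA: each variable is centered and scaled to unit variance.
pca_std <- prcomp(mtcars, scale. = TRUE)

# Squared standard deviations are the eigenvalues of the correlation matrix.
eigenvalues <- pca_std$sdev^2
all.equal(eigenvalues, eigen(cor(mtcars), symmetric = TRUE)$values)

# Proportion and cumulative proportion of variance per component.
prop_var <- eigenvalues / sum(eigenvalues)
round(cumsum(prop_var)[1:2], 4)

# PC1 loadings in increasing order. The sign of a component is arbitrary,
# so the ordering may appear reversed on another system.
pc1_loadings <- sort(pca_std$rotation[, "PC1"])
round(pc1_loadings, 3)
```

With 11 standardized variables the eigenvalues sum to 11, so each proportion is simply the eigenvalue divided by 11.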
I retain only the first two principal components, since together they account for 84.2% of the variance, which is enough to summarize the data.
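The choice of two components can also be expressed as a rule. The sketch below applies two common heuristics, neither of which is used explicitly in the analysis above: a cumulative-variance threshold (the 0.80 value is an assumption chosen for illustration) and the Kaiser criterion (keep components with eigenvalue greater than 1).

```r
pca_std <- prcomp(mtcars, scale. = TRUE)
eigenvalues <- pca_std$sdev^2
cum_var <- cumsum(eigenvalues) / sum(eigenvalues)

# Heuristic 1: smallest number of components reaching the threshold.
threshold <- 0.80  # assumed cut-off, not taken from the original text
k_threshold <- which(cum_var >= threshold)[1]

# Heuristic 2 (Kaiser): components whose eigenvalue exceeds 1,
# i.e. that carry more variance than a single standardized variable.
k_kaiser <- sum(eigenvalues > 1)

c(threshold = k_threshold, kaiser = k_kaiser)
```

Both rules select two components for this dataset, which agrees with the decision above. A different threshold could of course lead to a different choice.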