Carry out the logistic regression (Example 22 on Page 94) in R using the data
x 0.1 0.5 1.0 1.5 2.0 2.5
y 0 0 1 1 1 0
The formula is \(y(x) = \frac{1}{1 + \exp[-(a + bx)]}\).
Solution:
x <- c(0.1, 0.5, 1.0, 1.5, 2.0, 2.5)
y <- c(0, 0, 1, 1, 1, 0)
# fit a logistic regression: binomial GLM with the default logit link
glm(y ~ x, family = "binomial")
##
## Call: glm(formula = y ~ x, family = "binomial")
##
## Coefficients:
## (Intercept) x
## -0.8982 0.7099
##
## Degrees of Freedom: 5 Total (i.e. Null); 4 Residual
## Null Deviance: 8.318
## Residual Deviance: 7.832 AIC: 11.83
The logistic regression fit gives \(a \approx -0.898\) (intercept) and \(b \approx 0.710\) (slope), so the estimated curve is \(y(x) = \frac{1}{1 + \exp[-(-0.898 + 0.710x)]}\). The residual deviance (7.832) is only slightly below the null deviance (8.318), which is not surprising with only six observations and a final point (x = 2.5, y = 0) that goes against the overall trend.
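If the estimated curve is needed for prediction, the fit can be stored and evaluated at new x values; a minimal sketch (the object names fit and x_new are introduced here for illustration):
fit <- glm(y ~ x, family = "binomial")
coef(fit)
# predicted probabilities from the fitted logistic curve on a grid of x values
x_new <- data.frame(x = seq(0, 3, by = 0.5))
predict(fit, newdata = x_new, type = "response")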
Use the motor car database (mtcars) from the built-in data sets in R to carry out a basic principal component analysis and explain your results.
Solution:
# PCA on all 11 mtcars variables, each standardized to unit variance
mtcars_pca <- prcomp(mtcars, scale = TRUE)
summary(mtcars_pca)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 2.5707 1.6280 0.79196 0.51923 0.47271 0.46000 0.3678
## Proportion of Variance 0.6008 0.2409 0.05702 0.02451 0.02031 0.01924 0.0123
## Cumulative Proportion 0.6008 0.8417 0.89873 0.92324 0.94356 0.96279 0.9751
## PC8 PC9 PC10 PC11
## Standard deviation 0.35057 0.2776 0.22811 0.1485
## Proportion of Variance 0.01117 0.0070 0.00473 0.0020
## Cumulative Proportion 0.98626 0.9933 0.99800 1.0000
str(mtcars_pca)
## List of 5
## $ sdev : num [1:11] 2.571 1.628 0.792 0.519 0.473 ...
## $ rotation: num [1:11, 1:11] -0.363 0.374 0.368 0.33 -0.294 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:11] "mpg" "cyl" "disp" "hp" ...
## .. ..$ : chr [1:11] "PC1" "PC2" "PC3" "PC4" ...
## $ center : Named num [1:11] 20.09 6.19 230.72 146.69 3.6 ...
## ..- attr(*, "names")= chr [1:11] "mpg" "cyl" "disp" "hp" ...
## $ scale : Named num [1:11] 6.027 1.786 123.939 68.563 0.535 ...
## ..- attr(*, "names")= chr [1:11] "mpg" "cyl" "disp" "hp" ...
## $ x : num [1:32, 1:11] -0.647 -0.619 -2.736 -0.307 1.943 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
## .. ..$ : chr [1:11] "PC1" "PC2" "PC3" "PC4" ...
## - attr(*, "class")= chr "prcomp"
The principal component analysis produces 11 components, one for each column (variable) of the mtcars data set. The first two components already account for about 84% of the total variance (60.1% + 24.1%), and the first four account for more than 92%. The prcomp object itself is a list of five elements: the component standard deviations (sdev), the loadings (rotation), the centering and scaling values used (center and scale), and the rotated data (x).
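The variance breakdown and the loadings are often easier to read graphically; a minimal sketch using the base-R plotting methods for prcomp objects:
# variance explained by each component
screeplot(mtcars_pca, type = "lines")
# observations and variable loadings projected onto the first two components
biplot(mtcars_pca)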
Generate a random 4 × 5 matrix, and find its singular value decomposition using R.
Solution:
# 4 x 5 matrix of standard normal random numbers
M <- matrix(rnorm(20), nrow = 4)
# singular value decomposition of M
svd(M)
## $d
## [1] 3.522799 2.982492 2.192692 1.626184
##
## $u
## [,1] [,2] [,3] [,4]
## [1,] 0.3059576 -0.2550316 0.91346729 0.08322449
## [2,] -0.4395004 0.8159946 0.36841600 0.07253830
## [3,] -0.5597012 -0.3110977 0.16887769 -0.74928839
## [4,] -0.6324270 -0.4151263 0.03643463 0.65297687
##
## $v
## [,1] [,2] [,3] [,4]
## [1,] 0.42797045 0.41857108 -0.4682108 0.6033593
## [2,] 0.12803945 -0.02221780 -0.7081398 -0.3969688
## [3,] -0.76595481 0.08261435 -0.1468200 0.5122367
## [4,] -0.05625177 0.89375040 0.2800240 -0.2834539
## [5,] -0.45890757 0.13671030 -0.4234939 -0.3682943
The singular value decomposition can be seen above: d holds the four singular values, u is the 4 × 4 matrix of left singular vectors, and v is the 5 × 4 matrix of right singular vectors, so that \(M = U \, \mathrm{diag}(d) \, V^{T}\).
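As a check, the three factors returned by svd() should reconstruct the original matrix; a minimal sketch (the object names s and M_rebuilt are introduced here for illustration):
s <- svd(M)
# M = U %*% diag(d) %*% t(V), up to floating-point error
M_rebuilt <- s$u %*% diag(s$d) %*% t(s$v)
all.equal(M, M_rebuilt)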
First, simulate 100 data points for y using \(y = 5x_{1} + 2x_{2} + 2x_{3} + x_{4}\), where \(x_{1}, x_{2}\) are uniformly distributed in [1, 2], while \(x_{3}, x_{4}\) are normally distributed with zero mean and unit variance. Then use principal component analysis (PCA) to analyze the data and find its principal components. Are the results expected from the formula?
Solution:
# x1, x2 uniform on [1, 2]; x3, x4 standard normal
x1 <- runif(100, min = 1, max = 2)
x2 <- runif(100, min = 1, max = 2)
x3 <- rnorm(100, mean = 0, sd = 1)
x4 <- rnorm(100, mean = 0, sd = 1)
# y is an exact linear combination of x1, ..., x4
y <- 5 * x1 + 2 * x2 + 2 * x3 + x4
data_frame <- as.data.frame(cbind(y, x1, x2, x3, x4))
# PCA on the standardized columns
data_pca <- prcomp(data_frame, scale = TRUE)
summary(data_pca)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5
## Standard deviation 1.4236 1.0953 0.9891 0.8919 6.33e-16
## Proportion of Variance 0.4053 0.2399 0.1956 0.1591 0.00e+00
## Cumulative Proportion 0.4053 0.6453 0.8409 1.0000 1.00e+00
str(data_pca)
## List of 5
## $ sdev : num [1:5] 1.42 1.10 9.89e-01 8.92e-01 6.33e-16
## $ rotation: num [1:5, 1:5] 0.7013 0.2837 0.0995 0.5932 0.2566 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:5] "y" "x1" "x2" "x3" ...
## .. ..$ : chr [1:5] "PC1" "PC2" "PC3" "PC4" ...
## $ center : Named num [1:5] 10.6616 1.4816 1.4931 0.151 -0.0345
## ..- attr(*, "names")= chr [1:5] "y" "x1" "x2" "x3" ...
## $ scale : Named num [1:5] 2.701 0.247 0.286 1.042 0.975
## ..- attr(*, "names")= chr [1:5] "y" "x1" "x2" "x3" ...
## $ x : num [1:100, 1:5] -0.296 1.869 0.283 -0.701 2.996 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : NULL
## .. ..$ : chr [1:5] "PC1" "PC2" "PC3" "PC4" ...
## - attr(*, "class")= chr "prcomp"
The PCA output shows that the fifth component has a standard deviation of essentially zero (about \(6 \times 10^{-16}\)), so the first four components explain all of the variance. This is exactly what the formula predicts: y is an exact linear combination of \(x_{1}, \ldots, x_{4}\), so the five columns are linearly dependent and span only a four-dimensional subspace.
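The linear dependence can also be inspected directly: since \(y - 5x_{1} - 2x_{2} - 2x_{3} - x_{4} = 0\) exactly, the fifth component points along that zero-variance direction of the standardized data. A minimal sketch for checking this; the exact signs and loading values depend on the simulated sample:
# loadings of the (near) zero-variance fifth component
round(data_pca$rotation[, 5], 3)
# its sample variance in the rotated data is numerically zero
var(data_pca$x[, 5])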