Carry out the logistic regression (Example 22 on Page 94) in R using the data
x 0.1 0.5 1.0 1.5 2.0 2.5
y 0 0 1 1 1 0
The formula is \(y(x) = \frac{1}{1 + \exp[-(a + bx)]}\).
Solution:
x <- c(0.1, 0.5, 1.0, 1.5, 2.0, 2.5)
y <- c(0, 0, 1, 1, 1, 0)
# fit a logistic regression: binomial GLM with the default logit link
glm(y ~ x, family = "binomial")
##
## Call: glm(formula = y ~ x, family = "binomial")
##
## Coefficients:
## (Intercept) x
## -0.8982 0.7099
##
## Degrees of Freedom: 5 Total (i.e. Null); 4 Residual
## Null Deviance: 8.318
## Residual Deviance: 7.832 AIC: 11.83
The logistic regression fit gives \(a \approx -0.898\) (intercept) and \(b \approx 0.710\) (slope), so the estimated curve is \(y(x) = \frac{1}{1 + \exp[-(-0.898 + 0.710x)]}\). The residual deviance (7.832) is only slightly below the null deviance (8.318), which is not surprising with only six observations and a final point (x = 2.5, y = 0) that goes against the overall trend.
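If the estimated curve is needed for prediction, the fit can be stored and evaluated at new x values; a minimal sketch (the object names fit and x_new are introduced here for illustration):
fit <- glm(y ~ x, family = "binomial")
coef(fit)
# predicted probabilities from the fitted logistic curve on a grid of x values
x_new <- data.frame(x = seq(0, 3, by = 0.5))
predict(fit, newdata = x_new, type = "response")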
Use the motor car database (mtcars) from the built-in data sets in R to carry out a basic principal component analysis and explain your results.
Solution:
# PCA on all 11 mtcars variables, each standardized to unit variance
mtcars_pca <- prcomp(mtcars, scale = TRUE)
summary(mtcars_pca)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 2.5707 1.6280 0.79196 0.51923 0.47271 0.46000 0.3678
## Proportion of Variance 0.6008 0.2409 0.05702 0.02451 0.02031 0.01924 0.0123
## Cumulative Proportion 0.6008 0.8417 0.89873 0.92324 0.94356 0.96279 0.9751
## PC8 PC9 PC10 PC11
## Standard deviation 0.35057 0.2776 0.22811 0.1485
## Proportion of Variance 0.01117 0.0070 0.00473 0.0020
## Cumulative Proportion 0.98626 0.9933 0.99800 1.0000
str(mtcars_pca)
## List of 5
## $ sdev : num [1:11] 2.571 1.628 0.792 0.519 0.473 ...
## $ rotation: num [1:11, 1:11] -0.363 0.374 0.368 0.33 -0.294 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:11] "mpg" "cyl" "disp" "hp" ...
## .. ..$ : chr [1:11] "PC1" "PC2" "PC3" "PC4" ...
## $ center : Named num [1:11] 20.09 6.19 230.72 146.69 3.6 ...
## ..- attr(*, "names")= chr [1:11] "mpg" "cyl" "disp" "hp" ...
## $ scale : Named num [1:11] 6.027 1.786 123.939 68.563 0.535 ...
## ..- attr(*, "names")= chr [1:11] "mpg" "cyl" "disp" "hp" ...
## $ x : num [1:32, 1:11] -0.647 -0.619 -2.736 -0.307 1.943 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
## .. ..$ : chr [1:11] "PC1" "PC2" "PC3" "PC4" ...
## - attr(*, "class")= chr "prcomp"
The principal component analysis produces 11 components, one for each column (variable) of the mtcars data set. The first two components already account for about 84% of the total variance (60.1% + 24.1%), and the first four account for more than 92%. The prcomp object itself is a list of five elements: the component standard deviations (sdev), the loadings (rotation), the centering and scaling values used (center and scale), and the rotated data (x).
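The variance breakdown and the loadings are often easier to read graphically; a minimal sketch using the base-R plotting methods for prcomp objects:
# variance explained by each component
screeplot(mtcars_pca, type = "lines")
# observations and variable loadings projected onto the first two components
biplot(mtcars_pca)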
Generate a random 4 × 5 matrix, and find its singular value decomposition using R.
Solution:
# 4 x 5 matrix of standard normal random numbers
M <- matrix(rnorm(20), nrow = 4)
# singular value decomposition of M
svd(M)
## $d
## [1] 3.522799 2.982492 2.192692 1.626184
##
## $u
## [,1] [,2] [,3] [,4]
## [1,] 0.3059576 -0.2550316 0.91346729 0.08322449
## [2,] -0.4395004 0.8159946 0.36841600 0.07253830
## [3,] -0.5597012 -0.3110977 0.16887769 -0.74928839
## [4,] -0.6324270 -0.4151263 0.03643463 0.65297687
##
## $v
## [,1] [,2] [,3] [,4]
## [1,] 0.42797045 0.41857108 -0.4682108 0.6033593
## [2,] 0.12803945 -0.02221780 -0.7081398 -0.3969688
## [3,] -0.76595481 0.08261435 -0.1468200 0.5122367
## [4,] -0.05625177 0.89375040 0.2800240 -0.2834539
## [5,] -0.45890757 0.13671030 -0.4234939 -0.3682943
The singular value decomposition can be seen above: d holds the four singular values, u is the 4 × 4 matrix of left singular vectors, and v is the 5 × 4 matrix of right singular vectors, so that \(M = U \, \mathrm{diag}(d) \, V^{T}\).
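As a check, the three factors returned by svd() should reconstruct the original matrix; a minimal sketch (the object names s and M_rebuilt are introduced here for illustration):
s <- svd(M)
# M = U %*% diag(d) %*% t(V), up to floating-point error
M_rebuilt <- s$u %*% diag(s$d) %*% t(s$v)
all.equal(M, M_rebuilt)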
First, simulate 100 data points for y using \(y = 5x_{1} + 2x_{2} + 2x_{3} + x_{4}\), where \(x_{1}, x_{2}\) are uniformly distributed in [1, 2], while \(x_{3}, x_{4}\) are normally distributed with zero mean and unit variance. Then use principal component analysis (PCA) to analyze the data and find its principal components. Are the results expected from the formula?
Solution:
# x1, x2 uniform on [1, 2]; x3, x4 standard normal
x1 <- runif(100, min = 1, max = 2)
x2 <- runif(100, min = 1, max = 2)
x3 <- rnorm(100, mean = 0, sd = 1)
x4 <- rnorm(100, mean = 0, sd = 1)
# y is an exact linear combination of x1, ..., x4
y <- 5 * x1 + 2 * x2 + 2 * x3 + x4
data_frame <- as.data.frame(cbind(y, x1, x2, x3, x4))
# PCA on the standardized columns
data_pca <- prcomp(data_frame, scale = TRUE)
summary(data_pca)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5
## Standard deviation 1.4236 1.0953 0.9891 0.8919 6.33e-16
## Proportion of Variance 0.4053 0.2399 0.1956 0.1591 0.00e+00
## Cumulative Proportion 0.4053 0.6453 0.8409 1.0000 1.00e+00
str(data_pca)
## List of 5
## $ sdev : num [1:5] 1.42 1.10 9.89e-01 8.92e-01 6.33e-16
## $ rotation: num [1:5, 1:5] 0.7013 0.2837 0.0995 0.5932 0.2566 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:5] "y" "x1" "x2" "x3" ...
## .. ..$ : chr [1:5] "PC1" "PC2" "PC3" "PC4" ...
## $ center : Named num [1:5] 10.6616 1.4816 1.4931 0.151 -0.0345
## ..- attr(*, "names")= chr [1:5] "y" "x1" "x2" "x3" ...
## $ scale : Named num [1:5] 2.701 0.247 0.286 1.042 0.975
## ..- attr(*, "names")= chr [1:5] "y" "x1" "x2" "x3" ...
## $ x : num [1:100, 1:5] -0.296 1.869 0.283 -0.701 2.996 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : NULL
## .. ..$ : chr [1:5] "PC1" "PC2" "PC3" "PC4" ...
## - attr(*, "class")= chr "prcomp"
The PCA output shows that the fifth component has a standard deviation of essentially zero (about \(6 \times 10^{-16}\)), so the first four components explain all of the variance. This is exactly what the formula predicts: y is an exact linear combination of \(x_{1}, \ldots, x_{4}\), so the five columns are linearly dependent and span only a four-dimensional subspace.
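The linear dependence can also be inspected directly: since \(y - 5x_{1} - 2x_{2} - 2x_{3} - x_{4} = 0\) exactly, the fifth component points along that zero-variance direction of the standardized data. A minimal sketch for checking this; the exact signs and loading values depend on the simulated sample:
# loadings of the (near) zero-variance fifth component
round(data_pca$rotation[, 5], 3)
# its sample variance in the rotated data is numerically zero
var(data_pca$x[, 5])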