library(knitr)
Ex. 1 – Carry out the logistic regression (Example 22 on Page 94) in R using the data:

| x | 0.1 | 0.5 | 1.5 | 2.0 | 2.5 |
|---|-----|-----|-----|-----|-----|
| y | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 |
The formula is:
\[y(x) = \frac{1}{1 + e^{-(a+bx)}}\]
First, define the x and y vectors from the table above, then the logistic function:

x <- c(0.1, 0.5, 1.5, 2.0, 2.5)
y <- c(0, 1, 1, 1, 0)

p_i <- function(a_i = 1, b_i = 1, x_val) {
  fxn <- 1 / (1 + exp(-(a_i + b_i * x_val))) # logistic formula, defaulting to the initial guess a = b = 1
  return(fxn)
}
p_i(x_val = x)
## [1] 0.7502601 0.8175745 0.9241418 0.9525741 0.9706878
Above are the values for \(p_i\) according to binary logistic regression, evaluated at the initial guess \(a = b = 1\).
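Each observation contributes \(L_i = y_i \log p_i + (1 - y_i) \log(1 - p_i)\), so the quantity to maximize is the log-likelihood objective
\[\ell(a, b) = \sum_{i=1}^{5} \left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right]\]
which the next function computes term by term.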
log_lik <- function(x_i, y_i) {
  p_xi <- p_i(x_val = x_i)
  fxn <- y_i * log(p_xi) + (1 - y_i) * log(1 - p_xi) # per-observation term L_i
  return(fxn)
}
log_lik(x_i = x, y_i = y) # L_i
## [1] -1.38733533 -0.20141328 -0.07888973 -0.04858735 -3.52975042
sum(log_lik(x_i = x, y_i = y)) # log-likelihood objective, assuming a = b = 1
## [1] -5.245976
Above are the individual terms \(L_i\) and the log-likelihood objective at the starting values.
Using the identity that the logistic function \(S(x)\) has derivative \(S'(x) = S(x)(1 - S(x))\), the Jacobian of the residuals \(R_i = y_i - S(a + bx_i)\) with respect to \((a, b)\) can quickly be derived as:
\[\begin{bmatrix} \frac{\partial R}{\partial a} \\ \frac{\partial R}{\partial b} \end{bmatrix} = \begin{bmatrix} -S(x)(1-S(x)) \\ -x\,S(x)(1-S(x)) \end{bmatrix}\]
Below, I reuse code from the previous module to carry out the Gauss-Newton method for estimating the parameters.
a_0 <- 1
b_0 <- 1
initial_params <- matrix(c(a_0, b_0), nrow = 2) # starting guesses for a and b as a column vector
drda <- p_i(x_val = x) * (1 - p_i(x_val = x)) # note: opposite in sign to the dR/da derived above
drdb <- -x * p_i(x_val = x) * (1 - p_i(x_val = x))
ex1_j <- cbind(drda, drdb) # initial jacobian
print(ex1_j)
## drda drdb
## [1,] 0.18736988 -0.01873699
## [2,] 0.14914645 -0.07457323
## [3,] 0.07010372 -0.10515557
## [4,] 0.04517666 -0.09035332
## [5,] 0.02845302 -0.07113256
ex1_resid <- matrix(y - p_i(x_val = x)) #initial residuals
print(ex1_resid)
## [,1]
## [1,] -0.75026011
## [2,] 0.18242552
## [3,] 0.07585818
## [4,] 0.04742587
## [5,] -0.97068777
Where \(\frac{\partial R}{\partial a} = -\frac{e^{-(a+bx)}}{(1+e^{-(a+bx)})^2}\) and \(\frac{\partial R}{\partial b} = -\frac{xe^{-(a+bx)}}{(1+e^{-(a+bx)})^2}\).
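These analytic derivatives are easy to sanity-check numerically; a minimal finite-difference sketch (the step size h is an arbitrary choice):

h <- 1e-6
(p_i(a_i = 1 + h, x_val = x) - p_i(a_i = 1, x_val = x)) / h # approximates dp/da = S(1-S)
# dR/da is the negative of this quantity, since R = y - p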
initial_input <- list(initial_params, ex1_resid, ex1_j)

ex1_solve <- function(list_input = initial_input) {
  params <- list_input[[1]]
  R <- list_input[[2]]
  J <- list_input[[3]]
  newton <- params - solve(t(J) %*% J) %*% t(J) %*% R # gauss-newton step
  ex1_solve.result <- newton # t + 1 iteration of the parameter estimates
  p_xhat <- p_i(a_i = ex1_solve.result[1], b_i = ex1_solve.result[2], x_val = x) # recalculate S(x) with the new parameters
  ex1_solve.resid <- y - p_xhat # recalculate the residuals
  ex1_solve.drda <- p_xhat * (1 - p_xhat) # update dRda with the new parameters
  ex1_solve.drdb <- -x * p_xhat * (1 - p_xhat) # update dRdb with the new parameters
  ex1_solve.jacobian <- cbind(ex1_solve.drda, ex1_solve.drdb) # recalculate the jacobian from the derivatives above
  return(list(ex1_solve.result, ex1_solve.resid, ex1_solve.jacobian)) # return a list so the output can seed the next iteration
}
first_iteration <- ex1_solve()
print(first_iteration)
## [[1]]
## [,1]
## drda 3.060204
## drdb 1.022345
##
## [[2]]
## [1] -0.95939605 0.02734817 0.01001386 0.00603040 -0.99637426
##
## [[3]]
## ex1_solve.drda ex1_solve.drdb
## [1,] 0.038955271 -0.003895527
## [2,] 0.026600252 -0.013300126
## [3,] 0.009913581 -0.014870372
## [4,] 0.005994035 -0.011988069
## [5,] 0.003612591 -0.009031477
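Further iterations can be chained by feeding each result back into ex1_solve; a minimal sketch of that loop (the count of ten steps is an arbitrary choice):

result <- initial_input
for (i in 1:10) {
  result <- ex1_solve(result) # each pass returns updated parameters, residuals, and jacobian
}
print(result[[1]]) # parameter estimates after ten steps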
After trying several iterations of this, my build of the algorithm does not find adequate parameters for minimizing the sum of squared residuals. One likely culprit is the sign of drda: the code uses \(+S(1-S)\), while the derivative of the residual derived above is \(-S(1-S)\), so the first column of the Jacobian points the wrong way. Next, I'll try the glm function built into R.
ex1_glm <- glm(y ~ x, family = binomial(link = 'logit'))
print(ex1_glm)
##
## Call: glm(formula = y ~ x, family = binomial(link = "logit"))
##
## Coefficients:
## (Intercept) x
## 0.35127 0.04116
##
## Degrees of Freedom: 4 Total (i.e. Null); 3 Residual
## Null Deviance: 6.73
## Residual Deviance: 6.728 AIC: 10.73
log_lik_2 <- function(x_i, y_i) {
  p_xi <- p_i(a_i = ex1_glm$coefficients[[1]], b_i = ex1_glm$coefficients[[2]], x_val = x_i) # p_i at the glm estimates
  fxn <- y_i * log(p_xi) + (1 - y_i) * log(1 - p_xi)
  return(fxn)
}
log_lik_2(x, y)
## [1] -0.8865435 -0.5244083 -0.5078140 -0.4996694 -0.9457999
sum(log_lik_2(x_i = x, y_i = y))
## [1] -3.364235
plot(p_i(a_i = ex1_glm$coefficients[[1]], b_i = ex1_glm$coefficients[[2]], x_val = x)) # fitted probabilities at the five data points
The parameters \(a = 0.35127, b = 0.04116\) maximize the log-likelihood at approximately -3.364, a clear improvement over the -5.246 obtained at the initial guess \(a = b = 1\).
Ex. 2 – Using the motor car database (mtcars) of the built-in datasets in R, carry out the basic principal component analysis and explain your result.
head(mtcars)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
summary(prcomp(mtcars, scale = TRUE))
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 2.5707 1.6280 0.79196 0.51923 0.47271 0.46000 0.3678
## Proportion of Variance 0.6008 0.2409 0.05702 0.02451 0.02031 0.01924 0.0123
## Cumulative Proportion 0.6008 0.8417 0.89873 0.92324 0.94356 0.96279 0.9751
## PC8 PC9 PC10 PC11
## Standard deviation 0.35057 0.2776 0.22811 0.1485
## Proportion of Variance 0.01117 0.0070 0.00473 0.0020
## Cumulative Proportion 0.98626 0.9933 0.99800 1.0000
mtcars is a dataframe that contains characteristics of 32 different cars along with several indicators of vehicle performance. The prcomp function returns two major pieces of output. The first shows the contribution each principal component makes to the total variance. The 11 components are ranked by the size of their standard deviations, with the first component being the largest. The easiest way to interpret the importance of a component is the proportion of variance it explains: the first component alone explains 60% of the variance between car features, and the first five components together explain nearly 95%. This is a promising finding, as it means some of the 11 original variables could be omitted in a preliminary model of the relationship between car design and performance.
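A scree plot is a quick way to visualize this drop-off in explained variance; one possible sketch using base R:

screeplot(prcomp(mtcars, scale = TRUE), type = "lines") # variance of each component, largest first

Next, let's look at what the components are made of.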
prcomp(mtcars, scale = TRUE)
## Standard deviations (1, .., p=11):
## [1] 2.5706809 1.6280258 0.7919579 0.5192277 0.4727061 0.4599958 0.3677798
## [8] 0.3505730 0.2775728 0.2281128 0.1484736
##
## Rotation (n x k) = (11 x 11):
## PC1 PC2 PC3 PC4 PC5 PC6
## mpg -0.3625305 0.01612440 -0.22574419 -0.022540255 0.10284468 -0.10879743
## cyl 0.3739160 0.04374371 -0.17531118 -0.002591838 0.05848381 0.16855369
## disp 0.3681852 -0.04932413 -0.06148414 0.256607885 0.39399530 -0.33616451
## hp 0.3300569 0.24878402 0.14001476 -0.067676157 0.54004744 0.07143563
## drat -0.2941514 0.27469408 0.16118879 0.854828743 0.07732727 0.24449705
## wt 0.3461033 -0.14303825 0.34181851 0.245899314 -0.07502912 -0.46493964
## qsec -0.2004563 -0.46337482 0.40316904 0.068076532 -0.16466591 -0.33048032
## vs -0.3065113 -0.23164699 0.42881517 -0.214848616 0.59953955 0.19401702
## am -0.2349429 0.42941765 -0.20576657 -0.030462908 0.08978128 -0.57081745
## gear -0.2069162 0.46234863 0.28977993 -0.264690521 0.04832960 -0.24356284
## carb 0.2140177 0.41357106 0.52854459 -0.126789179 -0.36131875 0.18352168
## PC7 PC8 PC9 PC10 PC11
## mpg 0.367723810 -0.754091423 0.235701617 0.13928524 -0.124895628
## cyl 0.057277736 -0.230824925 0.054035270 -0.84641949 -0.140695441
## disp 0.214303077 0.001142134 0.198427848 0.04937979 0.660606481
## hp -0.001495989 -0.222358441 -0.575830072 0.24782351 -0.256492062
## drat 0.021119857 0.032193501 -0.046901228 -0.10149369 -0.039530246
## wt -0.020668302 -0.008571929 0.359498251 0.09439426 -0.567448697
## qsec 0.050010522 -0.231840021 -0.528377185 -0.27067295 0.181361780
## vs -0.265780836 0.025935128 0.358582624 -0.15903909 0.008414634
## am -0.587305101 -0.059746952 -0.047403982 -0.17778541 0.029823537
## gear 0.605097617 0.336150240 -0.001735039 -0.21382515 -0.053507085
## carb -0.174603192 -0.395629107 0.170640677 0.07225950 0.319594676
This next call to prcomp offers a look at what each component is composed of. The first component relates many of the features together in correlations that make sense in the real world. For example, miles per gallon (mpg) and weight (wt) have opposite signs, while the number of cylinders (cyl) and engine displacement (disp) share the same sign; these variables also carry the largest weights relative to the other features. In the second component, note that mpg and cyl contribute far less, with loadings of only 0.016 and 0.044. Instead, PC2 captures the complexity of the drivetrain: transmission type (am), gears (gear), and carburetors (carb) are the largest positive contributors, while quarter-mile time (qsec) has a large loading in the opposite direction.
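To see the cars and the variable loadings together on the first two components, a biplot works well; a quick sketch:

biplot(prcomp(mtcars, scale = TRUE)) # car scores on PC1 vs PC2, with variable loadings drawn as arrows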
Ex. 3 – Generate a random 4 x 5 matrix, and find its singular value decomposition using R.
random_matrix <- matrix(rnorm(20), nrow = 4) # 4 x 5 matrix of standard normal draws; no seed is set, so the values below vary between runs
ex3_svd <- svd(random_matrix)
The svd function in R returns the three pieces that comprise the decomposition: for a starting matrix A, it returns d, the singular values (the diagonal of the matrix D), as well as U and V, the left and right orthonormal matrices. They satisfy the following equivalence:
\[A = U D V^T\] We can confirm this by performing the matrix multiplication in R.
D <- diag(ex3_svd$d) # build the diagonal matrix D from the vector of singular values
U <- ex3_svd$u
V <- ex3_svd$v
U %*% D %*% t(V)
## [,1] [,2] [,3] [,4] [,5]
## [1,] -0.1976013 0.8916718 1.3009164 0.7757679 1.77833463
## [2,] -0.9299211 -1.0985724 0.1406240 -0.6759909 0.02783712
## [3,] -1.2169035 -0.7583808 0.2140155 -0.2338809 0.56282008
## [4,] -0.9955845 1.1041622 0.8510487 0.8643016 -0.14368113
random_matrix
## [,1] [,2] [,3] [,4] [,5]
## [1,] -0.1976013 0.8916718 1.3009164 0.7757679 1.77833463
## [2,] -0.9299211 -1.0985724 0.1406240 -0.6759909 0.02783712
## [3,] -1.2169035 -0.7583808 0.2140155 -0.2338809 0.56282008
## [4,] -0.9955845 1.1041622 0.8510487 0.8643016 -0.14368113
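Rather than comparing the two printouts by eye, all.equal confirms the reconstruction up to floating-point tolerance:

all.equal(random_matrix, U %*% D %*% t(V)) # TRUE, up to numerical tolerance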
Ex. 4 – First try to simulate 100 data points for y using \(y = 5x_1 + 2x_2 + 2x_3 + x_4\) where \(x_1, x_2\) are uniformly distributed in [1,2] while \(x_3, x_4\) are normally distributed with zero mean and unit variance. Then, use the principal component analysis (PCA) to analyze the data to find its principal components. Are the results expected from the formula?
y_matrix <- function(seed = 1234) {
  set.seed(seed)
  x_1 <- runif(100, min = 1, max = 2)
  x_2 <- runif(100, min = 1, max = 2)
  x_3 <- rnorm(100, mean = 0, sd = 1)
  x_4 <- rnorm(100, mean = 0, sd = 1)
  y_func <- cbind(5*x_1, 2*x_2, 2*x_3, x_4) # keep the four weighted terms of y as separate columns for PCA
  return(y_func)
}
ex4_y <- y_matrix(1028)
colnames(ex4_y) <- c('5x_1', '2x_2', '2x_3', 'x_4')
summary(ex4_y)
## 5x_1 2x_2 2x_3 x_4
## Min. :5.041 Min. :2.001 Min. :-5.6042 Min. :-1.7775
## 1st Qu.:6.186 1st Qu.:2.423 1st Qu.:-1.2634 1st Qu.:-0.6808
## Median :7.886 Median :2.931 Median :-0.1838 Median : 0.1044
## Mean :7.668 Mean :2.918 Mean :-0.2489 Mean : 0.1036
## 3rd Qu.:9.143 3rd Qu.:3.433 3rd Qu.: 1.0010 3rd Qu.: 0.7353
## Max. :9.983 Max. :3.997 Max. : 4.6198 Max. : 2.3147
sd(ex4_y[,1])
## [1] 1.572789
sd(ex4_y[,2])
## [1] 0.5819646
sd(ex4_y[,3])
## [1] 1.845164
sd(ex4_y[,4])
## [1] 0.9983244
To begin, let's make sure the statistics for each column make sense. The columns for \(2x_3\) and \(x_4\) are approximately centered around zero, as expected. The standard deviation for \(x_4\) is quite close to one, and since \(\mathrm{Var}(2x_3) = 2^2\,\mathrm{Var}(x_3) = 4\), the expected standard deviation of \(2x_3\) is 2. The expected mean and standard deviation of \(x_1\) and \(x_2\), uniform on [1, 2], are 1.5 and \(1/\sqrt{12} \approx 0.289\); the values for \(5x_1\) and \(2x_2\) are approximately 5 and 2 times those.
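For reference, the theoretical standard deviations of the four columns can be computed directly and are roughly in line with the sampled values above:

c(5 / sqrt(12), 2 / sqrt(12), 2, 1) # approximately 1.443, 0.577, 2.000, 1.000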
summary(prcomp(ex4_y, scale = TRUE))
## Importance of components:
## PC1 PC2 PC3 PC4
## Standard deviation 1.133 0.9821 0.9490 0.9226
## Proportion of Variance 0.321 0.2411 0.2251 0.2128
## Cumulative Proportion 0.321 0.5621 0.7872 1.0000
After scaling, PCA indicates that there are in fact four features of approximately equal importance. The proportion of variance for each is between 0.2128 and 0.321, meaning each scaled variable contributes roughly a quarter of the total variance. This makes sense: scaling removes the effect of the coefficients on the column variances, and the four underlying variables are independent.
prcomp(ex4_y)
## Standard deviations (1, .., p=4):
## [1] 1.8745813 1.5484787 0.9861738 0.5738024
##
## Rotation (n x k) = (4 x 4):
## PC1 PC2 PC3 PC4
## 5x_1 -0.26998634 -0.96119570 0.04228371 0.03771314
## 2x_2 -0.04843510 -0.02675898 -0.02734377 -0.99809334
## 2x_3 -0.95662569 0.27416103 0.09146524 0.03656671
## x_4 0.09812595 0.01491655 0.99453432 -0.03240800
Looking at the unscaled rotation matrix, PC1 is made up primarily of \(2x_3\), which has the highest calculated standard deviation. The other three terms follow in exactly the expected order as well: without scaling, PCA ranks components by raw variance, so \(5x_1\) (sd \(\approx 1.44\)), \(x_4\) (sd \(= 1\)), and \(2x_2\) (sd \(\approx 0.58\)) dominate PC2, PC3, and PC4 respectively. These results are just what the formula predicts.