Data 609 - Module 4 - Homework
Exercise 1
For Example 19 on Page 79 in the book, carry out the regression using R.
| x | -0.98 | 1.00  | 2.02  | 3.03 | 4.00 |
|---|-------|-------|-------|------|------|
| y | 2.44  | -1.51 | -0.47 | 2.54 | 7.52 |
First we will solve for the equation by hand:
\(b = \frac{n \Sigma_{i=1}^{n}x_iy_i - (\Sigma_{i=1}^{n}x_i)(\Sigma_{i=1}^{n}y_i)}{n\Sigma_{i=1}^{n}x_i^{2}-(\Sigma_{i=1}^{n}x_i)^2}\)
\(a = \frac{1}{n}[\Sigma_{i=1}^{n}y_i - b\Sigma_{i=1}^{n}x_i]\)
# Data from the table above
x <- c(-0.98, 1.00, 2.02, 3.03, 4.00)
y <- c(2.44, -1.51, -0.47, 2.54, 7.52)
df <- data.frame(x, y)
n <- 5
Sigma_xy <- sum(df$x*df$y)
Sigma_x <- sum(df$x)
Sigma_y <- sum(df$y)
Sigma_xsquared <- sum((df$x^2))
Sigma_x_squared <- Sigma_x^2
b <- ((n*Sigma_xy) - (Sigma_x*Sigma_y))/((n*Sigma_xsquared)-(Sigma_x_squared))
a <- (1/n)*(Sigma_y - (b*Sigma_x))
b
## [1] 0.9372728
a
## [1] 0.4037871
This gives us the fitted line \(y = 0.9372728x + 0.4037871\).
Now, let’s solve using the lm function in R. This gives us the equation:
\(y = 0.9373x + 0.4038\)
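For completeness, the call that produces this output (it appears in the Call line below) is simply:
lm(y ~ ., data = df)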
##
## Call:
## lm(formula = y ~ ., data = df)
##
## Coefficients:
## (Intercept) x
## 0.4038 0.9373
Exercise 2
Implement the nonlinear curve-fitting of Example 20 on Page 83 for the following data:
| x | 0.1 | 0.50 | 1.0 | 1.5 | 2.00 | 2.50 |
|---|-----|------|-----|-----|------|------|
| y | 0.1 | 0.28 | 0.4 | 0.4 | 0.37 | 0.32 |
Here we have \(R_i = y_i - \frac{x_i}{a+bx_i^{2}}\)
# Data from the table above: row 1 holds the x values, row 2 the y values
x <- c(0.1, 0.5, 1.0, 1.5, 2.0, 2.5)
y <- c(0.1, 0.28, 0.4, 0.4, 0.37, 0.32)
data <- rbind(x, y)
R <- function(x, y, a, b){y - x/(a + b*x^2)}
a <- 1
b <- 1
residuals <- matrix(0,1,ncol(data))
for (col in 1:ncol(data)){
resid <- R(data[1,col],data[2,col],a,b)
residuals[1,col] <- resid
}
residuals
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 0.000990099 -0.12 -0.1 -0.06153846 -0.03 -0.02482759
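The Gauss-Newton step also needs the partial derivatives of the residual with respect to each parameter. Differentiating \(R_i\) gives \(\frac{\partial R_i}{\partial a} = \frac{x_i}{(a+bx_i^{2})^2}\) and \(\frac{\partial R_i}{\partial b} = \frac{x_i^{3}}{(a+bx_i^{2})^2}\), which the two functions below implement.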
partial_Ra <- function(x,a,b){x/(a+b*x^2)^2}
partial_Rb <- function(x,a,b){x^3/(a+b*x^2)^2}
jacobian <- matrix(0,ncol(data),nrow(data))
a <- 1
b <- 1
for (col in 1:ncol(data)){
jacobian[col,1] <- partial_Ra(data[1,col],a,b)
jacobian[col,2] <- partial_Rb(data[1,col],a,b)
}
jacobian
## [,1] [,2]
## [1,] 0.09802960 0.000980296
## [2,] 0.32000000 0.080000000
## [3,] 0.25000000 0.250000000
## [4,] 0.14201183 0.319526627
## [5,] 0.08000000 0.320000000
## [6,] 0.04756243 0.297265161
The first iteration using the Gauss-Newton algorithm gives us: \({a \choose b} = {1 \choose 1} - (J^{T}J)^{-1}J^{T}R = {1.3449 \choose 1.0317}\)
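The code for this first update was not included above; following the same pattern as the second and third iterations further below, it is:
a2 <- matrix(c(1,1),2,1) - solve(t(jacobian) %*% jacobian) %*% t(jacobian) %*% t(residuals)
a2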
## [,1]
## [1,] 1.344879
## [2,] 1.031709
a <- a2[1,1]
b <- a2[2,1]
residuals2 <- matrix(0,1,ncol(data))
for (col in 1:ncol(data)){
resid2 <- R(data[1,col],data[2,col],a,b)
residuals2[1,col] <- resid2
}
jacobian2 <- matrix(0,ncol(data),nrow(data))
for (col in 1:ncol(data)){
jacobian2[col,1] <- partial_Ra(data[1,col],a,b)
jacobian2[col,2] <- partial_Rb(data[1,col],a,b)
}
a3 <- a2 - solve(t(jacobian2) %*% jacobian2) %*% t(jacobian2) %*% t(residuals2)
a3
## [,1]
## [1,] 1.474156
## [2,] 1.005853
a <- a3[1,1]
b <- a3[2,1]
residuals3 <- matrix(0,1,ncol(data))
for (col in 1:ncol(data)){
resid3 <- R(data[1,col],data[2,col],a,b)
residuals3[1,col] <- resid3
}
jacobian3 <- matrix(0,ncol(data),nrow(data))
for (col in 1:ncol(data)){
jacobian3[col,1] <- partial_Ra(data[1,col],a,b)
jacobian3[col,2] <- partial_Rb(data[1,col],a,b)
}
a4 <- a3 - solve(t(jacobian3) %*% jacobian3) %*% t(jacobian3) %*% t(residuals3)
a4
## [,1]
## [1,] 1.485228
## [2,] 1.002223
It’s clear that the values are converging here, to approximately a ≈ 1.49 and b ≈ 1.00.
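As a quick cross-check (not part of the original solution), base R's nls() can fit the same model directly and should land near the same values:
fit_nls <- nls(y ~ x/(a + b*x^2),
               data = data.frame(x = data[1,], y = data[2,]),
               start = list(a = 1, b = 1))
coef(fit_nls)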
Exercise 3
For the data with binary y values, try to fit the following data
| x | 0.1 | 0.5 | 1 | 1.5 | 2 | 2.5 |
|---|-----|-----|---|-----|---|-----|
| y | 0.0 | 0.0 | 1 | 1.0 | 1 | 0.0 |
to the nonlinear function:
\(y =\frac{1}{1+e^{-(a+bx)}}\)
starting with a = 1 and b = 1.
We have \(R_i = y_i - \frac{1}{1+e^{-(a+bx_i)}}\)
The initial residuals are as follows:
# Data from the table above: row 1 holds the x values, row 2 the binary y values
x <- c(0.1, 0.5, 1.0, 1.5, 2.0, 2.5)
y <- c(0, 0, 1, 1, 1, 0)
data <- rbind(x, y)
R <- function(x, y, a, b){y - 1/(1 + exp(-(a + b*x)))}
a <- 1
b <- 1
residuals <- matrix(0,1,ncol(data))
for (col in 1:ncol(data)){
resid <- R(data[1,col],data[2,col],a,b)
residuals[1,col] <- resid
}
residuals
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] -0.7502601 -0.8175745 0.1192029 0.07585818 0.04742587 -0.9706878
Here we are trying to minimize the sum of squared residuals: \(S = \Sigma_{i=1}^{6} R_i^2 = \Sigma_{i=1}^{6}[y_i - \frac{1}{1+e^{-(a+bx_i)}}]^2\), where \(R_i = y_i - \frac{1}{1+e^{-(a+bx_i)}}\).
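For reference, the objective at the initial guess can be evaluated directly from the residuals printed above:
S <- sum(residuals^2)
S   # roughly 2.2 at a = b = 1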
\(\frac{\partial R_i}{\partial a} = -\frac{e^{-a-bx}}{(e^{-a-bx}+1)^2}\) \(\frac{\partial R_i}{\partial b} = -\frac{xe^{-xb-a}}{(e^{xb+a}+1)^2}\)
We use the initial guess of a=1, b= 1 to get:
partial_Ra <- function(x,a,b){-(exp(-a-b*x))/(exp(-a-b*x)+1)^2}
partial_Rb <- function(x,a,b){-(x*exp(-x*b-a))/((exp(x*b+a)+1)^2)}
jacobian <- matrix(0,ncol(data),nrow(data))
for (col in 1:ncol(data)){
jacobian[col,1] <- partial_Ra(data[1,col],a,b)
jacobian[col,2] <- partial_Rb(data[1,col],a,b)
}
jacobian
## [,1] [,2]
## [1,] -0.18736988 -0.0020761174
## [2,] -0.14914645 -0.0037127823
## [3,] -0.10499359 -0.0019230246
## [4,] -0.07010372 -0.0007085327
## [5,] -0.04517666 -0.0002239635
## [6,] -0.02845302 -0.0000648645
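As in Exercise 2, the code for this first Gauss-Newton update was omitted; it follows the same pattern as the (commented-out) second iteration below and yields the values printed here:
a2 <- matrix(c(1,1),2,1) - solve(t(jacobian) %*% jacobian) %*% t(jacobian) %*% t(residuals)
a2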
## [,1]
## [1,] -2.145022
## [2,] -24.444072
a <- a2[1,1]
b <- a2[2,1]
residuals2 <- matrix(0,1,ncol(data))
for (col in 1:ncol(data)){
resid2 <- R(data[1,col],data[2,col],a,b)
residuals2[1,col] <- resid2
}
jacobian2 <- matrix(0,ncol(data),nrow(data))
for (col in 1:ncol(data)){
jacobian2[col,1] <- partial_Ra(data[1,col],a,b)
jacobian2[col,2] <- partial_Rb(data[1,col],a,b)
}
#a3 <- a2 - solve(t(jacobian2) %*% jacobian2) %*% t(jacobian2) %*% t(residuals2)
#solve(t(jacobian2) %*% jacobian2)
library(pracma)
##
## Attaching package: 'pracma'
## The following object is masked from 'package:purrr':
##
## cross
## [,1] [,2]
## [1,] 0 0
## [2,] 0 0
## [3,] 0 0
## [4,] 0 0
## [5,] 0 0
## [6,] 0 0
The Jacobian matrix isn't full rank, so we can't proceed: \((J^{T}J)^{-1}\) doesn't exist here.
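For binary-response data like this, the standard alternative is logistic regression fit by iteratively reweighted least squares, which base R's glm() provides. A minimal cross-check (not part of the Gauss-Newton attempt above; the object name fit_logit is just for illustration) would be:
fit_logit <- glm(y ~ x, family = binomial,
                 data = data.frame(x = data[1,], y = data[2,]))
coef(fit_logit)   # the intercept corresponds to a, the x coefficient to b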
Exercise 4
Please set up a Python environment on your computer (the recommended installation is Anaconda), then go through the following code:
https://github.com/ageron/handson-ml2/blob/master/04_training_linear_models.ipynb
Provide a summary of what you have learned and give several screenshots to show that you have gone through this code.
This exercise walked through various ways of building regression models. It was nice to see methods like lasso regression and gradient descent being implemented in actual Python code.
Here are links to some screenshots:
* https://github.com/devinteran/Data609/blob/main/Image1.png
* https://github.com/devinteran/Data609/blob/main/Image2.png
* https://github.com/devinteran/Data609/blob/main/Image3.png
Exercise 5
Suppose you use Batch Gradient Descent and you plot the validation error at every epoch. If you notice that the validation error consistently goes up, what is likely going on? How can you fix this?
The learning rate is likely too high, so the algorithm is diverging; using a lower learning rate should fix it. (If the training error is also climbing at every epoch, that confirms the diagnosis.)
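To illustrate the mechanism with a toy example (everything below, including the batch_gd helper and the data, is made up for illustration): with batch gradient descent on a small linear-regression problem, a step size below the stability threshold makes the error shrink every epoch, while one above it makes the error climb every epoch, which is exactly the pattern described in the question.
set.seed(1)
x <- 1:10
y <- 2*x + 1 + rnorm(10, sd = 0.5)
X <- cbind(1, x)                     # design matrix with an intercept column

batch_gd <- function(eta, n_epochs = 20){
  theta <- c(0, 0)
  mse <- numeric(n_epochs)
  for (epoch in 1:n_epochs){
    grad  <- (2/length(y)) * t(X) %*% (X %*% theta - y)   # full-batch gradient
    theta <- theta - eta * as.vector(grad)
    mse[epoch] <- mean((X %*% theta - y)^2)
  }
  mse
}

batch_gd(eta = 0.01)   # small enough step: the error decreases each epoch
batch_gd(eta = 0.05)   # step too large: the error grows without bound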
Exercise 6
Why would you want to use: a. Ridge Regression instead of plain Linear Regression (i.e., without any regularization)? b. Lasso instead of Ridge Regression? c. Elastic Net instead of Lasso?
Use Ridge Regression instead of plain Linear Regression when the predictors are highly correlated (multicollinearity) or the model is overfitting: the L2 penalty shrinks the coefficients and keeps them stable. Use Lasso instead of Ridge when there are many features and you suspect only a few matter; the L1 penalty drives the least useful coefficients to exactly zero, so you end up with fewer features in your model. Elastic Net combines the Ridge and Lasso penalties and does a bit of both: use it when variables are highly correlated and you don't want Lasso's tendency to keep one feature from a correlated group and throw the rest out.
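A sketch of what this looks like in code, assuming the glmnet package (not used elsewhere in this homework) and a made-up data set with correlated predictors; the alpha argument moves between the three penalties:
library(glmnet)

set.seed(609)
X <- matrix(rnorm(100), nrow = 20, ncol = 5)
X[, 2] <- X[, 1] + rnorm(20, sd = 0.1)          # make two predictors highly correlated
y <- as.vector(X %*% c(3, 0, -2, 0, 0) + rnorm(20))

ridge   <- glmnet(X, y, alpha = 0)      # pure L2 penalty (Ridge)
lasso   <- glmnet(X, y, alpha = 1)      # pure L1 penalty (Lasso): zeroes out weak features
elastic <- glmnet(X, y, alpha = 0.5)    # Elastic Net: a mix of the two

coef(lasso, s = 0.1)                    # coefficients at penalty strength lambda = 0.1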
Resources:
- https://builtin.com/data-science/gradient-descent
- https://datascience.stackexchange.com/questions/69661/difference-between-ridge-and-linear-regression
- https://www.analyticsvidhya.com/blog/2017/06/a-comprehensive-guide-for-linear-ridge-and-lasso-regression/