Ex. 1

For Example 19 on Page 79 in the book (Introduction to Algorithms for Data Mining and Machine Learning, by Xin-She Yang), carry out the regression using R.

x   -0.98    1.00    2.02   3.03   4.00
y    2.44   -1.51   -0.47   2.54   7.52
I always plot the data first to see whether it gives any insight into what type of model might be needed. In this case the relationship is clearly quadratic rather than linear, which also agrees with the textbook.
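A minimal sketch for entering the data as R vectors (the names x and y are what the later code expects) and producing the scatter plot:

# Enter the data from the table above
x <- c(-0.98, 1.00, 2.02, 3.03, 4.00)
y <- c(2.44, -1.51, -0.47, 2.54, 7.52)
# Scatter plot of the raw data
plot(x, y, pch = 16)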

Then I created an x-squared variable and fit the model with the base R lm() function, which gives the following output:
x2 <- x^2
mod1 <- lm(y~(x + x2))
mod1
## 
## Call:
## lm(formula = y ~ (x + x2))
## 
## Coefficients:
## (Intercept)            x           x2  
##      -0.506       -2.026        1.007
summary(mod1)
## 
## Call:
## lm(formula = y ~ (x + x2))
## 
## Residuals:
##        1        2        3        4        5 
## -0.00677  0.01517  0.02141 -0.05586  0.02605 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.50552    0.03178   -15.9  0.00393 ** 
## x           -2.02616    0.02648   -76.5  0.00017 ***
## x2           1.00651    0.00795   126.6 0.000062 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0476 on 2 degrees of freedom
## Multiple R-squared:     1,   Adjusted R-squared:     1 
## F-statistic: 1.09e+04 on 2 and 2 DF,  p-value: 0.0000919
So the R code gives us the equation:

\[ y = -0.5055 - 2.0262 x +1.0065 x^2 \]

This matches the textbook solution exactly.
Finally, I plotted the data along with the model predictions to see how well the model fits the data:
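One way to overlay the fitted quadratic on the data is to predict over a fine grid (a sketch; the grid variable xg is introduced here only for illustration):

# Fine grid of x values for a smooth fitted curve
xg <- seq(min(x), max(x), length.out = 100)
plot(x, y, pch = 16)
lines(xg, predict(mod1, newdata = data.frame(x = xg, x2 = xg^2)))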

Plot the residuals to make sure there are no patterns:

plot(x, mod1$residuals, pch=16)

Ex. 2

Implement the nonlinear curvefitting of Example 20 on Page 83 for the following data:

x   0.10   0.50   1.00   1.50   2.00   2.50
y   0.10   0.28   0.40   0.40   0.37   0.32
Following the same procedure as in Example 1, first plot the data.
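As in Example 1, a minimal sketch for entering the data and plotting it:

# Enter the data from the table above and plot it
x <- c(0.10, 0.50, 1.00, 1.50, 2.00, 2.50)
y <- c(0.10, 0.28, 0.40, 0.40, 0.37, 0.32)
plot(x, y, pch = 16)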

Then fit the data to the model \(y = \frac{x}{a+bx^2}\) using the nls() function in the stats package.
f <- function(x, a, b) x/(a + b*x^2)
mod2 <- nls(y ~ f(x, a, b), data = data.frame(x,y), 
               start = list(a = 1, b = 1), trace = T)
## 0.0297 :  1 1
## 0.002244 :  1.345 1.032
## 0.001215 :  1.474 1.006
## 0.00121 :  1.485 1.002
## 0.00121 :  1.485 1.002
## 0.00121 :  1.485 1.002
mod2
## Nonlinear regression model
##   model: y ~ f(x, a, b)
##    data: data.frame(x, y)
##    a    b 
## 1.49 1.00 
##  residual sum-of-squares: 0.00121
## 
## Number of iterations to convergence: 5 
## Achieved convergence tolerance: 0.00000039
s <- summary(mod2, correlation = TRUE)
s$coefficients
##   Estimate Std. Error t value   Pr(>|t|)
## a    1.485    0.08777   16.92 0.00007146
## b    1.002    0.05019   19.96 0.00003714
s$correlation
##         a       b
## a  1.0000 -0.6813
## b -0.6813  1.0000
So the R code gives us the equation:

\[ y = \frac{x}{1.4854+1.0021x^2} \]

This matches the textbook solution exactly.
Plot the data along with the model predictions to see how well the model fits the data:
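A sketch for overlaying the fitted curve on the data, again predicting over a fine grid (xg is only for illustration):

# Smooth fitted curve over the range of the data
xg <- seq(min(x), max(x), length.out = 100)
plot(x, y, pch = 16)
lines(xg, predict(mod2, newdata = data.frame(x = xg)))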

Plot the residuals to make sure there are no patterns:

plot(x, s$residuals, pch=16)

Ex. 3

For data with binary \(y\) values, try to fit the following data to the nonlinear function \[y = \frac{1}{1+e^{a+bx}}\] starting with \(a=1\) and \(b=1\).

x   0.1   0.5   1.0   1.5   2.0   2.5
y   0     0     1     1     1     0
Plot the data.
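A minimal sketch for entering the binary data and plotting it:

# Enter the data from the table above and plot it
x <- c(0.1, 0.5, 1.0, 1.5, 2.0, 2.5)
y <- c(0, 0, 1, 1, 1, 0)
plot(x, y, pch = 16)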

Using the nls() function in the stats package, as in Exercise 2

Fit the data to the model \(y = \frac{1}{1+e^{a+bx}}\) using the nls() function in the stats package.
f3 <- function(x, a, b) 1/(1 + exp(a + b*x))
mod3 <- nls(y ~ f3(x, a, b), data = data.frame(x,y), 
               start = list(a = 1, b = 1), trace = T)
## 2.634 :  1 1
## 1.257 :   4.001 -8.045
## 1.033 :   5.807 -7.680
## 1.004 :    9.202 -12.219
## 1.001 :   12.34 -16.41
## 1 :   15.39 -20.47
## 1 :   18.41 -24.50
## 1 :   21.41 -28.51
## 1 :   24.42 -32.51
## 1 :   27.42 -36.51
## 1 :   30.42 -40.51
## 1 :   33.42 -44.51
## 1 :   36.42 -48.51
mod3
## Nonlinear regression model
##   model: y ~ f3(x, a, b)
##    data: data.frame(x, y)
##     a     b 
##  36.4 -48.5 
##  residual sum-of-squares: 1
## 
## Number of iterations to convergence: 12 
## Achieved convergence tolerance: 0.00000766
s <- summary(mod3, correlation = TRUE)
s$coefficients
##   Estimate Std. Error    t value Pr(>|t|)
## a    36.42     211276  0.0001724   0.9999
## b   -48.51     261861 -0.0001853   0.9999
s$correlation
##        a      b
## a  1.000 -0.951
## b -0.951  1.000
Plot the data along with the model predictions to see how well the model fits the data:
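A sketch for overlaying the nls() fit on the binary data (xg is only for illustration; with these coefficient estimates the fitted curve is essentially a step function):

# Smooth curve of the nls() predictions
xg <- seq(min(x), max(x), length.out = 200)
plot(x, y, pch = 16)
lines(xg, predict(mod3, newdata = data.frame(x = xg)))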

Using the glm() function for logistic regression

mod3.2 <- glm(y~x, family = binomial(link = "logit"))
mod3.2
## 
## Call:  glm(formula = y ~ x, family = binomial(link = "logit"))
## 
## Coefficients:
## (Intercept)            x  
##      -0.898        0.710  
## 
## Degrees of Freedom: 5 Total (i.e. Null);  4 Residual
## Null Deviance:       8.32 
## Residual Deviance: 7.83  AIC: 11.8
summary(mod3.2)
## 
## Call:
## glm(formula = y ~ x, family = binomial(link = "logit"))
## 
## Deviance Residuals: 
##      1       2       3       4       5       6  
## -0.852  -0.957   1.258   1.107   0.965  -1.565  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)   -0.898      1.581   -0.57     0.57
## x              0.710      1.056    0.67     0.50
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 8.3178  on 5  degrees of freedom
## Residual deviance: 7.8325  on 4  degrees of freedom
## AIC: 11.83
## 
## Number of Fisher Scoring iterations: 4
Plot the data along with the model predictions to see how well the model fits the data:
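A sketch for overlaying the glm() fit on the binary data; type = "response" converts the linear predictor back to probabilities:

# Smooth curve of the predicted probabilities from the logistic regression
xg <- seq(min(x), max(x), length.out = 200)
plot(x, y, pch = 16)
lines(xg, predict(mod3.2, newdata = data.frame(x = xg), type = "response"))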

It looks like the first solution, using the nls() function and the Gauss-Newton method, resulted in better predictions on this data.