# Non-linear Regression

Alban Guillaumet, College of William and Mary

### Objectives

• Introduction to non-linear regression, using Asymptotic curve fitting

• Introduction to Generalized linear models, using Logistic regression

### Nonlinear regression

An infinite number of possible nonlinear functions to fit!

Book covers:

• Asymptotic curve fitting
• Formula-free curve fitting

Today we will just discuss asymptotic curve fitting.

### Asymptotic curve fitting

Consider the relationship between population growth rate of a phytoplankton in culture and the concentration of iron in the medium.

• Left panel is overfitting
• Left panel fit will perform poorly for prediction
• Left panel fit has poor biological motivation

### Asymptotic curve fitting

The right panel shows a fit with the Michaelis-Menten curve

$Y = \frac{aX}{b+X},$
with $$a$$ and $$b$$ parameters.

### Asymptotic curve fitting

phytoplankton <- read.csv("D:/Alban/WM/TEACHING/Biostats/Alban/My class/Lectures/W&S data sets/chapter17/chap17f8_1IronAndPhytoplanktonGrowth.csv")

  ironConcentration phytoGrowthRate
1        0.01096478            0.00
2        0.02398833            0.50
3        0.02818383            0.35
4        0.07244360            1.04
5        0.07585776            0.97
6        0.11481536            1.19


### Asymptotic curve fitting

One can perform a nonlinear regression using nonlinear least squares. This can be accomplished in R as follows with the package nls.

phytoCurve <- nls(phytoGrowthRate ~ a*ironConcentration / ( b+ironConcentration),
data = phytoplankton,
list(a = 2, b = 1))


The last argument is a list containing initial values for the parameters (initial guesses at the best fit).

### Asymptotic curve fitting

To get information on the fit, we can use the summary command as before in the linear case.

summary(phytoCurve)


Formula: phytoGrowthRate ~ a * ironConcentration/(b + ironConcentration)

Parameters:
Estimate Std. Error t value Pr(>|t|)
a  1.94164    0.06802  28.545 1.14e-11 ***
b  0.07833    0.01120   6.994 2.29e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1133 on 11 degrees of freedom

Number of iterations to convergence: 8
Achieved convergence tolerance: 4.875e-06


### Logistic regression: an example

Research topic: foraging ecology of the Corsican nuthatch

Hypothesis: The time spent feeding on pine cones depends on external temperature

### Logistic regression: an example

• Observational study

• Foraging nuthatches were observed at the Ascu study site (1,420 m a.s.l.) during 34 days between 16 January and 28 February 2001, for about 120 h in total.

• Recording was at 15-second intervals. Every 15s, the substrate was identified as in one of the two categories: cone, or other (lichen, needle, twig, small branch, branch, or trunk).

• The temperatures (daily maxima) was recorded under shelter.

### Logistic regression: data

head(nuthatch)

  day tmax cone other
1   1  4.5    0     9
2   2  8.0    0     8
3   3  7.0    9    65
4   4  5.0    0   111
5   5  7.0    0   133
6   6  5.0    0    80


### Logistic regression: data

tail(nuthatch)

   day tmax cone other
29  29 16.5   78     0
30  30  7.0   30    52
31  31  6.5   75    18
32  32 -3.5    0     9
33  33  4.0    0   130
34  34 -4.0    0    99


### Logistic regression: data viz

n = cone + other
prop_cone = cone / n
nuthatch = cbind(nuthatch, prop_cone, n)

  day tmax cone other prop_cone   n
1   1  4.5    0     9 0.0000000   9
2   2  8.0    0     8 0.0000000   8
3   3  7.0    9    65 0.1216216  74
4   4  5.0    0   111 0.0000000 111
5   5  7.0    0   133 0.0000000 133
6   6  5.0    0    80 0.0000000  80


### What about a straight line?

• Simple linear regression?

### What about a straight line?

• Simple linear regression? Let's make a residual plot

### What about a straight line?

• Simple linear regression?

### Logistic regression (GLM)

nut.glm = glm(prop_cone ~ tmax, family = binomial, weights = n, nuthatch)
# nut.glm = glm(cbind(cone, other) ~ tmax, family = binomial, nuthatch)

• Statistically appropriate

• Better representation of reality

• Hypothesis: cone open when temp > threshold
• Inference: e.g., temperature at which nuthatches switch their foraging behavior

### Logistic regression (GLM)

• Switching behavior temperature?

### Example: Logistic regression

To calculate $$P$$-values, we can use an analysis of deviance table.

anova(nut.glm, test = "Chi")


This will test the null hypothesis that the temperature has no effect on the proba to forage on cones (i.e. $$\beta_{0} = 0$$). See the book p. 570 for details.

### Example: Logistic regression

Analysis of Deviance Table

Response: prop_cone

Terms added sequentially (first to last)

Df Deviance Resid. Df Resid. Dev  Pr(>Chi)
NULL                    33     3134.8
tmax  1   2290.6        32      844.2 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


### Logistic regression: 'theory'

Definition: Logistic regression is nonlinear regression developed for a categorical response variable with two levels.

Caption: Mortality of guppies in relation to duration of exposure to a temperature of 5$$^{\circ}$$ C.

### Logistic regression

Properties:

• Outcomes at every $$X$$ have a binomial distribution (not a normal distribution).
• Probability of an event is given by corresponding predicted value on the regression curve.

### Logistic regression

The following curve is fit in logistic regression:

$\textrm{log-odds}(Y) = \alpha + \beta X,$

where

$\textrm{log-odds}(Y) = \ln\left[\frac{Y}{1-Y}\right]$

and $$\alpha, \beta$$ are population parameters.

Note: $$\textrm{Log-odds}(Y)$$ is called the logit link function.

### Logistic regression

The estimates $$a$$ and $$b$$ for $$\alpha$$ and $$\beta$$, respectively, describe the linear relationship between $$X$$ and the predicted log odds of $$Y$$, i.e.

$\textrm{log-odds}(\hat{Y}) = a + b X.$

To obtain the original predicted values, we need to solve for $$\hat{Y}$$, obtaining

$\hat{Y} = \frac{e^{a+bX}}{1+e^{a+bX}} = \frac{1}{1 + e^{-(a + bX)}}.$

This is a sigmoidal (S-shaped) function. When $$b > 0$$, function is increasing. When $$b < 0$$, function is decreasing. When $$b = 0$$, function is constant.

### Summary

• The Logistic regression allows estimating how the probability of an event varies as a function of one (or several) explanatory factors; it is a special case of an important class of models called Generalized Linear Models. It can be fitted using the glm function in R.

• Many types of other non-linear regressions can be fitted to data using the nls function in R.

### Homework

• For question e, try and use the dose.p function in the MASS package.