Alban Guillaumet, College of William and Mary

Applied Biostatistics using R (BIOL 680), Spring 2018

Introduction to non-linear regression, using Asymptotic curve fitting

Introduction to Generalized linear models, using Logistic regression

**An infinite number of possible nonlinear functions to fit!**

Book covers:

- Asymptotic curve fitting
- Quadratic curve fitting
- Formula-free curve fitting

Today we will just discuss asymptotic curve fitting.

Consider the relationship between population growth rate of a phytoplankton in culture and the concentration of iron in the medium.

- Left panel is
**overfitting** - Left panel fit will perform poorly for prediction
- Left panel fit has poor biological motivation

The right panel shows a fit with the *Michaelis-Menten* curve

\[ Y = \frac{aX}{b+X}, \]

with \( a \) and \( b \) parameters.

```
phytoplankton <- read.csv("D:/Alban/WM/TEACHING/Biostats/Alban/My class/Lectures/W&S data sets/chapter17/chap17f8_1IronAndPhytoplanktonGrowth.csv")
head(phytoplankton)
```

```
ironConcentration phytoGrowthRate
1 0.01096478 0.00
2 0.02398833 0.50
3 0.02818383 0.35
4 0.07244360 1.04
5 0.07585776 0.97
6 0.11481536 1.19
```

One can perform a nonlinear regression using **nonlinear least squares**. This can be accomplished in R as follows with the package `nls`

.

```
phytoCurve <- nls(phytoGrowthRate ~ a*ironConcentration / ( b+ironConcentration),
data = phytoplankton,
list(a = 2, b = 1))
```

The last argument is a list containing initial values for the parameters (*initial guesses* at the best fit).

To get information on the fit, we can use the `summary`

command as before in the linear case.

```
summary(phytoCurve)
```

```
Formula: phytoGrowthRate ~ a * ironConcentration/(b + ironConcentration)
Parameters:
Estimate Std. Error t value Pr(>|t|)
a 1.94164 0.06802 28.545 1.14e-11 ***
b 0.07833 0.01120 6.994 2.29e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1133 on 11 degrees of freedom
Number of iterations to convergence: 8
Achieved convergence tolerance: 4.875e-06
```

**Research topic**: foraging ecology of the Corsican nuthatch

**Hypothesis**: The time spent feeding on pine cones depends on external temperature

Observational study

Foraging nuthatches were observed at the Ascu study site (1,420 m a.s.l.) during 34 days between 16 January and 28 February 2001, for about 120 h in total.

Recording was at 15-second intervals. Every 15s, the substrate was identified as in one of the two categories: cone, or other (lichen, needle, twig, small branch, branch, or trunk).

The temperatures (daily maxima) was recorded under shelter.

```
head(nuthatch)
```

```
day tmax cone other
1 1 4.5 0 9
2 2 8.0 0 8
3 3 7.0 9 65
4 4 5.0 0 111
5 5 7.0 0 133
6 6 5.0 0 80
```

```
tail(nuthatch)
```

```
day tmax cone other
29 29 16.5 78 0
30 30 7.0 30 52
31 31 6.5 75 18
32 32 -3.5 0 9
33 33 4.0 0 130
34 34 -4.0 0 99
```

```
n = cone + other
prop_cone = cone / n
nuthatch = cbind(nuthatch, prop_cone, n)
head(nuthatch)
```

```
day tmax cone other prop_cone n
1 1 4.5 0 9 0.0000000 9
2 2 8.0 0 8 0.0000000 8
3 3 7.0 9 65 0.1216216 74
4 4 5.0 0 111 0.0000000 111
5 5 7.0 0 133 0.0000000 133
6 6 5.0 0 80 0.0000000 80
```

- Simple linear regression?

- Simple linear regression? Let's make a residual plot

- Simple linear regression?

```
nut.glm = glm(prop_cone ~ tmax, family = binomial, weights = n, nuthatch)
# nut.glm = glm(cbind(cone, other) ~ tmax, family = binomial, nuthatch)
```

Statistically appropriate

Better representation of reality

- Hypothesis: cone open when temp > threshold
- Inference: e.g., temperature at which nuthatches switch their foraging behavior

- Switching behavior temperature?

To calculate \( P \)-values, we can use an **analysis of deviance** table.

```
anova(nut.glm, test = "Chi")
```

This will test the null hypothesis that the temperature has no effect on the proba to forage on cones (i.e. \( \beta_{0} = 0 \)). See the book p. 570 for details.

```
Analysis of Deviance Table
Model: binomial, link: logit
Response: prop_cone
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 33 3134.8
tmax 1 2290.6 32 844.2 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

Definition:Logistic regression is nonlinear regression developed for a categorical response variable with two levels.

**Caption**: Mortality of guppies in relation to duration of exposure to a temperature of 5\( ^{\circ} \) C.

**Properties**:

- Outcomes at every \( X \) have a
**binomial distribution**(not a normal distribution). - Probability of an event is given by corresponding predicted value on the regression curve.

The following curve is fit in logistic regression:

\[ \textrm{log-odds}(Y) = \alpha + \beta X, \]

where

\[ \textrm{log-odds}(Y) = \ln\left[\frac{Y}{1-Y}\right] \]

and \( \alpha, \beta \) are population parameters.

**Note**: \( \textrm{Log-odds}(Y) \) is called the **logit link function**.

The estimates \( a \) and \( b \) for \( \alpha \) and \( \beta \), respectively, describe the linear relationship between \( X \) and the predicted log odds of \( Y \), i.e.

\[ \textrm{log-odds}(\hat{Y}) = a + b X. \]

To obtain the original predicted values, we need to solve for \( \hat{Y} \), obtaining

\[ \hat{Y} = \frac{e^{a+bX}}{1+e^{a+bX}} = \frac{1}{1 + e^{-(a + bX)}}. \]

This is a sigmoidal (S-shaped) function. When \( b > 0 \), function is increasing. When \( b < 0 \), function is decreasing. When \( b = 0 \), function is constant.

The Logistic regression allows estimating how the probability of an event varies as a function of one (or several) explanatory factors; it is a special case of an important class of models called Generalized Linear Models. It can be fitted using the

`glm`

function in R.Many types of other non-linear regressions can be fitted to data using the

`nls`

function in R.

Reading: W&S Ch. 17.8-10

HW: W&S Ch. 17.36

- For question d, run the analysis yourself
- For question e, try and use the
`dose.p`

function in the`MASS`

package.

Research project…Let's see what we were able to do today (instructions later)