Incubation temperature can affect the sex of turtles. An experiment was conducted with three independent replicates for each temperature and the number of male and female turtles born was recorded and can be found in the turtle dataset.
library(faraway)
## Warning: package 'faraway' was built under R version 3.5.3
library(ggplot2)
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.5.3
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
data(turtle, package = "faraway")
head(turtle)
attach(turtle, warn.conflicts = FALSE)
Plot the proportion of males against the temperature. Comment on the nature of the relationship.
The proportion is obviously more male as temp rises
turtle$y <- ifelse(female==0,1,(male)/(male+female))
attach(turtle, warn.conflicts = FALSE)
plot(x=temp, y)
Fit a binomial response model with a linear term in temperature. Does this model fit the data?
The fits pretty well, but definitely at 27.7 under fits some, and at 27.2 overfits some
tMod <- glm(cbind(male, female) ~temp, family="binomial")
summary(tMod)
##
## Call:
## glm(formula = cbind(male, female) ~ temp, family = "binomial")
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.0721 -1.0292 -0.2714 0.8087 2.5550
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -61.3183 12.0224 -5.100 3.39e-07 ***
## temp 2.2110 0.4309 5.132 2.87e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 74.508 on 14 degrees of freedom
## Residual deviance: 24.942 on 13 degrees of freedom
## AIC: 53.836
##
## Number of Fisher Scoring iterations: 5
x <- seq(27,30.2,0.1)
plot(x=temp, y)
lines(x, ilogit(-61.3183+(2.2110*x)))
turtle$pred <- ilogit(-61.3183+(2.2110*temp))
attach(turtle, warn.conflicts = FALSE)
ggplot(data=turtle, aes(y=value,x=temp,colour = variable)) +
geom_point(aes(y=y, col="y")) +
geom_point(aes(y=pred, col="pred"))
Yes, there area only 15 observations, and there is essentially one predictor (ratio of male/female)
Looking at the plot of points , there are no strong outliers
Compute the empirical logits and plot these against temperature. Does this indicate a lack of fit?
The plot was already done in 3.2.b, there is fit, but obviously it could be better.
Add a quadratic term in temperature. Is this additional term a significant predictor of the response. Does the quadratic model fit the data?
Yes, this x^2 predictor has a .0199 pr value, so it is a significant predictor. When weplot it, we see we fit closer to the the points in almost all incidences.
tMod <- glm(cbind(male, female) ~temp + I(temp^2), family="binomial")
summary(tMod)
##
## Call:
## glm(formula = cbind(male, female) ~ temp + I(temp^2), family = "binomial")
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.6703 -0.8875 -0.4194 0.9481 2.2198
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -677.5950 268.7984 -2.521 0.0117 *
## temp 45.9173 18.9169 2.427 0.0152 *
## I(temp^2) -0.7745 0.3327 -2.328 0.0199 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 74.508 on 14 degrees of freedom
## Residual deviance: 20.256 on 12 degrees of freedom
## AIC: 51.15
##
## Number of Fisher Scoring iterations: 4
plot(x=temp, y)
lines(x, ilogit(-61.3183+(2.2110*x)))
lines(x, ilogit(-677.595 +(45.9173*x)- (.7745 * x^2)))
There are three replicates for each value of temperature. Assuming independent binomial variation, how much variation would be expected in the three proportions observed? Compare this to the observed variation in these proportions. Do they approximately agree or is there evidence of greater variation?
Looking at confidence intervals (5%) we see that using the ilogit function we get 0 as minimum proportion for all temps, and an error at the high interval. Looking at what the values are, we can see that realistically the high end of the confidence interval is 100% proportion.
conTMod <- confint(tMod)
## Waiting for profiling to be done...
conTMod
## 2.5 % 97.5 %
## (Intercept) -1211.941976 -125.15140682
## temp 6.791396 83.38227276
## I(temp^2) -1.430746 -0.08186443
ilogit(conTMod[1,1] +(conTMod[2,1]*x)+ (conTMod[3,1] * x^2))
## [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ilogit(conTMod[1,2] +(conTMod[2,2]*x) + (conTMod[3,2] * x^2))
## [1] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
## [18] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
conTMod[1,1] +(conTMod[2,1]*x) +(conTMod[3,1] * x^2)
## [1] -2071.588 -2078.649 -2085.739 -2092.858 -2100.005 -2107.180 -2114.385
## [8] -2121.617 -2128.879 -2136.169 -2143.488 -2150.835 -2158.211 -2165.616
## [15] -2173.049 -2180.511 -2188.001 -2195.520 -2203.068 -2210.644 -2218.249
## [22] -2225.882 -2233.544 -2241.235 -2248.955 -2256.702 -2264.479 -2272.284
## [29] -2280.118 -2287.980 -2295.871 -2303.791 -2311.739
conTMod[1,2] +(conTMod[2,2]*x)+ (conTMod[3,2] * x^2)
## [1] 2066.491 2074.386 2082.280 2090.172 2098.062 2105.951 2113.838
## [8] 2121.724 2129.608 2137.490 2145.371 2153.249 2161.127 2169.003
## [15] 2176.877 2184.749 2192.620 2200.489 2208.356 2216.222 2224.087
## [22] 2231.949 2239.810 2247.669 2255.527 2263.383 2271.238 2279.090
## [29] 2286.941 2294.791 2302.639 2310.485 2318.330
If the three replicates are homogenous, they could be combined so that the dataset would have only five cases in total. Create this dataset and fit a model linear in temperature. Compare the fit seen for this model with that found in (b).
The results are similar, which makes sense. We took the mean (with either the floor or ceiling value to make it an integer) and with 3 values, we should average out to similar values as the result.
grpTurtle <- group_by(turtle, temp)
grpTurtle <-grpTurtle %>% summarise(male = floor(mean(male)), female = ceiling(mean(female)))
attach(grpTurtle, warn.conflicts = FALSE)
grpTurtle$y <- ifelse(female==0,1,(grpTurtle$male)/(male+female))
attach(grpTurtle, warn.conflicts = FALSE)
tMod <- glm(cbind(male, female) ~temp + I(temp^2), family="binomial")
summary(tMod)
##
## Call:
## glm(formula = cbind(male, female) ~ temp + I(temp^2), family = "binomial")
##
## Deviance Residuals:
## 1 2 3 4 5
## -1.3259 1.2364 0.1772 -1.0152 0.1579
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -862.9854 441.8372 -1.953 0.0508 .
## temp 58.9773 30.8701 1.911 0.0561 .
## I(temp^2) -1.0049 0.5388 -1.865 0.0622 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 23.0262 on 4 degrees of freedom
## Residual deviance: 4.3736 on 2 degrees of freedom
## AIC: 19.795
##
## Number of Fisher Scoring iterations: 4
plot(x=temp, y)
lines(x, ilogit(-862.9854 +(58.9773*x)- (1.0049 * x^2)))