Using a data set of your choice, do the following:
1) define an outcome, this can be of any form, but use an appropriate distribution for your outcome. Also define at least 1 continuous predictor. Outcome variable is the overall count of grooming for each individual gibbon, this includes when the individual was grooming someone and being groomed by someone.
Distribution = Poisson
Continuous predictor = age in months
2) Using the gam() function, estimate a model with only linear terms in your model.
library(splines)
library(mgcv)
## Loading required package: nlme
## This is mgcv 1.8-40. For overview type 'help("mgcv-package")'.
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.6 ✓ dplyr 1.0.8
## ✓ tidyr 1.2.0 ✓ stringr 1.4.0
## ✓ readr 2.1.2 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::collapse() masks nlme::collapse()
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(readxl)
gibbons<-read_excel("/Users/alliesheldon/Documents/UTSA/Classes/Demography/Blogs and Final Paper/gibb_individuals.xlsx")
groom_all_gam<-gam(groom_all~(age_mos)+birthorder+sex+encl,data=gibbons,family=poisson)
summary(groom_all_gam)
##
## Family: poisson
## Link function: log
##
## Formula:
## groom_all ~ (age_mos) + birthorder + sex + encl
##
## Parametric coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 4.5715543 0.1147946 39.824 < 2e-16 ***
## age_mos -0.0009761 0.0004375 -2.231 0.0257 *
## birthorder -0.3161370 0.0573873 -5.509 3.61e-08 ***
## sexM -0.4131917 0.0526454 -7.849 4.21e-15 ***
## encl13a 0.5295616 0.0950411 5.572 2.52e-08 ***
## encl15 1.4066074 0.0755069 18.629 < 2e-16 ***
## encl2 -1.5503682 0.1964543 -7.892 2.98e-15 ***
## encl5 1.1592026 0.0780428 14.853 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## R-sq.(adj) = 0.971 Deviance explained = 96.9%
## UBRE = 2.2013 Scale est. = 1 n = 19
3) Repeat Step 2, but include a smooth term for at least one continuous variable.
groom_all_gams<-gam(groom_all~s(age_mos)+birthorder+sex+encl,data=gibbons,family=poisson)
summary(groom_all_gams)
##
## Family: poisson
## Link function: log
##
## Formula:
## groom_all ~ s(age_mos) + birthorder + sex + encl
##
## Parametric coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 4.4463 0.1212 36.684 < 2e-16 ***
## birthorder -0.2629 0.1623 -1.620 0.1052
## sexM -0.5554 0.1029 -5.395 6.86e-08 ***
## encl13a 0.4378 0.2264 1.933 0.0532 .
## encl15 1.2674 0.1734 7.308 2.70e-13 ***
## encl2 -1.3359 0.2410 -5.542 2.98e-08 ***
## encl5 0.9383 0.1117 8.401 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df Chi.sq p-value
## s(age_mos) 7.834 8.378 26.06 0.000763 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.979 Deviance explained = 99.1%
## UBRE = 1.2237 Scale est. = 1 n = 19
4) Test if the model in step 3 is fitting the data better than the purely linear model.
anova(groom_all_gam, groom_all_gams,test="Chisq")
## Analysis of Deviance Table
##
## Model 1: groom_all ~ (age_mos) + birthorder + sex + encl
## Model 2: groom_all ~ s(age_mos) + birthorder + sex + encl
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 11.0000 44.824
## 2 3.6217 12.582 7.3783 32.242 5.066e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
According to the Chi2 test, the smooth spline has a better fit than the linear model.
5) Produce a plot of the smooth effect from the model in step 3
plot(groom_all_gams)