In this experiment three treatment and control areas were defined in St. Augustine, FL. monitoring traps were placed in treatment and control areas at equal densities. In the treatment areas AGO traps were placed to test their effectiveness at reducing mosquito populations. After a 4 week period with no treatment, AGO kill traps were added and each area was monitored for 20 weeks.
| Factors | comments |
|---|---|
| datetime | (24 levels, week of measurement) |
| type.y | (2 levels, the treatment of interest) |
| city_pair | (3 levels, geographic pair replicate) |
All data are count data of mosquito populations
library('tidyverse')
Registered S3 methods overwritten by 'dbplyr':
method from
print.tbl_lazy
print.tbl_sql
── Attaching packages ─────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
✓ ggplot2 3.3.2 ✓ purrr 0.3.4
✓ tibble 3.0.4 ✓ dplyr 1.0.2
✓ tidyr 1.1.2 ✓ stringr 1.4.0
✓ readr 1.4.0 ✓ forcats 0.5.0
── Conflicts ────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
x dplyr::filter() masks stats::filter()
x dplyr::lag() masks stats::lag()
data <-as_tibble(read.csv("BG_all_data_long.csv"))
data <- data[,-c(7,10,11)]
We will focus on BG traps and female Aedes aegypti mosquitoes
data_filtered <- filter(data, variable == "aedes_aegypti_female_count" & type.x =="bg")
data_filtered
NA
Visualize the count data. Data are highly skewed and there appear to be excess zeros. We will evaluate quasi-possion, negative binomial and Zero inflated models.
hist(data_filtered$value, breaks =40)
mean(data_filtered$value)
[1] 3.560185
sd(data_filtered$value)
[1] 6.065848
hist(log(data_filtered$value), breaks =40)
Explore the distribution of mosquito counts over time
ggplot( aes(x=datetime, y=value, group=type.y, color=type.y), data=data_filtered,) + geom_point(alpha=0.5)
Model 1: estimate the number of mosquitoes in a trap as a function of the fixed effect of the treatment (type.y) and the random effect of the week. Use quasi-poisson as the family since the data is count data and may be over-dispersed. Account for auto-correlation in time series data by adding auto-correlation structure of order 1 using datetime.
library(nlme)
library(MASS)
fit.pois_ar1 <- glmmPQL(value ~ factor(type.y), random= ~ 1 | datetime, data=data_filtered,
family=quasipoisson('log'), correlation=corAR1(form = ~ 1 | datetime), verbose=FALSE)
summary(fit.pois_ar1)
Linear mixed-effects model fit by maximum likelihood
Data: data_filtered
Random effects:
Formula: ~1 | datetime
(Intercept) Residual
StdDev: 0.3619565 2.797202
Correlation Structure: AR(1)
Formula: ~1 | datetime
Parameter estimate(s):
Phi
0.3022005
Variance function:
Structure: fixed weights
Formula: ~invwt
Fixed effects: value ~ factor(type.y)
Correlation:
(Intr)
factor(type.y)treatment -0.715
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-0.9684879 -0.6154371 -0.4423230 0.4589224 7.3339754
Number of Observations: 432
Number of Groups: 24
plot(fit.pois_ar1)
Questions:
Model 2: estimate the number of mosquitoes in a trap as a function of the fixed effect of the treatment(type.y) and the random effect of the week. Use negative binomial as the family since the data is count data and may be over-dispersed. Account for auto-correlation in time series data by adding auto-correlation structure of order 1 using datetime.
nbfit = glm.nb(value ~ factor(datetime) , data=data_filtered)
nbfit
Call: glm.nb(formula = value ~ factor(datetime), data = data_filtered,
init.theta = 0.4498860402, link = log)
Coefficients:
(Intercept) factor(datetime)10/25/18 0:00
1.17007 -0.03509
factor(datetime)10/31/18 0:00 factor(datetime)5/10/18 0:00
-0.69315 -1.49549
factor(datetime)5/17/18 0:00 factor(datetime)5/24/18 0:00
-1.42139 -0.65925
factor(datetime)6/1/18 0:00 factor(datetime)6/14/18 0:00
-0.16862 -0.69315
factor(datetime)6/21/18 0:00 factor(datetime)6/28/18 0:00
0.24362 0.66694
factor(datetime)6/8/18 0:00 factor(datetime)7/12/18 0:00
-0.47692 0.46135
factor(datetime)7/19/18 0:00 factor(datetime)7/26/18 0:00
-0.21030 -0.62646
factor(datetime)7/5/18 0:00 factor(datetime)8/16/18 0:00
-0.01739 0.08269
factor(datetime)8/2/18 0:00 factor(datetime)8/23/18 0:00
0.64909 0.46135
factor(datetime)8/30/18 0:00 factor(datetime)8/9/18 0:00
0.64004 0.72705
factor(datetime)9/13/18 0:00 factor(datetime)9/20/18 0:00
0.52452 -1.28785
factor(datetime)9/27/18 0:00 factor(datetime)9/6/18 0:00
0.84483 0.11394
Degrees of Freedom: 431 Total (i.e. Null); 408 Residual
Null Deviance: 498.5
Residual Deviance: 432.8 AIC: 1926
fit.nb_ar1 <- glmmPQL(value ~ factor(type.y), random= ~ 1 | factor(datetime),data=data_filtered,
family=negative.binomial(theta =nbfit$theta),
correlation=corAR1(form = ~ 1 | factor(datetime)),verbose=TRUE)
iteration 1
iteration 2
iteration 3
iteration 4
iteration 5
iteration 6
iteration 7
iteration 8
iteration 9
iteration 10
summary(fit.nb_ar1)
Linear mixed-effects model fit by maximum likelihood
Data: data_filtered
Random effects:
Formula: ~1 | factor(datetime)
(Intercept) Residual
StdDev: 0.1505557 1.00186
Correlation Structure: AR(1)
Formula: ~1 | factor(datetime)
Parameter estimate(s):
Phi
0.275222
Variance function:
Structure: fixed weights
Formula: ~invwt
Fixed effects: value ~ factor(type.y)
Correlation:
(Intr)
factor(type.y)treatment -0.631
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-0.6430719 -0.6136618 -0.3719895 0.4341090 8.8315964
Number of Observations: 432
Number of Groups: 24
plot(fit.nb_ar1)
Thoughts:
Model 3: estimate the number of mosquitoes in a trap as a function of the fixed effect of the treatment(type.y) and the random effect of the week. use quasi-poisson as the family since the data is count data and may be over-dispersed. Account for auto-correlation in time series data by adding autocorrelation structure of order 1 using datetime. This is different from model 1 because the 3 site pairs are broken out as a factor.
fit.pois_ar1_paired <- glmmPQL(value ~ factor(type.y)*factor(site_pair), random= ~ 1 | datetime, data=data_filtered,
family=quasipoisson('log'), correlation=corAR1(form = ~ 1 | datetime), verbose=FALSE)
summary(fit.pois_ar1_paired)
Linear mixed-effects model fit by maximum likelihood
Data: data_filtered
Random effects:
Formula: ~1 | datetime
(Intercept) Residual
StdDev: 0.5664958 1.903024
Correlation Structure: AR(1)
Formula: ~1 | datetime
Parameter estimate(s):
Phi
-0.04848436
Variance function:
Structure: fixed weights
Formula: ~invwt
Fixed effects: value ~ factor(type.y) * factor(site_pair)
Correlation:
(Intr) fct(.)
factor(type.y)treatment -0.539
factor(site_pair)sasnorth -0.221 0.276
factor(site_pair)sassouth -0.392 0.503
factor(type.y)treatment:factor(site_pair)sasnorth 0.199 -0.363
factor(type.y)treatment:factor(site_pair)sassouth 0.160 -0.297
fctr(st_pr)ssn
factor(type.y)treatment
factor(site_pair)sasnorth
factor(site_pair)sassouth 0.207
factor(type.y)treatment:factor(site_pair)sasnorth -0.895
factor(type.y)treatment:factor(site_pair)sassouth -0.082
fctr(st_pr)sss
factor(type.y)treatment
factor(site_pair)sasnorth
factor(site_pair)sassouth
factor(type.y)treatment:factor(site_pair)sasnorth -0.191
factor(type.y)treatment:factor(site_pair)sassouth -0.415
fctr(typ.y)trtmnt:fctr(st_pr)ssn
factor(type.y)treatment
factor(site_pair)sasnorth
factor(site_pair)sassouth
factor(type.y)treatment:factor(site_pair)sasnorth
factor(type.y)treatment:factor(site_pair)sassouth 0.110
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-1.6176859 -0.5785869 -0.2912439 0.3605838 5.5743609
Number of Observations: 432
Number of Groups: 24
plot(fit.pois_ar1_paired)
library('tidyverse')
library('lubridate')
data2 <-as_tibble(read.csv("all_data_long.csv")) %>% filter(!is.na(value))
data2$datetime <- mdy_hm(data2$datetime)
data2$week <- factor(cut(data2$datetime, breaks="week"),ordered=TRUE)
#data2 <- data[,-c(7,8,9)]
data_filtered2 <- filter(data2, variable == "aedes_aegypti_female_count" & type.x =="sago")
data_filtered2$value <- as.numeric(data_filtered2$value)
summary(data_filtered2)
trap_id location_id type.x site_pair type.y trap_density address
Min. :170.0 Min. : 20.00 Length:3432 Length:3432 Length:3432 Min. : 1.30 Length:3432
1st Qu.:205.0 1st Qu.: 55.00 Class :character Class :character Class :character 1st Qu.: 1.30 Class :character
Median :241.0 Median : 91.00 Mode :character Mode :character Mode :character Median : 2.59 Mode :character
Mean :241.1 Mean : 91.08 Mean :18.40
3rd Qu.:277.0 3rd Qu.:127.00 3rd Qu.:36.18
Max. :313.0 Max. :163.00 Max. :42.84
city state lat lon sample_id datetime
Length:3432 Length:3432 Min. :29.83 Min. :-81.32 Min. : 1.0 Min. :2018-05-09 00:00:00
Class :character Class :character 1st Qu.:29.84 1st Qu.:-81.32 1st Qu.: 858.8 1st Qu.:2018-06-13 00:00:00
Mode :character Mode :character Median :29.86 Median :-81.32 Median :1716.5 Median :2018-07-27 00:00:00
Mean :29.87 Mean :-81.31 Mean :1716.8 Mean :2018-07-31 06:02:31
3rd Qu.:29.91 3rd Qu.:-81.31 3rd Qu.:2574.2 3rd Qu.:2018-09-12 00:00:00
Max. :29.92 Max. :-81.31 Max. :3456.0 Max. :2018-10-23 00:00:00
variable value week
Length:3432 Min. :0.00000 2018-05-07: 144
Class :character 1st Qu.:0.00000 2018-05-14: 144
Mode :character Median :0.00000 2018-05-21: 144
Mean :0.06469 2018-05-28: 144
3rd Qu.:0.00000 2018-06-04: 144
Max. :5.00000 2018-06-11: 144
(Other) :2568
hist(data_filtered2$value, breaks =40)
mean(data_filtered2$value)
[1] 0.06468531
sd(data_filtered2$value)
[1] 0.2761478
hist(log(data_filtered2$value), breaks =40)
There are a ton of Zeros in the data. no model is going towork well with this data. A Zero inflated poisson model may be the most appropriate. That family is not available in GLMM’s in R that I am aware of. I can run a simpler ZIP model and will try that as well.
ggplot( aes(x=week, y=value, group=type.y, color=type.y), data=data_filtered2,) + geom_jitter(alpha=0.5)
library(nlme)
library(MASS)
fit.sago_qp_ar1 <- glmmPQL(as.numeric(value) ~ factor(type.y), random= ~ 1 | factor(datetime), data=data_filtered2, family=quasipoisson('log'), correlation=corAR1(form = ~ 1 | factor(datetime)), verbose=FALSE)
summary(fit.sago_qp_ar1)
plot(fit.sago_qp_ar1)
library(glmmTMB)
model4 <- glmmTMB(value ~ ar1(week + 0 | type.y), data = data_filtered2, family = nbinom2)
model5 <- glmmTMB(value ~ ar1(week + 0 | type.y), data = data_filtered2, family = poisson)
model6 <- glmmTMB(value ~ type.y, data = data_filtered2, family = nbinom2)
model7 <- glmmTMB(value ~ type.y, zi=~., data = data_filtered2, family = nbinom2)
anova(model4,model5, model6, model7)
Data: data_filtered2
Models:
model5: value ~ ar1(week + 0 | type.y), zi=~0, disp=~1
model6: value ~ type.y, zi=~0, disp=~1
model4: value ~ ar1(week + 0 | type.y), zi=~0, disp=~1
model7: value ~ ar1(type.y + 0 | week), zi=~0, disp=~1
Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
model5 3 1642.0 1660.4 -818.00 1636.0
model6 3 1671.2 1689.7 -832.61 1665.2 0.0000 0 1
model4 4 1631.7 1656.3 -811.87 1623.7 41.4861 1 1.187e-10 ***
model7 4 1626.3 1650.9 -809.16 1618.3 5.4214 0 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
summary(model7)
Family: nbinom2 ( log )
Formula: value ~ type.y
Zero inflation: ~.
Data: data_filtered2
AIC BIC logLik deviance df.resid
1674.5 1705.2 -832.3 1664.5 3427
Overdispersion parameter for nbinom2 family (): 0.704
Conditional model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.2118 0.4664 -4.743 2.11e-06 ***
type.ytreatment -0.6963 0.4786 -1.455 0.146
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Zero-inflation model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.7619 1.4360 -0.531 0.596
type.ytreatment -15.1577 3236.5268 -0.005 0.996