1 Abstract

Nematodirus infection intensity in Wyoming sheep is a useful applied setting for comparing categorical and count-based modeling strategies. The response is the number of sheep in ordered infection-intensity categories within each farm. A multinomial model is a natural starting point because the counts are allocated across mutually exclusive categories, but a log-linear formulation is more flexible for modeling overdispersion, farm-level clustering, and zero inflation. This report presents a staged analysis using multinomial, Poisson, negative binomial, heterogeneous-dispersion negative binomial, mixed-effects negative binomial, and zero-inflated negative binomial models. The final selected model was a negative binomial log-linear model with heterogeneous dispersion and a farm-level random intercept.

2 Project aim

The main aim is to build a transparent, reproducible modeling workflow for aggregated Nematodirus infection-intensity data. The analysis asks four connected questions:

  1. Can the multinomial model be represented in an equivalent Poisson log-linear form?
  2. Is the simple Poisson trend model adequate for these data?
  3. Do overdispersion, heterogeneous dispersion, and farm-level random effects improve the model?
  4. Does a separate zero-inflation component improve the model enough to justify the added complexity?

3 Background

Gastrointestinal nematode infections are persistent health and production problems in grazing sheep. Infection patterns are shaped by environmental exposure, host immunity, farm management, grazing pressure, and treatment history. Nematodirus is especially relevant in cooler climates because environmental persistence and hatching behavior can produce uneven infection patterns across farms and seasons.

From a statistical perspective, the data are challenging because the counts are grouped by farm and ordered intensity scale. The higher-intensity categories are sparse, and farms may differ in baseline infection intensity for reasons not fully captured by measured predictors. These features motivate a model sequence that begins with multinomial and Poisson log-linear models and then expands to negative binomial, mixed-effects, and zero-inflated formulations (Agresti 2013; Bilder and Loughin 2024; Hilbe 2011; Brooks et al. 2017).

4 Data structure

data_path <- "data/work_data.csv"
data_available <- file.exists(data_path)

  work_dat <- readr::read_csv(data_path, show_col_types = FALSE)
  work_dat$number_deworm_year <- as.numeric(work_dat$number_deworm_year)
  work_dat$nem_scaled <- factor(work_dat$nem_scaled)
  work_dat$Farms <- factor(work_dat$Farms)
  work_sorted <- work_dat[order(work_dat$Farms, work_dat$nem_scaled), ]
  work_sorted$nem_scaled <- stats::relevel(work_sorted$nem_scaled, ref = "0")

Expected variables:

Minimum variables needed to rerun the analysis.
Variable Role Notes
Farms Farm identifier / grouping variable Should be anonymized before public posting
nem_scaled Ordered Nematodirus infection-intensity scale Scale 0 is the reference category
frequency Number of sheep in each farm-by-scale cell Count response
density_per_acre Farm-level stocking-density predictor Used in interactions
number_deworm_year Farm-level deworming-frequency predictor Used in interactions

5 Statistical methods

5.1 Multinomial starting point

Let \(Y_{ij}\) denote the number of sheep from farm \(i\) in intensity category \(j\), and let \(M_i = \sum_j Y_{ij}\) be the total number of sheep represented for farm \(i\). A multinomial model conditions on \(M_i\) and models the category probabilities:

\[ Y_i \mid M_i, \pi_i \sim \text{Multinomial}(M_i, \pi_{i1}, \ldots, \pi_{ic}). \]

This is a useful conceptual starting point, but it is less convenient when the goal is to expand the model to include overdispersion, heterogeneous dispersion, random effects, and zero inflation.

5.2 Poisson log-linear representation

The equivalent Poisson log-linear representation treats each category count as a count response:

\[ Y_{ij} \mid \mu_{ij} \sim \text{Poisson}(\mu_{ij}), \quad \log(\mu_{ij}) = \eta_{ij}. \]

The first Poisson model used farm-specific fixed effects and categorical intensity-scale effects to reproduce the multinomial contrasts. After that equivalence was checked, the scale variable was treated numerically to impose an ordinal trend and reduce the number of parameters.

5.3 Negative binomial and mixed-effects extensions

The negative binomial model relaxes the Poisson mean-variance equality and is a better candidate when parasite counts show aggregation or extra-Poisson variation (Hilbe 2011; Lawless 1987). A farm-level random intercept was then added to account for unexplained between-farm heterogeneity. The final selected model was:

\[ Y_{ij} \sim \text{NB2}(\mu_{ij}, \theta_j), \]

\[ \log(\mu_{ij}) = \beta_0 + \beta_1\text{scale}_{ij} + \beta_2\text{density}_{i} + \beta_3\text{deworm}_{i} + \beta_4(\text{scale}_{ij} \times \text{density}_{i}) + \beta_5(\text{scale}_{ij} \times \text{deworm}_{i}) + \beta_6(\text{density}_{i} \times \text{deworm}_{i}) + \beta_7(\text{scale}_{ij} \times \text{density}_{i} \times \text{deworm}_{i}) + b_i, \]

where \(b_i\) is the farm-level random intercept. Heterogeneous dispersion was modeled as a function of the intensity scale.

6 Model code

ic_f <- function(fit) {
  c(`-2 logLik` = as.numeric(-2 * logLik(fit)), AIC = AIC(fit), BIC = BIC(fit))
}

tidy_glmmtmb_conditional <- function(fit) {
  broom.mixed::tidy(fit, component = "cond", conf.int = TRUE)
}

6.1 Multinomial model

multinom_fit <- nnet::multinom(
  nem_scaled ~ density_per_acre * number_deworm_year,
  weights = frequency,
  data = work_sorted,
  trace = FALSE
)

ic_f(multinom_fit)
## -2 logLik       AIC       BIC 
##  749.9916  773.9916  820.9212
summary(multinom_fit)
## Call:
## nnet::multinom(formula = nem_scaled ~ density_per_acre * number_deworm_year, 
##     data = work_sorted, weights = frequency, trace = FALSE)
## 
## Coefficients:
##   (Intercept) density_per_acre number_deworm_year
## 2   -2.239164         5.993926          0.6027921
## 3   -2.730779         5.916960          0.8727250
## 4   -4.069314         4.006807          0.7628583
##   density_per_acre:number_deworm_year
## 2                           -3.138593
## 3                           -3.068355
## 4                           -1.864561
## 
## Std. Errors:
##   (Intercept) density_per_acre number_deworm_year
## 2   0.3941462         1.827218          0.2286101
## 3   0.4593893         1.918708          0.2586518
## 4   0.9236870         3.641792          0.5141089
##   density_per_acre:number_deworm_year
## 2                           0.9412728
## 3                           0.9803509
## 4                           1.8280003
## 
## Residual Deviance: 749.9916 
## AIC: 773.9916

6.2 Equivalent Poisson log-linear model

poisson_equiv <- glm(
  frequency ~ Farms + nem_scaled * density_per_acre * number_deworm_year,
  family = poisson(link = "log"),
  data = work_sorted
)

ic_f(poisson_equiv)
## -2 logLik       AIC       BIC 
##  184.3247  230.3247  271.3611
summary(poisson_equiv)
## 
## Call:
## glm(formula = frequency ~ Farms + nem_scaled * density_per_acre * 
##     number_deworm_year, family = poisson(link = "log"), data = work_sorted)
## 
## Coefficients: (3 not defined because of singularities)
##                                                 Estimate Std. Error z value
## (Intercept)                                       4.4841     0.1031  43.492
## Farmsb                                           -3.1280     0.5062  -6.179
## Farmsc                                           -0.6158     0.1775  -3.470
## Farmsd                                           -2.2133     0.2862  -7.733
## Farmse                                           -1.7528     0.2366  -7.408
## Farmsf                                           -2.5128     0.3082  -8.154
## Farmsg                                           -3.2060     0.4223  -7.593
## Farmsh                                           -2.9021     0.3730  -7.781
## Farmsi                                           -1.7193     0.2281  -7.538
## Farmsj                                           -2.5925     0.3960  -6.546
## Farmsk                                           -1.5668     0.2090  -7.499
## nem_scaled2                                      -2.2391     0.3941  -5.681
## nem_scaled3                                      -2.7308     0.4594  -5.944
## nem_scaled4                                      -4.0693     0.9237  -4.406
## density_per_acre                                      NA         NA      NA
## number_deworm_year                                    NA         NA      NA
## nem_scaled2:density_per_acre                      5.9937     1.8272   3.280
## nem_scaled3:density_per_acre                      5.9172     1.9187   3.084
## nem_scaled4:density_per_acre                      4.0070     3.6417   1.100
## nem_scaled2:number_deworm_year                    0.6027     0.2286   2.637
## nem_scaled3:number_deworm_year                    0.8727     0.2587   3.374
## nem_scaled4:number_deworm_year                    0.7628     0.5141   1.484
## density_per_acre:number_deworm_year                   NA         NA      NA
## nem_scaled2:density_per_acre:number_deworm_year  -3.1384     0.9413  -3.334
## nem_scaled3:density_per_acre:number_deworm_year  -3.0684     0.9803  -3.130
## nem_scaled4:density_per_acre:number_deworm_year  -1.8646     1.8280  -1.020
##                                                 Pr(>|z|)    
## (Intercept)                                      < 2e-16 ***
## Farmsb                                          6.43e-10 ***
## Farmsc                                          0.000520 ***
## Farmsd                                          1.05e-14 ***
## Farmse                                          1.28e-13 ***
## Farmsf                                          3.51e-16 ***
## Farmsg                                          3.14e-14 ***
## Farmsh                                          7.22e-15 ***
## Farmsi                                          4.79e-14 ***
## Farmsj                                          5.90e-11 ***
## Farmsk                                          6.45e-14 ***
## nem_scaled2                                     1.34e-08 ***
## nem_scaled3                                     2.77e-09 ***
## nem_scaled4                                     1.05e-05 ***
## density_per_acre                                      NA    
## number_deworm_year                                    NA    
## nem_scaled2:density_per_acre                    0.001037 ** 
## nem_scaled3:density_per_acre                    0.002043 ** 
## nem_scaled4:density_per_acre                    0.271207    
## nem_scaled2:number_deworm_year                  0.008376 ** 
## nem_scaled3:number_deworm_year                  0.000740 ***
## nem_scaled4:number_deworm_year                  0.137874    
## density_per_acre:number_deworm_year                   NA    
## nem_scaled2:density_per_acre:number_deworm_year 0.000855 ***
## nem_scaled3:density_per_acre:number_deworm_year 0.001748 ** 
## nem_scaled4:density_per_acre:number_deworm_year 0.307708    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 751.130  on 43  degrees of freedom
## Residual deviance:  74.296  on 21  degrees of freedom
## AIC: 230.32
## 
## Number of Fisher Scoring iterations: 7

6.3 Poisson trend model

poisson_trend <- glm(
  frequency ~ as.numeric(nem_scaled) * density_per_acre * number_deworm_year,
  family = poisson(link = "log"),
  data = work_sorted
)

ic_f(poisson_trend)
## -2 logLik       AIC       BIC 
##  431.4898  447.4898  461.7634
summary(poisson_trend)
## 
## Call:
## glm(formula = frequency ~ as.numeric(nem_scaled) * density_per_acre * 
##     number_deworm_year, family = poisson(link = "log"), data = work_sorted)
## 
## Coefficients:
##                                                            Estimate Std. Error
## (Intercept)                                                  4.7768     0.3562
## as.numeric(nem_scaled)                                      -1.6718     0.2140
## density_per_acre                                            -4.0889     1.3841
## number_deworm_year                                          -0.1854     0.2344
## as.numeric(nem_scaled):density_per_acre                      2.0585     0.6278
## as.numeric(nem_scaled):number_deworm_year                    0.5547     0.1321
## density_per_acre:number_deworm_year                          1.3943     0.7383
## as.numeric(nem_scaled):density_per_acre:number_deworm_year  -1.0783     0.3398
##                                                            z value Pr(>|z|)    
## (Intercept)                                                 13.412  < 2e-16 ***
## as.numeric(nem_scaled)                                      -7.812 5.65e-15 ***
## density_per_acre                                            -2.954  0.00313 ** 
## number_deworm_year                                          -0.791  0.42883    
## as.numeric(nem_scaled):density_per_acre                      3.279  0.00104 ** 
## as.numeric(nem_scaled):number_deworm_year                    4.199 2.68e-05 ***
## density_per_acre:number_deworm_year                          1.888  0.05897 .  
## as.numeric(nem_scaled):density_per_acre:number_deworm_year  -3.173  0.00151 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 751.13  on 43  degrees of freedom
## Residual deviance: 321.46  on 36  degrees of freedom
## AIC: 447.49
## 
## Number of Fisher Scoring iterations: 7

6.4 Negative binomial model

nb_fit <- glmmTMB::glmmTMB(
  frequency ~ as.numeric(nem_scaled) * density_per_acre * number_deworm_year,
  data = work_sorted,
  family = glmmTMB::nbinom2
)

ic_f(nb_fit)
## -2 logLik       AIC       BIC 
##  237.7496  255.7496  271.8073
summary(nb_fit)
##  Family: nbinom2  ( log )
## Formula:          
## frequency ~ as.numeric(nem_scaled) * density_per_acre * number_deworm_year
## Data: work_sorted
## 
##       AIC       BIC    logLik -2*log(L)  df.resid 
##     255.7     271.8    -118.9     237.7        35 
## 
## 
## Dispersion parameter for nbinom2 family (): 0.636 
## 
## Conditional model:
##                                                            Estimate Std. Error
## (Intercept)                                                  4.8630     1.4279
## as.numeric(nem_scaled)                                      -1.4091     0.5899
## density_per_acre                                            -4.1564     6.2280
## number_deworm_year                                          -0.4290     0.8645
## as.numeric(nem_scaled):density_per_acre                      1.6951     2.5989
## as.numeric(nem_scaled):number_deworm_year                    0.3477     0.3520
## density_per_acre:number_deworm_year                          1.7221     3.1183
## as.numeric(nem_scaled):density_per_acre:number_deworm_year  -0.8052     1.3004
##                                                            z value Pr(>|z|)    
## (Intercept)                                                  3.406  0.00066 ***
## as.numeric(nem_scaled)                                      -2.389  0.01690 *  
## density_per_acre                                            -0.667  0.50453    
## number_deworm_year                                          -0.496  0.61974    
## as.numeric(nem_scaled):density_per_acre                      0.652  0.51423    
## as.numeric(nem_scaled):number_deworm_year                    0.988  0.32329    
## density_per_acre:number_deworm_year                          0.552  0.58078    
## as.numeric(nem_scaled):density_per_acre:number_deworm_year  -0.619  0.53579    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

6.5 Negative binomial with heterogeneous dispersion

nb_disp_fit <- glmmTMB::glmmTMB(
  frequency ~ as.numeric(nem_scaled) * density_per_acre * number_deworm_year,
  dispformula = ~ as.numeric(nem_scaled),
  data = work_sorted,
  family = glmmTMB::nbinom2
)

ic_f(nb_disp_fit)
## -2 logLik       AIC       BIC 
##  224.7915  244.7915  262.6334
summary(nb_disp_fit)
##  Family: nbinom2  ( log )
## Formula:          
## frequency ~ as.numeric(nem_scaled) * density_per_acre * number_deworm_year
## Dispersion:                 ~as.numeric(nem_scaled)
## Data: work_sorted
## 
##       AIC       BIC    logLik -2*log(L)  df.resid 
##     244.8     262.6    -112.4     224.8        34 
## 
## 
## Conditional model:
##                                                            Estimate Std. Error
## (Intercept)                                                  5.1250     1.0252
## as.numeric(nem_scaled)                                      -1.5698     0.6193
## density_per_acre                                            -6.9256     4.6670
## number_deworm_year                                          -0.5252     0.6142
## as.numeric(nem_scaled):density_per_acre                      3.2389     2.8686
## as.numeric(nem_scaled):number_deworm_year                    0.4056     0.3609
## density_per_acre:number_deworm_year                          3.1263     2.3380
## as.numeric(nem_scaled):density_per_acre:number_deworm_year  -1.5878     1.4345
##                                                            z value Pr(>|z|)    
## (Intercept)                                                  4.999 5.76e-07 ***
## as.numeric(nem_scaled)                                      -2.535   0.0112 *  
## density_per_acre                                            -1.484   0.1378    
## number_deworm_year                                          -0.855   0.3926    
## as.numeric(nem_scaled):density_per_acre                      1.129   0.2589    
## as.numeric(nem_scaled):number_deworm_year                    1.124   0.2610    
## density_per_acre:number_deworm_year                          1.337   0.1812    
## as.numeric(nem_scaled):density_per_acre:number_deworm_year  -1.107   0.2683    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Dispersion model:
##                        Estimate Std. Error z value Pr(>|z|)    
## (Intercept)              1.8058     0.6406   2.819 0.004821 ** 
## as.numeric(nem_scaled)  -0.9919     0.2706  -3.666 0.000247 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

6.6 Final model: negative binomial with heterogeneous dispersion and farm random intercept

final_fit <- glmmTMB::glmmTMB(
  frequency ~ as.numeric(nem_scaled) * density_per_acre * number_deworm_year + (1 | Farms),
  dispformula = ~ as.numeric(nem_scaled),
  data = work_sorted,
  family = glmmTMB::nbinom2
)

ic_f(final_fit)
## -2 logLik       AIC       BIC 
##  212.0262  234.0262  253.6523
summary(final_fit)
##  Family: nbinom2  ( log )
## Formula:          
## frequency ~ as.numeric(nem_scaled) * density_per_acre * number_deworm_year +  
##     (1 | Farms)
## Dispersion:                 ~as.numeric(nem_scaled)
## Data: work_sorted
## 
##       AIC       BIC    logLik -2*log(L)  df.resid 
##     234.0     253.7    -106.0     212.0        33 
## 
## Random effects:
## 
## Conditional model:
##  Groups Name        Variance Std.Dev.
##  Farms  (Intercept) 0.4596   0.678   
## Number of obs: 44, groups:  Farms, 11
## 
## Conditional model:
##                                                            Estimate Std. Error
## (Intercept)                                                  5.3010     0.8413
## as.numeric(nem_scaled)                                      -1.7399     0.4140
## density_per_acre                                            -8.0073     3.5461
## number_deworm_year                                          -0.7212     0.5562
## as.numeric(nem_scaled):density_per_acre                      4.4917     1.8648
## as.numeric(nem_scaled):number_deworm_year                    0.3519     0.2527
## density_per_acre:number_deworm_year                          3.7401     1.7838
## as.numeric(nem_scaled):density_per_acre:number_deworm_year  -2.2091     0.9352
##                                                            z value Pr(>|z|)    
## (Intercept)                                                  6.301 2.96e-10 ***
## as.numeric(nem_scaled)                                      -4.202 2.64e-05 ***
## density_per_acre                                            -2.258   0.0239 *  
## number_deworm_year                                          -1.297   0.1948    
## as.numeric(nem_scaled):density_per_acre                      2.409   0.0160 *  
## as.numeric(nem_scaled):number_deworm_year                    1.393   0.1637    
## density_per_acre:number_deworm_year                          2.097   0.0360 *  
## as.numeric(nem_scaled):density_per_acre:number_deworm_year  -2.362   0.0182 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Dispersion model:
##                        Estimate Std. Error z value Pr(>|z|)   
## (Intercept)               5.085      1.857   2.738  0.00619 **
## as.numeric(nem_scaled)   -1.796      0.593  -3.029  0.00245 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

6.7 Zero-inflated negative binomial mixed model

zinb_fit <- glmmTMB::glmmTMB(
  frequency ~ as.numeric(nem_scaled) * density_per_acre * number_deworm_year + (1 | Farms),
  ziformula = ~ as.numeric(nem_scaled),
  data = work_sorted,
  family = glmmTMB::nbinom2
)

ic_f(zinb_fit)
## -2 logLik       AIC       BIC 
##  214.8601  238.8601  260.2704
summary(zinb_fit)
##  Family: nbinom2  ( log )
## Formula:          
## frequency ~ as.numeric(nem_scaled) * density_per_acre * number_deworm_year +  
##     (1 | Farms)
## Zero inflation:             ~as.numeric(nem_scaled)
## Data: work_sorted
## 
##       AIC       BIC    logLik -2*log(L)  df.resid 
##     238.9     260.3    -107.4     214.9        32 
## 
## Random effects:
## 
## Conditional model:
##  Groups Name        Variance Std.Dev.
##  Farms  (Intercept) 0.4102   0.6405  
## Number of obs: 44, groups:  Farms, 11
## 
## Dispersion parameter for nbinom2 family (): 5.58 
## 
## Conditional model:
##                                                            Estimate Std. Error
## (Intercept)                                                  4.4774     0.8998
## as.numeric(nem_scaled)                                      -1.0489     0.3840
## density_per_acre                                            -5.6644     3.4206
## number_deworm_year                                          -0.4946     0.5843
## as.numeric(nem_scaled):density_per_acre                      2.5222     1.4125
## as.numeric(nem_scaled):number_deworm_year                    0.2046     0.2175
## density_per_acre:number_deworm_year                          2.5497     1.7217
## as.numeric(nem_scaled):density_per_acre:number_deworm_year  -1.2294     0.7065
##                                                            z value Pr(>|z|)    
## (Intercept)                                                  4.976 6.49e-07 ***
## as.numeric(nem_scaled)                                      -2.732   0.0063 ** 
## density_per_acre                                            -1.656   0.0977 .  
## number_deworm_year                                          -0.847   0.3972    
## as.numeric(nem_scaled):density_per_acre                      1.786   0.0742 .  
## as.numeric(nem_scaled):number_deworm_year                    0.941   0.3468    
## density_per_acre:number_deworm_year                          1.481   0.1386    
## as.numeric(nem_scaled):density_per_acre:number_deworm_year  -1.740   0.0818 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Zero-inflation model:
##                        Estimate Std. Error z value Pr(>|z|)   
## (Intercept)             -4.9368     1.6720  -2.953  0.00315 **
## as.numeric(nem_scaled)   1.4317     0.5207   2.749  0.00597 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

7 Embedded results from the final report

7.1 Model-selection summary

Model-selection summary.
Model AIC BIC logLik Selected
Poisson trend model 447.49 NA NA No
Negative binomial 255.70 271.8 -118.9 No
NB + heterogeneous dispersion 244.80 262.6 -112.4 No
NB + heterogeneous dispersion + random intercept 234.00 253.7 -106.0 Yes
Zero-inflated NB mixed model 238.90 260.3 -107.4 No

The negative binomial model with heterogeneous dispersion and a farm-level random intercept had the lowest AIC and was selected as the final model. The zero-inflated mixed model was biologically plausible, but it did not improve AIC or BIC enough to justify replacing the selected model.

7.2 Final model coefficient summary

Final selected model: conditional mean component.
Term Estimate SE z p_value CI_2.5 CI_97.5
Intercept 5.3010 0.8413 6.3010 <0.001 3.6521 6.9499
scale -1.7399 0.4140 -4.2024 <0.001 -2.5514 -0.9284
density -8.0073 3.5461 -2.2581 0.024 -14.9575 -1.0572
deworm -0.7212 0.5562 -1.2965 0.195 -1.8113 0.3690
scale:density 4.4917 1.8648 2.4087 0.016 0.8368 8.1466
scale:deworm 0.3519 0.2527 1.3927 0.164 -0.1433 0.8471
density:deworm 3.7401 1.7838 2.0967 0.036 0.2439 7.2364
scale:density:deworm -2.2091 0.9352 -2.3621 0.018 -4.0421 -0.3761

Random effect and dispersion components:

Final selected model: random-effect and dispersion components.
Component Estimate SE p_value
Farm random intercept variance 0.4596 NA NA
Farm random intercept SD 0.6780 NA NA
Dispersion intercept 5.0850 1.857 0.006
Dispersion scale coefficient -1.7960 0.593 0.002

8 Interpretation

The selected model suggests that the mean count across infection-intensity scale categories is not adequately described by a simple Poisson process. The large AIC reduction from the Poisson trend model to the negative binomial models shows that overdispersion mattered. Adding heterogeneous dispersion further improved model fit, meaning that extra-Poisson variation was not constant across the infection-intensity scale. Adding a farm-level random intercept improved the model again, supporting the presence of unexplained between-farm heterogeneity.

In the final model, the coefficient for the infection scale was negative, which is consistent with fewer animals appearing in higher-intensity categories. Density and the scale-by-density interaction were important, and the three-way interaction among scale, density, and deworming frequency was also retained in the final model. Because the model contains interactions, single main-effect coefficients should not be interpreted in isolation. The biological meaning depends on the combined values of infection scale, stocking density, and deworming frequency.

8.1 Exploratory visualization

Before fitting the candidate models, the observed count distribution was examined across the Nematodirus infection-intensity scale. These plots were used to assess sparsity across higher-intensity categories, between-farm heterogeneity, and the overall count pattern motivating count-based models.

library(ggplot2)

ggplot(work_sorted, aes(x = nem_scaled, y = frequency)) +
  geom_boxplot() +
  labs(
    x = "Nematodirus infection-intensity scale",
    y = "Observed frequency",
    title = "Distribution of sheep counts by intensity scale"
  ) +
  theme_minimal()
Figure 1: Observed frequency by Nematodirus infection-intensity scale.

Figure 1: Observed frequency by Nematodirus infection-intensity scale.

ggplot(work_sorted, aes(x = nem_scaled, y = frequency, group = Farms)) +
  geom_line(alpha = 0.5) +
  geom_point() +
  labs(
    x = "Nematodirus infection-intensity scale",
    y = "Observed frequency",
    title = "Farm-specific observed count profiles"
  ) +
  theme_minimal()
Figure 1: Farm-specific observed count profiles across Nematodirus infection-intensity scale.

Figure 1: Farm-specific observed count profiles across Nematodirus infection-intensity scale.

The boxplot summarizes how observed sheep counts vary across infection-intensity categories. The farm-specific profile plot shows that farms differ in their baseline count patterns and in how counts are distributed across the ordered intensity scale. These patterns support the use of models that can account for overdispersion and unexplained farm-level heterogeneity.

9 Limitations

This should be presented as a pilot analysis rather than a definitive statewide inference. The dataset is small, farm participation was based on convenience sampling, and management predictors were measured at the farm level. The selected model accounts for several important statistical features, but it cannot fully remove sampling limitations or establish causality.

10 Conclusion

A staged log-linear modeling workflow provided a defensible framework for analyzing aggregated Nematodirus infection-intensity counts in sheep. The multinomial model served as the conceptual starting point, and the equivalent Poisson log-linear model justified the transition into count-based extensions. The selected model was a negative binomial log-linear mixed model with heterogeneous dispersion and a farm-level random intercept. This model balanced fit, parsimony, and biological interpretability better than the simpler Poisson and negative binomial models and better than the added zero-inflated alternative.

11 Reproducibility notes

sessionInfo()
## R version 4.5.0 (2025-04-11 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 11 x64 (build 26200)
## 
## Matrix products: default
##   LAPACK version 3.12.1
## 
## locale:
## [1] LC_COLLATE=English_United States.utf8 
## [2] LC_CTYPE=English_United States.utf8   
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.utf8    
## 
## time zone: America/Chicago
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] ggplot2_3.5.2 knitr_1.50   
## 
## loaded via a namespace (and not attached):
##  [1] gtable_0.3.6        TMB_1.9.17          xfun_0.52          
##  [4] bslib_0.9.0         lattice_0.22-6      tzdb_0.5.0         
##  [7] numDeriv_2016.8-1.1 vctrs_0.7.3         tools_4.5.0        
## [10] Rdpack_2.6.4        generics_0.1.3      parallel_4.5.0     
## [13] sandwich_3.1-1      tibble_3.2.1        pkgconfig_2.0.3    
## [16] Matrix_1.7-3        RColorBrewer_1.1-3  lifecycle_1.0.4    
## [19] compiler_4.5.0      farver_2.1.2        codetools_0.2-20   
## [22] htmltools_0.5.8.1   sass_0.4.10         yaml_2.3.10        
## [25] crayon_1.5.3        furrr_0.4.0         pillar_1.10.2      
## [28] nloptr_2.2.1        jquerylib_0.1.4     tidyr_1.3.1        
## [31] MASS_7.3-65         broom.mixed_0.2.9.7 cachem_1.1.0       
## [34] reformulas_0.4.0    boot_1.3-31         multcomp_1.4-28    
## [37] parallelly_1.47.0   nlme_3.1-168        tidyselect_1.2.1   
## [40] digest_0.6.37       future_1.70.0       mvtnorm_1.3-3      
## [43] listenv_0.9.1       dplyr_1.1.4         purrr_1.2.2        
## [46] labeling_0.4.3      forcats_1.0.0       splines_4.5.0      
## [49] fastmap_1.2.0       grid_4.5.0          cli_3.6.4          
## [52] magrittr_2.0.3      survival_3.8-3      broom_1.0.8        
## [55] TH.data_1.1-3       withr_3.0.2         readr_2.1.5        
## [58] scales_1.4.0        backports_1.5.0     bit64_4.6.0-1      
## [61] estimability_1.5.1  rmarkdown_2.29      globals_0.19.1     
## [64] emmeans_1.11.1      bit_4.6.0           nnet_7.3-20        
## [67] lme4_1.1-37         zoo_1.8-14          hms_1.1.3          
## [70] coda_0.19-4.1       evaluate_1.0.3      glmmTMB_1.1.11     
## [73] rbibutils_2.3       mgcv_1.9-1          rlang_1.2.0        
## [76] Rcpp_1.0.14         xtable_1.8-4        glue_1.8.0         
## [79] vroom_1.6.5         rstudioapi_0.17.1   minqa_1.2.8        
## [82] jsonlite_2.0.0      R6_2.6.1

References

Agresti, Alan. 2013. Categorical Data Analysis. 3rd ed. Hoboken, NJ: Wiley.
Bilder, Christopher R., and Thomas M. Loughin. 2024. Analysis of Categorical Data with r. 2nd ed. Boca Raton, FL: Chapman; Hall/CRC. https://doi.org/10.1201/9781003093091.
Brooks, Mollie E., Kasper Kristensen, Koen J. van Benthem, Arni Magnusson, Casper W. Berg, Anders Nielsen, Hans J. Skaug, Martin Maechler, and Benjamin M. Bolker. 2017. glmmTMB Balances Speed and Flexibility Among Packages for Zero-Inflated Generalized Linear Mixed Modeling.” The R Journal 9 (2): 378–400. https://doi.org/10.32614/RJ-2017-066.
Hilbe, Joseph M. 2011. Negative Binomial Regression. 2nd ed. Cambridge: Cambridge University Press.
Lawless, J. F. 1987. “Negative Binomial and Mixed Poisson Regression.” Canadian Journal of Statistics 15 (3): 209–25. https://doi.org/10.2307/3314912.