Nematodirus infection intensity in Wyoming sheep is a useful applied setting for comparing categorical and count-based modeling strategies. The response is the number of sheep in ordered infection-intensity categories within each farm. A multinomial model is a natural starting point because the counts are allocated across mutually exclusive categories, but a log-linear formulation is more flexible for modeling overdispersion, farm-level clustering, and zero inflation. This report presents a staged analysis using multinomial, Poisson, negative binomial, heterogeneous-dispersion negative binomial, mixed-effects negative binomial, and zero-inflated negative binomial models. The final selected model was a negative binomial log-linear model with heterogeneous dispersion and a farm-level random intercept.
The main aim is to build a transparent, reproducible modeling workflow for aggregated Nematodirus infection-intensity data. The analysis asks four connected questions:
Gastrointestinal nematode infections are persistent health and production problems in grazing sheep. Infection patterns are shaped by environmental exposure, host immunity, farm management, grazing pressure, and treatment history. Nematodirus is especially relevant in cooler climates because environmental persistence and hatching behavior can produce uneven infection patterns across farms and seasons.
From a statistical perspective, the data are challenging because the counts are grouped by farm and ordered intensity scale. The higher-intensity categories are sparse, and farms may differ in baseline infection intensity for reasons not fully captured by measured predictors. These features motivate a model sequence that begins with multinomial and Poisson log-linear models and then expands to negative binomial, mixed-effects, and zero-inflated formulations (Agresti 2013; Bilder and Loughin 2024; Hilbe 2011; Brooks et al. 2017).
data_path <- "data/work_data.csv"
data_available <- file.exists(data_path)
work_dat <- readr::read_csv(data_path, show_col_types = FALSE)
work_dat$number_deworm_year <- as.numeric(work_dat$number_deworm_year)
work_dat$nem_scaled <- factor(work_dat$nem_scaled)
work_dat$Farms <- factor(work_dat$Farms)
work_sorted <- work_dat[order(work_dat$Farms, work_dat$nem_scaled), ]
work_sorted$nem_scaled <- stats::relevel(work_sorted$nem_scaled, ref = "0")
Expected variables:
| Variable | Role | Notes |
|---|---|---|
| Farms | Farm identifier / grouping variable | Should be anonymized before public posting |
| nem_scaled | Ordered Nematodirus infection-intensity scale | Scale 0 is the reference category |
| frequency | Number of sheep in each farm-by-scale cell | Count response |
| density_per_acre | Farm-level stocking-density predictor | Used in interactions |
| number_deworm_year | Farm-level deworming-frequency predictor | Used in interactions |
Let \(Y_{ij}\) denote the number of sheep from farm \(i\) in intensity category \(j\), and let \(M_i = \sum_j Y_{ij}\) be the total number of sheep represented for farm \(i\). A multinomial model conditions on \(M_i\) and models the category probabilities:
\[ Y_i \mid M_i, \pi_i \sim \text{Multinomial}(M_i, \pi_{i1}, \ldots, \pi_{ic}). \]
This is a useful conceptual starting point, but it is less convenient when the goal is to expand the model to include overdispersion, heterogeneous dispersion, random effects, and zero inflation.
The equivalent Poisson log-linear representation treats each category count as a count response:
\[ Y_{ij} \mid \mu_{ij} \sim \text{Poisson}(\mu_{ij}), \quad \log(\mu_{ij}) = \eta_{ij}. \]
The first Poisson model used farm-specific fixed effects and categorical intensity-scale effects to reproduce the multinomial contrasts. After that equivalence was checked, the scale variable was treated numerically to impose an ordinal trend and reduce the number of parameters.
The negative binomial model relaxes the Poisson mean-variance equality and is a better candidate when parasite counts show aggregation or extra-Poisson variation (Hilbe 2011; Lawless 1987). A farm-level random intercept was then added to account for unexplained between-farm heterogeneity. The final selected model was:
\[ Y_{ij} \sim \text{NB2}(\mu_{ij}, \theta_j), \]
\[ \log(\mu_{ij}) = \beta_0 + \beta_1\text{scale}_{ij} + \beta_2\text{density}_{i} + \beta_3\text{deworm}_{i} + \beta_4(\text{scale}_{ij} \times \text{density}_{i}) + \beta_5(\text{scale}_{ij} \times \text{deworm}_{i}) + \beta_6(\text{density}_{i} \times \text{deworm}_{i}) + \beta_7(\text{scale}_{ij} \times \text{density}_{i} \times \text{deworm}_{i}) + b_i, \]
where \(b_i\) is the farm-level random intercept. Heterogeneous dispersion was modeled as a function of the intensity scale.
ic_f <- function(fit) {
c(`-2 logLik` = as.numeric(-2 * logLik(fit)), AIC = AIC(fit), BIC = BIC(fit))
}
tidy_glmmtmb_conditional <- function(fit) {
broom.mixed::tidy(fit, component = "cond", conf.int = TRUE)
}
multinom_fit <- nnet::multinom(
nem_scaled ~ density_per_acre * number_deworm_year,
weights = frequency,
data = work_sorted,
trace = FALSE
)
ic_f(multinom_fit)
## -2 logLik AIC BIC
## 749.9916 773.9916 820.9212
summary(multinom_fit)
## Call:
## nnet::multinom(formula = nem_scaled ~ density_per_acre * number_deworm_year,
## data = work_sorted, weights = frequency, trace = FALSE)
##
## Coefficients:
## (Intercept) density_per_acre number_deworm_year
## 2 -2.239164 5.993926 0.6027921
## 3 -2.730779 5.916960 0.8727250
## 4 -4.069314 4.006807 0.7628583
## density_per_acre:number_deworm_year
## 2 -3.138593
## 3 -3.068355
## 4 -1.864561
##
## Std. Errors:
## (Intercept) density_per_acre number_deworm_year
## 2 0.3941462 1.827218 0.2286101
## 3 0.4593893 1.918708 0.2586518
## 4 0.9236870 3.641792 0.5141089
## density_per_acre:number_deworm_year
## 2 0.9412728
## 3 0.9803509
## 4 1.8280003
##
## Residual Deviance: 749.9916
## AIC: 773.9916
poisson_equiv <- glm(
frequency ~ Farms + nem_scaled * density_per_acre * number_deworm_year,
family = poisson(link = "log"),
data = work_sorted
)
ic_f(poisson_equiv)
## -2 logLik AIC BIC
## 184.3247 230.3247 271.3611
summary(poisson_equiv)
##
## Call:
## glm(formula = frequency ~ Farms + nem_scaled * density_per_acre *
## number_deworm_year, family = poisson(link = "log"), data = work_sorted)
##
## Coefficients: (3 not defined because of singularities)
## Estimate Std. Error z value
## (Intercept) 4.4841 0.1031 43.492
## Farmsb -3.1280 0.5062 -6.179
## Farmsc -0.6158 0.1775 -3.470
## Farmsd -2.2133 0.2862 -7.733
## Farmse -1.7528 0.2366 -7.408
## Farmsf -2.5128 0.3082 -8.154
## Farmsg -3.2060 0.4223 -7.593
## Farmsh -2.9021 0.3730 -7.781
## Farmsi -1.7193 0.2281 -7.538
## Farmsj -2.5925 0.3960 -6.546
## Farmsk -1.5668 0.2090 -7.499
## nem_scaled2 -2.2391 0.3941 -5.681
## nem_scaled3 -2.7308 0.4594 -5.944
## nem_scaled4 -4.0693 0.9237 -4.406
## density_per_acre NA NA NA
## number_deworm_year NA NA NA
## nem_scaled2:density_per_acre 5.9937 1.8272 3.280
## nem_scaled3:density_per_acre 5.9172 1.9187 3.084
## nem_scaled4:density_per_acre 4.0070 3.6417 1.100
## nem_scaled2:number_deworm_year 0.6027 0.2286 2.637
## nem_scaled3:number_deworm_year 0.8727 0.2587 3.374
## nem_scaled4:number_deworm_year 0.7628 0.5141 1.484
## density_per_acre:number_deworm_year NA NA NA
## nem_scaled2:density_per_acre:number_deworm_year -3.1384 0.9413 -3.334
## nem_scaled3:density_per_acre:number_deworm_year -3.0684 0.9803 -3.130
## nem_scaled4:density_per_acre:number_deworm_year -1.8646 1.8280 -1.020
## Pr(>|z|)
## (Intercept) < 2e-16 ***
## Farmsb 6.43e-10 ***
## Farmsc 0.000520 ***
## Farmsd 1.05e-14 ***
## Farmse 1.28e-13 ***
## Farmsf 3.51e-16 ***
## Farmsg 3.14e-14 ***
## Farmsh 7.22e-15 ***
## Farmsi 4.79e-14 ***
## Farmsj 5.90e-11 ***
## Farmsk 6.45e-14 ***
## nem_scaled2 1.34e-08 ***
## nem_scaled3 2.77e-09 ***
## nem_scaled4 1.05e-05 ***
## density_per_acre NA
## number_deworm_year NA
## nem_scaled2:density_per_acre 0.001037 **
## nem_scaled3:density_per_acre 0.002043 **
## nem_scaled4:density_per_acre 0.271207
## nem_scaled2:number_deworm_year 0.008376 **
## nem_scaled3:number_deworm_year 0.000740 ***
## nem_scaled4:number_deworm_year 0.137874
## density_per_acre:number_deworm_year NA
## nem_scaled2:density_per_acre:number_deworm_year 0.000855 ***
## nem_scaled3:density_per_acre:number_deworm_year 0.001748 **
## nem_scaled4:density_per_acre:number_deworm_year 0.307708
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 751.130 on 43 degrees of freedom
## Residual deviance: 74.296 on 21 degrees of freedom
## AIC: 230.32
##
## Number of Fisher Scoring iterations: 7
poisson_trend <- glm(
frequency ~ as.numeric(nem_scaled) * density_per_acre * number_deworm_year,
family = poisson(link = "log"),
data = work_sorted
)
ic_f(poisson_trend)
## -2 logLik AIC BIC
## 431.4898 447.4898 461.7634
summary(poisson_trend)
##
## Call:
## glm(formula = frequency ~ as.numeric(nem_scaled) * density_per_acre *
## number_deworm_year, family = poisson(link = "log"), data = work_sorted)
##
## Coefficients:
## Estimate Std. Error
## (Intercept) 4.7768 0.3562
## as.numeric(nem_scaled) -1.6718 0.2140
## density_per_acre -4.0889 1.3841
## number_deworm_year -0.1854 0.2344
## as.numeric(nem_scaled):density_per_acre 2.0585 0.6278
## as.numeric(nem_scaled):number_deworm_year 0.5547 0.1321
## density_per_acre:number_deworm_year 1.3943 0.7383
## as.numeric(nem_scaled):density_per_acre:number_deworm_year -1.0783 0.3398
## z value Pr(>|z|)
## (Intercept) 13.412 < 2e-16 ***
## as.numeric(nem_scaled) -7.812 5.65e-15 ***
## density_per_acre -2.954 0.00313 **
## number_deworm_year -0.791 0.42883
## as.numeric(nem_scaled):density_per_acre 3.279 0.00104 **
## as.numeric(nem_scaled):number_deworm_year 4.199 2.68e-05 ***
## density_per_acre:number_deworm_year 1.888 0.05897 .
## as.numeric(nem_scaled):density_per_acre:number_deworm_year -3.173 0.00151 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 751.13 on 43 degrees of freedom
## Residual deviance: 321.46 on 36 degrees of freedom
## AIC: 447.49
##
## Number of Fisher Scoring iterations: 7
nb_fit <- glmmTMB::glmmTMB(
frequency ~ as.numeric(nem_scaled) * density_per_acre * number_deworm_year,
data = work_sorted,
family = glmmTMB::nbinom2
)
ic_f(nb_fit)
## -2 logLik AIC BIC
## 237.7496 255.7496 271.8073
summary(nb_fit)
## Family: nbinom2 ( log )
## Formula:
## frequency ~ as.numeric(nem_scaled) * density_per_acre * number_deworm_year
## Data: work_sorted
##
## AIC BIC logLik -2*log(L) df.resid
## 255.7 271.8 -118.9 237.7 35
##
##
## Dispersion parameter for nbinom2 family (): 0.636
##
## Conditional model:
## Estimate Std. Error
## (Intercept) 4.8630 1.4279
## as.numeric(nem_scaled) -1.4091 0.5899
## density_per_acre -4.1564 6.2280
## number_deworm_year -0.4290 0.8645
## as.numeric(nem_scaled):density_per_acre 1.6951 2.5989
## as.numeric(nem_scaled):number_deworm_year 0.3477 0.3520
## density_per_acre:number_deworm_year 1.7221 3.1183
## as.numeric(nem_scaled):density_per_acre:number_deworm_year -0.8052 1.3004
## z value Pr(>|z|)
## (Intercept) 3.406 0.00066 ***
## as.numeric(nem_scaled) -2.389 0.01690 *
## density_per_acre -0.667 0.50453
## number_deworm_year -0.496 0.61974
## as.numeric(nem_scaled):density_per_acre 0.652 0.51423
## as.numeric(nem_scaled):number_deworm_year 0.988 0.32329
## density_per_acre:number_deworm_year 0.552 0.58078
## as.numeric(nem_scaled):density_per_acre:number_deworm_year -0.619 0.53579
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
nb_disp_fit <- glmmTMB::glmmTMB(
frequency ~ as.numeric(nem_scaled) * density_per_acre * number_deworm_year,
dispformula = ~ as.numeric(nem_scaled),
data = work_sorted,
family = glmmTMB::nbinom2
)
ic_f(nb_disp_fit)
## -2 logLik AIC BIC
## 224.7915 244.7915 262.6334
summary(nb_disp_fit)
## Family: nbinom2 ( log )
## Formula:
## frequency ~ as.numeric(nem_scaled) * density_per_acre * number_deworm_year
## Dispersion: ~as.numeric(nem_scaled)
## Data: work_sorted
##
## AIC BIC logLik -2*log(L) df.resid
## 244.8 262.6 -112.4 224.8 34
##
##
## Conditional model:
## Estimate Std. Error
## (Intercept) 5.1250 1.0252
## as.numeric(nem_scaled) -1.5698 0.6193
## density_per_acre -6.9256 4.6670
## number_deworm_year -0.5252 0.6142
## as.numeric(nem_scaled):density_per_acre 3.2389 2.8686
## as.numeric(nem_scaled):number_deworm_year 0.4056 0.3609
## density_per_acre:number_deworm_year 3.1263 2.3380
## as.numeric(nem_scaled):density_per_acre:number_deworm_year -1.5878 1.4345
## z value Pr(>|z|)
## (Intercept) 4.999 5.76e-07 ***
## as.numeric(nem_scaled) -2.535 0.0112 *
## density_per_acre -1.484 0.1378
## number_deworm_year -0.855 0.3926
## as.numeric(nem_scaled):density_per_acre 1.129 0.2589
## as.numeric(nem_scaled):number_deworm_year 1.124 0.2610
## density_per_acre:number_deworm_year 1.337 0.1812
## as.numeric(nem_scaled):density_per_acre:number_deworm_year -1.107 0.2683
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Dispersion model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.8058 0.6406 2.819 0.004821 **
## as.numeric(nem_scaled) -0.9919 0.2706 -3.666 0.000247 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
final_fit <- glmmTMB::glmmTMB(
frequency ~ as.numeric(nem_scaled) * density_per_acre * number_deworm_year + (1 | Farms),
dispformula = ~ as.numeric(nem_scaled),
data = work_sorted,
family = glmmTMB::nbinom2
)
ic_f(final_fit)
## -2 logLik AIC BIC
## 212.0262 234.0262 253.6523
summary(final_fit)
## Family: nbinom2 ( log )
## Formula:
## frequency ~ as.numeric(nem_scaled) * density_per_acre * number_deworm_year +
## (1 | Farms)
## Dispersion: ~as.numeric(nem_scaled)
## Data: work_sorted
##
## AIC BIC logLik -2*log(L) df.resid
## 234.0 253.7 -106.0 212.0 33
##
## Random effects:
##
## Conditional model:
## Groups Name Variance Std.Dev.
## Farms (Intercept) 0.4596 0.678
## Number of obs: 44, groups: Farms, 11
##
## Conditional model:
## Estimate Std. Error
## (Intercept) 5.3010 0.8413
## as.numeric(nem_scaled) -1.7399 0.4140
## density_per_acre -8.0073 3.5461
## number_deworm_year -0.7212 0.5562
## as.numeric(nem_scaled):density_per_acre 4.4917 1.8648
## as.numeric(nem_scaled):number_deworm_year 0.3519 0.2527
## density_per_acre:number_deworm_year 3.7401 1.7838
## as.numeric(nem_scaled):density_per_acre:number_deworm_year -2.2091 0.9352
## z value Pr(>|z|)
## (Intercept) 6.301 2.96e-10 ***
## as.numeric(nem_scaled) -4.202 2.64e-05 ***
## density_per_acre -2.258 0.0239 *
## number_deworm_year -1.297 0.1948
## as.numeric(nem_scaled):density_per_acre 2.409 0.0160 *
## as.numeric(nem_scaled):number_deworm_year 1.393 0.1637
## density_per_acre:number_deworm_year 2.097 0.0360 *
## as.numeric(nem_scaled):density_per_acre:number_deworm_year -2.362 0.0182 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Dispersion model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 5.085 1.857 2.738 0.00619 **
## as.numeric(nem_scaled) -1.796 0.593 -3.029 0.00245 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
zinb_fit <- glmmTMB::glmmTMB(
frequency ~ as.numeric(nem_scaled) * density_per_acre * number_deworm_year + (1 | Farms),
ziformula = ~ as.numeric(nem_scaled),
data = work_sorted,
family = glmmTMB::nbinom2
)
ic_f(zinb_fit)
## -2 logLik AIC BIC
## 214.8601 238.8601 260.2704
summary(zinb_fit)
## Family: nbinom2 ( log )
## Formula:
## frequency ~ as.numeric(nem_scaled) * density_per_acre * number_deworm_year +
## (1 | Farms)
## Zero inflation: ~as.numeric(nem_scaled)
## Data: work_sorted
##
## AIC BIC logLik -2*log(L) df.resid
## 238.9 260.3 -107.4 214.9 32
##
## Random effects:
##
## Conditional model:
## Groups Name Variance Std.Dev.
## Farms (Intercept) 0.4102 0.6405
## Number of obs: 44, groups: Farms, 11
##
## Dispersion parameter for nbinom2 family (): 5.58
##
## Conditional model:
## Estimate Std. Error
## (Intercept) 4.4774 0.8998
## as.numeric(nem_scaled) -1.0489 0.3840
## density_per_acre -5.6644 3.4206
## number_deworm_year -0.4946 0.5843
## as.numeric(nem_scaled):density_per_acre 2.5222 1.4125
## as.numeric(nem_scaled):number_deworm_year 0.2046 0.2175
## density_per_acre:number_deworm_year 2.5497 1.7217
## as.numeric(nem_scaled):density_per_acre:number_deworm_year -1.2294 0.7065
## z value Pr(>|z|)
## (Intercept) 4.976 6.49e-07 ***
## as.numeric(nem_scaled) -2.732 0.0063 **
## density_per_acre -1.656 0.0977 .
## number_deworm_year -0.847 0.3972
## as.numeric(nem_scaled):density_per_acre 1.786 0.0742 .
## as.numeric(nem_scaled):number_deworm_year 0.941 0.3468
## density_per_acre:number_deworm_year 1.481 0.1386
## as.numeric(nem_scaled):density_per_acre:number_deworm_year -1.740 0.0818 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Zero-inflation model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.9368 1.6720 -2.953 0.00315 **
## as.numeric(nem_scaled) 1.4317 0.5207 2.749 0.00597 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
| Model | AIC | BIC | logLik | Selected |
|---|---|---|---|---|
| Poisson trend model | 447.49 | NA | NA | No |
| Negative binomial | 255.70 | 271.8 | -118.9 | No |
| NB + heterogeneous dispersion | 244.80 | 262.6 | -112.4 | No |
| NB + heterogeneous dispersion + random intercept | 234.00 | 253.7 | -106.0 | Yes |
| Zero-inflated NB mixed model | 238.90 | 260.3 | -107.4 | No |
The negative binomial model with heterogeneous dispersion and a farm-level random intercept had the lowest AIC and was selected as the final model. The zero-inflated mixed model was biologically plausible, but it did not improve AIC or BIC enough to justify replacing the selected model.
| Term | Estimate | SE | z | p_value | CI_2.5 | CI_97.5 |
|---|---|---|---|---|---|---|
| Intercept | 5.3010 | 0.8413 | 6.3010 | <0.001 | 3.6521 | 6.9499 |
| scale | -1.7399 | 0.4140 | -4.2024 | <0.001 | -2.5514 | -0.9284 |
| density | -8.0073 | 3.5461 | -2.2581 | 0.024 | -14.9575 | -1.0572 |
| deworm | -0.7212 | 0.5562 | -1.2965 | 0.195 | -1.8113 | 0.3690 |
| scale:density | 4.4917 | 1.8648 | 2.4087 | 0.016 | 0.8368 | 8.1466 |
| scale:deworm | 0.3519 | 0.2527 | 1.3927 | 0.164 | -0.1433 | 0.8471 |
| density:deworm | 3.7401 | 1.7838 | 2.0967 | 0.036 | 0.2439 | 7.2364 |
| scale:density:deworm | -2.2091 | 0.9352 | -2.3621 | 0.018 | -4.0421 | -0.3761 |
Random effect and dispersion components:
| Component | Estimate | SE | p_value |
|---|---|---|---|
| Farm random intercept variance | 0.4596 | NA | NA |
| Farm random intercept SD | 0.6780 | NA | NA |
| Dispersion intercept | 5.0850 | 1.857 | 0.006 |
| Dispersion scale coefficient | -1.7960 | 0.593 | 0.002 |
The selected model suggests that the mean count across infection-intensity scale categories is not adequately described by a simple Poisson process. The large AIC reduction from the Poisson trend model to the negative binomial models shows that overdispersion mattered. Adding heterogeneous dispersion further improved model fit, meaning that extra-Poisson variation was not constant across the infection-intensity scale. Adding a farm-level random intercept improved the model again, supporting the presence of unexplained between-farm heterogeneity.
In the final model, the coefficient for the infection scale was negative, which is consistent with fewer animals appearing in higher-intensity categories. Density and the scale-by-density interaction were important, and the three-way interaction among scale, density, and deworming frequency was also retained in the final model. Because the model contains interactions, single main-effect coefficients should not be interpreted in isolation. The biological meaning depends on the combined values of infection scale, stocking density, and deworming frequency.
Before fitting the candidate models, the observed count distribution was examined across the Nematodirus infection-intensity scale. These plots were used to assess sparsity across higher-intensity categories, between-farm heterogeneity, and the overall count pattern motivating count-based models.
library(ggplot2)
ggplot(work_sorted, aes(x = nem_scaled, y = frequency)) +
geom_boxplot() +
labs(
x = "Nematodirus infection-intensity scale",
y = "Observed frequency",
title = "Distribution of sheep counts by intensity scale"
) +
theme_minimal()
Figure 1: Observed frequency by Nematodirus infection-intensity scale.
ggplot(work_sorted, aes(x = nem_scaled, y = frequency, group = Farms)) +
geom_line(alpha = 0.5) +
geom_point() +
labs(
x = "Nematodirus infection-intensity scale",
y = "Observed frequency",
title = "Farm-specific observed count profiles"
) +
theme_minimal()
Figure 1: Farm-specific observed count profiles across Nematodirus infection-intensity scale.
The boxplot summarizes how observed sheep counts vary across infection-intensity categories. The farm-specific profile plot shows that farms differ in their baseline count patterns and in how counts are distributed across the ordered intensity scale. These patterns support the use of models that can account for overdispersion and unexplained farm-level heterogeneity.
This should be presented as a pilot analysis rather than a definitive statewide inference. The dataset is small, farm participation was based on convenience sampling, and management predictors were measured at the farm level. The selected model accounts for several important statistical features, but it cannot fully remove sampling limitations or establish causality.
A staged log-linear modeling workflow provided a defensible framework for analyzing aggregated Nematodirus infection-intensity counts in sheep. The multinomial model served as the conceptual starting point, and the equivalent Poisson log-linear model justified the transition into count-based extensions. The selected model was a negative binomial log-linear mixed model with heterogeneous dispersion and a farm-level random intercept. This model balanced fit, parsimony, and biological interpretability better than the simpler Poisson and negative binomial models and better than the added zero-inflated alternative.
sessionInfo()
## R version 4.5.0 (2025-04-11 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 11 x64 (build 26200)
##
## Matrix products: default
## LAPACK version 3.12.1
##
## locale:
## [1] LC_COLLATE=English_United States.utf8
## [2] LC_CTYPE=English_United States.utf8
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.utf8
##
## time zone: America/Chicago
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] ggplot2_3.5.2 knitr_1.50
##
## loaded via a namespace (and not attached):
## [1] gtable_0.3.6 TMB_1.9.17 xfun_0.52
## [4] bslib_0.9.0 lattice_0.22-6 tzdb_0.5.0
## [7] numDeriv_2016.8-1.1 vctrs_0.7.3 tools_4.5.0
## [10] Rdpack_2.6.4 generics_0.1.3 parallel_4.5.0
## [13] sandwich_3.1-1 tibble_3.2.1 pkgconfig_2.0.3
## [16] Matrix_1.7-3 RColorBrewer_1.1-3 lifecycle_1.0.4
## [19] compiler_4.5.0 farver_2.1.2 codetools_0.2-20
## [22] htmltools_0.5.8.1 sass_0.4.10 yaml_2.3.10
## [25] crayon_1.5.3 furrr_0.4.0 pillar_1.10.2
## [28] nloptr_2.2.1 jquerylib_0.1.4 tidyr_1.3.1
## [31] MASS_7.3-65 broom.mixed_0.2.9.7 cachem_1.1.0
## [34] reformulas_0.4.0 boot_1.3-31 multcomp_1.4-28
## [37] parallelly_1.47.0 nlme_3.1-168 tidyselect_1.2.1
## [40] digest_0.6.37 future_1.70.0 mvtnorm_1.3-3
## [43] listenv_0.9.1 dplyr_1.1.4 purrr_1.2.2
## [46] labeling_0.4.3 forcats_1.0.0 splines_4.5.0
## [49] fastmap_1.2.0 grid_4.5.0 cli_3.6.4
## [52] magrittr_2.0.3 survival_3.8-3 broom_1.0.8
## [55] TH.data_1.1-3 withr_3.0.2 readr_2.1.5
## [58] scales_1.4.0 backports_1.5.0 bit64_4.6.0-1
## [61] estimability_1.5.1 rmarkdown_2.29 globals_0.19.1
## [64] emmeans_1.11.1 bit_4.6.0 nnet_7.3-20
## [67] lme4_1.1-37 zoo_1.8-14 hms_1.1.3
## [70] coda_0.19-4.1 evaluate_1.0.3 glmmTMB_1.1.11
## [73] rbibutils_2.3 mgcv_1.9-1 rlang_1.2.0
## [76] Rcpp_1.0.14 xtable_1.8-4 glue_1.8.0
## [79] vroom_1.6.5 rstudioapi_0.17.1 minqa_1.2.8
## [82] jsonlite_2.0.0 R6_2.6.1