Lactation modelling with Stan

Here we are fitting lactation curves for dairy cattle milk production, comparing a non-linear least squares with a Bayesian where there is limited data. The three parameter Wilmink curve is a reasonable option for modelling a cow’s lactation data (Wilmink 1987). It works well for experimental data where we may have measured yields every fortnight, week, or even daily. However, for most commercial farms, we might only have 3 or 4 measurements per year for each cow. In this case, a Bayesian approach makes sense, where the prior distribution for the lactation curve parameters can be determined from high resolution data, then we use the limited commercial cow data to determine posterior estimates for individual commercial cows. Additional restrictions may be applied, such as that the sum of all cows milk production matches the farm milk production, and this matching might be at a daily, monthly or annual level, depending on available data.

data <- read_csv("2003-04StrainProds.csv")
# head(data)
# make date variable
data <- data %>%
  mutate(PeriodFinishDate = dmy(PeriodFinish)) %>% 
  mutate(CowId = as_factor(CowId))

The base data comes from an experiment with 205 spring calving animals, over the 2003-04 production season.

#plot all cows
p <- ggplot(data = data, aes(x = PeriodFinishDate, y = MilkYield, colour = CowId)) +   geom_point()+
  theme_bw()+
  ylab("Milk yield (L for week)")+
  xlab("Date")+
  theme(legend.position = "none")
p
Individual cow milk yield for the week. Different colour denotes a different cow.

Figure 1: Individual cow milk yield for the week. Different colour denotes a different cow.

#unique(data$CowId)
CowIds <- as.vector(unique(data$CowId))
# CowIds.list <- split(CowIds, seq(length(CowIds)))

A sample of eleven cows, taking roughly every 20th cow, was then used for individual plots. Two were subsequently removed for low days in milk (late calving and/or early dryoff).

CowIds.initial.sample <- c(852,1909,9855,9933,9924,9886,9926,9772,9775,1991,1982)

p <- ggplot(data = data %>% filter(CowId %in% CowIds.initial.sample), 
            aes(x = PeriodFinishDate, y = MilkYield, colour = CowId)) + 
  geom_point() +
  geom_smooth()+
  ylab("Milk yield (L for week)")+
  xlab("Date")+
  theme_bw()+  
  theme(legend.position = "none")
#p
s <- p + facet_wrap( ~ CowId)
s
Individual cow milk yield for the week for a sample of cows.

Figure 2: Individual cow milk yield for the week for a sample of cows.

CowIds.sample <- sort(c(852,1909,9855,9933,9924,9886,9926,1991,1982) )#removed low dim/early dryoff, 9772,9775

Data wrangling was performed, including making a days in milk variable and excluding observations with missing milk yields (i.e. NA).

#remove observations with milk yield NA
data <- data %>% 
  filter(!is.na(MilkYield))

# create calving and dryoff dates - nb assumes only one lactation of data
# and add milk yeild per day
# DaysInMilk variable is the days in milk for that weekly period! i.e. 0 to 7.
data <- data %>% 
  mutate(StartOfMilkSupply = PeriodFinishDate - DaysInMilk) %>% 
  mutate(EndOfMilkSupply = PeriodFinishDate -7 + DaysInMilk) %>% 
  mutate(MilkYieldPerDay = MilkYield/DaysInMilk)  #imperfect, date for end of period, not midpoint
  


summary_data <- data %>%    
  group_by(CowId) %>%  #group by lactation if multiple lactations
  summarise(CalvingDate = min(StartOfMilkSupply, na.rm=TRUE), 
            Dryoff = max(EndOfMilkSupply, na.rm=TRUE),
            MaxDIM = Dryoff - CalvingDate)

#join summary to original data
data <- data %>% 
  left_join(summary_data)

#create proper midpoint - WORK IN PROGRESS
data <- data %>% 
  # mutate(PeriodMidDate = mean(c(StartOfMilkSupply),  
  #                            (EndOfMilkSupply))) #this doesn't work, base mean function not vectorised
    mutate(
      MY.DATE1=as.Date(StartOfMilkSupply),
      MY.DATE2=as.Date(EndOfMilkSupply)) %>% 
    rowwise %>%
    mutate(PeriodMidDate=mean.Date(c(MY.DATE1,MY.DATE2))) %>% 
#alternate mid date approach from:
#https://stackoverflow.com/a/26590499/4927395
#not quite right for start and end of lactation!
# date 1 needs to be min of start and calving? 
# date 2 needs to be max of end and dryoff? 
    mutate(DIM = as.numeric(PeriodFinishDate - CalvingDate))
p <- ggplot(data %>% filter(CowId %in% CowIds.sample), 
            aes(x = DIM, y = MilkYieldPerDay, fill = CowId, colour = CowId)) + 
  geom_point() +
  stat_smooth(method = 'nls', formula = y ~ a + b*exp(-.05*x) + c*x, se = FALSE, 
              method.args = list(start=c(a=31,b=-11,c=-.07), control=nls.control(maxiter=200)), colour = "green")+
  #stat_function(fun = fun.1)+
  geom_smooth()
#p
s <- p+ facet_wrap( ~ CowId)
s
Wilmink fitted with non-linear least squares

Figure 3: Wilmink fitted with non-linear least squares

nlsobject <- nls(formula = MilkYieldPerDay  ~ a + b*exp(d*DIM) + c*DIM, 
                 data %>% filter(CowId %in% c(852)),
                 start = c(a=31,b=-11,c=-.07, d=-.05))
                 
summary(nlsobject)
## 
## Formula: MilkYieldPerDay ~ a + b * exp(d * DIM) + c * DIM
## 
## Parameters:
##    Estimate Std. Error t value Pr(>|t|)    
## a 24.103080   0.733282  32.870  < 2e-16 ***
## b -6.553762   3.500877  -1.872   0.0696 .  
## c -0.042360   0.004351  -9.735 1.71e-11 ***
## d -0.117254   0.102211  -1.147   0.2591    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.732 on 35 degrees of freedom
## 
## Number of iterations to convergence: 19 
## Achieved convergence tolerance: 9.975e-06
#tidymodels approach to multiple nls/ols

tidymodel_data <- data %>%
  ungroup() %>% #was rowwise_df, so needed to ungroup(), https://stackoverflow.com/a/33292159/4927395
  select(c("CowId", "DIM", "MilkYieldPerDay")) %>%  #select only essential columns. Data to nest within CowID, because multiple observations for each CowID at each DIM
  nest(yields = c(MilkYieldPerDay, DIM)) %>%
  mutate(model = map(
#         yields, ~ lm(MilkYieldPerDay ~ DIM, 
         yields, ~ nls(formula = MilkYieldPerDay  ~ a + b*exp(-.05*DIM) + c*DIM,
                       start = c(a=31,b=-11,c=-.07),
                    data = .x)))

  
#         yields, ~ nls(formula = MilkYieldPerDay  ~ a + b*exp(d*DIM) + c*DIM, 
#                   start = c(a=31,b=-11,c=-.07, d=-.05)),
# data = .x))

# Next, let’s tidy() those models to get out the coefficients, 
# and adjust the p-values for multiple comparisons while we’re at it.

slopes_db <- tidymodel_data %>%
  mutate(coefs = map(model, tidy)) %>%
  unnest(coefs) 
# %>%
#   filter(term == "year_ending") %>%
#   mutate(p.value = p.adjust(p.value))

pairs_data <- slopes_db %>% select(CowId, term, estimate) %>% 
  pivot_wider(names_from = term, values_from = estimate, id_cols = CowId  )

library(GGally)
ggpairs(pairs_data, columns = c("a","b","c"))

ggsave("wilmink pairs plot.png")

Compare this to the pairwise plots of Wood parameters presented by Græsbøll et al. (2016).

knitr::include_graphics("graesbol_parameters.jpg")
Wood parameters from Graesboll et al. (2016)

Figure 4: Wood parameters from Graesboll et al. (2016)

References

Græsbøll, Kaare, Carsten Kirkeby, Søren Saxmose Nielsen, Tariq Halasa, Nils Toft, and Lasse Engbo Christiansen. 2016. “Models to Estimate Lactation Curves of Milk Yield and Somatic Cell Count in Dairy Cows at the Herd Level for the Use in Simulations and Predictive Models.” Frontiers in Veterinary Science 3 (December). https://doi.org/10.3389/fvets.2016.00115.

Wilmink, J.B.M. 1987. “Comparison of Different Methods of Predicting 305-Day Milk Yield Using Means Calculated from Within-Herd Lactation Curves.” Livestock Production Science 17 (January): 1–17. https://doi.org/10.1016/0301-6226(87)90049-2.