Egg Layers #3

Introduction

This is a continuation of the time-series of egg laying hens in the U.S. due to the current avian influenza (HPAI) outbreak in 2022.

While the data will not show this decrease because it is yet not available, I will attempt to show potential models showing the behavior going forward. In this quick exercise I will try both approaches, through tsibble and fable, and the ts and forecast ways.

The main intention is to show how some of these trends can help forecast and anticipate price movements. The assumption is that, when there are shocks to one variable (layers), there will be responses in another (price).

This would intuitively call for a VAR model, or possibly just a linear regression with some lags, which essentially that is what VAR is. We also looked into ETS and ARIMA forecasts.

# Bring in the Data

layers_price <- read.csv("layers_price.csv")

# Convert to tsibble
layers_price_tb <-
  layers_price %>%
  mutate(Month = yearmonth(Month)) %>%
  as_tsibble(index = Month)

# split layers and egg prices

layers_tb <- layers_price_tb %>%
  dplyr::select(Month, layers)

Price_tb <- layers_price_tb %>%
  dplyr::select(Month, Price)

############################
#Convert to ts atomic vector
layers_ts <-read.csv("layers_price_ts.csv")
layers_ts$Date <-as.Date(layers_ts$Date)
layers_ts_m <-ts(layers_ts, start = 1, end = 97, frequency = 12)

# # Excel, dates not showing up
# library(readxl)
# layers_ts_xl <- read_excel("layers_price_ts.xlsx")


#inspect data
head(layers_price_tb, 3)%>%
  kbl()%>%
  kable_material(c("striped","hover", full_width = T))
Month Price layers
2014 Jan 1.307727 304.292
2014 Feb 1.552000 302.343
2014 Mar 1.546191 303.008
head(layers_ts, 3) %>%
  kbl()%>%
  kable_material(c("striped","hover", full_width = T))
Date Price layers
2014-01-01 1.307727 304.292
2014-02-01 1.552000 302.343
2014-03-01 1.546191 303.008

Data Exploration

First, we will look at what we have, and plot it.

We can see that egg laying hens are relatively seasonal, and with cycles. Without having to decompose the data, we ran the stationarity tests after simply to assess these assumptions.

# Egg Layers

layers_plot <-
  autoplot(layers_tb, layers) +
  labs(title = "Layers"
       ,x = "Date"
       ,y = "Millions of Egg Laying Hens") + 
  theme_ipsum_ps() +
  geom_smooth(formula = y ~ x, method = "loess", color = "orange", size = 0.3)

Price_plot <-
  autoplot(Price_tb, Price) +
  labs(title = "Egg Price"
       ,x = "Date"
       ,y = "$USD/dzn") + 
  theme_ipsum_ps() +
  geom_smooth(formula = y ~ x, method = "loess", color = "orange", size = 0.3)

grid.arrange(layers_plot,
               Price_plot, nrow = 1, ncol = 2)

#### Plot with TS

layers_ts1 <-ts(subset(layers_ts, select = "layers", frequency =12))
Price_ts1 <-ts(subset(layers_ts, select = "Price", frequency =12))

ggtsdisplay(layers_ts1, main = "Egg Layers")

ggtsdisplay(Price_ts1, main = "Egg Price")

Stationarity Layers

One of the key elements for time-series, is testing for stationarity, and whether we should difference them. If so, it is also important to know how many differences we need to implement.

Layers was not stationary, so we needed to calculate the differences. The optimal differences, appeared to be just one, despite some of the seasonality that we saw, and the ACF and PACF showed that maybe, 6 or 12 lags could be considered as a judment call, if anything.

#with tsibble

#Layers
#Do a Unit Root Test from original data

layers_tb %>%
  features(layers, unitroot_kpss) %>%
  kbl()%>%
  kable_material(c("striped","hover", full_width = T))
kpss_stat kpss_pvalue
1.520376 0.01
paste("Number of Differences = ", ndiffs(layers_tb$layers)) %>%
  kbl()%>%
  kable_material(c("striped","hover", full_width = T))
x
Number of Differences = 1
# Number of Diff is 1
layers_tb %>%
  gg_tsdisplay(difference(layers), plot_type = 'partial')+
  labs(title = "Difference of 1, Egg laying Hens")

layers_diff1 <-
  layers_tb$layers %>%
  diff() %>%
  unitroot_kpss

layers_diff1 %>%
  kbl()%>%
  kable_material(c("striped","hover", full_width = T))
x
kpss_stat 0.0599831
kpss_pvalue 0.1000000

Stationarity Price

Price resulted to be almost stationary, but we went ahead and differenced it anyway since our initial kpss test did not return a 1. if anything, it appeared to have a downward trend overall, so even an eye test would have suggested that we needed to difference the entire data series, or split it in case we assumed or saw structural changes.

# Perform the same with Price

Price_tb %>%
  features(Price, unitroot_kpss) %>%
  kbl()%>%
  kable_material(c("striped","hover", full_width = T))
kpss_stat kpss_pvalue
0.3972041 0.0783603
paste("Number of Differences = ", ndiffs(layers_tb$Price)) %>%
  kbl()%>%
  kable_material(c("striped","hover", full_width = T))
x
Number of Differences = 0
# Number of Diff is 1
Price_tb %>%
  gg_tsdisplay(difference(Price), plot_type = 'partial')+
  labs(title = "Difference of 1, Egg Price")

Price_diff1 <-
  Price_tb$Price %>%
  diff() %>%
  unitroot_kpss

Price_diff1 %>%
  kbl()%>%
  kable_material(c("striped","hover", full_width = T))
x
kpss_stat 0.0407572
kpss_pvalue 0.1000000

ARIMA and ETS

Several models were tested and at different dates. The accuracy results are below.

################
# Cases
holts_layers <- layers_tb

fix_lay <- holts_layers %>%
  filter_index("2017 Jan" ~ "2020 Dec")

#models

fit_lay <- 
  fix_lay %>%
  model( "ETS_l_ANN" = ETS(layers ~ error("A") + trend("N") + season("N"))
        , "ETS_l_AAN" = ETS(layers ~ error("A") + trend("A") + season("N"))
        , "ETS_l_AAA" = ETS(layers ~ error("A") + trend("A") + season("A"))
        , "ETS_l_AAdA" = ETS(layers ~ error("A") + trend("Ad") + season("A")) #damped
        , "ETS_l_auto" = ETS(layers)
        , "ARIMA_l1" = ARIMA(layers ~ pdq(2, 1, 0) + PDQ(0, 1, 1))
        , "ARIMA_l2" = ARIMA(layers ~ pdq(0, 1, 3) + PDQ(0, 1, 1))
        , "ARIMA_l_auto" = ARIMA(layers)
        , "search_arima_l" = ARIMA(layers, stepwise = FALSE) 
        ) 

################
# Price

holts_Price <- Price_tb

fix_Price <- holts_Price %>%
  filter_index("2017 Jan" ~ "2021 Dec")

#models

fit_Price <- 
  fix_Price %>%
  model( "ETS_p_ANN" = ETS(Price ~ error("A") + trend("N") + season("N"))
        , "ETS_p_AAN" = ETS(Price ~ error("A") + trend("A") + season("N"))
        , "ETS_p_AAA" = ETS(Price ~ error("A") + trend("A") + season("A"))
        , "ETS_p_AAdA" = ETS(Price ~ error("A") + trend("Ad") + season("A")) #damped
        , "ETS_p_auto" = ETS(Price)
        , "ARIMA_p1" = ARIMA(Price ~ pdq(0, 1, 2) + PDQ(0, 1, 1))
        , "ARIMA_p2" = ARIMA(Price ~ pdq(2, 1, 0) + PDQ(0, 1, 1))
        , "ARIMA_p_auto" = ARIMA(Price)
        , "search_arima_p" = ARIMA(Price, stepwise = FALSE) 
        ) 

# forecasts
fc_lay <- 
  fit_lay %>%
  forecast(h=28)

fc_Price <-
  fit_Price %>%
  forecast(h=28)

models_comp <-
bind_rows(
  fit_lay %>% accuracy()
  ,fit_Price %>% accuracy()
  ,fc_lay %>% accuracy(layers_tb)
  ,fc_Price %>% accuracy(Price_tb)
)


models_comp %>%
  dplyr::select(
    .model
    ,.type
    ,RMSE
    ,RMSSE
    ,MASE
    ,ACF1
  ) %>%
  kbl()%>%
  kable_material(c("striped","hover", full_width = T))
.model .type RMSE RMSSE MASE ACF1
ETS_l_ANN Training 2.9357636 0.2525375 0.2212311 0.4050668
ETS_l_AAN Training 2.9403275 0.2529301 0.2133396 0.0310632
ETS_l_AAA Training 1.8525937 0.1593621 0.1396612 0.1582120
ETS_l_AAdA Training 1.7425995 0.1499003 0.1342696 0.1211019
ETS_l_auto Training 2.9357612 0.2525373 0.2212486 0.4050473
ARIMA_l1 Training 1.9331826 0.1662944 0.1233226 0.0873290
ARIMA_l2 Training 1.7229168 0.1482071 0.1086727 -0.1474021
ARIMA_l_auto Training 2.2114508 0.1902314 0.1409037 -0.0650330
search_arima_l Training 1.9000125 0.1634411 0.1200104 -0.0113590
ETS_p_ANN Training 0.3091419 0.5711014 0.5376700 0.1402496
ETS_p_AAN Training 0.3087124 0.5703081 0.5355288 0.1451141
ETS_p_AAA Training 0.2331716 0.4307558 0.4742422 0.2138358
ETS_p_AAdA Training 0.2312894 0.4272787 0.4619095 0.2284907
ETS_p_auto Training 0.2196435 0.4057643 0.4242328 0.1293067
ARIMA_p1 Training 0.2044691 0.3777314 0.3549140 -0.0354985
ARIMA_p2 Training 0.1979583 0.3657036 0.3526325 -0.0001302
ARIMA_p_auto Training 0.2569413 0.4746673 0.4551089 -0.0061306
search_arima_p Training 0.2501069 0.4620416 0.4445228 0.0638146
ARIMA_l_auto Test 12.9682971 0.8073227 0.9196309 0.5342044
ARIMA_l1 Test 3.9272902 0.2444878 0.2448780 0.4318394
ARIMA_l2 Test 4.4564505 0.2774299 0.2878330 0.3818194
ETS_l_AAA Test 3.6442281 0.2268662 0.2279684 0.6481309
ETS_l_AAdA Test 3.0799670 0.1917389 0.1976347 0.4639456
ETS_l_AAN Test 18.1114872 1.1275046 1.2535791 0.7393612
ETS_l_ANN Test 5.2015131 0.3238127 0.2881744 0.7332925
ETS_l_auto Test 5.2015155 0.3238129 0.2881746 0.7332925
search_arima_l Test 4.2370846 0.2637736 0.2733021 0.3889669
ARIMA_p_auto Test 0.0519851 0.0767901 0.1024146 NA
ARIMA_p1 Test 0.0779886 0.1152014 0.1536435 NA
ARIMA_p2 Test 0.0890442 0.1315321 0.1754236 NA
ETS_p_AAA Test 0.1457752 0.2153327 0.2871880 NA
ETS_p_AAdA Test 0.1182866 0.1747278 0.2330335 NA
ETS_p_AAN Test 0.0738097 0.1090285 0.1454107 NA
ETS_p_ANN Test 0.0614655 0.0907942 0.1210917 NA
ETS_p_auto Test 0.1643640 0.2427913 0.3238093 NA
search_arima_p Test 0.0295328 0.0436245 0.0581818 NA

ARIMA and ETS plots

An eye test is always helpful and then cross compare it with the de accuracy tests, and then make a decision. All the models were tested at the tail end of the time-series.

# Univariate for now

fc_lay %>%
  autoplot(layers_tb, level = 0, size = 0.5) +
  labs(title = "Egg Layers"
       ,y = "Millions") +
  theme_ipsum_es()

fc_Price %>%
  autoplot(Price_tb, level = 0, size = 0.5) +
  labs(title = "Egg Layers"
       ,y = "Millions") +
  theme_ipsum_es()

VAR

We naturally ran a VAR model under the assumption that a supply shock in egg laying hens would cause prices to spike. After running several models and lags, we found that the optimal lags were 4 months, in which price was the response variable. Although the residuals were not normally distributed, the VAR model provided good results without serial correlation.

The data set I used has not been updated in months, and the most recent data suggests the models tested were not terribly outrageous.

When we ran the irf, the response variable, in this case price, suggests price is negatively correlated by about 5 to 15 percent for every 1 percent change in layers, and the impact would peak 3 months later.

#VAR model

ggplot( data = layers_price_tb) +
  geom_point(mapping = aes(x = layers,y = Price)) +
  ggtitle("Scatter Plot Egg Layers vs. price")

layers_tsvar <-ts(layers_ts$layers, start = 1, frequency = 12)
Price_tsvar <-ts(layers_ts$Price, start = 1, frequency = 12)

#check the optimal lags
var_lag_df <-cbind(layers_tsvar, Price_tsvar)

lagselect <-VARselect(var_lag_df, lag.max = 12, type = "const")
lagselect$selection
## AIC(n)  HQ(n)  SC(n) FPE(n) 
##      4      2      2      4
#Build Model
fit_var <-VAR(var_lag_df, p = 4, type = "const", season = NULL, exogen = NULL)
summary(fit_var)
## 
## VAR Estimation Results:
## ========================= 
## Endogenous variables: layers_tsvar, Price_tsvar 
## Deterministic variables: const 
## Sample size: 93 
## Log Likelihood: -255.989 
## Roots of the characteristic polynomial:
## 0.8796 0.7239 0.7239 0.6756 0.6756 0.4792 0.3403 0.3403
## Call:
## VAR(y = var_lag_df, p = 4, type = "const", exogen = NULL)
## 
## 
## Estimation results for equation layers_tsvar: 
## ============================================= 
## layers_tsvar = layers_tsvar.l1 + Price_tsvar.l1 + layers_tsvar.l2 + Price_tsvar.l2 + layers_tsvar.l3 + Price_tsvar.l3 + layers_tsvar.l4 + Price_tsvar.l4 + const 
## 
##                 Estimate Std. Error t value Pr(>|t|)    
## layers_tsvar.l1  1.34280    0.10728  12.517   <2e-16 ***
## Price_tsvar.l1   0.60009    1.39927   0.429   0.6691    
## layers_tsvar.l2 -0.33580    0.17734  -1.893   0.0617 .  
## Price_tsvar.l2   0.32783    1.72599   0.190   0.8498    
## layers_tsvar.l3 -0.05006    0.17774  -0.282   0.7789    
## Price_tsvar.l3  -3.82224    1.71795  -2.225   0.0288 *  
## layers_tsvar.l4 -0.01200    0.10944  -0.110   0.9130    
## Price_tsvar.l4   2.43646    1.42906   1.705   0.0919 .  
## const           18.15537   10.30804   1.761   0.0818 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 3.662 on 84 degrees of freedom
## Multiple R-Squared: 0.9504,  Adjusted R-squared: 0.9456 
## F-statistic:   201 on 8 and 84 DF,  p-value: < 2.2e-16 
## 
## 
## Estimation results for equation Price_tsvar: 
## ============================================ 
## Price_tsvar = layers_tsvar.l1 + Price_tsvar.l1 + layers_tsvar.l2 + Price_tsvar.l2 + layers_tsvar.l3 + Price_tsvar.l3 + layers_tsvar.l4 + Price_tsvar.l4 + const 
## 
##                  Estimate Std. Error t value Pr(>|t|)    
## layers_tsvar.l1 -0.015539   0.008135  -1.910 0.059520 .  
## Price_tsvar.l1   0.824341   0.106105   7.769 1.75e-11 ***
## layers_tsvar.l2  0.005945   0.013448   0.442 0.659558    
## Price_tsvar.l2  -0.364250   0.130879  -2.783 0.006649 ** 
## layers_tsvar.l3  0.010587   0.013478   0.786 0.434340    
## Price_tsvar.l3   0.495959   0.130269   3.807 0.000266 ***
## layers_tsvar.l4 -0.003620   0.008299  -0.436 0.663773    
## Price_tsvar.l4  -0.237747   0.108363  -2.194 0.030999 *  
## const            1.188097   0.781641   1.520 0.132266    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 0.2776 on 84 degrees of freedom
## Multiple R-Squared: 0.6442,  Adjusted R-squared: 0.6103 
## F-statistic: 19.01 on 8 and 84 DF,  p-value: 5.596e-16 
## 
## 
## 
## Covariance matrix of residuals:
##              layers_tsvar Price_tsvar
## layers_tsvar    13.406882   -0.003883
## Price_tsvar     -0.003883    0.077089
## 
## Correlation matrix of residuals:
##              layers_tsvar Price_tsvar
## layers_tsvar     1.000000   -0.003819
## Price_tsvar     -0.003819    1.000000
###################
#   Diagnostics   #
###################

#Serial Correlation
Serial1 <-serial.test(fit_var, lags.pt = 12, type = "PT.asymptotic")
Serial1
## 
##  Portmanteau Test (asymptotic)
## 
## data:  Residuals of VAR object fit_var
## Chi-squared = 35.418, df = 32, p-value = 0.3101
#p-value higher than 0.05, so to serial correlation

#Heteroscedasticity

Arch1 <-arch.test(fit_var, lags.multi = 12, multivariate.only = TRUE)
Arch1
## 
##  ARCH (multivariate)
## 
## data:  Residuals of VAR object fit_var
## Chi-squared = 113.11, df = 108, p-value = 0.3491
#p-value higher than 0.05, so it passes the test of heteroscedasticity

#normal Distribution of the residuals
Norm1 <- normality.test(fit_var, multivariate.only = TRUE)
Norm1
## $JB
## 
##  JB-Test (multivariate)
## 
## data:  Residuals of VAR object fit_var
## Chi-squared = 1220.2, df = 4, p-value < 2.2e-16
## 
## 
## $Skewness
## 
##  Skewness only (multivariate)
## 
## data:  Residuals of VAR object fit_var
## Chi-squared = 134.24, df = 2, p-value < 2.2e-16
## 
## 
## $Kurtosis
## 
##  Kurtosis only (multivariate)
## 
## data:  Residuals of VAR object fit_var
## Chi-squared = 1086, df = 2, p-value < 2.2e-16
#Testing for Structural Breaks in the Residuals

Stability1 <- stability(fit_var, type = "OLS-CUSUM")
plot(Stability1)

# Granger Causality

Grangerlayers <- causality(fit_var, cause = "layers_tsvar")
Grangerlayers
## $Granger
## 
##  Granger causality H0: layers_tsvar do not Granger-cause Price_tsvar
## 
## data:  VAR object fit_var
## F-Test = 1.7748, df1 = 4, df2 = 168, p-value = 0.1362
## 
## 
## $Instant
## 
##  H0: No instantaneous causality between: layers_tsvar and Price_tsvar
## 
## data:  VAR object fit_var
## Chi-squared = 0.0013564, df = 1, p-value = 0.9706
Grangerprice <- causality(fit_var, cause = "Price_tsvar")
Grangerprice
## $Granger
## 
##  Granger causality H0: Price_tsvar do not Granger-cause layers_tsvar
## 
## data:  VAR object fit_var
## F-Test = 1.7779, df1 = 4, df2 = 168, p-value = 0.1356
## 
## 
## $Instant
## 
##  H0: No instantaneous causality between: Price_tsvar and layers_tsvar
## 
## data:  VAR object fit_var
## Chi-squared = 0.0013564, df = 1, p-value = 0.9706
#impulse response

Priceirf <- irf(fit_var, impulse = "layers_tsvar", response = "Price_tsvar", n.ahead = 12, boot = TRUE)
plot(Priceirf, ylab = "Price", main = "Shock from Layers")

Layersirf <-irf(fit_var, impulse = "Price_tsvar", response = "layers_tsvar", n.ahead = 12, boot = TRUE)
plot(Layersirf, ylab = "Layers", main = "Shock from Price")

#Variance decomp

FEVD1 <- fevd(fit_var, n.ahead = 12)
plot(FEVD1)

# VAR Forecast
forecast <- predict(fit_var, n.ahead = 12, CI = 0.90)
fanchart(forecast)

fanchart(forecast, names = "layers_tsvar")

fanchart(forecast, names = "Price_tsvar")

Chosing the model

The ARMA and ETS models provided decent results.

Layers

Interestingly, I would not choose any of the auto ARIMA, despite decent scores in training. The best test result was ETS AAdA, which was a damped model. Still, I would still consider an ARIMA pdq(2, 1, 0).

Price

Price was much more challenging. At different dates and models suggested considerably different results. However, when subseting different dates starting in 2017, the models suggested much more reasonable forecasts, where prices could be trending higher, with seasonality or not–the judgement here would be to consider a downward trend in the amount of egg layers. Thus, this is why it would be important to try a different model.

Because we are talking about livestock and agricultural commodities I would lean towards a VAR model or an ARMA model combined with VAR. Although univariate models can be helpful, I believe that it is necessary to include other explanatory variables.

Conclusion

Although more work needs to be done, the VAR model suggested that at about 4 lags, price is responsive to a shock in egg layers, with an impact of about 5 to 15 percent within the first 3 months. This was an interesting finding since the cycle of an egg laying hen is about 4-6 months.

Naturally, there are many other variables that could explain price variations, or even the amount of egg laying hens at any given point, such as laws mandating producers to switch to cage-free laying facilities, income, weather, corn prices, etc.

Further work:

Different models and tests incorporating other explanatory variables should be considered.