Egg Layers #3

Introduction

This is a continuation of the time-series of egg laying hens in the U.S. due to the current avian influenza (HPAI) outbreak in 2022.

While the data will not show this decrease because it is yet not available, I will attempt to show potential models showing the behavior going forward. In this quick exercise I will try both approaches, through tsibble and fable, and the ts and forecast ways.

The main intention is to show how some of these trends can help forecast and anticipate price movements. The assumption is that, when there are shocks to one variable (layers), there will be responses in another (price).

This would intuitively call for a VAR model, or possibly just a linear regression with some lags, which essentially that is what VAR is. We also looked into ETS and ARIMA forecasts.

# Bring in the Data

layers_price <- read.csv("layers_price.csv")

# Convert to tsibble
layers_price_tb <-
  layers_price %>%
  mutate(Month = yearmonth(Month)) %>%
  as_tsibble(index = Month)

# split layers and egg prices

layers_tb <- layers_price_tb %>%
  dplyr::select(Month, layers)

Price_tb <- layers_price_tb %>%
  dplyr::select(Month, Price)

############################
#Convert to ts atomic vector
layers_ts <-read.csv("layers_price_ts.csv")
layers_ts$Date <-as.Date(layers_ts$Date)
layers_ts_m <-ts(layers_ts, start = 1, end = 97, frequency = 12)

# # Excel, dates not showing up
# library(readxl)
# layers_ts_xl <- read_excel("layers_price_ts.xlsx")


#inspect data
head(layers_price_tb, 3)%>%
  kbl()%>%
  kable_material(c("striped","hover", full_width = T))

Month	Price	layers
2014 Jan	1.307727	304.292
2014 Feb	1.552000	302.343
2014 Mar	1.546191	303.008

head(layers_ts, 3) %>%
  kbl()%>%
  kable_material(c("striped","hover", full_width = T))

Date	Price	layers
2014-01-01	1.307727	304.292
2014-02-01	1.552000	302.343
2014-03-01	1.546191	303.008

Data Exploration

First, we will look at what we have, and plot it.

We can see that egg laying hens are relatively seasonal, and with cycles. Without having to decompose the data, we ran the stationarity tests after simply to assess these assumptions.

# Egg Layers

layers_plot <-
  autoplot(layers_tb, layers) +
  labs(title = "Layers"
       ,x = "Date"
       ,y = "Millions of Egg Laying Hens") + 
  theme_ipsum_ps() +
  geom_smooth(formula = y ~ x, method = "loess", color = "orange", size = 0.3)

Price_plot <-
  autoplot(Price_tb, Price) +
  labs(title = "Egg Price"
       ,x = "Date"
       ,y = "$USD/dzn") + 
  theme_ipsum_ps() +
  geom_smooth(formula = y ~ x, method = "loess", color = "orange", size = 0.3)

grid.arrange(layers_plot,
               Price_plot, nrow = 1, ncol = 2)

#### Plot with TS

layers_ts1 <-ts(subset(layers_ts, select = "layers", frequency =12))
Price_ts1 <-ts(subset(layers_ts, select = "Price", frequency =12))

ggtsdisplay(layers_ts1, main = "Egg Layers")

ggtsdisplay(Price_ts1, main = "Egg Price")

Stationarity Layers

One of the key elements for time-series, is testing for stationarity, and whether we should difference them. If so, it is also important to know how many differences we need to implement.

Layers was not stationary, so we needed to calculate the differences. The optimal differences, appeared to be just one, despite some of the seasonality that we saw, and the ACF and PACF showed that maybe, 6 or 12 lags could be considered as a judment call, if anything.

#with tsibble

#Layers
#Do a Unit Root Test from original data

layers_tb %>%
  features(layers, unitroot_kpss) %>%
  kbl()%>%
  kable_material(c("striped","hover", full_width = T))

kpss_stat	kpss_pvalue
1.520376	0.01

paste("Number of Differences = ", ndiffs(layers_tb$layers)) %>%
  kbl()%>%
  kable_material(c("striped","hover", full_width = T))

x
Number of Differences = 1

# Number of Diff is 1
layers_tb %>%
  gg_tsdisplay(difference(layers), plot_type = 'partial')+
  labs(title = "Difference of 1, Egg laying Hens")

layers_diff1 <-
  layers_tb$layers %>%
  diff() %>%
  unitroot_kpss

layers_diff1 %>%
  kbl()%>%
  kable_material(c("striped","hover", full_width = T))

	x
kpss_stat	0.0599831
kpss_pvalue	0.1000000

Stationarity Price

Price resulted to be almost stationary, but we went ahead and differenced it anyway since our initial kpss test did not return a 1. if anything, it appeared to have a downward trend overall, so even an eye test would have suggested that we needed to difference the entire data series, or split it in case we assumed or saw structural changes.

# Perform the same with Price

Price_tb %>%
  features(Price, unitroot_kpss) %>%
  kbl()%>%
  kable_material(c("striped","hover", full_width = T))

kpss_stat	kpss_pvalue
0.3972041	0.0783603

paste("Number of Differences = ", ndiffs(layers_tb$Price)) %>%
  kbl()%>%
  kable_material(c("striped","hover", full_width = T))

x
Number of Differences = 0

# Number of Diff is 1
Price_tb %>%
  gg_tsdisplay(difference(Price), plot_type = 'partial')+
  labs(title = "Difference of 1, Egg Price")

Price_diff1 <-
  Price_tb$Price %>%
  diff() %>%
  unitroot_kpss

Price_diff1 %>%
  kbl()%>%
  kable_material(c("striped","hover", full_width = T))

	x
kpss_stat	0.0407572
kpss_pvalue	0.1000000

ARIMA and ETS

Several models were tested and at different dates. The accuracy results are below.

################
# Cases
holts_layers <- layers_tb

fix_lay <- holts_layers %>%
  filter_index("2017 Jan" ~ "2020 Dec")

#models

fit_lay <- 
  fix_lay %>%
  model( "ETS_l_ANN" = ETS(layers ~ error("A") + trend("N") + season("N"))
        , "ETS_l_AAN" = ETS(layers ~ error("A") + trend("A") + season("N"))
        , "ETS_l_AAA" = ETS(layers ~ error("A") + trend("A") + season("A"))
        , "ETS_l_AAdA" = ETS(layers ~ error("A") + trend("Ad") + season("A")) #damped
        , "ETS_l_auto" = ETS(layers)
        , "ARIMA_l1" = ARIMA(layers ~ pdq(2, 1, 0) + PDQ(0, 1, 1))
        , "ARIMA_l2" = ARIMA(layers ~ pdq(0, 1, 3) + PDQ(0, 1, 1))
        , "ARIMA_l_auto" = ARIMA(layers)
        , "search_arima_l" = ARIMA(layers, stepwise = FALSE) 
        ) 

################
# Price

holts_Price <- Price_tb

fix_Price <- holts_Price %>%
  filter_index("2017 Jan" ~ "2021 Dec")

#models

fit_Price <- 
  fix_Price %>%
  model( "ETS_p_ANN" = ETS(Price ~ error("A") + trend("N") + season("N"))
        , "ETS_p_AAN" = ETS(Price ~ error("A") + trend("A") + season("N"))
        , "ETS_p_AAA" = ETS(Price ~ error("A") + trend("A") + season("A"))
        , "ETS_p_AAdA" = ETS(Price ~ error("A") + trend("Ad") + season("A")) #damped
        , "ETS_p_auto" = ETS(Price)
        , "ARIMA_p1" = ARIMA(Price ~ pdq(0, 1, 2) + PDQ(0, 1, 1))
        , "ARIMA_p2" = ARIMA(Price ~ pdq(2, 1, 0) + PDQ(0, 1, 1))
        , "ARIMA_p_auto" = ARIMA(Price)
        , "search_arima_p" = ARIMA(Price, stepwise = FALSE) 
        ) 

# forecasts
fc_lay <- 
  fit_lay %>%
  forecast(h=28)

fc_Price <-
  fit_Price %>%
  forecast(h=28)

models_comp <-
bind_rows(
  fit_lay %>% accuracy()
  ,fit_Price %>% accuracy()
  ,fc_lay %>% accuracy(layers_tb)
  ,fc_Price %>% accuracy(Price_tb)
)


models_comp %>%
  dplyr::select(
    .model
    ,.type
    ,RMSE
    ,RMSSE
    ,MASE
    ,ACF1
  ) %>%
  kbl()%>%
  kable_material(c("striped","hover", full_width = T))

.model	.type	RMSE	RMSSE	MASE	ACF1
ETS_l_ANN	Training	2.9357636	0.2525375	0.2212311	0.4050668
ETS_l_AAN	Training	2.9403275	0.2529301	0.2133396	0.0310632
ETS_l_AAA	Training	1.8525937	0.1593621	0.1396612	0.1582120
ETS_l_AAdA	Training	1.7425995	0.1499003	0.1342696	0.1211019
ETS_l_auto	Training	2.9357612	0.2525373	0.2212486	0.4050473
ARIMA_l1	Training	1.9331826	0.1662944	0.1233226	0.0873290
ARIMA_l2	Training	1.7229168	0.1482071	0.1086727	-0.1474021
ARIMA_l_auto	Training	2.2114508	0.1902314	0.1409037	-0.0650330
search_arima_l	Training	1.9000125	0.1634411	0.1200104	-0.0113590
ETS_p_ANN	Training	0.3091419	0.5711014	0.5376700	0.1402496
ETS_p_AAN	Training	0.3087124	0.5703081	0.5355288	0.1451141
ETS_p_AAA	Training	0.2331716	0.4307558	0.4742422	0.2138358
ETS_p_AAdA	Training	0.2312894	0.4272787	0.4619095	0.2284907
ETS_p_auto	Training	0.2196435	0.4057643	0.4242328	0.1293067
ARIMA_p1	Training	0.2044691	0.3777314	0.3549140	-0.0354985
ARIMA_p2	Training	0.1979583	0.3657036	0.3526325	-0.0001302
ARIMA_p_auto	Training	0.2569413	0.4746673	0.4551089	-0.0061306
search_arima_p	Training	0.2501069	0.4620416	0.4445228	0.0638146
ARIMA_l_auto	Test	12.9682971	0.8073227	0.9196309	0.5342044
ARIMA_l1	Test	3.9272902	0.2444878	0.2448780	0.4318394
ARIMA_l2	Test	4.4564505	0.2774299	0.2878330	0.3818194
ETS_l_AAA	Test	3.6442281	0.2268662	0.2279684	0.6481309
ETS_l_AAdA	Test	3.0799670	0.1917389	0.1976347	0.4639456
ETS_l_AAN	Test	18.1114872	1.1275046	1.2535791	0.7393612
ETS_l_ANN	Test	5.2015131	0.3238127	0.2881744	0.7332925
ETS_l_auto	Test	5.2015155	0.3238129	0.2881746	0.7332925
search_arima_l	Test	4.2370846	0.2637736	0.2733021	0.3889669
ARIMA_p_auto	Test	0.0519851	0.0767901	0.1024146	NA
ARIMA_p1	Test	0.0779886	0.1152014	0.1536435	NA
ARIMA_p2	Test	0.0890442	0.1315321	0.1754236	NA
ETS_p_AAA	Test	0.1457752	0.2153327	0.2871880	NA
ETS_p_AAdA	Test	0.1182866	0.1747278	0.2330335	NA
ETS_p_AAN	Test	0.0738097	0.1090285	0.1454107	NA
ETS_p_ANN	Test	0.0614655	0.0907942	0.1210917	NA
ETS_p_auto	Test	0.1643640	0.2427913	0.3238093	NA
search_arima_p	Test	0.0295328	0.0436245	0.0581818	NA

ARIMA and ETS plots

An eye test is always helpful and then cross compare it with the de accuracy tests, and then make a decision. All the models were tested at the tail end of the time-series.

# Univariate for now

fc_lay %>%
  autoplot(layers_tb, level = 0, size = 0.5) +
  labs(title = "Egg Layers"
       ,y = "Millions") +
  theme_ipsum_es()

fc_Price %>%
  autoplot(Price_tb, level = 0, size = 0.5) +
  labs(title = "Egg Layers"
       ,y = "Millions") +
  theme_ipsum_es()

VAR

We naturally ran a VAR model under the assumption that a supply shock in egg laying hens would cause prices to spike. After running several models and lags, we found that the optimal lags were 4 months, in which price was the response variable. Although the residuals were not normally distributed, the VAR model provided good results without serial correlation.

The data set I used has not been updated in months, and the most recent data suggests the models tested were not terribly outrageous.

When we ran the irf, the response variable, in this case price, suggests price is negatively correlated by about 5 to 15 percent for every 1 percent change in layers, and the impact would peak 3 months later.

#VAR model

ggplot( data = layers_price_tb) +
  geom_point(mapping = aes(x = layers,y = Price)) +
  ggtitle("Scatter Plot Egg Layers vs. price")

layers_tsvar <-ts(layers_ts$layers, start = 1, frequency = 12)
Price_tsvar <-ts(layers_ts$Price, start = 1, frequency = 12)

#check the optimal lags
var_lag_df <-cbind(layers_tsvar, Price_tsvar)

lagselect <-VARselect(var_lag_df, lag.max = 12, type = "const")
lagselect$selection

## AIC(n)  HQ(n)  SC(n) FPE(n) 
##      4      2      2      4

#Build Model
fit_var <-VAR(var_lag_df, p = 4, type = "const", season = NULL, exogen = NULL)
summary(fit_var)

## 
## VAR Estimation Results:
## ========================= 
## Endogenous variables: layers_tsvar, Price_tsvar 
## Deterministic variables: const 
## Sample size: 93 
## Log Likelihood: -255.989 
## Roots of the characteristic polynomial:
## 0.8796 0.7239 0.7239 0.6756 0.6756 0.4792 0.3403 0.3403
## Call:
## VAR(y = var_lag_df, p = 4, type = "const", exogen = NULL)
## 
## 
## Estimation results for equation layers_tsvar: 
## ============================================= 
## layers_tsvar = layers_tsvar.l1 + Price_tsvar.l1 + layers_tsvar.l2 + Price_tsvar.l2 + layers_tsvar.l3 + Price_tsvar.l3 + layers_tsvar.l4 + Price_tsvar.l4 + const 
## 
##                 Estimate Std. Error t value Pr(>|t|)    
## layers_tsvar.l1  1.34280    0.10728  12.517   <2e-16 ***
## Price_tsvar.l1   0.60009    1.39927   0.429   0.6691    
## layers_tsvar.l2 -0.33580    0.17734  -1.893   0.0617 .  
## Price_tsvar.l2   0.32783    1.72599   0.190   0.8498    
## layers_tsvar.l3 -0.05006    0.17774  -0.282   0.7789    
## Price_tsvar.l3  -3.82224    1.71795  -2.225   0.0288 *  
## layers_tsvar.l4 -0.01200    0.10944  -0.110   0.9130    
## Price_tsvar.l4   2.43646    1.42906   1.705   0.0919 .  
## const           18.15537   10.30804   1.761   0.0818 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 3.662 on 84 degrees of freedom
## Multiple R-Squared: 0.9504,  Adjusted R-squared: 0.9456 
## F-statistic:   201 on 8 and 84 DF,  p-value: < 2.2e-16 
## 
## 
## Estimation results for equation Price_tsvar: 
## ============================================ 
## Price_tsvar = layers_tsvar.l1 + Price_tsvar.l1 + layers_tsvar.l2 + Price_tsvar.l2 + layers_tsvar.l3 + Price_tsvar.l3 + layers_tsvar.l4 + Price_tsvar.l4 + const 
## 
##                  Estimate Std. Error t value Pr(>|t|)    
## layers_tsvar.l1 -0.015539   0.008135  -1.910 0.059520 .  
## Price_tsvar.l1   0.824341   0.106105   7.769 1.75e-11 ***
## layers_tsvar.l2  0.005945   0.013448   0.442 0.659558    
## Price_tsvar.l2  -0.364250   0.130879  -2.783 0.006649 ** 
## layers_tsvar.l3  0.010587   0.013478   0.786 0.434340    
## Price_tsvar.l3   0.495959   0.130269   3.807 0.000266 ***
## layers_tsvar.l4 -0.003620   0.008299  -0.436 0.663773    
## Price_tsvar.l4  -0.237747   0.108363  -2.194 0.030999 *  
## const            1.188097   0.781641   1.520 0.132266    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 0.2776 on 84 degrees of freedom
## Multiple R-Squared: 0.6442,  Adjusted R-squared: 0.6103 
## F-statistic: 19.01 on 8 and 84 DF,  p-value: 5.596e-16 
## 
## 
## 
## Covariance matrix of residuals:
##              layers_tsvar Price_tsvar
## layers_tsvar    13.406882   -0.003883
## Price_tsvar     -0.003883    0.077089
## 
## Correlation matrix of residuals:
##              layers_tsvar Price_tsvar
## layers_tsvar     1.000000   -0.003819
## Price_tsvar     -0.003819    1.000000

###################
#   Diagnostics   #
###################

#Serial Correlation
Serial1 <-serial.test(fit_var, lags.pt = 12, type = "PT.asymptotic")
Serial1

## 
##  Portmanteau Test (asymptotic)
## 
## data:  Residuals of VAR object fit_var
## Chi-squared = 35.418, df = 32, p-value = 0.3101

#p-value higher than 0.05, so to serial correlation

#Heteroscedasticity

Arch1 <-arch.test(fit_var, lags.multi = 12, multivariate.only = TRUE)
Arch1

## 
##  ARCH (multivariate)
## 
## data:  Residuals of VAR object fit_var
## Chi-squared = 113.11, df = 108, p-value = 0.3491

#p-value higher than 0.05, so it passes the test of heteroscedasticity

#normal Distribution of the residuals
Norm1 <- normality.test(fit_var, multivariate.only = TRUE)
Norm1

## $JB
## 
##  JB-Test (multivariate)
## 
## data:  Residuals of VAR object fit_var
## Chi-squared = 1220.2, df = 4, p-value < 2.2e-16
## 
## 
## $Skewness
## 
##  Skewness only (multivariate)
## 
## data:  Residuals of VAR object fit_var
## Chi-squared = 134.24, df = 2, p-value < 2.2e-16
## 
## 
## $Kurtosis
## 
##  Kurtosis only (multivariate)
## 
## data:  Residuals of VAR object fit_var
## Chi-squared = 1086, df = 2, p-value < 2.2e-16

#Testing for Structural Breaks in the Residuals

Stability1 <- stability(fit_var, type = "OLS-CUSUM")
plot(Stability1)

# Granger Causality

Grangerlayers <- causality(fit_var, cause = "layers_tsvar")
Grangerlayers

## $Granger
## 
##  Granger causality H0: layers_tsvar do not Granger-cause Price_tsvar
## 
## data:  VAR object fit_var
## F-Test = 1.7748, df1 = 4, df2 = 168, p-value = 0.1362
## 
## 
## $Instant
## 
##  H0: No instantaneous causality between: layers_tsvar and Price_tsvar
## 
## data:  VAR object fit_var
## Chi-squared = 0.0013564, df = 1, p-value = 0.9706

Grangerprice <- causality(fit_var, cause = "Price_tsvar")
Grangerprice

## $Granger
## 
##  Granger causality H0: Price_tsvar do not Granger-cause layers_tsvar
## 
## data:  VAR object fit_var
## F-Test = 1.7779, df1 = 4, df2 = 168, p-value = 0.1356
## 
## 
## $Instant
## 
##  H0: No instantaneous causality between: Price_tsvar and layers_tsvar
## 
## data:  VAR object fit_var
## Chi-squared = 0.0013564, df = 1, p-value = 0.9706

#impulse response

Priceirf <- irf(fit_var, impulse = "layers_tsvar", response = "Price_tsvar", n.ahead = 12, boot = TRUE)
plot(Priceirf, ylab = "Price", main = "Shock from Layers")

Layersirf <-irf(fit_var, impulse = "Price_tsvar", response = "layers_tsvar", n.ahead = 12, boot = TRUE)
plot(Layersirf, ylab = "Layers", main = "Shock from Price")

#Variance decomp

FEVD1 <- fevd(fit_var, n.ahead = 12)
plot(FEVD1)

# VAR Forecast
forecast <- predict(fit_var, n.ahead = 12, CI = 0.90)
fanchart(forecast)

fanchart(forecast, names = "layers_tsvar")

fanchart(forecast, names = "Price_tsvar")

Chosing the model

The ARMA and ETS models provided decent results.

Layers

Interestingly, I would not choose any of the auto ARIMA, despite decent scores in training. The best test result was ETS AAdA, which was a damped model. Still, I would still consider an ARIMA pdq(2, 1, 0).

Price

Price was much more challenging. At different dates and models suggested considerably different results. However, when subseting different dates starting in 2017, the models suggested much more reasonable forecasts, where prices could be trending higher, with seasonality or not–the judgement here would be to consider a downward trend in the amount of egg layers. Thus, this is why it would be important to try a different model.

Because we are talking about livestock and agricultural commodities I would lean towards a VAR model or an ARMA model combined with VAR. Although univariate models can be helpful, I believe that it is necessary to include other explanatory variables.

Conclusion

Although more work needs to be done, the VAR model suggested that at about 4 lags, price is responsive to a shock in egg layers, with an impact of about 5 to 15 percent within the first 3 months. This was an interesting finding since the cycle of an egg laying hen is about 4-6 months.

Naturally, there are many other variables that could explain price variations, or even the amount of egg laying hens at any given point, such as laws mandating producers to switch to cage-free laying facilities, income, weather, corn prices, etc.

Further work:

Different models and tests incorporating other explanatory variables should be considered.