Egg Layers #3
Introduction
This is a continuation of the time-series of egg laying hens in the U.S. due to the current avian influenza (HPAI) outbreak in 2022.
While the data will not show this decrease because it is yet not available, I will attempt to show potential models showing the behavior going forward. In this quick exercise I will try both approaches, through tsibble and fable, and the ts and forecast ways.
The main intention is to show how some of these trends can help forecast and anticipate price movements. The assumption is that, when there are shocks to one variable (layers), there will be responses in another (price).
This would intuitively call for a VAR model, or possibly just a linear regression with some lags, which essentially that is what VAR is. We also looked into ETS and ARIMA forecasts.
# Bring in the Data
layers_price <- read.csv("layers_price.csv")
# Convert to tsibble
layers_price_tb <-
layers_price %>%
mutate(Month = yearmonth(Month)) %>%
as_tsibble(index = Month)
# split layers and egg prices
layers_tb <- layers_price_tb %>%
dplyr::select(Month, layers)
Price_tb <- layers_price_tb %>%
dplyr::select(Month, Price)
############################
#Convert to ts atomic vector
layers_ts <-read.csv("layers_price_ts.csv")
layers_ts$Date <-as.Date(layers_ts$Date)
layers_ts_m <-ts(layers_ts, start = 1, end = 97, frequency = 12)
# # Excel, dates not showing up
# library(readxl)
# layers_ts_xl <- read_excel("layers_price_ts.xlsx")
#inspect data
head(layers_price_tb, 3)%>%
kbl()%>%
kable_material(c("striped","hover", full_width = T))| Month | Price | layers |
|---|---|---|
| 2014 Jan | 1.307727 | 304.292 |
| 2014 Feb | 1.552000 | 302.343 |
| 2014 Mar | 1.546191 | 303.008 |
head(layers_ts, 3) %>%
kbl()%>%
kable_material(c("striped","hover", full_width = T))| Date | Price | layers |
|---|---|---|
| 2014-01-01 | 1.307727 | 304.292 |
| 2014-02-01 | 1.552000 | 302.343 |
| 2014-03-01 | 1.546191 | 303.008 |
Data Exploration
First, we will look at what we have, and plot it.
We can see that egg laying hens are relatively seasonal, and with cycles. Without having to decompose the data, we ran the stationarity tests after simply to assess these assumptions.
# Egg Layers
layers_plot <-
autoplot(layers_tb, layers) +
labs(title = "Layers"
,x = "Date"
,y = "Millions of Egg Laying Hens") +
theme_ipsum_ps() +
geom_smooth(formula = y ~ x, method = "loess", color = "orange", size = 0.3)
Price_plot <-
autoplot(Price_tb, Price) +
labs(title = "Egg Price"
,x = "Date"
,y = "$USD/dzn") +
theme_ipsum_ps() +
geom_smooth(formula = y ~ x, method = "loess", color = "orange", size = 0.3)
grid.arrange(layers_plot,
Price_plot, nrow = 1, ncol = 2)#### Plot with TS
layers_ts1 <-ts(subset(layers_ts, select = "layers", frequency =12))
Price_ts1 <-ts(subset(layers_ts, select = "Price", frequency =12))
ggtsdisplay(layers_ts1, main = "Egg Layers")ggtsdisplay(Price_ts1, main = "Egg Price")Stationarity Layers
One of the key elements for time-series, is testing for stationarity, and whether we should difference them. If so, it is also important to know how many differences we need to implement.
Layers was not stationary, so we needed to calculate the differences. The optimal differences, appeared to be just one, despite some of the seasonality that we saw, and the ACF and PACF showed that maybe, 6 or 12 lags could be considered as a judment call, if anything.
#with tsibble
#Layers
#Do a Unit Root Test from original data
layers_tb %>%
features(layers, unitroot_kpss) %>%
kbl()%>%
kable_material(c("striped","hover", full_width = T))| kpss_stat | kpss_pvalue |
|---|---|
| 1.520376 | 0.01 |
paste("Number of Differences = ", ndiffs(layers_tb$layers)) %>%
kbl()%>%
kable_material(c("striped","hover", full_width = T))| x |
|---|
| Number of Differences = 1 |
# Number of Diff is 1
layers_tb %>%
gg_tsdisplay(difference(layers), plot_type = 'partial')+
labs(title = "Difference of 1, Egg laying Hens")layers_diff1 <-
layers_tb$layers %>%
diff() %>%
unitroot_kpss
layers_diff1 %>%
kbl()%>%
kable_material(c("striped","hover", full_width = T))| x | |
|---|---|
| kpss_stat | 0.0599831 |
| kpss_pvalue | 0.1000000 |
Stationarity Price
Price resulted to be almost stationary, but we went ahead and differenced it anyway since our initial kpss test did not return a 1. if anything, it appeared to have a downward trend overall, so even an eye test would have suggested that we needed to difference the entire data series, or split it in case we assumed or saw structural changes.
# Perform the same with Price
Price_tb %>%
features(Price, unitroot_kpss) %>%
kbl()%>%
kable_material(c("striped","hover", full_width = T))| kpss_stat | kpss_pvalue |
|---|---|
| 0.3972041 | 0.0783603 |
paste("Number of Differences = ", ndiffs(layers_tb$Price)) %>%
kbl()%>%
kable_material(c("striped","hover", full_width = T))| x |
|---|
| Number of Differences = 0 |
# Number of Diff is 1
Price_tb %>%
gg_tsdisplay(difference(Price), plot_type = 'partial')+
labs(title = "Difference of 1, Egg Price")Price_diff1 <-
Price_tb$Price %>%
diff() %>%
unitroot_kpss
Price_diff1 %>%
kbl()%>%
kable_material(c("striped","hover", full_width = T))| x | |
|---|---|
| kpss_stat | 0.0407572 |
| kpss_pvalue | 0.1000000 |
ARIMA and ETS
Several models were tested and at different dates. The accuracy results are below.
################
# Cases
holts_layers <- layers_tb
fix_lay <- holts_layers %>%
filter_index("2017 Jan" ~ "2020 Dec")
#models
fit_lay <-
fix_lay %>%
model( "ETS_l_ANN" = ETS(layers ~ error("A") + trend("N") + season("N"))
, "ETS_l_AAN" = ETS(layers ~ error("A") + trend("A") + season("N"))
, "ETS_l_AAA" = ETS(layers ~ error("A") + trend("A") + season("A"))
, "ETS_l_AAdA" = ETS(layers ~ error("A") + trend("Ad") + season("A")) #damped
, "ETS_l_auto" = ETS(layers)
, "ARIMA_l1" = ARIMA(layers ~ pdq(2, 1, 0) + PDQ(0, 1, 1))
, "ARIMA_l2" = ARIMA(layers ~ pdq(0, 1, 3) + PDQ(0, 1, 1))
, "ARIMA_l_auto" = ARIMA(layers)
, "search_arima_l" = ARIMA(layers, stepwise = FALSE)
)
################
# Price
holts_Price <- Price_tb
fix_Price <- holts_Price %>%
filter_index("2017 Jan" ~ "2021 Dec")
#models
fit_Price <-
fix_Price %>%
model( "ETS_p_ANN" = ETS(Price ~ error("A") + trend("N") + season("N"))
, "ETS_p_AAN" = ETS(Price ~ error("A") + trend("A") + season("N"))
, "ETS_p_AAA" = ETS(Price ~ error("A") + trend("A") + season("A"))
, "ETS_p_AAdA" = ETS(Price ~ error("A") + trend("Ad") + season("A")) #damped
, "ETS_p_auto" = ETS(Price)
, "ARIMA_p1" = ARIMA(Price ~ pdq(0, 1, 2) + PDQ(0, 1, 1))
, "ARIMA_p2" = ARIMA(Price ~ pdq(2, 1, 0) + PDQ(0, 1, 1))
, "ARIMA_p_auto" = ARIMA(Price)
, "search_arima_p" = ARIMA(Price, stepwise = FALSE)
)
# forecasts
fc_lay <-
fit_lay %>%
forecast(h=28)
fc_Price <-
fit_Price %>%
forecast(h=28)
models_comp <-
bind_rows(
fit_lay %>% accuracy()
,fit_Price %>% accuracy()
,fc_lay %>% accuracy(layers_tb)
,fc_Price %>% accuracy(Price_tb)
)
models_comp %>%
dplyr::select(
.model
,.type
,RMSE
,RMSSE
,MASE
,ACF1
) %>%
kbl()%>%
kable_material(c("striped","hover", full_width = T))| .model | .type | RMSE | RMSSE | MASE | ACF1 |
|---|---|---|---|---|---|
| ETS_l_ANN | Training | 2.9357636 | 0.2525375 | 0.2212311 | 0.4050668 |
| ETS_l_AAN | Training | 2.9403275 | 0.2529301 | 0.2133396 | 0.0310632 |
| ETS_l_AAA | Training | 1.8525937 | 0.1593621 | 0.1396612 | 0.1582120 |
| ETS_l_AAdA | Training | 1.7425995 | 0.1499003 | 0.1342696 | 0.1211019 |
| ETS_l_auto | Training | 2.9357612 | 0.2525373 | 0.2212486 | 0.4050473 |
| ARIMA_l1 | Training | 1.9331826 | 0.1662944 | 0.1233226 | 0.0873290 |
| ARIMA_l2 | Training | 1.7229168 | 0.1482071 | 0.1086727 | -0.1474021 |
| ARIMA_l_auto | Training | 2.2114508 | 0.1902314 | 0.1409037 | -0.0650330 |
| search_arima_l | Training | 1.9000125 | 0.1634411 | 0.1200104 | -0.0113590 |
| ETS_p_ANN | Training | 0.3091419 | 0.5711014 | 0.5376700 | 0.1402496 |
| ETS_p_AAN | Training | 0.3087124 | 0.5703081 | 0.5355288 | 0.1451141 |
| ETS_p_AAA | Training | 0.2331716 | 0.4307558 | 0.4742422 | 0.2138358 |
| ETS_p_AAdA | Training | 0.2312894 | 0.4272787 | 0.4619095 | 0.2284907 |
| ETS_p_auto | Training | 0.2196435 | 0.4057643 | 0.4242328 | 0.1293067 |
| ARIMA_p1 | Training | 0.2044691 | 0.3777314 | 0.3549140 | -0.0354985 |
| ARIMA_p2 | Training | 0.1979583 | 0.3657036 | 0.3526325 | -0.0001302 |
| ARIMA_p_auto | Training | 0.2569413 | 0.4746673 | 0.4551089 | -0.0061306 |
| search_arima_p | Training | 0.2501069 | 0.4620416 | 0.4445228 | 0.0638146 |
| ARIMA_l_auto | Test | 12.9682971 | 0.8073227 | 0.9196309 | 0.5342044 |
| ARIMA_l1 | Test | 3.9272902 | 0.2444878 | 0.2448780 | 0.4318394 |
| ARIMA_l2 | Test | 4.4564505 | 0.2774299 | 0.2878330 | 0.3818194 |
| ETS_l_AAA | Test | 3.6442281 | 0.2268662 | 0.2279684 | 0.6481309 |
| ETS_l_AAdA | Test | 3.0799670 | 0.1917389 | 0.1976347 | 0.4639456 |
| ETS_l_AAN | Test | 18.1114872 | 1.1275046 | 1.2535791 | 0.7393612 |
| ETS_l_ANN | Test | 5.2015131 | 0.3238127 | 0.2881744 | 0.7332925 |
| ETS_l_auto | Test | 5.2015155 | 0.3238129 | 0.2881746 | 0.7332925 |
| search_arima_l | Test | 4.2370846 | 0.2637736 | 0.2733021 | 0.3889669 |
| ARIMA_p_auto | Test | 0.0519851 | 0.0767901 | 0.1024146 | NA |
| ARIMA_p1 | Test | 0.0779886 | 0.1152014 | 0.1536435 | NA |
| ARIMA_p2 | Test | 0.0890442 | 0.1315321 | 0.1754236 | NA |
| ETS_p_AAA | Test | 0.1457752 | 0.2153327 | 0.2871880 | NA |
| ETS_p_AAdA | Test | 0.1182866 | 0.1747278 | 0.2330335 | NA |
| ETS_p_AAN | Test | 0.0738097 | 0.1090285 | 0.1454107 | NA |
| ETS_p_ANN | Test | 0.0614655 | 0.0907942 | 0.1210917 | NA |
| ETS_p_auto | Test | 0.1643640 | 0.2427913 | 0.3238093 | NA |
| search_arima_p | Test | 0.0295328 | 0.0436245 | 0.0581818 | NA |
ARIMA and ETS plots
An eye test is always helpful and then cross compare it with the de accuracy tests, and then make a decision. All the models were tested at the tail end of the time-series.
# Univariate for now
fc_lay %>%
autoplot(layers_tb, level = 0, size = 0.5) +
labs(title = "Egg Layers"
,y = "Millions") +
theme_ipsum_es()fc_Price %>%
autoplot(Price_tb, level = 0, size = 0.5) +
labs(title = "Egg Layers"
,y = "Millions") +
theme_ipsum_es()VAR
We naturally ran a VAR model under the assumption that a supply shock in egg laying hens would cause prices to spike. After running several models and lags, we found that the optimal lags were 4 months, in which price was the response variable. Although the residuals were not normally distributed, the VAR model provided good results without serial correlation.
The data set I used has not been updated in months, and the most recent data suggests the models tested were not terribly outrageous.
When we ran the irf, the response variable, in this case price, suggests price is negatively correlated by about 5 to 15 percent for every 1 percent change in layers, and the impact would peak 3 months later.
#VAR model
ggplot( data = layers_price_tb) +
geom_point(mapping = aes(x = layers,y = Price)) +
ggtitle("Scatter Plot Egg Layers vs. price")layers_tsvar <-ts(layers_ts$layers, start = 1, frequency = 12)
Price_tsvar <-ts(layers_ts$Price, start = 1, frequency = 12)
#check the optimal lags
var_lag_df <-cbind(layers_tsvar, Price_tsvar)
lagselect <-VARselect(var_lag_df, lag.max = 12, type = "const")
lagselect$selection## AIC(n) HQ(n) SC(n) FPE(n)
## 4 2 2 4
#Build Model
fit_var <-VAR(var_lag_df, p = 4, type = "const", season = NULL, exogen = NULL)
summary(fit_var)##
## VAR Estimation Results:
## =========================
## Endogenous variables: layers_tsvar, Price_tsvar
## Deterministic variables: const
## Sample size: 93
## Log Likelihood: -255.989
## Roots of the characteristic polynomial:
## 0.8796 0.7239 0.7239 0.6756 0.6756 0.4792 0.3403 0.3403
## Call:
## VAR(y = var_lag_df, p = 4, type = "const", exogen = NULL)
##
##
## Estimation results for equation layers_tsvar:
## =============================================
## layers_tsvar = layers_tsvar.l1 + Price_tsvar.l1 + layers_tsvar.l2 + Price_tsvar.l2 + layers_tsvar.l3 + Price_tsvar.l3 + layers_tsvar.l4 + Price_tsvar.l4 + const
##
## Estimate Std. Error t value Pr(>|t|)
## layers_tsvar.l1 1.34280 0.10728 12.517 <2e-16 ***
## Price_tsvar.l1 0.60009 1.39927 0.429 0.6691
## layers_tsvar.l2 -0.33580 0.17734 -1.893 0.0617 .
## Price_tsvar.l2 0.32783 1.72599 0.190 0.8498
## layers_tsvar.l3 -0.05006 0.17774 -0.282 0.7789
## Price_tsvar.l3 -3.82224 1.71795 -2.225 0.0288 *
## layers_tsvar.l4 -0.01200 0.10944 -0.110 0.9130
## Price_tsvar.l4 2.43646 1.42906 1.705 0.0919 .
## const 18.15537 10.30804 1.761 0.0818 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Residual standard error: 3.662 on 84 degrees of freedom
## Multiple R-Squared: 0.9504, Adjusted R-squared: 0.9456
## F-statistic: 201 on 8 and 84 DF, p-value: < 2.2e-16
##
##
## Estimation results for equation Price_tsvar:
## ============================================
## Price_tsvar = layers_tsvar.l1 + Price_tsvar.l1 + layers_tsvar.l2 + Price_tsvar.l2 + layers_tsvar.l3 + Price_tsvar.l3 + layers_tsvar.l4 + Price_tsvar.l4 + const
##
## Estimate Std. Error t value Pr(>|t|)
## layers_tsvar.l1 -0.015539 0.008135 -1.910 0.059520 .
## Price_tsvar.l1 0.824341 0.106105 7.769 1.75e-11 ***
## layers_tsvar.l2 0.005945 0.013448 0.442 0.659558
## Price_tsvar.l2 -0.364250 0.130879 -2.783 0.006649 **
## layers_tsvar.l3 0.010587 0.013478 0.786 0.434340
## Price_tsvar.l3 0.495959 0.130269 3.807 0.000266 ***
## layers_tsvar.l4 -0.003620 0.008299 -0.436 0.663773
## Price_tsvar.l4 -0.237747 0.108363 -2.194 0.030999 *
## const 1.188097 0.781641 1.520 0.132266
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Residual standard error: 0.2776 on 84 degrees of freedom
## Multiple R-Squared: 0.6442, Adjusted R-squared: 0.6103
## F-statistic: 19.01 on 8 and 84 DF, p-value: 5.596e-16
##
##
##
## Covariance matrix of residuals:
## layers_tsvar Price_tsvar
## layers_tsvar 13.406882 -0.003883
## Price_tsvar -0.003883 0.077089
##
## Correlation matrix of residuals:
## layers_tsvar Price_tsvar
## layers_tsvar 1.000000 -0.003819
## Price_tsvar -0.003819 1.000000
###################
# Diagnostics #
###################
#Serial Correlation
Serial1 <-serial.test(fit_var, lags.pt = 12, type = "PT.asymptotic")
Serial1##
## Portmanteau Test (asymptotic)
##
## data: Residuals of VAR object fit_var
## Chi-squared = 35.418, df = 32, p-value = 0.3101
#p-value higher than 0.05, so to serial correlation
#Heteroscedasticity
Arch1 <-arch.test(fit_var, lags.multi = 12, multivariate.only = TRUE)
Arch1##
## ARCH (multivariate)
##
## data: Residuals of VAR object fit_var
## Chi-squared = 113.11, df = 108, p-value = 0.3491
#p-value higher than 0.05, so it passes the test of heteroscedasticity
#normal Distribution of the residuals
Norm1 <- normality.test(fit_var, multivariate.only = TRUE)
Norm1## $JB
##
## JB-Test (multivariate)
##
## data: Residuals of VAR object fit_var
## Chi-squared = 1220.2, df = 4, p-value < 2.2e-16
##
##
## $Skewness
##
## Skewness only (multivariate)
##
## data: Residuals of VAR object fit_var
## Chi-squared = 134.24, df = 2, p-value < 2.2e-16
##
##
## $Kurtosis
##
## Kurtosis only (multivariate)
##
## data: Residuals of VAR object fit_var
## Chi-squared = 1086, df = 2, p-value < 2.2e-16
#Testing for Structural Breaks in the Residuals
Stability1 <- stability(fit_var, type = "OLS-CUSUM")
plot(Stability1)# Granger Causality
Grangerlayers <- causality(fit_var, cause = "layers_tsvar")
Grangerlayers## $Granger
##
## Granger causality H0: layers_tsvar do not Granger-cause Price_tsvar
##
## data: VAR object fit_var
## F-Test = 1.7748, df1 = 4, df2 = 168, p-value = 0.1362
##
##
## $Instant
##
## H0: No instantaneous causality between: layers_tsvar and Price_tsvar
##
## data: VAR object fit_var
## Chi-squared = 0.0013564, df = 1, p-value = 0.9706
Grangerprice <- causality(fit_var, cause = "Price_tsvar")
Grangerprice## $Granger
##
## Granger causality H0: Price_tsvar do not Granger-cause layers_tsvar
##
## data: VAR object fit_var
## F-Test = 1.7779, df1 = 4, df2 = 168, p-value = 0.1356
##
##
## $Instant
##
## H0: No instantaneous causality between: Price_tsvar and layers_tsvar
##
## data: VAR object fit_var
## Chi-squared = 0.0013564, df = 1, p-value = 0.9706
#impulse response
Priceirf <- irf(fit_var, impulse = "layers_tsvar", response = "Price_tsvar", n.ahead = 12, boot = TRUE)
plot(Priceirf, ylab = "Price", main = "Shock from Layers")Layersirf <-irf(fit_var, impulse = "Price_tsvar", response = "layers_tsvar", n.ahead = 12, boot = TRUE)
plot(Layersirf, ylab = "Layers", main = "Shock from Price")#Variance decomp
FEVD1 <- fevd(fit_var, n.ahead = 12)
plot(FEVD1)# VAR Forecast
forecast <- predict(fit_var, n.ahead = 12, CI = 0.90)
fanchart(forecast)fanchart(forecast, names = "layers_tsvar")fanchart(forecast, names = "Price_tsvar")Chosing the model
The ARMA and ETS models provided decent results.
Layers
Interestingly, I would not choose any of the auto ARIMA, despite decent scores in training. The best test result was ETS AAdA, which was a damped model. Still, I would still consider an ARIMA pdq(2, 1, 0).
Price
Price was much more challenging. At different dates and models suggested considerably different results. However, when subseting different dates starting in 2017, the models suggested much more reasonable forecasts, where prices could be trending higher, with seasonality or not–the judgement here would be to consider a downward trend in the amount of egg layers. Thus, this is why it would be important to try a different model.
Because we are talking about livestock and agricultural commodities I would lean towards a VAR model or an ARMA model combined with VAR. Although univariate models can be helpful, I believe that it is necessary to include other explanatory variables.
Conclusion
Although more work needs to be done, the VAR model suggested that at about 4 lags, price is responsive to a shock in egg layers, with an impact of about 5 to 15 percent within the first 3 months. This was an interesting finding since the cycle of an egg laying hen is about 4-6 months.
Naturally, there are many other variables that could explain price variations, or even the amount of egg laying hens at any given point, such as laws mandating producers to switch to cage-free laying facilities, income, weather, corn prices, etc.
Further work:
Different models and tests incorporating other explanatory variables should be considered.