Summary and conclusions

This document is intended to be a concise report to explain the Time Series analysis performed on a dataset containing information about Milk Production (available here). The analysis was created as part of the Data Science Certificate in the class Methods for Data Analysis at University of Washington.

The idea is to analyze trend and seasonality of the production of Milk from 1995 until 2013, decomposing the time series and analysing the remainder. After that we also intend to create a model to forecast the production of 2014.

The analysis show that the time series is not stationary, as it has a trend, and that after removing trend and season efftects, the remainder of the decomposition is ARMA(1,0). The forecasting for 2014 showed good results with milk production data with confidence intervals reasonably small compared to the means.


Times Series Introduction

The first thing to do in this kind of analysis is verify if the time series is stationary. To do so, we plot the series, right after loading the data and isolating the milk production values of the complete dataset.

production <- read.csv('CADairyProduction.csv')
milk.ts <- ts(production$Milk.Prod, start=1995, freq=12)
plot(log(milk.ts), main='Milk Production Time Series', ylab='Log of Milk Production')

As we can see from the plot there is clear tren in this time series data, as the milk production is increasing overall in time. Therefore it is not stationary. Also some seasonal component may be present. With this in mind we proceed to the decomposition of the time series.


Time Series Decomposition

To decompose the time series we use seasonal decomposition by loess with span of 0.5. Right after the decomposition we plot the results of the function stl of the stats package.

milk.decomp <- stl(log(milk.ts), s.window = "periodic", t.window = 0.5*length(milk.ts))
plot(milk.decomp, main = paste('Decompositon of Milk Production with lowess span = 0.5'))

First, as stated before, the trend is clear is the third part of the plot. Also, the second part of the plot now shows a significant seasonal component. To analyse the remainder, we now isolate the remainder data and then show the ACF and PACF plots.

remainder.ts <- milk.decomp$time.series[,'remainder']
plot.acf(remainder.ts, is.df=F) ## Function added in the appendix

From the plots, the most clear information comes the ACF of Remainder, which shows that the time series lags are correlated with time, probably insinuating an Auto Regressive process.

Taking the observed plots into consideration we tried different ARMA processes for the remainder and only show below the one with the best (minimum) AIC. Finally we show the ACF and PACF plots of the residuals.

milk.arima = arima(remainder.ts, order = c(1,0,0), include.mean=F)
milk.arima
## 
## Call:
## arima(x = remainder.ts, order = c(1, 0, 0), include.mean = F)
## 
## Coefficients:
##          ar1
##       0.8151
## s.e.  0.0380
## 
## sigma^2 estimated as 0.000206:  log likelihood = 643.52,  aic = -1283.05
plot.acf(milk.arima$resid[-1], col = 'Milk ARIMA model residuals', is.df=F)

We can see that the best remainder model is ARMA process with p=1 and q=0. Also, from the ACF and PACF plots, we see no information left to correlate in any ARIMA process. Therefore, confirming a good choice of p and q.


Forecasting

To forecast for 2014 milk production, we use the forecast R package.

require(forecast)
ARIMAfit <- auto.arima(log(milk.ts), max.p=3, max.q=3, max.P=2, max.Q=2, max.order=5,
                       max.d=2, max.D=1, start.p=0, start.q=0, start.P=0, start.Q=0)
summary(ARIMAfit)
## Series: log(milk.ts) 
## ARIMA(0,1,1)(0,1,2)[12]                    
## 
## Coefficients:
##           ma1     sma1    sma2
##       -0.1506  -0.9076  0.1129
## s.e.   0.0743   0.0794  0.0838
## 
## sigma^2 estimated as 0.0002547:  log likelihood=577.8
## AIC=-1147.6   AICc=-1147.41   BIC=-1134.12
## 
## Training set error measures:
##                         ME       RMSE        MAE         MPE    MAPE      MASE        ACF1
## Training set -0.0003536657 0.01549906 0.01109068 -0.01955938 1.05342 0.2902694 0.005145456

From the summary above, we see that our ARIMA has d=1 and q=1 with seasonal D=1 and Q=2. With this model in hand we can now forecast for the 2014, namely the next 12 months.

milk.forecast = forecast(ARIMAfit, h=12)
summary(milk.forecast)
## 
## Forecast method: ARIMA(0,1,1)(0,1,2)[12]                   
## 
## Model Information:
## Series: log(milk.ts) 
## ARIMA(0,1,1)(0,1,2)[12]                    
## 
## Coefficients:
##           ma1     sma1    sma2
##       -0.1506  -0.9076  0.1129
## s.e.   0.0743   0.0794  0.0838
## 
## sigma^2 estimated as 0.0002547:  log likelihood=577.8
## AIC=-1147.6   AICc=-1147.41   BIC=-1134.12
## 
## Error measures:
##                         ME       RMSE        MAE         MPE    MAPE      MASE        ACF1
## Training set -0.0003536657 0.01549906 0.01109068 -0.01955938 1.05342 0.2902694 0.005145456
## 
## Forecasts:
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Jan 2014       1.259101 1.238648 1.279555 1.227820 1.290382
## Feb 2014       1.193990 1.167154 1.220825 1.152948 1.235031
## Mar 2014       1.303803 1.271835 1.335772 1.254912 1.352695
## Apr 2014       1.278470 1.242086 1.314854 1.222825 1.334115
## May 2014       1.307799 1.267480 1.348118 1.246136 1.369462
## Jun 2014       1.256228 1.212325 1.300130 1.189084 1.323371
## Jul 2014       1.253456 1.206240 1.300671 1.181246 1.325665
## Aug 2014       1.243199 1.192889 1.293509 1.166256 1.320141
## Sep 2014       1.198112 1.144887 1.251337 1.116711 1.279512
## Oct 2014       1.232666 1.176677 1.288655 1.147039 1.318293
## Nov 2014       1.209167 1.150544 1.267789 1.119512 1.298821
## Dec 2014       1.252931 1.191789 1.314074 1.159422 1.346440
plot(milk.forecast)

The summary results are showing acceptable error measures. The plot with the forecast in blue shows the mean predicted value for each month, which is detailed before the plot by the summary. The confidence intervals in gray in the plot are reasonably small compared to the means, and can also be check in the summary for 80% and 95%.


Conclusion

From the analysis above we can conclude that forecasting for 2014 showed good results with milk production data with confidence intervals reasonably small compared to the means.


Appendix

Function used in this report is showed below.

## Show ACF and PACF plots
plot.acf <- function(df, col = 'Remainder', is.df=TRUE){
  if(is.df) temp <- df[, col]
  else temp <- df
  par(mfrow = c(2,1))
  acf(temp, main = paste('ACF of', col))
  pacf(temp, main = paste('PACF of', col))
  par(mfrow = c(1,1))
}