Introduction

This case study examines a time series that tracks monthly electric production in kilowatt-hours (kWh) from January 1973 to December 2010 for the state of New South Wales, Australia. The data set contains 456 observations. It was retrieved from Kaggle and uploaded to GitHub, and it is available at the following link: https://raw.githubusercontent.com/JackRoss10089/STA-321/main/Electric_Production.csv. The goal of this case study is to decompose the series in order to observe its trend and seasonality. We also aim to find the ideal sample size for our training data set.

Exploratory Data Analysis

After retrieving the data from GitHub, we prepare the data set for analysis. First, we reduce the data set to the 200 most recent observations. We then remove the date variable, since the ts function does not require it. Finally, we define a time series object with a frequency of 12 because the observations are monthly.
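
Because the date column is dropped, the calendar position of the subset has to be recovered from the observation counts. The sketch below works through that arithmetic in Python rather than in the R workflow used for this study. The function name subset_start is hypothetical, and the calculation assumes the series has no missing months, which the 456-observation count suggests.

```python
# Sketch: find the calendar start of the 200 most recent monthly observations.
# Assumption: the series is complete (one observation per month, no gaps).

def subset_start(first_year, first_month, n_total, n_keep, frequency=12):
    """Return (year, month) of the first observation kept from the tail."""
    if not 0 < n_keep <= n_total:
        raise ValueError("n_keep must be between 1 and n_total")
    offset = n_total - n_keep                       # observations dropped from the front
    months_from_origin = (first_month - 1) + offset
    year = first_year + months_from_origin // frequency
    month = months_from_origin % frequency + 1      # back to a 1-based month
    return year, month

# Full series: January 1973 onward, 456 observations; keep the last 200.
start = subset_start(1973, 1, n_total=456, n_keep=200)
print(start)  # (1994, 5)
```

Under that assumption, the 200 retained observations run from May 1994 to December 2010, so a time series object built from them would start in period 5 of 1994.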

Forecasting with Decomposition

Next, we use a classical decomposition method and an STL decomposition method to perform forecasts on this additive time series.

After performing the classical and STL decompositions, we can see distinct seasonality in this data set, along with a slight positive trend. This suggests that energy production has been slowly increasing and that the amount of energy produced varies from season to season throughout the year.
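
To make the additive model concrete, the sketch below splits a series into trend, seasonal, and remainder parts using the textbook classical procedure: a centered moving average for the trend, then per-month averages of the detrended values. It is written in Python on a synthetic series, not on the electric production data, and it is not the code used for this study. The function name classical_additive is hypothetical, and STL (which uses loess smoothing) is not covered.

```python
import math

def classical_additive(y, period=12):
    """Classical additive decomposition: y = trend + seasonal + remainder.

    Assumes an even period and at least two full cycles of data.
    """
    n = len(y)
    half = period // 2
    trend = [None] * n
    # Centered 2 x period moving average: the two end points get half weight.
    for t in range(half, n - half):
        window = y[t - half : t + half + 1]
        trend[t] = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / period
    # Average the detrended values by position within the cycle.
    sums = [0.0] * period
    counts = [0] * period
    for t in range(n):
        if trend[t] is not None:
            sums[t % period] += y[t] - trend[t]
            counts[t % period] += 1
    raw = [s / c for s, c in zip(sums, counts)]
    center = sum(raw) / period                      # force seasonal effects to sum to zero
    index = [r - center for r in raw]
    seasonal = [index[t % period] for t in range(n)]
    remainder = [None if trend[t] is None else y[t] - trend[t] - seasonal[t]
                 for t in range(n)]
    return trend, seasonal, remainder

# Synthetic monthly series: slight upward trend plus a yearly cycle.
y = [100 + 0.5 * t + 10 * math.sin(2 * math.pi * t / 12) for t in range(48)]
trend, seasonal, remainder = classical_additive(y)
```

The trend is undefined for the first and last six months because the centered average needs half a cycle on each side, which is one reason forecasts from a classical decomposition need an extra step to extend the trend.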

Training and Testing Data

Next, we partition the data into training and testing data sets in order to evaluate the forecasts. We use the last 7 periods of data for testing, and the rest of the historical data is used to train the forecasting model. To compare sample sizes, the model is trained on sets of 144, 109, 73, and 48 observations.
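
The partition can be sketched as follows. This Python example uses placeholder values in place of the 200 observations, and it assumes that each training set consists of the n observations immediately preceding the test period; the report does not state how the training windows were chosen. The function name split_train_test is hypothetical.

```python
def split_train_test(series, n_train, n_test=7):
    """Hold out the last n_test points; train on the n_train points just before them."""
    if n_test < 1 or n_train < 1:
        raise ValueError("n_train and n_test must be positive")
    if n_train + n_test > len(series):
        raise ValueError("not enough observations for this split")
    test = series[-n_test:]
    train = series[-(n_test + n_train):-n_test]
    return train, test

# Placeholder values standing in for the 200 most recent observations.
series = list(range(200))

splits = {}
for n in (144, 109, 73, 48):
    train, test = split_train_test(series, n)
    splits[n] = (train, test)
    print(n, len(train), len(test))
```

Every split shares the same 7-period test set, so differences in error reflect only the amount of history used for training.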

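We next perform error analysis, comparing the mean squared error (MSE) and the mean absolute percentage error (MAPE) of the forecasts.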
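
For reference, the two error measures can be written as short Python functions. This is a minimal sketch using the standard definitions, with MAPE expressed as a proportion (so 0.035 means 3.5 percent), which matches the scale of the values reported in this section. The sample numbers are made up for illustration and are not taken from the data set.

```python
def mse(actual, forecast):
    """Mean squared error: average of squared forecast errors."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)

def mape(actual, forecast):
    """Mean absolute percentage error, returned as a proportion.

    Assumes no actual value is zero.
    """
    return sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

# Illustrative values only (not from the electric production data).
actual = [100.0, 110.0, 120.0]
forecast = [98.0, 113.0, 120.0]
print(mse(actual, forecast))
print(mape(actual, forecast))
```

MSE is in squared units of the series and penalizes large misses heavily, while MAPE is scale-free, so the two measures can rank models differently.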

Error comparison between forecast results with different sample sizes

Training size   MSE        MAPE
n = 144         26.92753   0.0346743
n = 109         26.83099   0.0345234
n = 73          27.71894   0.0347812
n = 48          26.42848   0.0355194

We next create a visualization of how the error changes with the size of the training data set.

Based upon the error curve, the best training data set size is n = 109. This size yields the lowest MAPE and the second-lowest MSE, suggesting that it will give the best performance.
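
The ranking behind this choice can be checked directly from the reported errors. The Python sketch below sorts the four training sizes by each measure and then combines the two rankings with a simple rank sum. The rank-sum rule is an assumption added here for illustration; the report itself bases the choice on the error curve.

```python
# Reported errors from the comparison table: training size -> (MSE, MAPE).
errors = {
    144: (26.92753, 0.0346743),
    109: (26.83099, 0.0345234),
    73: (27.71894, 0.0347812),
    48: (26.42848, 0.0355194),
}

by_mse = sorted(errors, key=lambda n: errors[n][0])
by_mape = sorted(errors, key=lambda n: errors[n][1])
print(by_mse)   # [48, 109, 144, 73]
print(by_mape)  # [109, 144, 73, 48]

# Assumed tie-breaking rule: add the two rank positions, lowest total wins.
rank_sum = {n: by_mse.index(n) + by_mape.index(n) for n in errors}
best = min(rank_sum, key=rank_sum.get)
print(best)     # 109
```

Note that n = 48 has the lowest MSE but the highest MAPE, so the choice of n = 109 depends on weighing both measures rather than on MSE alone.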