1. Introduction


Iron is one of the key commodities traded worldwide. ‘The Economist’ has referred to iron as the most important commodity after oil, because iron is one of the main raw materials used to make steel. Steel goes into buildings and infrastructure, automobiles, mechanical equipment, and more. Because of this high demand, iron production is closely tied to the economy. Modeling and forecasting iron production is therefore of interest to anyone who studies economics.


This study uses quarterly data on iron production in Australia from 1956 to 1994. The series shows the boom and bust of iron production in Australia and illustrates how production changed as the number of competitors increased.


2. Visualization of Data

*Figure 1*

We start by visualizing the data. Figure 1 shows strong growth in iron production from 1956 to about 1975. This growth is due to the high demand for iron after World War II, which brought a massive boom to the industry. Production then drops sharply around 1980. In the early 1980s, many industries began buying from manufacturers in China and Southeast Asia, where goods, including iron and steel, could be purchased for less money. Because the trend of this series changes direction over time, a simple time series regression (for example, on a linear time trend) is probably not a good choice.

*Figure 2*

Figure 2 shows that the ACF decays slowly, while the PACF looks like white noise. A slowly decaying ACF is a sign that the series is non-stationary, so it needs to be detrended before an ARMA-type model can be fitted.
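To illustrate what "decaying slowly" means, the sketch below computes the sample ACF in plain Python for a hypothetical, purely trending series (not the iron data). The helper `acf` is written for this example and uses the usual estimator that divides every lag by the same total sum of squares. For a trending series the autocorrelations stay close to 1 for many lags instead of dropping quickly toward 0.

```python
def acf(x, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag of the sequence x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)  # total sum of squares about the mean
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]


# Hypothetical non-stationary series: a pure linear trend x_t = t.
trend = list(range(1, 101))
r = acf(trend, 10)
print([round(v, 3) for v in r])  # r_1 is 0.97 and r_10 is still about 0.70
```

For a stationary series such as white noise, the same function would return values near 0 for every lag after lag 0.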


3. Data Transformation

*Figure 3*

To reduce the fluctuations of the time series, a log transformation was applied. As Figure 3 shows, the fluctuations did not disappear, but they did decrease slightly.
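The reason a log transformation helps is that swings which grow in proportion to the level of a series become swings of constant size on the log scale. The minimal sketch below shows this with a small hypothetical series (not the iron data) whose low-to-high swings are always 20% of the level while the level doubles; the helper `swings` is made up for this example.

```python
import math

# Hypothetical series: pairs of (low, high) values with high = 1.2 * low,
# while the overall level doubles from pair to pair.
series = [100.0, 120.0, 200.0, 240.0, 400.0, 480.0]


def swings(values):
    """Size of each low-to-high swing (consecutive pairs of values)."""
    return [values[i + 1] - values[i] for i in range(0, len(values) - 1, 2)]


raw_swings = swings(series)                         # grow with the level
log_swings = swings([math.log(v) for v in series])  # all equal log(1.2)

print(raw_swings)  # [20.0, 40.0, 80.0]
print([round(s, 4) for s in log_swings])
```

Real data are rarely this clean, which is consistent with the fluctuations in Figure 3 shrinking only partly.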


3.1. Data Transformation - Lowess Smoothing

*Figure 4*

Since the behavior of this time series changes over time, I decided to apply Lowess smoothing to the data. The ‘Actual vs. Fitted’ plot in Figure 4 shows that the smoothed curve fits fairly well overall, but it does not capture the sharp drops around 1980 and 1990.
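The core idea of Lowess can be sketched in plain Python: at each time point, fit a straight line by weighted least squares to the nearest fraction `frac` of the points, with tricube weights that shrink to zero at the edge of that neighbourhood. This is a simplified, single-pass version written for illustration; the function name `lowess_single_pass` is hypothetical, and the full procedure (for example R's `lowess()`, which defaults to `f = 2/3` and `iter = 3`) also runs robustness iterations that down-weight outliers.

```python
import math


def tricube(u):
    """Tricube weight: 1 at u = 0, falling smoothly to 0 at u >= 1."""
    return (1 - u ** 3) ** 3 if u < 1 else 0.0


def lowess_single_pass(x, y, frac=2 / 3):
    """Locally weighted linear fit at each x[i] (no robustness iterations)."""
    n = len(x)
    r = min(n, max(3, math.ceil(frac * n)))  # neighbours used per local fit
    fitted = []
    for x0 in x:
        dists = [abs(xi - x0) for xi in x]
        h = sorted(dists)[r - 1]  # bandwidth: distance to the r-th nearest point
        w = [tricube(d / h) if h > 0 else float(d == 0) for d in dists]
        # Weighted least-squares line through the neighbourhood.
        sw = sum(w)  # > 0, because x0 itself always has weight 1
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


# Hypothetical data: a sudden drop between t = 9 and t = 10.
t = list(range(20))
step = [10.0] * 10 + [5.0] * 10
print([round(v, 2) for v in lowess_single_pass(t, step)])
```

On the step example the fitted values change gradually across the drop instead of jumping, which is the same reason a Lowess curve cannot follow the sudden decreases around 1980 and 1990.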


3.2. Data Transformation - First Order Difference

*Figure 5*

As Figure 5 shows, first-order and second-order differencing did not produce noticeably different results, and the first-order differenced series already looks suitable for modeling.
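Differencing replaces each value by its change from the previous period; each additional pass removes one more degree of polynomial trend. The sketch below (the helper `diff` is written for this example) applies it to a hypothetical series with a quadratic trend: one difference leaves a linear trend, and a second difference leaves a constant. When one difference already produces a series without a trend, as in Figure 5, a second difference adds nothing, so the simpler first difference is preferred.

```python
def diff(values, order=1):
    """Return the order-th difference of a sequence (each pass drops one value)."""
    out = list(values)
    for _ in range(order):
        out = [b - a for a, b in zip(out[:-1], out[1:])]
    return out


# Hypothetical series with a quadratic trend, y_t = t**2.
y = [t * t for t in range(8)]
print(diff(y, 1))  # [1, 3, 5, 7, 9, 11, 13]
print(diff(y, 2))  # [2, 2, 2, 2, 2, 2]
```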


4. Modeling (ARIMA)

*Figure 6*

Based on the ACF and PACF of the time series in Figure 6, I chose an ARIMA(3,1,0) model. Neither plot shows any sign of seasonality. To check whether a better model exists, the AIC values of ARIMA(p,1,q) models with p and q each ranging from 0 to 3 were compared.
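For reference, ARIMA(3,1,0) means an AR(3) model for the first differences of the series. Writing $y_t$ for the series being modeled (presumably the log-transformed production, since Sections 5 and 6 work on the log scale) and $w_t$ for white noise, the model can be written as below. The intercept $c$ is shown as optional, because the analysis does not state whether a constant was included.

$$
\nabla y_t = c + \phi_1 \nabla y_{t-1} + \phi_2 \nabla y_{t-2} + \phi_3 \nabla y_{t-3} + w_t, \qquad \nabla y_t = y_t - y_{t-1}
$$

$$
(1 - \phi_1 B - \phi_2 B^2 - \phi_3 B^3)(1 - B)\, y_t = c + w_t
$$

Here $B$ is the backshift operator ($B y_t = y_{t-1}$). ARIMA(3,1,1), the other candidate discussed below, adds a single moving-average term $\theta_1 w_{t-1}$ to the right-hand side.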

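AIC values of ARIMA(p,1,q) models (rows: AR order p; columns: MA order q)

|     | MA0       | MA1       | MA2       | MA3       |
|:----|----------:|----------:|----------:|----------:|
| AR0 | -302.9367 | -314.2411 | -316.9349 | -315.0675 |
| AR1 | -307.5330 | -315.2961 | -314.9564 | -315.9805 |
| AR2 | -315.4985 | -317.6391 | -319.3311 | -319.4251 |
| AR3 | -322.9428 | -323.1305 | -321.4552 | -319.4645 |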

The table shows that ARIMA(3,1,1) has the lowest AIC (-323.1305). However, ARIMA(3,1,0) has one fewer parameter (no MA term), and its AIC (-322.9428) is higher by only about 0.19. For the sake of parsimony, I chose ARIMA(3,1,0).
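The selection reasoning can be made explicit in a few lines of Python using the AIC values from the table above. The threshold `DELTA = 2.0` is an assumption (a common rule of thumb that models within about 2 AIC units of the minimum are similarly well supported), not a rule stated in this analysis; the code then picks the candidate with the fewest ARMA parameters.

```python
# AIC values from the table above, keyed by (p, q) for ARIMA(p, 1, q).
aic = {
    (0, 0): -302.9367, (0, 1): -314.2411, (0, 2): -316.9349, (0, 3): -315.0675,
    (1, 0): -307.5330, (1, 1): -315.2961, (1, 2): -314.9564, (1, 3): -315.9805,
    (2, 0): -315.4985, (2, 1): -317.6391, (2, 2): -319.3311, (2, 3): -319.4251,
    (3, 0): -322.9428, (3, 1): -323.1305, (3, 2): -321.4552, (3, 3): -319.4645,
}

best = min(aic, key=aic.get)  # model with the lowest AIC

DELTA = 2.0  # assumed tolerance in AIC units (rule of thumb, not from the analysis)
candidates = [m for m in aic if aic[m] - aic[best] <= DELTA]

# Among near-best models, prefer the fewest parameters (p + q), then lower AIC.
chosen = min(candidates, key=lambda m: (m[0] + m[1], aic[m]))
gap = aic[chosen] - aic[best]
print(best, chosen, round(gap, 4))  # (3, 1) (3, 0) 0.1877
```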


5. Diagnosis

The optimizer used to fit the ARIMA(3,1,0) model converged in both of its runs (final values -2.505721 and -2.512696).
*Figure 7*

In Figure 7, the plot of standardized residuals shows no obvious pattern. There is one outlier of close to 4 standard deviations around 1980, which coincides with the sharp drop in production seen in Figure 1, but the remaining residuals look well behaved. The ACF of the residuals resembles white noise. The normal Q-Q plot also looks good, with most points falling within the bounds. The Ljung-Box Q-statistic is not significant at any of the lags shown: all p-values are above 0.05. Together, these diagnostics suggest that the model is adequate.
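The Q-statistic check can be sketched in plain Python. This assumes the Ljung-Box form $Q = n(n+2)\sum_{k=1}^{h} r_k^2/(n-k)$, compared with a chi-square distribution with $h - (p+q)$ degrees of freedom (here $p + q = 3$), which is the usual convention for ARMA residuals. The function names are hypothetical, and the chi-square tail probability is computed from the series expansion of the incomplete gamma function so that only the standard library is needed.

```python
import math


def acf(x, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag of the sequence x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(max_lag + 1)]


def chi2_sf(q, df):
    """P(X > q) for X ~ chi-square(df), via the series for the
    regularized lower incomplete gamma function P(df/2, q/2)."""
    a, x = df / 2.0, q / 2.0
    if x <= 0:
        return 1.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-16 * total:
        term *= x / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(x) - x - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def ljung_box(resid, h, fitdf=0):
    """Ljung-Box Q over lags 1..h and its p-value with h - fitdf degrees of freedom."""
    n = len(resid)
    r = acf(resid, h)
    stat = n * (n + 2) * sum(r[k] ** 2 / (n - k) for k in range(1, h + 1))
    return stat, chi2_sf(stat, h - fitdf)


# Hypothetical, strongly autocorrelated "residuals" (alternating signs).
# fitdf = p + q = 3 for an ARIMA(3,1,0) fit.
stat, pval = ljung_box([1.0, -1.0] * 10, h=5, fitdf=3)
print(round(stat, 2), pval)  # large Q, p-value far below 0.05
```

Residuals that resemble white noise, as in Figure 7, give small Q values and p-values above 0.05.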

*Figure 8*

Figure 8 plots the actual and fitted values of the log-transformed series. There are some gaps between the actual and fitted values, but the fit looks good overall.


6. Forecasting

*Figure 9*

Figure 9 shows the predicted values of the log-transformed series. After a small dip in 1995 Qtr1, the predicted values increase slowly.
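Point forecasts from an ARIMA(3,1,0) model are produced recursively: forecast the next difference from the three most recent differences, add it to the last level, and repeat, with future shocks set to their mean of zero. The sketch below assumes the intercept form $\nabla y_t = c + \phi_1 \nabla y_{t-1} + \phi_2 \nabla y_{t-2} + \phi_3 \nabla y_{t-3} + w_t$; the fitted coefficients are not reported in this analysis, so the function `forecast_arima310` and its inputs are purely hypothetical. For a log-scale model, exponentiating these forecasts gives values on the original scale.

```python
def forecast_arima310(y, phi, c=0.0, steps=8):
    """Point forecasts of y from an ARIMA(3,1,0) with AR coefficients
    phi = (phi1, phi2, phi3) and intercept c on the first differences."""
    if len(y) < 4:
        raise ValueError("need at least 4 observations for 3 lagged differences")
    dy = [b - a for a, b in zip(y[:-1], y[1:])]  # first differences
    level = y[-1]
    forecasts = []
    for _ in range(steps):
        # Future shocks w_t have mean zero, so they drop out of the point forecast.
        next_dy = c + phi[0] * dy[-1] + phi[1] * dy[-2] + phi[2] * dy[-3]
        dy.append(next_dy)
        level += next_dy  # undo the differencing
        forecasts.append(level)
    return forecasts


# Hypothetical inputs: past differences 1, 2, 3 and phi1 = 0.5, phi2 = phi3 = 0.
print(forecast_arima310([0.0, 1.0, 3.0, 6.0], phi=(0.5, 0.0, 0.0), steps=3))
# [7.5, 8.25, 8.625]
```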

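Predicted values on the original (non-log) scale

| Quarter   | pred     | se       |
|:----------|---------:|---------:|
| 1994 Qtr4 | 1916.333 | 1.084304 |
| 1995 Qtr1 | 1900.281 | 1.101536 |
| 1995 Qtr2 | 1941.450 | 1.108358 |
| 1995 Qtr3 | 1969.898 | 1.112788 |
| 1995 Qtr4 | 1982.620 | 1.122813 |
| 1996 Qtr1 | 1989.613 | 1.132799 |
| 1996 Qtr2 | 2007.898 | 1.140607 |
| 1996 Qtr3 | 2028.855 | 1.146967 |

The se column is not in production units: its values (about 1.08 to 1.15) appear to be the exponentiated log-scale standard errors, so they act as multiplicative factors rather than additive standard errors.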
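If the se column does hold exp(log-scale standard error), as its magnitude suggests, an approximate 95% prediction interval on the original scale can be recovered by building the interval on the log scale and exponentiating it. The sketch below does this for the first forecast quarter; the 1.96 multiplier assumes approximately normal log-scale forecast errors. The resulting interval is asymmetric around `pred`, and `pred` itself (the exponentiated log forecast) estimates the median rather than the mean of production.

```python
import math

# First row of the forecast table (1994 Qtr4). The se value is assumed to be
# exp(log-scale standard error), not a standard error in production units.
pred, exp_se = 1916.333, 1.084304

se_log = math.log(exp_se)  # recover the log-scale standard error
z = 1.96                   # standard normal quantile for a 95% interval
lower = pred * math.exp(-z * se_log)  # same as pred / exp_se**z
upper = pred * math.exp(z * se_log)   # same as pred * exp_se**z
print(round(lower, 1), round(upper, 1))  # roughly 1635 and 2246
```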


7. Conclusion

Based on the model diagnostics and the forecasts, we conclude that the ARIMA(3,1,0) model is easy to work with and captures the main features of the time series. However, as Figure 1 shows, the series contains sudden increases and decreases over time, such as the sharp drop around 1980, which a single ARIMA model may not fully capture. In addition, although this time series shows no seasonal pattern, a seasonal pattern could emerge in newer data.