Exercises from Hyndman and Athanasopoulos, Forecasting: Principles and Practice, 3rd Edition, Chapter 5, The forecaster's toolbox.
library(fpp3)
library(kableExtra)
library(cowplot)
Produce forecasts for the following series using whichever of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case:
Australian Population (global_economy), Bricks (aus_production), NSW Lambs (aus_livestock), Household wealth (hh_budget), Australian takeaway food turnover (aus_retail).
Typically, population growth follows a geometric pattern. Since the Australian population is increasing along a steady trend, we know that NAIVE(y) and SNAIVE(y) will produce unrealistic forecasts: they will forecast a constant or declining population.
Of the three methods, the best forecast of the raw time series comes from the RW drift method, since it is the only one for which the forecast population increases. While the drift method predicts linear rather than geometric growth, it is still a good local approximation: drawing a line from the first to the most recent observation does a reasonable job.
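For reference, the drift point forecast can be written (the standard formula from the text, with \(y_t\) the observed population and \(T\) the number of observed years) as
\[ \hat{y}_{T+h \mid T} = y_T + h \left( \frac{y_T - y_1}{T - 1} \right), \]
i.e. the last observation plus \(h\) times the average historical increment.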
# Filter the data
auspop <- global_economy %>%
dplyr::filter( Country == 'Australia')
# Fit the model to a drift method
auspop_fit <- auspop %>% model(trend_model = RW( Population ~ drift()))
( fc_auspop = auspop_fit %>% forecast( h=3 ) )
## # A fable: 3 x 5 [1Y]
## # Key: Country, .model [1]
## Country .model Year Population .mean
## <fct> <chr> <dbl> <dist> <dbl>
## 1 Australia trend_model 2018 N(2.5e+07, 6e+09) 24850204.
## 2 Australia trend_model 2019 N(2.5e+07, 1.2e+10) 25101475.
## 3 Australia trend_model 2020 N(2.5e+07, 1.9e+10) 25352746.
fc_auspop %>% autoplot(auspop) +
labs(title="Drift Forecast of Australian Population",
subtitle = "h=3 years forward",
x = "Year", y = "Population" )
Thus, the population is forecast to be 24,850,204 in 2018, and so on for the later years.
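As a quick sanity check (a sketch, reusing the auspop object created above), the drift point forecasts can be reproduced by hand from the first and last observations:
# Drift forecasts by hand: last value plus h times the average historical
# increment; these should match the .mean column of fc_auspop above.
first_pop <- head(auspop$Population, 1)
last_pop  <- tail(auspop$Population, 1)
slope     <- (last_pop - first_pop) / (nrow(auspop) - 1)
last_pop + (1:3) * slope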
The bricks time series stops at Q2 2005 so we first filter the time series before forecasting.
bricks_dat <- aus_production %>% filter(!is.na(Bricks))
tail(bricks_dat)
## # A tsibble: 6 x 7 [1Q]
## Quarter Beer Tobacco Bricks Cement Electricity Gas
## <qtr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2004 Q1 435 3974 409 1963 54561 173
## 2 2004 Q2 390 5027 423 2180 54644 215
## 3 2004 Q3 412 NA 428 2307 55915 227
## 4 2004 Q4 454 NA 397 2157 52850 190
## 5 2005 Q1 416 NA 355 1980 55035 170
## 6 2005 Q2 403 NA 435 2481 55117 206
p1 = autoplot(bricks_dat, Bricks) + labs(title="Bricks Production in Australia", subtitle="1956Q1-2005Q2")
p2 = gg_season(bricks_dat, Bricks)
plot_grid(p1, p2, labels="AUTO", align="h")
We observe strong annual seasonality in the bricks time series plots above: Q1 is usually the lowest quarter and Q3 the highest. The seasonality is likely driven by weather conditions that permit bricklaying.
bricks_dat %>% model( snaive = SNAIVE(Bricks ~ lag("year") ) ) -> mod_bricks
( fc_bricks = mod_bricks %>% forecast( h = 4 ) )
## # A fable: 4 x 4 [1Q]
## # Key: .model [1]
## .model Quarter Bricks .mean
## <chr> <qtr> <dist> <dbl>
## 1 snaive 2005 Q3 N(428, 2336) 428
## 2 snaive 2005 Q4 N(397, 2336) 397
## 3 snaive 2006 Q1 N(355, 2336) 355
## 4 snaive 2006 Q2 N(435, 2336) 435
fc_bricks %>% autoplot( bricks_dat)
Thus, we forecast production of 428 million bricks in Q3 2005.
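As a quick check (a sketch), the seasonal naive point forecasts simply repeat the last observed year of the series, so they should match the final four quarters shown in the tail() output above:
# Last four observed quarters (2004 Q3 - 2005 Q2) become the forecasts
# for 2005 Q3 - 2006 Q2 under the seasonal naive method.
bricks_dat %>% slice_tail(n = 4) %>% pull(Bricks)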
The slaughter count of New South Wales lambs has both trend and seasonality. It looks volatile in the short term, and it may also be cyclical, driven by economic recessions or shifts in consumer preferences for lamb.
The STL decomposition below shows our original data and the components.
nswlamb = aus_livestock %>% filter( State == 'New South Wales',
Animal == 'Lambs'
)
tail(nswlamb)
## # A tsibble: 6 x 4 [1M]
## # Key: Animal, State [1]
## Month Animal State Count
## <mth> <fct> <fct> <dbl>
## 1 2018 Jul Lambs New South Wales 437000
## 2 2018 Aug Lambs New South Wales 394300
## 3 2018 Sep Lambs New South Wales 306900
## 4 2018 Oct Lambs New South Wales 411200
## 5 2018 Nov Lambs New South Wales 426400
## 6 2018 Dec Lambs New South Wales 351600
nswdcmp = nswlamb %>% model( stl = STL(Count) )
components(nswdcmp) %>% autoplot()
tail(components(nswdcmp) )
## # A dable: 6 x 9 [1M]
## # Key: Animal, State, .model [1]
## # : Count = trend + season_year + remainder
## Animal State .model Month Count trend season_year remainder season_adjust
## <fct> <fct> <chr> <mth> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Lambs New … stl 2018 Jul 437000 4.10e5 29633. -2159. 407367.
## 2 Lambs New … stl 2018 Aug 394300 4.05e5 17251. -27923. 377049.
## 3 Lambs New … stl 2018 Sep 306900 4.00e5 -6482. -87034. 313382.
## 4 Lambs New … stl 2018 Oct 411200 3.96e5 21252. -5914. 389948.
## 5 Lambs New … stl 2018 Nov 426400 3.91e5 13161. 22731. 413239.
## 6 Lambs New … stl 2018 Dec 351600 3.85e5 -34714. 1159. 386314.
We chose the seasonal naive model over the drift model because the seasonal fluctuations are much larger than the trend effects. The STL decomposition table shows that the Nov-Dec 2018 trend changes are about 5,000 lambs per month, whereas the seasonal effects swing by roughly 20-47 thousand lambs from month to month.
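To back this up (a rough sketch using the components computed above), we can compare the size of the seasonal swings with the typical month-to-month change in the trend:
components(nswdcmp) %>%
  as_tibble() %>%
  summarise(
    seasonal_swing   = diff(range(season_year)),                    # peak-to-trough seasonal effect
    avg_trend_change = mean(abs(difference(trend)), na.rm = TRUE)   # typical monthly trend change
  )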
The forecast for the next 6 months is shown below. For example, we forecast 447,900 lambs slaughtered in NSW in January 2019.
nswlamb %>% model( snaive = SNAIVE(Count ~ lag("year") ) ) -> mod_lambs
( fc_lambs = mod_lambs %>% forecast(h=6) )
## # A fable: 6 x 6 [1M]
## # Key: Animal, State, .model [1]
## Animal State .model Month Count .mean
## <fct> <fct> <chr> <mth> <dist> <dbl>
## 1 Lambs New South Wales snaive 2019 Jan N(447900, 3.1e+09) 447900
## 2 Lambs New South Wales snaive 2019 Feb N(421000, 3.1e+09) 421000
## 3 Lambs New South Wales snaive 2019 Mar N(454100, 3.1e+09) 454100
## 4 Lambs New South Wales snaive 2019 Apr N(419100, 3.1e+09) 419100
## 5 Lambs New South Wales snaive 2019 May N(515800, 3.1e+09) 515800
## 6 Lambs New South Wales snaive 2019 Jun N(446800, 3.1e+09) 446800
fc_lambs %>% autoplot( nswlamb)
In this dataset, wealth is measured as a percentage of net disposable income over 1995-2016. Four countries are included: Australia, Japan, Canada, and the USA.
The St. Louis Federal Reserve argues that household wealth is driven by equity and housing: most households own a house and some financial assets. They argue that while household wealth is near a recent peak, the current level could decline if real estate or stock markets do so.
(https://www.stlouisfed.org/publications/in-the-balance/2017/household-wealth-is-at-a-post-ww-ii-high-should-we-celebrate-or-worry)
These markets are cyclical but not seasonal, which rules out SNAIVE() for forecasting.
Over longer periods (e.g., 1950-2017), household wealth has fluctuated around roughly 5.5 times (about 550% of) net disposable income rather than trending steadily upward. This argues against extrapolating an upward trend with the RW drift method.
Consequently, I argue for the use of NAIVE(). This assumes that current markets don’t collapse or skyrocket. Therefore, tomorrow’s wealth should be around today’s level.
hh_budget
## # A tsibble: 88 x 8 [1Y]
## # Key: Country [4]
## Country Year Debt DI Expenditure Savings Wealth Unemployment
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Australia 1995 95.7 3.72 3.40 5.24 315. 8.47
## 2 Australia 1996 99.5 3.98 2.97 6.47 315. 8.51
## 3 Australia 1997 108. 2.52 4.95 3.74 323. 8.36
## 4 Australia 1998 115. 4.02 5.73 1.29 339. 7.68
## 5 Australia 1999 121. 3.84 4.26 0.638 354. 6.87
## 6 Australia 2000 126. 3.77 3.18 1.99 350. 6.29
## 7 Australia 2001 132. 4.36 3.10 3.24 348. 6.74
## 8 Australia 2002 149. 0.0218 4.03 -1.15 349. 6.37
## 9 Australia 2003 159. 6.06 5.04 -0.413 360. 5.93
## 10 Australia 2004 170. 5.53 4.54 0.657 379. 5.40
## # … with 78 more rows
autoplot(hh_budget, Wealth)
We generate a 3 year ahead forecast for all 4 countries and plot the results.
mod_wealth = hh_budget %>% model( naive = NAIVE(Wealth ) )
(fc_wealth = mod_wealth %>% forecast( h = 3 ))
## # A fable: 12 x 5 [1Y]
## # Key: Country, .model [4]
## Country .model Year Wealth .mean
## <chr> <chr> <dbl> <dist> <dbl>
## 1 Australia naive 2017 N(422, 536) 422.
## 2 Australia naive 2018 N(422, 1071) 422.
## 3 Australia naive 2019 N(422, 1607) 422.
## 4 Canada naive 2017 N(565, 570) 565.
## 5 Canada naive 2018 N(565, 1139) 565.
## 6 Canada naive 2019 N(565, 1709) 565.
## 7 Japan naive 2017 N(602, 348) 602.
## 8 Japan naive 2018 N(602, 695) 602.
## 9 Japan naive 2019 N(602, 1043) 602.
## 10 USA naive 2017 N(609, 1113) 609.
## 11 USA naive 2018 N(609, 2225) 609.
## 12 USA naive 2019 N(609, 3338) 609.
fc_wealth %>% autoplot( hh_budget) +
labs(title="Wealth Forecast",
subtitle="Naive()",
y = "% of net disposable income")
In the next two plots, we see a strong positive trend in all states of Australia across all months.
takeaway = aus_retail %>% filter( Industry == 'Takeaway food services') %>%
select(State, Industry, Month, Turnover)
head(takeaway)
## # A tsibble: 6 x 4 [1M]
## # Key: State, Industry [1]
## State Industry Month Turnover
## <chr> <chr> <mth> <dbl>
## 1 Australian Capital Territory Takeaway food services 1982 Apr 3.2
## 2 Australian Capital Territory Takeaway food services 1982 May 3.3
## 3 Australian Capital Territory Takeaway food services 1982 Jun 3.5
## 4 Australian Capital Territory Takeaway food services 1982 Jul 3.5
## 5 Australian Capital Territory Takeaway food services 1982 Aug 3.7
## 6 Australian Capital Territory Takeaway food services 1982 Sep 3.9
takeaway %>% gg_subseries(Turnover)
takeaway %>% autoplot(Turnover)
The trend effect is dominant, so we choose the RW drift method to forecast turnover.
mod_takeaway = takeaway %>% model( RW(Turnover ~ drift() ) )
fc_takeaway = mod_takeaway %>% forecast( h = 3 )
We display the forecasts for three of the states in the plot below.
fc_takeaway %>% filter( State %in% c('Australian Capital Territory',
'New South Wales' ,
'Victoria' ) ) %>%
autoplot(takeaway )
Use the Facebook stock price (data set gafa_stock) to do the following:
a. Produce a time plot of the series.
b. Produce forecasts using the drift method and plot them.
c. Show that the forecasts are identical to extending the line drawn between the first and last observations.
d. Try using some of the other benchmark functions to forecast the same data set. Which do you think is best? Why?
Before plotting the adjusted closing price of Facebook, we need to correct two difficulties that would otherwise cause error messages during model construction.
To change from an irregular to a daily interval, we call as_tsibble with regular = TRUE. To create all calendar days, we use fill_gaps to add weekends and holidays with NA values for Adj_Close. The NA values for Adj_Close do not pose problems during model building.
fb = gafa_stock %>%
filter(Symbol=="FB") %>%
select(Symbol, Date, Adj_Close)
fb = as_tsibble( fb, key = "Symbol", index = "Date", regular = TRUE)
# Use fb2 for model construction
fb2 = fb %>% fill_gaps()
#dim(fb)
#dim(fb2)
fb2 %>% autoplot(Adj_Close) +
labs( title = "Facebook Closing Price History") + xlab("Date")
mod_fb2 = fb2 %>% model( RW(Adj_Close ~ drift() ) )
fc_fb2 = mod_fb2 %>% forecast( h = 20 )
fc_fb2 %>% autoplot(fb2)
\[\hat{f}(T+h \mid I_{T}) = f(T) + \frac{f(T) - f(1)}{d(1, T)}\, d(T, T+h)\]
In the above formula, \(\hat{f}(T+h \mid I_T)\) is the forecast for day \(T+h\), given all information \(I_T\) up to and including day \(T\), for \(h \geq 1\). The day-count function \(d(x, y)\) gives the number of business days between \(x\) and \(y\), excluding one of the endpoints.
Below I show the first price \(f(1)\), the last price \(f(T)\), the day count \(d(1,T)\), and the resulting ratio (the daily drift).
(first_price = fb$Adj_Close[1])
## [1] 54.71
(last_price = fb$Adj_Close[ length(fb$Adj_Close)])
## [1] 131.09
(days_diff = length(fb$Adj_Close)-1)
## [1] 1257
(delta = (last_price - first_price) / days_diff )
## [1] 0.06076372
The table below shows that the forecast prices and the manually calculated prices are the same (to all visible decimal places). This confirms that the drift method is equivalent to extending the straight line drawn between the first and last observations.
pred_fb2 = data.frame( date = fc_fb2$Date[1:5],
forecast = fc_fb2$.mean[1:5] ,
my_calcs = last_price + delta * c(1:5)
)
pred_fb2 %>% kable() %>% kable_styling()
| date | forecast | my_calcs |
|---|---|---|
| 2019-01-01 | 131.1508 | 131.1508 |
| 2019-01-02 | 131.2115 | 131.2115 |
| 2019-01-03 | 131.2723 | 131.2723 |
| 2019-01-04 | 131.3331 | 131.3331 |
| 2019-01-05 | 131.3938 | 131.3938 |
d. Other simple forecast methods we consider are seasonal naive and naive. We examine the next 20 days of forecast stock prices. For the seasonal naive model, we use a 2 month lag but there is no obvious choice since stock prices are not seasonal.
mod_fb2 = fb2 %>% model( snaive = SNAIVE(Adj_Close ~ lag(60) ) ,
naive = NAIVE(Adj_Close)
)
( fc_fb2 = mod_fb2 %>% forecast( h = 20 ) )
## # A fable: 40 x 5 [1D]
## # Key: Symbol, .model [2]
## Symbol .model Date Adj_Close .mean
## <chr> <chr> <date> <dist> <dbl>
## 1 FB snaive 2019-01-01 N(150, 219) 150.
## 2 FB snaive 2019-01-02 N(NA, 219) NA
## 3 FB snaive 2019-01-03 N(NA, 219) NA
## 4 FB snaive 2019-01-04 N(149, 219) 149.
## 5 FB snaive 2019-01-05 N(150, 219) 150.
## 6 FB snaive 2019-01-06 N(152, 219) 152.
## 7 FB snaive 2019-01-07 N(148, 219) 148.
## 8 FB snaive 2019-01-08 N(145, 219) 145.
## 9 FB snaive 2019-01-09 N(NA, 219) NA
## 10 FB snaive 2019-01-10 N(NA, 219) NA
## # … with 30 more rows
We plot the 2 model forecasts for 20 days together with approximately 1 year of historical data.
fc_fb2 %>% autoplot(fb2 %>% slice_tail(n=365) )
We conclude that the naive model is the most logical choice. According to the efficient market hypothesis, the current level is a reasonable short-term prediction of future prices. There is no support, in theory or in the plots, for seasonality in the stock price. Moreover, the drift method, while attractive, is highly sensitive to the endpoints: when the most recent price changes, the forecasts can jump around considerably.
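One way to make this comparison more concrete (a sketch only; the holdout cut-off of 1 October 2018 is an arbitrary assumption) is to hold out the end of the sample and compare test-set accuracy across the three benchmarks:
fb_train <- fb2 %>% filter(Date <  as.Date("2018-10-01"))
fb_test  <- fb2 %>% filter(Date >= as.Date("2018-10-01"))
fb_train %>%
  model(naive  = NAIVE(Adj_Close),
        drift  = RW(Adj_Close ~ drift()),
        snaive = SNAIVE(Adj_Close ~ lag(60))) %>%
  forecast(new_data = fb_test) %>%
  fabletools::accuracy(fb2)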
Apply a seasonal naïve method to the quarterly Australian beer production data from 1992. Check if the residuals look like white noise, and plot the forecasts. The following code will help.
# Extract data of interest
recent_production <- aus_production %>%
filter(year(Quarter) >= 1992)
# Define and estimate a model
fit <- recent_production %>% model(SNAIVE(Beer))
# Look at the residuals
fit %>% gg_tsresiduals()
# Look at some forecasts
fit %>% forecast() %>% autoplot(recent_production)
What do you conclude?
Replicating the suggested plots gives the following output for SNAIVE forecasting of Beer production.
# Extract data of interest
recent_production <- aus_production %>%
filter(year(Quarter) >= 1992)
# Define and estimate a model
fit <- recent_production %>% model(SNAIVE(Beer))
# Look at the residuals
fit %>% gg_tsresiduals()
# Look at some forecasts
fit %>% forecast() %>% autoplot(recent_production)
We also find that the innovation residuals have a negative mean.
mean(augment(fit)$.innov , na.rm = TRUE)
## [1] -1.571429
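A more formal check of the white-noise hypothesis is a Ljung-Box test on the innovation residuals (a sketch; the choice of lag = 8, i.e. two seasonal periods for quarterly data, is an assumption):
augment(fit) %>% features(.innov, ljung_box, lag = 8)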
Two observations stand out: the innovation residuals have a negative mean, and there is a noticeable negative autocorrelation at lag 4.
For completeness, we plot the first four lags of the innovation residuals below. The negative relationship at lag 4 is consistent across data points from all four quarters.
gg_lag(augment(fit), .innov , geom="point", lags = 1:4) +
labs(title="4 lags of Innovation Residuals", subtitle="1992-2010" )
Repeat the previous exercise using the Australian Exports series from global_economy and the Bricks series from aus_production. Use whichever of NAIVE() or SNAIVE() is more appropriate in each case.
Australian exports are well suited to the naive time series model. The plot of the innovation residuals from 1960 to 2017 shows low autocorrelations in the residuals, and the histogram looks roughly symmetric and normally shaped, so a white noise process for the residuals seems plausible. We show the plots below.
# Extract data of interest
recent_exports <- global_economy %>%
filter(Year>= 1960, Country == "Australia")
# Define and estimate a model
#fit <- recent_exports %>% model(SNAIVE(Exports ~ lag(8) ))
fit <- recent_exports %>% model(NAIVE(Exports ))
# Look at the residuals
fit %>% gg_tsresiduals()
# Look at some forecasts
fit %>% forecast(h = 5) %>% autoplot(recent_exports) +
labs(title="Forecast of Australian Exports", subtitle="NAIVE() model")
Moreover, the mean of the innovation residuals is close to zero, as shown below.
mean(augment(fit)$.innov , na.rm = TRUE)
## [1] 0.1451912
For the bricks data, neither NAIVE nor SNAIVE looks like a good candidate. We plot the SNAIVE fit with a lag of 4 to illustrate the lack of fit: the innovation residuals and their ACF do not look like white noise.
# Define and estimate a model
fit <- aus_production %>%
filter_index( . ~ '2005 Q2') %>%
select(Bricks) %>% model(SNAIVE(Bricks ~ lag(4) ))
# Look at the residuals
fit %>% gg_tsresiduals()
# Look at some forecasts
fit %>% forecast() %>% autoplot(aus_production) +
labs(title="Bricks Production Forecast" , subtitle = "Seasonal Naive (lag 4)")
The naive model below for Bricks production also does not work well. There appears to be remaining seasonality in the residuals, with autocorrelation at lags 2 and 4 of different amplitudes.
# Define and estimate a model
#fit <- recent_exports %>% model(SNAIVE(Exports ~ lag(8) ))
fit <- aus_production %>%
filter_index( . ~ '2005 Q2') %>%
select(Bricks) %>% model(NAIVE(Bricks ))
# Look at the residuals
fit %>% gg_tsresiduals()
# Look at some forecasts
fit %>% forecast(h=5) %>% autoplot(aus_production)
I conclude that a more advanced time series model with trend and multiple seasonalities may allow better forecasting.
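One possible next step (a sketch only, not evaluated here) is to forecast via an STL decomposition: model the seasonally adjusted series with a drift method, while the seasonal component is forecast with SNAIVE(), which decomposition_model() uses by default for unspecified seasonal components.
bricks_dat %>%
  model(stl_drift = decomposition_model(
    STL(Bricks),
    RW(season_adjust ~ drift())
  )) %>%
  forecast(h = 8) %>%
  autoplot(bricks_dat)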
For your retail time series (from Exercise 8 in Section 2.10):
myseries_train <- myseries %>%
filter(year(Month) < 2011)
autoplot(myseries, Turnover) +
autolayer(myseries_train, Turnover, colour = "red")
Fit a seasonal naive model using SNAIVE() applied to your training data (myseries_train).
fit <- myseries_train %>%
model(SNAIVE())
fit %>% gg_tsresiduals()
Do the residuals appear to be uncorrelated and normally distributed?
fc <- fit %>%
forecast(new_data = anti_join(myseries, myseries_train))
fc %>% autoplot(myseries)
fit %>% accuracy()
fc %>% accuracy(myseries)
set.seed(392381)
myseries <- aus_retail %>% filter(`Series ID` == sample(aus_retail$`Series ID`, 1 ))
myseries <- myseries %>% fill_gaps()
myseries_train <- myseries %>%
filter(year(Month) < 2011)
autoplot(myseries, Turnover) +
autolayer(myseries_train, Turnover, colour = "red") +
labs(title="Turnover data: Training + Test",
subtitle="Western Australia: Pharmaceutical, cosmetic, toiletry goods")
fit <- myseries_train %>%
model(SNAIVE(Turnover ~ lag(12)))
# Produce the test-period forecasts used in the accuracy comparison below
fc <- fit %>%
forecast(new_data = anti_join(myseries, myseries_train))
fit %>% gg_tsresiduals()
The residuals are autocorrelated and not normally distributed. A Box-Cox transformation might help stabilise the variance, but we leave that out of scope here.
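If we did pursue that route, a minimal sketch (not run here) would select a Box-Cox parameter with the Guerrero method and refit on the transformed series:
lambda <- myseries_train %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)
fit_bc <- myseries_train %>%
  model(SNAIVE(box_cox(Turnover, lambda) ~ lag(12)))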
insample = fit %>% fabletools::accuracy() %>% select(MAE, RMSE, MAPE, MASE, RMSSE)
outofsample = fc %>% fabletools::accuracy(myseries) %>% select( MAE, RMSE, MAPE, MASE, RMSSE)
# Gather the results in a single table from in-sample vs. forecast
df_accuracy = rbind(insample, outofsample)
row.names(df_accuracy) = c("In Sample", "Out of Sample")
df_accuracy %>%
kable(digits = 2, caption="Accuracy InSample vs Out of Sample" ) %>%
kable_styling(bootstrap_options = c("striped", "hover"))
Accuracy InSample vs Out of Sample
| MAE | RMSE | MAPE | MASE | RMSSE | |
|---|---|---|---|---|---|
| In Sample | 5.29 | 7.99 | 11.56 | 1.00 | 1.0 |
| Out of Sample | 33.54 | 35.93 | 23.58 | 6.34 | 4.5 |
We conclude that the model performs poorly both in sample and out of sample. Using MAPE (mean absolute percentage error), the in-sample value of 11.56% shows that even on the training data the seasonal naive model struggles to fit closely, while the out-of-sample MAPE of 23.58% is roughly twice the in-sample relative error.
The scale-dependent measures show the same pattern: the in-sample MAE (mean absolute error) is 5.29 million AUD per month, while the out-of-sample MAE is 33.54 million AUD per month. The recommended scaled error metrics, MASE (mean absolute scaled error) and RMSSE (root mean squared scaled error), tell a consistent story.
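For reference, the MASE reported above scales each forecast error by the in-sample mean absolute error of a one-step seasonal naive forecast (the standard definition from the text, with seasonal period \(m = 12\) for this monthly series):
\[ \text{MASE} = \text{mean}(|q_j|), \qquad q_j = \frac{e_j}{\frac{1}{T-m}\sum_{t=m+1}^{T} |y_t - y_{t-m}|}. \]
Since the fitted model is itself a seasonal naive model, its in-sample MASE is 1 by construction, so the out-of-sample MASE of 6.34 says the test-set errors are on average more than six times larger than the in-sample errors.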
Let's observe the impact of using a shorter training history: training data from 2005-2010, with the same test data from 2011-2018.
short_total <- myseries %>%
filter(year(Month) > 2004)
short_train <- short_total %>%
filter(year(Month) < 2011)
short_fit <- short_train %>%
model(SNAIVE(Turnover ~ lag(12)))
short_fit %>% gg_tsresiduals()
short_fc <- short_fit %>%
forecast(new_data = anti_join(short_total, short_train))
## Joining, by = c("State", "Industry", "Series ID", "Month", "Turnover")
short_fc %>% autoplot(myseries)
insample_short = short_fit %>%
fabletools::accuracy() %>%
select(MAE, RMSE, MAPE, MASE, RMSSE)
outofsample_short = short_fc %>%
fabletools::accuracy(myseries) %>%
select( MAE, RMSE, MAPE, MASE, RMSSE)
df_accuracy_short = rbind(insample_short, outofsample_short)
row.names(df_accuracy_short) = c("In Sample", "Out of Sample")
df_accuracy_short %>%
kable(digits = 2,
caption="Short Data: Accuracy InSample vs Out of Sample" ) %>%
kable_styling(bootstrap_options = c("striped", "hover"))
Short Data: Accuracy InSample vs Out of Sample
| MAE | RMSE | MAPE | MASE | RMSSE | |
|---|---|---|---|---|---|
| In Sample | 12.51 | 14.96 | 14.27 | 1.00 | 1.0 |
| Out of Sample | 33.54 | 35.93 | 23.58 | 6.34 | 4.5 |
We conclude that the test-set accuracy is insensitive to the length of the training history.
This is because the seasonal naive forecasts depend only on the final year of training observations, so shortening the history leaves the test-period forecasts, and hence the out-of-sample MAE, RMSE and MAPE, unchanged.
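A quick way to confirm this (a sketch, reusing the fc and short_fc objects from above) is to check that the two sets of test-period point forecasts coincide:
# TRUE if the seasonal naive forecasts are unaffected by the shorter history
all.equal(fc$.mean, short_fc$.mean)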