#?global_economy
#?aus_production
#?aus_livestock
#?hh_budget
tail(aus_retail)
Australian Population (global_economy)
aus.pop <- global_economy %>% filter(Country == "Australia")
aus.pop %>% autoplot(Population / 1e6) + labs(x = "Year", y = "Population (millions)", title = "Australian Population in millions")
From the plot we can see that the population has been increasing at a steady, roughly linear rate. Because of this, the naive and seasonal naive models would not be appropriate: the naive model would simply repeat the last observed value, and the seasonal naive model would impose seasonality that does not exist in annual population data. The random walk with drift, which extrapolates the average historical change, is the best choice here.
aus.pop.fit <- aus.pop %>% model(RW(Population ~ drift()))
aus.pop.fc <- aus.pop.fit %>% forecast(h = 5)
aus.pop.fc %>% autoplot(aus.pop) + labs(title = "Random Walk with drift - Forecast of Australian Population", subtitle = "h = 5 years forecast", x = "Year", y = "Population")
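The drift term estimated by the model is the average year-on-year change in population over the sample; it can be inspected with report() (a quick optional check):
# Inspect the estimated drift, i.e. the average annual change in population
aus.pop.fit %>% report()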
Bricks (aus_production)
aus_production%>%autoplot(Bricks)
## Warning: Removed 20 row(s) containing missing values (geom_path).
From the plot above, it looks like there is seasonality in the production of bricks in Australia. We can generate a gg_season() plot to verify this further. From the plot below, there is an element of seasonality wherein Q2 and Q3 production is typically higher than Q1 and Q4; Q1 is typically the lowest quarter and Q3 the highest.
So it makes sense to use the seasonal naive model with a one-year lag in this case.
aus_production%>%gg_season(Bricks)
## Warning: Removed 20 row(s) containing missing values (geom_path).
bricks <- aus_production %>% select(Quarter, Bricks) %>% drop_na()
bricks.fit <- bricks %>% model(SNAIVE(Bricks ~ lag("year")))
bricks.fc <- bricks.fit %>% forecast(h = 8)
bricks.fc %>% autoplot(bricks) + labs(title = "Forecast of Australian Brick production based on Seasonal Naive model", subtitle = "h = 8 quarters forecast", x = "Quarter", y = "Bricks Production")
NSW Lambs (aus_livestock)
nsw.lambs<-aus_livestock%>%filter(Animal=="Lambs",State=='New South Wales')%>%drop_na()
nsw.lambs%>%autoplot(Count)
From the plot above, the slaughter count appears to be seasonal. To verify, we can draw a seasonal plot and an STL decomposition.
nsw.lambs%>%gg_season(Count)
nsw.decomp<-nsw.lambs%>%model(stl = STL(Count))
components(nsw.decomp)%>%autoplot()
It’s clear that the data is seasonal, but it also seems to have a trend component. It looks like the seasonal naive model would be the best alternative.
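To back up this visual impression, the strength of the trend and seasonality can be quantified with feat_stl() from the feasts package (a quick optional sketch):
# Quantify the trend and seasonal strength suggested by the STL decomposition
nsw.lambs %>% features(Count, feat_stl)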
nsw.lambs.fit <- nsw.lambs %>% model(SNAIVE(Count))
nsw.lambs.fc <- nsw.lambs.fit %>% forecast(h = 4)
nsw.lambs.fc %>% autoplot(nsw.lambs) + labs(title = "Forecast of Lambs slaughter in New South Wales", subtitle = "h = 4 months forecast", x = "Month", y = "Count of Lambs Slaughtered")
Household wealth (hh_budget)
hh.wealth<-hh_budget%>%select(Wealth)%>%drop_na()
hh.wealth%>%autoplot(Wealth)
The plot shows some cyclic behaviour and a strong trend component after 2011, so the random walk with drift seems to be the best choice here.
hhwealth.fit <- hh.wealth %>% model(RW(Wealth ~ drift()))
hhwealth.fc <- hhwealth.fit %>% forecast(h = 5)
hhwealth.fc %>% autoplot(hh.wealth) + labs(title = "Random Walk with drift - Forecast of Household Wealth", subtitle = "h = 5 years forecast", x = "Year", y = "Household Wealth")
Australian takeaway food turnover (aus_retail).
tail(aus_retail)
food<-aus_retail%>%filter(`Series ID`=='A3349435A')%>%select(Month,Turnover)
We use the Series ID corresponding to Industry = “Takeaway food services” to filter the data, and note that all the observations for this Series ID belong to State = “Western Australia”.
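This can be confirmed directly (a quick optional check):
# Confirm which State and Industry this Series ID corresponds to
aus_retail %>% filter(`Series ID` == 'A3349435A') %>% as_tibble() %>% distinct(State, Industry)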
food%>%autoplot(Turnover)
The data shows a strong trend component with some seasonality.
food%>%gg_season(Turnover)
The seasonal plot shows some seasonality - a dip in February and a rise in December - but the trend remains the dominant feature, so the random walk with drift looks like the better choice here.
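As a check on this choice, the drift and seasonal naive benchmarks can be compared on the training data (a sketch; the model names are illustrative):
# Compare the two benchmarks on the training data
food %>%
  model(Drift = RW(Turnover ~ drift()), SNaive = SNAIVE(Turnover)) %>%
  accuracy()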
food.fit <- food %>% model(RW(Turnover ~ drift()))
food.fit.fc <- food.fit %>% forecast(h = 12)
food.fit.fc %>% autoplot(food) + labs(title = "Random Walk with drift - Forecast of Food Takeaway Industry Turnover", subtitle = "h = 12 months forecast", x = "Month", y = "Food Takeaway Turnover")
Produce a time plot of the series.
fb<-gafa_stock%>%filter(Symbol=='FB')%>%select(Date, Close)
fb%>%autoplot(Close)+labs(title = 'Facebook Stock Closing Price', y = 'Closing Value', x = 'Date')
Produce forecasts using the drift method and plot them.
# gafa_stock has an irregular trading-day index, so re-index by observation number to make the tsibble regular
fb <- fb %>% mutate(Date = row_number()) %>% update_tsibble(index = Date, regular = TRUE)
fb_fc <- fb %>% model(RW(Close ~ drift())) %>% forecast(h = 20) %>% autoplot(fb) + labs(title = 'Facebook Stock Price Drift Forecast', y = 'Closing Price', x = 'Trading Day')
fb_fc
Show that the forecasts are identical to extending the line drawn between the first and last observations.
fb_fc+geom_segment(aes(xend=Date[length(Date)],yend=Close[length(Close)],x = Date[1],y = Close[1]), linetype='solid', color='Green')
The plot above shows that the forecast values lie along the straight line drawn from the closing price on the first day of the series to the closing price on the last day.
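This can also be checked numerically: the drift method's point forecasts equal the last observation plus h times the average historical change, (y_T - y_1)/(T - 1). A small sketch (fb_drift, n_obs and slope are helper names introduced here, since fb_fc above stores the plot rather than the forecast values):
# Check that drift forecasts extend the line through the first and last observations
fb_drift <- fb %>% model(RW(Close ~ drift())) %>% forecast(h = 20)
n_obs <- nrow(fb)
slope <- (fb$Close[n_obs] - fb$Close[1]) / (n_obs - 1)
all.equal(fb_drift$.mean, fb$Close[n_obs] + (1:20) * slope)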
Try using some of the other benchmark functions to forecast the same data set. Which do you think is best? Why?
fb_fc_new<-fb%>%model(Naive = NAIVE(Close),
Mean=MEAN(Close),
Drift = RW(Close ~ drift()))%>%forecast(h=50)%>%autoplot(fb)+labs(title='Facebook Stock Price Forecasts using different models', y='Closing Price',x='Date')
fb_fc_new
The mean model is clearly not a good candidate for this series. The efficient markets hypothesis would favour the naive model, since it assumes all available information is already reflected in the current price; even so, based on the history of this series I feel the drift model is the best of the three benchmarks.
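One way to go beyond eyeballing the plot is to hold out the end of the series and compare the benchmarks on a test set (a sketch; fb_train and the 50-day holdout are illustrative choices):
# Hold out the last 50 trading days and compare test-set accuracy of the benchmarks
fb_train <- fb %>% filter(Date <= nrow(fb) - 50)
fb_train %>%
  model(Naive = NAIVE(Close), Mean = MEAN(Close), Drift = RW(Close ~ drift())) %>%
  forecast(h = 50) %>%
  accuracy(fb)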
### 5.3. Apply a seasonal naïve method to the quarterly Australian beer production data from 1992. Check if the residuals look like white noise, and plot the forecasts. The following code will help.
# Extract data of interest
recent_production <- aus_production %>%
filter(year(Quarter) >= 1992)
# Define and estimate a model
fit <- recent_production %>% model(SNAIVE(Beer))
# Look at the residuals
fit %>% gg_tsresiduals()
## Warning: Removed 4 row(s) containing missing values (geom_path).
## Warning: Removed 4 rows containing missing values (geom_point).
## Warning: Removed 4 rows containing non-finite values (stat_bin).
# Look at some forecasts
fit %>% forecast() %>% autoplot(recent_production)
What do you conclude?
The plot of the innovation residuals does not look like white noise. There seems to be some seasonality left in the residuals, and the spike at lag 4 in the ACF is clearly significant, suggesting there is information the model has not captured. The histogram of the residuals also does not look normal; it is closer to bimodal. We can check the average value of the residuals as follows:
mean(augment(fit)$.innov , na.rm = TRUE)
## [1] -1.571429
This shows that while the mean of the innovation residuals is small and close to zero, it is not exactly zero, so the forecasts from the fitted model will be slightly biased.
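A Ljung-Box test makes the white-noise check more formal (a sketch, using lag = 8, twice the seasonal period for quarterly data):
augment(fit) %>% features(.innov, ljung_box, lag = 8)
We can repeat the same diagnostics for the Australian Exports series from global_economy. The chunk below defines the objects used in the plots that follow (a reconstruction sketch, assuming a naive benchmark, which is the natural choice for this non-seasonal annual series):
# Australian Exports data
aus.exports <- global_economy %>% filter(Country == "Australia")
aus.exports.fit <- aus.exports %>% model(NAIVE(Exports))
aus.exports.fit %>% gg_tsresiduals()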
## Warning: Removed 1 row(s) containing missing values (geom_path).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing non-finite values (stat_bin).
aus.exports.fit %>% forecast() %>% autoplot(aus.exports)
The residuals for this series look closer to normal, as can be seen from the histogram. The spike at lag 1 in the ACF is significant, indicating the residuals are not entirely white noise, but compared with the previous series this model looks like a better fit.
# Bricks data
aus.bricks.fit <- aus_production %>% model(SNAIVE(Bricks))
aus.bricks.fit %>% gg_tsresiduals()
## Warning: Removed 24 row(s) containing missing values (geom_path).
## Warning: Removed 24 rows containing missing values (geom_point).
## Warning: Removed 24 rows containing non-finite values (stat_bin).
The residuals show significant left-skewness, and the ACF plot shows significant autocorrelation at several lags. In short, this model is not a good fit for this series; transforming the data before re-fitting the model might help.
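One possible refinement (a sketch, reusing the bricks tsibble created earlier) is to stabilise the variance with a Box-Cox transformation before re-fitting the seasonal naive model:
# Choose a Box-Cox lambda via the Guerrero method, then re-fit on the transformed series
lambda <- bricks %>% features(Bricks, features = guerrero) %>% pull(lambda_guerrero)
bricks %>% model(SNAIVE(box_cox(Bricks, lambda))) %>% gg_tsresiduals()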
aus.bricks.fit %>% forecast() %>% autoplot(aus_production)
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning: Removed 8 row(s) containing missing values (geom_path).
## Warning: Removed 20 row(s) containing missing values (geom_path).
The forecasts simply repeat the seasonal pattern of the last observed year.
Create a training dataset consisting of observations before 2011 using
set.seed(42)
myseries <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseries_train<-myseries%>% filter(year(Month) < 2011)
Check that your data have been split appropriately by producing the following plot.
autoplot(myseries, Turnover) + autolayer(myseries_train, Turnover, colour = "red")
Fit a seasonal naïve model using SNAIVE() applied to your training data (myseries_train).
fit <-myseries_train %>% model(SNAIVE(Turnover))
Check the residuals.
fit %>% gg_tsresiduals()
## Warning: Removed 12 row(s) containing missing values (geom_path).
## Warning: Removed 12 rows containing missing values (geom_point).
## Warning: Removed 12 rows containing non-finite values (stat_bin).
Do the residuals appear to be uncorrelated and normally distributed?
The residuals are clearly autocorrelated, with many ACF spikes outside the significance bounds, and the histogram is heavier-tailed than a normal distribution.
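A Ljung-Box test confirms the autocorrelation more formally (a sketch, with lag = 24, twice the seasonal period for monthly data):
augment(fit) %>% features(.innov, ljung_box, lag = 24)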
Produce forecasts for the test data
fc <- fit %>% forecast(new_data = anti_join(myseries, myseries_train))
## Joining, by = c("State", "Industry", "Series ID", "Month", "Turnover")
fc %>% autoplot(myseries)
Compare the accuracy of your forecasts against the actual values.
fit %>% accuracy()
fc %>% accuracy(myseries)
How sensitive are the accuracy measures to the amount of training data used?
In this example, accuracy measures such as MAE, RMSE and MAPE do not appear to be strongly affected by the amount of training data used, although this is based on a single split.
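One way to check the sensitivity is to refit the model on training windows of different lengths and compare the test-set accuracy for each split (a sketch; the split years are illustrative):
# Refit on several training windows and compare test-set accuracy
purrr::map_dfr(c(2005, 2008, 2011), function(yr) {
  train <- myseries %>% filter(year(Month) < yr)
  train %>%
    model(SNAIVE(Turnover)) %>%
    forecast(new_data = anti_join(myseries, train)) %>%
    accuracy(myseries) %>%
    mutate(training_end = yr)
})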