1. Produce forecasts for the following series using whichever of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case:

Australian Population (global_economy)

p <- global_economy %>%
  filter(Country == "Australia")

autoplot(p, Population) +
  labs(title = "Population of Australia")

p %>%
  model(RW(Population ~ drift())) %>%
  forecast(h = "5 years") %>%
  autoplot(global_economy) +
  labs(y = "Number of People", title = "Population of Australia")

The drift method produces forecasts that increase over time, where the amount of change per period (the drift) is set to the average change seen in the historical data. This suits the steady upward trend in the Australian population series, so RW(y ~ drift()) is the most appropriate of the three benchmarks here.
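As a quick check on that interpretation, the implied drift can be computed directly as the average yearly change, (last value - first value)/(T - 1). The snippet below is just a sketch and assumes the fpp3 packages are already loaded.

# Average yearly change in population, which equals the estimated drift term
p %>%
  as_tibble() %>%
  summarise(avg_change = (last(Population) - first(Population)) / (n() - 1))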

Bricks (aus_production)

aus_production %>%
  autoplot(Bricks)
## Warning: Removed 20 row(s) containing missing values (geom_path).

brick <- aus_production %>%
  filter(!is.na(Bricks))

brick %>%
  model(SNAIVE(Bricks ~ lag("year"))) %>%
  forecast(h = "10 years") %>%
  autoplot(brick) +
  labs(title = "SNAIVE Forecast",
       subtitle = "10 Year Forecast",
       x = "Year")

SNAIVE() is the right choice here because the data are strongly seasonal, so repeating the most recent year of observations is a reasonable benchmark.
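A seasonal plot makes that quarterly pattern explicit. This is only a supporting sketch; gg_season() comes from the feasts package, which is assumed to be loaded (e.g. via fpp3).

# Seasonal plot of brick production, one line per year
brick %>%
  gg_season(Bricks) +
  labs(title = "Seasonal plot: Brick production")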

NSW Lambs (aus_livestock)

aus_livestock %>%
  distinct(State)
## # A tibble: 8 x 1
##   State                       
##   <fct>                       
## 1 Australian Capital Territory
## 2 New South Wales             
## 3 Northern Territory          
## 4 Queensland                  
## 5 South Australia             
## 6 Tasmania                    
## 7 Victoria                    
## 8 Western Australia
aus_livestock %>%
  distinct(Animal)
## # A tibble: 7 x 1
##   Animal                    
##   <fct>                     
## 1 Bulls, bullocks and steers
## 2 Calves                    
## 3 Cattle (excl. calves)     
## 4 Cows and heifers          
## 5 Lambs                     
## 6 Pigs                      
## 7 Sheep
aus_livestock %>%
  filter(State == "New South Wales", Animal == "Lambs") %>%
  autoplot(Count)

train <- aus_livestock %>%
  filter(State == "New South Wales", Animal == "Lambs")

train_fit <- train %>%
  model(Mean = MEAN(Count),
        `Seasonal Naïve` = SNAIVE(Count),
        `Naïve` = NAIVE(Count),
        Drift = NAIVE(Count ~ drift()))

train_fit %>%
  forecast(h = 10) %>%
  autoplot(train)

The series shows no strong long-term trend, so the naïve method looks appropriate, with the mean method as a contender; if the monthly seasonality is judged important, the seasonal naïve forecasts would be the alternative.
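One way to back up that judgement is to compare the in-sample accuracy of the four benchmarks. This is only a rough sketch: training-set accuracy gives an indicative ranking, not a proper out-of-sample comparison.

# In-sample accuracy of the four benchmark models, ordered by RMSE
train_fit %>%
  accuracy() %>%
  select(.model, RMSE, MAE, MASE) %>%
  arrange(RMSE)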

Household wealth (hh_budget)

hh_budget %>% 
  autoplot(Wealth)

model1 <- hh_budget %>%
  model(Naive = NAIVE(Wealth))

model1 %>%
  forecast(h = 5) %>%
  autoplot(hh_budget) +
  labs(title = "Naïve Forecast",
       y = "% of net income")

At first I wanted to use the drift method (RW(Wealth ~ drift())), but over longer periods household wealth fluctuates around a fairly stable level rather than following a consistent upward trend, so a drift term is hard to justify.

Using NAIVE() instead predicts that wealth stays at its current level, assuming the markets neither collapse nor skyrocket.
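Concretely, the naïve point forecast just carries each country's last observed wealth value forward. A quick look at those values (a sketch; slice_max() assumes dplyr >= 1.0):

# Last observed Wealth per country, which is the naive point forecast
hh_budget %>%
  as_tibble() %>%
  group_by(Country) %>%
  slice_max(Year, n = 1) %>%
  select(Country, Year, Wealth)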

Australian takeaway food turnover (aus_retail).

aus_retail%>%
  distinct(State)
## # A tibble: 8 x 1
##   State                       
##   <chr>                       
## 1 Australian Capital Territory
## 2 New South Wales             
## 3 Northern Territory          
## 4 Queensland                  
## 5 South Australia             
## 6 Tasmania                    
## 7 Victoria                    
## 8 Western Australia
aus_retail%>%
  distinct(Industry)
## # A tibble: 20 x 1
##    Industry                                                         
##    <chr>                                                            
##  1 Cafes, restaurants and catering services                         
##  2 Cafes, restaurants and takeaway food services                    
##  3 Clothing retailing                                               
##  4 Clothing, footwear and personal accessory retailing              
##  5 Department stores                                                
##  6 Electrical and electronic goods retailing                        
##  7 Food retailing                                                   
##  8 Footwear and other personal accessory retailing                  
##  9 Furniture, floor coverings, houseware and textile goods retailing
## 10 Hardware, building and garden supplies retailing                 
## 11 Household goods retailing                                        
## 12 Liquor retailing                                                 
## 13 Newspaper and book retailing                                     
## 14 Other recreational goods retailing                               
## 15 Other retailing                                                  
## 16 Other retailing n.e.c.                                           
## 17 Other specialised food retailing                                 
## 18 Pharmaceutical, cosmetic and toiletry goods retailing            
## 19 Supermarket and grocery stores                                   
## 20 Takeaway food services
aus_retail %>%
  filter(State=="Tasmania" & Industry=="Takeaway food services") %>%
  autoplot(Turnover)

train <- aus_retail %>%
  filter(State == "Tasmania", Industry == "Takeaway food services")

train_fit <- train %>%
  model(`Seasonal Naïve` = SNAIVE(Turnover),
        `Naïve` = NAIVE(Turnover),
        Drift = NAIVE(Turnover ~ drift()))

train_fit %>%
  forecast(h = 10) %>%
  autoplot(train)

The drift model captures the increasing trend and the seasonal naïve model captures the seasonality, but neither captures both, while the plain naïve model simply carries the last observation forward and so reflects neither pattern. Since the series shows strong seasonality on top of an upward trend, the seasonal naïve method is the most defensible of the three on its own; combining the seasonal lag with a drift term is also worth a look, as sketched below.
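The sketch below fits a seasonal naïve model with a drift term, assuming (as the fable documentation for the random walk family suggests) that the lag() and drift() specials can be combined in one formula; treat the exact specification as an assumption to verify.

# Seasonal naive with drift: repeat last year's values plus an average trend
train %>%
  model(`SNAIVE + drift` = SNAIVE(Turnover ~ lag("year") + drift())) %>%
  forecast(h = "2 years") %>%
  autoplot(train)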

2. Use the Facebook stock price (data set gafa_stock) to do the following: Produce a time plot of the series.

meta <- gafa_stock %>%
  filter(Symbol == "FB")

meta %>%
  autoplot(Close) +
  labs(y = "dollars", title = "Facebook Closing prices")

Produce forecasts using the drift method and plot them.

fbdf <- meta %>%
  mutate(Close = as.numeric(Close)) %>%
  filter_index("2018-01-01" ~ "2019-01-01") %>%
  mutate(day = row_number()) %>%
  update_tsibble(index = day, regular = TRUE) %>%
  select(Date, Close)

fbdf %>%
  model(RW(Close ~ drift())) %>%
  forecast(h = 180) %>%
  autoplot(fbdf) +
  labs(y = "price", title = "Forecast Facebook Stock Prices")

Show that the forecasts are identical to extending the line drawn between the first and last observations.

# Line through the first and last observations, extended over the forecast horizon
slope <- (last(fbdf$Close) - first(fbdf$Close)) / (nrow(fbdf) - 1)

fbdf %>%
  model(RW(Close ~ drift())) %>%
  forecast(h = 180) %>%
  autoplot(fbdf) +
  annotate("segment",
           x = 1, y = first(fbdf$Close),
           xend = nrow(fbdf) + 180,
           yend = first(fbdf$Close) + slope * (nrow(fbdf) + 179),
           colour = "blue", linetype = "dashed") +
  labs(y = "price", title = "Forecast Facebook Stock Prices")

Try using some of the other benchmark functions to forecast the same data set. Which do you think is best? Why?

fb2 <- fbdf %>%
  model(
    Mean = MEAN(Close),
    `Naïve` = NAIVE(Close),
    Drift = NAIVE(Close ~ drift())
  )
# SNAIVE is omitted: once the series is re-indexed by trading day it has no
# seasonal period, so SNAIVE(Close) fails with "invalid 'times' argument".

f2 <- fb2 %>%
  forecast(h = 180)

f2 %>%
  autoplot(fbdf, level = NULL) +
  labs(y = "dollar",
       title = "Facebook daily stock prices") +
  guides(colour = guide_legend(title = "Forecast"))

Visually there is no meaningful seasonality in the daily stock price, so a seasonal method adds nothing here (and, as noted above, SNAIVE cannot even be fitted once the series is re-indexed by trading day). The drift method, while attractive because it follows the recent direction of the series, is highly sensitive to the first and last observations. On balance the naïve method seems the best benchmark, which is consistent with stock prices behaving close to a random walk.
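To see how much the drift method depends on the endpoints, the estimated drift coefficient, which equals (last Close - first Close)/(T - 1), can be pulled out of the fitted model. A sketch:

# Extract the estimated drift term from the fitted Drift model
fb2 %>%
  select(Drift) %>%
  tidy()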

3. Apply a seasonal naïve method to the quarterly Australian beer production data from 1992. Check if the residuals look like white noise, and plot the forecasts. The following code will help.
# Extract data of interest
recent_production <- aus_production %>%
  filter(year(Quarter) >= 1992)


# Define and estimate a model
fit <- recent_production %>% model(SNAIVE(Beer))
# Look at the residuals
fit %>% gg_tsresiduals()
## Warning: Removed 4 row(s) containing missing values (geom_path).
## Warning: Removed 4 rows containing missing values (geom_point).
## Warning: Removed 4 rows containing non-finite values (stat_bin).

# Look at some forecasts
fit %>% forecast() %>% autoplot(recent_production)

What do you conclude?

library(stats)


Box.test(recent_production$Beer, lag = 24, type = "Ljung")
## 
##  Box-Ljung test
## 
## data:  recent_production$Beer
## X-squared = 494.29, df = 24, p-value < 2.2e-16

The test is highly significant, indicating that the Beer series itself is not white noise; for white noise we would expect every autocorrelation to be close to zero. Note, though, that the question asks about the residuals rather than the raw series (see the residual-based check below). In addition, the seasonal naïve forecasts consistently overestimate future beer production, because they simply repeat the previous year's values while the series has a long-term decline.
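Since the white-noise check is really about the residuals, the residual-based version of the test is more direct. The sketch below uses the ljung_box feature from fabletools (loaded with fpp3), with lag = 8, i.e. 2m for quarterly data; older fabletools versions expose the residual column as .resid rather than .innov.

# Ljung-Box test applied to the innovation residuals of the SNAIVE model
fit %>%
  augment() %>%
  features(.innov, ljung_box, lag = 8)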

4. Repeat the previous exercise using the Australian Exports series from global_economy and the Bricks series from aus_production. Use whichever of NAIVE() or SNAIVE() is more appropriate in each case.

Australian Exports (global_economy)

recent_exports <- global_economy %>%
  filter(Year >= 1960, Country == "Australia")

fit <- recent_exports %>% model(NAIVE(Exports))
fit %>% gg_tsresiduals()
## Warning: Removed 1 row(s) containing missing values (geom_path).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing non-finite values (stat_bin).

Box.test(recent_exports$Exports, lag = 24, type = "Ljung")
## 
##  Box-Ljung test
## 
## data:  recent_exports$Exports
## X-squared = 398.24, df = 24, p-value < 2.2e-16
fit %>% forecast(h = 5) %>% autoplot(recent_exports)

The residuals appear roughly normally distributed, which suggests that forecasts from this method should be reasonably reliable. The very small p-value above shows that the raw Exports series is not white noise, but that mostly reflects the trend in the data; the proper adequacy check is the residual-based Ljung-Box test used for the beer series, applied to this model's residuals.

Brick Production

bricks <- aus_production %>% 
  select(Bricks)

fit <- bricks %>% model(SNAIVE(Bricks))

fit %>% gg_tsresiduals()
## Warning: Removed 24 row(s) containing missing values (geom_path).
## Warning: Removed 24 rows containing missing values (geom_point).
## Warning: Removed 24 rows containing non-finite values (stat_bin).

Box.test(bricks$Bricks, lag = 12, type = "Ljung")
## 
##  Box-Ljung test
## 
## data:  bricks$Bricks
## X-squared = 1258.1, df = 12, p-value < 2.2e-16
fit %>% forecast(h=5) %>% autoplot(aus_production)
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning: Removed 5 row(s) containing missing values (geom_path).
## Warning: Removed 20 row(s) containing missing values (geom_path).

The seasonal naïve model above does not work well as fitted: the forecasts come out as missing values (hence the warnings), because the Bricks series ends with missing quarters. The Ljung-Box test above was applied to the raw Bricks series, which is unsurprisingly far from white noise given its strong trend and seasonality. The data do appear seasonal, with effects around Q2 and possibly Q4, which is why SNAIVE() is the more appropriate choice here rather than NAIVE(); a refit after dropping the trailing missing quarters is sketched below.
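The refit below drops the 20 trailing missing quarters before fitting, as was done for the earlier Bricks plot; the two-year horizon is an arbitrary illustration.

# Drop the trailing missing quarters, then refit and forecast
bricks_complete <- aus_production %>%
  filter(!is.na(Bricks))

fit2 <- bricks_complete %>%
  model(SNAIVE(Bricks))

fit2 %>%
  forecast(h = "2 years") %>%
  autoplot(bricks_complete)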

7. For your retail time series (from Exercise 8 in Section 2.10):

a. Create a training dataset consisting of observations before 2011 using
set.seed(666)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))

autoplot(myseries, Turnover)

myseries_train <- myseries %>%
  filter(year(Month) < 2011)
b. Check that your data have been split appropriately by producing the following plot.
autoplot(myseries, Turnover) +
  autolayer(myseries_train, Turnover, colour = "red")

c. Fit a seasonal naïve model using SNAIVE() applied to your training data (myseries_train).

fit <- myseries_train %>%
  model(SNAIVE(Turnover ~ lag(12)))

d. Check the residuals.

fit %>% gg_tsresiduals()
## Warning: Removed 12 row(s) containing missing values (geom_path).
## Warning: Removed 12 rows containing missing values (geom_point).
## Warning: Removed 12 rows containing non-finite values (stat_bin).

Do the residuals appear to be uncorrelated and normally distributed?

The residuals are autocorrelated and not normally distributed.

e. Produce forecasts for the test data.

fc <- fit %>%
  forecast(new_data = anti_join(myseries, myseries_train))
## Joining, by = c("State", "Industry", "Series ID", "Month", "Turnover")
fc %>% autoplot(myseries)

f. Compare the accuracy of your forecasts against the actual values.

fit %>% accuracy()
## # A tibble: 1 x 12
##   State  Industry   .model .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>  <chr>      <chr>  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 New S~ Hardware,~ SNAIV~ Trai~  9.46  26.3  21.2  4.66  12.8     1     1 0.807
fc %>% accuracy(myseries)
## # A tibble: 1 x 12
##   .model  State  Industry  .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>   <chr>  <chr>     <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE~ New S~ Hardware~ Test   78.4  98.6  79.6  17.5  17.9  3.75  3.75 0.936

The accuracy measures on the training set are much smaller than on the test set (for example, RMSE of 26.3 versus 98.6 and MASE of 1 versus 3.75), so the model fits the training period far better than it forecasts the test period.

g. How sensitive are the accuracy measures to the amount of training data used?

The accuracy measures are quite sensitive to the amount of training data used. Here the model performs much worse on the test data, which covers far fewer observations than the training period. In general there is a trade-off: a very long training window can include old observations that are no longer relevant to the series' current behaviour, while too little data leads to under-fitting and misses the longer-term patterns needed for accurate forecasts. One way to see this sensitivity is to refit on a shorter training window, as sketched below.
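This sketch refits the same seasonal naïve model on a shorter training window and scores it on the same test period; the 2005 cut-off is an arbitrary choice for illustration, not part of the original analysis.

# Refit on a shorter training window (2005-2010) and evaluate on the same test set
short_train <- myseries %>%
  filter(year(Month) >= 2005, year(Month) < 2011)

short_fit <- short_train %>%
  model(SNAIVE(Turnover ~ lag(12)))

short_fit %>%
  forecast(new_data = anti_join(myseries, myseries_train)) %>%
  accuracy(myseries)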