The midterm investigates three time series datasets: simulated white noise, hh_budget, and aus_retail. The questions cover the ACF, decomposition methods, forecasting, and the use of training and test datasets to measure forecast accuracy.
library(tsibble)
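library(ggplot2)
library(tidyverse)  # dplyr, stringr (str_detect), lubridate (year)
library(fpp3)       # tsibbledata (hh_budget, aus_retail), feasts, fable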
Question 1
Simulate a white noise time series with 250 data points. Plot the time series and the ACF of the time series. Is there a trend? Is there a seasonal pattern? Are there any meaningful statistically significant correlations?
Use a seed of 1234.
Answer
The white noise series shows no discernible trend or seasonal pattern. The ACF plot shows no meaningful statistically significant correlations, which is what we expect from a randomly generated series: even for true white noise, about 5% of the spikes can fall slightly outside the 95% bounds by chance alone.
Provide your code here.
set.seed(1234)
wn <- rnorm(250, mean = 0, sd = 1)
plot(wn, type = "l", xlab = "Time", ylab = "White Noise Time Series")
acf(wn)
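The visual reading of the ACF can be backed up numerically. The following is a minimal sketch, not part of the original answer: it computes the approximate 95% bounds, ±1.96/√T, counts how many sample autocorrelations fall outside them, and runs a Ljung-Box test. The choice of 10 lags for the test is an assumption made for illustration.

```r
set.seed(1234)
wn <- rnorm(250, mean = 0, sd = 1)

# Sample autocorrelations without drawing the plot
wn_acf <- acf(wn, plot = FALSE)
r <- drop(wn_acf$acf)[-1]          # drop lag 0, which is always 1

# Approximate 95% bounds for white noise: +/- 1.96 / sqrt(T)
bound <- qnorm(0.975) / sqrt(length(wn))
bound

# Number of lags whose autocorrelation lies outside the bounds
n_outside <- sum(abs(r) > bound)
n_outside

# Portmanteau test of "no autocorrelation up to lag 10" (lag choice is an assumption)
lb <- Box.test(wn, lag = 10, type = "Ljung-Box")
lb
```

A large Ljung-Box p-value, together with only the occasional spike outside the bounds, would be consistent with white noise.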
Question 2
Try the X11, SEATS, and STL decomposition methods on the Household Budget data, hh_budget, to estimate the trends in Wealth for the different countries in the dataset.
Which methods work? If not, why does the method fail?
Is there a seasonal component in these time series?
Answer
X11 and SEATS do not work because the observations are annual; both methods are designed for seasonal data, such as monthly or quarterly series. STL does work, and we can choose the trend window.
Provide your code here.
Hint: Plot the time series, set up the model, etc.
There is no seasonal component: the data are annual, and the series show no regular repeating pattern, only irregular fluctuations. An increasing trend in Wealth is observed in all of the countries.
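No code was supplied for this question, so the following is a minimal sketch of the STL step described in the answer, assuming the fpp3 package (which provides hh_budget and `STL()`). The object name `wealth_stl` is hypothetical and used only for illustration.

```r
library(fpp3)

# Time plot of Wealth, one line per country
hh_budget %>% autoplot(Wealth)

# hh_budget is annual, so STL has no seasonal period to estimate
# and the decomposition contains only trend and remainder components.
wealth_stl <- hh_budget %>%
  model(STL(Wealth)) %>%
  components()

wealth_stl %>% autoplot()
```

The trend panel of the resulting plot is what the answer's conclusion about increasing Wealth would be read from.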
Question 3
Try different forecasting methods to forecast the Turnover in the Liquor Industry in New South Wales, Australia, 12 steps into the future, using the aus_retail dataset.
Try all of the methods and determine a best method by visual inspection of forecasts for one year.
Now split the data into training and testing subsets. Use the data until Jan 2019 as the training data. Using the method you have selected, measure its error for forecasting the testing data, which is 2020 data.
Provide your code here.
aus_retail_sw <- aus_retail %>%
  filter(State == "New South Wales" & str_detect(Industry, "^L"))

aus_retail_sw %>% autoplot(Turnover)
# Fit the candidate benchmark and regression models
retail_fit <- aus_retail_sw %>%
  model(
    Mean   = MEAN(Turnover),
    Naive  = NAIVE(Turnover),
    SNaive = SNAIVE(Turnover),
    Drift  = RW(Turnover ~ drift()),
    TSLM   = TSLM(Turnover ~ trend()),
    TSLM_S = TSLM(Turnover ~ trend() + season())
  )

# Forecast 12 months ahead and overlay the forecasts on the data
retail_fit %>%
  forecast(h = 12) %>%
  autoplot(aus_retail_sw, level = NULL) +
  labs(
    y = "$Million AUD",
    title = "Turnover in the Liquor Industry in New South Wales, Australia"
  ) +
  guides(colour = guide_legend(title = "Forecast"))
Based on the forecast plots, the Seasonal Naive and seasonal TSLM forecasts appear to be the most plausible here. If forced to pick just one, Seasonal Naive is the best approach because its trajectory most closely resembles earlier seasons.
Splitting the data
train <- aus_retail_sw %>% filter(year(Month) <= 2017)
test <- aus_retail_sw %>% filter(year(Month) > 2017)  # held-out 2018 data

# check if data have been split appropriately
autoplot(aus_retail_sw, Turnover) +
  autolayer(train, Turnover, colour = "blue") +
  ggtitle("check if data have been split appropriately")
The residual plots indicate an approximately normal distribution, and the ACF shows some autocorrelation remaining across the lags. SNAIVE is still a good choice at this point.
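The code behind these residual diagnostics is not shown. The following is a minimal sketch of how they could be produced, assuming the fpp3 package and an SNAIVE model fitted to the training data through 2017. The names `liquor`, `snaive_fit` and `lb` are hypothetical, and the Ljung-Box lag of 24 (two seasonal periods) is an assumption.

```r
library(fpp3)

# Assumption: the liquor series is the "Liquor retailing" industry in New South Wales
liquor <- aus_retail %>%
  filter(State == "New South Wales", Industry == "Liquor retailing")

# Fit SNAIVE to the training period only
snaive_fit <- liquor %>%
  filter(year(Month) <= 2017) %>%
  model(SNaive = SNAIVE(Turnover))

# Time plot, ACF and histogram of the innovation residuals
snaive_fit %>% gg_tsresiduals()

# Ljung-Box test for remaining autocorrelation in the residuals
lb <- snaive_fit %>%
  augment() %>%
  features(.innov, ljung_box, lag = 24)
lb
```

A small p-value in `lb` would agree with the autocorrelation seen in the ACF panel.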
# forecast
fc1 <- liquor_fit %>% forecast(h = 12)
# fc1 %>%
#   autoplot(bind_rows(train, test),
#            level = NULL) +
#   guides(colour = guide_legend(title = "Forecast")) +
#   ggtitle("forecasting 2018 vs actual 2018 with SNAIVE")
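The definition of liquor_fit and the accuracy call do not appear in the code shown. The following is a minimal sketch of how they might look, assuming liquor_fit holds the four models named in the accuracy table, fitted to the training data only, with 2018 as the test year. The names `liquor` and `liquor_acc` are hypothetical, and the sketch is not guaranteed to reproduce the reported numbers.

```r
library(fpp3)

# Assumption: same series and split as in the report (train through 2017, test = 2018)
liquor <- aus_retail %>%
  filter(State == "New South Wales", Industry == "Liquor retailing")
train <- liquor %>% filter(year(Month) <= 2017)
test  <- liquor %>% filter(year(Month) > 2017)

# Assumption: liquor_fit contains these four models, fitted to the training data
liquor_fit <- train %>%
  model(
    SNaive = SNAIVE(Turnover),
    TSLM_S = TSLM(Turnover ~ trend() + season()),
    Drift  = RW(Turnover ~ drift()),
    TSLM   = TSLM(Turnover ~ trend())
  )

# Forecast the 12 months of the test year
fc1 <- liquor_fit %>% forecast(h = 12)

# Passing the full series lets accuracy() scale MASE and RMSSE
# using the observations that precede the forecast period.
liquor_acc <- accuracy(fc1, liquor)
liquor_acc
```

Fitting on `train` rather than on the full series is what keeps the 2018 observations out of sample.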
# A tibble: 4 × 12
  .model State Indus…¹ .type     ME  RMSE   MAE    MPE  MAPE  MASE RMSSE   ACF1
  <chr>  <chr> <chr>   <chr>  <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>  <dbl>
1 SNaive New … Liquor… Test    9.82  17.8  14.6   3.57  4.73  1.55  1.40 0.414
2 TSLM_S New … Liquor… Test   22.2   34.1  22.2   6.36  6.36  2.36  2.67 0.0212
3 Drift  New … Liquor… Test -188.   194.  188.  -63.8  63.8  20.0  15.2  0.0923
4 TSLM   New … Liquor… Test   21.6   53.4  25.7   5.29  6.76  2.73  4.18 0.101
# … with abbreviated variable name ¹Industry
The accuracy table corroborates the earlier visual conclusion that SNAIVE is the best forecasting method in this instance: Seasonal Naive has the lowest error on every accuracy measure reported (RMSE, MAE, MAPE, MASE, RMSSE).