Assignment 1 covers the exercises from Chapter 2 of https://otexts.com/fpp2/graphics.html.

Exercise 2.1

a

autoplot() was used to plot each series in a separate plot, as shown below.

library(forecast)
library(kableExtra)
library(dplyr)
library(ggplot2)

defaulttheme<-theme(panel.background = element_blank(),
                            panel.border = element_rect(color = "black", fill=NA))

autoplot(gold)+defaulttheme

autoplot(woolyrnq)+defaulttheme

autoplot(gas)+defaulttheme

b

The frequency of each series (gold, woolyrnq and gas, respectively) is shown below.

frequency(gold)
## [1] 1
frequency(woolyrnq)
## [1] 4
frequency(gas)
## [1] 12
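
As a minimal sketch of where these values come from: frequency() reports the number of observations per period that was set when the ts object was built. The three series below are made up purely for illustration and are not the gold, woolyrnq or gas data.

```r
# Hypothetical series; only the frequency argument matters here
one_per_period <- ts(1:10, start = 2000, frequency = 1)
quarterly      <- ts(1:10, start = c(2000, 1), frequency = 4)
monthly        <- ts(1:10, start = c(2000, 1), frequency = 12)

# frequency() returns the value stored in each ts object
freqs <- sapply(list(one_per_period, quarterly, monthly), frequency)
print(freqs)
```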

c

As shown below, which.max() identifies observation 770 as the outlier in the gold series. Note that 770 is the position of the largest value in the series, not the value itself.

which.max(gold)
## [1] 770
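
The difference between the position returned by which.max() and the value at that position can be sketched on a small made-up series (the numbers are hypothetical, not taken from gold). Missing values are skipped by which.max().

```r
# Hypothetical series with one obvious spike and one missing value
x <- ts(c(10, 12, 11, 50, 13, NA, 12))

idx  <- which.max(x)  # position of the maximum; NA entries are ignored
peak <- x[idx]        # the value at that position

print(idx)
print(peak)
```

Applied to the series above, the same pattern would be gold[which.max(gold)].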

Exercise 2.2

Read the data, convert it to a time series and construct a time series plot, as shown below.

tute1 = read.csv("http://otexts.com/fpp2/extrafiles/tute1.csv", header = T)

head(tute1,5)
##        X  Sales AdBudget   GDP
## 1 Mar-81 1020.2    659.2 251.8
## 2 Jun-81  889.2    589.0 290.9
## 3 Sep-81  795.0    512.5 290.8
## 4 Dec-81 1003.9    614.1 292.4
## 5 Mar-82 1057.7    647.2 279.1
skimr::skim(tute1)
Data summary

Name                     tute1
Number of rows           100
Number of columns        4
Column type frequency:
  character              1
  numeric                3
Group variables          None

Variable type: character

skim_variable  n_missing  complete_rate  min  max  empty  n_unique  whitespace
X                      0              1    6    6      0       100           0

Variable type: numeric

skim_variable  n_missing  complete_rate    mean     sd     p0     p25     p50      p75    p100  hist
Sales                  0              1  948.74  98.25  735.1  871.10  960.65  1018.70  1115.5  ▃▃▆▇▅
AdBudget               0              1  591.93  54.34  489.9  569.47  608.50   634.97   665.9  ▅▁▂▇▆
GDP                    0              1  281.18  14.37  249.3  271.35  282.60   290.30   330.6  ▂▇▇▂▁

mytimeseries <- ts(tute1[,-1], start=1981, frequency=4)

autoplot(mytimeseries, facets = T)+defaulttheme

When you don't include the facets argument, all of the series in the dataset are drawn in the same panel on a shared y-axis, as shown below.

autoplot(mytimeseries)
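
The conversion step can be sketched on a small stand-in data frame. The values below are made up and only mimic the layout of tute1 (a date label column followed by numeric columns); the point is how start and frequency define the time index once the label column is dropped.

```r
# Hypothetical quarterly data with the same layout as tute1
df <- data.frame(
  X     = c("Mar-81", "Jun-81", "Sep-81", "Dec-81", "Mar-82"),
  Sales = c(1, 2, 3, 4, 5),
  GDP   = c(10, 20, 30, 40, 50)
)

# Drop the label column; the time index comes from start and frequency
demo_ts <- ts(df[, -1], start = 1981, frequency = 4)

print(start(demo_ts))  # first period as (year, quarter)
print(end(demo_ts))    # last period as (year, quarter)
print(time(demo_ts))   # the implied time index
```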

Exercise 2.3

library(httr)
url1<-"https://otexts.com/fpp2/extrafiles/retail.xlsx"
GET(url1, write_disk(tf <- tempfile(fileext = ".xlsx")))
## Response [https://otexts.com/fpp2/extrafiles/retail.xlsx]
##   Date: 2021-02-14 21:58
##   Status: 200
##   Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
##   Size: 639 kB
## <ON DISK>  C:\Users\REGIST~1\AppData\Local\Temp\RtmpofpRk0\fileb9483c3d1532.xlsx
retaildata  <- readxl::read_excel(tf, skip = 1)
myts <- ts(retaildata[,"A3349873A"],
  frequency=12, start=c(1982,4))

The autoplot() function shows the overall shape of the series. There is a clear upward trend, which appears to taper off after 2000 and pick up again after 2010. There is also a seasonal pattern that repeats every year, with several peaks and troughs within each year. No cyclic behaviour is observed.

autoplot(myts)+defaulttheme

The ggseasonplot() function overlays each year of data so that we can see exactly how the seasonality occurs within the year. The plot below shows very clearly that, in most years, sales increase markedly during the end-of-year holiday season, as we would expect.

ggseasonplot(myts, year.labels = T, year.labels.left = T)+defaulttheme

The subseries plot lets us see this pattern month by month with a bit more clarity. From the plot below, there is a very clear difference between holiday shopping and the remainder of the year. The horizontal line in each panel represents the mean for that month.

ggsubseriesplot(myts)+defaulttheme
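
The monthly means drawn by the subseries plot can be reproduced by hand. This is a minimal sketch on a made-up monthly series (three years with a December spike and a rising level), not on the retail data; cycle() gives the month number of each observation.

```r
# Hypothetical monthly series: a December spike plus a level that rises each year
pattern <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 20)
y <- ts(rep(pattern, times = 3) + rep(c(0, 10, 20), each = 12),
        start = c(2000, 1), frequency = 12)

# Mean of all Januaries, all Februaries, and so on
month_means <- tapply(y, cycle(y), mean)
print(month_means)
```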

The lag plot below plots y_t against y_{t-k} for different values of the lag k, with the points coloured by month. The relationship is strongest at lag 12 for all months, while at other lags the points can be more scattered, depending on the month.

gglagplot(myts)+defaulttheme
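
What each panel of a lag plot summarises can be sketched as the correlation between the series and a copy of itself shifted by k observations. The series below is simulated (a 12-month sine pattern plus noise) and the helper lag_cor() is a hypothetical name introduced only for this illustration.

```r
set.seed(1)
n <- 120
month <- rep(1:12, length.out = n)
# Simulated monthly series with a strong 12-month pattern
y <- ts(10 * sin(2 * pi * month / 12) + rnorm(n), frequency = 12)

# Correlation between y_t and y_{t-k} (hypothetical helper)
lag_cor <- function(y, k) {
  n <- length(y)
  cor(y[(k + 1):n], y[1:(n - k)])
}

cor_12 <- lag_cor(y, 12)  # one full season apart
cor_6  <- lag_cor(y, 6)   # half a season apart
print(c(lag12 = cor_12, lag6 = cor_6))
```

For a series with a 12-month pattern, points one full season apart line up closely, whereas points half a season apart move in opposite directions.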

Autocorrelation measures the linear relationship between lagged values of a time series. The autocorrelations for the Australian retail series are generally large, and they are largest at lag 12, as noted for the lag plot above. Overall, the autocorrelations decrease as the lag increases.

ggAcf(myts)+defaulttheme
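
As a rough check on what the ACF plot displays, the lag-k autocorrelation can be computed by hand and compared with base R's acf(). The series here is a simulated random walk, and manual_acf() is a hypothetical helper written for this sketch.

```r
set.seed(42)
y <- cumsum(rnorm(60))  # simulated series, not the retail data

# r_k = sum((y_t - ybar) * (y_{t-k} - ybar)) / sum((y_t - ybar)^2)
manual_acf <- function(y, k) {
  n <- length(y)
  ybar <- mean(y)
  sum((y[(k + 1):n] - ybar) * (y[1:(n - k)] - ybar)) / sum((y - ybar)^2)
}

r3_manual  <- manual_acf(y, 3)
# acf() stores lag 0 first, so lag 3 is the fourth element
r3_builtin <- acf(y, lag.max = 3, plot = FALSE)$acf[4]
print(c(manual = r3_manual, builtin = r3_builtin))
```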

Exercise 2.6

Use the following graphics functions: autoplot(), ggseasonplot(), ggsubseriesplot(), gglagplot(), ggAcf() and explore features from the following time series: hsales, usdeaths, bricksq, sunspotarea, gasoline.

library(fpp2)
## -- Attaching packages ---------------------------------------------- fpp2 2.4 --
## v fma       2.4     v expsmooth 2.3
## 
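# Build the five exploratory plots for one series and return them as a list
Question6 <- function(dataset) {
  a <- autoplot(dataset) + defaulttheme                # time plot
  b <- ggseasonplot(dataset, year.labels = TRUE,
                    year.labels.left = TRUE) + defaulttheme  # seasonal plot
  c <- ggsubseriesplot(dataset) + defaulttheme         # subseries plot
  d <- gglagplot(dataset) + defaulttheme               # lag plot
  e <- ggAcf(dataset) + defaulttheme                   # autocorrelation plot

  list(a, b, c, d, e)
}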

hsales

The hsales dataset contains monthly sales of new one-family houses in the USA since 1973.

Based on the plots below, there is no overall trend in the series, but both seasonality and cycles are present. The seasonal plot and the subseries plot both clearly show that home sales generally increase in the March to May period and then slowly drop into the winter, with peaks and troughs in between. The strongest autocorrelation occurs at lag 1, as shown in the lag plot and the autocorrelation plot.

Question6(hsales)
[Output: five plots for hsales, in order: time plot, seasonal plot, subseries plot, lag plot, autocorrelation plot]

bricksq

The bricksq dataset contains quarterly Australian clay brick production, 1956–1994.

Based on the plots below, there is an increasing trend that seems to taper off after 1980. Both seasonality and cycles are present. The seasonal plot and the subseries plot both clearly show that clay brick production is generally lower in Q1 than in the other quarters. The strongest autocorrelation occurs at lag 1, as shown in the lag plot and the autocorrelation plot.

Question6(bricksq)
[Output: five plots for bricksq, in order: time plot, seasonal plot, subseries plot, lag plot, autocorrelation plot]

usdeaths

The usdeaths dataset contains monthly accidental deaths in the USA.

Based on the plots below, there is no overall trend in the series, but both seasonality and cycles are present. The seasonal plot and the subseries plot both clearly show that accidental deaths generally increase in the warmer months, between May and August, and then drop into the winter, with peaks and troughs in between. The strongest autocorrelations occur at lags 1 and 6, as shown in the lag plot and the autocorrelation plot.

Question6(usdeaths)
[Output: five plots for usdeaths, in order: time plot, seasonal plot, subseries plot, lag plot, autocorrelation plot]

sunspotarea

The sunspotarea dataset contains annual averages of the daily sunspot areas (in units of millionths of a hemisphere) for the full sun. Sunspots are magnetic regions that appear as dark spots on the surface of the sun. The Royal Greenwich Observatory compiled daily sunspot observations from May 1874 to 1976. Later data are from the US Air Force and the US National Oceanic and Atmospheric Administration. The data have been calibrated to be consistent across the whole history of observations.

Based on the plots below, there is no overall trend and no seasonality in the series, but cycles are present. Because the data are annual, only the time plot, lag plot and autocorrelation plot are drawn. The strongest autocorrelations occur at lags 1 and 5, as shown in the lag plot and the autocorrelation plot.

autoplot(sunspotarea)+defaulttheme

gglagplot(sunspotarea)+defaulttheme

ggAcf(sunspotarea)+defaulttheme
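
One way to read a cycle length off an ACF is to look for the lag at which the autocorrelation peaks again after its first dip. The sketch below uses a simulated annual series with an assumed 11-observation cycle, not sunspotarea itself, so the numbers only illustrate the technique.

```r
set.seed(7)
t <- 1:150
# Simulated annual series with an assumed cycle of 11 observations
y <- ts(sin(2 * pi * t / 11) + rnorm(150, sd = 0.3), frequency = 1)

# Autocorrelations at lags 1..20 (drop lag 0 so that r[k] is lag k)
r <- acf(y, lag.max = 20, plot = FALSE)$acf[-1]

# Search lags 5 to 16 for the first recurring peak
peak_lag <- which.max(r[5:16]) + 4
print(peak_lag)
print(round(r[c(5, 6)], 2))  # roughly half a cycle away: negative values
```

In a cyclic series of this kind, the autocorrelation is strongly negative about half a cycle away and positive again a full cycle away, which is why large values in magnitude can show up at a lag of about half the cycle length.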

gasoline

The gasoline dataset contains weekly US finished motor gasoline product supplied, from 2 February 1991 to 20 January 2017. Units are “million barrels per day”.
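
Weekly data differ from the monthly and quarterly series above in that a year does not contain a whole number of weeks. A minimal sketch, using a made-up series and the common convention of 365.25/7 observations per year, shows what a non-integer frequency looks like in a ts object.

```r
# Hypothetical weekly series covering about two years
weekly <- ts(seq_len(104), start = 1991, frequency = 365.25 / 7)

f <- frequency(weekly)
print(f)           # about 52.18 observations per year
print(f %% 1 != 0) # TRUE: the seasonal period is not a whole number
```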

Based on the plots below, there is an increasing trend that tapers off after 2007. No clear seasonality or cycles are observed: the seasonal plot shows a scatter that seems random. The strongest autocorrelations occur at lags 1 and 52, as shown in the lag plot and the autocorrelation plot. No subseries plot is drawn for this weekly series, so Question6() is redefined below without it.

Question6<- function(dataset){

  a<-autoplot(dataset)+defaulttheme
  b<-ggseasonplot(dataset, year.labels = T, year.labels.left = T)+defaulttheme
  d<-gglagplot(dataset)+defaulttheme
  e<-ggAcf(dataset)+defaulttheme
  

  list(a,b,d,e)
}
Question6(gasoline)
[Output: four plots for gasoline, in order: time plot, seasonal plot, lag plot, autocorrelation plot]