Assignment 1 covers the exercises in Chapter 2 of https://otexts.com/fpp2/graphics.html.
Each of the series gold, woolyrnq and gas is plotted separately with autoplot(), as shown below.
library(forecast)
library(kableExtra)
library(dplyr)
library(ggplot2)
defaulttheme<-theme(panel.background = element_blank(),
panel.border = element_rect(color = "black", fill=NA))
autoplot(gold)+defaulttheme
autoplot(woolyrnq)+defaulttheme
autoplot(gas)+defaulttheme
The frequencies of gold, woolyrnq and gas, in that order, are shown below.
frequency(gold)
## [1] 1
frequency(woolyrnq)
## [1] 4
frequency(gas)
## [1] 12
As shown below, which.max() identifies observation 770 as the outlier (the maximum value) in the gold series.
which.max(gold)
## [1] 770
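which.max() returns a position, not a value or a date. As a minimal sketch of how that position can be turned into the observation and its time stamp, the block below uses a small made-up annual series (the values and the names x, idx, peak_value and peak_time are illustrative assumptions, not the gold data).

```r
# Toy series with missing values and one spike (hypothetical values, not gold).
x <- ts(c(5, 6, NA, 7, 30, 6, NA, 5), start = 2001, frequency = 1)

idx <- which.max(x)        # position of the maximum; missing values are discarded
peak_value <- x[idx]       # the observation at that position
peak_time <- time(x)[idx]  # the time stamp attached to that position

print(c(index = idx, value = peak_value, time = peak_time))
```

The same three steps applied to gold would give the value and time index behind observation 770.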
Read the data, convert it to a time series, and construct a time series plot, as shown below.
tute1 <- read.csv("http://otexts.com/fpp2/extrafiles/tute1.csv", header = TRUE)
head(tute1,5)
## X Sales AdBudget GDP
## 1 Mar-81 1020.2 659.2 251.8
## 2 Jun-81 889.2 589.0 290.9
## 3 Sep-81 795.0 512.5 290.8
## 4 Dec-81 1003.9 614.1 292.4
## 5 Mar-82 1057.7 647.2 279.1
skimr::skim(tute1)
| Name | tute1 |
| Number of rows | 100 |
| Number of columns | 4 |
| _______________________ | |
| Column type frequency: | |
| character | 1 |
| numeric | 3 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| X | 0 | 1 | 6 | 6 | 0 | 100 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| Sales | 0 | 1 | 948.74 | 98.25 | 735.1 | 871.10 | 960.65 | 1018.70 | 1115.5 | ▃▃▆▇▅ |
| AdBudget | 0 | 1 | 591.93 | 54.34 | 489.9 | 569.47 | 608.50 | 634.97 | 665.9 | ▅▁▂▇▆ |
| GDP | 0 | 1 | 281.18 | 14.37 | 249.3 | 271.35 | 282.60 | 290.30 | 330.6 | ▂▇▇▂▁ |
mytimeseries <- ts(tute1[,-1], start=1981, frequency=4)
autoplot(mytimeseries, facets = TRUE)+defaulttheme
Without the argument facets = TRUE, all of the series in the dataset are drawn in the same panel, as shown below.
autoplot(mytimeseries)
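The ts() call used for tute1 relies on start = 1981 and frequency = 4 to label the rows as quarters. As a small sketch of what those two arguments do, the block below applies them to the first five Sales values printed by head(tute1) and inspects the resulting time index; the variable names sales and s are assumptions made for illustration.

```r
# First five Sales values, copied from the head(tute1) output.
sales <- c(1020.2, 889.2, 795.0, 1003.9, 1057.7)

# start = 1981 with frequency = 4 labels the first value as 1981 Q1.
s <- ts(sales, start = 1981, frequency = 4)

print(time(s))   # decimal-year time stamp of each observation
print(cycle(s))  # quarter (1 to 4) of each observation
```

The fifth value falls in quarter 1 of 1982, which matches the Mar-82 label in the raw data.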
library(httr)
url1<-"https://otexts.com/fpp2/extrafiles/retail.xlsx"
GET(url1, write_disk(tf <- tempfile(fileext = ".xlsx")))
## Response [https://otexts.com/fpp2/extrafiles/retail.xlsx]
## Date: 2021-02-14 21:58
## Status: 200
## Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
## Size: 639 kB
## <ON DISK> C:\Users\REGIST~1\AppData\Local\Temp\RtmpofpRk0\fileb9483c3d1532.xlsx
retaildata <- readxl::read_excel(tf, skip = 1)
myts <- ts(retaildata[,"A3349873A"],
frequency=12, start=c(1982,4))
autoplot() shows the overall shape of the series, and there is a clear upward trend. The trend appears to flatten after 2000 and to resume after 2010. There is also a seasonal pattern that repeats every year. No cyclic behaviour is evident.
autoplot(myts)+defaulttheme
ggseasonplot() overlays the years so that the seasonal pattern can be seen directly. The plot below shows that, in most years, sales rise sharply during the December holiday season, as we would expect.
ggseasonplot(myts, year.labels = TRUE, year.labels.left = TRUE)+defaulttheme
ggsubseriesplot() makes this pattern clearer by collecting the observations for each month into their own panel. The plot below shows a clear difference between holiday shopping and the rest of the year. The horizontal line in each panel is the mean for that month.
ggsubseriesplot(myts)+defaulttheme
The lag plot below plots y_t against y_{t-k} for different values of k, for each season (month). The relationship is strongest at a lag of 12 for all months and is more scattered at other lags, depending on the month.
gglagplot(myts)+defaulttheme
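Each panel of a lag plot is a scatter of the series against a shifted copy of itself. A rough sketch of that pairing in base R is given below, using a synthetic monthly series rather than the retail data; the series y and the helper lag_cor() are hypothetical names introduced only for this illustration.

```r
set.seed(1)
n <- 120
# Synthetic monthly series: a 12-month sine wave plus noise (not the retail data).
y <- ts(10 * sin(2 * pi * seq_len(n) / 12) + rnorm(n), frequency = 12)

# Pair y_t with y_{t-k} and measure how close the pairs lie to a straight line.
lag_cor <- function(y, k) {
  n <- length(y)
  cor(y[(k + 1):n], y[1:(n - k)])
}

cors <- sapply(c(1, 6, 12), function(k) lag_cor(y, k))
names(cors) <- c("lag 1", "lag 6", "lag 12")
print(round(cors, 2))
```

For a series with a 12-month pattern, the pairs at lag 12 line up most tightly, which is the behaviour described for the retail series.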
Autocorrelation measures the linear relationship between lagged values of a time series. The autocorrelations for the Australian retail series are generally strong, and strongest at lag 12, consistent with the lag plot above. Overall, the autocorrelations decrease as the lag increases.
ggAcf(myts)+defaulttheme
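To make the quantity behind the ACF plot concrete, the sketch below computes the sample autocorrelation r_k = sum((y_t - mean)(y_{t-k} - mean)) / sum((y_t - mean)^2) by hand and compares it with base R's acf(). It uses a synthetic monthly series, not the retail data, and the helper acf_manual() is a hypothetical name for this illustration.

```r
set.seed(42)
n <- 120
# Synthetic monthly series with a 12-month pattern (not the retail data).
y <- ts(10 * sin(2 * pi * seq_len(n) / 12) + rnorm(n), frequency = 12)

# Sample autocorrelation at lag k, written out from the formula.
acf_manual <- function(y, k) {
  y <- as.numeric(y)
  n <- length(y)
  ybar <- mean(y)
  sum((y[(k + 1):n] - ybar) * (y[1:(n - k)] - ybar)) / sum((y - ybar)^2)
}

r_manual <- sapply(1:24, acf_manual, y = y)
# acf() returns lag 0 first, so drop it to align with lags 1 to 24.
r_builtin <- as.numeric(acf(y, lag.max = 24, plot = FALSE)$acf)[-1]

print(all.equal(r_manual, r_builtin))
```

Because the denominator uses the full series while the numerator has fewer terms at longer lags, the values shrink as the lag grows even for a perfectly periodic series.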
Use the following graphics functions: autoplot(), ggseasonplot(), ggsubseriesplot(), gglagplot(), ggAcf() and explore features from the following time series: hsales, usdeaths, bricksq, sunspotarea, gasoline.
library(fpp2)
## -- Attaching packages ---------------------------------------------- fpp2 2.4 --
## v fma 2.4 v expsmooth 2.3
##
# Draw the five Chapter 2 graphics for one series and return them as a list.
Question6 <- function(dataset) {
  a <- autoplot(dataset) + defaulttheme
  b <- ggseasonplot(dataset, year.labels = TRUE, year.labels.left = TRUE) + defaulttheme
  c <- ggsubseriesplot(dataset) + defaulttheme
  d <- gglagplot(dataset) + defaulttheme
  e <- ggAcf(dataset) + defaulttheme
  list(a, b, c, d, e)
}
The hsales dataset describes the monthly sales of new one-family houses sold in the USA since 1973.
Based on the plots below, the series shows no overall trend, but both seasonality and cycles are present. The seasonal plot and the subseries plot both show that home sales generally rise from March to May and then decline gradually into the winter, with peaks and troughs in between. The strongest autocorrelation occurs at lag 1, as shown in the lag plot and the autocorrelation plot.
Question6(hsales)
## [[1]]
##
## [[2]]
##
## [[3]]
##
## [[4]]
##
## [[5]]
The bricksq dataset describes Australian quarterly clay brick production, 1956–1994.
Based on the plots below, the series shows an increasing trend that seems to taper off after 1980. Both seasonality and cycles are present. The seasonal plot and the subseries plot both show that clay brick production is generally lower in Q1 than in the other quarters. The strongest autocorrelation occurs at lag 1, as shown in the lag plot and the autocorrelation plot.
Question6(bricksq)
## [[1]]
##
## [[2]]
##
## [[3]]
##
## [[4]]
##
## [[5]]
The usdeaths dataset describes monthly accidental deaths in the USA.
Based on the plots below, the series shows no overall trend, but both seasonality and cycles are present. The seasonal plot and the subseries plot both show that accidental deaths generally rise in the warmer months, between May and August, and fall again into the winter, with peaks and troughs in between. The strongest autocorrelations occur at lags 1 and 6, as shown in the lag plot and the autocorrelation plot.
Question6(usdeaths)
## [[1]]
##
## [[2]]
##
## [[3]]
##
## [[4]]
##
## [[5]]
The sunspotarea dataset describes annual averages of the daily sunspot areas (in units of millionths of a hemisphere) for the full sun. Sunspots are magnetic regions that appear as dark spots on the surface of the sun. The Royal Greenwich Observatory compiled daily sunspot observations from May 1874 to 1976. Later data are from the US Air Force and the US National Oceanic and Atmospheric Administration. The data have been calibrated to be consistent across the whole history of observations.
Based on the plots below, the series shows no overall trend and no seasonality, but cycles are present. The strongest autocorrelations occur at lags 1 and 5, as shown in the lag plot and the autocorrelation plot.
autoplot(sunspotarea)+defaulttheme
gglagplot(sunspotarea)+defaulttheme
ggAcf(sunspotarea)+defaulttheme
The gasoline dataset describes US finished motor gasoline product supplied. The data are weekly, beginning 2 February 1991 and ending 20 January 2017. Units are “million barrels per day”.
Based on the plots below, the series shows an increasing trend that tapers off after 2007. No clear seasonality or cycles are observed, and the seasonal plot shows scatter that seems random. The strongest autocorrelations occur at lags 1 and 52, as shown in the lag plot and the autocorrelation plot.
# Redefine Question6 without ggsubseriesplot() for the weekly gasoline series.
Question6 <- function(dataset) {
  a <- autoplot(dataset) + defaulttheme
  b <- ggseasonplot(dataset, year.labels = TRUE, year.labels.left = TRUE) + defaulttheme
  d <- gglagplot(dataset) + defaulttheme
  e <- ggAcf(dataset) + defaulttheme
  list(a, b, d, e)
}
Question6(gasoline)
## [[1]]
##
## [[2]]
##
## [[3]]
##
## [[4]]
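The analysis above uses three different sets of plots: all five for hsales, bricksq and usdeaths, three for the annual sunspotarea series, and four for the weekly gasoline series. One possible way to avoid redefining Question6 is to pick the set from the series' frequency. The sketch below only returns the names of the plotting functions and does not call them; the helper choose_plots() is hypothetical, and the rule that the subseries plot is skipped for a non-integer frequency is an assumption inferred from the choices made above, not a documented requirement.

```r
# Hypothetical helper: decide which Chapter 2 graphics suit a series.
choose_plots <- function(x) {
  f <- frequency(x)
  plots <- c("autoplot", "gglagplot", "ggAcf")
  # Seasonal plots only make sense when there is more than one period per year.
  if (f > 1) plots <- c(plots, "ggseasonplot")
  # Assumption: skip the subseries plot when the frequency is not a whole number.
  if (f > 1 && isTRUE(all.equal(f, round(f)))) plots <- c(plots, "ggsubseriesplot")
  plots
}

# Made-up series standing in for annual, monthly and weekly data.
annual  <- ts(rnorm(30), frequency = 1)
monthly <- ts(rnorm(48), frequency = 12)
weekly  <- ts(rnorm(200), frequency = 365.25 / 7)

print(choose_plots(annual))
print(choose_plots(monthly))
print(choose_plots(weekly))
```

A version of Question6 could loop over the returned names with match.fun() once the forecast package is attached, which would keep a single definition for all five series.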