Please submit exercises 2.1, 2.2, 2.3 and 2.6 from Hyndman's online forecasting book (Forecasting: Principles and Practice). Please submit your RPubs link and attach the .rmd file with your code.
Use the help function to explore what the series gold, woolyrnq and gas represent.
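library(fpp2) # attaches forecast, fma, expsmooth and ggplot2, which provide the datasets used below
help(gold) # Daily morning gold prices in US dollars. 1 January 1985 – 31 March 1989.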
help(woolyrnq) # Quarterly production of woollen yarn in Australia: tonnes. Mar 1965 – Sep 1994.
help(gas) # Australian monthly gas production: 1956–1995.
head(gold)
## Time Series:
## Start = 1
## End = 6
## Frequency = 1
## [1] 306.25 299.50 303.45 296.75 304.40 298.35
head(woolyrnq)
## Qtr1 Qtr2 Qtr3 Qtr4
## 1965 6172 6709 6633 6660
## 1966 6786 6800
head(gas)
## Jan Feb Mar Apr May Jun
## 1956 1709 1646 1794 1878 2173 2321
library(ggplot2)
autoplot(gold)+
ggtitle("Daily morning gold prices in US dollars. 1 January 1985 – 31 March 1989")+
xlab("1 January 1985 - 31 March 1989")+
ylab("US dollars")
autoplot(woolyrnq)+
ggtitle("Quarterly production of woollen yarn in Australia: tonnes. Mar 1965 – Sep 1994")+
xlab("Mar 1965 – Sep 1994")+
ylab("Production")
autoplot(gas)+
ggtitle("Australian monthly gas production: 1956–1995")+
xlab("1956–1995")+
ylab("Production")
frequency(gold)
## [1] 1
frequency(woolyrnq)
## [1] 4
frequency(gas)
## [1] 12
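These frequencies determine how each series is indexed in time. A minimal sketch, assuming the fpp2 package (which attaches forecast and its datasets) is installed, that tabulates start, end and frequency for all three series with tsp():

```r
library(fpp2)  # attaches forecast, which ships gold, woolyrnq and gas

# tsp() returns c(start, end, frequency) for a time series
spans <- sapply(list(gold = gold, woolyrnq = woolyrnq, gas = gas), tsp)
rownames(spans) <- c("start", "end", "frequency")
spans
```

A frequency of 1 for gold means each observation is one time unit, so its time plot is indexed by observation number rather than by calendar date.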
which.max(gold)
## [1] 770
Observation 770 has the maximum value and is the outlier in the gold series.
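To see how far the outlier stands out, print it next to its neighbours. A small sketch, again assuming fpp2 is installed; the window of three observations on either side is an arbitrary choice:

```r
library(fpp2)  # provides the gold series

i <- which.max(gold)    # index of the maximum (NA values are ignored)
gold[i]                 # the maximum price itself
gold[(i - 3):(i + 3)]   # three observations either side, for comparison
```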
Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.
tute1 <- read.csv("https://otexts.com/fpp2/extrafiles/tute1.csv", header=TRUE)
str(tute1)
## 'data.frame': 100 obs. of 4 variables:
## $ X : Factor w/ 100 levels "Dec-00","Dec-01",..: 57 32 82 7 58 33 83 8 59 34 ...
## $ Sales : num 1020 889 795 1004 1058 ...
## $ AdBudget: num 659 589 512 614 647 ...
## $ GDP : num 252 291 291 292 279 ...
View(tute1)
mytimeseries <- ts(tute1[,-1], start=1981, frequency=4)
(The [,-1] removes the first column, which contains the quarter labels; we don't need them now.)
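Dropping the label column loses nothing, because ts() never reads those labels: it rebuilds the time index from start and frequency. A self-contained base-R check using the first four rows as printed by str() above (values rounded as shown there; the labels follow the file's Mon-YY pattern):

```r
# First four rows of tute1, rounded as printed by str()
tute1_head <- data.frame(
  X        = c("Mar-81", "Jun-81", "Sep-81", "Dec-81"),
  Sales    = c(1020, 889, 795, 1004),
  AdBudget = c(659, 589, 512, 614),
  GDP      = c(252, 291, 291, 292)
)

# Drop the label column; start and frequency define the time index instead
mini_ts <- ts(tute1_head[, -1], start = 1981, frequency = 4)
print(mini_ts)
as.numeric(time(mini_ts))  # quarterly steps of 0.25 starting at 1981
```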
autoplot(mytimeseries, facets=TRUE)
Check what happens when you don’t include facets=TRUE.
autoplot(mytimeseries)
Download some monthly Australian retail data from the book website. These represent retail sales in various categories for different Australian states, and are stored in a MS-Excel file.
retaildata <- readxl::read_excel("retail.xlsx", skip=1)
The second argument (skip=1) is required because the Excel sheet has two header rows.
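The effect of skipping a header row can be seen without the Excel file. The sketch below uses base R's read.csv(), which has an analogous skip argument, rather than readxl; the two header rows are a hypothetical stand-in for the layout of retail.xlsx (only the ID A3349882C and the values 139.3 and 136.0 come from this document):

```r
# Hypothetical two-header-row layout: row 1 = descriptions, row 2 = series IDs
lines <- c(
  "Description,Hypothetical long series description",
  "Series ID,A3349882C",
  "1982-04-01,139.3",
  "1982-05-01,136.0"
)

no_skip   <- read.csv(text = lines, check.names = FALSE)
with_skip <- read.csv(text = lines, skip = 1, check.names = FALSE)

names(no_skip)    # descriptions become column names; IDs end up as data
names(with_skip)  # series IDs become column names; values parse as numbers
```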
myts <- ts(retaildata[, "A3349882C"], frequency = 12, start = c(1982, 4))
head(myts)
## Apr May Jun Jul Aug Sep
## 1982 139.3 136.0 143.5 150.2 144.0 146.9
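In start = c(1982, 4) the second number is the period within the year, so with frequency = 12 the series starts in April 1982. A base-R check using the six values printed by head(myts):

```r
# The six values printed by head(myts)
x <- ts(c(139.3, 136.0, 143.5, 150.2, 144.0, 146.9),
        frequency = 12, start = c(1982, 4))

start(x)   # year 1982, period 4 (April)
cycle(x)   # month index of each observation: 4 through 9
print(x)   # laid out under Apr ... Sep, matching head(myts)
```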
View(myts)
Explore the chosen series with the following functions: autoplot(), ggseasonplot(), ggsubseriesplot(), gglagplot(), ggAcf().
Can you spot any seasonality, cyclicity and trend? What do you learn about the series?
autoplot(myts)+
ggtitle("A3349882C")+
xlab("April 1982 to December 2013")+
ylab("Sales")
ggseasonplot(myts)
ggsubseriesplot(myts)
gglagplot(myts)
ggAcf(myts)
The plot shows an increasing trend in sales, especially between 2008 and 2010. The seasonality is more pronounced than in the 1980s and 1990s, with sales rising in February, August and November. The seasonal spike also grows year by year and peaks in December, which makes sense given the Christmas holiday shopping season. The last plot suggests a negative relationship.
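The seasonal claims above can be checked numerically by averaging each month across years, which is what the horizontal lines in ggsubseriesplot() show. A sketch of a hypothetical helper, seasonal_profile(), in base R:

```r
# Hypothetical helper: mean of each season (e.g. month) across all years
seasonal_profile <- function(y) {
  stopifnot(is.ts(y))
  prof <- tapply(as.numeric(y), cycle(y), mean, na.rm = TRUE)
  # Label monthly seasons Jan..Dec; other frequencies keep numeric labels
  if (frequency(y) == 12) names(prof) <- month.abb[as.integer(names(prof))]
  prof
}
```

For example, seasonal_profile(myts) returns the 12 monthly means, and which.max(seasonal_profile(myts)) names the peak month.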
Use the following graphics functions: autoplot(), ggseasonplot(), ggsubseriesplot(), gglagplot(), ggAcf() and explore features from the following time series: hsales, usdeaths, bricksq, sunspotarea, gasoline.
Can you spot any seasonality, cyclicity and trend? What do you learn about the series?
hsales
autoplot(hsales)
ggseasonplot(hsales)
gglagplot(hsales)
ggAcf(hsales)
The data show a seasonal peak in sales between February and March, with lower sales in the rest of the year; this is clearest in the seasonal plot. There is no clear trend, but cyclic behavior is visible in both the time plot and the seasonal plot. Sales appear strong in the 1970s and fall sharply in the 1980s.
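Averaging each year removes the within-year seasonal pattern, which makes a cyclic rise and fall easier to see. A sketch of a hypothetical helper built on base R's aggregate() method for time series:

```r
# Hypothetical helper: one mean per block of frequency(y) consecutive
# observations, counted from the first; an incomplete final block is dropped
annual_mean <- function(y) {
  stopifnot(is.ts(y))
  aggregate(y, nfrequency = 1, FUN = mean)
}
```

Since hsales starts in January, annual_mean(hsales) gives calendar-year averages, and autoplot(annual_mean(hsales)) shows the cycle without the seasonal wiggle.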
usdeaths
autoplot(usdeaths)
ggseasonplot(usdeaths)
gglagplot(usdeaths)
ggsubseriesplot(usdeaths)
ggAcf(usdeaths)
The plots show clear seasonality, but no trend and no cyclicity.
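Seasonality in a monthly series shows up in ggAcf() as a large positive autocorrelation at lag 12. A sketch of a hypothetical helper that extracts a single autocorrelation with base R's acf(); note that lag.max counts observations, not years:

```r
# Hypothetical helper: sample autocorrelation of y at one lag (in observations)
acf_at_lag <- function(y, lag) {
  r <- acf(y, lag.max = lag, plot = FALSE)
  drop(r$acf)[lag + 1]  # element 1 is lag 0, so lag k sits at position k + 1
}
```

For example, acf_at_lag(usdeaths, 12) gives the value of the lag-12 spike in the ACF plot as a single number.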
bricksq
autoplot(bricksq)
ggseasonplot(bricksq)
ggsubseriesplot(bricksq)
gglagplot(bricksq)
ggAcf(bricksq)
Q2 and Q3 have higher production than Q1 and Q4. During the 1982–1983 recession, brick production drops dramatically. The series has an increasing trend and weak seasonality. The lag plots show a positive relationship, reflecting both the trend and the seasonality.
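A least-squares line through the series gives a rough, single-number summary of the trend direction; since the trend need not be linear, treat it only as a sign check. A sketch of a hypothetical helper in base R:

```r
# Hypothetical helper: slope of a straight-line fit, in units per year
# (time(y) is measured in years for quarterly and monthly series)
trend_slope <- function(y) {
  stopifnot(is.ts(y))
  fit <- lm(as.numeric(y) ~ as.numeric(time(y)))
  unname(coef(fit)[2])
}
```

A positive trend_slope(bricksq) is consistent with the increasing trend described above.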
sunspotarea
autoplot(sunspotarea)
# ggseasonplot(sunspotarea)     # not run: annual data (frequency 1) has no seasonal period
# ggsubseriesplot(sunspotarea)  # not run: same reason
gglagplot(sunspotarea)
ggAcf(sunspotarea)
The series is clearly cyclic (the roughly 11-year solar cycle), but it is not seasonal: the data are annual.
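Because sunspotarea is annual, its cycle length has to be estimated rather than read off a seasonal plot. A sketch using the raw periodogram from base R's spec.pgram(): the frequency with the most power gives the dominant period, in years for an annual series. This is a swapped-in technique (spectral analysis) not used in the exercise, and dominant_period() is a hypothetical name:

```r
# Hypothetical helper: period (in time units of y) of the largest periodogram peak
dominant_period <- function(y) {
  # taper = 0: no tapering; fast = FALSE: no zero-padding, so the
  # frequencies are exactly k / n cycles per time unit
  sp <- spec.pgram(y, taper = 0, fast = FALSE, detrend = TRUE, plot = FALSE)
  1 / sp$freq[which.max(sp$spec)]
}
```

Comparing dominant_period(sunspotarea) with the cycle visible in the time plot is a useful sanity check.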
gasoline
autoplot(gasoline)
ggseasonplot(gasoline)
# ggsubseriesplot(gasoline)  # not run: weekly data has a non-integer frequency (365.25/7, about 52.18)
gglagplot(gasoline)
ggAcf(gasoline)
The series shows both seasonality and a trend. The seasonal period (about 52 weeks) is too long for the default lag plot, which shows only the first 16 lags, to be useful.
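gglagplot() can show specific lags instead of the default first 16 through its set.lags argument. A sketch, assuming fpp2 is installed, that includes a lag of 52 weeks (roughly one year); the other lags chosen are arbitrary:

```r
library(fpp2)  # gasoline data plus forecast's gglagplot()

# Lags count observations (weeks here); 52 weeks is roughly one year
p <- gglagplot(gasoline, set.lags = c(1, 13, 26, 52))
print(p)
```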