library(knitr)
library(kableExtra)
#install.packages("fpp2")
library(fpp2)
Use the help function to explore what the series gold, woolyrnq and gas represent.
# help(gold)
# help(woolyrnq)
# help(gas)
head(gold)
## Time Series:
## Start = 1
## End = 6
## Frequency = 1
## [1] 306.25 299.50 303.45 296.75 304.40 298.35
tsdisplay(gold)
head(woolyrnq)
## Qtr1 Qtr2 Qtr3 Qtr4
## 1965 6172 6709 6633 6660
## 1966 6786 6800
tsdisplay(woolyrnq)
head(gas)
## Jan Feb Mar Apr May Jun
## 1956 1709 1646 1794 1878 2173 2321
tsdisplay(gas)
a. Use autoplot() to plot each of these in separate plots.
autoplot(gold) + ggtitle("Daily Morning Gold Prices ($) Jan 1 1985 - Mar 31 1989") +
ylab("$") + xlab("Days")
autoplot(woolyrnq) + ggtitle("Quarterly production of woollen yarn in Australia: tonnes. Mar 1965 – Sep 1994") +
ylab("Tonnes") + xlab("")
autoplot(gas) + ggtitle("Australian monthly gas production: 1956 - 1995") +
ylab("") + xlab("Year")
b. What is the frequency of each series? Hint: apply the frequency() function.
paste0("Gold Frequency is: ", frequency(gold))
## [1] "Gold Frequency is: 1"
paste0("Woolyrnq Frequency is: ", frequency(woolyrnq))
## [1] "Woolyrnq Frequency is: 4"
paste0("Gas Frequency is: ", frequency(gas))
## [1] "Gas Frequency is: 12"
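The frequency is the number of observations per seasonal cycle, which for these series means per year: 1 for the daily-indexed gold series, 4 for quarterly woolyrnq and 12 for monthly gas. The base-R sketch below uses made-up values (not the fpp2 data) to show how `frequency()`, `deltat()`, `cycle()` and `time()` relate. The monthly example uses `start = c(1982, 4)`, the same convention as the retail data later on, where the second number is the month.

```r
# Made-up series with the same frequencies as gold, woolyrnq and gas
annual    <- ts(c(5, 7, 6), start = 1875)                  # frequency 1
quarterly <- ts(1:8, start = c(1965, 1), frequency = 4)     # Qtr1..Qtr4
monthly   <- ts(1:24, start = c(1982, 4), frequency = 12)   # starts April 1982

sapply(list(annual = annual, quarterly = quarterly, monthly = monthly), frequency)

# deltat() is the time step between observations, i.e. 1 / frequency
deltat(quarterly)     # 0.25

# cycle() gives each observation's position within the year
cycle(quarterly)

# start = c(1982, 4) means the 4th month, so the first time point is 1982 + 3/12
time(monthly)[1]      # 1982.25
cycle(monthly)[1]     # 4
```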
c. Use which.max() to spot the outlier in the gold series. Which observation was it?
# which.max() returns the index of the largest observation
gold.outlier.when <- which.max(gold)
paste0("gold reaches its maximum value at position: ", gold.outlier.when)
## [1] "gold reaches its maximum value at position: 770"
paste0("gold's maximum value was: ", gold[gold.outlier.when])
## [1] "gold's maximum value was: 593.7"
The outlier is observation 770, where the price reaches $593.70.
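which.max() only finds the largest value; it does not say whether that value is out of line with its neighbours. One way to check, sketched below in base R, is to look for an isolated spike: a large jump up immediately followed by a large drop, measured against the typical day-to-day change (robust scale via `mad()`). The helper `flag_spikes()`, its `threshold` of 5 and the price vector are all hypothetical, chosen for illustration; the result of running it on gold itself is not verified here.

```r
# Hypothetical helper: flag isolated spikes, i.e. a large rise into a point
# immediately followed by a large fall out of it
flag_spikes <- function(x, threshold = 5) {
  x <- as.numeric(x)
  d <- diff(x)                                 # d[j] = x[j + 1] - x[j]
  s <- mad(d, na.rm = TRUE)                    # robust scale of typical changes; ignores NAs
  up   <- d[-length(d)] >  threshold * s       # big rise into x[j + 1]
  down <- d[-1]         < -threshold * s       # big fall out of x[j + 1]
  which(up & down) + 1                         # which() drops NA comparisons
}

# Made-up prices with one spike at position 6
prices <- c(300.5, 302.1, 299.8, 303.4, 304.2, 480.0, 305.7, 303.9, 306.6, 304.8, 307.3)
which.max(prices)     # 6: the largest value
flag_spikes(prices)   # 6: it is also an isolated spike
```

Using differences rather than raw levels keeps a slowly trending series from being flagged just because its later values are higher.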
Download the file tute1.csv from OTexts.org/fpp2/extrafiles/tute1.csv, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.
tute1 <- read.csv("tute1.csv", header=TRUE)
kable(head(tute1))
X | Sales | AdBudget | GDP |
---|---|---|---|
Mar-81 | 1020.2 | 659.2 | 251.8 |
Jun-81 | 889.2 | 589.0 | 290.9 |
Sep-81 | 795.0 | 512.5 | 290.8 |
Dec-81 | 1003.9 | 614.1 | 292.4 |
Mar-82 | 1057.7 | 647.2 | 279.1 |
Jun-82 | 944.4 | 602.0 | 254.0 |
mytimeseries <- ts(tute1[,-1], start=1981, frequency=4)
autoplot(mytimeseries, facets=TRUE)
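`tute1[,-1]` drops the text date column, and `ts()` then rebuilds the time index from `start` and `frequency` alone, so the first row becomes 1981 Q1. The base-R sketch below repeats that step on the six rows shown in the table above (copied by hand into a data frame, so `tute1_head` is a stand-in for the full file) and uses `window()` to pull out a sub-period.

```r
# First six rows of tute1, copied from the kable() output above
tute1_head <- data.frame(
  X        = c("Mar-81", "Jun-81", "Sep-81", "Dec-81", "Mar-82", "Jun-82"),
  Sales    = c(1020.2, 889.2, 795.0, 1003.9, 1057.7, 944.4),
  AdBudget = c(659.2, 589.0, 512.5, 614.1, 647.2, 602.0),
  GDP      = c(251.8, 290.9, 290.8, 292.4, 279.1, 254.0)
)

# Drop the date column; quarters are assigned from start and frequency
small_ts <- ts(tute1_head[, -1], start = 1981, frequency = 4)

colnames(small_ts)   # one column per series
end(small_ts)        # 1982 2, i.e. the sixth quarter

# Extract 1982 Q1 onwards
recent <- window(small_ts, start = c(1982, 1))
recent
```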
Check what happens when you don’t include facets=TRUE
autoplot(mytimeseries)
Without facets=TRUE, all three series are drawn in a single panel on a shared y-axis and distinguished by colour, rather than each getting its own panel.
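Base R's `plot()` method for multivariate `ts` objects offers the same choice through its `plot.type` argument: `"multiple"` (one panel per series) roughly corresponds to `facets=TRUE`, and `"single"` (one shared panel) to leaving it out. The two-column series below is made up, and drawing goes to a null PDF device so that nothing is written to disk.

```r
pdf(NULL)   # null graphics device: no file is produced

# Made-up two-column quarterly series
two <- ts(cbind(a = 1:20, b = (1:20)^1.5), start = 1981, frequency = 4)

plot(two, plot.type = "multiple")            # separate panels, like facets = TRUE
plot(two, plot.type = "single", col = 1:2)   # one shared panel and y-axis

invisible(dev.off())
```

With very different scales, the single-panel view can flatten the smaller series, which is one reason to prefer facets for Sales, AdBudget and GDP.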
Download some monthly Australian retail data from OTexts.org/fpp2/extrafiles/retail.xlsx. These represent retail sales in various categories for different Australian states, and are stored in a MS-Excel file.
retaildata <- readxl::read_excel("retail.xlsx", skip=1)
myts <- ts(retaildata[,"A3349873A"], frequency=12, start=c(1982,4))
autoplot(myts)
ggseasonplot(myts)
ggsubseriesplot(myts)
gglagplot(myts, lags = 12)
ggAcf(myts)
Can you spot any seasonality, cyclicity and trend? What do you learn about the series?
The series has a clear positive trend: the ACF values are large and positive and decrease only slowly as the lag increases, which is typical of trended data. Seasonality appears weak, with no pronounced spikes at the seasonal lags, and there is no evident cyclic behaviour. Overall, sales increase over the period.
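The link between a trend and a slowly decaying ACF can be checked on a made-up series. The base-R sketch below builds a monthly series from a linear trend plus noise (an assumption standing in for the retail data), computes its autocorrelations with `acf(..., plot = FALSE)`, and then repeats the calculation after differencing, which removes the trend.

```r
set.seed(1)
n <- 120
# Made-up monthly series: linear trend plus small noise
trend_only <- ts(50 + 0.5 * (1:n) + rnorm(n), frequency = 12, start = c(1982, 4))

# acf()$acf includes lag 0 as its first element; drop it so r[k] is lag k
r <- acf(trend_only, lag.max = 24, plot = FALSE)$acf[-1]
round(r[c(1, 6, 12, 24)], 2)   # large, positive, slowly decreasing

# After differencing, the trend-driven autocorrelation disappears
r_diff <- acf(diff(trend_only), lag.max = 24, plot = FALSE)$acf[-1]
round(r_diff[c(1, 6, 12, 24)], 2)
```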
Use the following graphics functions: autoplot(), ggseasonplot(), ggsubseriesplot(), gglagplot(), ggAcf() and explore features from the following time series: hsales, usdeaths, bricksq, sunspotarea, gasoline.
Let’s explore hsales
head(hsales)
## Jan Feb Mar Apr May Jun
## 1973 55 60 68 63 65 61
autoplot(hsales)
ggseasonplot(hsales)
ggsubseriesplot(hsales)
gglagplot(hsales)
ggAcf(hsales, lag.max = 400)
The plots above show both seasonality and cyclicity, but no trend. Sales tend to peak in spring, around March. The cyclic component shows up as a series of troughs and crests with a period of roughly 10 years.
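The peak month read off the seasonal and subseries plots can also be computed directly by averaging each calendar month, with `cycle()` supplying the month number. The sketch below uses a made-up monthly series whose seasonal shape is chosen to peak in March; it is not the real hsales data.

```r
# Made-up monthly series with a known seasonal shape (peak in month 3)
season <- c(0, 5, 12, 10, 8, 6, 4, 2, 0, -3, -6, -8)
x <- ts(50 + rep(season, 5), start = c(1973, 1), frequency = 12)

# Average value for each calendar month
month_means <- tapply(as.numeric(x), as.numeric(cycle(x)), mean)
names(month_means) <- month.abb
month_means
names(which.max(month_means))   # "Mar"
```

On a series with a trend, these raw monthly averages mix trend and season, so detrending first (for example with a decomposition) gives a cleaner seasonal profile.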
Let’s explore usdeaths
head(usdeaths)
## Jan Feb Mar Apr May Jun
## 1973 9007 8106 8928 9137 10017 10826
autoplot(usdeaths)
ggseasonplot(usdeaths)
ggsubseriesplot(usdeaths)
gglagplot(usdeaths)
ggAcf(usdeaths, lag.max = 60)
The plots show clear seasonality in usdeaths (monthly accidental deaths in the USA): July has the highest number of deaths.
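A classical additive decomposition gives the same kind of answer while first removing the trend with a centred moving average. The sketch below applies base R's `decompose()` to a made-up monthly series (a linear trend plus a zero-sum seasonal pattern peaking in July, standing in for usdeaths) and reads the peak from the seasonal figure.

```r
# Made-up seasonal pattern that sums to zero, with its peak in month 7
pattern <- c(-800, -1500, -700, -400, 400, 900, 1700, 1000, 0, 200, -300, -500)
idx <- 1:72
x <- ts(9000 + 5 * idx + rep(pattern, 6), start = c(1973, 1), frequency = 12)

dec <- decompose(x, type = "additive")
# dec$figure holds one seasonal effect per position in the cycle, counted from the
# first observation; the series starts in January, so index = calendar month
round(dec$figure)
month.abb[which.max(dec$figure)]   # "Jul"
```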
Let’s explore bricksq
head(bricksq)
## Qtr1 Qtr2 Qtr3 Qtr4
## 1956 189 204 208 197
## 1957 187 214
autoplot(bricksq)
ggseasonplot(bricksq)
ggsubseriesplot(bricksq)
gglagplot(bricksq)
ggAcf(bricksq, lag.max = 200)
The plots show a strong trend in the bricksq series with only weak seasonality.
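"Strong trend, weak seasonality" can be put on a numeric scale. The sketch below uses base R's `stl()` and the trend- and seasonal-strength measures described in the STL-features section of Forecasting: Principles and Practice (3rd ed.), each between 0 and 1, where values near 1 mean the component dominates the remainder. The quarterly series is made up to mimic the described pattern; it is not bricksq.

```r
set.seed(42)
idx <- 1:80
# Made-up quarterly series: strong linear trend, small seasonal swing, noise
x <- ts(200 + 2 * idx + 3 * sin(2 * pi * idx / 4) + rnorm(80, sd = 5),
        start = c(1956, 1), frequency = 4)

parts     <- stl(x, s.window = "periodic")$time.series
trend     <- as.numeric(parts[, "trend"])
seasonal  <- as.numeric(parts[, "seasonal"])
remainder <- as.numeric(parts[, "remainder"])

# Strength = 1 - Var(remainder) / Var(component + remainder), floored at 0
strength_trend    <- max(0, 1 - var(remainder) / var(trend + remainder))
strength_seasonal <- max(0, 1 - var(remainder) / var(seasonal + remainder))
round(c(trend = strength_trend, seasonal = strength_seasonal), 2)
```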
Let’s explore sunspotarea
head(sunspotarea)
## Time Series:
## Start = 1875
## End = 1880
## Frequency = 1
## [1] 213.13333 109.28333 92.85833 22.21667 36.33333 446.75000
autoplot(sunspotarea)
# ggseasonplot(sunspotarea)    # not run: seasonal plots need a frequency above 1, and this series is annual
# ggsubseriesplot(sunspotarea) # not run: an annual series has no seasons to split into subseries
gglagplot(sunspotarea)
ggAcf(sunspotarea, lag.max = 50)
The plots show no clear trend, and as an annual series it cannot show seasonality. There is, however, strong cyclicity, consistent with the roughly 11-year solar cycle.
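The length of a cycle can be estimated from the ACF: once the autocorrelation has dipped below zero, its next highest peak sits near one full cycle. The sketch below implements that rule in base R as a hypothetical helper, `estimate_cycle_length()`, and checks it on a made-up series with an 11-year cycle; `lag.max = 20` is an assumption that must exceed the expected cycle length.

```r
# Hypothetical helper: lag of the highest ACF value after the ACF first turns negative
estimate_cycle_length <- function(x, lag.max = 20) {
  r <- acf(x, lag.max = lag.max, plot = FALSE)$acf[-1]   # r[k] is the lag-k autocorrelation
  first_neg <- which(r < 0)[1]
  if (is.na(first_neg)) return(NA_integer_)               # ACF never goes negative: no cycle found
  first_neg - 1L + which.max(r[first_neg:length(r)])
}

# Made-up annual series with a cycle of exactly 11 years
cyc <- ts(sin(2 * pi * (1:150) / 11), start = 1875)
estimate_cycle_length(cyc)   # 11
```

Real sunspot cycles vary in length, so on actual data this gives only a rough average period.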
Let’s explore gasoline
head(gasoline)
## Time Series:
## Start = 1991.1
## End = 1991.19582477755
## Frequency = 52.1785714285714
## [1] 6.621 6.433 6.582 7.224 6.875 6.947
autoplot(gasoline)
ggseasonplot(gasoline)
# ggsubseriesplot(gasoline)  # not run: fails because the weekly frequency (52.18) is not a whole number
gglagplot(gasoline)
ggAcf(gasoline, lag.max = 1000)
The plots show clear seasonality and trend in the gasoline series. The lag plots show strong positive correlation between the series and its lagged values, which mainly reflects the trend, while the seasonal plot shows the pattern repeating within each year. The series trends upward over most of the period but flattens somewhat around and after 2005.
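The printed frequency of gasoline, 52.1785714285714, equals 365.25 / 7, the average number of weeks per year. The subseries plot was skipped above, and a likely reason is visible with base R's `cycle()`: with a non-integer frequency, the position-within-year values stop being whole numbers after the first year, so observations no longer share seasons. The sketch below uses made-up values and compares against an integer frequency of 52.

```r
weekly_freq <- 365.25 / 7   # about 52.18 weeks per year
weekly_freq

# Made-up weekly series with the non-integer frequency
w  <- ts(1:120, start = 1991, frequency = weekly_freq)
ph <- cycle(w)
ph[50:56]                  # phases turn fractional after the first year
all(ph == round(ph))       # FALSE

# The same data with an integer frequency of 52
w52 <- ts(1:120, start = 1991, frequency = 52)
all(cycle(w52) == round(cycle(w52)))   # TRUE
```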