library(tidyr)
library(dplyr)
library(knitr)
library(utils)
library(ggplot2)
library(forecast)
library(readxl)
library(fpp2)
Use the help function to explore what the series gold, woolyrnq and gas represent.
Use autoplot() to plot each of these in separate plots.
What is the frequency of each series? Hint: apply the frequency() function.
Use which.max() to spot the outlier in the gold series. Which observation was it?
help(gold)
# Daily morning gold prices in US dollars. 1 January 1985 – 31 March 1989.
head(gold)
## Time Series:
## Start = 1
## End = 6
## Frequency = 1
## [1] 306.25 299.50 303.45 296.75 304.40 298.35
help(woolyrnq)
# Quarterly production of woollen yarn in Australia: tonnes. Mar 1965 – Sep 1994.
head(woolyrnq)
##      Qtr1 Qtr2 Qtr3 Qtr4
## 1965 6172 6709 6633 6660
## 1966 6786 6800
help(gas)
# Australian monthly gas production: 1956–1995.
head(gas)
##       Jan  Feb  Mar  Apr  May  Jun
## 1956 1709 1646 1794 1878 2173 2321
autoplot(gold) +
ggtitle("Daily morning gold prices") +
xlab("1 January 1985 to 31 March 1989") +
ylab("US Dollars")
autoplot(woolyrnq) +
ggtitle("Quarterly production of woollen yarn in Australia") +
xlab("Mar 1965 to Sep 1994") +
ylab("Tonnes")
autoplot(gas) +
ggtitle("Australian monthly gas production") +
xlab("1956 to 1995") +
ylab("Production")
# Frequencies:
frequency(gold)
## [1] 1
frequency(gas)
## [1] 12
frequency(woolyrnq)
## [1] 4
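The outlier question above is not answered in the code. A minimal sketch, assuming fpp2 is installed so that gold is available: which.max() returns the position of the largest value, skipping missing values, and indexing gold with that position gives the price on that day. The names idx and peak are illustrative only.

```r
library(fpp2)  # provides the gold series

# Position of the largest observation; which.max() skips NA values.
idx <- which.max(gold)
# The price recorded at that position.
peak <- gold[idx]

idx
peak
```

Because gold has no calendar index (Start = 1, Frequency = 1), idx is an observation number rather than a date.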
Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.
# Read from the course repository; use file1 = "tute1.csv" for a local copy.
file1 = "https://raw.githubusercontent.com/vsinha-cuny/data624/master/hw1/tute1.csv"
tute1 = read.csv(file = file1, header = TRUE)
View(tute1)
mytimeseries <- ts(tute1[,-1], start=1981, frequency=4)
autoplot(mytimeseries, facets=TRUE)
autoplot(mytimeseries, facets=FALSE)
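The ts() call above drops the first column (the quarter labels) and declares the remaining columns to be quarterly observations starting in the first quarter of 1981. A small sketch of the same mapping on a made-up data frame may help; the values and the names toy, Quarter and toy_ts are invented for illustration and are not taken from tute1.csv.

```r
# Made-up values for illustration only; not the contents of tute1.csv.
toy <- data.frame(
  Quarter  = c("Mar-81", "Jun-81", "Sep-81", "Dec-81", "Mar-82", "Jun-82"),
  Sales    = c(10, 12, 11, 13, 12, 14),
  AdBudget = c(5, 6, 5, 7, 6, 7)
)

# Drop the label column; start = 1981 with frequency = 4 means 1981 Q1.
toy_ts <- ts(toy[, -1], start = 1981, frequency = 4)

frequency(toy_ts)  # observations per year
start(toy_ts)      # first period as c(year, quarter)
end(toy_ts)        # last period as c(year, quarter)
```

The label column is dropped because ts() builds the time index from start and frequency alone, not from the text labels.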
Download some monthly Australian retail data from the book website. These represent retail sales in various categories for different Australian states, and are stored in a MS-Excel file.
Can you spot any seasonality, cyclicity and trend? What do you learn about the series?
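library(openxlsx)
# Read from the course repository; use file2 = "retail.xlsx" for a local copy.
file2 = "https://raw.githubusercontent.com/vsinha-cuny/data624/master/hw1/retail.xlsx"
# startRow=2 skips the first header row so the series IDs become column names.
retaildata <- read.xlsx(file2, sheet=1, startRow=2)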
myts <- ts(retaildata[,"A3349873A"], frequency=12, start=c(1982,4))
autoplot(myts) +
ggtitle("Australian retail data")
ggseasonplot(myts)
Seasonality: sales jump sharply starting in October and stay elevated through the end of December.
ggsubseriesplot(myts)
The subseries plot shows each month's values across the years, with a horizontal line marking that month's mean. Here the monthly means rise from October to December.
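The horizontal lines in a subseries plot are per-month means, and they can be computed directly. A minimal sketch using base R only: AirPassengers is a stand-in monthly series so the block runs without the download, and the names x and monthly_means are illustrative. Substituting myts for x would give the means for the retail series.

```r
# Stand-in monthly series shipped with base R; replace with myts once loaded.
x <- AirPassengers

# cycle() gives each observation's position in the year (1 = Jan, ..., 12 = Dec),
# so grouping by it yields one mean per calendar month.
monthly_means <- tapply(x, cycle(x), mean)
names(monthly_means) <- month.abb

round(monthly_means, 1)
```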
gglagplot(myts, lags=12)
The lag plot shows strong positive correlation at every lag, and the relationship is strongest at lag 12.
ggAcf(myts)
The ACF plot shows strong autocorrelation at all lags shown. The seasonality is reflected in the strongest autocorrelation occurring at lag 12.
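Reading the strongest lag off a plot can be imprecise, so it may help to tabulate the autocorrelations. A minimal sketch with base R's acf(), again using AirPassengers as a stand-in so the block is self-contained; the names x, r and acf_tab are illustrative. Substituting myts for x would check the retail series.

```r
# Stand-in monthly series shipped with base R; replace with myts once loaded.
x <- AirPassengers

# Compute the autocorrelations without drawing the plot.
r <- acf(x, lag.max = 24, plot = FALSE)

# For a ts object, acf() reports lags in years; convert them to whole periods.
acf_tab <- data.frame(
  lag = round(as.numeric(r$lag) * frequency(x)),
  acf = as.numeric(r$acf)
)
acf_tab <- acf_tab[acf_tab$lag > 0, ]  # drop lag 0, which is always 1

# The three lags with the largest autocorrelation.
head(acf_tab[order(-acf_tab$acf), ], 3)
```

In a trended series the short lags are often large as well, so comparing lag 12 against its neighbours is a useful check on the seasonal reading.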
Use the following graphics functions: autoplot(), ggseasonplot(), ggsubseriesplot(), gglagplot(), ggAcf() and explore features from the following time series: hsales, usdeaths, bricksq, sunspotarea, gasoline.
# Sales of one-family houses
# Description
# Monthly sales of new one-family houses sold in the USA since 1973.
# Usage
# hsales
# Format
# Time series data
# Source
# Makridakis, Wheelwright and Hyndman (1998) Forecasting: methods and applications, John Wiley & Sons: New York. Chapter 3.
# References
# US Census Bureau, Manufacturing and Construction Division
# Examples
# plot(hsales)
# plot(stl(hsales,"periodic"),main="Sales of new one-family houses, USA")
autoplot(hsales) +
ggtitle("Monthly sales of new one-family houses sold in the USA since 1973")
ggseasonplot(hsales)
ggsubseriesplot(hsales)
gglagplot(hsales, lags=12)
ggAcf(hsales)
# Accidental deaths in USA
# Description
# Monthly accidental deaths in USA.
# Usage
# usdeaths
# Format
# Time series data
# Source
# Makridakis, Wheelwright and Hyndman (1998) Forecasting: methods and applications, John Wiley & Sons: New York. Exercises 2.3 and 2.4.
autoplot(usdeaths) +
ggtitle("Monthly accidental deaths in USA")
ggseasonplot(usdeaths)
ggsubseriesplot(usdeaths)
gglagplot(usdeaths, lags=12)
ggAcf(usdeaths)
# Quarterly clay brick production
# Description
# Australian quarterly clay brick production: 1956–1994.
# Usage
# bricksq
# Format
# Time series data
# Source
# Makridakis, Wheelwright and Hyndman (1998) Forecasting: methods and applications, John Wiley & Sons: New York. Chapter 1 and Exercise 2.3.
# Examples
# plot(bricksq)
# seasonplot(bricksq)
# tsdisplay(bricksq)
autoplot(bricksq) +
ggtitle("Australian quarterly clay brick production: 1956–1994")
ggseasonplot(bricksq)
ggsubseriesplot(bricksq)
gglagplot(bricksq, lags=12)
ggAcf(bricksq)
# Annual average sunspot area (1875-2015)
# Description
# Annual averages of the daily sunspot areas (in units of millionths of a hemisphere) for the full sun. Sunspots are magnetic regions that appear as dark spots on the surface of the sun. The Royal Greenwich Observatory compiled daily sunspot observations from May 1874 to 1976. Later data are from the US Air Force and the US National Oceanic and Atmospheric Administration. The data have been calibrated to be consistent across the whole history of observations.
# Format
# Annual time series of class ts.
# Source
# NASA
# Examples
# autoplot(sunspotarea)
autoplot(sunspotarea) +
ggtitle("Annual average sunspot area (1875-2015)")
# sunspotarea is annual (frequency 1), so ggseasonplot() and ggsubseriesplot() do not apply.
# ggseasonplot(sunspotarea)
# ggsubseriesplot(sunspotarea)
# gglagplot(sunspotarea)
ggAcf(sunspotarea)
# US finished motor gasoline product supplied.
# Description
# Weekly data beginning 2 February 1991, ending 20 January 2017. Units are "million barrels per day".
# Format
# Time series object of class ts.
# Source
# US Energy Information Administration.
# Examples
# autoplot(gasoline, xlab="Year")
autoplot(gasoline) +
ggtitle("US finished motor gasoline product supplied")
ggseasonplot(gasoline)
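# Skipped: ggsubseriesplot() fails on this weekly series, likely because its frequency is not a whole number.
# ggsubseriesplot(gasoline)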
gglagplot(gasoline, lags=12)
ggAcf(gasoline)
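Some of the seasonal plots above are commented out for sunspotarea and gasoline. Checking the frequency of each series first is one way to see which plots are likely to apply. A minimal sketch, assuming fpp2 is installed; the names series and freqs are illustrative.

```r
library(fpp2)  # provides all five series

series <- list(hsales = hsales, usdeaths = usdeaths, bricksq = bricksq,
               sunspotarea = sunspotarea, gasoline = gasoline)

# Observations per seasonal cycle for each series.
freqs <- sapply(series, frequency)
freqs

# Annual series have no within-year cycle for a seasonal plot to show.
names(freqs)[freqs == 1]
# Series whose frequency is not a whole number.
names(freqs)[freqs %% 1 != 0]
```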