CUNY 624 Homework 1

Joel Park

library(fpp2)
## Loading required package: ggplot2
## Loading required package: forecast
## Loading required package: fma
## Loading required package: expsmooth

2.1 Use the help function to explore what the series gold, woolyrnq and gas represent.

?gold
?woolyrnq
?gas
  1. Use autoplot() to plot each of these in separate plots.
autoplot(gold) + ggtitle("Daily Morning Gold Prices ($) Jan 1 1985 - Mar 31 1989") +
  ylab("$") + xlab("Days")

autoplot(woolyrnq) + ggtitle("Quarterly Production of Woolen Yarn in Australia") +
  ylab("Tonnes") + xlab("")

autoplot(gas) + ggtitle("Australian monthly gas production: 1956 - 1995") +
  ylab("") + xlab("Months")

  2. What is the frequency of each series? Hint: apply the frequency() function.
# gold
paste0("Frequency 'Gold': ", frequency(gold))
## [1] "Frequency 'Gold': 1"
# woolyrnq
paste0("Frequency 'Woolyrnq': ", frequency(woolyrnq))
## [1] "Frequency 'Woolyrnq': 4"
# gas
paste0("Frequency 'Gas': ", frequency(gas))
## [1] "Frequency 'Gas': 12"
  3. Use which.max() to spot the outlier in the gold series. Which observation was it?
which.max(gold)
## [1] 770

It is observation number 770.
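As a small illustration of the two helper functions used above, here is a sketch in base R. The series `x` is a made-up quarterly toy series, not one of the fpp2 data sets; the same calls should apply to `gold`, `woolyrnq` and `gas` once fpp2 is loaded.

```r
# Base R only; a toy quarterly series stands in for the fpp2 data.
x <- ts(c(5, 7, 6, 30, 8, 9, 7, 6), start = c(2000, 1), frequency = 4)

frequency(x)          # observations per year (4 = quarterly)

i <- which.max(x)     # index of the largest observation
x[i]                  # the value at that index
time(x)[i]            # its position on the time axis (2000.75 = 2000 Q4)
```

For a series with frequency 1, such as `gold`, the time axis is simply the observation index, so `which.max()` alone already identifies the outlier.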

2.2 Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget, and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.

  1. You can read the data into R with the following script:
tute1 <- read.csv("https://otexts.com/fpp2/extrafiles/tute1.csv", header=TRUE)
View(tute1)
  2. Convert the data to time series.
mytimeseries <- ts(tute1[,-1], start=1981, frequency=4)

(The [,-1] removes the first column which contains the quarters as we don’t need them now.)

  3. Construct time series plots of each of the three series.
autoplot(mytimeseries, facets=TRUE)

Check what happens when you don’t include facets=TRUE.

autoplot(mytimeseries, facets=FALSE)

Without facets=TRUE, the three series are drawn together in a single panel on a shared y-axis instead of in separate panels.
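The conversion to a time series used in this exercise can be sketched on a toy data frame. The data frame `toy` and its values are hypothetical and only mimic the layout of tute1.csv (a label column followed by numeric columns); the sketch shows what `start` and `frequency` do and how to check the result.

```r
# Hypothetical data frame mimicking the layout of tute1.csv:
# a label column followed by numeric series (values are made up).
toy <- data.frame(
  Quarter  = c("Mar-81", "Jun-81", "Sep-81", "Dec-81", "Mar-82", "Jun-82"),
  Sales    = c(100, 110, 105, 120, 108, 115),
  AdBudget = c(50, 55, 52, 60, 53, 57)
)

# Drop the label column and declare quarterly data starting in 1981 Q1.
toy_ts <- ts(toy[, -1], start = 1981, frequency = 4)

start(toy_ts)   # first period as (year, quarter)
end(toy_ts)     # last period as (year, quarter)

# Extract a sub-period of one column: 1981 Q3 through 1982 Q1.
sales_sub <- window(toy_ts[, "Sales"], start = c(1981, 3), end = c(1982, 1))
sales_sub
```

Checking `start()` and `end()` after the conversion is a cheap way to confirm that the chosen frequency lines up with the number of rows in the file.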

2.3 Download some monthly Australian retail data from the book website. These represent retail sales in various categories for different Australian states, and are stored in a MS-Excel file.

  1. You can read the data into R with the following script:
retaildata <- readxl::read_excel("retail.xlsx", skip=1)

The second argument (skip=1) is required because the Excel sheet has two header rows.

  2. Select one of the time series as follows (but replace the column name with your own chosen column):
# Category: Turnover ;  Western Australia ;  Furniture, floor coverings, houseware and textile goods retailing
myts <- ts(retaildata[,"A3349661X"],
           frequency=12, start=c(1982,4))
  3. Explore your chosen retail time series using the following functions:

autoplot(), ggseasonplot(), ggsubseriesplot(), gglagplot(), ggAcf()

autoplot(myts) + ggtitle("Turnover ;  Western Australia ;  Furniture, floor coverings, houseware and textile goods retailing")

ggseasonplot(myts)

ggsubseriesplot(myts)

gglagplot(myts)

ggAcf(myts)

Can you spot any seasonality, cyclicity and trend? What do you learn about the series?

Reference: https://otexts.com/fpp2/autocorrelation.html

Reference: https://otexts.com/fpp2/tspatterns.html

“Trend: A trend exists when there is a long-term increase or decrease in the data. It does not have to be linear. Sometimes we will refer to a trend as “changing direction”, when it might go from an increasing trend to a decreasing trend.”

“Seasonal: A seasonal pattern occurs when a time series is affected by seasonal factors such as the time of the year or the day of the week. Seasonality is always of a fixed and known frequency. The monthly sales of antidiabetic drugs above shows seasonality which is induced partly by the change in the cost of the drugs at the end of the calendar year.”

“Cyclic: A cycle occurs when the data exhibit rises and falls that are not of a fixed frequency. These fluctuations are usually due to economic conditions, and are often related to the “business cycle”. The duration of these fluctuations is usually at least 2 years.”

“When data have a trend, the autocorrelations for small lags tend to be large and positive because observations nearby in time are also nearby in size. So the ACF of trended time series tend to have positive values that slowly decrease as the lags increase.

When data are seasonal, the autocorrelations will be larger for the seasonal lags (at multiples of the seasonal frequency) than for other lags.

When data are both trended and seasonal, you see a combination of these effects.”

Based on the textbook references above, this series appears to have a positive trend: the ACF shows positive autocorrelations that decrease slowly as the lag increases, which is the signature of trended data. The ACF does not show pronounced spikes at the seasonal lags, so seasonality is not evident there. There is no obvious cyclic pattern either, though the data may cover only part of a longer cycle, which would make it hard to see.

Overall, sales have increased steadily over the years covered by the data.
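The ACF rules quoted from the textbook can be checked numerically rather than read off a plot. The sketch below uses base R's built-in `AirPassengers` series as a stand-in because it is monthly, trended and seasonal; it is not the retail series analysed above, and the helper `acf_at()` is a name introduced here for illustration.

```r
# AirPassengers (built into R) is a stand-in: monthly, trended, seasonal.
# It is NOT the retail series analysed above.
r <- acf(AirPassengers, lag.max = 24, plot = FALSE)

# r$acf[1] is lag 0, so lag k lives at index k + 1.
acf_at <- function(k) r$acf[k + 1]

acf_at(1)    # large and positive: nearby observations are similar (trend)
acf_at(6)    # an off-season lag
acf_at(12)   # the seasonal lag: higher than its off-season neighbours
acf_at(13)   # drops again just past the seasonal lag
```

Applying the same comparison to `myts` (lag 12 against lags 6 and 13) would give a more objective check on whether the retail series has a seasonal component.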

2.6 Use the following graphics functions: autoplot(), ggseasonplot(), ggsubseriesplot(), gglagplot(), ggAcf() and explore features from the following time series: hsales, usdeaths, bricksq, sunspotarea, gasoline.

From hsales:

Monthly sales of new one-family houses sold in the USA since 1973.

head(hsales)
##      Jan Feb Mar Apr May Jun
## 1973  55  60  68  63  65  61
autoplot(hsales)

ggseasonplot(hsales)

ggsubseriesplot(hsales)

gglagplot(hsales)

ggAcf(hsales)

The plots suggest that the most homes are sold in spring, which points to a seasonal component. The ACF supports this, with positive peaks at lags 12 and 24 (the value at lag 0 is always 1). There may also be a cyclic component, since peaks and troughs appear to recur roughly every 10 years. However, there is no obvious upward or downward trend.

I suspect that people tend to start looking for homes at the beginning of the year and close on a purchase by spring. Why the beginning of the year? Perhaps for tax reasons, or because homes are difficult to sell in winter weather. The reasons behind this seasonal pattern would be worth investigating.

From usdeaths: Monthly accidental deaths in USA.

head(usdeaths)
##        Jan   Feb   Mar   Apr   May   Jun
## 1973  9007  8106  8928  9137 10017 10826
autoplot(usdeaths)

ggseasonplot(usdeaths)

ggsubseriesplot(usdeaths)

gglagplot(usdeaths)

ggAcf(usdeaths)

There is clearly a seasonal component (but no obvious cyclic component). Accidental deaths peak in the summer, which is consistent with the lag plot and the ACF. The lag plot shows the strongest positive correlation at lag 12, that is, between observations 12 months apart (the same month in consecutive years). This is probably not surprising: more people are out during the summer and may be engaging in more activities (driving, sports, etc.) that can lead to accidental deaths.
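The summer peak and the lag-12 correlation can also be checked numerically. The sketch below uses `USAccDeaths` from base R's datasets package, whose first values match the `head(usdeaths)` output above; treating the two as the same series is an assumption made here so the example runs without fpp2.

```r
# USAccDeaths ships with base R; assumed here to match fpp2's usdeaths.
# Average deaths by calendar month (1 = Jan, ..., 12 = Dec).
monthly_mean <- tapply(as.numeric(USAccDeaths),
                       as.integer(cycle(USAccDeaths)),
                       mean)
peak_month <- month.abb[which.max(monthly_mean)]
peak_month   # calendar month with the highest average

r <- acf(USAccDeaths, lag.max = 12, plot = FALSE)
r$acf[7]     # lag 6: the opposite point in the year
r$acf[13]    # lag 12: the same month one year apart
```

Comparing lag 6 with lag 12 makes the seasonal structure explicit: months half a year apart move in opposite directions, while the same month in consecutive years moves together.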

From bricksq: Australian quarterly clay brick production: 1956-1994

head(bricksq)
##      Qtr1 Qtr2 Qtr3 Qtr4
## 1956  189  204  208  197
## 1957  187  214
autoplot(bricksq)

ggseasonplot(bricksq)

ggsubseriesplot(bricksq)

gglagplot(bricksq)

ggAcf(bricksq)

There is a clear positive trend that flattens out around 1980. No obvious cyclic or seasonal component stands out in the lag plot or the ACF, and the subseries plot does not show a large difference between quarters.

Demand for clay bricks was likely rising through the 1960s and 1970s. After that, perhaps because of new synthetic building materials or changing architectural styles, production stopped growing.

From sunspotarea: Annual averages of the daily sunspot areas (in units of millionths of a hemisphere) for the full sun. Sunspots are magnetic regions that appear as dark spots on the surface of the sun. The Royal Greenwich Observatory compiled daily sunspot observations from May 1874 to 1976. Later data are from the US Air Force and the US National Oceanic and Atmospheric Administration. The data have been calibrated to be consistent across the whole history of observations.

head(sunspotarea)
## Time Series:
## Start = 1875 
## End = 1880 
## Frequency = 1 
## [1] 213.13333 109.28333  92.85833  22.21667  36.33333 446.75000
autoplot(sunspotarea)

#ggseasonplot(sunspotarea) # Seasonal plots do not work with frequencies of 1
#ggsubseriesplot(sunspotarea)
gglagplot(sunspotarea)

ggAcf(sunspotarea)

The data are annual (frequency 1). There does not appear to be a trend, but there is a clear cyclic component that recurs roughly every 10 to 11 years. Because the data are annual, they cannot show seasonality, which requires more than one observation per year.

This matches the well-known solar cycle of roughly 11 years, over which the sun's magnetic activity, and with it the number of sunspots, rises and falls.
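The length of the cycle can be estimated from the ACF instead of by eye. The sketch below uses base R's `sunspot.year` (yearly sunspot numbers), which is a related but different series from fpp2's `sunspotarea`; the choice to search lags 5 to 20 is an assumption made here to skip the short-run persistence at small lags.

```r
# sunspot.year (built into R) holds yearly sunspot numbers.
# It is related to, but NOT the same as, fpp2's sunspotarea.
r <- acf(sunspot.year, lag.max = 20, plot = FALSE)

# Skip lag 0 and the first few lags (dominated by short-run persistence),
# then find the lag where the autocorrelation peaks again.
lags <- 5:20
cycle_len <- lags[which.max(r$acf[lags + 1])]
cycle_len   # estimated cycle length in years
```

The same approach should work on `sunspotarea` once fpp2 is loaded, since both series are annual.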

From gasoline: Weekly data beginning 2 February 1991, ending 20 January 2017. Units are “million barrels per day”.

head(gasoline)
## Time Series:
## Start = 1991.1 
## End = 1991.19582477755 
## Frequency = 52.1785714285714 
## [1] 6.621 6.433 6.582 7.224 6.875 6.947
autoplot(gasoline)

ggseasonplot(gasoline)

#ggsubseriesplot(gasoline)
gglagplot(gasoline)

ggAcf(gasoline)

Gasoline sales show a positive trend, with some flattening (or perhaps cycling) from 2005 onwards. No cycle is obvious, but a cycle spanning multiple decades is possible, and the series may be too short to reveal it. The lag plot shows positive correlation at all lags, and the ACF shows some “scalloping”, which suggests a seasonal component as well.
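The non-integer frequency printed by `head(gasoline)` is the average number of weeks in a year, 365.25 / 7. The sketch below reproduces that number and builds a toy weekly series with it; the values in `toy_weekly` are random and purely illustrative.

```r
# Average number of weeks per year, allowing for leap years.
weeks_per_year <- 365.25 / 7
weeks_per_year

# A toy weekly series declared with that frequency (values are made up).
set.seed(1)
toy_weekly <- ts(rnorm(120), start = 1991.1, frequency = weeks_per_year)
frequency(toy_weekly)
```

Because the seasonal period is not a whole number of observations, weeks do not line up exactly from one year to the next, which is worth keeping in mind when reading the seasonal plot for this series.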