This document contains the homework problems for the first half of the semester. Book 1: https://otexts.com/fpp2/

Week 1 HW Problems

HA 2.1 & 2.3

2.1

Use the help function to explore what the series gold, woolyrnq and gas represent.

  • Use autoplot() to plot each of these in separate plots.

  • What is the frequency of each series? Hint: apply the frequency() function.

  • Use which.max() to spot the outlier in the gold series. Which observation was it?

library(fpp2)
library(ggplot2)

#help("gold")
#help("woolyrnq")
#help("gas")

Gold

The gold data represents daily morning gold prices in US dollars. The series runs from January 1st 1985 through March 31st 1989.

summary(gold)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   285.0   337.7   403.2   392.5   443.7   593.7      34
autoplot(gold) +
  ggtitle("Daily Morning Gold Prices") +
  xlab("Day") +
  ylab("Price")

What is the Frequency of Gold?

print(paste0("The Frequency of Gold is ",frequency(gold)))
## [1] "The Frequency of Gold is 1"

Where is the outlier?

print(paste0("Outlier: ", which.max(gold)))
## [1] "Outlier: 770"

Using autoplot(), we can see the overall daily trend of gold prices. They appear to be mostly increasing up until a bit before day 800.

The frequency is 1. For ts objects, frequency() reports the number of observations per seasonal period, so a value of 1 means the series is stored without a seasonal period: each daily observation is simply indexed 1, 2, 3, and so on.
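As a quick check on how frequency() behaves, here is a minimal sketch using only base R. The values and the names plain, quarterly, and monthly are made up for illustration and are not the fpp2 data.

```r
# Toy values (hypothetical, not the fpp2 series) to show what frequency() reports.
x <- 1:24

plain     <- ts(x)                                    # no seasonal period supplied
quarterly <- ts(x, frequency = 4,  start = c(1965, 1))
monthly   <- ts(x, frequency = 12, start = c(1956, 1))

freqs <- c(plain     = frequency(plain),
           quarterly = frequency(quarterly),
           monthly   = frequency(monthly))
print(freqs)
```

A series built without a frequency argument reports 1, which matches what we see for gold.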

The data has an outlier at observation 770, where the gold price surges to the series maximum of 593.7. It would be interesting to investigate what happened on that day: was there some external force that influenced the price of gold?
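which.max() returns a position, not a price, so it can help to look up the value and the time index at that position. The sketch below uses a small made-up series (toy_prices is hypothetical and includes missing values on purpose, since gold also contains NAs); it is an illustration, not the homework data.

```r
# Hypothetical toy series with NAs, standing in for a price series like gold.
toy_prices <- ts(c(300, 310, NA, 305, 590, 312, NA, 308))

idx        <- which.max(toy_prices)    # position of the maximum; NAs are skipped
peak_value <- toy_prices[idx]          # the value at that position
peak_time  <- time(toy_prices)[idx]    # the time index at that position

print(c(index = idx, value = peak_value, time = peak_time))
```

Applied to the real series, the same pattern would be gold[which.max(gold)] for the price and time(gold)[which.max(gold)] for the time index.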

Wool

The woolyrnq data represents the quarterly production of woollen yarn in Australia. This data set runs from March 1965 through September 1994.

summary(woolyrnq)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    3324    4882    5466    5658    6646    7819
autoplot(woolyrnq) +
  ggtitle("Quarterly Production of Woollen Yarn in Australia") +
  xlab("Year") +
  ylab("Wool")

print(paste0("The Frequency of Woolyrnq is ",frequency(woolyrnq)))
## [1] "The Frequency of Woolyrnq is 4"

The plot of the time series seems to indicate a mostly decreasing trend. The frequency of our time series is 4, which makes sense since the data is partitioned quarterly. Such a drastic decrease around 1970 is something worth investigating.

Gas

The gas data represents Australian monthly gas production. The data runs from 1956 through 1995.

summary(gas)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1646    2675   16788   21415   38629   66600
autoplot(gas) +
  ggtitle("Australian Monthly Gas Production") +
  xlab("Year") +
  ylab("Gas Produced")

print(paste0("The Frequency of Gas is ",frequency(gas)))
## [1] "The Frequency of Gas is 12"

The plot shows a mostly increasing trend, especially after 1970. One could speculate that there was a spike in the demand for gas. The frequency is 12, meaning the data is partitioned monthly.

2.3

Download some monthly Australian retail data from the book website. These represent retail sales in various categories for different Australian states, and are stored in a MS-Excel file.

  • You can read the data into R with the following script: retaildata <- readxl::read_excel("retail.xlsx", skip=1) The second argument (skip=1) is required because the Excel sheet has two header rows.

  • Select one of the time series as follows (but replace the column name with your own chosen column): myts <- ts(retaildata[,"A3349873A"], frequency=12, start=c(1982,4))

  • Explore your chosen retail time series using the following functions: autoplot(), ggseasonplot(), ggsubseriesplot(), gglagplot(), ggAcf(). Can you spot any seasonality, cyclicity and trend? What do you learn about the series?

Loading in the Data Using the Provided Script

The data will be kept here: https://github.com/vindication09/DATA-624/blob/master/retail.xlsx

library(readxl)

retaildata <- readxl::read_excel("C:/Users/traveler/Downloads/retail.xlsx", skip=1)

head(retaildata)
## # A tibble: 6 x 190
##   `Series ID`         A3349335T A3349627V A3349338X A3349398A A3349468W
##   <dttm>                  <dbl>     <dbl>     <dbl>     <dbl>     <dbl>
## 1 1982-04-01 00:00:00      303.      41.7      63.9      409.      65.8
## 2 1982-05-01 00:00:00      298.      43.1      64        405.      65.8
## 3 1982-06-01 00:00:00      298       40.3      62.7      401       62.3
## 4 1982-07-01 00:00:00      308.      40.9      65.6      414.      68.2
## 5 1982-08-01 00:00:00      299.      42.1      62.6      404.      66  
## 6 1982-09-01 00:00:00      305.      42        64.4      412.      62.3
## # ... with 184 more variables: A3349336V <dbl>, A3349337W <dbl>,
## #   A3349397X <dbl>, A3349399C <dbl>, A3349874C <dbl>, A3349871W <dbl>,
## #   A3349790V <dbl>, A3349556W <dbl>, A3349791W <dbl>, A3349401C <dbl>,
## #   A3349873A <dbl>, A3349872X <dbl>, A3349709X <dbl>, A3349792X <dbl>,
## #   A3349789K <dbl>, A3349555V <dbl>, A3349565X <dbl>, A3349414R <dbl>,
## #   A3349799R <dbl>, A3349642T <dbl>, A3349413L <dbl>, A3349564W <dbl>,
## #   A3349416V <dbl>, A3349643V <dbl>, A3349483V <dbl>, A3349722T <dbl>,
## #   A3349727C <dbl>, A3349641R <dbl>, A3349639C <dbl>, A3349415T <dbl>,
## #   A3349349F <dbl>, A3349563V <dbl>, A3349350R <dbl>, A3349640L <dbl>,
## #   A3349566A <dbl>, A3349417W <dbl>, A3349352V <dbl>, A3349882C <dbl>,
## #   A3349561R <dbl>, A3349883F <dbl>, A3349721R <dbl>, A3349478A <dbl>,
## #   A3349637X <dbl>, A3349479C <dbl>, A3349797K <dbl>, A3349477X <dbl>,
## #   A3349719C <dbl>, A3349884J <dbl>, A3349562T <dbl>, A3349348C <dbl>,
## #   A3349480L <dbl>, A3349476W <dbl>, A3349881A <dbl>, A3349410F <dbl>,
## #   A3349481R <dbl>, A3349718A <dbl>, A3349411J <dbl>, A3349638A <dbl>,
## #   A3349654A <dbl>, A3349499L <dbl>, A3349902A <dbl>, A3349432V <dbl>,
## #   A3349656F <dbl>, A3349361W <dbl>, A3349501L <dbl>, A3349503T <dbl>,
## #   A3349360V <dbl>, A3349903C <dbl>, A3349905J <dbl>, A3349658K <dbl>,
## #   A3349575C <dbl>, A3349428C <dbl>, A3349500K <dbl>, A3349577J <dbl>,
## #   A3349433W <dbl>, A3349576F <dbl>, A3349574A <dbl>, A3349816F <dbl>,
## #   A3349815C <dbl>, A3349744F <dbl>, A3349823C <dbl>, A3349508C <dbl>,
## #   A3349742A <dbl>, A3349661X <dbl>, A3349660W <dbl>, A3349909T <dbl>,
## #   A3349824F <dbl>, A3349507A <dbl>, A3349580W <dbl>, A3349825J <dbl>,
## #   A3349434X <dbl>, A3349822A <dbl>, A3349821X <dbl>, A3349581X <dbl>,
## #   A3349908R <dbl>, A3349743C <dbl>, A3349910A <dbl>, A3349435A <dbl>,
## #   A3349365F <dbl>, A3349746K <dbl>, ...

Select a Time Series of your Choosing by Replacing the Column Name

myts <- ts(retaildata[,"A3349350R"], frequency=12, start=c(1982,4))
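To confirm that frequency=12 together with start=c(1982,4) places the first observation in April 1982, a small base R sketch can be inspected with start(), end(), and cycle(). The vector fake_sales below is hypothetical and is not the retail data.

```r
# Hypothetical monthly values (not the retail data): 14 observations.
fake_sales <- c(10, 12, 11, 15, 14, 13, 16, 18, 25, 9, 11, 12, 13, 14)

toy_ts <- ts(fake_sales, frequency = 12, start = c(1982, 4))

first_obs   <- start(toy_ts)      # year and month of the first observation
last_obs    <- end(toy_ts)        # year and month of the last observation
first_month <- cycle(toy_ts)[1]   # position within the year of the first observation

print(first_obs)
print(last_obs)
print(first_month)
```

The same three calls can be run on myts to check that its dates line up with the Series ID column shown by head(retaildata).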

Explore the Selected Time Series with the Given Functions

autoplot(), ggseasonplot(), ggsubseriesplot(), gglagplot(), ggAcf()

autoplot(myts) + 
  ggtitle("A3349350R")+
  xlab("Time") +
  ylab("Sales");

ggseasonplot(myts);

ggsubseriesplot(myts);

gglagplot(myts);

ggAcf(myts)

There appears to be a mostly increasing trend, with the exception of a slight dip after 2010. The autoplot shows evidence of seasonal changes in the data, evident in the constant fluctuations within each period. We can use the seasonal plot to drill down further.

The seasonal plot shows a spike in consumer spending from November to December. The slope of each spike increases every year. This could be representative of an increasing consumer culture mindset.

The subseries plot confirms what we already suspected regarding seasonality, especially when it comes to the month of December. We know December to be a major retail month.
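The idea behind the subseries plot, comparing the level of each calendar month across years, can also be sketched numerically. The example below uses a made-up monthly series with a built-in December bump (toy is hypothetical, not the retail data) and averages by month with base R.

```r
# Hypothetical series: upward trend plus an extra 50 units every December.
months <- rep(1:12, times = 3)
values <- 100 + 2 * seq_along(months) + ifelse(months == 12, 50, 0)
toy    <- ts(values, frequency = 12, start = c(2000, 1))

# Average of each calendar month across the three years.
monthly_means <- tapply(toy, cycle(toy), mean)
peak_month    <- month.abb[which.max(monthly_means)]

print(round(monthly_means, 1))
print(peak_month)
```

Running tapply(myts, cycle(myts), mean) on the retail series would give the same kind of month-by-month summary that ggsubseriesplot() draws as horizontal mean lines.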

The lag plots show both negative and positive relationships between the series and its lagged values. We also see evidence of trend within the lag plots.
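The relationships seen in lag plots can be summarised as autocorrelations, which is what ggAcf() displays. As a rough sketch, the code below uses base R's acf() with plot = FALSE as a stand-in for ggAcf(), on a made-up series that has a spike every December and no trend (seasonal_toy is hypothetical, not the retail data).

```r
# Hypothetical purely seasonal series: a spike every 12th month over 4 years.
pattern      <- c(rep(0, 11), 1)
seasonal_toy <- ts(rep(pattern, times = 4), frequency = 12, start = c(2000, 1))

# Autocorrelations for lags 0 through 12 (element 1 is lag 0).
acf_vals <- as.numeric(acf(seasonal_toy, lag.max = 12, plot = FALSE)$acf)

lag6  <- acf_vals[7]    # half a year apart
lag12 <- acf_vals[13]   # one full year apart

print(round(c(lag6 = lag6, lag12 = lag12), 3))
```

For a series with yearly seasonality, the autocorrelation at lag 12 should stand out relative to neighbouring lags, which is the pattern to look for in ggAcf(myts).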