CUNY DATA624 Homework 1

Question 2.1

A. Use the help function to explore what the series gold, woolyrnq and gas represent.
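The help calls themselves are not shown in this write-up. A minimal sketch, assuming the three series come from the fpp2 package that accompanies the textbook:

```r
library(fpp2)  # assumption: loads the forecast package, which ships these series

# Open the documentation page for each series
help(gold)
help(woolyrnq)
help(gas)
```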

B. What is the frequency of each series? Hint: apply the frequency() function.

## [1] 1

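The frequency of the gold series is 1. A frequency of 1 usually indicates annual data, but the help page describes gold as daily morning gold prices, so here it only means that no seasonal period was set.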

## [1] 4

The frequency of the woolyrnq series is 4, so it is quarterly.

## [1] 12

The frequency of the gas series is 12, so it is monthly.
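The three values above can be reproduced with a sketch like the following, again assuming the series are loaded through fpp2:

```r
library(fpp2)  # assumption: source of gold, woolyrnq and gas

# Number of observations per seasonal period for each series
frequency(gold)
frequency(woolyrnq)
frequency(gas)
```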

C. Use which.max() to spot the outlier in the gold series. Which observation was it?

## [1] 770

The outlier in the gold series occurs on day 770.
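A short sketch of how the position above can be found and the corresponding value inspected. The variable name outlier_idx is only illustrative:

```r
library(fpp2)  # assumption: source of the gold series

# Index of the largest observation (missing values are ignored by which.max)
outlier_idx <- which.max(gold)
outlier_idx

# Price recorded at that position
gold[outlier_idx]
```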

Question 2.2

Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.

A. You can read the data into R with the following script:
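The script was not carried over into this write-up. A minimal sketch along the lines of the textbook's version, assuming tute1.csv has been saved in the working directory:

```r
# Read the CSV, treating the first row as column names
tute1 <- read.csv("tute1.csv", header = TRUE)

# Quick look at the first few rows
head(tute1)
```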

B. Convert the data to time series
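The conversion code is also missing here. A sketch, assuming the first column holds the dates and the data are quarterly starting in 1981 as described in the question:

```r
# Re-read the file if tute1 is not already in the session
if (!exists("tute1")) tute1 <- read.csv("tute1.csv", header = TRUE)

# Drop the date column and declare a quarterly series starting in 1981
mytimeseries <- ts(tute1[, -1], start = 1981, frequency = 4)
```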

C. Construct time series plots of each of the three series

Check what happens when you don’t include facets=TRUE.

Without facets=TRUE the plot is no longer a facet grid: all three series are drawn on the same axes. This makes it easier to compare the series against each other directly.
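The two plots compared above can be sketched as follows, assuming fpp2 is installed and tute1.csv is in the working directory:

```r
library(fpp2)  # assumption: provides autoplot() for ts objects

# Rebuild the series if it is not already in the session
if (!exists("mytimeseries")) {
  tute1 <- read.csv("tute1.csv", header = TRUE)
  mytimeseries <- ts(tute1[, -1], start = 1981, frequency = 4)
}

# One panel per series
autoplot(mytimeseries, facets = TRUE)

# All three series on a single set of axes
autoplot(mytimeseries)
```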

Question 2.3

Download some monthly Australian retail data from the book website. These represent retail sales in various categories for different Australian states, and are stored in a MS-Excel file.

A. You can read the data into R with the following script:

The second argument (skip=1) is required because the Excel sheet has two header rows.
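The script referred to above is not reproduced in this write-up. A sketch along the lines of the textbook's version, assuming the readxl package is installed and retail.xlsx is in the working directory:

```r
# skip = 1 drops the first of the two header rows
retaildata <- readxl::read_excel("retail.xlsx", skip = 1)
```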

B. Select one of the time series as follows (but replace the column name with your own chosen column):
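The selection code is missing, and the column actually chosen for this homework is not stated. The sketch below uses the textbook's placeholder column "A3349873A", which is an assumption and should be replaced with the chosen column:

```r
# Re-read the workbook if retaildata is not already in the session
if (!exists("retaildata")) {
  retaildata <- readxl::read_excel("retail.xlsx", skip = 1)
}

# "A3349873A" is the textbook's example column, not necessarily the one used here
myts <- ts(retaildata[, "A3349873A"], frequency = 12, start = c(1982, 4))
```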

C. Explore your chosen retail time series using the following functions: autoplot(), ggseasonplot(), ggsubseriesplot(), gglagplot(), ggAcf()

Can you spot any seasonality, cyclicity and trend? What do you learn about the series?

From a first look at the time series plot above, there is certainly a seasonal pattern, but no cyclic behavior. There is also an obvious upward trend in the data.

The graphs above show clear seasonality, with a peak around the holidays, which makes sense for food retail.

The lag plot above tells us that the data is not random and suggests that there is a clear pattern, which was also obvious in the previous plots.

The ACF plot above indicates that the series is not white noise, so there is plenty to go on to build a forecasting model.
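The five plots discussed above can be produced with a sketch like the following. It assumes fpp2 and readxl are installed, and it falls back to the textbook's placeholder column if myts has not been created yet:

```r
library(fpp2)  # assumption: provides the five plotting functions

# Rebuild myts from the workbook if it is not already in the session
if (!exists("myts")) {
  retaildata <- readxl::read_excel("retail.xlsx", skip = 1)
  # Placeholder column from the textbook; swap in the chosen column
  myts <- ts(retaildata[, "A3349873A"], frequency = 12, start = c(1982, 4))
}

autoplot(myts)         # overall trend and seasonal swings
ggseasonplot(myts)     # one line per year, plotted against month
ggsubseriesplot(myts)  # one mini series per month, with its mean
gglagplot(myts)        # series against its own lags
ggAcf(myts)            # autocorrelation at each lag
```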

Question 2.6

Use the following graphics functions: autoplot(), ggseasonplot(), ggsubseriesplot(), gglagplot(), ggAcf() and explore features from the following time series: hsales, usdeaths, bricksq, sunspotarea, gasoline.

- Can you spot any seasonality, cyclicity and trend?

- What do you learn about the series?
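Since the same five functions are applied to every series, and some of them fail on non-seasonal or non-integer-frequency data, a small wrapper that catches errors is convenient. The helper name explore_series is hypothetical and not taken from the original code:

```r
library(fpp2)  # assumption: provides the series and the plotting functions

# Hypothetical helper: apply each plot function, returning the error
# condition instead of stopping when a plot does not apply to the series.
explore_series <- function(x) {
  plot_fns <- list(
    autoplot = autoplot,
    ggseasonplot = ggseasonplot,
    ggsubseriesplot = ggsubseriesplot,
    gglagplot = gglagplot,
    ggAcf = ggAcf
  )
  lapply(plot_fns, function(f) tryCatch(f(x), error = function(e) e))
}

results <- explore_series(sunspotarea)

# Names of the plots that could not be drawn for this series
names(results)[sapply(results, inherits, what = "error")]
```

Printing an element of results either draws the plot or shows the captured error message.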

hsales Sales of one-family houses

The plots above show both seasonal and cyclic behavior, with no identifiable trend in either direction. The seasonality is visible in autoplot() and confirmed by ggseasonplot() and ggsubseriesplot(): sales peak at the start of spring and taper off through the late fall and winter. The cyclic behavior is clearest in autoplot(), with a couple of down cycles in the early 80s and early 90s and up cycles in the late 70s, late 80s and late 90s. The ggAcf() plot shows that the series is far from white noise, which suggests that a decent predictive model could be developed from it.
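Reading "not white noise" off the ACF plot can be backed by a formal check. The Ljung-Box test below is an addition to the exercise, and the choice of 24 lags (two years of monthly data) is an assumption made for illustration:

```r
library(fpp2)  # assumption: source of the hsales series

# Ljung-Box test: null hypothesis is that the first 24 autocorrelations are zero
lb <- Box.test(hsales, lag = 24, type = "Ljung-Box")
print(lb)

# TRUE when the white-noise hypothesis is rejected at the 5% level
lb$p.value < 0.05
```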

usdeaths Accidental deaths in USA

Much like the previous dataset, the usdeaths series is strongly seasonal but shows no clear trend, and it does not appear to be cyclic either. There does appear to be a higher number of accidental deaths in 1973; the following years are fairly stable with little fluctuation. Most accidental deaths occur in the warmer months, when people are out and about and are probably getting hurt in outdoor activities, whereas in the winter months more people remain in the safety of their homes, reducing their risk. The autocorrelation is very strong at lags 1 and 12, and the ACF plot shows that the series is far from white noise.

bricksq Quarterly clay brick production

This data has a very obvious upward trend until it plateaus around the mid 70s, after which it becomes very cyclic, with steep drop-offs in the mid 70s and mid 80s. There is a little seasonality, with brick production peaking in the 3rd quarter, which probably correlates with housing sales, but that inference is for another time and another question. The lag plots exhibit very strong correlation and the ACF plot indicates that the series is not white noise.

sunspotarea Annual average sunspot area (1875-2015)

## <simpleError in ggseasonplot(sunspotarea): Data are not seasonal>
## <simpleError in ggsubseriesplot(sunspotarea): Data are not seasonal>

The data is clearly cyclic with no obvious trend. This cycle is the solar cycle, in which the sun flips its magnetic poles about every 11 years (https://spaceplace.nasa.gov/solar-cycles/en/). The plots above are consistent with that period. The series is annual, so it has no seasonality, which is why ggseasonplot() and ggsubseriesplot() return errors. The upward and downward swings also span about 11 years, far longer than any pattern within a year. The series is far from white noise, which suggests that it is very predictable.
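One rough way to check the cycle length numerically is to look for the first peak in the sample ACF after lag 0. This is a sketch, not part of the original analysis; the variable names and the 30-lag window are assumptions:

```r
library(fpp2)  # assumption: source of the sunspotarea series

# Sample autocorrelations at lags 0..30 (lags are in years for annual data)
a <- stats::acf(sunspotarea, lag.max = 30, plot = FALSE)
lags <- as.vector(a$lag)
vals <- as.vector(a$acf)

# Positions where the ACF switches from rising to falling (local maxima)
peaks <- which(diff(sign(diff(vals))) == -2) + 1

# Lag of the first local maximum, a rough estimate of the cycle length in years
first_peak_lag <- lags[peaks[1]]
print(first_peak_lag)
```

If the series follows the solar cycle, the printed lag should be in the neighbourhood of the period quoted above.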

gasoline US finished motor gasoline product supplied

## <simpleError in ggsubseriesplot(gasoline): Each season requires at least 2 observations. This may be caused from specifying a time-series with non-integer frequency.>

From the plots, there is an obvious upward trend and clear seasonality. ggsubseriesplot() could not be drawn; as the error message suggests, this is likely because the series has a non-integer frequency. There does appear to be minor cyclic behavior from 2005 to 2015. It would be interesting to see this data updated to 2020. Supply peaks during the summer months and drops off in the late fall and winter months. The series is not white noise and seems very predictable. I am curious whether the drop-off in the late 2000s had to do with rising fuel efficiency standards, and what drove the rebound around 2015.

Chester Poon

1/31/2020