Import the packages that are needed. I always load tidyverse and ggplot2, even when I'm unsure whether I will need them, because they are generally very useful.

library(tidyverse)

## Warning: package 'ggplot2' was built under R version 4.3.2

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(ggplot2)
library(forecast)

## Warning: package 'forecast' was built under R version 4.3.2

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

library(fma)

## Warning: package 'fma' was built under R version 4.3.2

library(fpp2)

## Warning: package 'fpp2' was built under R version 4.3.2

## ── Attaching packages ────────────────────────────────────────────── fpp2 2.5 ──
## ✔ expsmooth 2.3

## Warning: package 'expsmooth' was built under R version 4.3.2

##

Question 1:

By using the help function, it can be determined that the woolyrnq dataset has the quarterly production of woollen yarn in Australia from 1965 to 1994. The gold dataset contains the daily morning gold prices in dollars from 1985 to 1989. The gas dataset contains the Australian monthly gas production from 1956 to 1995.

help(woolyrnq)

## starting httpd help server ... done

help(gold)
help(gas)

The gold dataset looks as though it contains cyclical and (maybe) seasonal patterns, as well as trends.

autoplot(gold)

The woolyrnq dataset looks like it contains cyclical patterns.

autoplot(woolyrnq)

The gas plot looks as though it contains a cyclical and seasonal pattern as well as an upward trend throughout time.

autoplot(gas)

Based on the frequency function, the gold dataset has a frequency of 1 (no seasonal period is set; the help page identifies the observations as daily), woolyrnq has 4 observations per year, and gas has 12 observations per year.

paste("Gold Frequency:", frequency(gold))

## [1] "Gold Frequency: 1"

paste("woolyrnq Frequency:", frequency(woolyrnq))

## [1] "woolyrnq Frequency: 4"

paste("Gas Frequency:",frequency(gas))

## [1] "Gas Frequency: 12"
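As a minimal sketch of what frequency() reports, the toy series below (made-up values, base R only) are built with different `frequency` arguments. The function returns the seasonal period stored in the ts object, so a value of 1 means that no seasonal period was set; on its own it does not show that the data are daily.

```r
# Toy series with made-up values; only the `frequency` argument matters here.
monthly   <- ts(1:24, start = c(2000, 1), frequency = 12)  # 12 obs per year
quarterly <- ts(1:8,  start = c(2000, 1), frequency = 4)   # 4 obs per year
plain     <- ts(1:10)                                      # default frequency = 1

freqs <- c(monthly   = frequency(monthly),
           quarterly = frequency(quarterly),
           plain     = frequency(plain))
print(freqs)
```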

We can see that the outlier in the gold dataset is on day 770, which is where the series reaches its maximum.

paste("The outlier is on day: ", which.max(gold))

## [1] "The outlier is on day:  770"
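A small sketch of how which.max behaves, using a hypothetical price vector rather than the gold data: it returns the position of the first maximum and skips missing values, and that position can then be used to look up the value itself.

```r
# Hypothetical price series with a missing value and one spike.
prices <- c(310, 312, NA, 315, 590, 316, 314)

spike_index <- which.max(prices)    # position of the first maximum; NAs are skipped
spike_value <- prices[spike_index]  # the value at that position

cat("Spike at position", spike_index, "with value", spike_value, "\n")
```

The same two lines applied to gold would give the outlier's position and its price.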

Question 2:

The book link for this dataset was broken, so I read it in from a GitHub source that I found. Viewing it shows a date column followed by Sales, AdBudget, and GDP columns.

tute1 <- read_csv("https://raw.githubusercontent.com/nealxun/forecasting_principle_and_practices/master/extrafiles/tute1.csv")

## New names:
## Rows: 100 Columns: 4
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ...1 dbl (3): Sales, AdBudget, GDP
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`

view(tute1)

The [,-1] removes the first column, as it's not needed. When plotted without facets, the three series (Sales, AdBudget, and GDP) are drawn on the same plot with one shared scale.

mytimeseries_no_facets <- ts(tute1[,-1], start = 1981, frequency = 4)
autoplot(mytimeseries_no_facets)

When facets = TRUE, each time series is plotted in its own panel with its own scale.

mytimeseries <- ts(tute1[,-1], start = 1981, frequency = 4)
autoplot(mytimeseries, facets = TRUE)
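To make the construction step concrete, here is a minimal sketch using a made-up data frame in place of tute1 (the values and the `quarter` label column are assumptions; only the column layout mirrors the real file). It drops the first column and declares the rest as quarterly series starting in 1981.

```r
# Made-up stand-in for tute1: a label column followed by three numeric series.
set.seed(1)
toy <- data.frame(
  quarter  = paste0("Q", 1:8),               # label column, not needed in the ts
  Sales    = round(runif(8, 900, 1100), 1),
  AdBudget = round(runif(8, 400, 600), 1),
  GDP      = round(runif(8, 250, 300), 1)
)

# Drop the first column, then declare quarterly data starting in 1981 Q1.
toy_ts <- ts(toy[, -1], start = 1981, frequency = 4)

print(colnames(toy_ts))  # one column per series
print(start(toy_ts))     # first observation: year 1981, quarter 1
print(end(toy_ts))       # last observation: year 1982, quarter 4
```

With the forecast package loaded, autoplot(toy_ts) and autoplot(toy_ts, facets = TRUE) would then show the shared-scale and per-panel views.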

Question 3:

Read data into dataframe

retaildata <- readxl::read_excel("retail.xlsx", skip = 1)

Look at different columns

head(retaildata)

## # A tibble: 6 × 190
##   `Series ID`         A3349335T A3349627V A3349338X A3349398A A3349468W
##   <dttm>                  <dbl>     <dbl>     <dbl>     <dbl>     <dbl>
## 1 1982-04-01 00:00:00      303.      41.7      63.9      409.      65.8
## 2 1982-05-01 00:00:00      298.      43.1      64        405.      65.8
## 3 1982-06-01 00:00:00      298       40.3      62.7      401       62.3
## 4 1982-07-01 00:00:00      308.      40.9      65.6      414.      68.2
## 5 1982-08-01 00:00:00      299.      42.1      62.6      404.      66  
## 6 1982-09-01 00:00:00      305.      42        64.4      412.      62.3
## # ℹ 184 more variables: A3349336V <dbl>, A3349337W <dbl>, A3349397X <dbl>,
## #   A3349399C <dbl>, A3349874C <dbl>, A3349871W <dbl>, A3349790V <dbl>,
## #   A3349556W <dbl>, A3349791W <dbl>, A3349401C <dbl>, A3349873A <dbl>,
## #   A3349872X <dbl>, A3349709X <dbl>, A3349792X <dbl>, A3349789K <dbl>,
## #   A3349555V <dbl>, A3349565X <dbl>, A3349414R <dbl>, A3349799R <dbl>,
## #   A3349642T <dbl>, A3349413L <dbl>, A3349564W <dbl>, A3349416V <dbl>,
## #   A3349643V <dbl>, A3349483V <dbl>, A3349722T <dbl>, A3349727C <dbl>, …

I arbitrarily chose “A3349468W” as my time series.

myts <- ts(retaildata[,"A3349468W"], frequency = 12, start = c(1982, 4))
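As a quick check of what `start = c(1982, 4)` means with `frequency = 12`, the sketch below uses placeholder values (not the retail data) and inspects the resulting time index. The second element of `start` is the period within the year, so 4 corresponds to April.

```r
# Placeholder values; only the time index is of interest.
toy_monthly <- ts(1:12, frequency = 12, start = c(1982, 4))

first_period <- start(toy_monthly)  # year and month of the first observation
last_period  <- end(toy_monthly)    # year and month of the last observation

# cycle() gives the position within the year (1 to 12) of each observation.
months <- month.abb[as.integer(cycle(toy_monthly))]

print(first_period)
print(last_period)
print(months)
```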

Data exploration

autoplot(myts)

There does appear to be a seasonal element to the data. Sales rise from November to December, then drop off in January.

ggseasonplot(myts)

The subseries plot shows the increase in sales in December as well.

ggsubseriesplot(myts)

The ACF plot has regularly spaced peaks and troughs, which is evidence of seasonality. The overall decrease in the size of the ACF bars as the lag increases is due to the upward trend in the data.

ggAcf(myts)
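The two ACF features described above can be reproduced on a synthetic series. This is only a sketch under stated assumptions: the series is an invented linear trend plus a 12-month sine wave, not the retail data, and it uses base R's acf() instead of ggAcf(). The autocorrelation at the seasonal lag (12) should exceed that at the half-season lag (6), and the seasonal peaks should shrink at longer lags because of the trend.

```r
# Synthetic monthly series (not the retail data): linear trend plus a 12-month cycle.
t_index <- 1:120
toy <- ts(0.3 * t_index + 10 * sin(2 * pi * t_index / 12), frequency = 12)

# acf() returns lag 0 first, so lag k sits at position k + 1.
r <- as.numeric(acf(toy, lag.max = 24, plot = FALSE)$acf)
r6  <- r[7]   # half a season apart
r12 <- r[13]  # one full season apart
r24 <- r[25]  # two full seasons apart

cat(sprintf("lag 6: %.2f, lag 12: %.2f, lag 24: %.2f\n", r6, r12, r24))
```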

gglagplot(myts)

Question 4:

Information on all the datasets can be gathered from this code block

help(bicoal)
help(chicken)
help(dole)
help(usdeaths)
help(lynx)
help(goog)
help(writing)
help(fancy)
help(a10)
help(h02)

All time series are plotted. The Google time series had its title and axis labels modified using the ggtitle, ylab, and xlab functions.

autoplot(bicoal)

autoplot(chicken)

autoplot(dole)

autoplot(usdeaths)

autoplot(lynx)

autoplot(goog) + ggtitle("Google Stock Price") + ylab("Closing Price") + xlab("Day")

autoplot(writing)

autoplot(fancy)

autoplot(a10)

autoplot(h02)

ggAcf(a10)

Question 5:

The seasonal patterns for all four series can be seen using the ggseasonplot function. The writing time series has a seasonal effect in August, where sales decrease sharply before returning to normal in September. For the fancy dataset, monthly sales of souvenirs increase in December before dropping in January. The a10 plot has weak seasonality, with expenditures decreasing from January to February. The year 2008 looks unusual because the data for that year are incomplete. The h02 dataset has expenditures increasing from February to December before decreasing from January to February.

ggseasonplot(writing)

ggseasonplot(fancy)

ggseasonplot(a10)

ggseasonplot(h02)

The subseries plots confirm the seasonal patterns noted above.

ggsubseriesplot(writing)

ggsubseriesplot(fancy)

ggsubseriesplot(a10)

ggsubseriesplot(h02)
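The horizontal lines in a subseries plot mark the mean for each month, and those means can also be computed directly. The sketch below is a rough illustration on made-up monthly sales with a December spike (similar in spirit to fancy, but not that dataset), using base R only.

```r
# Synthetic monthly sales (made-up numbers): flat level with a December spike,
# plus a small year-on-year increase, over five years.
toy_sales <- ts(rep(c(rep(100, 11), 180), times = 5) + rep(0:4, each = 12),
                frequency = 12, start = c(2000, 1))

# Average all Januaries, all Februaries, and so on.
month_means <- setNames(
  as.numeric(tapply(as.numeric(toy_sales), as.integer(cycle(toy_sales)), mean)),
  month.abb
)

print(month_means)
peak_month <- names(which.max(month_means))
cat("Month with the highest average:", peak_month, "\n")
```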

hw_week2_steve_phillips

Steve Phillips

2024-02-02

Question 8:

1.) B: Daily temperature of cow matches with B

2.) A: Monthly accidental deaths matches with A

3.) D: Monthly air passengers matches with D

4.) C: Annual mink trappings matches with C