Libraries Used:

library(lubridate)
library(tsibble)
library(dplyr)
library(tidyverse)
library(fpp3)
library(forecast)

Problem 2.1:

Explore the following four time series: Bricks from aus_production, Lynx from pelt, Close from gafa_stock, Demand from vic_elec.

Use ? (or help()) to find out about the data in each series. What is the time interval of each series? Use autoplot() to produce a time plot of each series. For the last plot, modify the axis labels and title.

data("aus_production")
?aus_production

data("pelt")
?pelt

data("gafa_stock")
?gafa_stock

data("vic_elec")
?vic_elec

aus_production is quarterly and runs from 1956 to 2010. pelt is yearly and runs from 1845 to 1935. gafa_stock is recorded on each business day the market is open, so its interval is irregular, and it runs from the start of 2014 to the end of 2018. vic_elec is recorded every 30 minutes and runs from 2012 to 2014.
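The intervals can also be read off programmatically rather than from the help pages. The following is a minimal sketch, assuming the four datasets come from tsibbledata (which fpp3 attaches); interval() and is_regular() are tsibble functions.

```r
library(tsibble)
library(tsibbledata)  # provides aus_production, pelt, gafa_stock, vic_elec

# interval() reports the spacing tsibble inferred from the index column
interval(aus_production)  # quarterly
interval(pelt)            # yearly
interval(gafa_stock)      # irregular, because only trading days are recorded
interval(vic_elec)        # half-hourly

# is_regular() distinguishes regularly spaced series from irregular ones
regular <- c(
  aus_production = is_regular(aus_production),
  pelt           = is_regular(pelt),
  gafa_stock     = is_regular(gafa_stock),
  vic_elec       = is_regular(vic_elec)
)
regular
```

Only gafa_stock should come back as irregular, which matches the `[!]` marker in its tsibble header.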

aus_production %>% 
  autoplot(Bricks)

pelt %>% 
  autoplot(Lynx)

gafa_stock %>% 
  autoplot(Close)

# Modifying Axis and Title
vic_elec %>% 
  autoplot(Demand) +
  labs(x = "Date", y = "Demand") +
  ggtitle("Electricity Demand Over Time")

Problem 2.2:

Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.

gafa_stock %>% group_by(Symbol) %>%
  filter(Close==max(Close)) %>%
  select(Symbol,
         Date,
         Close)
## # A tsibble: 4 x 3 [!]
## # Key:       Symbol [4]
## # Groups:    Symbol [4]
##   Symbol Date       Close
##   <chr>  <date>     <dbl>
## 1 AAPL   2018-10-03  232.
## 2 AMZN   2018-09-04 2040.
## 3 FB     2018-07-25  218.
## 4 GOOG   2018-07-26 1268.

AAPL peaked at 232.07 on 2018-10-03, AMZN at 2039.51 on 2018-09-04, FB at 217.50 on 2018-07-25, and GOOG at 1268.33 on 2018-07-26.
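As a cross-check on the filter() approach, the same rows can be picked out with dplyr's slice_max(). This is only a sketch of an alternative, not what the exercise asks for; the name peak_close is made up for illustration, and the tsibble is converted to a plain tibble first so that only standard dplyr behaviour is relied on.

```r
library(dplyr)
library(tsibble)
library(tsibbledata)  # provides gafa_stock

# slice_max() keeps the row(s) with the largest Close within each Symbol
peak_close <- gafa_stock %>%
  as_tibble() %>%
  group_by(Symbol) %>%
  slice_max(Close, n = 1) %>%
  ungroup() %>%
  select(Symbol, Date, Close)

peak_close
```

Without the group_by(Symbol) step, either approach would return only the single highest close across all four stocks.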

Problem 2.3:

A:

tute1 <- read.csv('https://raw.githubusercontent.com/Jlok17/2022MSDS/main/Source/Data%20624/tute1.csv')
head(tute1)
##      Quarter  Sales AdBudget   GDP
## 1 1981-03-01 1020.2    659.2 251.8
## 2 1981-06-01  889.2    589.0 290.9
## 3 1981-09-01  795.0    512.5 290.8
## 4 1981-12-01 1003.9    614.1 292.4
## 5 1982-03-01 1057.7    647.2 279.1
## 6 1982-06-01  944.4    602.0 254.0

B:

mytimeseries <- tute1 %>% 
  mutate(Quarter = yearquarter(Quarter)) %>%
  as_tsibble(index = Quarter)

C:

mytimeseries %>%
  pivot_longer(-Quarter) %>%
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line()+ 
  facet_grid(name ~ ., scales = "free_y")

Without facet_grid(), all three series are drawn together on a single panel with a shared y-axis. With it, each name (AdBudget, GDP, and Sales) gets its own panel, and scales = "free_y" gives each panel its own y-axis range.
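The difference between the two versions can be seen side by side in the sketch below. To keep it self-contained it uses only the six rows printed by head(tute1) above instead of downloading the full file; the object names (tute1_head, long, p_single, p_facet) are made up for illustration.

```r
library(dplyr)
library(tidyr)
library(ggplot2)
library(tsibble)

# First six rows of tute1, copied from the head() output above
tute1_head <- data.frame(
  Quarter  = c("1981-03-01", "1981-06-01", "1981-09-01",
               "1981-12-01", "1982-03-01", "1982-06-01"),
  Sales    = c(1020.2, 889.2, 795.0, 1003.9, 1057.7, 944.4),
  AdBudget = c(659.2, 589.0, 512.5, 614.1, 647.2, 602.0),
  GDP      = c(251.8, 290.9, 290.8, 292.4, 279.1, 254.0)
)

# Same preparation as in parts B and C: quarterly index, then long format
long <- tute1_head %>%
  mutate(Quarter = yearquarter(Quarter)) %>%
  as_tsibble(index = Quarter) %>%
  pivot_longer(-Quarter)

# One shared panel: all three series on a common y-axis
p_single <- ggplot(long, aes(x = Quarter, y = value, colour = name)) +
  geom_line()

# One panel per series, each with its own y-axis range
p_facet <- p_single + facet_grid(name ~ ., scales = "free_y")

p_single
p_facet
```

On the shared panel the GDP line is squeezed toward the bottom because Sales is several times larger; the free y-axis in the faceted version makes the movement in each series visible.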

Problem 2.4:

The USgas package contains data on the demand for natural gas in the US.

Install the USgas package.

Create a tsibble from us_total with year as the index and state as the key. Plot the annual natural gas consumption by state for the New England area (comprising the states of Maine, Vermont, New Hampshire, Massachusetts, Connecticut and Rhode Island).

A:

#install.packages('USgas')
library(USgas)
data("us_total")
str(us_total)
## 'data.frame':    1266 obs. of  3 variables:
##  $ year : int  1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 ...
##  $ state: chr  "Alabama" "Alabama" "Alabama" "Alabama" ...
##  $ y    : int  324158 329134 337270 353614 332693 379343 350345 382367 353156 391093 ...

B:

us_total <- us_total %>%
  rename(natural_gas_consumption_mcf = y)
us_total_tsibble <- us_total %>%
  filter(state %in% c("Maine", "Vermont", "New Hampshire", "Massachusetts", "Connecticut", "Rhode Island")) %>%
  as_tsibble(key = state, index = year)
us_total_tsibble
## # A tsibble: 138 x 3 [1Y]
## # Key:       state [6]
##     year state       natural_gas_consumption_mcf
##    <int> <chr>                             <int>
##  1  1997 Connecticut                      144708
##  2  1998 Connecticut                      131497
##  3  1999 Connecticut                      152237
##  4  2000 Connecticut                      159712
##  5  2001 Connecticut                      146278
##  6  2002 Connecticut                      177587
##  7  2003 Connecticut                      154075
##  8  2004 Connecticut                      162642
##  9  2005 Connecticut                      168067
## 10  2006 Connecticut                      172682
## # ℹ 128 more rows

C:

# Plot the annual natural gas consumption
us_total_tsibble %>% autoplot(natural_gas_consumption_mcf)

Problem 2.5:

df5 <-  read.csv('https://raw.githubusercontent.com/Jlok17/2022MSDS/main/Source/Data%20624/tourism.xlsx%20-%20Sheet1.csv')
str(df5)
## 'data.frame':    24320 obs. of  5 variables:
##  $ Quarter: chr  "1998-01-01" "1998-04-01" "1998-07-01" "1998-10-01" ...
##  $ Region : chr  "Adelaide" "Adelaide" "Adelaide" "Adelaide" ...
##  $ State  : chr  "South Australia" "South Australia" "South Australia" "South Australia" ...
##  $ Purpose: chr  "Business" "Business" "Business" "Business" ...
##  $ Trips  : num  135 110 166 127 137 ...
# Convert "Quarter" to a yearquarter index (so the interval is quarterly, as in tourism) and make sure "Trips" is numeric
df5 <- df5 %>%
  mutate(Quarter = yearquarter(Quarter),
         Trips = as.numeric(Trips))

# Creating a tsibble identical to the tourism one
tsib_df5 <- as_tsibble(df5, key = c(Region, State, Purpose), index = Quarter)


# Combination of Region and Purpose with the maximum number of overnight trips on average
max_avg_trips <- df5 %>%
  group_by(Region, Purpose) %>%
  summarise(avg_trips = mean(Trips)) %>%
  arrange(desc(avg_trips))
## `summarise()` has grouped output by 'Region'. You can override using the
## `.groups` argument.
head(max_avg_trips)
## # A tibble: 6 × 3
## # Groups:   Region [4]
##   Region          Purpose  avg_trips
##   <chr>           <chr>        <dbl>
## 1 Sydney          Visiting      747.
## 2 Melbourne       Visiting      619.
## 3 Sydney          Business      602.
## 4 North Coast NSW Holiday       588.
## 5 Sydney          Holiday       550.
## 6 Gold Coast      Holiday       528.
# Total trips by State, summed over all quarters, regions, and purposes (a plain tibble, not a tsibble)
total_trips_by_state <- df5 %>%
  group_by(State) %>%
  summarise(total_trips = sum(Trips)) %>%
  arrange(desc(total_trips))

head(total_trips_by_state)
## # A tibble: 6 × 2
##   State             total_trips
##   <chr>                   <dbl>
## 1 New South Wales       557367.
## 2 Victoria              390463.
## 3 Queensland            386643.
## 4 Western Australia     147820.
## 5 South Australia       118151.
## 6 Tasmania               54137.

New South Wales, Victoria and Queensland lead the other states by a wide margin. New South Wales has nearly four times the total trips of Western Australia, and Victoria and Queensland each have more than two and a half times as many.
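The totals above are collapsed over time, so they are no longer a time series. A sketch of how total trips by State could be kept as a tsibble is shown below. It assumes the tourism dataset shipped with tsibble holds the same data as the CSV used above, and the names state_trips, state_totals, state_ratios and wa_total are made up for illustration.

```r
library(dplyr)
library(tsibble)  # ships the tourism dataset

# Summarising a grouped tsibble keeps the time index, so this gives one
# quarterly series of total trips per State (Region and Purpose are summed out)
state_trips <- tourism %>%
  group_by(State) %>%
  summarise(Trips = sum(Trips))

# Collapse over time as well to compare states by their overall totals
state_totals <- state_trips %>%
  as_tibble() %>%
  group_by(State) %>%
  summarise(total_trips = sum(Trips)) %>%
  arrange(desc(total_trips))

# Each state's total relative to Western Australia
wa_total <- state_totals$total_trips[state_totals$State == "Western Australia"]
state_ratios <- state_totals %>%
  mutate(ratio_to_wa = total_trips / wa_total)

state_ratios
```

The ratio_to_wa column makes the comparison with Western Australia explicit instead of reading it off the raw totals.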

Problem 2.8:

data("PBS")
data("us_employment")
data("us_gasoline")

Employed

us_employment %>% 
  filter(Title == "Total Private") %>% 
  autoplot(Employed) + 
  ggtitle("Autoplot")

us_employment %>% 
  filter(Title == "Total Private") %>% 
  gg_season(Employed) +
  ggtitle("Seasonal Plot")

us_employment %>% 
  filter(Title == "Total Private") %>% 
  gg_subseries(Employed) +
  ggtitle("Subseries Plot")

us_employment %>% 
  filter(Title == "Total Private") %>% 
  gg_lag(Employed) +
  ggtitle("Lag Plot")

us_employment %>%
  filter(Title == "Total Private") %>%
  ACF(Employed) %>%
  autoplot() +
  ggtitle("Autocorrelation Function")

In the US Employment data set, Total Private employment shows an upward trend over the years, with a noticeable decline around 2008 that coincides with the housing bubble crash. There is a clear seasonal pattern: employment grows during the first six months of the year, then declines, and then rises again. The lag plot shows a strong positive correlation at every lag. I believe the seasonal plot could be made more informative by adjusting the number employed for population growth.

Bricks

aus_production %>% 
  autoplot(Bricks) +
  ggtitle("Autoplot")

aus_production %>% 
  gg_season(Bricks) +
  ggtitle("Seasonal Plot")

aus_production %>% 
  gg_subseries(Bricks) +
  ggtitle("Subseries Plot")

aus_production %>% 
  gg_lag(Bricks) +
  ggtitle("Lag Plot")

aus_production %>% 
  ACF(Bricks) %>% 
  autoplot() + 
  ggtitle("Autocorrelation Function")

For the aus_production data set, brick production does not show a single distinct trend, but it has strong annual seasonality along with some cyclic behavior. Production fell sharply in the early 1980s. The seasonal plot shows a modest rise in production during Q1 and Q3, followed by a decline in Q4. Finally, the lag plot shows a consistent positive correlation from season to season.

Pelt

pelt %>% 
  autoplot(Hare) +
  ggtitle("Autoplot")

pelt %>% 
  gg_subseries(Hare)+
  ggtitle("Subseries Plot")

pelt %>% 
  gg_lag(Hare) +
  ggtitle("Lag Plot")

pelt %>% 
  ACF(Hare) %>% 
  autoplot() + 
  ggtitle("Autocorrelation Function")

For the Hare series in the pelt data set, there is no distinct trend. Because the data are annual, there can be no within-year seasonality, so the regular rises and falls are better described as cyclic behavior. The number of traded Hare pelts fluctuates sharply over periods of a few years, with a general decrease as the decade comes to an end. The lag plot shows a moderate positive correlation, particularly at lag 1.
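Since pelt has one observation per year, gg_season() has no seasonal period to work with, which is why it is left out above. A minimal sketch of using the ACF values directly to look for a multi-year cycle follows; the choice of lag_max = 20 and the name hare_acf are assumptions made for illustration.

```r
library(dplyr)
library(tsibble)
library(tsibbledata)  # provides pelt
library(feasts)       # provides ACF()

# Compute autocorrelations of the Hare series out to 20 years
hare_acf <- pelt %>%
  ACF(Hare, lag_max = 20)

# Peaks in acf at lags well beyond 1 point to the approximate cycle length
print(hare_acf, n = 20)
```

A run of positive, then negative, then positive autocorrelations in this table is the usual signature of cyclic rather than seasonal behavior.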

PBS

PBS %>% 
  filter(ATC2 == "H02")  %>% 
  autoplot(Cost) + 
  ggtitle("Autoplot")

For the PBS data set, I had difficulty producing any plot other than autoplot(). Looking at each series for H02, the autoplot suggests yearly seasonality, with large increases and decreases that repeat every year. The pattern appears in every Concession type, which suggests it is a common business pattern. The peak toward year end may reflect loading within the fiscal year.
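One possible reason the other plots are awkward here is that filtering on ATC2 == "H02" still leaves several series, one per Concession and Type combination. A hedged workaround, sketched below, is to sum Cost into a single monthly series before plotting; the name h02_total is made up for illustration, and this is only one way to handle it.

```r
library(dplyr)
library(tsibble)
library(tsibbledata)  # provides PBS
library(feasts)       # provides gg_season() and gg_lag()

# summarise() on a tsibble aggregates across the keys but keeps the Month
# index, so the result is one monthly series of total H02 cost
h02_total <- PBS %>%
  filter(ATC2 == "H02") %>%
  summarise(Cost = sum(Cost))

h02_total %>%
  gg_season(Cost) +
  ggplot2::ggtitle("Seasonal Plot")

h02_total %>%
  gg_lag(Cost) +
  ggplot2::ggtitle("Lag Plot")
```

Aggregating hides differences between the Concession types, so it complements the per-series autoplot rather than replacing it.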

Gasoline

us_gasoline %>% 
  autoplot() + 
  ggtitle("Autoplot")
## Plot variable not specified, automatically selected `.vars = Barrels`

us_gasoline %>% 
  gg_season() +
  ggtitle("Seasonal Plot")
## Plot variable not specified, automatically selected `y = Barrels`

us_gasoline %>% 
  gg_subseries()+
  ggtitle("Subseries Plot")
## Plot variable not specified, automatically selected `y = Barrels`

us_gasoline %>% 
  gg_lag() +
  ggtitle("Lag Plot")
## Plot variable not specified, automatically selected `y = Barrels`

us_gasoline %>% 
  ACF() %>% 
  autoplot() + 
  ggtitle("Autocorrelation Function")
## Response variable not specified, automatically selected `var = Barrels`

The gasoline Barrels series shows a generally positive trend over the period, with some seasonality. It is quite noisy, but there are recurring peaks and declines at particular times of the year. The lag plot indicates positive correlation, with some overplotting. In general I can’t see any unusual years in the data, although the overplotting may be hiding them.