HW1: 2.1, 2.2, 2.3, 2.4, 2.5 and 2.8

library(fpp3)
## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.2 ──
## ✔ tibble      3.2.1     ✔ tsibble     1.1.6
## ✔ dplyr       1.1.4     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.1     ✔ feasts      0.4.2
## ✔ lubridate   1.9.3     ✔ fable       0.5.0
## ✔ ggplot2     3.5.1
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()
library(dplyr)

2.1

Explore the following four time series: Bricks from aus_production, Lynx from pelt, Close from gafa_stock, Demand from vic_elec.

a. Use ? (or help()) to find out about the data in each series.
b. What is the time interval of each series?
c. Use autoplot() to produce a time plot of each series.
d. For the last plot, modify the axis labels and title.
autoplot(aus_production, Bricks)
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

Bricks is Australian clay brick production, measured in millions of bricks. The time interval is quarterly.

autoplot(pelt, Lynx)

Lynx is the number of Canadian lynx pelts traded. The time interval is annual.

autoplot(gafa_stock, Close)

gafa_stock contains stock prices from 2014 to 2018 for Google, Amazon, Facebook, and Apple. Close is the closing price, and the time interval is daily (trading days only, so the index is irregular).

autoplot(vic_elec, Demand) +
labs(y="Demand in MWh", x="Time",
     title="Electricity demand for Victoria, Australia")

Demand in vic_elec is the total electricity demand (in MWh) for Victoria, Australia. The time interval is half-hourly.
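
As a quick cross-check of the intervals above, tsibble's interval() reports each tsibble's interval directly (a small sketch; output not shown):

interval(aus_production)  # quarterly
interval(pelt)            # annual
interval(gafa_stock)      # irregular, since only trading days appear
interval(vic_elec)        # half-hourly (30 minutes)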

2.2

Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.

gafa_stock |> 
  group_by(Symbol) |>
  filter(Close == max(Close)) |> 
  select(Symbol,Date, Close)

The highest closing prices for all four stocks occurred in 2018. Amazon had the highest peak close at 2039.51 and Facebook had the lowest at 217.50.
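
An equivalent approach (a sketch): drop the tsibble structure with as_tibble() and use slice_max() to keep only the single peak day per stock.

gafa_stock |>
  as_tibble() |>               # drop the time-series structure
  group_by(Symbol) |>
  slice_max(Close, n = 1) |>   # one row per stock: the peak closing price
  select(Symbol, Date, Close)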

2.3

Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.

tute1 <- readr::read_csv("tute1.csv")
## Rows: 100 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl  (3): Sales, AdBudget, GDP
## date (1): Quarter
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(tute1)
mytimeseries <- tute1 |>
  mutate(Quarter = yearquarter(Quarter)) |>
  as_tsibble(index = Quarter)
mytimeseries |>
  pivot_longer(-Quarter) |>
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line() +
  facet_grid(name ~ ., scales = "free_y")

d. Check what happens when you don’t include facet_grid().

mytimeseries |>
  pivot_longer(-Quarter) |>
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line() 

Removing facet_grid() puts all three series on the same panel instead of separate panels. In this case the values differ so much in magnitude that Sales, AdBudget, and GDP do not overlap, but if their ranges did overlap it would be harder to see the shape of each series without facet_grid().
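
If the series did overlap, one option (a sketch) would be to standardise each series before plotting, so they can share a single panel without one series dominating the scale:

mytimeseries |>
  pivot_longer(-Quarter) |>
  group_by(name) |>
  mutate(value = as.numeric(scale(value))) |>  # z-score each series
  ungroup() |>
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line()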

2.4

The USgas package contains data on the demand for natural gas in the US.

a. Install the USgas package.

library(USgas)

b. Create a tsibble from us_total with year as the index and state as the key.

gas <- us_total |> 
  as_tsibble(index=year, key=state)

gas

c. Plot the annual natural gas consumption by state for the New England area (comprising the states of Maine, Vermont, New Hampshire, Massachusetts, Connecticut and Rhode Island).

new_eng <- c("Maine", "Vermont", "New Hampshire", "Massachusetts", "Connecticut", "Rhode Island")

gas |>
  filter(state %in% new_eng) |>
  autoplot(y) +
  labs(x = "Year", y ="Annual Gas Consumption") 

The chart shows annual gas consumption in these six states over time. Connecticut has the highest consumption and is trending upwards, while the other states appear relatively stable with no clear trend. Because of the overlap, I thought it might be helpful to add facet_grid() to see the line shapes a little better.

gas |>
  filter(state %in% new_eng) |>
  autoplot(y) +
  labs(x = "Year", y ="Annual Gas Consumption") +
  facet_grid(state ~ ., scales = "free_y")

It is now easier to see the shapes of the individual lines, but harder to compare the states against each other. We can see that Massachusetts and Vermont do appear to be trending upwards, which was not clear in the initial graph. There also seems to be some kind of cyclical pattern in the Massachusetts consumption, with regular ups and downs. New Hampshire and Maine consumption increased between 2002 and 2005 and between 2000 and 2002, respectively, but has been steady since. Rhode Island, interestingly, shows the opposite: consumption decreased between 1997 and 2000 and has been relatively steady since.
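
The y column in us_total is, per the USgas documentation (worth confirming with ?us_total), annual consumption in million cubic feet, so a unit-labelled variant of the plot could look like this sketch:

gas |>
  filter(state %in% new_eng) |>
  mutate(bcf = y / 1e3) |>   # million cubic feet -> billion cubic feet (assumed units)
  autoplot(bcf) +
  labs(x = "Year", y = "Annual gas consumption (Bcf)")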

2.5

a. Download tourism.xlsx from the book website and read it into R using readxl::read_excel().

tourism <- readxl::read_excel("tourism.xlsx")

b. Create a tsibble which is identical to the tourism tsibble from the tsibble package.

tourism_df <- tourism |>
  mutate(Quarter = yearquarter(Quarter)) |>
  as_tsibble(index = Quarter, key = c(Region, State, Purpose))

tourism_df

c. Find what combination of Region and Purpose had the maximum number of overnight trips on average.

tourism_df |>
  group_by(Region, Purpose) |>
  summarize(avg_trip = mean(Trips)) |>
  arrange(desc(avg_trip))
## Warning: Current temporal ordering may yield unexpected results.
## ℹ Suggest to sort by `Region`, `Purpose`, `Quarter` first.
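
The warnings come from sorting and aggregating a tsibble in a way that breaks its temporal ordering. One way to sidestep them (a sketch) is to drop the tsibble structure before aggregating, then keep only the top combination:

tourism_df |>
  as_tibble() |>   # plain tibble, so there is no temporal ordering to preserve
  group_by(Region, Purpose) |>
  summarise(avg_trip = mean(Trips), .groups = "drop") |>
  slice_max(avg_trip, n = 1)   # the Region/Purpose combination with the highest average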

d. Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.

Note: this question was a little unclear to me, but that may just be the wording. I grouped by State and included the sum of Trips in the tsibble.

tourism_df_trips <- tourism_df |>
  group_by(State) |>
  summarize(total_trips = sum(Trips))

print(head(tourism_df_trips))
## # A tsibble: 6 x 3 [1Q]
## # Key:       State [1]
##   State Quarter total_trips
##   <chr>   <qtr>       <dbl>
## 1 ACT   1998 Q1        551.
## 2 ACT   1998 Q2        416.
## 3 ACT   1998 Q3        436.
## 4 ACT   1998 Q4        450.
## 5 ACT   1999 Q1        379.
## 6 ACT   1999 Q2        558.
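
A quick structural check (a sketch) that the result is still a quarterly tsibble with State as its only key:

key_vars(tourism_df_trips)   # expected: "State"
index_var(tourism_df_trips)  # expected: "Quarter"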

2.8

Use the following graphics functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() and explore features from the following time series: “Total Private” Employed from us_employment, Bricks from aus_production, Hare from pelt, “H02” Cost from PBS, and Barrels from us_gasoline.

Total Private” Employed from us_employment

glimpse(us_employment)
## Rows: 143,412
## Columns: 4
## Key: Series_ID [148]
## $ Month     <mth> 1939 Jan, 1939 Feb, 1939 Mar, 1939 Apr, 1939 May, 1939 Jun, …
## $ Series_ID <chr> "CEU0500000001", "CEU0500000001", "CEU0500000001", "CEU05000…
## $ Title     <chr> "Total Private", "Total Private", "Total Private", "Total Pr…
## $ Employed  <dbl> 25338, 25447, 25833, 25801, 26113, 26485, 26481, 26848, 2746…
employed_df <- us_employment |> 
  filter(Title == "Total Private")

autoplot(employed_df)
## Plot variable not specified, automatically selected `.vars = Employed`

gg_season(employed_df)
## Warning: `gg_season()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_season()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Plot variable not specified, automatically selected `y = Employed`

gg_subseries(employed_df)
## Warning: `gg_subseries()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_subseries()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Plot variable not specified, automatically selected `y = Employed`

gg_lag(employed_df)
## Warning: `gg_lag()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_lag()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Plot variable not specified, automatically selected `y = Employed`

ACF(employed_df, Employed) |>
  autoplot()

From autoplot(), we can see a general upward trend as well as potentially some seasonality. gg_season() suggests the seasonal pattern is weak, with each year appearing as a roughly flat line. gg_subseries() shows that the pattern in employment is relatively consistent every month; the blue mean line sits at a similar level in each monthly panel. We can also see a clear drop in employment around 2008, which makes sense given the recession in that period.
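
To look more closely at that drop, one option (a sketch) is to restrict the series to the years around the recession before plotting:

employed_df |>
  filter(year(Month) >= 2006, year(Month) <= 2012) |>   # window around the 2008 recession
  autoplot(Employed)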

Bricks from aus_production

bricks_df <- aus_production |>
  select(Bricks)

autoplot(bricks_df, Bricks)
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

gg_season(bricks_df,Bricks)
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

gg_subseries(bricks_df, Bricks)
## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_line()`).

gg_lag(bricks_df, Bricks)
## Warning: Removed 20 rows containing missing values (gg_lag).

ACF(bricks_df, Bricks) |> 
  autoplot()

The plots show an overall downward trend in brick production in recent decades, following an earlier period of growth. There is also a seasonal pattern, with production rising from Q1 to Q2 and falling from Q3 to Q4. There are several outliers, all of them downward spikes.
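
The "Removed 20 rows" warnings come from missing Bricks values at the end of the series; a sketch that drops them before plotting:

bricks_df |>
  filter(!is.na(Bricks)) |>   # drop the trailing quarters where Bricks is NA
  autoplot(Bricks)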

Hare from pelt

hare_df <- pelt |>
  select(Hare)

autoplot(pelt, Hare)

gg_subseries(hare_df, Hare)

gg_lag(hare_df, Hare)

ACF(hare_df, Hare) |>
  autoplot()

There does not appear to be a trend or clear seasonality, but there does seem to be cyclic behaviour, with a spike and a dip roughly every ten years. gg_season() could not be used for this data because the observations are annual, so there is no within-year seasonal period to plot.
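
One way to see that roughly ten-year cycle more clearly (a sketch) is to extend the ACF to longer lags:

ACF(hare_df, Hare, lag_max = 40) |>
  autoplot()   # peaks near multiples of roughly ten years would support the cycle noted above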

"H02" Cost from PBS

glimpse(PBS)
## Rows: 67,596
## Columns: 9
## Key: Concession, Type, ATC1, ATC2 [336]
## $ Month      <mth> 1991 Jul, 1991 Aug, 1991 Sep, 1991 Oct, 1991 Nov, 1991 Dec,…
## $ Concession <chr> "Concessional", "Concessional", "Concessional", "Concession…
## $ Type       <chr> "Co-payments", "Co-payments", "Co-payments", "Co-payments",…
## $ ATC1       <chr> "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",…
## $ ATC1_desc  <chr> "Alimentary tract and metabolism", "Alimentary tract and me…
## $ ATC2       <chr> "A01", "A01", "A01", "A01", "A01", "A01", "A01", "A01", "A0…
## $ ATC2_desc  <chr> "STOMATOLOGICAL PREPARATIONS", "STOMATOLOGICAL PREPARATIONS…
## $ Scripts    <dbl> 18228, 15327, 14775, 15380, 14371, 15028, 11040, 15165, 168…
## $ Cost       <dbl> 67877.00, 57011.00, 55020.00, 57222.00, 52120.00, 54299.00,…
h02_df <- PBS |>
  filter(ATC2 == "H02") |>
  select(Cost)

autoplot(h02_df,Cost)

gg_season(h02_df, Cost)

gg_subseries(h02_df,Cost)

ACF(h02_df, Cost) |>
  autoplot()

This dataset is one of the more challenging to read with autoplot(), since there are four separate series (one for each Concession/Type combination); gg_lag() is not shown because it can only plot a single series at a time. Both the Concessional/Co-payments and Concessional/Safety net series trend upwards with seasonality. The General/Co-payments series does not seem to have any trend or seasonal/cyclical pattern, while the General/Safety net series shows a seasonal pattern but no trend.
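
Since the four series crowd a single panel, another view (a sketch, using a summarise() pattern similar to the textbook's PBS example) is the combined total monthly cost:

PBS |>
  filter(ATC2 == "H02") |>
  summarise(TotalCost = sum(Cost)) |>   # sum over the Concession/Type combinations
  autoplot(TotalCost)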

Barrels from us_gasoline

glimpse(us_gasoline)
## Rows: 1,355
## Columns: 2
## $ Week    <week> 1991 W06, 1991 W07, 1991 W08, 1991 W09, 1991 W10, 1991 W11, 1…
## $ Barrels <dbl> 6.621, 6.433, 6.582, 7.224, 6.875, 6.947, 7.328, 6.777, 7.503,…
gas_df <- us_gasoline 

autoplot(gas_df, Barrels)

gg_season(gas_df, Barrels)

gg_subseries(gas_df, Barrels)

gg_lag(gas_df, Barrels)

ACF(gas_df, Barrels) |>
  autoplot()

There was a period of upward growth until around 2005, after which the series levels off, and it is not clear whether the upward trend will return. There also appear to be seasonal effects, but the relationship is not very clear. There also seem to be a few outliers, although given the week-to-week variation in the data it is hard to say whether they are true outliers.
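
To check the weak seasonality more closely, one option (a sketch) is to limit the seasonal plot to the most recent decade so fewer years overlap:

gas_df |>
  filter(Week >= yearweek("2008 W01")) |>   # keep roughly the last decade of weekly data
  gg_season(Barrels)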