# This assignment involves solving problems 2.1, 2.2, 2.3, 2.4, 2.5, and 2.8 from Hyndman and Athanasopoulos' online book Forecasting: Principles and Practice (3rd edition).
library(fpp3)
## Registered S3 method overwritten by 'tsibble':
## method from
## as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.0 ──
## ✔ tibble 3.2.1 ✔ tsibble 1.1.5
## ✔ dplyr 1.1.4 ✔ tsibbledata 0.4.1
## ✔ tidyr 1.3.1 ✔ feasts 0.3.2
## ✔ lubridate 1.9.3 ✔ fable 0.3.4
## ✔ ggplot2 3.5.1 ✔ fabletools 0.4.2
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tsibble::setdiff() masks base::setdiff()
## ✖ tsibble::union() masks base::union()
#?aus_production
#?pelt
#?gafa_stock
#?vic_elec
1. The aus_production dataset contains quarterly estimates of selected production indicators for Australia. Bricks is a time series of the quarterly production of clay bricks, in millions of bricks.
2. The pelt dataset contains the Hudson Bay Company's annual trading records for Snowshoe Hare and Canadian Lynx furs from 1845 to 1935. Lynx is a time series of the annual number of Canadian Lynx pelts traded.
3. The gafa_stock dataset contains historical stock prices in $USD on irregular trading days from 2014 to 2018 for Google, Amazon, Facebook and Apple. Close is a time series of the closing price of each stock on each trading day.
4. The vic_elec dataset is a tsibble containing half-hourly electricity demand for Victoria, Australia. Demand is a time series of the total electricity demand in each half-hour, in MWh.
#head(aus_production)
bricks <- aus_production %>% select(Quarter,Bricks)
print(interval(bricks))
## <interval[1]>
## [1] 1Q
#head(pelt)
lynx <- pelt %>% select(Year, Lynx)
print(interval(lynx))
## <interval[1]>
## [1] 1Y
#head(gafa_stock)
close <- gafa_stock %>% select(Date, Close)
print(interval(close))
## <interval[1]>
## [1] !
#head(vic_elec)
demand <- vic_elec %>% select(Time, Demand)
print(interval(demand))
## <interval[1]>
## [1] 30m
Therefore, the intervals for the four time series are:
1. Bricks: quarterly (1Q)
2. Lynx: annual (1Y)
3. Close: irregular (daily trading days)
4. Demand: half-hourly (30m)
autoplot(aus_production,Bricks)
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
autoplot(pelt,Lynx)
autoplot(gafa_stock,Close)
autoplot(vic_elec,Demand)
autoplot(vic_elec, Demand) +
  labs(y = "Electricity in MWh", x = "Time Interval (30m)",
       title = "Half-hourly Electricity Demand in Victoria, Australia")
#Problem-2.2: Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.
peak_closing_days <- gafa_stock %>%
group_by(Symbol) %>%
filter(Close == max(Close)) %>%
select(Symbol, Date, Close)
print(peak_closing_days)
## # A tsibble: 4 x 3 [!]
## # Key: Symbol [4]
## # Groups: Symbol [4]
## Symbol Date Close
## <chr> <date> <dbl>
## 1 AAPL 2018-10-03 232.
## 2 AMZN 2018-09-04 2040.
## 3 FB 2018-07-25 218.
## 4 GOOG 2018-07-26 1268.
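As a cross-check, the same result can be obtained with dplyr::slice_max. This is only a sketch; converting to a plain tibble first is an assumption made here to sidestep the irregular tsibble index.
# Alternative sketch: peak closing day per stock via slice_max on a plain tibble
gafa_stock |>
  as_tibble() |>
  group_by(Symbol) |>
  slice_max(Close, n = 1) |>
  select(Symbol, Date, Close)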
#Problem-2.3:
# a. Read the data
tute1 <- readr::read_csv("tute1.csv")
## Rows: 100 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (3): Sales, AdBudget, GDP
## date (1): Quarter
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(tute1)
head(tute1)
## # A tibble: 6 × 4
## Quarter Sales AdBudget GDP
## <date> <dbl> <dbl> <dbl>
## 1 1981-03-01 1020. 659. 252.
## 2 1981-06-01 889. 589 291.
## 3 1981-09-01 795 512. 291.
## 4 1981-12-01 1004. 614. 292.
## 5 1982-03-01 1058. 647. 279.
## 6 1982-06-01 944. 602 254
# b. Convert the data to time series
mytimeseries <- tute1 |>
mutate(Quarter = yearquarter(Quarter)) |>
as_tsibble(index = Quarter)
head(mytimeseries)
## # A tsibble: 6 x 4 [1Q]
## Quarter Sales AdBudget GDP
## <qtr> <dbl> <dbl> <dbl>
## 1 1981 Q1 1020. 659. 252.
## 2 1981 Q2 889. 589 291.
## 3 1981 Q3 795 512. 291.
## 4 1981 Q4 1004. 614. 292.
## 5 1982 Q1 1058. 647. 279.
## 6 1982 Q2 944. 602 254
# c. Construct time series plots of each of the three series
mytimeseries |>
pivot_longer(-Quarter) |>
ggplot(aes(x = Quarter, y = value, colour = name)) +
geom_line() +
facet_grid(name ~ ., scales = "free_y")
# plot without using facet_grid
mytimeseries |>
pivot_longer(-Quarter) |>
ggplot(aes(x = Quarter, y = value, colour = name)) +
geom_line()
The facet_grid() call creates a separate subplot (facet) for each unique name, that is, for each of the original columns, and gives each facet its own y-axis scale. Without facet_grid(), all three series are overlaid on a single y-axis scale; because Sales, AdBudget and GDP have quite different magnitudes, this compresses the smaller series, reduces the readability of the values, and makes it harder to identify each series at a glance.
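One way to make the non-faceted overlay readable is to standardise each series before plotting so that all three share a comparable scale. This is only an illustrative sketch; the z-score scaling is an assumption, not part of the exercise.
# Sketch: standardise each series (z-scores) before overlaying them
mytimeseries |>
  pivot_longer(-Quarter) |>
  as_tibble() |>
  group_by(name) |>
  mutate(value = as.numeric(scale(value))) |>
  ungroup() |>
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line() +
  labs(y = "Standardised value (z-score)")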
#Problem-2.4:
## a. Install the USgas package:
#install.packages("USgas")
library(USgas)
## b. Create a tsibble from us_total with year as the index and state as the key:
us_total <- us_total %>%
  as_tsibble(index = year, key = state)
head(us_total)
## # A tsibble: 6 x 3 [1Y]
## # Key: state [1]
## year state y
## <int> <chr> <int>
## 1 1997 Alabama 324158
## 2 1998 Alabama 329134
## 3 1999 Alabama 337270
## 4 2000 Alabama 353614
## 5 2001 Alabama 332693
## 6 2002 Alabama 379343
#?us_total
## c. Plot the annual natural gas consumption by state for the New England area (comprising the states of Maine, Vermont, New Hampshire, Massachusetts, Connecticut and Rhode Island):
# Define required states
states_new_eng <- c("Maine", "Vermont", "New Hampshire", "Massachusetts", "Connecticut", "Rhode Island")
# Filter the required states data
new_eng_states <- us_total %>%
  filter(state %in% states_new_eng)
# Plot the gas consumption data
ggplot(new_eng_states, aes(x = year, y = y, colour = state)) +
  geom_line() +
  labs(title = "Annual Natural Gas Consumption by State in New England",
       x = "Year", y = "Natural Gas Consumption (million cubic feet)",
       colour = "State") +
  facet_grid(state ~ ., scales = "free_y")
#Problem-2.5
## a. Read tourism data
tourism1<-readxl::read_excel("tourism.xlsx")
head(tourism1)
## # A tibble: 6 × 5
## Quarter Region State Purpose Trips
## <chr> <chr> <chr> <chr> <dbl>
## 1 1998-01-01 Adelaide South Australia Business 135.
## 2 1998-04-01 Adelaide South Australia Business 110.
## 3 1998-07-01 Adelaide South Australia Business 166.
## 4 1998-10-01 Adelaide South Australia Business 127.
## 5 1999-01-01 Adelaide South Australia Business 137.
## 6 1999-04-01 Adelaide South Australia Business 200.
## b. Create a tsibble which is identical to the tourism tsibble from the tsibble package.
#head(tourism)
#?tourism
tourism1_tsibble <- tourism1 %>%
  mutate(Quarter = yearquarter(Quarter)) %>%
  as_tsibble(index = Quarter, key = c(Region, State, Purpose))
head(tourism1_tsibble)
## # A tsibble: 6 x 5 [1Q]
## # Key: Region, State, Purpose [1]
## Quarter Region State Purpose Trips
## <qtr> <chr> <chr> <chr> <dbl>
## 1 1998 Q1 Adelaide South Australia Business 135.
## 2 1998 Q2 Adelaide South Australia Business 110.
## 3 1998 Q3 Adelaide South Australia Business 166.
## 4 1998 Q4 Adelaide South Australia Business 127.
## 5 1999 Q1 Adelaide South Australia Business 137.
## 6 1999 Q2 Adelaide South Australia Business 200.
## c. Find what combination of Region and Purpose had the maximum number of overnight trips on average.
#reg_purpose<-c(Region,Purpose)
max_avg_trips_combination <- tourism1_tsibble %>%
  group_by(Region, Purpose) %>%
  summarise(Avg_Trips = mean(Trips, na.rm = TRUE)) %>%
  arrange(desc(Avg_Trips)) %>%
  ungroup() %>%
  slice(1)
## Warning: Current temporal ordering may yield unexpected results.
## ℹ Suggest to sort by `Region`, `Purpose`, `Quarter` first.
## Current temporal ordering may yield unexpected results.
## ℹ Suggest to sort by `Region`, `Purpose`, `Quarter` first.
print(max_avg_trips_combination)
## # A tsibble: 1 x 4 [1Q]
## # Key: Region, Purpose [1]
## Region Purpose Quarter Avg_Trips
## <chr> <chr> <qtr> <dbl>
## 1 Melbourne Visiting 2017 Q4 985.
Based on this output, the combination with the most overnight trips is the Visiting purpose in Melbourne. Note, however, that because the tsibble retains its Quarter index, summarise() here averages within each Region/Purpose/Quarter group (each containing a single observation), so the row reported is really the single quarter with the most trips (Melbourne, Visiting, 2017 Q4) rather than an average over all quarters; the ordering warnings above hint at this. A sketch of the all-quarter average is shown below.
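A minimal sketch of the averaged-over-all-quarters calculation, assuming the tsibble is first converted to a plain tibble with as_tibble() so that mean() runs over every quarter for each Region/Purpose combination (output not reproduced here):
# Sketch: average Trips over all quarters, then keep the largest combination
tourism1_tsibble |>
  as_tibble() |>
  group_by(Region, Purpose) |>
  summarise(Avg_Trips = mean(Trips, na.rm = TRUE), .groups = "drop") |>
  slice_max(Avg_Trips, n = 1)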
## d. Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.
tsibble_new <- tourism1 %>%
  mutate(Quarter = yearquarter(Quarter)) %>%
  group_by(Quarter, State) %>%
  summarise(Tot_Trips = sum(Trips), .groups = "drop") %>%
  as_tsibble(index = Quarter, key = State)
head(tsibble_new)
## # A tsibble: 6 x 3 [1Q]
## # Key: State [1]
## Quarter State Tot_Trips
## <qtr> <chr> <dbl>
## 1 1998 Q1 ACT 551.
## 2 1998 Q2 ACT 416.
## 3 1998 Q3 ACT 436.
## 4 1998 Q4 ACT 450.
## 5 1999 Q1 ACT 379.
## 6 1999 Q2 ACT 558.
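As a quick, optional sanity check of the new State-level tsibble (not asked for in the question), a sketch plotting one line per State:
# Sketch: one line per State from the aggregated tsibble
tsibble_new |>
  autoplot(Tot_Trips) +
  labs(y = "Total overnight trips", x = "Quarter",
       title = "Total Quarterly Trips by State")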
Problem-2.8: Use autoplot(), gg_season(), gg_subseries(), gg_lag() and ACF() to plot and explore features of the following time series: “Total Private” Employed from us_employment, Bricks from aus_production, Hare from pelt, “H02” Cost from PBS, and Barrels from us_gasoline.
#head(us_employment)
#head(PBS)
#head(aus_production)
#head(pelt)
#head(PBS)
#head(us_gasoline)
# Filter data for "Total Private" Employed and "H02" Cost
us_employment_private <- us_employment %>% filter(Title == "Total Private")
PBS1 <- PBS %>% filter(ATC2 == "H02") %>% select(Month, Cost)
#head(PBS1)
autoplot(us_employment_private,Employed)
us_employment_private |>
gg_season(Employed, labels = "both") +
labs(y = "Number of Employment",
title = "US Monthly Private Employment")
us_employment_private |>
gg_subseries(Employed) +
labs(y = "Employed",title = "Subseries Plots of US Monthly Private Employment")
us_employment_private |>
  gg_lag(Employed, geom = "point") +
  labs(y = "Employed", title = "Lag Plot of US Monthly Private Employment")
us_employment_private |>
  ACF(Employed) |>
  autoplot() +
  labs(title = "ACF of US Monthly Private Employment")
All nine lag plots for the monthly US private employment data show a strong positive relationship, reflecting the strong upward trend in the series: when employment is high in one month, it tends to remain high in the following months. The autocorrelation function (ACF) confirms this, with large autocorrelations that decrease only slowly as the lag increases. No clear cyclic or seasonal patterns are visible in the series. In summary, the data show a consistent increase in employment with little or no seasonal effect.
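To put rough numbers on the “strong trend, little seasonality” reading, one option is the STL-based feature set from the feasts package (loaded with fpp3); trend_strength and seasonal_strength_year are both scaled between 0 and 1. This is an optional sketch, not part of the original question.
# Sketch: STL-based strength measures for the employment series
us_employment_private |>
  features(Employed, feat_stl)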
#?aus_production
autoplot(aus_production,Bricks)
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
aus_production |>
gg_season(Bricks, labels = "both") +
labs(y = "Bricks (million)",
title = "Quarterly Production of Bricks in Australia")
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_text()`).
## Warning: Removed 6 rows containing missing values or values outside the scale range
## (`geom_text()`).
aus_production |>
gg_subseries(Bricks) +
labs(y = "Bricks",title = "Subseries Plots of Quarterly Production of Bricks in Australia")
## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_line()`).
aus_production |>
gg_lag(Bricks,geom="point")+ labs(y = "Bricks",title = "Lag Plot of Quarterly Production of Bricks in Australia")
## Warning: Removed 20 rows containing missing values (gg_lag).
aus_production |>
  ACF(Bricks) |>
  autoplot() +
  labs(title = "ACF of Quarterly Production of Bricks in Australia")
The quarterly brick production series shows a regular seasonal pattern, and the subseries plots show a broadly similar trajectory in each of the four quarters (Q1 to Q4), with production rising to a peak around 1980 and declining afterwards. Although the exact peaks and troughs differ slightly between quarters, the overall seasonal behaviour is consistent, indicating a regular within-year pattern of fluctuation in production.
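The “Removed 20 rows” warnings above come from the missing Bricks values at the end of the series. A small sketch that drops those missing quarters before plotting; this assumes the NAs are all trailing (which appears to be the case, since exactly 20 consecutive quarters are removed), so the filtered series stays regular:
# Sketch: drop the trailing missing Bricks values to avoid the plotting warnings
aus_production |>
  filter(!is.na(Bricks)) |>
  gg_season(Bricks, labels = "both") +
  labs(y = "Bricks (million)",
       title = "Quarterly Production of Bricks in Australia (missing values dropped)")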
#?pelt
autoplot(pelt,Hare)
pelt |>
ggplot(aes(x = Year, y = Hare)) +
geom_line() +
labs(y = "Number of Hare Pelts Traded",
x = "Year",
title = "Yearly Hare Pelt Trading Record of Hudson Bay Company")
pelt |>
gg_subseries(Hare) +
labs(y = "Hare",title = "Subseries Plots of Yearly Hare Pelt Trading Record of Hudson Bay Company")
pelt |>
gg_lag(Hare,geom="point")+ labs(y = "Hare",title = "Lag Plot of Yearly Hare Pelt Trading Record of Hudson Bay Company")
pelt |>
  ACF(Hare) |>
  autoplot() +
  labs(title = "ACF of Yearly Hare Pelt Trading Record of Hudson Bay Company")
Because the pelt data are strictly annual, gg_season() is not applicable here (there is no within-year seasonal period to plot), so a plain ggplot() line chart is used instead. The Hare series shows repeated rises and falls whose peaks and troughs vary in height and spacing, which points to cyclic behaviour rather than seasonality. The variability in the peaks and troughs indicates that the factors driving the series are not consistent from year to year, so each cycle looks somewhat different. There is no obvious long-term trend in the data.
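Since the cycles are the main feature of this series, it may help to extend the ACF to more lags; the classic hare–lynx cycle is roughly ten years, so a positive spike in the ACF near lag 10 would be consistent with that. A small sketch (lag_max = 25 is an arbitrary choice):
# Sketch: ACF over a longer lag range to make the roughly 10-year cycle easier to see
pelt |>
  ACF(Hare, lag_max = 25) |>
  autoplot() +
  labs(title = "ACF of Hare Pelts Traded, Lags up to 25 Years")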
#?PBS
autoplot(PBS1,Cost)
PBS1 |> gg_season(Cost, labels = "both") +
labs(y = "Cost",
title = "Monthly Medicare Australia Prescription Cost for ATC2 H02 Drugs")
PBS1 |> gg_subseries(Cost) +
labs(y = "Cost",
title = "Subseries Plots of Monthly Medicare Australia Prescription Cost for ATC2 H02 Drugs")
#PBS1 |> gg_lag(Cost,geom="point") +
#labs(y = "Cost",
# title = "Lag Plot of Monthly Medicare Australia Prescription Cost for ATC2 H02 Drugs")
PBS1 |> ACF(Cost)|> autoplot() +
labs(title = "ACF of Monthly Medicare Australia Prescription Cost for ATC2 H02 Drugs")
Filtering PBS for ATC2 == "H02" leaves four time series rather than one: select() on a tsibble always retains the key variables, so the Concession and Type keys remain and their combinations define four separate series. For convenience, only the cost series for H02 medication under the general co-payments category is discussed here. This series shows a clear within-year seasonal pattern, with costs falling early in the year and then building up again towards the end of the year, although the exact pattern varies somewhat from year to year.
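If a single H02 series is preferred, one option (a sketch, not the only way) is to sum Cost across the remaining Concession and Type keys with summarise(); with just one series left, gg_lag(), which had to be commented out above, also works:
# Sketch: aggregate the four H02 series into one monthly total cost series
h02_total <- PBS |>
  filter(ATC2 == "H02") |>
  summarise(Cost = sum(Cost))
autoplot(h02_total, Cost) +
  labs(y = "Cost", title = "Total Monthly Cost of H02 Scripts")
h02_total |>
  gg_lag(Cost, geom = "point") +
  labs(y = "Cost", title = "Lag Plot of Total Monthly H02 Cost")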
#?us_gasoline
autoplot(us_gasoline,Barrels)
us_gasoline |> gg_season(Barrels, labels = "both") +
labs(y = "Barrels (million)",
title = "Weekly US finished Motor Gasoline Product Supply")
us_gasoline |> gg_subseries(Barrels) +
labs(y = "Barrels (million)",
title = "Subseries Plots of Weekly US finished Motor Gasoline Product Supply")
us_gasoline |>
gg_lag(Barrels, geom="point") +
labs(y = "Barrels (million)",
title = "Lag Plot of Weekly Gasoline Supply")
us_gasoline |>
ACF(Barrels) |>
autoplot() + labs(title="ACF of Weekly US finished Motor Gasoline Product Supply")
The lag plots for the weekly gasoline supply data show a strong positive relationship at all nine lags, indicating that supply in one week is closely related to supply in the neighbouring weeks: the series is highly persistent and strongly trended, giving a regular, predictable pattern over time. The ACF plot suggests the data are both trended and seasonal: the slow decrease in the ACF as the lags increase is due to the trend, while the “scalloped” shape is due to the seasonality.
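To see the within-year shape more clearly, one option is to restrict the seasonal plot to recent years; the 2010 cut-off below is an arbitrary choice for this sketch.
# Sketch: seasonal plot restricted to 2010 onwards to highlight the seasonal shape
us_gasoline |>
  filter(Week >= yearweek("2010 W01")) |>
  gg_season(Barrels, labels = "right") +
  labs(y = "Barrels (million)",
       title = "Weekly US Gasoline Supply: Seasonal Plot, 2010 Onwards")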