# This assignment involves solving problems 2.1, 2.2, 2.3, 2.4, 2.5, and 2.8 from Hyndman and Athanasopoulos' online book Forecasting: Principles and Practice (3rd edition).
library(fpp3)
## Registered S3 method overwritten by 'tsibble':
## method from
## as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.0 ──
## ✔ tibble 3.2.1 ✔ tsibble 1.1.5
## ✔ dplyr 1.1.4 ✔ tsibbledata 0.4.1
## ✔ tidyr 1.3.1 ✔ feasts 0.3.2
## ✔ lubridate 1.9.3 ✔ fable 0.3.4
## ✔ ggplot2 3.5.1 ✔ fabletools 0.4.2
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tsibble::setdiff() masks base::setdiff()
## ✖ tsibble::union() masks base::union()
#?aus_production
#?pelt
#?gafa_stock
#?vic_elec
1. The aus_production dataset contains quarterly estimates of selected production indicators for Australia. Bricks is a time series of the quarterly production of clay bricks, in millions of bricks.
2. The pelt dataset contains the Hudson Bay Company's annual trading records for Snowshoe Hare and Canadian Lynx furs from 1845 to 1935. Lynx is a time series of the annual number of Canadian Lynx pelts traded.
3. The gafa_stock dataset contains historical stock prices in $USD on irregular trading days from 2014 to 2018 for Google, Amazon, Facebook and Apple. Close is a time series of the closing price of each stock on each trading day.
4. The vic_elec dataset is a tsibble containing half-hourly electricity demand for Victoria, Australia. Demand is a time series of the total electricity demand in each half-hour, in MWh.
#head(aus_production)
bricks <- aus_production %>% select(Quarter,Bricks)
print(interval(bricks))
## <interval[1]>
## [1] 1Q
#head(pelt)
lynx <- pelt %>% select(Year, Lynx)
print(interval(lynx))
## <interval[1]>
## [1] 1Y
#head(gafa_stock)
close <- gafa_stock %>% select(Date, Close)
print(interval(close))
## <interval[1]>
## [1] !
#head(vic_elec)
demand <- vic_elec %>% select(Time, Demand)
print(interval(demand))
## <interval[1]>
## [1] 30m
Therefore, the intervals for the four time series are:
1. Bricks: quarterly (1Q)
2. Lynx: annual (1Y)
3. Close: irregular (daily trading days)
4. Demand: half-hourly (30m)
autoplot(aus_production,Bricks)
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
autoplot(pelt,Lynx)
autoplot(gafa_stock,Close)
autoplot(vic_elec,Demand)
autoplot(vic_elec, Demand) +
  labs(y = "Electricity in MWh", x = "Time Interval (30m)",
       title = "Half-hourly Electricity Demand in Victoria, Australia")
#Problem-2.2: Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.
peak_closing_days <- gafa_stock %>%
group_by(Symbol) %>%
filter(Close == max(Close)) %>%
select(Symbol, Date, Close)
print(peak_closing_days)
## # A tsibble: 4 x 3 [!]
## # Key: Symbol [4]
## # Groups: Symbol [4]
## Symbol Date Close
## <chr> <date> <dbl>
## 1 AAPL 2018-10-03 232.
## 2 AMZN 2018-09-04 2040.
## 3 FB 2018-07-25 218.
## 4 GOOG 2018-07-26 1268.
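As a cross-check, the same result can be obtained with dplyr::slice_max. This is only a sketch; converting to a plain tibble first is an assumption made here to sidestep the irregular tsibble index.
# Alternative sketch: peak closing day per stock via slice_max on a plain tibble
gafa_stock |>
  as_tibble() |>
  group_by(Symbol) |>
  slice_max(Close, n = 1) |>
  select(Symbol, Date, Close)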
#Problem-2.3:
# a. Read the data
tute1 <- readr::read_csv("tute1.csv")
## Rows: 100 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (3): Sales, AdBudget, GDP
## date (1): Quarter
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(tute1)
head(tute1)
## # A tibble: 6 × 4
## Quarter Sales AdBudget GDP
## <date> <dbl> <dbl> <dbl>
## 1 1981-03-01 1020. 659. 252.
## 2 1981-06-01 889. 589 291.
## 3 1981-09-01 795 512. 291.
## 4 1981-12-01 1004. 614. 292.
## 5 1982-03-01 1058. 647. 279.
## 6 1982-06-01 944. 602 254
# b. Convert the data to time series
mytimeseries <- tute1 |>
mutate(Quarter = yearquarter(Quarter)) |>
as_tsibble(index = Quarter)
head(mytimeseries)
## # A tsibble: 6 x 4 [1Q]
## Quarter Sales AdBudget GDP
## <qtr> <dbl> <dbl> <dbl>
## 1 1981 Q1 1020. 659. 252.
## 2 1981 Q2 889. 589 291.
## 3 1981 Q3 795 512. 291.
## 4 1981 Q4 1004. 614. 292.
## 5 1982 Q1 1058. 647. 279.
## 6 1982 Q2 944. 602 254
# c. Construct time series plots of each of the three series
mytimeseries |>
pivot_longer(-Quarter) |>
ggplot(aes(x = Quarter, y = value, colour = name)) +
geom_line() +
facet_grid(name ~ ., scales = "free_y")
# plot without using facet_grid
mytimeseries |>
pivot_longer(-Quarter) |>
ggplot(aes(x = Quarter, y = value, colour = name)) +
geom_line()
The facet_grid() call creates a separate subplot (facet) for each unique name, that is, for each of the original columns, and gives each facet its own y-axis scale. Without facet_grid(), all three series are overlaid on a single y-axis scale; because Sales, AdBudget and GDP have quite different magnitudes, this compresses the smaller series, reduces the readability of the values, and makes it harder to identify each series at a glance.
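One way to make the non-faceted overlay readable is to standardise each series before plotting so that all three share a comparable scale. This is only an illustrative sketch; the z-score scaling is an assumption, not part of the exercise.
# Sketch: standardise each series (z-scores) before overlaying them
mytimeseries |>
  pivot_longer(-Quarter) |>
  as_tibble() |>
  group_by(name) |>
  mutate(value = as.numeric(scale(value))) |>
  ungroup() |>
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line() +
  labs(y = "Standardised value (z-score)")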
#Problem-2.4:
## a. Install the USgas package:
#install.packages("USgas")
library(USgas)
## b. Create a tsibble from us_total with year as the index and state as the key:
us_total <- us_total %>%
  as_tsibble(index = year, key = state)
head(us_total)
## # A tsibble: 6 x 3 [1Y]
## # Key: state [1]
## year state y
## <int> <chr> <int>
## 1 1997 Alabama 324158
## 2 1998 Alabama 329134
## 3 1999 Alabama 337270
## 4 2000 Alabama 353614
## 5 2001 Alabama 332693
## 6 2002 Alabama 379343
#?us_total
## c. Plot the annual natural gas consumption by state for the New England area (comprising the states of Maine, Vermont, New Hampshire, Massachusetts, Connecticut and Rhode Island):
# Define required states
states_new_eng <- c("Maine", "Vermont", "New Hampshire", "Massachusetts", "Connecticut", "Rhode Island")
# Filter the required states data
new_eng_states <- us_total %>%
  filter(state %in% states_new_eng)
# Plot the gas consumption data
ggplot(new_eng_states, aes(x = year, y = y, colour = state)) +
  geom_line() +
  labs(title = "Annual Natural Gas Consumption by State in New England",
       x = "Year", y = "Natural Gas Consumption (million cubic feet)",
       colour = "State") +
  facet_grid(state ~ ., scales = "free_y")
#Problem-2.5
## a. Read tourism data
tourism1<-readxl::read_excel("tourism.xlsx")
head(tourism1)
## # A tibble: 6 × 5
## Quarter Region State Purpose Trips
## <chr> <chr> <chr> <chr> <dbl>
## 1 1998-01-01 Adelaide South Australia Business 135.
## 2 1998-04-01 Adelaide South Australia Business 110.
## 3 1998-07-01 Adelaide South Australia Business 166.
## 4 1998-10-01 Adelaide South Australia Business 127.
## 5 1999-01-01 Adelaide South Australia Business 137.
## 6 1999-04-01 Adelaide South Australia Business 200.
## b. Create a tsibble which is identical to the tourism tsibble from the tsibble package.
#head(tourism)
#?tourism
tourism1_tsibble <- tourism1 %>%
  mutate(Quarter = yearquarter(Quarter)) %>%
  as_tsibble(index = Quarter, key = c(Region, State, Purpose))
head(tourism1_tsibble)
## # A tsibble: 6 x 5 [1Q]
## # Key: Region, State, Purpose [1]
## Quarter Region State Purpose Trips
## <qtr> <chr> <chr> <chr> <dbl>
## 1 1998 Q1 Adelaide South Australia Business 135.
## 2 1998 Q2 Adelaide South Australia Business 110.
## 3 1998 Q3 Adelaide South Australia Business 166.
## 4 1998 Q4 Adelaide South Australia Business 127.
## 5 1999 Q1 Adelaide South Australia Business 137.
## 6 1999 Q2 Adelaide South Australia Business 200.
## c. Find what combination of Region and Purpose had the maximum number of overnight trips on average.
#reg_purpose<-c(Region,Purpose)
max_avg_trips_combination <- tourism1_tsibble %>%
  group_by(Region, Purpose) %>%
  summarise(Avg_Trips = mean(Trips, na.rm = TRUE)) %>%
  arrange(desc(Avg_Trips)) %>%
  ungroup() %>%
  slice(1)
## Warning: Current temporal ordering may yield unexpected results.
## ℹ Suggest to sort by `Region`, `Purpose`, `Quarter` first.
## Current temporal ordering may yield unexpected results.
## ℹ Suggest to sort by `Region`, `Purpose`, `Quarter` first.
print(max_avg_trips_combination)
## # A tsibble: 1 x 4 [1Q]
## # Key: Region, Purpose [1]
## Region Purpose Quarter Avg_Trips
## <chr> <chr> <qtr> <dbl>
## 1 Melbourne Visiting 2017 Q4 985.
Based on this output, the combination with the most overnight trips is the Visiting purpose in Melbourne. Note, however, that because the tsibble retains its Quarter index, summarise() here averages within each Region/Purpose/Quarter group (each containing a single observation), so the row reported is really the single quarter with the most trips (Melbourne, Visiting, 2017 Q4) rather than an average over all quarters; the ordering warnings above hint at this. A sketch of the all-quarter average is shown below.
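A minimal sketch of the averaged-over-all-quarters calculation, assuming the tsibble is first converted to a plain tibble with as_tibble() so that mean() runs over every quarter for each Region/Purpose combination (output not reproduced here):
# Sketch: average Trips over all quarters, then keep the largest combination
tourism1_tsibble |>
  as_tibble() |>
  group_by(Region, Purpose) |>
  summarise(Avg_Trips = mean(Trips, na.rm = TRUE), .groups = "drop") |>
  slice_max(Avg_Trips, n = 1)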
## d. Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.
tsibble_new <- tourism1 %>%
  mutate(Quarter = yearquarter(Quarter)) %>%
  group_by(Quarter, State) %>%
  summarise(Tot_Trips = sum(Trips), .groups = "drop") %>%
  as_tsibble(index = Quarter, key = State)
head(tsibble_new)
## # A tsibble: 6 x 3 [1Q]
## # Key: State [1]
## Quarter State Tot_Trips
## <qtr> <chr> <dbl>
## 1 1998 Q1 ACT 551.
## 2 1998 Q2 ACT 416.
## 3 1998 Q3 ACT 436.
## 4 1998 Q4 ACT 450.
## 5 1999 Q1 ACT 379.
## 6 1999 Q2 ACT 558.
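As a quick, optional sanity check of the new State-level tsibble (not asked for in the question), a sketch plotting one line per State:
# Sketch: one line per State from the aggregated tsibble
tsibble_new |>
  autoplot(Tot_Trips) +
  labs(y = "Total overnight trips", x = "Quarter",
       title = "Total Quarterly Trips by State")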
Problem-2.8: Use autoplot(), gg_season(), gg_subseries(), gg_lag() and ACF() to plot and explore features of the following time series: “Total Private” Employed from us_employment, Bricks from aus_production, Hare from pelt, “H02” Cost from PBS, and Barrels from us_gasoline.
#head(us_employment)
#head(PBS)
#head(aus_production)
#head(pelt)
#head(PBS)
#head(us_gasoline)
# Filter data for "Total Private" Employed and "H02" Cost
us_employment_private <- us_employment %>% filter(Title == "Total Private")
PBS1 <- PBS %>% filter(ATC2 == "H02") %>% select(Month, Cost)
#head(PBS1)
autoplot(us_employment_private,Employed)
us_employment_private |>
gg_season(Employed, labels = "both") +
labs(y = "Number of Employment",
title = "US Monthly Private Employment")
us_employment_private |>
gg_subseries(Employed) +
labs(y = "Employed",title = "Subseries Plots of US Monthly Private Employment")
us_employment_private |>
  gg_lag(Employed, geom = "point") +
  labs(y = "Employed", title = "Lag Plot of US Monthly Private Employment")
us_employment_private |>
  ACF(Employed) |>
  autoplot() +
  labs(title = "ACF of US Monthly Private Employment")
All nine lag plots for the monthly US private employment data show a strong positive relationship, reflecting the strong upward trend in the series: when employment is high in one month, it tends to remain high in the following months. The autocorrelation function (ACF) confirms this, with large autocorrelations that decrease only slowly as the lag increases. No clear cyclic or seasonal patterns are visible in the series. In summary, the data show a consistent increase in employment with little or no seasonal effect.
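To put rough numbers on the “strong trend, little seasonality” reading, one option is the STL-based feature set from the feasts package (loaded with fpp3); trend_strength and seasonal_strength_year are both scaled between 0 and 1. This is an optional sketch, not part of the original question.
# Sketch: STL-based strength measures for the employment series
us_employment_private |>
  features(Employed, feat_stl)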
#?aus_production
autoplot(aus_production,Bricks)
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
aus_production |>
gg_season(Bricks, labels = "both") +
labs(y = "Bricks (million)",
title = "Quarterly Production of Bricks in Australia")
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_text()`).
## Warning: Removed 6 rows containing missing values or values outside the scale range
## (`geom_text()`).
aus_production |>
gg_subseries(Bricks) +
labs(y = "Bricks",title = "Subseries Plots of Quarterly Production of Bricks in Australia")
## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_line()`).
aus_production |>
gg_lag(Bricks,geom="point")+ labs(y = "Bricks",title = "Lag Plot of Quarterly Production of Bricks in Australia")
## Warning: Removed 20 rows containing missing values (gg_lag).
aus_production |>
  ACF(Bricks) |>
  autoplot() +
  labs(title = "ACF of Quarterly Production of Bricks in Australia")
The quarterly brick production series shows a regular seasonal pattern, and the subseries plots show a broadly similar trajectory in each of the four quarters (Q1 to Q4), with production rising to a peak around 1980 and declining afterwards. Although the exact peaks and troughs differ slightly between quarters, the overall seasonal behaviour is consistent, indicating a regular within-year pattern of fluctuation in production.
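The “Removed 20 rows” warnings above come from the missing Bricks values at the end of the series. A small sketch that drops those missing quarters before plotting; this assumes the NAs are all trailing (which appears to be the case, since exactly 20 consecutive quarters are removed), so the filtered series stays regular:
# Sketch: drop the trailing missing Bricks values to avoid the plotting warnings
aus_production |>
  filter(!is.na(Bricks)) |>
  gg_season(Bricks, labels = "both") +
  labs(y = "Bricks (million)",
       title = "Quarterly Production of Bricks in Australia (missing values dropped)")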
#?pelt
autoplot(pelt,Hare)
pelt |>
ggplot(aes(x = Year, y = Hare)) +
geom_line() +
labs(y = "Number of Hare Pelts Traded",
x = "Year",
title = "Yearly Hare Pelt Trading Record of Hudson Bay Company")
pelt |>
gg_subseries(Hare) +
labs(y = "Hare",title = "Subseries Plots of Yearly Hare Pelt Trading Record of Hudson Bay Company")
pelt |>
gg_lag(Hare,geom="point")+ labs(y = "Hare",title = "Lag Plot of Yearly Hare Pelt Trading Record of Hudson Bay Company")
pelt |>
  ACF(Hare) |>
  autoplot() +
  labs(title = "ACF of Yearly Hare Pelt Trading Record of Hudson Bay Company")
Because the pelt data are strictly annual, gg_season() is not applicable here (there is no within-year seasonal period to plot), so a plain ggplot() line chart is used instead. The Hare series shows repeated rises and falls whose peaks and troughs vary in height and spacing, which points to cyclic behaviour rather than seasonality. The variability in the peaks and troughs indicates that the factors driving the series are not consistent from year to year, so each cycle looks somewhat different. There is no obvious long-term trend in the data.
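Since the cycles are the main feature of this series, it may help to extend the ACF to more lags; the classic hare–lynx cycle is roughly ten years, so a positive spike in the ACF near lag 10 would be consistent with that. A small sketch (lag_max = 25 is an arbitrary choice):
# Sketch: ACF over a longer lag range to make the roughly 10-year cycle easier to see
pelt |>
  ACF(Hare, lag_max = 25) |>
  autoplot() +
  labs(title = "ACF of Hare Pelts Traded, Lags up to 25 Years")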
#?PBS
autoplot(PBS1,Cost)
PBS1 |> gg_season(Cost, labels = "both") +
labs(y = "Cost",
title = "Monthly Medicare Australia Prescription Cost for ATC2 H02 Drugs")
PBS1 |> gg_subseries(Cost) +
labs(y = "Cost",
title = "Subseries Plots of Monthly Medicare Australia Prescription Cost for ATC2 H02 Drugs")
#PBS1 |> gg_lag(Cost,geom="point") +
#labs(y = "Cost",
# title = "Lag Plot of Monthly Medicare Australia Prescription Cost for ATC2 H02 Drugs")
PBS1 |> ACF(Cost)|> autoplot() +
labs(title = "ACF of Monthly Medicare Australia Prescription Cost for ATC2 H02 Drugs")
Filtering PBS for ATC2 == "H02" leaves four time series rather than one: select() on a tsibble always retains the key variables, so the Concession and Type keys remain and their combinations define four separate series. For convenience, only the cost series for H02 medication under the general co-payments category is discussed here. This series shows a clear within-year seasonal pattern, with costs falling early in the year and then building up again towards the end of the year, although the exact pattern varies somewhat from year to year.
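If a single H02 series is preferred, one option (a sketch, not the only way) is to sum Cost across the remaining Concession and Type keys with summarise(); with just one series left, gg_lag(), which had to be commented out above, also works:
# Sketch: aggregate the four H02 series into one monthly total cost series
h02_total <- PBS |>
  filter(ATC2 == "H02") |>
  summarise(Cost = sum(Cost))
autoplot(h02_total, Cost) +
  labs(y = "Cost", title = "Total Monthly Cost of H02 Scripts")
h02_total |>
  gg_lag(Cost, geom = "point") +
  labs(y = "Cost", title = "Lag Plot of Total Monthly H02 Cost")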
#?us_gasoline
autoplot(us_gasoline,Barrels)
us_gasoline |> gg_season(Barrels, labels = "both") +
labs(y = "Barrels (million)",
title = "Weekly US finished Motor Gasoline Product Supply")
us_gasoline |> gg_subseries(Barrels) +
labs(y = "Barrels (million)",
title = "Subseries Plots of Weekly US finished Motor Gasoline Product Supply")
us_gasoline |>
gg_lag(Barrels, geom="point") +
labs(y = "Barrels (million)",
title = "Lag Plot of Weekly Gasoline Supply")
us_gasoline |>
ACF(Barrels) |>
autoplot() + labs(title="ACF of Weekly US finished Motor Gasoline Product Supply")
The lag plots for the weekly gasoline supply data show a strong positive relationship at all nine lags, indicating that supply in one week is closely related to supply in the neighbouring weeks: the series is highly persistent and strongly trended, giving a regular, predictable pattern over time. The ACF plot suggests the data are both trended and seasonal: the slow decrease in the ACF as the lags increase is due to the trend, while the “scalloped” shape is due to the seasonality.
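To see the within-year shape more clearly, one option is to restrict the seasonal plot to recent years; the 2010 cut-off below is an arbitrary choice for this sketch.
# Sketch: seasonal plot restricted to 2010 onwards to highlight the seasonal shape
us_gasoline |>
  filter(Week >= yearweek("2010 W01")) |>
  gg_season(Barrels, labels = "right") +
  labs(y = "Barrels (million)",
       title = "Weekly US Gasoline Supply: Seasonal Plot, 2010 Onwards")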