Instructions: Please submit exercises 2.1, 2.2, 2.3, 2.4, 2.5 and 2.8 from the Hyndman online Forecasting book (Forecasting: Principles and Practice). Please submit your RPubs link and also attach the .pdf file with your code.
2.1
Explore the following four time series: Bricks from aus_production, Lynx from pelt, Close from gafa_stock, Demand from vic_elec.
Use ? (or help()) to find out about the data in each series. What is the time interval of each series? Use autoplot() to produce a time plot of each series. For the last plot, modify the axis labels and title.
help(aus_production)
aus_production
The time interval of aus_production is quarterly.
aus_production %>%
autoplot(Bricks) +
labs(
y = "Bricks (millions)",
title = "Quarterly Clay Brick Production: Australia"
)
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
help(gafa_stock)
gafa_stock
The time interval of gafa_stock is daily, but only trading days are recorded, so tsibble treats the index as irregular.
gafa_stock %>%
autoplot(Close) +
labs(
y = "Closing Price (USD)",
title = "GAFA Stocks: Daily Closing Prices"
)
# Pelt dataset
help(pelt)
pelt
The time interval of pelt is yearly (one observation per year).
pelt %>%
autoplot(Lynx) +
labs(
y = "Lynx Pelts",
title = "Lynx Pelts Collected Over Time"
)
help(vic_elec)
vic_elec
The time interval of vic_elec is half-hourly (every 30 minutes).
vic_elec %>%
autoplot(Demand) +
labs(
y = "Demand (MWh)",
title = "Half-Hourly Electricity Demand in Victoria, Australia"
)
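The intervals above come from the gaps between consecutive index values. As a rough illustration (a base-R sketch with a hypothetical helper `infer_interval()`, not how tsibble computes intervals internally), the most common gap and whether every gap is equal can be checked directly. The trading-day example shows why a daily stock series such as gafa_stock ends up irregular: weekends and holidays leave longer gaps.

```r
# Hypothetical helper (not part of tsibble): summarise the gaps between
# consecutive timestamps, roughly what tsibble reports as the interval.
infer_interval <- function(times) {
  gaps <- diff(sort(times))   # difftime vector
  units(gaps) <- "mins"       # express every gap in minutes
  g <- as.numeric(gaps)
  list(
    step_minutes = as.numeric(names(which.max(table(g)))),  # most common gap
    regular      = length(unique(g)) == 1                   # are all gaps equal?
  )
}

# Half-hourly timestamps, like vic_elec
half_hourly <- seq(as.POSIXct("2012-01-01 00:00", tz = "UTC"),
                   by = "30 min", length.out = 6)
# Trading days only (Thu, Fri, then Mon, Tue), like gafa_stock
trading_days <- as.Date(c("2014-01-02", "2014-01-03",
                          "2014-01-06", "2014-01-07"))

str(infer_interval(half_hourly))   # step_minutes = 30, regular = TRUE
str(infer_interval(trading_days))  # step_minutes = 1440 (one day), regular = FALSE
```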
2.2
Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.
peak_days <- gafa_stock %>%
group_by(Symbol) %>%
filter(Close == max(Close)) %>%
ungroup()
# Show the date of each stock's peak closing price
peak_days %>%
select(Symbol, Date, Close)
# Plot the closing prices and highlight the peak prices
gafa_stock %>%
autoplot(Close) +
geom_point(data = peak_days, aes(x = Date, y = Close), color = "red", size = 3) +
labs(y = "Closing Price (USD)", title = "GAFA Closing Prices with Peak Days Highlighted") +
theme_minimal()
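The `group_by(Symbol) %>% filter(Close == max(Close))` step keeps, within each stock, the rows whose closing price equals that stock's maximum. A base-R sketch of the same logic is below; the prices and the symbols "AAA" and "BBB" are made up for illustration. `ave()` returns each row's group maximum, so comparing it with `Close` flags the peak days. As with the dplyr version, a tie would keep every tied day.

```r
# Made-up prices for two hypothetical symbols (not real GAFA data)
prices <- data.frame(
  Symbol = c("AAA", "AAA", "AAA", "BBB", "BBB"),
  Date   = as.Date(c("2018-01-02", "2018-01-03", "2018-01-04",
                     "2018-01-02", "2018-01-03")),
  Close  = c(10, 12, 11, 50, 48)
)

# ave() gives, for every row, the maximum Close within that row's Symbol,
# mirroring group_by(Symbol) %>% filter(Close == max(Close))
is_peak <- prices$Close == ave(prices$Close, prices$Symbol, FUN = max)
peak_rows <- prices[is_peak, ]
print(peak_rows)
```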
2.3
Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.
You can read the data into R with the following script:
tute1 <- readr::read_csv("https://raw.githubusercontent.com/Mikhail-Broomes/Data-624/main/Homeworks/Homework%201/tute1.csv")
## Rows: 100 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (3): Sales, AdBudget, GDP
## date (1): Quarter
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(tute1)
Convert the data to a time series (tsibble):
mytimeseries <- tute1 |>
mutate(Quarter = yearquarter(Quarter)) |>
as_tsibble(index = Quarter)
Construct time series plots of each of the three series
mytimeseries |>
pivot_longer(-Quarter) |>
ggplot(aes(x = Quarter, y = value, colour = name)) +
geom_line()
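What happens when facet_grid() is not included? All three series are drawn in one panel on a single shared y-axis and are distinguished only by colour. With facet_grid(name ~ ., scales = "free_y") added, each series instead gets its own panel with its own y-axis scale.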
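The single-panel result follows from the shape of the data after `pivot_longer(-Quarter)`: all three series end up in one `value` column, so one y-axis has to span the smallest and largest values across all of them. A base-R sketch of the same reshaping with `stack()` is below; the numbers are made up, not the tute1 values.

```r
# Made-up wide data with the same column names as tute1 (values are not real)
wide <- data.frame(Quarter  = c("1981 Q1", "1981 Q2"),
                   Sales    = c(10, 11),
                   AdBudget = c(5, 6),
                   GDP      = c(100, 101))

# stack() puts the value columns one after another, like pivot_longer(-Quarter)
long <- data.frame(Quarter = rep(wide$Quarter, times = 3),
                   stack(wide[, -1]))
names(long) <- c("Quarter", "value", "name")
long$name <- as.character(long$name)

# A single shared axis must cover this whole range ...
range(long$value)
# ... while each series on its own spans a much narrower range
tapply(long$value, long$name, range)
```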
2.4
The USgas package contains data on the demand for natural gas in the US.
Install the USgas package. Create a tsibble from us_total with year as the index and state as the key.
library(USgas)
## Warning: package 'USgas' was built under R version 4.3.3
us_tibble <- us_total %>%
as_tsibble(index = year, key = state)
us_tibble
Plot the annual natural gas consumption by state for the New England area (comprising the states of Maine, Vermont, New Hampshire, Massachusetts, Connecticut and Rhode Island).
new_england_tibble <- us_tibble %>%
filter(state %in% c("Maine", "Vermont", "New Hampshire", "Massachusetts", "Connecticut", "Rhode Island")) %>%
autoplot(y) +
labs(title = "Annual Natural Gas Consumption by State in New England",
y = "Consumption (million cubic feet)")
new_england_tibble
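Because `%in%` simply returns `FALSE` for a name that does not exist, a misspelled state such as "Vermon" is dropped by `filter()` without any error or warning. A small base-R check catches this kind of typo before plotting; here `available` is a hypothetical stand-in for `unique(us_total$state)`.

```r
# Hypothetical stand-in for unique(us_total$state)
available <- c("Connecticut", "Maine", "Massachusetts", "New Hampshire",
               "New York", "Rhode Island", "Vermont")

# New England states as typed, including a misspelling
requested <- c("Maine", "Vermon", "New Hampshire", "Massachusetts",
               "Connecticut", "Rhode Island")

# Names that would silently match nothing in filter(state %in% requested)
unmatched <- setdiff(requested, available)
if (length(unmatched) > 0) {
  message("Not found: ", paste(unmatched, collapse = ", "))
}
```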
2.5
Download tourism.xlsx from the book website and read it into R using readxl::read_excel().
library(readxl)
## Warning: package 'readxl' was built under R version 4.3.3
url <- "https://raw.githubusercontent.com/Mikhail-Broomes/Data-624/main/Homeworks/Homework%201/tourism.xlsx"
temp_file <- tempfile(fileext = ".xlsx")
download.file(url, temp_file, mode = "wb")
df_tourism <- read_excel(temp_file)
print(df_tourism)
## # A tibble: 24,320 × 5
## Quarter Region State Purpose Trips
## <chr> <chr> <chr> <chr> <dbl>
## 1 1998-01-01 Adelaide South Australia Business 135.
## 2 1998-04-01 Adelaide South Australia Business 110.
## 3 1998-07-01 Adelaide South Australia Business 166.
## 4 1998-10-01 Adelaide South Australia Business 127.
## 5 1999-01-01 Adelaide South Australia Business 137.
## 6 1999-04-01 Adelaide South Australia Business 200.
## 7 1999-07-01 Adelaide South Australia Business 169.
## 8 1999-10-01 Adelaide South Australia Business 134.
## 9 2000-01-01 Adelaide South Australia Business 154.
## 10 2000-04-01 Adelaide South Australia Business 169.
## # ℹ 24,310 more rows
Create a tsibble which is identical to the tourism tsibble from the tsibble package.
??tourism
tsibble::tourism
tourism_tsibble <- df_tourism %>%
mutate(Quarter = yearquarter(Quarter)) %>%
as_tsibble(index = Quarter, key = c("Region", "State", "Purpose"))
tourism_tsibble
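The `yearquarter()` step turns the first day of each quarter, as stored in the spreadsheet (e.g. "1998-01-01"), into a quarter label, and `as_tsibble()` requires every combination of key (Region, State, Purpose) and index (Quarter) to appear only once. A base-R sketch of both ideas follows, with a hypothetical helper `to_year_quarter()` and a few made-up rows; it illustrates the idea and is not tsibble's own implementation.

```r
# Hypothetical helper: map a date string to a "YYYY Qn" label
to_year_quarter <- function(date_string) {
  d <- as.Date(date_string)
  year <- as.integer(format(d, "%Y"))
  quarter <- (as.integer(format(d, "%m")) - 1) %/% 3 + 1  # months 1-3 -> Q1, etc.
  paste0(year, " Q", quarter)
}

labels <- to_year_quarter(c("1998-01-01", "1998-04-01", "1998-07-01", "1998-10-01"))
print(labels)

# Made-up rows: each key (Region, State, Purpose) needs one row per quarter
rows <- data.frame(
  Quarter = to_year_quarter(c("1998-01-01", "1998-04-01", "1998-01-01")),
  Region  = "Adelaide",
  State   = "South Australia",
  Purpose = c("Business", "Business", "Holiday")
)
# 0 means no duplicated key/index combination, as as_tsibble() requires
anyDuplicated(rows[c("Region", "State", "Purpose", "Quarter")])
```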
Find what combination of Region and Purpose had the maximum number of overnight trips on average.
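tourism_tsibble %>%
as_tibble() %>% # drop the Quarter index so mean() averages over all quarters
group_by(Region, Purpose) %>%
summarise(mean_trips = mean(Trips), .groups = "drop") %>%
filter(mean_trips == max(mean_trips))
Converting to a tibble first matters: on a tsibble, summarise() keeps the Quarter index, so mean() would be computed within each quarter and the filter would pick out a single large quarter rather than the highest average. The row returned here gives the Region and Purpose combination with the maximum number of overnight trips on average across all quarters.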
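The difference between "highest average over all quarters" and "largest single quarter" can be seen on a toy example. In the base-R sketch below, the two regions "A" and "B" and all the trip counts are made up: region A has one very large quarter, while region B is consistently higher on average.

```r
# Made-up quarterly trips for two hypothetical regions
trips <- data.frame(
  Region  = rep(c("A", "B"), each = 4),
  Purpose = "Visiting",
  Quarter = rep(c("2017 Q1", "2017 Q2", "2017 Q3", "2017 Q4"), times = 2),
  Trips   = c(100, 100, 100, 300, 160, 160, 160, 160)
)

# Average over all quarters for each Region/Purpose pair
avg <- aggregate(Trips ~ Region + Purpose, data = trips, FUN = mean)
best_avg <- avg[which.max(avg$Trips), ]

# Largest single quarterly value, which a per-quarter summary would surface
best_single <- trips[which.max(trips$Trips), ]

print(best_avg)     # Region B, average 160
print(best_single)  # Region A, single quarter of 300
```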
Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.
tourism_tsibble %>%
group_by(State) %>%
summarise(Trips = sum(Trips)) %>%
as_tsibble(index = Quarter, key = State)
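Summing over Regions and Purposes while keeping the quarterly index amounts to a grouped sum by State and Quarter; on a tsibble, summarise() keeps the index automatically. The same aggregation in base R is sketched below on made-up rows with placeholder state and region names (not the real tourism data).

```r
# Made-up rows with placeholder names (not real tourism data)
trips <- data.frame(
  State   = c("S1", "S1", "S1", "S1", "S2", "S2"),
  Region  = c("R1", "R1", "R2", "R2", "R3", "R3"),
  Quarter = c("1998 Q1", "1998 Q2", "1998 Q1", "1998 Q2", "1998 Q1", "1998 Q2"),
  Trips   = c(10, 12, 3, 4, 8, 9)
)

# One total per State and Quarter, summed over Regions
state_totals <- aggregate(Trips ~ State + Quarter, data = trips, FUN = sum)
print(state_totals)
```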
2.8
Use the following graphics functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() and explore features from the following time series: “Total Private” Employed from us_employment, Bricks from aus_production, Hare from pelt, “H02” Cost from PBS, and Barrels from us_gasoline.
Can you spot any seasonality, cyclicity and trend?
us_employment
us_employment %>%
filter(Title == "Total Private") %>%
autoplot(Employed) +
labs(title = "Time Series of US Employment",
subtitle = "Total Private")
us_employment %>%
filter(Title == "Total Private") %>%
gg_season(Employed) +
labs(title = "Seasonal Plot of US Employment",
subtitle = "Total Private")
us_employment %>%
filter(Title == "Total Private") %>%
gg_subseries(Employed) +
labs(title = "Subseries Plot of US Employment",
subtitle = "Total Private")
us_employment %>%
filter(Title == "Total Private") %>%
gg_lag(Employed) +
labs(title = "Lag Plot of US Employment",
subtitle = "Total Private")
us_employment %>%
filter(Title == "Total Private") %>%
ACF(Employed) %>%
autoplot()
The Total Private employment series shows a generally upward trend over the whole period. It does not show a strong seasonal pattern: the seasonal and subseries plots look similar from month to month. The lag and autocorrelation plots both point to a series dominated by its trend: in the lag plots the points lie close to the diagonal, and the autocorrelations are large and positive, decreasing only slowly as the lag increases, because each value is close to the ones before it.
Over the full series there are a few periods where employment falls. These rises and falls do not occur at a fixed frequency, so no regular cycle length can be identified; in the book's terms, rises and falls without a fixed frequency are cyclic behaviour, as distinct from seasonality.
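Why a trend produces large, slowly decaying autocorrelations can be seen on a made-up straight-line series (not the employment data). In the base-R sketch below, `stats::acf()` computes sample autocorrelations of the kind `ACF()` plots, and the Pearson correlation between the series and its own previous value summarises the first panel of a lag plot.

```r
# Made-up trended series: a straight line of 40 observations
x <- as.numeric(1:40)

# Sample autocorrelations for lags 0..8 (lag 0 is always 1)
r <- drop(acf(x, lag.max = 8, plot = FALSE)$acf)
round(r, 3)

# Correlation between each value and the one before it (first lag-plot panel)
lag1_cor <- cor(x[-1], x[-length(x)])
lag1_cor
```

Here the lag-1 correlation is exactly 1, while `acf()` gives 0.925 at lag 1 and smaller values at longer lags; the two differ because `acf()` centres on the overall mean and divides by the full-series variance.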
aus_production %>%
autoplot(Bricks) +
labs(title = "Time Series of Bricks Produced",
subtitle = "Australia")
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
aus_production %>%
gg_season(Bricks) +
labs(title = "Seasonal Plot of Bricks Produced",
subtitle = "Australia")
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
aus_production %>%
gg_subseries(Bricks) +
labs(title = "Subseries Plot of Bricks Produced",
subtitle = "Australia")
## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_line()`).
aus_production %>%
gg_lag(Bricks) +
labs(title = "Lag Plot of Bricks Produced",
subtitle = "Australia")
## Warning: Removed 20 rows containing missing values (gg_lag).
aus_production %>%
ACF(Bricks) %>%
autoplot()
The Bricks series shows clear quarterly seasonality, with a similar pattern repeating each year. There is also a general upward trend, with brick production increasing over time. The lag plots show strong positive correlation between each observation and the ones before it, and together with the ACF this suggests that the series is driven by both the trend and the seasonal pattern.
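Quarterly seasonality leaves a recognisable signature in the ACF: peaks at multiples of the seasonal period. The base-R sketch below uses a made-up, purely seasonal quarterly pattern (not the Bricks data) with `stats::acf()`.

```r
# Made-up quarterly pattern repeated for 10 years (no trend, no noise)
x <- rep(c(10, 14, 12, 8), times = 10)

r <- drop(acf(x, lag.max = 8, plot = FALSE)$acf)
names(r) <- 0:8
round(r, 3)
# The largest non-zero-lag values occur at lags 4 and 8 (one and two years)
```

In a real series such as Bricks, a trend adds large positive autocorrelations at short lags on top of these seasonal peaks, so the two effects appear together in the ACF.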
pelt %>%
autoplot(Hare) +
labs(title = "Time Series of Hare Pelts",
subtitle = "Canada")
pelt %>%
gg_subseries(Hare) +
labs(title = "Subseries Plot of Hare Pelts",
subtitle = "Canada")
pelt %>%
gg_lag(Hare) +
labs(title = "Lag Plot of Hare Pelts",
subtitle = "Canada")
pelt %>%
ACF(Hare) %>%
autoplot()
The hare pelts series shows a clear cyclical pattern, with the number of pelts collected rising and falling in cycles of roughly ten years. Because the data are annual, there is no seasonal pattern, which is why gg_season() is not used here. There is no clear long-term trend: the series rises and falls around a broadly stable level rather than increasing steadily over time. The lag plot shows strong positive correlation between each observation and the one before it, and the ACF alternates between positive and negative values, as expected for a cyclic series.
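A cyclic series produces an ACF that swings between positive and negative values: strongly negative around half a cycle and positive again around a full cycle. The base-R sketch below uses a made-up sine wave with an assumed 10-year period; real population cycles vary in length, so their ACF peaks are less sharp.

```r
# Made-up cyclic series: a sine wave with a 10-year period over 50 years
yr <- 1:50
x <- sin(2 * pi * yr / 10)

r <- drop(acf(x, lag.max = 12, plot = FALSE)$acf)
names(r) <- 0:12
round(r[c("5", "10")], 3)  # about -0.9 at half a cycle, 0.8 at a full cycle
```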
PBS %>%
filter(ATC2 == "H02") %>%
autoplot(Cost) +
labs(title = "Time Series of H02 Cost",
subtitle = "PBS")
PBS %>%
filter(ATC2 == "H02") %>%
gg_season(Cost)
PBS %>%
filter(ATC2 == "H02") %>%
gg_subseries(Cost) +
labs(title = "Subseries Plot of H02 Cost",
subtitle = "PBS")
PBS %>%
filter(ATC2 == "H02") %>%
ACF(Cost) %>%
autoplot()
The H02 cost series show clear monthly seasonality, with a similar pattern repeating within each year, and a general upward trend, with costs increasing over time. The autocorrelation plots are consistent with series influenced by both the trend and the seasonal pattern.
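A subseries plot groups the observations by season, and `gg_subseries()` also marks the mean of each season with a horizontal line. The same seasonal means can be computed in base R with `cycle()` on a monthly `ts` object, as sketched below; the values are made up, not the PBS costs.

```r
# Made-up monthly series: a fixed within-year pattern plus a yearly step up
pattern <- c(5, 3, 4, 6, 7, 8, 9, 9, 8, 7, 6, 10)
x <- ts(rep(pattern, times = 3) + rep(0:2, each = 12),
        start = c(2000, 1), frequency = 12)

# cycle() gives the month (1-12) of each observation
month_means <- tapply(as.numeric(x), cycle(x), mean)
month_means  # each month's pattern value plus 1 (the mean of 0, 1 and 2)
```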
us_gasoline %>%
autoplot(Barrels) +
labs(title = "Time Series of Barrels of Gasoline",
subtitle = "US")
us_gasoline %>%
gg_season(Barrels) +
labs(title = "Seasonal Plot of Barrels of Gasoline",
subtitle = "US")
us_gasoline %>%
gg_subseries(Barrels) +
labs(title = "Subseries Plot of Barrels of Gasoline",
subtitle = "US")
us_gasoline %>%
gg_lag(Barrels) +
labs(title = "Lag Plot of Barrels of Gasoline",
subtitle = "US")
us_gasoline %>%
ACF(Barrels) %>%
autoplot()
The gasoline series shows a seasonal pattern, with the number of barrels rising and falling in a regular way within each year (the data are weekly, so one seasonal cycle spans about 52 weeks). There is also a general upward trend, with the number of barrels increasing over time. The lag plots show strong positive correlation between each observation and the ones before it, suggesting that the series is influenced by both the trend and the seasonal pattern.