HW1: 2.1, 2.2, 2.3, 2.4, 2.5 and 2.8
library(fpp3)
## Warning: package 'fpp3' was built under R version 4.4.3
## Registered S3 method overwritten by 'tsibble':
## method from
## as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.2 ──
## ✔ tibble 3.2.1 ✔ tsibble 1.1.6
## ✔ dplyr 1.1.4 ✔ tsibbledata 0.4.1
## ✔ tidyr 1.3.1 ✔ feasts 0.4.2
## ✔ lubridate 1.9.3 ✔ fable 0.5.0
## ✔ ggplot2 3.5.1
## Warning: package 'dplyr' was built under R version 4.4.2
## Warning: package 'ggplot2' was built under R version 4.4.2
## Warning: package 'tsibble' was built under R version 4.4.3
## Warning: package 'tsibbledata' was built under R version 4.4.3
## Warning: package 'feasts' was built under R version 4.4.3
## Warning: package 'fabletools' was built under R version 4.4.3
## Warning: package 'fable' was built under R version 4.4.3
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tsibble::setdiff() masks base::setdiff()
## ✖ tsibble::union() masks base::union()
library(dplyr)
Explore the following four time series: Bricks from aus_production, Lynx from pelt, Close from gafa_stock, Demand from vic_elec.
Use ? (or help()) to find out about the data in each series.
What is the time interval of each series?
Use autoplot() to produce a time plot of each series.
For the last plot, modify the axis labels and title.
autoplot(aus_production, Bricks)
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
Bricks is clay brick production, measured in millions of bricks. The
time interval is quarterly.
autoplot(pelt, Lynx)
Lynx is the number of Canadian lynx pelts traded. The time interval is
annual.
autoplot(gafa_stock, Close)
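gafa_stock contains stock prices from 2014 to 2018 for Google, Amazon,
Facebook, and Apple. Close is each stock's daily closing price (in USD).
Because prices are only recorded on trading days, tsibble treats the
interval as irregular even though the data are daily.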
autoplot(vic_elec, Demand) +
labs(y="Demand in MWh", x="Time",
title="Electricity demand for Victoria, Australia")
Demand in vic_elec is the total electricity demand for Victoria,
Australia, recorded every half hour, so the time interval is 30 minutes.
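The intervals above can also be checked in code. This is a minimal sketch, assuming fpp3 is installed: tsibble's interval() reports the spacing it detected in each index, and is_regular() returns FALSE when no fixed spacing was found, which is expected for gafa_stock because weekends and holidays are missing.

```r
library(fpp3)

# tsibble::interval() (not lubridate::interval()) reports each index's spacing
tsibble::interval(aus_production)  # quarterly
tsibble::interval(pelt)            # annual
tsibble::interval(gafa_stock)      # irregular: trading days only
tsibble::interval(vic_elec)        # half-hourly

# is_regular() is FALSE when tsibble could not detect a fixed spacing
regular_flags <- c(
  aus_production = tsibble::is_regular(aus_production),
  pelt = tsibble::is_regular(pelt),
  gafa_stock = tsibble::is_regular(gafa_stock),
  vic_elec = tsibble::is_regular(vic_elec)
)
regular_flags
```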
Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.
gafa_stock |>
group_by(Symbol) |>
filter(Close == max(Close)) |>
select(Symbol,Date, Close)
The highest closing prices for all four stocks occurred in 2018. Amazon had the highest peak close at 2039.51, and Facebook had the lowest peak close at 217.50.
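To compare the four peaks directly, the per-stock maxima can be sorted. This is a sketch of an alternative to the filter() approach, assuming fpp3 is loaded; slice_max() with with_ties = FALSE keeps exactly one row per stock even if a maximum close was repeated.

```r
library(fpp3)

peaks <- gafa_stock |>
  as_tibble() |>                                 # drop tsibble structure for plain grouping
  group_by(Symbol) |>
  slice_max(Close, n = 1, with_ties = FALSE) |>  # highest close per stock
  ungroup() |>
  select(Symbol, Date, Close) |>
  arrange(desc(Close))                           # largest peak first
peaks
```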
Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.
tute1 <- readr::read_csv("tute1.csv")
## Rows: 100 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (3): Sales, AdBudget, GDP
## date (1): Quarter
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(tute1)
mytimeseries <- tute1 |>
mutate(Quarter = yearquarter(Quarter)) |>
as_tsibble(index = Quarter)
mytimeseries |>
pivot_longer(-Quarter) |>
ggplot(aes(x = Quarter, y = value, colour = name)) +
geom_line() +
facet_grid(name ~ ., scales = "free_y")
d. Check what happens when you don’t include facet_grid().
mytimeseries |>
pivot_longer(-Quarter) |>
ggplot(aes(x = Quarter, y = value, colour = name)) +
geom_line()
Removing facet_grid() puts all three series on the same chart instead of
in separate panels. In this case the values of Sales, AdBudget, and GDP
are very different from each other, so the lines do not overlap.
However, if the values did overlap, it would be harder to see the shape
of each series without facet_grid().
The USgas package contains data on the demand for natural gas in the US.
library(USgas)
## Warning: package 'USgas' was built under R version 4.4.3
gas <- us_total |>
as_tsibble(index=year, key=state)
gas
new_eng <- c("Maine", "Vermont", "New Hampshire", "Massachusetts", "Connecticut", "Rhode Island")
gas |>
filter(state %in% new_eng) |>
autoplot(y) +
labs(x = "Year", y ="Annual Gas Consumption")
The chart shows annual gas consumption in these six states over time.
Connecticut has the highest consumption, and it is trending upward. The
other states do not appear to have a trend and seem relatively stable.
Because the lines overlap, I tried facet_grid() to see the shape of each
line more clearly.
gas |>
filter(state %in% new_eng) |>
autoplot(y) +
labs(x = "Year", y = "Annual Gas Consumption") +
facet_grid(state ~ ., scales = "free_y")
It is easier to see the line shapes now, but harder to compare the
states with each other. However, we can see that Massachusetts and
Vermont do seem to be trending upward, which was not clear in the
initial graph. There also seems to be a cyclical pattern in
Massachusetts gas consumption, as there are regular ups and downs. Gas
consumption seems to have increased in New Hampshire between 2002 and
2005 and in Maine between 2000 and 2002, but both have been steady
since. Interestingly, Rhode Island shows the opposite: consumption
decreased from 1997 to 2000 but has been relatively steady since.
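To put a number on the "trending upward" impressions, one rough option is the slope of a straight-line fit of consumption on year for each state. This is a sketch, assuming the USgas package is installed; a single linear slope ignores cycles and level shifts, so it only summarises the overall direction.

```r
library(fpp3)
library(USgas)

new_eng <- c("Maine", "Vermont", "New Hampshire", "Massachusetts",
             "Connecticut", "Rhode Island")

# Rough linear-trend slope per state: change in y per year from lm(y ~ year)
gas_trend <- us_total |>
  filter(state %in% new_eng) |>
  group_by(state) |>
  summarise(slope = coef(lm(y ~ year))[["year"]]) |>
  arrange(desc(slope))
gas_trend
```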
tourism <- readxl::read_excel("tourism.xlsx")
tourism_df <- tourism |>
mutate(Quarter = yearquarter(Quarter)) |>
as_tsibble(index = Quarter, key = c(Region, State, Purpose))
tourism_df
tourism_df |>
# as_tibble() drops the Quarter index so mean() averages over all quarters
as_tibble() |>
group_by(Region, Purpose) |>
summarize(avg_trip = mean(Trips), .groups = "drop") |>
arrange(desc(avg_trip))
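To pull out just the Region and Purpose combination with the largest average, slice_max() can be applied to the averages. This sketch assumes, as the exercise implies, that tsibble's built-in tourism tsibble holds the same data as tourism.xlsx. Converting to a tibble first matters: summarising a tsibble directly computes one value per Quarter rather than one average over all quarters.

```r
library(fpp3)

# Assumption: tsibble::tourism matches the data read from tourism.xlsx
avg_trips <- tsibble::tourism |>
  as_tibble() |>                                 # remove the time index
  group_by(Region, Purpose) |>
  summarise(avg_trip = mean(Trips), .groups = "drop")

# The single combination with the highest average number of trips
top_combo <- avg_trips |> slice_max(avg_trip, n = 1, with_ties = FALSE)
top_combo
```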
Note: This question was a little unclear to me, possibly just because of the wording. I grouped by State and summed Trips, which gives a tsibble of total trips per State per Quarter.
tourism_df_trips <- tourism_df |>
group_by(State) |>
summarize(total_trips = sum(Trips))
print(head(tourism_df_trips))
## # A tsibble: 6 x 3 [1Q]
## # Key: State [1]
## State Quarter total_trips
## <chr> <qtr> <dbl>
## 1 ACT 1998 Q1 551.
## 2 ACT 1998 Q2 416.
## 3 ACT 1998 Q3 436.
## 4 ACT 1998 Q4 450.
## 5 ACT 1999 Q1 379.
## 6 ACT 1999 Q2 558.
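A quick sanity check on the State totals is that adding them back up for each quarter should reproduce the national total. This is a sketch, again assuming tsibble::tourism matches tourism.xlsx; summarise() on a tsibble keeps the Quarter index, so each step yields one row per quarter.

```r
library(fpp3)

# Assumption: tsibble::tourism matches the data read from tourism.xlsx
state_totals <- tsibble::tourism |>
  group_by(State) |>
  summarise(total_trips = sum(Trips))  # one row per State per Quarter

# Collapse the states back together, one row per Quarter
national_from_states <- state_totals |>
  summarise(total = sum(total_trips)) |>
  arrange(Quarter)

# National total computed directly from the original data
national_direct <- tsibble::tourism |>
  summarise(total = sum(Trips)) |>
  arrange(Quarter)

all.equal(national_from_states$total, national_direct$total)
```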
Use the following graphics functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() and explore features from the following time series: “Total Private” Employed from us_employment, Bricks from aus_production, Hare from pelt, “H02” Cost from PBS, and Barrels from us_gasoline.
glimpse(us_employment)
## Rows: 143,412
## Columns: 4
## Key: Series_ID [148]
## $ Month <mth> 1939 Jan, 1939 Feb, 1939 Mar, 1939 Apr, 1939 May, 1939 Jun, …
## $ Series_ID <chr> "CEU0500000001", "CEU0500000001", "CEU0500000001", "CEU05000…
## $ Title <chr> "Total Private", "Total Private", "Total Private", "Total Pr…
## $ Employed <dbl> 25338, 25447, 25833, 25801, 26113, 26485, 26481, 26848, 2746…
employed_df <- us_employment |>
filter(Title == "Total Private")
autoplot(employed_df)
## Plot variable not specified, automatically selected `.vars = Employed`
gg_season(employed_df)
## Warning: `gg_season()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_season()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Plot variable not specified, automatically selected `y = Employed`
gg_subseries(employed_df)
## Warning: `gg_subseries()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_subseries()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Plot variable not specified, automatically selected `y = Employed`
gg_lag(employed_df)
## Warning: `gg_lag()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_lag()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Plot variable not specified, automatically selected `y = Employed`
ACF(employed_df, Employed) |>
autoplot()
From autoplot(), we can see a general upward trend and possibly some seasonality. gg_season() suggests there is no strong seasonal pattern, since each year's line is fairly flat. gg_subseries() shows that employment follows a similar pattern in every month: the blue line (the monthly mean) sits at a similar level each month. We can also see a clear drop in employment in 2008, which is consistent with the Great Recession during that period.
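The visual reading of "strong trend, weak seasonality" can be checked with STL-based feature strengths. This is a sketch using feasts' feat_stl(), assuming fpp3 is loaded; both strengths range from 0 (no trend or seasonality) to 1 (very strong), so a high trend_strength alongside a low seasonal strength would support the interpretation above.

```r
library(fpp3)

employed_df <- us_employment |>
  filter(Title == "Total Private")

# feat_stl() fits an STL decomposition and returns summary features,
# including trend and seasonal strength on a 0-1 scale
employment_strength <- employed_df |>
  features(Employed, feat_stl) |>
  select(trend_strength, starts_with("seasonal_strength"))
employment_strength
```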
bricks_df <- aus_production |>
select(Bricks)
autoplot(bricks_df, Bricks)
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
gg_season(bricks_df,Bricks)
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
gg_subseries(bricks_df, Bricks)
## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_line()`).
gg_lag(bricks_df, Bricks)
## Warning: Removed 20 rows containing missing values (gg_lag).
ACF(bricks_df, Bricks) |>
autoplot()
The plots show an overall downward trend in brick production, following
an earlier period of growth. There also seems to be a seasonal pattern,
with production rising from Q1 to Q2 and falling from Q3 to Q4. There
are several outliers, all of which point downward.
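The within-year pattern can be summarised numerically by averaging production for each quarter of the year. This is a sketch, assuming fpp3 is loaded; the 20 missing Bricks values are dropped first, and as.Date() is used to turn each yearquarter into a date so lubridate::quarter() can extract the quarter number.

```r
library(fpp3)

bricks_by_qtr <- aus_production |>
  as_tibble() |>
  filter(!is.na(Bricks)) |>                              # drop missing quarters
  mutate(qtr = lubridate::quarter(as.Date(Quarter))) |>  # quarter number, 1 to 4
  group_by(qtr) |>
  summarise(mean_bricks = mean(Bricks))
bricks_by_qtr
```

Averaging over the whole history mixes the growth and decline periods, so this only indicates the typical direction of movement between quarters, not its exact size in any one year.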
hare_df <- pelt |>
select(Hare)
autoplot(pelt, Hare)
gg_subseries(hare_df, Hare)
gg_lag(hare_df, Hare)
ACF(hare_df, Hare) |>
autoplot()
There does not appear to be a trend or clear seasonality. There does
seem to be cyclicity, with a spike and dip roughly every 10 years.
gg_season() could not be used for this data because the series is
annual, so there is no within-year (seasonal or monthly) pattern to plot.
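A rough way to estimate the cycle length is to find the lag with the largest positive autocorrelation once the first few lags are excluded. This is a heuristic sketch, not a formal test: it uses base R's stats::acf() so the values can be inspected as plain numbers, and the cutoff of 5 years is an arbitrary assumption chosen to skip the short-lag correlation.

```r
library(fpp3)

# Base R autocorrelations for lags 0 to 20 (pelt is already ordered by Year)
hare_acf <- stats::acf(pelt$Hare, lag.max = 20, plot = FALSE)
acf_vals <- as.numeric(hare_acf$acf)[-1]  # drop lag 0, which is always 1
lags <- as.numeric(hare_acf$lag)[-1]

# Lag (in years) of the highest autocorrelation at lag 5 or beyond
cycle_guess <- lags[lags >= 5][which.max(acf_vals[lags >= 5])]
cycle_guess
```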
glimpse(PBS)
## Rows: 67,596
## Columns: 9
## Key: Concession, Type, ATC1, ATC2 [336]
## $ Month <mth> 1991 Jul, 1991 Aug, 1991 Sep, 1991 Oct, 1991 Nov, 1991 Dec,…
## $ Concession <chr> "Concessional", "Concessional", "Concessional", "Concession…
## $ Type <chr> "Co-payments", "Co-payments", "Co-payments", "Co-payments",…
## $ ATC1 <chr> "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",…
## $ ATC1_desc <chr> "Alimentary tract and metabolism", "Alimentary tract and me…
## $ ATC2 <chr> "A01", "A01", "A01", "A01", "A01", "A01", "A01", "A01", "A0…
## $ ATC2_desc <chr> "STOMATOLOGICAL PREPARATIONS", "STOMATOLOGICAL PREPARATIONS…
## $ Scripts <dbl> 18228, 15327, 14775, 15380, 14371, 15028, 11040, 15165, 168…
## $ Cost <dbl> 67877.00, 57011.00, 55020.00, 57222.00, 52120.00, 54299.00,…
h02_df <- PBS |>
filter(ATC2 == "H02") |>
select(Cost)
autoplot(h02_df,Cost)
gg_season(h02_df, Cost)
gg_subseries(h02_df,Cost)
ACF(h02_df, Cost) |>
autoplot()
This is one of the harder series to read with autoplot() because H02 contains four separate series, one for each combination of Concession (Concessional/General) and Type (Co-payments/Safety net). Both the Concessional/Co-payments and Concessional/Safety net series trend upward with seasonality. The General/Co-payments series does not show any clear trend or seasonal/cyclical pattern. The General/Safety net series shows a seasonal pattern but no trend.
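The four series come from the key columns: select(Cost) keeps the tsibble's keys, so each Concession and Type pair remains its own series. This sketch, assuming fpp3 is loaded, confirms the count and lists the combinations.

```r
library(fpp3)

h02_df <- PBS |>
  filter(ATC2 == "H02") |>
  select(Cost)  # keys (Concession, Type, ATC1, ATC2) are kept automatically

# Number of separate series in the tsibble
n_keys(h02_df)

# Which Concession/Type combinations they correspond to
h02_df |>
  as_tibble() |>
  distinct(Concession, Type)
```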
glimpse(us_gasoline)
## Rows: 1,355
## Columns: 2
## $ Week <week> 1991 W06, 1991 W07, 1991 W08, 1991 W09, 1991 W10, 1991 W11, 1…
## $ Barrels <dbl> 6.621, 6.433, 6.582, 7.224, 6.875, 6.947, 7.328, 6.777, 7.503,…
gas_df <- us_gasoline
autoplot(gas_df, Barrels)
gg_season(gas_df, Barrels)
gg_subseries(gas_df, Barrels)
gg_lag(gas_df, Barrels)
ACF(gas_df, Barrels) |>
autoplot()
There was a period of upward growth until around 2005, after which the
series seems to level off, and it is not clear whether the upward trend
will return. There also seem to be seasonal effects, but the pattern is
not very clear. There also appear to be a few outliers, but given the
variation in the data it is unclear whether they are true outliers.
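Averaging the weekly values within each year smooths out the week-to-week noise and makes the leveling off around 2005 easier to check. This is a sketch, assuming fpp3 is loaded; the year is taken from the date of each week's start via as.Date(), so a week that begins in late December is counted in that calendar year.

```r
library(fpp3)

# Aggregate weekly observations to yearly averages with tsibble's index_by()
gas_yearly <- us_gasoline |>
  index_by(Year = ~ lubridate::year(as.Date(.))) |>
  summarise(avg_barrels = mean(Barrels))
gas_yearly

autoplot(gas_yearly, avg_barrels)
```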