Below I load the necessary packages
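A minimal setup for the code that follows looks like this (the fpp3 meta-package attaches tsibble, tsibbledata, feasts, fable, dplyr, tidyr and ggplot2; readxl and USgas are loaded later where they are used):
# Assumed setup: fpp3 attaches the tidyverts packages used throughout this
# assignment (tsibble, tsibbledata, feasts, fable) plus dplyr, tidyr and ggplot2.
library(fpp3)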
We are tasked with completing questions 2.1, 2.2, 2.3, 2.4, 2.5 and 2.8 from chapter 2 of “Forecasting: Principles and Practice” Third Edition by Rob Hyndman and George Athanasopoulos.
We are tasked with exploring four different time series. Below I start with the Bricks time series from aus_production. I used the ? function to get a description of the data set. The data set contains quarterly production data for selected commodities in Australia, captured as a tsibble.
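For reference, the help pages for the four data sets (all shipped with the tsibbledata package) can be pulled up as follows:
# Open the documentation pages for the data sets explored in this question
?aus_production
?pelt
?gafa_stock
?vic_elec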
bricks <- aus_production |> select(Quarter, Bricks)
head(bricks)
## # A tsibble: 6 x 2 [1Q]
## Quarter Bricks
## <qtr> <dbl>
## 1 1956 Q1 189
## 2 1956 Q2 204
## 3 1956 Q3 208
## 4 1956 Q4 197
## 5 1957 Q1 187
## 6 1957 Q2 214
I use autoplot to create a plot of the time series
bricks |> autoplot() + labs(title = 'Quarterly Bricks Production')
## Plot variable not specified, automatically selected `.vars = Bricks`
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
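The message and warning above are benign: autoplot() picked the only measured variable, and the Bricks series contains 20 missing quarters. A sketch that names the variable explicitly and drops the missing quarters:
# Name the plotted variable and drop quarters where Bricks is missing
bricks |>
  filter(!is.na(Bricks)) |>
  autoplot(Bricks) +
  labs(title = 'Quarterly Bricks Production')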
I used the ? function to get a description of the data set. The data set captures Hudson Bay Company trading records for Snowshoe Hare and Canadian Lynx furs from 1845 to 1935.
lynx <- pelt |> select(Year, Lynx)
head(lynx)
## # A tsibble: 6 x 2 [1Y]
## Year Lynx
## <dbl> <dbl>
## 1 1845 30090
## 2 1846 45150
## 3 1847 49150
## 4 1848 39520
## 5 1849 21230
## 6 1850 8420
The data is captured annually
I use autoplot to create a plot of the time series
lynx |> autoplot() + labs(title = 'Annual Lynx Fur Trading Records')
## Plot variable not specified, automatically selected `.vars = Lynx`
I used the ? function to get a description of the data set. The data set captures historical stock prices from 2014-2018 for Google, Amazon, Facebook, and Apple in $USD.
head(gafa_stock)
## # A tsibble: 6 x 8 [!]
## # Key: Symbol [1]
## Symbol Date Open High Low Close Adj_Close Volume
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AAPL 2014-01-02 79.4 79.6 78.9 79.0 67.0 58671200
## 2 AAPL 2014-01-03 79.0 79.1 77.2 77.3 65.5 98116900
## 3 AAPL 2014-01-06 76.8 78.1 76.2 77.7 65.9 103152700
## 4 AAPL 2014-01-07 77.8 78.0 76.8 77.1 65.4 79302300
## 5 AAPL 2014-01-08 77.0 77.9 77.0 77.6 65.8 64632400
## 6 AAPL 2014-01-09 78.1 78.1 76.5 76.6 65.0 69787200
The data is recorded daily, but only on trading days, so the index is irregular (there is no trading activity on weekends or holidays).
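Because of these gaps the tsibble has no fixed interval (the [!] in the header above). If a regular index were ever needed, one option is to re-index each stock by trading day; the sketch below uses tsibble's update_tsibble(), and the names gafa_regular and trading_day are my own.
# Re-index each stock by consecutive trading day to obtain a regular tsibble
gafa_regular <- gafa_stock |>
  group_by_key() |>
  mutate(trading_day = row_number()) |>
  ungroup() |>
  update_tsibble(index = trading_day, regular = TRUE)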
I select the Date and closing price. There are four time series in this data set: the closing price for each of the four stocks.
close <- gafa_stock |> select(Date, Close)
head(close)
## # A tsibble: 6 x 3 [!]
## # Key: Symbol [1]
## Date Close Symbol
## <date> <dbl> <chr>
## 1 2014-01-02 79.0 AAPL
## 2 2014-01-03 77.3 AAPL
## 3 2014-01-06 77.7 AAPL
## 4 2014-01-07 77.1 AAPL
## 5 2014-01-08 77.6 AAPL
## 6 2014-01-09 76.6 AAPL
I use autoplot to create a plot of the time series
close |> autoplot() + labs(title = 'Daily close price')
## Plot variable not specified, automatically selected `.vars = Close`
Given the different price levels, it is clearer to plot each time series on its own panel:
close |>
  ggplot(aes(x = Date, y = Close, color = Symbol)) +
  geom_line() +
  facet_grid(Symbol ~ ., scales = 'free_y') +
  labs(title = 'Daily close price')
I used the ? function to get a description of the data set. The data set contains half-hourly (30-minute interval) electricity demand data for Victoria, Australia.
demand <- vic_elec |> select(Time, Demand)
head(demand)
## # A tsibble: 6 x 2 [30m] <Australia/Melbourne>
## Time Demand
## <dttm> <dbl>
## 1 2012-01-01 00:00:00 4383.
## 2 2012-01-01 00:30:00 4263.
## 3 2012-01-01 01:00:00 4049.
## 4 2012-01-01 01:30:00 3878.
## 5 2012-01-01 02:00:00 4036.
## 6 2012-01-01 02:30:00 3866.
I use autoplot to create a plot of the time series
demand |> autoplot() + labs(title = 'Half-Hourly Electricity Demand for Victoria, Australia')
## Plot variable not specified, automatically selected `.vars = Demand`
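At this scale the within-day pattern is hard to see; below is a sketch that zooms in on two weeks using tsibble's filter_index() to show the daily cycle.
# Zoom in on the first two weeks of January 2014 to reveal the daily pattern
demand |>
  filter_index("2014-01-01" ~ "2014-01-14") |>
  autoplot(Demand) +
  labs(title = 'Half-hourly electricity demand, 1-14 January 2014')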
Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.
It should be noted that while the question asks us to find the day of the week corresponding to the maximum value of Close, it would be more accurate to use Adj_Close, which adjusts for dividends, stock splits, etc.
unique(gafa_stock$Symbol)
## [1] "AAPL" "AMZN" "FB" "GOOG"
googleMax <- gafa_stock |> filter(Symbol=='GOOG') |> slice_max(Close)
weekdays(as.Date(googleMax$Date))
## [1] "Thursday"
aapleMax <- gafa_stock |> filter(Symbol=='AAPL') |> slice_max(Close)
weekdays(as.Date(aapleMax$Date))
## [1] "Wednesday"
facebookMax <- gafa_stock |> filter(Symbol=='FB') |> slice_max(Close)
weekdays(as.Date(facebookMax$Date))
## [1] "Wednesday"
amazonMax <- gafa_stock |> filter(Symbol=='AMZN') |> slice_max(Close)
weekdays(as.Date(amazonMax$Date))
## [1] "Tuesday"
We are tasked with importing a CSV file containing quarterly data for a company and performing some basic analysis.
Note that the file is saved in my working directory. To reproduce this code, save tute1.csv in your working directory first.
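A small defensive sketch before reading, so a missing file fails with a clearer message:
# Fail early with an informative message if the CSV is not in the working directory
if (!file.exists("tute1.csv")) stop("tute1.csv not found in ", getwd())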
tute1 <- readr::read_csv("tute1.csv")
## Rows: 100 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (3): Sales, AdBudget, GDP
## date (1): Quarter
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(tute1)
Converting the data to time series
mytimeseries <- tute1 |>
  mutate(Quarter = yearquarter(Quarter)) |>
  as_tsibble(index = Quarter)
Constructing time series plots
mytimeseries |>
  pivot_longer(-Quarter) |>
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line() +
  facet_grid(name ~ ., scales = "free_y")
When we don't include facet_grid(), we get the result below: all three time series are plotted on a single graph.
mytimeseries |>
  pivot_longer(-Quarter) |>
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line()
library(USgas)
We are tasked with creating a tsibble from us_total with year as the index and state as the key.
First, I examine the structure of us_total
head(us_total)
## year state y
## 1 1997 Alabama 324158
## 2 1998 Alabama 329134
## 3 1999 Alabama 337270
## 4 2000 Alabama 353614
## 5 2001 Alabama 332693
## 6 2002 Alabama 379343
Next, I create a tsibble
gasTsibble <- us_total |> as_tsibble(index = year, key = state)
head(gasTsibble)
## # A tsibble: 6 x 3 [1Y]
## # Key: state [1]
## year state y
## <int> <chr> <int>
## 1 1997 Alabama 324158
## 2 1998 Alabama 329134
## 3 1999 Alabama 337270
## 4 2000 Alabama 353614
## 5 2001 Alabama 332693
## 6 2002 Alabama 379343
Next, we have to plot gas consumption for the New England states: Maine, Vermont, New Hampshire, Massachusetts, Connecticut, and Rhode Island. I filter with %in% so that every observation for these six states is kept.
newEnglandGas <- gasTsibble |>
  filter(state %in% c('Maine', 'Vermont', 'New Hampshire', 'Massachusetts', 'Connecticut', 'Rhode Island'))
newEnglandGas |>
  ggplot(aes(x = year, y = y, color = state)) +
  geom_line() +
  facet_grid(state ~ ., scales = "free_y")
We are instructed to download an Excel file that contains tourism data and import it via the read_excel function.
I downloaded the file into my working directory; to reproduce this code, save tourism.xlsx in your working directory before running it.
excelTourism <- readxl::read_excel('tourism.xlsx')
head(excelTourism)
## # A tibble: 6 × 5
## Quarter Region State Purpose Trips
## <chr> <chr> <chr> <chr> <dbl>
## 1 1998-01-01 Adelaide South Australia Business 135.
## 2 1998-04-01 Adelaide South Australia Business 110.
## 3 1998-07-01 Adelaide South Australia Business 166.
## 4 1998-10-01 Adelaide South Australia Business 127.
## 5 1999-01-01 Adelaide South Australia Business 137.
## 6 1999-04-01 Adelaide South Australia Business 200.
We are then instructed to create a tsibble that is identical to the tourism tsibble in the tsibble package. First, I check what the tourism tsibble looks like
head(tourism)
## # A tsibble: 6 x 5 [1Q]
## # Key: Region, State, Purpose [1]
## Quarter Region State Purpose Trips
## <qtr> <chr> <chr> <chr> <dbl>
## 1 1998 Q1 Adelaide South Australia Business 135.
## 2 1998 Q2 Adelaide South Australia Business 110.
## 3 1998 Q3 Adelaide South Australia Business 166.
## 4 1998 Q4 Adelaide South Australia Business 127.
## 5 1999 Q1 Adelaide South Australia Business 137.
## 6 1999 Q2 Adelaide South Australia Business 200.
Next, I convert my excelTourism data frame to a tsibble that matches the above. Before I can convert the data frame, I need to change the Quarter column into a date class that tsibble recognizes.
excelTourism$Quarter <- yearquarter(excelTourism$Quarter)
head(excelTourism)
## # A tibble: 6 × 5
## Quarter Region State Purpose Trips
## <qtr> <chr> <chr> <chr> <dbl>
## 1 1998 Q1 Adelaide South Australia Business 135.
## 2 1998 Q2 Adelaide South Australia Business 110.
## 3 1998 Q3 Adelaide South Australia Business 166.
## 4 1998 Q4 Adelaide South Australia Business 127.
## 5 1999 Q1 Adelaide South Australia Business 137.
## 6 1999 Q2 Adelaide South Australia Business 200.
Next, I convert the data frame to a tsibble
excelTourismTsibble <- as_tsibble(excelTourism, key = c('Region', 'State', 'Purpose'),
                                  index = Quarter)
head(excelTourismTsibble)
## # A tsibble: 6 x 5 [1Q]
## # Key: Region, State, Purpose [1]
## Quarter Region State Purpose Trips
## <qtr> <chr> <chr> <chr> <dbl>
## 1 1998 Q1 Adelaide South Australia Business 135.
## 2 1998 Q2 Adelaide South Australia Business 110.
## 3 1998 Q3 Adelaide South Australia Business 166.
## 4 1998 Q4 Adelaide South Australia Business 127.
## 5 1999 Q1 Adelaide South Australia Business 137.
## 6 1999 Q2 Adelaide South Australia Business 200.
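As a quick sanity check, the rebuilt tsibble can be compared against the packaged version; a sketch using base R's all.equal():
# all.equal() returns TRUE, or a character vector describing any differences
all.equal(excelTourismTsibble, tourism)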
Next, we have to find what combination of Region and Purpose had the maximum number of overnight trips on average. I use the data frame for this rather than the tsibble because I want the average for each Region and Purpose across time, so I no longer need the temporal structure.
excelTourism |>
  group_by(Region, Purpose) |>
  summarise(averageTrips = mean(Trips)) |>
  ungroup() |>
  head(n = 10)
## `summarise()` has grouped output by 'Region'. You can override using the
## `.groups` argument.
## # A tibble: 10 × 3
## Region Purpose averageTrips
## <chr> <chr> <dbl>
## 1 Adelaide Business 156.
## 2 Adelaide Holiday 157.
## 3 Adelaide Other 56.6
## 4 Adelaide Visiting 205.
## 5 Adelaide Hills Business 2.66
## 6 Adelaide Hills Holiday 10.5
## 7 Adelaide Hills Other 1.40
## 8 Adelaide Hills Visiting 14.2
## 9 Alice Springs Business 14.6
## 10 Alice Springs Holiday 31.9
excelTourism |>
  group_by(Region, Purpose) |>
  summarise(averageTrips = mean(Trips)) |>
  ungroup() |>
  slice_max(averageTrips)
## `summarise()` has grouped output by 'Region'. You can override using the
## `.groups` argument.
## # A tibble: 1 × 3
## Region Purpose averageTrips
## <chr> <chr> <dbl>
## 1 Sydney Visiting 747.
The combination with the maximum number of overnight trips on average was Sydney for the Visiting purpose.
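Equivalently, the same summary can be computed from the tsibble itself by dropping its temporal index first; a sketch:
# Convert to a plain tibble, then average Trips over all quarters per Region/Purpose
excelTourismTsibble |>
  as_tibble() |>
  group_by(Region, Purpose) |>
  summarise(averageTrips = mean(Trips), .groups = "drop") |>
  slice_max(averageTrips)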
Lastly, we are instructed to create a new tsibble that aggregates over Purpose and Region and holds only the total trips by State.
totalTripsByStateTsibble <- excelTourismTsibble |>
  group_by(State) |>
  summarise(total_Trips = sum(Trips))
totalTripsByStateTsibble |> head(n=10)
## # A tsibble: 10 x 3 [1Q]
## # Key: State [1]
## State Quarter total_Trips
## <chr> <qtr> <dbl>
## 1 ACT 1998 Q1 551.
## 2 ACT 1998 Q2 416.
## 3 ACT 1998 Q3 436.
## 4 ACT 1998 Q4 450.
## 5 ACT 1999 Q1 379.
## 6 ACT 1999 Q2 558.
## 7 ACT 1999 Q3 449.
## 8 ACT 1999 Q4 595.
## 9 ACT 2000 Q1 600.
## 10 ACT 2000 Q2 557.
We are instructed to explore the features of several time series using autoplot(), gg_season(), gg_subseries(), gg_lag() and ACF().
totalPrivate <- us_employment |> filter(Title == 'Total Private')
head(totalPrivate)
## # A tsibble: 6 x 4 [1M]
## # Key: Series_ID [1]
## Month Series_ID Title Employed
## <mth> <chr> <chr> <dbl>
## 1 1939 Jan CEU0500000001 Total Private 25338
## 2 1939 Feb CEU0500000001 Total Private 25447
## 3 1939 Mar CEU0500000001 Total Private 25833
## 4 1939 Apr CEU0500000001 Total Private 25801
## 5 1939 May CEU0500000001 Total Private 26113
## 6 1939 Jun CEU0500000001 Total Private 26485
Autoplot of Total Private
totalPrivate |> autoplot()
## Plot variable not specified, automatically selected `.vars = Employed`
gg_season of Total Private
totalPrivate |> gg_season(Employed)
gg_subseries of Total Private
totalPrivate |> gg_subseries(Employed)
gg_lag of Total Private
totalPrivate |> gg_lag(Employed, geom = 'point')
ACF - Total Private
totalPrivate |> ACF(Employed) |> autoplot()
Based on the above plots, I consider the following questions:
Can you spot any seasonality, cyclicity and trend?
What do you learn about the series?
What can you say about the seasonal patterns?
Can you identify any unusual years?
head(bricks)
## # A tsibble: 6 x 2 [1Q]
## Quarter Bricks
## <qtr> <dbl>
## 1 1956 Q1 189
## 2 1956 Q2 204
## 3 1956 Q3 208
## 4 1956 Q4 197
## 5 1957 Q1 187
## 6 1957 Q2 214
Autoplot of Bricks
bricks |> autoplot()
## Plot variable not specified, automatically selected `.vars = Bricks`
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
gg_season of Bricks
bricks |> gg_season(Bricks)
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
gg_subseries of Bricks
bricks |> gg_subseries(Bricks)
## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_line()`).
gg_lag of Bricks
bricks |> gg_lag(Bricks, geom = 'point')
## Warning: Removed 20 rows containing missing values (gg_lag).
ACF - Bricks
bricks |> ACF(Bricks) |> autoplot()
Based on the above plots, I consider the following questions:
Can you spot any seasonality, cyclicity and trend?
What do you learn about the series?
What can you say about the seasonal patterns?
Can you identify any unusual years?
hare <- pelt |> select(Year, Hare)
hare <- as_tsibble(hare, index = Year)
head(hare)
## # A tsibble: 6 x 2 [1Y]
## Year Hare
## <dbl> <dbl>
## 1 1845 19580
## 2 1846 19600
## 3 1847 19610
## 4 1848 11990
## 5 1849 28040
## 6 1850 58000
Autoplot of Hare
hare |> autoplot(Hare)
gg_season of Hare
gg_season won't work for this tsibble because the data is annual. The function is meant to show seasonal patterns within each year, which we cannot do with only yearly observations.
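For completeness, this is the call that would fail, left commented out:
# gg_season() needs a seasonal period within the year (e.g. quarterly or monthly
# data), so it errors on this annual series; the call is therefore commented out.
# hare |> gg_season(Hare)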
gg_subseries of Hare
Given that the data is annual, the chart below is similar to the autoplot chart, though it also displays the mean value of the series.
hare |> gg_subseries(Hare)
gg_lag of Hare
hare |> gg_lag(Hare, geom = 'point')
ACF - Hare
hare |> ACF(Hare) |> autoplot()
Based on the above plots, I consider the following questions:
Can you spot any seasonality, cyclicity and trend?
What do you learn about the series?
What can you say about the seasonal patterns?
Can you identify any unusual years?
h02 <- PBS |> filter(ATC2 =='H02')
head(h02)
## # A tsibble: 6 x 9 [1M]
## # Key: Concession, Type, ATC1, ATC2 [1]
## Month Concession Type ATC1 ATC1_desc ATC2 ATC2_desc Scripts Cost
## <mth> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 1991 Jul Concessional Co-payme… H Systemic… H02 CORTICOS… 63261 317384
## 2 1991 Aug Concessional Co-payme… H Systemic… H02 CORTICOS… 53528 269891
## 3 1991 Sep Concessional Co-payme… H Systemic… H02 CORTICOS… 52822 269703
## 4 1991 Oct Concessional Co-payme… H Systemic… H02 CORTICOS… 54016 280418
## 5 1991 Nov Concessional Co-payme… H Systemic… H02 CORTICOS… 49281 268070
## 6 1991 Dec Concessional Co-payme… H Systemic… H02 CORTICOS… 51798 277139
Since the question asks us to analyze H02 without considering Concession and Type, I tally the total Cost for H02 across those categories.
h02Total <- h02 |> summarise(totalCost = sum(Cost))
head(h02Total)
## # A tsibble: 6 x 2 [1M]
## Month totalCost
## <mth> <dbl>
## 1 1991 Jul 429795
## 2 1991 Aug 400906
## 3 1991 Sep 432159
## 4 1991 Oct 492543
## 5 1991 Nov 502369
## 6 1991 Dec 602652
Autoplot of H02
h02Total |> autoplot(totalCost)
gg_season of H02
h02Total |> gg_season(totalCost)
gg_subseries of H02
h02Total |> gg_subseries(totalCost)
gg_lag of H02
h02Total |> gg_lag(totalCost, geom = 'point')
ACF - H02
h02Total |> ACF(totalCost) |> autoplot()
Based on the above plots, I consider the following questions:
Can you spot any seasonality, cyclicity and trend?
What do you learn about the series?
What can you say about the seasonal patterns?
Can you identify any unusual years?
head(us_gasoline)
## # A tsibble: 6 x 2 [1W]
## Week Barrels
## <week> <dbl>
## 1 1991 W06 6.62
## 2 1991 W07 6.43
## 3 1991 W08 6.58
## 4 1991 W09 7.22
## 5 1991 W10 6.88
## 6 1991 W11 6.95
Autoplot of barrels
us_gasoline |> autoplot(Barrels)
gg_season of barrels
us_gasoline |> gg_season(Barrels)
gg_subseries of barrels
Since the data is weekly, this graph is a bit noisy, though it does show that there tends to be a dip in US gasoline supplied around the third quarter of the year compared to the prior weeks.
us_gasoline |> gg_subseries(Barrels)
gg_lag of barrels
us_gasoline |> gg_lag(Barrels, geom = 'point')
ACF - barrels
us_gasoline |> ACF(Barrels) |> autoplot()
Based on the above plots, I consider the following questions:
Can you spot any seasonality, cyclicity and trend?
What do you learn about the series?
What can you say about the seasonal patterns?
Can you identify any unusual years?