Assignment 1 involves answering questions 2.1, 2.2, 2.3, 2.4, 2.5, 2.8 from the textbook Forecasting: principles and practice by Rob J Hyndman and George Athanasopoulos.
Explore the following four time series: Bricks from aus_production, Lynx from pelt, Close from gafa_stock, Demand from vic_elec.
The aus_production dataset looks at the Quartery production of specific commordities in Australia. This includes Beer, Tobacco, Bricks, Cement, Electricity, and Gas.
The time interval is Quarterly.
aus_production
## # A tsibble: 218 x 7 [1Q]
## Quarter Beer Tobacco Bricks Cement Electricity Gas
## <qtr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1956 Q1 284 5225 189 465 3923 5
## 2 1956 Q2 213 5178 204 532 4436 6
## 3 1956 Q3 227 5297 208 561 4806 7
## 4 1956 Q4 308 5681 197 570 4418 6
## 5 1957 Q1 262 5577 187 529 4339 5
## 6 1957 Q2 228 5651 214 604 4811 7
## 7 1957 Q3 236 5317 227 603 5259 7
## 8 1957 Q4 320 6152 222 582 4735 6
## 9 1958 Q1 272 5758 199 554 4608 5
## 10 1958 Q2 233 5641 229 620 5196 7
## # ℹ 208 more rows
aus_production <- aus_production %>%
select(Quarter, Bricks)
autoplot(aus_production, Bricks) +
labs(y = "$ (hundreds)",
title = "Quartery Production of Bricks")
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
The pelt dataset contains the Hudson Bay Company trading records for Snowshow Hare and Canadian Lynx furs.
The time interval is yearly.
pelt
## # A tsibble: 91 x 3 [1Y]
## Year Hare Lynx
## <dbl> <dbl> <dbl>
## 1 1845 19580 30090
## 2 1846 19600 45150
## 3 1847 19610 49150
## 4 1848 11990 39520
## 5 1849 28040 21230
## 6 1850 58000 8420
## 7 1851 74600 5560
## 8 1852 75090 5080
## 9 1853 88480 10170
## 10 1854 61280 19600
## # ℹ 81 more rows
pelt_l <- pelt %>%
select(Year, Lynx)
autoplot(pelt_l, Lynx) +
labs(y = "$ (thousands)",
title = "Trading Numbers for Canadian Lynx")
The gafa_stock dataset looks at stock prices from 2014-2018 for Google, Amazon, Facebook and Apple.
The time interval is daily.
gafa_stock
## # A tsibble: 5,032 x 8 [!]
## # Key: Symbol [4]
## Symbol Date Open High Low Close Adj_Close Volume
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AAPL 2014-01-02 79.4 79.6 78.9 79.0 67.0 58671200
## 2 AAPL 2014-01-03 79.0 79.1 77.2 77.3 65.5 98116900
## 3 AAPL 2014-01-06 76.8 78.1 76.2 77.7 65.9 103152700
## 4 AAPL 2014-01-07 77.8 78.0 76.8 77.1 65.4 79302300
## 5 AAPL 2014-01-08 77.0 77.9 77.0 77.6 65.8 64632400
## 6 AAPL 2014-01-09 78.1 78.1 76.5 76.6 65.0 69787200
## 7 AAPL 2014-01-10 77.1 77.3 75.9 76.1 64.5 76244000
## 8 AAPL 2014-01-13 75.7 77.5 75.7 76.5 64.9 94623200
## 9 AAPL 2014-01-14 76.9 78.1 76.8 78.1 66.1 83140400
## 10 AAPL 2014-01-15 79.1 80.0 78.8 79.6 67.5 97909700
## # ℹ 5,022 more rows
gafa_stock <- gafa_stock %>%
select(Date, Close)
autoplot(gafa_stock, Close) +
labs(y = "Closing Price",
title = "Closing Price for GAFA Stocks")
Vic_elec contains half-hourly electricity demand for Victoria, Australia.
The time interval is half-hourly
vic_elec
## # A tsibble: 52,608 x 5 [30m] <Australia/Melbourne>
## Time Demand Temperature Date Holiday
## <dttm> <dbl> <dbl> <date> <lgl>
## 1 2012-01-01 00:00:00 4383. 21.4 2012-01-01 TRUE
## 2 2012-01-01 00:30:00 4263. 21.0 2012-01-01 TRUE
## 3 2012-01-01 01:00:00 4049. 20.7 2012-01-01 TRUE
## 4 2012-01-01 01:30:00 3878. 20.6 2012-01-01 TRUE
## 5 2012-01-01 02:00:00 4036. 20.4 2012-01-01 TRUE
## 6 2012-01-01 02:30:00 3866. 20.2 2012-01-01 TRUE
## 7 2012-01-01 03:00:00 3694. 20.1 2012-01-01 TRUE
## 8 2012-01-01 03:30:00 3562. 19.6 2012-01-01 TRUE
## 9 2012-01-01 04:00:00 3433. 19.1 2012-01-01 TRUE
## 10 2012-01-01 04:30:00 3359. 19.0 2012-01-01 TRUE
## # ℹ 52,598 more rows
vic_elec <- vic_elec %>%
select(Time, Demand)
autoplot(vic_elec, Demand) +
labs(y = "Demand in MWh",
title = "Electricity Demand for Victoria")
Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.
max_close <- gafa_stock %>%
group_by(Symbol) %>%
filter(Close == max(Close, na.rm = TRUE)) %>%
select(Symbol, Date, Close)
max_close
## # A tsibble: 4 x 3 [!]
## # Key: Symbol [4]
## # Groups: Symbol [4]
## Symbol Date Close
## <chr> <date> <dbl>
## 1 AAPL 2018-10-03 232.
## 2 AMZN 2018-09-04 2040.
## 3 FB 2018-07-25 218.
## 4 GOOG 2018-07-26 1268.
Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents.
tute1 <- readr::read_csv("tute1.csv")
## Rows: 100 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (3): Sales, AdBudget, GDP
## date (1): Quarter
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(tute1)
mytimeseries <- tute1 |>
mutate(Quarter = yearquarter(Quarter)) |>
as_tsibble(index = Quarter)
mytimeseries |>
pivot_longer(-Quarter) |>
ggplot(aes(x = Quarter, y = value, colour = name)) +
geom_line() +
facet_grid(name ~ ., scales = "free_y")
When you don’t include facet_grid, all three variables share the same y-axis
mytimeseries |>
pivot_longer(-Quarter) |>
ggplot(aes(x = Quarter, y = value, colour = name)) +
geom_line()
The USgas package contains data on the demand for natural gas in the US.
Install the USgas package.
Create a tsibble from us_total with year as the index and state as the key.
c.Plot the annual natural gas consumption by state for the New England area (comprising the states of Maine, Vermont, New Hampshire, Massachusetts, Connecticut and Rhode Island)
colnames(us_total)
## [1] "year" "state" "y"
us_total <- us_total %>%
as_tsibble(key = c(state),
index = year)
us_total_ne <- us_total %>%
filter(state == c("Maine","Vermont","New Hampshire","Massachusetts","Connecticut", "Rhode Island"))
autoplot(us_total_ne, y) +
labs(y = "Gas Consumption",
title = "Annual Gas Consumption ")
Download tourism.xlsx from the book website and read it into R using readxl::read_excel().
Create a tsibble which is identical to the tourism tsibble from the tsibble package.
Find what combination of Region and Purpose had the maximum number of overnight trips on average.
Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.
The combination of Melbourne (Region) and Visiting (Purpose) had the maximum number of overnight trips on average (985).
tourismxl <- readxl::read_excel("tourism.xlsx")
tourismxl <- tourismxl %>%
mutate(Quarter = yearquarter(Quarter)) %>%
as_tsibble(key = c(Region, State, Purpose),
index = Quarter)
tourismxl
## # A tsibble: 24,320 x 5 [1Q]
## # Key: Region, State, Purpose [304]
## Quarter Region State Purpose Trips
## <qtr> <chr> <chr> <chr> <dbl>
## 1 1998 Q1 Adelaide South Australia Business 135.
## 2 1998 Q2 Adelaide South Australia Business 110.
## 3 1998 Q3 Adelaide South Australia Business 166.
## 4 1998 Q4 Adelaide South Australia Business 127.
## 5 1999 Q1 Adelaide South Australia Business 137.
## 6 1999 Q2 Adelaide South Australia Business 200.
## 7 1999 Q3 Adelaide South Australia Business 169.
## 8 1999 Q4 Adelaide South Australia Business 134.
## 9 2000 Q1 Adelaide South Australia Business 154.
## 10 2000 Q2 Adelaide South Australia Business 169.
## # ℹ 24,310 more rows
tourism
## # A tsibble: 24,320 x 5 [1Q]
## # Key: Region, State, Purpose [304]
## Quarter Region State Purpose Trips
## <qtr> <chr> <chr> <chr> <dbl>
## 1 1998 Q1 Adelaide South Australia Business 135.
## 2 1998 Q2 Adelaide South Australia Business 110.
## 3 1998 Q3 Adelaide South Australia Business 166.
## 4 1998 Q4 Adelaide South Australia Business 127.
## 5 1999 Q1 Adelaide South Australia Business 137.
## 6 1999 Q2 Adelaide South Australia Business 200.
## 7 1999 Q3 Adelaide South Australia Business 169.
## 8 1999 Q4 Adelaide South Australia Business 134.
## 9 2000 Q1 Adelaide South Australia Business 154.
## 10 2000 Q2 Adelaide South Australia Business 169.
## # ℹ 24,310 more rows
tourismxl_2 <- tourismxl %>%
group_by(Region, Purpose) %>%
summarize(Average_trips = mean(Trips, na.rm = TRUE)) %>%
ungroup() %>%
filter(Average_trips == max(Average_trips))
print(tourismxl_2)
## # A tsibble: 1 x 4 [1Q]
## # Key: Region, Purpose [1]
## Region Purpose Quarter Average_trips
## <chr> <chr> <qtr> <dbl>
## 1 Melbourne Visiting 2017 Q4 985.
state_total_tsibble <- tourismxl %>%
group_by(State) %>%
summarize(Total_Trips = sum(Trips, na.rm = TRUE)) %>%
as_tsibble(index = Quarter, key = State)
print(state_total_tsibble)
## # A tsibble: 640 x 3 [1Q]
## # Key: State [8]
## State Quarter Total_Trips
## <chr> <qtr> <dbl>
## 1 ACT 1998 Q1 551.
## 2 ACT 1998 Q2 416.
## 3 ACT 1998 Q3 436.
## 4 ACT 1998 Q4 450.
## 5 ACT 1999 Q1 379.
## 6 ACT 1999 Q2 558.
## 7 ACT 1999 Q3 449.
## 8 ACT 1999 Q4 595.
## 9 ACT 2000 Q1 600.
## 10 ACT 2000 Q2 557.
## # ℹ 630 more rows
Use the following graphics functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() and explore features from the following time series: “Total Private” Employed from us_employment, Bricks from aus_production, Hare from pelt, “H02” Cost from PBS, and Barrels from us_gasoline.
Looking specifically at “Total Private” Employed individuals, I can observe the following:
Repeated spikes and drops on specific times of the year, there is over-all long term growth in the chart, and occasional cycles.
There is seasonality, trend and cyclic behavior observed. January, in particular appears to be a down year each year.
An unusual year is the dip in 2008 which corresponds to the recession of 2008.
glimpse(us_employment)
## Rows: 143,412
## Columns: 4
## Key: Series_ID [148]
## $ Month <mth> 1939 Jan, 1939 Feb, 1939 Mar, 1939 Apr, 1939 May, 1939 Jun, …
## $ Series_ID <chr> "CEU0500000001", "CEU0500000001", "CEU0500000001", "CEU05000…
## $ Title <chr> "Total Private", "Total Private", "Total Private", "Total Pr…
## $ Employed <dbl> 25338, 25447, 25833, 25801, 26113, 26485, 26481, 26848, 2746…
us_employment_tp <- us_employment %>%
filter(Title == "Total Private")
autoplot(us_employment_tp,Employed) +
labs(y = "Employment",
title = "US Employment Data")
us_employment_tp %>%
gg_season(Employed, labels ="both") +
labs(y = "Employment",
title = "Seasonal plot:US Employment Data")
us_employment_tp %>%
gg_subseries(Employed) +
labs(y = "Employment",
title = "US Employment Data")
us_employment_tp %>%
gg_lag(Employed, geom = "point") +
labs(x = "lag(Employed, k)")
us_employment_tp %>%
ACF(Employed, lag_max = 9)
## # A tsibble: 9 x 3 [1M]
## # Key: Series_ID [1]
## Series_ID lag acf
## <chr> <cf_lag> <dbl>
## 1 CEU0500000001 1M 0.997
## 2 CEU0500000001 2M 0.993
## 3 CEU0500000001 3M 0.990
## 4 CEU0500000001 4M 0.986
## 5 CEU0500000001 5M 0.983
## 6 CEU0500000001 6M 0.980
## 7 CEU0500000001 7M 0.977
## 8 CEU0500000001 8M 0.974
## 9 CEU0500000001 9M 0.971
There is seasonality present in the data. As there is a consistent pattern of fluctuations throughout the year. This could be linked to something like weather conditions or general demand.
There isn’t a clear trend as it’s unclear whether there is will be a decline or incline. From 1960 to 1980 however, we can see a clear upward trend however, afterwards there are periods sharp declines and some smaller upward movement.
Cyclic patterns are visible especially after 1980, with the declines and upward movement occuring at irregular intervals.
An usual year is around the early 1980s where there is a sharp decline in production.
glimpse(aus_production)
## Rows: 218
## Columns: 2
## $ Quarter <qtr> 1956 Q1, 1956 Q2, 1956 Q3, 1956 Q4, 1957 Q1, 1957 Q2, 1957 Q3,…
## $ Bricks <dbl> 189, 204, 208, 197, 187, 214, 227, 222, 199, 229, 249, 234, 20…
autoplot(aus_production,Bricks) +
labs(y = "Bricks",
title = "Brick Production in Australia")
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
aus_production %>%
gg_season(Bricks, labels ="both") +
labs(y = "Bricks",
title = "Brick Production in Australia")
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_text()`).
## Warning: Removed 6 rows containing missing values or values outside the scale range
## (`geom_text()`).
aus_production %>%
gg_subseries(Bricks) +
labs(y = "Bricks",
title = "Brick Production in Australia")
## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_line()`).
aus_production %>%
gg_lag(Bricks, geom = "point") +
labs(x = "lag(Employed, k)")
## Warning: Removed 20 rows containing missing values (gg_lag).
aus_production %>%
ACF(Bricks, lag_max = 9)
## # A tsibble: 9 x 2 [1Q]
## lag acf
## <cf_lag> <dbl>
## 1 1Q 0.900
## 2 2Q 0.815
## 3 3Q 0.813
## 4 4Q 0.828
## 5 5Q 0.720
## 6 6Q 0.642
## 7 7Q 0.655
## 8 8Q 0.692
## 9 9Q 0.609
There is no evidence for seasonality as there is no annual pattern of fluctuations. There are cycles that span multiple years.
The chart does not show a clear upward or downward trend.
There are however, noticeable cyclical patterns.
Something learned from the series is that the are long term peaks and sharp downward declines. This may be due to the availability of resources.
pelt_h <- pelt %>%
select(Year, Hare)
autoplot(pelt_h, Hare) +
labs(y = "Hare",
title = "Trading Records for Snowshoe Hare")
gg_subseries doesn’t work because it needs multiple categories.
pelt_h %>%
gg_subseries(Hare) +
labs(y = "Hares",
title = "Trading Records for Snowshoe Hare")
pelt_h %>%
gg_lag(Hare, geom = "point") +
labs(x = "lag(Hare, k)")
pelt_h %>%
ACF(Hare, lag_max = 9)
## # A tsibble: 9 x 2 [1Y]
## lag acf
## <cf_lag> <dbl>
## 1 1Y 0.658
## 2 2Y 0.214
## 3 3Y -0.155
## 4 4Y -0.401
## 5 5Y -0.493
## 6 6Y -0.401
## 7 7Y -0.168
## 8 8Y 0.113
## 9 9Y 0.307
All prescription types display seasonality as there are seasonal drops and rises that occur at times throughout the year.
There appears to be indications of an upward trend for both Concessional/Co-payments/H/H02, and Concessional/Safety net/H/H02.
There aren’t long term cycles outside of the seasonal patterns.
The upward trend for Concessional/Co-payments and Concessional/Safety net indicates there is an increase in growth and overall use. The Concessional category performs the best compared to the General category.
unique(PBS$ATC2)
## [1] "A01" "A02" "A03" "A04" "A05" "A06" "A07" "A09" "A10" "A11" "A12" "A14"
## [13] "A15" "B01" "B02" "B03" "B05" "C01" "C02" "C03" "C04" "C05" "C07" "C08"
## [25] "C09" "C10" "D" "D01" "D02" "D04" "D05" "D06" "D07" "D08" "D10" "D11"
## [37] "G01" "G02" "G03" "G04" "H01" "H02" "H03" "H04" "H05" "J01" "J02" "J04"
## [49] "J05" "J06" "J07" "L01" "L02" "L03" "L04" "M01" "M02" "M03" "M04" "M05"
## [61] "N02" "N03" "N04" "N05" "N06" "N07" "P01" "P02" "P03" "R" "R01" "R03"
## [73] "R05" "R06" "S" "S01" "S02" "S03" "V01" "V03" "V04" "V06" "V07" "Z"
PBS_H0 <- PBS %>%
filter(ATC2 == "H02")
autoplot(PBS_H0,Cost) +
labs(y = "Cost",
title = "Monthly Medicare for Prescription in Australia")
PBS_H0 %>%
gg_season(Cost, labels ="both") +
labs(y = "Prescription Costs",
title = "Monthly Medicare for Prescription in Australia")
PBS_H0 %>%
gg_subseries(Cost) +
labs(y = "Prescription Costs",
title = "Monthly Medicare for Prescription in Australia")
PBS_H0 %>%
ACF(Cost, lag_max = 9)
## # A tsibble: 36 x 6 [1M]
## # Key: Concession, Type, ATC1, ATC2 [4]
## Concession Type ATC1 ATC2 lag acf
## <chr> <chr> <chr> <chr> <cf_lag> <dbl>
## 1 Concessional Co-payments H H02 1M 0.834
## 2 Concessional Co-payments H H02 2M 0.679
## 3 Concessional Co-payments H H02 3M 0.514
## 4 Concessional Co-payments H H02 4M 0.352
## 5 Concessional Co-payments H H02 5M 0.264
## 6 Concessional Co-payments H H02 6M 0.219
## 7 Concessional Co-payments H H02 7M 0.253
## 8 Concessional Co-payments H H02 8M 0.337
## 9 Concessional Co-payments H H02 9M 0.464
## 10 Concessional Safety net H H02 1M 0.710
## # ℹ 26 more rows
There appears to be seasonality as there are periodic flunctuations ever year.
There is also an over upward growth trend.
There appears to be some cyclic behavior around the 2000s.
An unusual year is around 2008 where there is a big drop. This could be tied to the financial recession around this time.
autoplot(us_gasoline, Barrels) +
labs(y = "Barrels",
title = "US Finished Motor Gasoline Product Supplied")
us_gasoline %>%
gg_season(Barrels, labels ="both") +
labs(y = "Barrels",
title = "US Finished Motor Gasoline Product Supplied")
us_gasoline %>%
gg_subseries(Barrels) +
labs(y = "Barrels",
title = "US Finished Motor Gasoline Product Supplied")
us_gasoline %>%
gg_lag(Barrels, geom = "point") +
labs(x = "lag(Barrels, k)")
us_gasoline %>%
ACF(Barrels, lag_max = 9)
## # A tsibble: 9 x 2 [1W]
## lag acf
## <cf_lag> <dbl>
## 1 1W 0.893
## 2 2W 0.882
## 3 3W 0.873
## 4 4W 0.866
## 5 5W 0.847
## 6 6W 0.844
## 7 7W 0.832
## 8 8W 0.831
## 9 9W 0.822