Exercises from Section 2 of Forecasting: Principles & Practice at https://otexts.com/fpp3/graphics-exercises.html
# fpp3 attaches tsibble, tsibbledata, feasts, dplyr, tidyr and ggplot2, which the code below relies on
library(fpp3)
#?gafa_stock
#?PBS
#?vic_elec
#?pelt
autoplot(gafa_stock)
autoplot(vic_elec)
gafa_stock is daily but covers trading days only (no weekends or market holidays), so the tsibble reports an irregular interval
PBS is monthly
vic_elec is in half-hour (30 minute) increments
pelt is annual
gafa_stock %>%
  group_by(Symbol) %>%
  filter(Close == max(Close))
## # A tsibble: 4 x 8 [!]
## # Key: Symbol [4]
## # Groups: Symbol [4]
## Symbol Date Open High Low Close Adj_Close Volume
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AAPL 2018-10-03 230. 233. 230. 232. 230. 28654800
## 2 AMZN 2018-09-04 2026. 2050. 2013 2040. 2040. 5721100
## 3 FB 2018-07-25 216. 219. 214. 218. 218. 58954200
## 4 GOOG 2018-07-26 1251 1270. 1249. 1268. 1268. 2405600
tute1 <- readr::read_csv("tute1.csv")
#View(tute1)
# format as a tsibble; the data are quarterly, so use a quarterly index
mytimeseries <- tute1 %>%
  mutate(Quarter = yearquarter(Quarter)) %>%
  as_tsibble(index = Quarter)
mytimeseries %>%
  pivot_longer(-Quarter) %>%
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line() +
  facet_grid(name ~ ., scales = "free_y")
Without facet_grid(), all three series (Sales, AdBudget and GDP) are plotted on the same axes and share a single y-axis scale. In a different dataset the scales could make the lines overlap or flatten the smaller series, but here the three variables are measured differently and sit at different levels, so the lines don't appear to overlap.
mytimeseries %>%
  pivot_longer(-Quarter) %>%
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line()
library("USgas")
usgas <- us_total %>%
  as_tsibble(
    key = state,
    index = year
  )
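# use %in% (not ==) to match several states, and spell the state names as they appear in the data
autoplot(usgas %>%
  filter(state %in% c("Maine", "Vermont", "New Hampshire", "Massachusetts", "Connecticut", "Rhode Island")))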
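A note on filtering by several values: comparing a column to a vector with == recycles the vector position by position, while %in% tests membership. Here is a minimal base R sketch of the difference, using a made-up vector of state names (not the USgas data):

```r
# Hypothetical vector of state names, two rows per state (illustrative only)
state <- rep(c("Maine", "Vermont", "Texas"), each = 2)
wanted <- c("Maine", "Vermont")

# `==` recycles `wanted` along `state` and compares element by element,
# so rows are kept only where the positions happen to line up
by_equals <- state[state == wanted]

# `%in%` checks, for each element of `state`, whether it appears in `wanted`
by_in <- state[state %in% wanted]

length(by_equals)
length(by_in)
```

With == some matching rows are silently dropped, which is why a filter on several states is safer with %in%.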
tourism_raw <- readxl::read_excel("tourism.xlsx")
# make date class
tourism_raw$Quarter <- as.Date(tourism_raw$Quarter, "%Y-%m-%d")
# put in yyyy qq format
tourism_raw$Quarter <- yearquarter(tourism_raw$Quarter)
# put into tsibble; Trips is the measured variable, so it is not part of the key
tourism_2 <- tourism_raw %>%
  as_tsibble(key = c(Region, State, Purpose),
             index = Quarter)
# check tsibble from package to compare
#view(tourism)
# select needed variables, group by prompt, add trips_avg variable,
# drop Trips so you can de-dupe, then arrange by trips_avg in descending order
tourism_raw %>%
  select(c(Region, Purpose, Trips)) %>%
  group_by(Region, Purpose) %>%
  mutate(trips_avg = mean(Trips)) %>%
  select(-Trips) %>%
  distinct() %>%
  arrange(desc(trips_avg))
## # A tibble: 304 x 3
## # Groups: Region, Purpose [304]
## Region Purpose trips_avg
## <chr> <chr> <dbl>
## 1 Sydney Visiting 747.
## 2 Melbourne Visiting 619.
## 3 Sydney Business 602.
## 4 North Coast NSW Holiday 588.
## 5 Sydney Holiday 550.
## 6 Gold Coast Holiday 528.
## 7 Melbourne Holiday 507.
## 8 South Coast Holiday 495.
## 9 Brisbane Visiting 493.
## 10 Melbourne Business 478.
## # ... with 294 more rows
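As a cross-check on the mutate/select/distinct sequence above, the same per-group average can be computed in one step. This is a rough base R sketch with aggregate() on a tiny made-up data frame; the toy values are assumptions for illustration, not real tourism figures:

```r
# Made-up stand-in for tourism_raw: values are illustrative, not real data
toy <- data.frame(
  Region  = c("A", "A", "A", "A", "B", "B"),
  Purpose = c("Holiday", "Holiday", "Business", "Business", "Holiday", "Holiday"),
  Trips   = c(10, 20, 5, 7, 100, 50)
)

# One row per Region/Purpose pair, holding the mean of Trips in that group
trips_avg <- aggregate(Trips ~ Region + Purpose, data = toy, FUN = mean)

# Sort so the combination with the largest average comes first
trips_avg <- trips_avg[order(-trips_avg$Trips), ]
trips_avg
```

The first row of the sorted result is the Region and Purpose combination with the highest average, which is what the exercise asks for.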
If I’m interpreting this prompt correctly, the tsibble below keeps Quarter as the index and, for each quarter, attaches the State’s total trips to every purpose_region row. Because state_tot_trips is summed within State and Quarter, it is identical for every purpose_region in the same state and quarter: it is a state-level total, not a count for that particular purpose and region. Looking at the first few rows of data below, total trips to the ACT fell from about 551 to 416 between Q1 and Q2 of 1998.
tourism_states <- tourism_raw %>%
  select(c(Region, Purpose, Trips, State, Quarter)) %>%
  group_by(State, Quarter) %>%
  mutate(purpose_region = paste(Purpose, "-", Region),
         state_tot_trips = sum(Trips)) %>%
  select(-c(Region, Purpose, Trips)) %>%
  distinct() %>%
  as_tsibble(key = c(State, purpose_region),
             index = Quarter)
head(tourism_states)
## # A tsibble: 6 x 4 [1Q]
## # Key: State, purpose_region [1]
## # Groups: State @ Quarter [6]
## State Quarter purpose_region state_tot_trips
## <chr> <qtr> <chr> <dbl>
## 1 ACT 1998 Q1 Business - Canberra 551.
## 2 ACT 1998 Q2 Business - Canberra 416.
## 3 ACT 1998 Q3 Business - Canberra 436.
## 4 ACT 1998 Q4 Business - Canberra 450.
## 5 ACT 1999 Q1 Business - Canberra 379.
## 6 ACT 1999 Q2 Business - Canberra 558.
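To see what a sum taken within State and Quarter attaches to each row, here is a small base R sketch. It uses ave() as a stand-in for the grouped mutate() and made-up numbers (assumptions for illustration only):

```r
# Made-up rows for one state and two quarters (illustrative values only)
toy <- data.frame(
  State   = "ACT",
  Quarter = c("1998 Q1", "1998 Q1", "1998 Q2", "1998 Q2"),
  Purpose = c("Business", "Holiday", "Business", "Holiday"),
  Trips   = c(150, 200, 100, 120)
)

# ave() returns one value per row: the sum of Trips within State x Quarter,
# mirroring group_by(State, Quarter) followed by mutate(sum(Trips))
toy$state_tot_trips <- ave(toy$Trips, toy$State, toy$Quarter, FUN = sum)
toy
```

Every row in the same state and quarter carries the same total, whatever its Purpose, so the column should be read as a state-level figure.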
autoplot(aus_production, Bricks)
autoplot(gafa_stock, Close)
autoplot(vic_elec, Demand)
autoplot(pelt, Lynx) +
  labs(title = "Canadian Lynx furs trading records by year, 1845-1935",
       subtitle = "from the Hudson's Bay Company",
       y = "# pelts traded")
set.seed(8675309)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))
autoplot(myseries)
gg_season(myseries)
gg_subseries(myseries)
gg_lag(myseries)
myseries %>%
  ACF(Turnover) %>%
  autoplot()
The autoplot() graph shows a definite upward trend in retail turnover, with a hint of a repeating pattern that is hard to make out in this generic plot. The gg_season() plot again shows the more recent years (pink and purple) sitting higher on the turnover axis. In many years there also appears to be a dip in February and an increase in December. The gg_subseries() plot is a bit hard to read with each year-span so narrow, but the blue horizontal lines, which mark the mean for each month, do show the February dip and the slight December increase I noticed earlier.
The gg_lag() plot shows high positive correlation at every lag. The ACF plot decays slowly, which reflects the upward trend, and may show a very slight scalloped shape, which hints at mild seasonality layered on top of that trend.
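To get a feel for how trend and seasonality combine in an ACF, here is a rough base R sketch on a simulated monthly series. The slope, seasonal amplitude and noise level are arbitrary assumptions, not estimates from myseries, and base acf() stands in for the ACF() function used above:

```r
set.seed(1)

# Simulated series: 20 years of monthly data with a linear trend,
# a 12-month sine wave, and a little noise (all values assumed)
n <- 240
time_idx <- seq_len(n)
x <- 0.5 * time_idx + 20 * sin(2 * pi * time_idx / 12) + rnorm(n)

# Autocorrelations at lags 1..24 (drop lag 0, which is always 1)
r <- acf(x, lag.max = 24, plot = FALSE)$acf[-1]

# Compare mid-cycle lags (6, 18) with full-cycle lags (12, 24)
round(r[c(6, 12, 18, 24)], 2)
```

The trend keeps the autocorrelations high and slowly declining, while the seasonal term lifts the lags at multiples of 12 above their neighbours, which produces the scalloped shape.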
At this stage of exploration, I’d feel fairly safe saying that retail turnover trends upward and that it peaks consistently in December, drops in January, and falls to its lowest point in February. March through November appear to be relatively steady.