Use the help function to explore what the series gafa_stock, PBS, vic_elec and pelt represent.
a. Use autoplot() to plot some of the series in these data sets.
#help allows us to explore each series in detail
help("gafa_stock")
help("PBS")
help("vic_elec")
help("pelt")
head(gafa_stock)
## # A tsibble: 6 x 8 [!]
## # Key: Symbol [1]
## Symbol Date Open High Low Close Adj_Close Volume
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AAPL 2014-01-02 79.4 79.6 78.9 79.0 67.0 58671200
## 2 AAPL 2014-01-03 79.0 79.1 77.2 77.3 65.5 98116900
## 3 AAPL 2014-01-06 76.8 78.1 76.2 77.7 65.9 103152700
## 4 AAPL 2014-01-07 77.8 78.0 76.8 77.1 65.4 79302300
## 5 AAPL 2014-01-08 77.0 77.9 77.0 77.6 65.8 64632400
## 6 AAPL 2014-01-09 78.1 78.1 76.5 76.6 65.0 69787200
gafa_stock %>% autoplot(Open)
autoplot(vic_elec, Demand) +
labs(title = "Electricity Demand",
subtitle = "Victoria - Australia",
y = "Demand (MWh)")
b. What is the time interval of each series?
interval(gafa_stock)
## <interval[1]>
## [1] !
interval(PBS)
## <interval[1]>
## [1] 1M
interval(vic_elec)
## <interval[1]>
## [1] 30m
interval(pelt)
## <interval[1]>
## [1] 1Y
gafa_stock: irregular (shown as !); there is one observation per trading day, so weekends and holidays are missing
PBS: One month
vic_elec: 30 minutes
pelt: One year
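The `!` reported for gafa_stock marks an irregular index. The following is a minimal sketch, assuming the tsibble package is installed, of how the same trading-day dates behave when declared irregular versus regular. The object names `toy_irregular` and `toy_regular` are illustrative and do not appear in the exercise.

```r
library(tsibble)

# Three trading days with a weekend gap between Jan 3 and Jan 6.
dates <- as.Date(c("2014-01-02", "2014-01-03", "2014-01-06"))
close <- c(79.0, 77.3, 77.7)

# Declared irregular: interval() reports `!`, as it does for gafa_stock.
toy_irregular <- tsibble(Date = dates, Close = close,
                         index = Date, regular = FALSE)

# Declared regular (the default): tsibble infers a spacing from the gaps.
toy_regular <- tsibble(Date = dates, Close = close, index = Date)

is_regular(toy_irregular)
is_regular(toy_regular)
interval(toy_irregular)
interval(toy_regular)
```

Treating stock data as irregular avoids pretending that the missing weekends are gaps that should be filled.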
Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.
the_output <- gafa_stock %>%
group_by(Symbol) %>%
filter(Close == max(Close)) %>%
arrange(desc(Close))
the_output
## # A tsibble: 4 x 8 [!]
## # Key: Symbol [4]
## # Groups: Symbol [4]
## Symbol Date Open High Low Close Adj_Close Volume
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AMZN 2018-09-04 2026. 2050. 2013 2040. 2040. 5721100
## 2 GOOG 2018-07-26 1251 1270. 1249. 1268. 1268. 2405600
## 3 AAPL 2018-10-03 230. 233. 230. 232. 230. 28654800
## 4 FB 2018-07-25 216. 219. 214. 218. 218. 58954200
Peak closing price for AMZN is on 2018-09-04 with price 2039.51
Peak closing price for GOOG is on 2018-07-26 with price 1268.33
Peak closing price for AAPL is on 2018-10-03 with price 232.07
Peak closing price for FB is on 2018-07-25 with price 217.50
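The group-then-filter pattern used above can also be written with dplyr's `slice_max()`. This is a minimal sketch on a made-up table: the symbols `AAA` and `BBB` and their prices are hypothetical, not taken from gafa_stock. On the real data the same pipeline could be tried after `as_tibble(gafa_stock)`; that step is an assumption and was not run here.

```r
library(dplyr)

# Toy prices (hypothetical values, not taken from gafa_stock).
prices <- tibble(
  Symbol = c("AAA", "AAA", "AAA", "BBB", "BBB", "BBB"),
  Date   = as.Date(rep(c("2018-01-02", "2018-01-03", "2018-01-04"), 2)),
  Close  = c(10, 12, 11, 50, 48, 49)
)

# One row per Symbol: the day with the highest closing price.
peaks <- prices %>%
  group_by(Symbol) %>%
  slice_max(Close, n = 1) %>%
  ungroup() %>%
  arrange(desc(Close))

peaks
```

Note that both `filter(Close == max(Close))` and `slice_max()` with its default `with_ties = TRUE` return more than one row per group if the peak price is reached on several days.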
Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.
tute1 <- read.csv("https://raw.githubusercontent.com/johnm1990/DATA624/main/tute1.csv")
head(tute1)
## Quarter Sales AdBudget GDP
## 1 1981-03-01 1020.2 659.2 251.8
## 2 1981-06-01 889.2 589.0 290.9
## 3 1981-09-01 795.0 512.5 290.8
## 4 1981-12-01 1003.9 614.1 292.4
## 5 1982-03-01 1057.7 647.2 279.1
## 6 1982-06-01 944.4 602.0 254.0
mytimeseries <- tute1 %>%
mutate(Quarter = yearquarter(Quarter)) %>%
as_tsibble(index = Quarter)
mytimeseries %>%
pivot_longer(-Quarter) %>%
ggplot(aes(x = Quarter, y = value, colour = name)) +
geom_line() +
facet_grid(name ~ ., scales = "free_y")
mytimeseries %>%
pivot_longer(-Quarter) %>%
ggplot(aes(x = Quarter, y = value, colour = name)) +
geom_line()
#facet_grid(name ~ ., scales = "free_y")
Check what happens when you don’t include facet_grid().
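Without facet_grid(), all three series are drawn in a single panel on one shared y-axis (note the value scale), so the series with smaller values, such as GDP, appear compressed and are harder to read.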
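To see why a shared axis hides detail, the ranges of the three series can be compared directly. This sketch uses base R and only the six rows printed by head(tute1) above, so the numbers describe those rows rather than the full file. The names `tute1_head` and `shared_span` are illustrative.

```r
# First six rows of tute1, copied from the head() output above.
tute1_head <- data.frame(
  Sales    = c(1020.2, 889.2, 795.0, 1003.9, 1057.7, 944.4),
  AdBudget = c(659.2, 589.0, 512.5, 614.1, 647.2, 602.0),
  GDP      = c(251.8, 290.9, 290.8, 292.4, 279.1, 254.0)
)

# Minimum (row 1) and maximum (row 2) of each series.
ranges <- sapply(tute1_head, range)
ranges

# Share of a common y-axis that each series would occupy.
shared_span <- max(ranges) - min(ranges)
round((ranges[2, ] - ranges[1, ]) / shared_span, 2)
```

In these rows GDP varies over only a small fraction of the span that a common axis has to cover, which is what `scales = "free_y"` in facet_grid() corrects.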
The USgas package contains data on the demand for natural gas in the US. a. Install the USgas package.
library(USgas)
us_total_tb <- us_total
us_total_tb <- us_total_tb %>%
as_tsibble(index = year, key = state)
head(us_total_tb)
## # A tsibble: 6 x 3 [1Y]
## # Key: state [1]
## year state y
## <int> <chr> <int>
## 1 1997 Alabama 324158
## 2 1998 Alabama 329134
## 3 1999 Alabama 337270
## 4 2000 Alabama 353614
## 5 2001 Alabama 332693
## 6 2002 Alabama 379343
c. Plot the annual natural gas consumption by state for the New England area (comprising the states of Maine, Vermont, New Hampshire, Massachusetts, Connecticut and Rhode Island).
newengland_gc <- us_total_tb %>%
filter(state %in% c('Maine', 'Vermont', 'New Hampshire',
'Massachusetts', 'Connecticut', 'Rhode Island')) %>%
mutate(y = y/1e3)
#dividing y by 1e3 rescales consumption to thousands, which is easier to read on the axis
head(newengland_gc)
## # A tsibble: 6 x 3 [1Y]
## # Key: state [1]
## year state y
## <int> <chr> <dbl>
## 1 1997 Connecticut 145.
## 2 1998 Connecticut 131.
## 3 1999 Connecticut 152.
## 4 2000 Connecticut 160.
## 5 2001 Connecticut 146.
## 6 2002 Connecticut 178.
autoplot(newengland_gc, y) +
labs(title = "The annual natural gas consumption by state",
subtitle = "New England Zone",
y = "Consumption in thousands")
#tourism_xlsx <- readxl::read_excel("C:/Users/Pc/Downloads/tourism.xlsx")
myxlsx = "https://raw.githubusercontent.com/johnm1990/DATA624/main/tourism.xlsx"
tourism_xlsx <- read.xlsx(myxlsx, sheet=1, startRow=1)
head(tourism_xlsx)
## Quarter Region State Purpose Trips
## 1 1998-01-01 Adelaide South Australia Business 135.0777
## 2 1998-04-01 Adelaide South Australia Business 109.9873
## 3 1998-07-01 Adelaide South Australia Business 166.0347
## 4 1998-10-01 Adelaide South Australia Business 127.1605
## 5 1999-01-01 Adelaide South Australia Business 137.4485
## 6 1999-04-01 Adelaide South Australia Business 199.9126
index(tourism)
## Quarter
key(tourism)
## [[1]]
## Region
##
## [[2]]
## State
##
## [[3]]
## Purpose
head(tourism)
## # A tsibble: 6 x 5 [1Q]
## # Key: Region, State, Purpose [1]
## Quarter Region State Purpose Trips
## <qtr> <chr> <chr> <chr> <dbl>
## 1 1998 Q1 Adelaide South Australia Business 135.
## 2 1998 Q2 Adelaide South Australia Business 110.
## 3 1998 Q3 Adelaide South Australia Business 166.
## 4 1998 Q4 Adelaide South Australia Business 127.
## 5 1999 Q1 Adelaide South Australia Business 137.
## 6 1999 Q2 Adelaide South Australia Business 200.
b. Create a tsibble which is identical to the tourism tsibble from the tsibble package.
tourism_xlsx_tb <- tourism_xlsx %>%
mutate(Quarter = yearquarter(Quarter)) %>%
as_tsibble(index = Quarter, key = c(Region, State, Purpose))
head(tourism_xlsx_tb)
## # A tsibble: 6 x 5 [1Q]
## # Key: Region, State, Purpose [1]
## Quarter Region State Purpose Trips
## <qtr> <chr> <chr> <chr> <dbl>
## 1 1998 Q1 Adelaide South Australia Business 135.
## 2 1998 Q2 Adelaide South Australia Business 110.
## 3 1998 Q3 Adelaide South Australia Business 166.
## 4 1998 Q4 Adelaide South Australia Business 127.
## 5 1999 Q1 Adelaide South Australia Business 137.
## 6 1999 Q2 Adelaide South Australia Business 200.
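c. Find what combination of Region and Purpose had the maximum number of overnight trips on average.
The output below reports Region ‘Melbourne’ with Purpose ‘Visiting’. Note, however, that summarise() on a tsibble keeps the Quarter index, so the value shown is the largest single-quarter figure (2017 Q4) rather than the maximum of the averages taken over all quarters. Converting with as_tibble() before grouping would average across time.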
tourism_xlsx_tb %>% group_by(Region, Purpose) %>%
summarise(Trips = mean(Trips)) %>%
ungroup() %>%
filter(Trips == max(Trips))
## # A tsibble: 1 x 4 [1Q]
## # Key: Region, Purpose [1]
## Region Purpose Quarter Trips
## <chr> <chr> <qtr> <dbl>
## 1 Melbourne Visiting 2017 Q4 985.
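The difference between summarising a tsibble and summarising a plain tibble can be checked on a small made-up example. This is a sketch assuming dplyr and tsibble are installed; the regions `A` and `B` and their trip counts are hypothetical and chosen so that the two approaches give different answers.

```r
library(dplyr)
library(tsibble)

# Toy data: region A has one very large quarter, region B is higher on average.
toy <- tsibble(
  Quarter = yearquarter(rep(c("2020 Q1", "2020 Q2"), times = 2)),
  Region  = rep(c("A", "B"), each = 2),
  Purpose = rep("Visiting", 4),
  Trips   = c(10, 100, 60, 60),
  index = Quarter, key = c(Region, Purpose)
)

# On a tsibble, summarise() keeps the index: one row per Region/Purpose/Quarter,
# so the filter picks the largest single quarter.
per_quarter <- toy %>%
  group_by(Region, Purpose) %>%
  summarise(Trips = mean(Trips)) %>%
  ungroup() %>%
  filter(Trips == max(Trips))

# Dropping to a tibble first averages across all quarters.
overall <- toy %>%
  as_tibble() %>%
  group_by(Region, Purpose) %>%
  summarise(Trips = mean(Trips), .groups = "drop") %>%
  filter(Trips == max(Trips))

per_quarter
overall
```

In the toy data the first pipeline selects region A (its single 100-trip quarter), while the second selects region B (mean of 60 against 55). The same as_tibble() step could be applied to tourism_xlsx_tb to answer the question as worded.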
d. Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.
t_by_state <- tourism_xlsx_tb %>%
group_by(State) %>%
summarise(Trips = sum(Trips))
#the result is already a tsibble keyed by State with Quarter as the index
head(t_by_state)
## # A tsibble: 6 x 3 [1Q]
## # Key: State [1]
## State Quarter Trips
## <chr> <qtr> <dbl>
## 1 ACT 1998 Q1 551.
## 2 ACT 1998 Q2 416.
## 3 ACT 1998 Q3 436.
## 4 ACT 1998 Q4 450.
## 5 ACT 1999 Q1 379.
## 6 ACT 1999 Q2 558.
Monthly Australian retail data is provided in aus_retail. Select one of the time series as follows (but choose your own seed value):
head(aus_retail)
## # A tsibble: 6 x 5 [1M]
## # Key: State, Industry [1]
## State Industry `Series ID` Month Turnover
## <chr> <chr> <chr> <mth> <dbl>
## 1 Australian Capital Territory Cafes, restaurants~ A3349849A 1982 Apr 4.4
## 2 Australian Capital Territory Cafes, restaurants~ A3349849A 1982 May 3.4
## 3 Australian Capital Territory Cafes, restaurants~ A3349849A 1982 Jun 3.6
## 4 Australian Capital Territory Cafes, restaurants~ A3349849A 1982 Jul 4
## 5 Australian Capital Territory Cafes, restaurants~ A3349849A 1982 Aug 3.6
## 6 Australian Capital Territory Cafes, restaurants~ A3349849A 1982 Sep 4.2
set.seed(718212)
x <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
head(x)
## # A tsibble: 6 x 5 [1M]
## # Key: State, Industry [1]
## State Industry `Series ID` Month Turnover
## <chr> <chr> <chr> <mth> <dbl>
## 1 South Australia Electrical and electronic goods~ A3349361W 1982 Apr 16
## 2 South Australia Electrical and electronic goods~ A3349361W 1982 May 19
## 3 South Australia Electrical and electronic goods~ A3349361W 1982 Jun 18.1
## 4 South Australia Electrical and electronic goods~ A3349361W 1982 Jul 20.3
## 5 South Australia Electrical and electronic goods~ A3349361W 1982 Aug 19.6
## 6 South Australia Electrical and electronic goods~ A3349361W 1982 Sep 19.9
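The call to set.seed() is what makes the randomly chosen series reproducible. A minimal base R sketch, using a hypothetical vector `ids` in place of aus_retail$`Series ID`:

```r
# Hypothetical series IDs standing in for aus_retail$`Series ID`.
ids <- c("A1", "B2", "C3", "D4", "E5")

set.seed(718212)
first_draw <- sample(ids, 1)

set.seed(718212)
second_draw <- sample(ids, 1)

# Resetting the same seed reproduces the same draw.
identical(first_draw, second_draw)
```

Without resetting the seed, repeated calls to sample() would generally pick different series on each run.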
Explore your chosen retail time series using the following functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() %>% autoplot()
The exploration plots show an increasing trend in turnover.
autoplot(x, Turnover) +
labs(title = "Turnover for Electrical and electronic goods retailing",
subtitle = "Series: A3349361W",
y = "Turnover")
gg_season(x, Turnover) +
labs(title = "Seasonal plot: South Australia electrical and electronic goods retailing",
subtitle = "Series: A3349361W",
y = "Turnover")
Can you spot any seasonality, cyclicity and trend? What do you learn about the series?
The initial plot shows a positive, increasing trend from about 1990 through the end of the series. Seasonality may also be present, in the sense defined below.
Seasonal: a seasonal pattern occurs when a time series is affected by seasonal factors such as the time of the year or the day of the week. Seasonality is always of a fixed and known period.
Cyclic: a cycle occurs when the data exhibit rises and falls that are not of a fixed frequency. These fluctuations are usually due to economic conditions, and are often related to the “business cycle.” The duration of these fluctuations is usually at least 2 years.
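The ACF step of the exercise is not shown above. As a rough illustration of what trend and seasonality look like in an autocorrelation function, the sketch below uses base R's acf() on a synthetic monthly series rather than feasts' ACF() on the retail data. The series `y` and its coefficients are assumptions chosen for illustration only.

```r
# Synthetic monthly series (not aus_retail): linear trend plus a 12-month cycle.
month_index <- 1:240
y <- 0.1 * month_index + 5 * sin(2 * pi * month_index / 12)

# Base R autocorrelation; plot = FALSE returns the values without drawing.
a <- acf(y, lag.max = 24, plot = FALSE)

# a$acf[1] is lag 0, so lag k sits at position k + 1.
lag6  <- a$acf[7]
lag12 <- a$acf[13]
c(lag6 = lag6, lag12 = lag12)
```

The trend keeps the autocorrelations positive and slowly decaying, while the seasonal component lifts the value at lag 12 above the value at lag 6. A similar scalloped shape in the ACF of the retail series would support the presence of both trend and seasonality.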