gafa_stock, PBS, vic_elec and pelt represent.The help function (?) displays info about each series in the Help window. The descriptions for each series are copied into the comments below.
?gafa_stock # Historical stock prices from 2014-2018 for Google, Amazon, Facebook and Apple.
# All prices are in $USD
# Variables include: Open, High, Low, Close, Adj_Close, Volume over
# one key (Symbol)
?PBS # Monthly Medicare Australia prescription data.
# Variables include: Scripts and Cost over
# four keys (Concession, Type, ATC1, ATC2)
?vic_elec # Half-hourly electricity demand for Victoria, Australia.
# Variables include: Demand, Temperature, Date, Holiday over
# one key (Datetime)
?pelt # Hudson Bay Company trading records for Snowshoe Hare and Canadian Lynx
# furs from 1845 to 1935.
# This data contains trade records for all areas of the company.
# Variables include: Hare and Lynx trades over
# one key (Year)
autoplot() to plot some of the series in these data sets.I interpret this task as selecting one or more series from each data set and plotting a time series.
The first plot is Amazon’s stock volumes over time (day).
AMZN <- gafa_stock %>%
filter(Symbol == "AMZN") %>% # filter for AMZN only
mutate(Vol = Volume/1e6) %>% # divide Volume by 1M for y-axis scaling
select(Date,Vol) %>% # select only the Date and Vol columns
mutate(Date = ymd(Date)) %>% # transform the Date text to date type
as_tsibble(index = Date) # set as a tsibble
autoplot(AMZN,Vol) +
labs(y = "Volume (M)",
title = "Amazon (AMZN) stock volume")
The second plot is for ACT2 series D01 total cost over time (month).
D01 <- PBS %>%
filter(ATC2 == "D01") %>% # filter for ATC2 type "D01"
select(Month,Concession,Type,Cost) %>% # select specific columns
summarize(TotalC = sum(Cost)) %>% # summarize the cost grouping by the selected columns
mutate(Cost = TotalC/1e5) # divide Cost by 100K for y-axis scaling
autoplot(D01,Cost) +
labs(y = "Total Cost ('000)",
title = "Total Cost of ATC2 D01")
The third plot is electricity demand in Victoria, Australia over time(half-hours).
e <- vic_elec %>%
filter(year(Time) == 2014) # filter for only 2014 data
autoplot(e,Demand) +
labs(y = "GW", title = "Half-hourly electricity demand: Victoria")
The fourth plot looks at volumes of Hare pelts over time (year).
Hares <- pelt %>%
select(Year,Hare)
autoplot(pelt,Hare) +
labs(y = "Hares", title = "Annual Hare pelt volumes")
The time interval for stock prices is daily, for pharmaceuticals is monthly, for electricity is half-hourly and for pelt volumes is annually.
filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.peak_close_days <- gafa_stock %>%
group_by(Symbol) %>% # group by Symbol so that the data set returns a High price for each stock
filter(High == max(High)) %>% # filter the data set for the High peak for each stock
select(Symbol,High) # select only the stock symbol and High price for each stock
# note that Date will also display because it is the key to the tsibble.
peak_close_days
## # A tsibble: 4 x 3 [!]
## # Key: Symbol [4]
## # Groups: Symbol [4]
## Symbol High Date
## <chr> <dbl> <date>
## 1 AAPL 233. 2018-10-03
## 2 AMZN 2050. 2018-09-04
## 3 FB 219. 2018-07-25
## 4 GOOG 1274. 2018-07-27
tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.tute1 <- readr::read_csv("tute1.csv") # the data is imported as a tibble
## Rows: 100 Columns: 4
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (1): Quarter
## dbl (3): Sales, AdBudget, GDP
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(tute1)
## # A tibble: 6 x 4
## Quarter Sales AdBudget GDP
## <chr> <dbl> <dbl> <dbl>
## 1 3/1/1981 1020. 659. 252.
## 2 6/1/1981 889. 589 291.
## 3 9/1/1981 795 512. 291.
## 4 12/1/1981 1004. 614. 292.
## 5 3/1/1982 1058. 647. 279.
## 6 6/1/1982 944. 602 254
mytimeseries <- tute1 %>%
mutate(Quarter = yearmonth(Quarter)) %>% # mutate the Quarter <chr> to a monthly series <S3: yearmonth>
as_tsibble(index = Quarter) # set the Quarter field as the index to the tsibble
head(mytimeseries)
## # A tsibble: 6 x 4 [3M]
## Quarter Sales AdBudget GDP
## <mth> <dbl> <dbl> <dbl>
## 1 1981 Mar 1020. 659. 252.
## 2 1981 Jun 889. 589 291.
## 3 1981 Sep 795 512. 291.
## 4 1981 Dec 1004. 614. 292.
## 5 1982 Mar 1058. 647. 279.
## 6 1982 Jun 944. 602 254
mytimeseries %>%
pivot_longer(-Quarter) %>%
ggplot(aes(x = Quarter, y = value, colour = name)) +
geom_line() +
facet_grid(name ~ ., scales = "free_y")
Check what happens when you don’t include facet_grid()
The series area plotted on a single plot with a single shared y-axis scale.
mytimeseries %>%
pivot_longer(-Quarter) %>%
ggplot(aes(x = Quarter, y = value, colour = name)) +
geom_line()
USgas package.if (!require("USgas",character.only = TRUE)) (install.packages("USgas",dep=TRUE))
library(USgas)
us_total with year as the index and state as the key.us_total <- us_total %>%
as_tsibble(key = state,
index = year)
head(us_total)
## # A tsibble: 6 x 3 [1Y]
## # Key: state [1]
## year state y
## <int> <chr> <int>
## 1 1997 Alabama 324158
## 2 1998 Alabama 329134
## 3 1999 Alabama 337270
## 4 2000 Alabama 353614
## 5 2001 Alabama 332693
## 6 2002 Alabama 379343
NEgas <- us_total %>%
filter(state == c('Maine','Vermont','New Hampshire','Massachusetts','Connecticut','Rhode Island')) %>% # select only the New England states
mutate(y = y/1000) # mutate the yearly totals for y-scaling
autoplot(NEgas,y) + labs(title = "Annual natural gas consumption by state",
subtitle = "New England area",
y = "yearly total (B)") # the scaling above mutated the M values into B
Although it is clear in the above plot that Massachusetts uses far more natural gas than Vermont, it is not clear if there is any variation in usage in the states with less utilization.
It turns out that it possible to apply a facet grid to autoplot(), which then enables the use free y-axis scaling. This permits us to see that there is indeed variation in Vermont’s usage as seen below, as well as for Rhode Island.
autoplot(NEgas,y) +
labs(title = "Annual natural gas consumption by state",
subtitle = "New England area",
y = "yearly total (B)") +
facet_grid(state ~ ., scales = "free_y")
tourism.xlsx from the book website and read it into R using readxl::read_excel()tourism <- readxl::read_excel("tourism.xlsx") # the data is imported as a tibble
head(tourism)
## # A tibble: 6 x 5
## Quarter Region State Purpose Trips
## <chr> <chr> <chr> <chr> <dbl>
## 1 1998-01-01 Adelaide South Australia Business 135.
## 2 1998-04-01 Adelaide South Australia Business 110.
## 3 1998-07-01 Adelaide South Australia Business 166.
## 4 1998-10-01 Adelaide South Australia Business 127.
## 5 1999-01-01 Adelaide South Australia Business 137.
## 6 1999-04-01 Adelaide South Australia Business 200.
tourism tsibble from the tsibble packagetourism_ts <- tourism %>%
mutate(Quarter = yearquarter(Quarter)) %>% # mutate the Quarter <chr> to a quarterly series <S3: yearquarter>
as_tsibble(key = c("Region","State","Purpose"), # set the Quarter field as the index to the tsibble
index = Quarter)
head(tourism_ts)
## # A tsibble: 6 x 5 [1Q]
## # Key: Region, State, Purpose [1]
## Quarter Region State Purpose Trips
## <qtr> <chr> <chr> <chr> <dbl>
## 1 1998 Q1 Adelaide South Australia Business 135.
## 2 1998 Q2 Adelaide South Australia Business 110.
## 3 1998 Q3 Adelaide South Australia Business 166.
## 4 1998 Q4 Adelaide South Australia Business 127.
## 5 1999 Q1 Adelaide South Australia Business 137.
## 6 1999 Q2 Adelaide South Australia Business 200.
Region and Purpose had the maximum number of overnight trips on average.It appears that Sydney had the highest average number of trips for the purpose of Visiting.
max_trips <- tourism_ts %>%
group_by(Region,Purpose) %>% # group by Region and Purpose so that the data set returns the max trips for each group
mutate(Trips = mean(Trips)) %>% # find the average number of trips per group
distinct(Region,Purpose,Trips) %>% # get the distinct values
ungroup %>% # ungroup the data
filter(Trips == max(Trips)) # filter for the Region and Purpose with the highest average
max_trips
## # A tibble: 1 x 3
## Region Purpose Trips
## <chr> <chr> <dbl>
## 1 Sydney Visiting 747.
tourism_totals <- tourism %>%
select(-Region,-Purpose) %>% # remove the Region and Purpose columns
group_by(State) %>% # group by State
mutate(Trips = sum(Trips)) %>% # sum the Trips by State
distinct(State,Trips) %>% # display the totals by each State
arrange(State) # arrange the States alphabetically
tourism_totals
## # A tibble: 8 x 2
## # Groups: State [8]
## State Trips
## <chr> <dbl>
## 1 ACT 41007.
## 2 New South Wales 557367.
## 3 Northern Territory 28614.
## 4 Queensland 386643.
## 5 South Australia 118151.
## 6 Tasmania 54137.
## 7 Victoria 390463.
## 8 Western Australia 147820.
aus_retail. Select one of the time series as follows (but choose your own seed value):set.seed(54321)
myseries <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
head(myseries)
## # A tsibble: 6 x 5 [1M]
## # Key: State, Industry [1]
## State Industry `Series ID` Month Turnover
## <chr> <chr> <chr> <mth> <dbl>
## 1 South Australia Clothing retailing A3349360V 1982 Apr 19.1
## 2 South Australia Clothing retailing A3349360V 1982 May 21.6
## 3 South Australia Clothing retailing A3349360V 1982 Jun 18.3
## 4 South Australia Clothing retailing A3349360V 1982 Jul 18.6
## 5 South Australia Clothing retailing A3349360V 1982 Aug 17.1
## 6 South Australia Clothing retailing A3349360V 1982 Sep 18.2
Explore your chosen retail time series using the following functions:
autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() %>% autoplot()
Autoplot the Turnover by month.
autoplot(myseries,Turnover) +
labs(title = "Turnover by Month")
Do a gg_season plot of the Turnover by month.
myseries %>% gg_season(Turnover, period = "year") +
theme(legend.position = "right") +
labs(y = "Turnover", title="Turnover by Month")
Do a gg_subseries plot of the Turnover by month.
myseries %>% gg_subseries(Turnover) +
labs(y = "Turnover", title="Turnover by Month")
Do a gg_lag plot of the Turnover by month.
myseries %>% gg_lag(Turnover, geom = "point") +
labs(x = "lag(Turnover,k)", title="Turnover by Month")
Do an Autocorrelation plot of the Turnover by month.
myseries %>%
ACF(Turnover) %>%
autoplot() + labs(title = "Turnover by Month")
Can you spot any seasonality, cyclicity and trend?
The autoplot hints at some seasonality in the “heartbeat” pattern within each year, and there also appears that there was an upward trend between about 2000 and 2010. The gg_season plot confirms that there is seasonality with a spike in December followed by a low point in the following February of each year, perhaps influenced by the Christmas/Boxing holidays. Looking at the year-over-year trend in each month in the gg_subseries plot, we can also see the upward trend from 2000 to 2010, which was both preceded and followed by what may be cyclic behaviors. The gg_lag plot shows a strong positive relationship in turnover within every lag period, which confirms the annual seasonality in the data. Lastly, the autocorrelation plot displays both seasonality and trending with teh scalloped shape from year to year and the overall slow decrease in the trends.
What do you learn about the series?
Retail clothing turnover in South Australia follows a seasonal trend that displays cyclical patterns of rising and falling approximately every 10 years. Further inquiry is warranted to discover more about the data set.