DATA 624 - Homework #1

Question 2.1

Use the help function to explore what the series gafa_stock, PBS, vic_elec and pelt represent.

a. Use autoplot() to plot some of the series in these data sets.

#help allows us to explore each series in detail

help("gafa_stock")
help("PBS")
help("vic_elec")
help("pelt")
head(gafa_stock)
## # A tsibble: 6 x 8 [!]
## # Key:       Symbol [1]
##   Symbol Date        Open  High   Low Close Adj_Close    Volume
##   <chr>  <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>     <dbl>
## 1 AAPL   2014-01-02  79.4  79.6  78.9  79.0      67.0  58671200
## 2 AAPL   2014-01-03  79.0  79.1  77.2  77.3      65.5  98116900
## 3 AAPL   2014-01-06  76.8  78.1  76.2  77.7      65.9 103152700
## 4 AAPL   2014-01-07  77.8  78.0  76.8  77.1      65.4  79302300
## 5 AAPL   2014-01-08  77.0  77.9  77.0  77.6      65.8  64632400
## 6 AAPL   2014-01-09  78.1  78.1  76.5  76.6      65.0  69787200
gafa_stock %>% autoplot(Open)

autoplot(vic_elec, Demand) +
  labs(title = "Electricity Demand",
       subtitle = "Victoria - Australia",
       y = "MWh")

b. What is the time interval of each series?

interval(gafa_stock)
## <interval[1]>
## [1] !
interval(PBS)
## <interval[1]>
## [1] 1M
interval(vic_elec)
## <interval[1]>
## [1] 30m
interval(pelt)
## <interval[1]>
## [1] 1Y

gafa_stock: Irregular (shown as `!`); prices are recorded daily, but only on trading days

PBS: One month

vic_elec: 30 minutes

pelt: One year
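
The `!` reported for gafa_stock marks an irregular index rather than a fixed one-day step. A minimal sketch of how intervals are reported, assuming the tsibble and tsibbledata packages are installed; toy_monthly is a hypothetical series made up for illustration:

```r
library(dplyr)
library(tsibble)
library(tsibbledata)  # provides gafa_stock

# Hypothetical monthly series: the interval is inferred from the index
toy_monthly <- tibble(
  Month = yearmonth(as.Date(c("2020-01-01", "2020-02-01", "2020-03-01"))),
  value = c(1, 2, 3)
) %>%
  as_tsibble(index = Month)

interval(toy_monthly)    # a fixed monthly interval
is_regular(toy_monthly)  # regular spacing

# gafa_stock only has rows for trading days and is stored as an
# irregular tsibble, which is why interval() prints "!" for it
is_regular(gafa_stock)
```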

Question 2.2

Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.

the_output <- gafa_stock %>% 
             group_by(Symbol) %>%
             filter(Close == max(Close)) %>%
             arrange(desc(Close))
the_output
## # A tsibble: 4 x 8 [!]
## # Key:       Symbol [4]
## # Groups:    Symbol [4]
##   Symbol Date        Open  High   Low Close Adj_Close   Volume
##   <chr>  <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>    <dbl>
## 1 AMZN   2018-09-04 2026. 2050. 2013  2040.     2040.  5721100
## 2 GOOG   2018-07-26 1251  1270. 1249. 1268.     1268.  2405600
## 3 AAPL   2018-10-03  230.  233.  230.  232.      230. 28654800
## 4 FB     2018-07-25  216.  219.  214.  218.      218. 58954200

AMZN peaked on 2018-09-04 with a closing price of 2039.51.
GOOG peaked on 2018-07-26 with a closing price of 1268.33.
AAPL peaked on 2018-10-03 with a closing price of 232.07.
FB peaked on 2018-07-25 with a closing price of 217.50.
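
The group_by() plus filter() pattern used above can be checked on a small table where the answer is obvious. This is only a sketch: toy_prices, its symbols and its values are hypothetical and are not taken from gafa_stock.

```r
library(dplyr)

# Hypothetical prices for two made-up symbols
toy_prices <- tibble(
  Symbol = rep(c("AAA", "BBB"), each = 3),
  Date   = rep(as.Date(c("2020-01-01", "2020-01-02", "2020-01-03")), times = 2),
  Close  = c(10, 12, 11, 5, 4, 6)
)

# Within each Symbol, keep only the row(s) where Close equals the group maximum
peaks <- toy_prices %>%
  group_by(Symbol) %>%
  filter(Close == max(Close)) %>%
  ungroup()

peaks
```

Because max() is evaluated per group, each symbol is compared against its own maximum rather than the overall maximum.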

Question 2.3

Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.

a. You can read the data into R with the following script:
tute1 <- read.csv("https://raw.githubusercontent.com/johnm1990/DATA624/main/tute1.csv") 
head(tute1)
##      Quarter  Sales AdBudget   GDP
## 1 1981-03-01 1020.2    659.2 251.8
## 2 1981-06-01  889.2    589.0 290.9
## 3 1981-09-01  795.0    512.5 290.8
## 4 1981-12-01 1003.9    614.1 292.4
## 5 1982-03-01 1057.7    647.2 279.1
## 6 1982-06-01  944.4    602.0 254.0
b. Convert the data to time series
mytimeseries <- tute1 %>%
  mutate(Quarter = yearquarter(Quarter)) %>%
  as_tsibble(index = Quarter)
c. Construct time series plots of each of the three series
mytimeseries %>%
  pivot_longer(-Quarter) %>%
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line() +
  facet_grid(name ~ ., scales = "free_y")

mytimeseries %>%
  pivot_longer(-Quarter) %>%
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line() 

  #facet_grid(name ~ ., scales = "free_y")

Check what happens when you don’t include facet_grid().

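Without ‘facet_grid()’ all three series are drawn in a single panel on a shared y-axis (note the range of value), so the differences in scale make the variation within each series harder to see.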
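
The whole Question 2.3 pipeline can be sketched on the first four rows of tute1 printed above. This assumes yearquarter() for the quarterly index; tute1_head, ts_head, p_single and p_facet are names introduced here for illustration only.

```r
library(dplyr)
library(tidyr)
library(tsibble)
library(ggplot2)

# First four rows of tute1, copied from the head() output above
tute1_head <- tibble(
  Quarter  = as.Date(c("1981-03-01", "1981-06-01", "1981-09-01", "1981-12-01")),
  Sales    = c(1020.2, 889.2, 795.0, 1003.9),
  AdBudget = c(659.2, 589.0, 512.5, 614.1),
  GDP      = c(251.8, 290.9, 290.8, 292.4)
)

# Quarterly index, then convert to a tsibble
ts_head <- tute1_head %>%
  mutate(Quarter = yearquarter(Quarter)) %>%
  as_tsibble(index = Quarter)

# Long format: one row per quarter and series
long <- ts_head %>% pivot_longer(-Quarter)

# Same data, with and without facets
p_single <- ggplot(long, aes(x = Quarter, y = value, colour = name)) +
  geom_line()
p_facet <- p_single + facet_grid(name ~ ., scales = "free_y")
```

Printing p_single and p_facet side by side shows the difference: the faceted version gives each series its own panel and, because of scales = "free_y", its own y-axis.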

Question 2.4

The USgas package contains data on the demand for natural gas in the US.

a. Install the USgas package.

library(USgas)
b. Create a tsibble from us_total with year as the index and state as the key.
us_total_tb <- us_total

us_total_tb <- us_total_tb %>%
  as_tsibble(index = year, key = state)

head(us_total_tb)
## # A tsibble: 6 x 3 [1Y]
## # Key:       state [1]
##    year state        y
##   <int> <chr>    <int>
## 1  1997 Alabama 324158
## 2  1998 Alabama 329134
## 3  1999 Alabama 337270
## 4  2000 Alabama 353614
## 5  2001 Alabama 332693
## 6  2002 Alabama 379343

c. Plot the annual natural gas consumption by state for the New England area (comprising the states of Maine, Vermont, New Hampshire, Massachusetts, Connecticut and Rhode Island).

newengland_gc <- us_total_tb %>%
  filter(state %in% c('Maine', 'Vermont', 'New Hampshire',
                      'Massachusetts', 'Connecticut', 'Rhode Island')) %>%
  mutate(y = y/1e3)
# the mutate above divides y by 1e3 so the axis reads in thousands

head(newengland_gc)
## # A tsibble: 6 x 3 [1Y]
## # Key:       state [1]
##    year state           y
##   <int> <chr>       <dbl>
## 1  1997 Connecticut  145.
## 2  1998 Connecticut  131.
## 3  1999 Connecticut  152.
## 4  2000 Connecticut  160.
## 5  2001 Connecticut  146.
## 6  2002 Connecticut  178.
autoplot(newengland_gc, y) +
  labs(title = "The annual natural gas consumption by state",
       subtitle = "New England Zone",
       y = "Consumption in thousands")
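
The role of the index and the key can be seen on a small made-up table. A minimal sketch, assuming only dplyr and tsibble; toy_gas, its state names and its values are hypothetical:

```r
library(dplyr)
library(tsibble)

# Hypothetical yearly values for two made-up states
toy_gas <- tibble(
  year  = rep(2000:2002, times = 2),
  state = rep(c("State A", "State B"), each = 3),
  y     = c(100, 110, 120, 50, 55, 60)
) %>%
  as_tsibble(index = year, key = state)

key_vars(toy_gas)  # the key column(s)
n_keys(toy_gas)    # number of distinct series

# %in% keeps rows whose state is in the given set,
# equivalent to chaining state == ... with |
subset_a <- toy_gas %>% filter(state %in% c("State A"))
```

Each combination of key and index must identify exactly one row, which is why state has to be declared as the key when several states share the same years.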

Question 2.5

a. Download tourism.xlsx from the book website and read it into R using readxl::read_excel().
#tourism_xlsx <- readxl::read_excel("C:/Users/Pc/Downloads/tourism.xlsx")
myxlsx = "https://raw.githubusercontent.com/johnm1990/DATA624/main/tourism.xlsx"
tourism_xlsx <- read.xlsx(myxlsx, sheet=1, startRow=1)
head(tourism_xlsx)
##      Quarter   Region           State  Purpose    Trips
## 1 1998-01-01 Adelaide South Australia Business 135.0777
## 2 1998-04-01 Adelaide South Australia Business 109.9873
## 3 1998-07-01 Adelaide South Australia Business 166.0347
## 4 1998-10-01 Adelaide South Australia Business 127.1605
## 5 1999-01-01 Adelaide South Australia Business 137.4485
## 6 1999-04-01 Adelaide South Australia Business 199.9126
index(tourism)
## Quarter
key(tourism)
## [[1]]
## Region
## 
## [[2]]
## State
## 
## [[3]]
## Purpose
head(tourism)
## # A tsibble: 6 x 5 [1Q]
## # Key:       Region, State, Purpose [1]
##   Quarter Region   State           Purpose  Trips
##     <qtr> <chr>    <chr>           <chr>    <dbl>
## 1 1998 Q1 Adelaide South Australia Business  135.
## 2 1998 Q2 Adelaide South Australia Business  110.
## 3 1998 Q3 Adelaide South Australia Business  166.
## 4 1998 Q4 Adelaide South Australia Business  127.
## 5 1999 Q1 Adelaide South Australia Business  137.
## 6 1999 Q2 Adelaide South Australia Business  200.

b. Create a tsibble which is identical to the tourism tsibble from the tsibble package.

tourism_xlsx_tb <- tourism_xlsx %>% 
  mutate(Quarter = yearquarter(Quarter)) %>%
  as_tsibble(index = Quarter, key = c(Region, State, Purpose))
head(tourism_xlsx_tb)
## # A tsibble: 6 x 5 [1Q]
## # Key:       Region, State, Purpose [1]
##   Quarter Region   State           Purpose  Trips
##     <qtr> <chr>    <chr>           <chr>    <dbl>
## 1 1998 Q1 Adelaide South Australia Business  135.
## 2 1998 Q2 Adelaide South Australia Business  110.
## 3 1998 Q3 Adelaide South Australia Business  166.
## 4 1998 Q4 Adelaide South Australia Business  127.
## 5 1999 Q1 Adelaide South Australia Business  137.
## 6 1999 Q2 Adelaide South Australia Business  200.

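c. Find what combination of Region and Purpose had the maximum number of overnight trips on average.

The output shows Melbourne with purpose ‘Visiting’ (985 in 2017 Q4). Note that the result still carries a Quarter column, so this is the highest single-quarter value, not an average over the whole period.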

tourism_xlsx_tb %>% group_by(Region, Purpose) %>%
 summarise(Trips = mean(Trips)) %>%
 ungroup() %>%
 filter(Trips == max(Trips))
## # A tsibble: 1 x 4 [1Q]
## # Key:       Region, Purpose [1]
##   Region    Purpose  Quarter Trips
##   <chr>     <chr>      <qtr> <dbl>
## 1 Melbourne Visiting 2017 Q4  985.
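
One way to average over all quarters is to drop the time index before grouping. A minimal sketch, assuming the tourism tsibble that ships with the tsibble package (the data tourism_xlsx_tb is meant to match); avg_trips is a name introduced here for illustration:

```r
library(dplyr)
library(tsibble)

avg_trips <- tourism %>%
  as_tibble() %>%                    # drop the Quarter index so it is not kept as a group
  group_by(Region, Purpose) %>%
  summarise(Trips = mean(Trips), .groups = "drop") %>%  # mean across all quarters
  filter(Trips == max(Trips))        # combination with the highest average

avg_trips
```

The result has one row per Region and Purpose with no Quarter column, so the maximum is taken over averages rather than over individual quarters.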

d. Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.

t_by_state <- tourism_xlsx_tb %>%
  group_by(State) %>%
  summarise(Trips = sum(Trips)) %>%
  mutate(Quarter = yearquarter(Quarter)) %>%
  as_tsibble(index = Quarter, key = State)

head(t_by_state)
## # A tsibble: 6 x 3 [1Q]
## # Key:       State [1]
##   State Quarter Trips
##   <chr>   <qtr> <dbl>
## 1 ACT   1998 Q1  551.
## 2 ACT   1998 Q2  416.
## 3 ACT   1998 Q3  436.
## 4 ACT   1998 Q4  450.
## 5 ACT   1999 Q1  379.
## 6 ACT   1999 Q2  558.
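
The aggregation above works because summarise() on a grouped tsibble sums within each State and each Quarter. A small sketch with made-up numbers, assuming dplyr and tsibble; toy_trips and its values are hypothetical:

```r
library(dplyr)
library(tsibble)

# Two hypothetical regions in one made-up state, two quarters each
toy_trips <- tibble(
  Quarter = yearquarter(rep(as.Date(c("2020-01-01", "2020-04-01")), times = 2)),
  Region  = rep(c("North", "South"), each = 2),
  State   = "X",
  Trips   = c(1, 2, 10, 20)
) %>%
  as_tsibble(index = Quarter, key = c(Region, State))

# Regions are collapsed; the Quarter index is kept
by_state <- toy_trips %>%
  group_by(State) %>%
  summarise(Trips = sum(Trips))

by_state
```

The result is already a tsibble keyed by State, with one row per quarter.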

Question 2.8

Monthly Australian retail data is provided in aus_retail. Select one of the time series as follows (but choose your own seed value):

head(aus_retail)
## # A tsibble: 6 x 5 [1M]
## # Key:       State, Industry [1]
##   State                        Industry            `Series ID`    Month Turnover
##   <chr>                        <chr>               <chr>          <mth>    <dbl>
## 1 Australian Capital Territory Cafes, restaurants~ A3349849A   1982 Apr      4.4
## 2 Australian Capital Territory Cafes, restaurants~ A3349849A   1982 May      3.4
## 3 Australian Capital Territory Cafes, restaurants~ A3349849A   1982 Jun      3.6
## 4 Australian Capital Territory Cafes, restaurants~ A3349849A   1982 Jul      4  
## 5 Australian Capital Territory Cafes, restaurants~ A3349849A   1982 Aug      3.6
## 6 Australian Capital Territory Cafes, restaurants~ A3349849A   1982 Sep      4.2
set.seed(718212)
x <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

head(x)
## # A tsibble: 6 x 5 [1M]
## # Key:       State, Industry [1]
##   State           Industry                         `Series ID`    Month Turnover
##   <chr>           <chr>                            <chr>          <mth>    <dbl>
## 1 South Australia Electrical and electronic goods~ A3349361W   1982 Apr     16  
## 2 South Australia Electrical and electronic goods~ A3349361W   1982 May     19  
## 3 South Australia Electrical and electronic goods~ A3349361W   1982 Jun     18.1
## 4 South Australia Electrical and electronic goods~ A3349361W   1982 Jul     20.3
## 5 South Australia Electrical and electronic goods~ A3349361W   1982 Aug     19.6
## 6 South Australia Electrical and electronic goods~ A3349361W   1982 Sep     19.9

Explore your chosen retail time series using the following functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() %>% autoplot()

The exploration plots show an upward trend in turnover.

autoplot(x, Turnover) +
  labs(title = "Turnover for Electrical and electronic goods retailing",
       subtitle = "Series: A3349361W",
       y = "Turnover")

gg_season(x, Turnover) +
  labs(title = "Turnover for Electrical and electronic goods retailing",
       subtitle = "Series: A3349361W",
       y = "Turnover")
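
The question also lists gg_subseries(), gg_lag() and ACF(), which are not run above. A minimal sketch of those calls, assuming the fpp3 package is installed and selecting the series directly by the Series ID shown above; retail_series and acf_tbl are names introduced here for illustration:

```r
library(fpp3)

# Same series as x above, selected by its Series ID
retail_series <- aus_retail %>%
  filter(`Series ID` == "A3349361W")

# One panel per month, showing how each month changes across years
gg_subseries(retail_series, Turnover)

# Scatterplots of the series against its own lags
gg_lag(retail_series, Turnover, geom = "point")

# Autocorrelations for the first 24 lags, then the correlogram
acf_tbl <- retail_series %>% ACF(Turnover, lag_max = 24)
autoplot(acf_tbl)
```

In a correlogram, a trend usually shows up as slowly decaying positive autocorrelations and monthly seasonality as larger values at lags 12 and 24, so these plots are a useful check on the visual impression from autoplot() and gg_season().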

Can you spot any seasonality, cyclicity and trend? What do you learn about the series?

The time plot shows a positive, increasing trend from about 1990 through the end of the series. Seasonality may also be present. For reference, the two patterns are defined as follows.

Seasonal: A seasonal pattern occurs when a time series is affected by seasonal factors such as the time of the year or the day of the week. Seasonality is always of a fixed and known period.

Cyclic: A cycle occurs when the data exhibit rises and falls that are not of a fixed frequency. These fluctuations are usually due to economic conditions, and are often related to the “business cycle.” The duration of these fluctuations is usually at least 2 years.