2.1 Explore the following four time series: Bricks from aus_production, Lynx from pelt, Close from gafa_stock, Demand from vic_elec.
Use ? (or help()) to find out about the data in each series.
Bricks refers to the production of clay bricks (in millions of bricks) in Australia from 1956 to 2010, from the tsibble aus_production.
Lynx refers to the number of Canadian lynx pelts traded by the Hudson Bay Company from 1845 to 1935, from the tsibble pelt.
Close refers to the daily closing stock price (in USD) for Amazon, Google, Facebook, and Apple from 2014 to 2018, from the irregular trading-day tsibble gafa_stock.
Demand refers to the total electricity demand (in MWh) in Victoria, Australia, from the tsibble vic_elec.
What is the time interval of each series?
Bricks from aus_production: Quarterly (every 3 months)
Lynx from pelt: Annual (every year)
Close from gafa_stock: Daily, but irregular, since there is no trading on weekends and holidays
Demand from vic_elec: Half-hourly (every 30 minutes)
These intervals can be confirmed with interval(), as in the sketch after the help calls below.
?aus_production
?pelt
?gafa_stock
?vic_elec
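The intervals can also be checked programmatically: tsibble's interval() reports the recorded interval of each series (a quick sketch, assuming fpp3 and its data packages are loaded).
interval(aus_production)  # quarterly (1Q)
interval(pelt)            # annual (1Y)
interval(gafa_stock)      # irregular (!)
interval(vic_elec)        # half-hourly (30m)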
Use autoplot() to produce a time plot of each series. For the last plot, modify the axis labels and title.
autoplot(aus_production, Bricks)
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
autoplot(pelt, Lynx)
autoplot(gafa_stock, Close)
autoplot(vic_elec, Demand) +
labs(title = "Electricity Demand in MWh",
subtitle = "Victoria, Australia",
y = "Total electricity demand (MWh)",
x = "Half-hour time interval (2012-2014)")
2.2 Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.
glimpse(gafa_stock)
## Rows: 5,032
## Columns: 8
## Key: Symbol [4]
## $ Symbol <chr> "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAP…
## $ Date <date> 2014-01-02, 2014-01-03, 2014-01-06, 2014-01-07, 2014-01-08,…
## $ Open <dbl> 79.38286, 78.98000, 76.77857, 77.76000, 76.97285, 78.11429, …
## $ High <dbl> 79.57571, 79.10000, 78.11429, 77.99429, 77.93714, 78.12286, …
## $ Low <dbl> 78.86000, 77.20428, 76.22857, 76.84571, 76.95571, 76.47857, …
## $ Close <dbl> 79.01857, 77.28286, 77.70428, 77.14857, 77.63715, 76.64571, …
## $ Adj_Close <dbl> 66.96433, 65.49342, 65.85053, 65.37959, 65.79363, 64.95345, …
## $ Volume <dbl> 58671200, 98116900, 103152700, 79302300, 64632400, 69787200,…
peak_close_gs <- gafa_stock %>%
select(Symbol, Date, Close) %>%
group_by(Symbol) %>%
filter(Close == max(Close))
peak_close_gs
## # A tsibble: 4 x 3 [!]
## # Key: Symbol [4]
## # Groups: Symbol [4]
## Symbol Date Close
## <chr> <date> <dbl>
## 1 AAPL 2018-10-03 232.
## 2 AMZN 2018-09-04 2040.
## 3 FB 2018-07-25 218.
## 4 GOOG 2018-07-26 1268.
The days which correspond to the peak closing price for each of the four stocks in gafa_stock are:
Apple: 10/3/2018 with a closing price of 232.07
Amazon: 9/4/2018 with a closing price of 2039.51
Facebook: 7/25/2018 with a closing price of 217.50
Google: 7/26/2018 with a closing price of 1268.33
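An equivalent approach (a sketch; slice_max() assumes dplyr >= 1.0) is to drop the tsibble structure first and take the top row per stock:
peak_close_alt <- gafa_stock %>%
  as_tibble() %>%               # drop the tsibble index so grouping is unrestricted
  group_by(Symbol) %>%
  slice_max(Close, n = 1) %>%   # one row per stock: the peak closing price
  select(Symbol, Date, Close)
peak_close_alt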
2.3 Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.
a. You can read the data into R with the following script:
tute1 <- readr::read_csv("C:\\Users\\Kesha\\Desktop\\Spring 2024\\DATA 624\\tute1.csv")
## Rows: 100 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (3): Sales, AdBudget, GDP
## date (1): Quarter
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(tute1)
b. Convert the data to time series
mytimeseries <- tute1 |>
mutate(Quarter = yearquarter(Quarter)) |>
as_tsibble(index = Quarter)
c. Construct time series plots of each of the three series. Check what happens when you don’t include facet_grid().
facet_grid() splits the output into a matrix of panels, one per variable (Sales, AdBudget, and GDP), each with its own y-axis scale, in a single plot. Without facet_grid(), all three series are drawn in one panel sharing a single y-axis, so they lose their individual y-scales and sit at different vertical positions than in the faceted plot, because the shared axis has to accommodate all three ranges at once.
mytimeseries |>
pivot_longer(-Quarter) |>
ggplot(aes(x = Quarter, y = value, colour = name)) +
geom_line() +
facet_grid(name ~ ., scales = "free_y")
# Without facet_grid()
mytimeseries |>
pivot_longer(-Quarter) |>
ggplot(aes(x = Quarter, y = value, colour = name)) +
geom_line()
2.4 The USgas package contains data on the demand for natural gas in the US.
Install the USgas package.
#install.packages("USgas")
library(USgas)
Create a tsibble from us_total with year as the index and state as the key.
view(us_total)
us_total_tsibble <- us_total %>%
as_tsibble(key = state,
index = year)
us_total_tsibble
## # A tsibble: 1,266 x 3 [1Y]
## # Key: state [53]
## year state y
## <int> <chr> <int>
## 1 1997 Alabama 324158
## 2 1998 Alabama 329134
## 3 1999 Alabama 337270
## 4 2000 Alabama 353614
## 5 2001 Alabama 332693
## 6 2002 Alabama 379343
## 7 2003 Alabama 350345
## 8 2004 Alabama 382367
## 9 2005 Alabama 353156
## 10 2006 Alabama 391093
## # ℹ 1,256 more rows
Plot the annual natural gas consumption by state for the New England area (comprising the states of Maine, Vermont, New Hampshire, Massachusetts, Connecticut and Rhode Island).
us_total_tsibble %>%
filter(state %in% c("Maine", "Vermont", "New Hampshire", "Massachusetts",
"Connecticut", "Rhode Island")) %>%
autoplot() +
labs(title = "Annual Natural Gas Consumption for the New England Area",
y = "Gas Consumption (MMcf)",
x = "Yearly time interval")
## Plot variable not specified, automatically selected `.vars = y`
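Optionally (a sketch), renaming the generic y column and passing it to autoplot() explicitly makes the plot call self-documenting and silences the `.vars` message:
us_total_tsibble %>%
  rename(Consumption = y) %>%   # give the measured variable a descriptive name
  filter(state %in% c("Maine", "Vermont", "New Hampshire", "Massachusetts",
                      "Connecticut", "Rhode Island")) %>%
  autoplot(Consumption) +
  labs(title = "Annual Natural Gas Consumption for the New England Area",
       y = "Gas Consumption (MMcf)")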
2.5 Download tourism.xlsx from the book website and read it into R using readxl::read_excel().
tourism1 <- readxl::read_excel("C:\\Users\\Kesha\\Desktop\\Spring 2024\\DATA 624\\tourism.xlsx")
View(tourism1)
Create a tsibble which is identical to the tourism tsibble from the tsibble package.
data(tourism)
tourism
## # A tsibble: 24,320 x 5 [1Q]
## # Key: Region, State, Purpose [304]
## Quarter Region State Purpose Trips
## <qtr> <chr> <chr> <chr> <dbl>
## 1 1998 Q1 Adelaide South Australia Business 135.
## 2 1998 Q2 Adelaide South Australia Business 110.
## 3 1998 Q3 Adelaide South Australia Business 166.
## 4 1998 Q4 Adelaide South Australia Business 127.
## 5 1999 Q1 Adelaide South Australia Business 137.
## 6 1999 Q2 Adelaide South Australia Business 200.
## 7 1999 Q3 Adelaide South Australia Business 169.
## 8 1999 Q4 Adelaide South Australia Business 134.
## 9 2000 Q1 Adelaide South Australia Business 154.
## 10 2000 Q2 Adelaide South Australia Business 169.
## # ℹ 24,310 more rows
tourism2 <- tourism1 %>%
mutate(Quarter = yearquarter(Quarter)) %>%
as_tsibble(key = c(Region, State, Purpose),
index = Quarter)
tourism2
## # A tsibble: 24,320 x 5 [1Q]
## # Key: Region, State, Purpose [304]
## Quarter Region State Purpose Trips
## <qtr> <chr> <chr> <chr> <dbl>
## 1 1998 Q1 Adelaide South Australia Business 135.
## 2 1998 Q2 Adelaide South Australia Business 110.
## 3 1998 Q3 Adelaide South Australia Business 166.
## 4 1998 Q4 Adelaide South Australia Business 127.
## 5 1999 Q1 Adelaide South Australia Business 137.
## 6 1999 Q2 Adelaide South Australia Business 200.
## 7 1999 Q3 Adelaide South Australia Business 169.
## 8 1999 Q4 Adelaide South Australia Business 134.
## 9 2000 Q1 Adelaide South Australia Business 154.
## 10 2000 Q2 Adelaide South Australia Business 169.
## # ℹ 24,310 more rows
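A quick check (sketch) that the rebuilt tsibble really matches the packaged one:
all.equal(tourism, tourism2)                      # TRUE if values and structure agree
identical(key_vars(tourism), key_vars(tourism2))  # same key variables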
Find what combination of Region and Purpose had the maximum number of overnight trips on average.
The region Sydney and the purpose Visiting had the maximum number of overnight trips on average, with approximately 747.27 average trips per quarter.
tourism1 %>%
group_by(Region, Purpose)%>%
summarise(trip_avg = mean(Trips))%>%
arrange(desc(trip_avg))
## `summarise()` has grouped output by 'Region'. You can override using the
## `.groups` argument.
## # A tibble: 304 × 3
## # Groups: Region [76]
## Region Purpose trip_avg
## <chr> <chr> <dbl>
## 1 Sydney Visiting 747.
## 2 Melbourne Visiting 619.
## 3 Sydney Business 602.
## 4 North Coast NSW Holiday 588.
## 5 Sydney Holiday 550.
## 6 Gold Coast Holiday 528.
## 7 Melbourne Holiday 507.
## 8 South Coast Holiday 495.
## 9 Brisbane Visiting 493.
## 10 Melbourne Business 478.
## # ℹ 294 more rows
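Equivalently (a sketch), slice_max() returns just the top Region/Purpose combination instead of sorting the whole table:
tourism1 %>%
  group_by(Region, Purpose) %>%
  summarise(trip_avg = mean(Trips), .groups = "drop") %>%
  slice_max(trip_avg, n = 1)  # keep only the combination with the highest average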
Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.
tourism_new <- tourism2 %>%
group_by(State) %>%
summarise(total_trips = sum(Trips))
tourism_new
## # A tsibble: 640 x 3 [1Q]
## # Key: State [8]
## State Quarter total_trips
## <chr> <qtr> <dbl>
## 1 ACT 1998 Q1 551.
## 2 ACT 1998 Q2 416.
## 3 ACT 1998 Q3 436.
## 4 ACT 1998 Q4 450.
## 5 ACT 1999 Q1 379.
## 6 ACT 1999 Q2 558.
## 7 ACT 1999 Q3 449.
## 8 ACT 1999 Q4 595.
## 9 ACT 2000 Q1 600.
## 10 ACT 2000 Q2 557.
## # ℹ 630 more rows
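As a sanity check (sketch), the new state-level tsibble plots directly with autoplot():
tourism_new %>%
  autoplot(total_trips) +
  labs(title = "Total Overnight Trips by State",
       y = "Total trips")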
2.8 Use the following graphics functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() and explore features from the following time series: “Total Private” Employed from us_employment, Bricks from aus_production, Hare from pelt, “H02” Cost from PBS, and Barrels from us_gasoline.
Can you spot any seasonality, cyclicity and
trend?
What do you learn about the series?
What can you say about the seasonal patterns?
Can you identify any unusual years?
“Total Private” Employed from us_employment:
There is no obvious cyclicity or seasonality in this series; the gg_lag plot shows the points lying almost on the diagonal at every lag, as the series consistently increases. Looking closely at gg_season, however, there is a very slight rise over the summer months each year, which suggests some mild seasonality. The overall trend is clearly upward: a positive, roughly linear relationship between time and private-sector employment from January 1939 to June 2019. From the series, it looks like US employment would continue to increase at a predictable rate unless an event disrupts the economy, as I assume COVID-19 did in the years after this data ends. An unusual period is around 2009, where autoplot shows a very large decrease in employment, likely due to the recession in the US at that time.
Bricks from aus_production:
There does seem to be seasonality in Australian brick production: Q1 (Jan-Mar) has much lower production than Q2 (Apr-Jun) and especially Q3 (Jul-Sep), and production falls off again toward Q4 (Oct-Dec). This pattern is very apparent in gg_season for almost every year. The overall trend is first upward and then downward: production rose from the late 1950s to the 1980s, apart from a large dip in the mid-1970s, and then declined, with small peaks in between, through the 2000s. The series does not look entirely predictable, but the seasonality is fairly consistent. 1983 looks like an unusual year, with a significant drop in brick production; from my research (I did not know much about Australia), the Australian economy was struggling around that time.
Hare from pelt:
The first thing I noticed about the pelt data is that because it is annual, we cannot assess seasonality: there is no within-year resolution to show how the number of hares traded behaves across seasons. There is, however, a clear cyclic pattern in autoplot, with repeated peaks and troughs across the years (1845-1935) of Snowshoe Hare pelt trading. The trend is essentially this cycle of rises and falls, and while the cyclicity is visible across the whole series, it would be difficult to predict precisely because the peaks and troughs reach different levels in each cycle. An unusual period is the 1860s, where there was a dramatic spike in Snowshoe Hare pelt trading.
“H02” Cost from PBS:
For PBS, I filtered further on Concession and Type in order to use gg_lag, and ended up plotting only the series with data, the Co-payments for both concession categories. There definitely seems to be seasonality, especially for “General - Safety Net”, “Concessional - Safety Net”, and “Concessional - Co-payments”. For the two Safety Net series, the year starts with higher Medicare prescription costs, which then fall to very low levels during spring before picking back up over the rest of the year. “Concessional - Co-payments”, on the other hand, starts lower, rises into spring, and then slowly decreases toward the end of the year. Overall, I would say there is an upward trend in Medicare Australia prescription costs across all categories. An unusual year is around 1997, where “Concessional - Co-payments” dipped somewhat.
Barrels from us_gasoline:
For us_gasoline there does not appear to be a clear seasonality or cyclicity: gg_season shows fluctuations throughout every season from 1991 to 2017, without a visible cycle. In terms of the overall trend, US finished motor gasoline supplied trends upward until around 2003/2004, then slowly turns downward, and picks back up again around 2013. I would call 2009 an unusual year, with a clear drop in product supplied, which again could be related to the recession at that time.
#"Total Private” Employed from us_employment
us_employment1 <- us_employment %>%
filter(Title == "Total Private")
autoplot(us_employment1, Employed)
gg_season(us_employment1, Employed)
gg_subseries(us_employment1, Employed)
gg_lag(us_employment1, Employed)
ACF(us_employment1, Employed)
## # A tsibble: 29 x 3 [1M]
## # Key: Series_ID [1]
## Series_ID lag acf
## <chr> <cf_lag> <dbl>
## 1 CEU0500000001 1M 0.997
## 2 CEU0500000001 2M 0.993
## 3 CEU0500000001 3M 0.990
## 4 CEU0500000001 4M 0.986
## 5 CEU0500000001 5M 0.983
## 6 CEU0500000001 6M 0.980
## 7 CEU0500000001 7M 0.977
## 8 CEU0500000001 8M 0.974
## 9 CEU0500000001 9M 0.971
## 10 CEU0500000001 10M 0.968
## # ℹ 19 more rows
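Note that ACF() on its own returns the autocorrelations as a tsibble; piping it into autoplot() draws the usual correlogram instead (a sketch that applies equally to the other ACF() calls below):
us_employment1 %>%
  ACF(Employed) %>%
  autoplot()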
#Bricks from aus_production
autoplot(aus_production, Bricks)
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
gg_season(aus_production, Bricks)
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
gg_subseries(aus_production, Bricks)
## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_line()`).
gg_lag(aus_production, Bricks)
## Warning: Removed 20 rows containing missing values (gg_lag).
ACF(aus_production, Bricks)
## # A tsibble: 22 x 2 [1Q]
## lag acf
## <cf_lag> <dbl>
## 1 1Q 0.900
## 2 2Q 0.815
## 3 3Q 0.813
## 4 4Q 0.828
## 5 5Q 0.720
## 6 6Q 0.642
## 7 7Q 0.655
## 8 8Q 0.692
## 9 9Q 0.609
## 10 10Q 0.556
## # ℹ 12 more rows
#Hare from pelt
autoplot(pelt, Hare)
# gg_season(pelt, Hare) cannot be used here: annual data has no months or quarters for gg_season() to plot a seasonal pattern over
gg_subseries(pelt, Hare)
gg_lag(pelt, Hare)
ACF(pelt, Hare)
## # A tsibble: 19 x 2 [1Y]
## lag acf
## <cf_lag> <dbl>
## 1 1Y 0.658
## 2 2Y 0.214
## 3 3Y -0.155
## 4 4Y -0.401
## 5 5Y -0.493
## 6 6Y -0.401
## 7 7Y -0.168
## 8 8Y 0.113
## 9 9Y 0.307
## 10 10Y 0.340
## 11 11Y 0.296
## 12 12Y 0.206
## 13 13Y 0.0372
## 14 14Y -0.153
## 15 15Y -0.285
## 16 16Y -0.295
## 17 17Y -0.202
## 18 18Y -0.0676
## 19 19Y 0.0956
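The roughly ten-year cycle described above shows up in this ACF as an oscillation: negative around lag 5Y, positive again around lag 10Y. Extending lag_max makes the oscillation easier to see (a sketch):
pelt %>%
  ACF(Hare, lag_max = 40) %>%  # more lags to cover several full cycles
  autoplot()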
# "H02" Cost from PBS
PBS1 <- PBS %>%
filter(ATC2 == "H02")
autoplot(PBS1, Cost)
gg_season(PBS1, Cost)
gg_subseries(PBS1, Cost)
pbs_cc <- PBS1 %>%
filter(Concession == "Concessional", Type == "Co-payments")
pbs_gc <- PBS1 %>%
filter(Concession == "General", Type == "Co-payments")
gg_lag(pbs_cc, Cost)
gg_lag(pbs_gc, Cost)
ACF(PBS1, Cost)
## # A tsibble: 92 x 6 [1M]
## # Key: Concession, Type, ATC1, ATC2 [4]
## Concession Type ATC1 ATC2 lag acf
## <chr> <chr> <chr> <chr> <cf_lag> <dbl>
## 1 Concessional Co-payments H H02 1M 0.834
## 2 Concessional Co-payments H H02 2M 0.679
## 3 Concessional Co-payments H H02 3M 0.514
## 4 Concessional Co-payments H H02 4M 0.352
## 5 Concessional Co-payments H H02 5M 0.264
## 6 Concessional Co-payments H H02 6M 0.219
## 7 Concessional Co-payments H H02 7M 0.253
## 8 Concessional Co-payments H H02 8M 0.337
## 9 Concessional Co-payments H H02 9M 0.464
## 10 Concessional Co-payments H H02 10M 0.574
## # ℹ 82 more rows
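Since the filtered PBS data still contains four H02 series (one per Concession/Type combination), an aggregate view can also help; summarising a tsibble over its keys collapses them into a single monthly series of total cost (a sketch, assuming Cost is in $AUD as documented for PBS):
PBS1 %>%
  summarise(TotalCost = sum(Cost)) %>%  # sum across Concession/Type per month
  autoplot(TotalCost) +
  labs(title = "Total Monthly H02 Cost", y = "Cost ($AUD)")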
#Barrels from us_gasoline
autoplot(us_gasoline, Barrels)
gg_season(us_gasoline, Barrels)
gg_subseries(us_gasoline, Barrels)
gg_lag(us_gasoline, Barrels)
ACF(us_gasoline, Barrels)
## # A tsibble: 31 x 2 [1W]
## lag acf
## <cf_lag> <dbl>
## 1 1W 0.893
## 2 2W 0.882
## 3 3W 0.873
## 4 4W 0.866
## 5 5W 0.847
## 6 6W 0.844
## 7 7W 0.832
## 8 8W 0.831
## 9 9W 0.822
## 10 10W 0.808
## # ℹ 21 more rows