Exercises 2.1, 2.2, 2.3, 2.4, 2.5 and 2.8 from https://otexts.com/fpp3/graphics-exercises.html. Submit the RPubs link and attach the .PDF file with your code.

library(fpp3)
## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.1 ──
## ✔ tibble      3.2.1     ✔ tsibble     1.1.6
## ✔ dplyr       1.1.4     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.0     ✔ feasts      0.4.1
## ✔ lubridate   1.9.2     ✔ fable       0.4.1
## ✔ ggplot2     3.5.0
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()
# dplyr and tsibble are already attached by fpp3; reloading them is harmless
library(dplyr)
library(tsibble)
1. Explore the following four time series: Bricks from aus_production, Lynx from pelt, Close from gafa_stock, Demand from vic_elec.
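A minimal sketch for a first look at each series; autoplot() picks sensible defaults, and the help pages (e.g. ?aus_production) give each series' time interval:

```r
# First look at each of the four series; comments note each time interval.
aus_production %>% autoplot(Bricks)  # quarterly
pelt %>% autoplot(Lynx)              # annual
gafa_stock %>% autoplot(Close)       # daily (irregular: trading days only)
vic_elec %>% autoplot(Demand)        # half-hourly
```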
2. Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.
```r
Peak_Close <- gafa_stock %>%
  select(Close, Symbol, Date) %>%
  group_by(Symbol) %>%
  filter(Close == max(Close, na.rm = TRUE))

print(Peak_Close)
```
## # A tsibble: 4 x 3 [!]
## # Key:       Symbol [4]
## # Groups:    Symbol [4]
##   Close Symbol Date      
##   <dbl> <chr>  <date>    
## 1  232. AAPL   2018-10-03
## 2 2040. AMZN   2018-09-04
## 3  218. FB     2018-07-25
## 4 1268. GOOG   2018-07-26
3. Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labeled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.
  a. You can read the data into R with the following script:
# Load data - change the file location if you're reviewing this code
tute1 <- readr::read_csv("C:/Users/Ron/OneDrive/Documents/tute1.csv")
## Rows: 100 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl  (3): Sales, AdBudget, GDP
## date (1): Quarter
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(tute1)
## # A tibble: 6 × 4
##   Quarter    Sales AdBudget   GDP
##   <date>     <dbl>    <dbl> <dbl>
## 1 1981-03-01 1020.     659.  252.
## 2 1981-06-01  889.     589   291.
## 3 1981-09-01  795      512.  291.
## 4 1981-12-01 1004.     614.  292.
## 5 1982-03-01 1058.     647.  279.
## 6 1982-06-01  944.     602   254
  b. Convert the data to time series
mytimeseries <- tute1 |>
  mutate(Quarter = yearquarter(Quarter)) |>
  as_tsibble(index = Quarter)
  c. Construct time series plots of each of the three series
mytimeseries |>
  pivot_longer(-Quarter) |>
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line() +
  facet_grid(name ~ ., scales = "free_y")

Check what happens when you don’t include facet_grid().

Without facet_grid(), R plots all three series on the same panel, which isn't very useful here: Sales and AdBudget are on a far larger scale than GDP, making the plot hard to read. With facet_grid(), each of the three series gets its own panel and its own vertical axis, making it much easier to see seasonal or cyclic patterns.
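For reference, the same plot without the facetting layer:

```r
# Same plot as above, minus facet_grid(): all three series share one y-axis,
# so the GDP series is squashed near the bottom of the panel.
mytimeseries |>
  pivot_longer(-Quarter) |>
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line()
```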
4. The USgas package contains data on the demand for natural gas in the US. Install the USgas package. Create a tsibble from us_total with year as the index and state as the key. Plot the annual natural gas consumption by state for the New England area (comprising the states of Maine, Vermont, New Hampshire, Massachusetts, Connecticut and Rhode Island).
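A sketch of one approach, assuming the USgas package is installed; in us_total the consumption column is named y (in million cubic feet):

```r
# Build an annual tsibble keyed by state, then plot the six New England states.
library(USgas)

us_tsibble <- us_total %>%
  as_tsibble(index = year, key = state)

us_tsibble %>%
  filter(state %in% c("Maine", "Vermont", "New Hampshire",
                      "Massachusetts", "Connecticut", "Rhode Island")) %>%
  autoplot(y) +
  labs(title = "Annual natural gas consumption, New England",
       y = "Consumption (million cubic feet)")
```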
5. Download tourism.xlsx from the book website and read it into R using readxl::read_excel(). Create a tsibble which is identical to the tourism tsibble from the tsibble package. Find what combination of Region and Purpose had the maximum number of overnight trips on average. Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.
```r
# Load data - change the file location if you're reviewing this code
tourism_data <- readxl::read_excel("C:/Users/Ron/OneDrive/Documents/tourism.xlsx")
head(tourism_data)
```
## # A tibble: 6 × 5
##   Quarter    Region   State           Purpose  Trips
##   <chr>      <chr>    <chr>           <chr>    <dbl>
## 1 1998-01-01 Adelaide South Australia Business  135.
## 2 1998-04-01 Adelaide South Australia Business  110.
## 3 1998-07-01 Adelaide South Australia Business  166.
## 4 1998-10-01 Adelaide South Australia Business  127.
## 5 1999-01-01 Adelaide South Australia Business  137.
## 6 1999-04-01 Adelaide South Australia Business  200.
```r
# ?tourism
head(tourism)
```
## # A tsibble: 6 x 5 [1Q]
## # Key:       Region, State, Purpose [1]
##   Quarter Region   State           Purpose  Trips
##     <qtr> <chr>    <chr>           <chr>    <dbl>
## 1 1998 Q1 Adelaide South Australia Business  135.
## 2 1998 Q2 Adelaide South Australia Business  110.
## 3 1998 Q3 Adelaide South Australia Business  166.
## 4 1998 Q4 Adelaide South Australia Business  127.
## 5 1999 Q1 Adelaide South Australia Business  137.
## 6 1999 Q2 Adelaide South Australia Business  200.
```r
################################################################################
# Create a tsibble which is identical to the tourism tsibble from the tsibble
# package. I will make the keys Region, State and Purpose, and convert the
# Quarter character column into a yearquarter.
tourism_tsibble <- tourism_data %>%
  mutate(Quarter = yearquarter(Quarter)) %>%
  as_tsibble(index = Quarter, key = c(Region, State, Purpose))

head(tourism_tsibble)
```
## # A tsibble: 6 x 5 [1Q]
## # Key:       Region, State, Purpose [1]
##   Quarter Region   State           Purpose  Trips
##     <qtr> <chr>    <chr>           <chr>    <dbl>
## 1 1998 Q1 Adelaide South Australia Business  135.
## 2 1998 Q2 Adelaide South Australia Business  110.
## 3 1998 Q3 Adelaide South Australia Business  166.
## 4 1998 Q4 Adelaide South Australia Business  127.
## 5 1999 Q1 Adelaide South Australia Business  137.
## 6 1999 Q2 Adelaide South Australia Business  200.
```r
head(tourism)
```
## # A tsibble: 6 x 5 [1Q]
## # Key:       Region, State, Purpose [1]
##   Quarter Region   State           Purpose  Trips
##     <qtr> <chr>    <chr>           <chr>    <dbl>
## 1 1998 Q1 Adelaide South Australia Business  135.
## 2 1998 Q2 Adelaide South Australia Business  110.
## 3 1998 Q3 Adelaide South Australia Business  166.
## 4 1998 Q4 Adelaide South Australia Business  127.
## 5 1999 Q1 Adelaide South Australia Business  137.
## 6 1999 Q2 Adelaide South Australia Business  200.
```r
# Now they're identical.

################################################################################
# Find what combination of Region and Purpose had the maximum number of
# overnight trips on average.
Region_Purpose_trips <- tourism_tsibble %>%
  group_by(Region, Purpose) %>%
  summarise(avg_overnight_trips = mean(Trips, na.rm = TRUE)) %>%
  arrange(desc(avg_overnight_trips))
```
## Warning: Current temporal ordering may yield unexpected results.
## ℹ Suggest to sort by `Region`, `Purpose`, `Quarter` first.
```r
# This gives the max for each Region:
# Region_Purpose_trips_max <- Region_Purpose_trips %>%
#   filter(avg_overnight_trips == max(avg_overnight_trips))

max_avg_trips <- max(Region_Purpose_trips$avg_overnight_trips, na.rm = TRUE)

Region_Purpose_trips_max <- Region_Purpose_trips %>%
  filter(avg_overnight_trips == max_avg_trips)

Region_Purpose_trips_max
```
## # A tsibble: 1 x 4 [1Q]
## # Key:       Region, Purpose [1]
## # Groups:    Region [1]
##   Region    Purpose  Quarter avg_overnight_trips
##   <chr>     <chr>      <qtr>               <dbl>
## 1 Melbourne Visiting 2017 Q4                985.
```r
################################################################################
# Create a new tsibble which combines the Purposes and Regions, and just has
# total trips by State.

# Group by State and sum trips
total_trips_by_state <- tourism_tsibble %>%
  group_by(State) %>%
  summarise(total_trips = sum(Trips, na.rm = TRUE)) %>%
  ungroup()  # to remove any remaining grouping structure

# Convert to a tibble to remove Quarter as an index
total_trips_by_state <- as_tibble(total_trips_by_state) %>%
  select(-Quarter)

total_trips_by_state <- total_trips_by_state %>%
  group_by(State) %>%
  summarise(Trips = sum(total_trips, na.rm = TRUE)) %>%
  ungroup()

# Now we have the total trips by state.
```
The maximum found here is Melbourne, Visiting, in 2017 Q4, with 985.28 trips. Note that because summarise() ran on a tsibble, the Quarter index was retained and each mean covers a single quarter, so this is really the quarter with the most trips for any Region/Purpose combination rather than a long-run average. The first total_trips_by_state summary keeps the Quarter, which is required as the time index of a tsibble; the follow-up tibble drops the Quarter breakdown to leave just states and their total trips.
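For comparison, a sketch that averages across all quarters: tourism_data is still a plain tibble (no Quarter index), so mean() here spans the whole series for each combination.

```r
# Average Trips over all quarters for each Region/Purpose combination,
# then keep the combination with the largest average.
tourism_data %>%
  group_by(Region, Purpose) %>%
  summarise(avg_trips = mean(Trips, na.rm = TRUE), .groups = "drop") %>%
  slice_max(avg_trips, n = 1)
```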
8. Use the following graphics functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() and explore features from the following time series: “Total Private” Employed from us_employment, Bricks from aus_production, Hare from pelt, “H02” Cost from PBS, and Barrels from us_gasoline.

    Can you spot any seasonality, cyclicity and trend? What do you learn about the series? What can you say about the seasonal patterns? Can you identify any unusual years?

[“Total Private” Employed from us_employment]

us_employment %>%
  filter(Title == 'Total Private') %>%
  autoplot()
## Plot variable not specified, automatically selected `.vars = Employed`

us_employment %>%
  filter(Title == 'Total Private') %>%
  gg_season(Employed, labels = "both")

us_employment %>%
  filter(Title == 'Total Private') %>%
  gg_subseries(Employed)

us_employment %>%
  filter(Title == 'Total Private') %>%
  gg_lag(Employed, geom = "point")

us_employment %>% ACF(Employed)
## # A tsibble: 3,838 x 3 [1M]
## # Key:       Series_ID [148]
##    Series_ID          lag   acf
##    <chr>         <cf_lag> <dbl>
##  1 CEU0500000001       1M 0.997
##  2 CEU0500000001       2M 0.993
##  3 CEU0500000001       3M 0.990
##  4 CEU0500000001       4M 0.986
##  5 CEU0500000001       5M 0.983
##  6 CEU0500000001       6M 0.980
##  7 CEU0500000001       7M 0.977
##  8 CEU0500000001       8M 0.974
##  9 CEU0500000001       9M 0.971
## 10 CEU0500000001      10M 0.968
## # ℹ 3,828 more rows
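
Note that ACF() above ran over all 148 employment series; a sketch restricted to Total Private, with autoplot() drawing the correlogram:

```r
# Compute the ACF for the Total Private series only, then plot the correlogram.
us_employment %>%
  filter(Title == "Total Private") %>%
  ACF(Employed) %>%
  autoplot()
```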
The overall trend for Total Private employment increases steadily from 1939 to 2019, albeit with occasional drops: a sustained growth period is repeatedly followed by a short decline. The main outliers are in the mid-1940s, likely reflecting the economic disruption around the end of WW2, and in 2008-2010, following the housing bubble and financial crisis. These patterns are both seasonal and cyclic: the rises and drops recur reliably, but the lengths of the cycles are not fixed. Within each year, employment begins rising around March, peaks in June/July, and starts dropping in late September or early October. This suggests hiring is seasonal: the best time to find a new job starts in early spring, and people are least likely to be hired in late autumn and early winter. The subseries plot shows average employment peaking between June and September before dropping. The lag plots show a very strong positive relationship at every lag, which indicates that the trend dominates the series and swamps any specific seasonality.

[Bricks from aus_production]

aus_production %>%
  select(Bricks) %>%
  autoplot()
## Plot variable not specified, automatically selected `.vars = Bricks`
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

aus_production %>%
  gg_season(Bricks, labels = "both")
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_text()`).
## Warning: Removed 6 rows containing missing values or values outside the scale range
## (`geom_text()`).

aus_production %>%
  gg_subseries(Bricks)
## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_line()`).

aus_production %>%
  gg_lag(Bricks, geom = "point")
## Warning: Removed 20 rows containing missing values (gg_lag).

aus_production %>% ACF(Bricks)
## # A tsibble: 22 x 2 [1Q]
##         lag   acf
##    <cf_lag> <dbl>
##  1       1Q 0.900
##  2       2Q 0.815
##  3       3Q 0.813
##  4       4Q 0.828
##  5       5Q 0.720
##  6       6Q 0.642
##  7       7Q 0.655
##  8       8Q 0.692
##  9       9Q 0.609
## 10      10Q 0.556
## # ℹ 12 more rows
The obvious within-year pattern is that quarterly brick production rises from Q1 to a peak in Q2/Q3 and drops in Q4 (worth noting that Q2 and Q3 are the cooler months in Australia, so this is not simply a warm-weather construction effect). The longer-run behaviour is cyclic with no fixed period: long stretches of sporadic growth with erratic rises and drops, followed by short periods of steep decline. There appear to be seven rising periods, each ending in a large fall (1960-1962, 1962-1975, 1975-1982, 1982-1991, 1991-1995, 1995-2001, 2001-2005): irregular increases, then a sharp drop, which begins the next period of brick production. The seasonal plot is noisy from year to year, but the subseries plot confirms that average production is higher in Q2 and Q3 despite the noise. The lag plots show increasing scatter as the lag grows, so the autocorrelation weakens at longer lags.

[Hare from pelt]

pelt %>%
  select(Hare) %>%
  autoplot()
## Plot variable not specified, automatically selected `.vars = Hare`

pelt %>% gg_subseries(Hare)

pelt %>% gg_lag(Hare, geom="point")

pelt %>% ACF(Hare)
## # A tsibble: 19 x 2 [1Y]
##         lag     acf
##    <cf_lag>   <dbl>
##  1       1Y  0.658 
##  2       2Y  0.214 
##  3       3Y -0.155 
##  4       4Y -0.401 
##  5       5Y -0.493 
##  6       6Y -0.401 
##  7       7Y -0.168 
##  8       8Y  0.113 
##  9       9Y  0.307 
## 10      10Y  0.340 
## 11      11Y  0.296 
## 12      12Y  0.206 
## 13      13Y  0.0372
## 14      14Y -0.153 
## 15      15Y -0.285 
## 16      16Y -0.295 
## 17      17Y -0.202 
## 18      18Y -0.0676
## 19      19Y  0.0956

Hares were hunted and harvested for their pelts from the mid-1840s to the mid-1930s, at which point the record stops. Unsurprisingly, the series is strongly cyclic, with peaks occurring roughly once a decade separated by several lean years before the harvest rises again; because the data are annual, no seasonal pattern can be observed, which is why gg_season() is omitted here. The two outlier years when hare pelt harvesting was at its highest are 1862 and 1896.


[Barrels from us_gasoline]

us_gasoline %>% autoplot(Barrels)

us_gasoline %>%
  gg_season(Barrels, labels = "both")

us_gasoline %>% gg_subseries(Barrels)

us_gasoline %>% gg_lag(Barrels, geom="point")

us_gasoline %>% ACF(Barrels)
## # A tsibble: 31 x 2 [1W]
##         lag   acf
##    <cf_lag> <dbl>
##  1       1W 0.893
##  2       2W 0.882
##  3       3W 0.873
##  4       4W 0.866
##  5       5W 0.847
##  6       6W 0.844
##  7       7W 0.832
##  8       8W 0.831
##  9       9W 0.822
## 10      10W 0.808
## # ℹ 21 more rows
This dataset gives the U.S. finished motor gasoline product supplied, in millions of barrels per day, from week 6 of 1991 to week 3 of 2017. The autoplot shows a general upward trend until the mid-2000s, then a dip before a recovery around 2015. The series is also cyclic, with increases and drops at inconsistent intervals, where the volume falls and then returns to roughly its prior level. The seasonal plot supports this: the later years (mid-2010s) sit at higher levels than the earlier years (late 1990s and mid-2000s). One notable feature is the drop after 2010, possibly connected to the environmental criticism following the BP oil spill, which dampened production for a time.

[“H02” Cost from PBS]

H02 <- PBS %>%
  filter(ATC2 == "H02")

H02 %>% autoplot(Cost)

H02 %>% gg_season(Cost)

H02 %>% gg_subseries(Cost)

#H02 %>% gg_lag(Cost)
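
gg_lag() is commented out above because it expects a single series, while H02 holds four Concession/Type combinations; a sketch restricted to one of them:

```r
# Lag plots need one series at a time; restrict to a single key combination.
H02 %>%
  filter(Concession == "Concessional", Type == "Co-payments") %>%
  gg_lag(Cost, geom = "point")
```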

H02 %>% ACF(Cost)
## # A tsibble: 92 x 6 [1M]
## # Key:       Concession, Type, ATC1, ATC2 [4]
##    Concession   Type        ATC1  ATC2       lag   acf
##    <chr>        <chr>       <chr> <chr> <cf_lag> <dbl>
##  1 Concessional Co-payments H     H02         1M 0.834
##  2 Concessional Co-payments H     H02         2M 0.679
##  3 Concessional Co-payments H     H02         3M 0.514
##  4 Concessional Co-payments H     H02         4M 0.352
##  5 Concessional Co-payments H     H02         5M 0.264
##  6 Concessional Co-payments H     H02         6M 0.219
##  7 Concessional Co-payments H     H02         7M 0.253
##  8 Concessional Co-payments H     H02         8M 0.337
##  9 Concessional Co-payments H     H02         9M 0.464
## 10 Concessional Co-payments H     H02        10M 0.574
## # ℹ 82 more rows
This time series shows the monthly Medicare Australia prescription data, here the cost of H02 prescriptions, split into four Concession/Type series. The purple and blue lines rise together with high correlation, though the blue one has much greater amplitude. The green line shows no clear trend, while the red line increases gradually over time and tends to fall whenever blue and purple rise, suggesting a negative correlation. The peaks and valleys recur roughly once a year (about five between 1995 and 2000, and five more between 2000 and 2005). The gg_season plot shows that the red series has generally decreased across the seasons, while the blue series drops in February, rises from July to December, and plummets again the following February. The green series is far more sporadic and irregular, and the purple series mimics the blue one, dropping from January to February, staying low, and then rising from July to December; these two show clear seasonal behaviour, while the red and green series look far more random. The gg_subseries plots show the average cost per month over the years for each series: costs have generally risen for the first, second and fourth, while the third is far more sporadic and random.