2.10 Exercises:

Exercise 2.10.1

Explore the following four time series:

‘Bricks’ from ‘aus_production’,
‘Lynx’ from ‘pelt’,
‘Close’ from ‘gafa_stock’,
‘Demand’ from ‘vic_elec’.

Answer:

Loading the library:

library(fpp3)

## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr

## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.1 ──

## ✔ tibble      3.2.1     ✔ tsibble     1.1.6
## ✔ dplyr       1.1.4     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.1     ✔ feasts      0.4.1
## ✔ lubridate   1.9.2     ✔ fable       0.4.1
## ✔ ggplot2     3.5.1

## Warning: package 'tibble' was built under R version 4.2.3

## Warning: package 'dplyr' was built under R version 4.2.3

## Warning: package 'tidyr' was built under R version 4.2.3

## Warning: package 'lubridate' was built under R version 4.2.3

## Warning: package 'ggplot2' was built under R version 4.2.3

## Warning: package 'tsibbledata' was built under R version 4.2.3

## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()

Use ? (or help()) to find out about the data in each series.

?aus_production

## starting httpd help server ... done

aus_production: The aus_production dataset contains quarterly estimates of selected indicators of manufacturing production in Australia. It is a quarterly tsibble with six values, one of which is ‘Bricks’: clay brick production in millions of bricks.

?pelt

pelt: The pelt dataset contains Hudson Bay Company trading records for Snowshoe Hare and Canadian Lynx furs from 1845 to 1935. It includes trade records from all areas of the company. This dataset is an annual tsibble with two variables: ‘Hare’ and ‘Lynx’. The ‘Lynx’ variable represents the number of Canadian Lynx pelts traded.

?gafa_stock

gafa_stock: The gafa_stock dataset contains historical stock prices for Google, Amazon, Facebook, and Apple from 2014 to 2018. All prices are in USD ($). It is a tsibble with data recorded on irregular trading days, including the following variables: Open, High, Low, Close, Adj_Close, and Volume. The ‘Close’ variable represents the closing price of the stock.

?vic_elec

vic_elec: The vic_elec dataset contains half-hourly electricity demand data for Victoria, Australia. It is a half-hourly tsibble with three variables: Demand, Temperature, and Holiday. The ‘Demand’ variable represents the total electricity demand in megawatt-hours (MWh).

What is the time interval of each series?

We can use interval() to find the time interval of each series:

interval(aus_production)

## <interval[1]>
## [1] 1Q

interval(pelt)

## <interval[1]>
## [1] 1Y

interval(gafa_stock)

## <interval[1]>
## [1] !

interval(vic_elec)

## <interval[1]>
## [1] 30m

The above code chunk demonstrates the time interval of each series as follows:

aus_production: Quarterly
pelt: Yearly
gafa_stock: Irregular intervals (based on trading days)
vic_elec: Half-hourly
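
As a rough illustration of why interval(gafa_stock) reports `!` (irregular), the base R sketch below measures the gaps between a few trading dates. The dates are typed by hand for illustration (weekdays on either side of a weekend) and are not read from gafa_stock; the point is only that the gaps between trading days are not constant, so no single fixed interval describes the series.

```r
# Hand-typed illustrative dates (Thu, Fri, Mon, Tue); not read from gafa_stock.
trading_days <- as.Date(c("2014-01-02", "2014-01-03", "2014-01-06", "2014-01-07"))

# Gap in days between consecutive observations.
gaps <- as.integer(diff(trading_days))
gaps

# More than one distinct gap means there is no single regular interval.
has_irregular_gaps <- length(unique(gaps)) > 1
has_irregular_gaps
```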

Use autoplot() to produce a time plot of each series.

We will use autoplot() to produce a time plot of each of the following variables: ‘Bricks’ from aus_production, ‘Lynx’ from pelt, ‘Close’ from gafa_stock, and ‘Demand’ from vic_elec.

autoplot(aus_production, Bricks) + 
  ggtitle("Quarterly clay brick production in millions of bricks") + xlab("Year") + 
  ylab("Bricks Production (Millions)")

## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

The plot shows quarterly clay brick production in Australia over time. We can see a long-term increasing trend from the 1950s to the 1980s, followed by fluctuations and a decline after 1990. This may indicate economic or industry shifts.

autoplot(pelt, Lynx) + 
  ggtitle("Canadian Lynx Pelts Trading Record")

The plot shows a cyclical pattern in the Canadian Lynx pelt trading records: periods of rapid growth are followed by sharp declines.

colnames(gafa_stock)

## [1] "Symbol"    "Date"      "Open"      "High"      "Low"       "Close"    
## [7] "Adj_Close" "Volume"

autoplot(gafa_stock, Close) +
  ggtitle("Stock Closing Prices (2014-2018)")

The plot shows that Amazon and Google saw significant growth in their stock prices from 2014 to 2018, with sharp rises followed by declines. We can also see that Apple and Facebook had lower prices and more stable trends.

autoplot(vic_elec, Demand)

The plot shows electricity demand in Victoria, Australia, from 2012 to 2015, with strong seasonal patterns and high variability.

For the last plot, modify the axis labels and title.

autoplot(vic_elec, Demand) +
  ggtitle("Half-Hourly Electricity Demand in Victoria (2012-2015)") +
  xlab("Year") + 
  ylab("Electricity Demand (MWh)")

I set the title to “Half-Hourly Electricity Demand in Victoria (2012-2015)”, the x-axis label to “Year”, and the y-axis label to “Electricity Demand (MWh)”.

Exercise 2.10.2

Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.

Answer:

In this exercise, we must use group_by() to group the data by “Symbol” (stock ticker) so that the peak closing price is found for each individual stock. If we used filter() without group_by(), max(Close) would be taken over the whole dataset, and the result would be only the day (or days) on which the single highest closing price across all four stocks occurred, rather than the peak for each stock separately.

library(knitr)

gafa_stock %>%
  group_by(Symbol) %>%
  filter(Close == max(Close)) %>%
  select(Symbol, Date, Close) %>%
  kable(col.names = c("Stock", "Peak Date", "Closing Price ($)"),
        caption = "Peak Closing Prices for GAFA Stocks (2014-2018)")

Peak Closing Prices for GAFA Stocks (2014-2018)
Stock	Peak Date	Closing Price ($)
AAPL	2018-10-03	232.07
AMZN	2018-09-04	2039.51
FB	2018-07-25	217.50
GOOG	2018-07-26	1268.33
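
To see why the grouping step matters, here is a minimal sketch on a toy table. The symbols, days, and prices are made up for illustration and are not taken from gafa_stock. It assumes only that dplyr is installed.

```r
library(dplyr)

# Hypothetical toy prices for two made-up tickers.
toy <- tibble(
  Symbol = c("AAA", "AAA", "AAA", "BBB", "BBB", "BBB"),
  Day    = c(1, 2, 3, 1, 2, 3),
  Close  = c(10, 12, 11, 100, 90, 95)
)

# Without grouping, max(Close) is computed over all rows,
# so only the overall peak survives the filter.
overall_peak <- toy |>
  filter(Close == max(Close))

# With grouping, max(Close) is computed within each Symbol,
# so each ticker keeps its own peak row.
per_symbol_peak <- toy |>
  group_by(Symbol) |>
  filter(Close == max(Close)) |>
  ungroup()

overall_peak
per_symbol_peak
```

In the toy data the ungrouped filter keeps only a row for "BBB", because its prices are on a much larger scale. This mirrors how the higher-priced stocks would crowd out the others in an ungrouped filter on gafa_stock.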

Exercise 2.10.3

Download the file “tute1.csv” from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.

a. You can read the data into R with the following script:

Answer:

tute1 <- readr::read_csv("https://raw.githubusercontent.com/FarhanaAkther23/DATA624/refs/heads/main/DATA624%20-%20HW1/tute1.csv")

## Rows: 100 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Quarter
## dbl (3): Sales, AdBudget, GDP
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

View(tute1)

b. Convert the data to time series

mytimeseries <- tute1 |>
  mutate(Quarter = yearquarter(Quarter)) |>
  as_tsibble(index = Quarter)

c. Construct time series plots of each of the three series

mytimeseries |>
  pivot_longer(-Quarter) |>
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line() +
  facet_grid(name ~ ., scales = "free_y")

Check what happens when you don’t include facet_grid().

mytimeseries |>
  pivot_longer(-Quarter) |>
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line()

Without facet_grid(), all three time series are drawn in a single panel and share one y-axis scale, which makes their individual variation harder to distinguish. In the first plot (with facet_grid() and scales = "free_y"), each series gets its own panel and y-axis, and all three series show similar seasonal patterns. When the series are plotted together, the differences in scale dominate the picture, which can lead to misinterpretation of the relationships between the variables.

Exercise 2.10.4

Answer:

The “USgas” package contains data on the demand for natural gas in the US.

a. Install the USgas package.

library(USgas)

## Warning: package 'USgas' was built under R version 4.2.3

b. Create a tsibble from us_total with year as the index and state as the key.

I used the examples from the book to construct the tsibble from us_total and made year the index and state the key.

us_gas_tsibble <- us_total |>
  as_tsibble(index = year, key = state)

c. Plot the annual natural gas consumption by state for the New England area (comprising the states of Maine, Vermont, New Hampshire, Massachusetts, Connecticut and Rhode Island).

We can use dplyr functions such as “mutate()”, “filter()”, “select()” and “summarise()” to work with tsibble objects. To filter for the New England states, let’s first list the state names with “unique()”, then filter the dataset to include only these states.

unique(us_gas_tsibble$state)

##  [1] "Alabama"                            "Alaska"                            
##  [3] "Arizona"                            "Arkansas"                          
##  [5] "California"                         "Colorado"                          
##  [7] "Connecticut"                        "Delaware"                          
##  [9] "District of Columbia"               "Federal Offshore -- Gulf of Mexico"
## [11] "Florida"                            "Georgia"                           
## [13] "Hawaii"                             "Idaho"                             
## [15] "Illinois"                           "Indiana"                           
## [17] "Iowa"                               "Kansas"                            
## [19] "Kentucky"                           "Louisiana"                         
## [21] "Maine"                              "Maryland"                          
## [23] "Massachusetts"                      "Michigan"                          
## [25] "Minnesota"                          "Mississippi"                       
## [27] "Missouri"                           "Montana"                           
## [29] "Nebraska"                           "Nevada"                            
## [31] "New Hampshire"                      "New Jersey"                        
## [33] "New Mexico"                         "New York"                          
## [35] "North Carolina"                     "North Dakota"                      
## [37] "Ohio"                               "Oklahoma"                          
## [39] "Oregon"                             "Pennsylvania"                      
## [41] "Rhode Island"                       "South Carolina"                    
## [43] "South Dakota"                       "Tennessee"                         
## [45] "Texas"                              "U.S."                              
## [47] "Utah"                               "Vermont"                           
## [49] "Virginia"                           "Washington"                        
## [51] "West Virginia"                      "Wisconsin"                         
## [53] "Wyoming"

new_england_gas <- us_gas_tsibble |>
  filter(state %in% c("Maine", "Vermont", "New Hampshire", 
                      "Massachusetts", "Connecticut", "Rhode Island"))

glimpse(new_england_gas)

## Rows: 138
## Columns: 3
## Key: state [6]
## $ year  <int> 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007…
## $ state <chr> "Connecticut", "Connecticut", "Connecticut", "Connecticut", "Con…
## $ y     <int> 144708, 131497, 152237, 159712, 146278, 177587, 154075, 162642, …

new_england_gas |>
  ggplot(aes(x = year, y = y, colour = state)) +
  geom_line() +
  facet_wrap(~state, scales = "free_y") +
  ggtitle("Annual Natural Gas Consumption in New England (1997-2019)") +
  xlab("Year") +
  ylab("Natural Gas Consumption") +
  theme_minimal()

From the graph, we can see that natural gas consumption has been steadily increasing in Connecticut, Massachusetts, and Vermont, while Maine, Rhode Island, and New Hampshire have shown more fluctuations, with Maine experiencing a sharp peak before declining.

Exercise 2.10.5

Answer:

a. Download tourism.xlsx from the book website and read it into R using readxl::read_excel().

We use “?tourism” to explore the “tourism” tsibble. After loading the Excel file, I followed the book’s examples to create a tsibble, identifying the “index” column and the “key” columns.

?tourism

library(readxl)

## Warning: package 'readxl' was built under R version 4.2.3

tourism <- readxl::read_excel("DATA624 - HW1/tourism.xlsx")
glimpse(tourism)

## Rows: 24,320
## Columns: 5
## $ Quarter <chr> "1998-01-01", "1998-04-01", "1998-07-01", "1998-10-01", "1999-…
## $ Region  <chr> "Adelaide", "Adelaide", "Adelaide", "Adelaide", "Adelaide", "A…
## $ State   <chr> "South Australia", "South Australia", "South Australia", "Sout…
## $ Purpose <chr> "Business", "Business", "Business", "Business", "Business", "B…
## $ Trips   <dbl> 135.0777, 109.9873, 166.0347, 127.1605, 137.4485, 199.9126, 16…

b. Create a tsibble which is identical to the “tourism” tsibble from the “tsibble” package.

We need to match the format of the built-in ‘tourism’ dataset:

tourism_tsibble <- tourism |>
  mutate(Quarter = yearquarter(Quarter)) |>
  as_tsibble(index = Quarter, key = c(Region, State, Purpose))
glimpse(tourism_tsibble) # check the result

## Rows: 24,320
## Columns: 5
## Key: Region, State, Purpose [304]
## $ Quarter <qtr> 1998 Q1, 1998 Q2, 1998 Q3, 1998 Q4, 1999 Q1, 1999 Q2, 1999 Q3,…
## $ Region  <chr> "Adelaide", "Adelaide", "Adelaide", "Adelaide", "Adelaide", "A…
## $ State   <chr> "South Australia", "South Australia", "South Australia", "Sout…
## $ Purpose <chr> "Business", "Business", "Business", "Business", "Business", "B…
## $ Trips   <dbl> 135.0777, 109.9873, 166.0347, 127.1605, 137.4485, 199.9126, 16…

c. Find what combination of ‘Region’ and ‘Purpose’ had the maximum number of overnight trips on average.

The average number of trips for each Region - Purpose combination:

tourism_tsibble |>
  as_tibble() |>  
  group_by(Region, Purpose) |>
  summarise(Avg_Trips = mean(Trips, na.rm = TRUE), .groups = "drop") |>
  arrange(desc(Avg_Trips)) |> 
  head()

## # A tibble: 6 × 3
##   Region          Purpose  Avg_Trips
##   <chr>           <chr>        <dbl>
## 1 Sydney          Visiting      747.
## 2 Melbourne       Visiting      619.
## 3 Sydney          Business      602.
## 4 North Coast NSW Holiday       588.
## 5 Sydney          Holiday       550.
## 6 Gold Coast      Holiday       528.

The maximum number of overnight trips on average:

max_avg_trips <- tourism |>
  group_by(Region, Purpose) |>
  summarise(Avg_Trips = mean(Trips, na.rm = TRUE), .groups = "drop") |>
  arrange(desc(Avg_Trips)) |>
  slice(1) 
max_avg_trips

## # A tibble: 1 × 3
##   Region Purpose  Avg_Trips
##   <chr>  <chr>        <dbl>
## 1 Sydney Visiting      747.

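The combination with the highest average number of overnight trips is Region ‘Sydney’ with Purpose ‘Visiting’, at an average of 747.27 trips per quarter (Trips is recorded in thousands).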

d. Create a new ‘tsibble’ which combines the Purposes and Regions, and just has total trips by State.

In this exercise, we need to compute the total number of trips by State while combining all Purposes and Regions. This means we should group by State and Quarter and sum up all trips. To achieve this, we used the group_by() function with State and Quarter to aggregate trips across all regions and purposes, followed by the summarise() function to compute the total trips per State per Quarter. Finally, we converted the resulting dataset into a tsibble to preserve the time series structure.

state_trips_tsibble <- tourism |>
  mutate(Quarter = yearquarter(Quarter)) |>  # Ensure Quarter is in correct format
  group_by(Quarter, State) |>  
  summarise(Total_Trips = sum(Trips, na.rm = TRUE), .groups = "drop") |> # Sum up trips
  as_tsibble(index = Quarter, key = State)
state_trips_tsibble

## # A tsibble: 640 x 3 [1Q]
## # Key:       State [8]
##    Quarter State Total_Trips
##      <qtr> <chr>       <dbl>
##  1 1998 Q1 ACT          551.
##  2 1998 Q2 ACT          416.
##  3 1998 Q3 ACT          436.
##  4 1998 Q4 ACT          450.
##  5 1999 Q1 ACT          379.
##  6 1999 Q2 ACT          558.
##  7 1999 Q3 ACT          449.
##  8 1999 Q4 ACT          595.
##  9 2000 Q1 ACT          600.
## 10 2000 Q2 ACT          557.
## # ℹ 630 more rows

Exercise 2.10.8

Use the following graphics functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() and explore features from the following time series: “Total Private” Employed from us_employment, Bricks from aus_production, Hare from pelt, “H02” Cost from PBS, and Barrels from us_gasoline.

- Can you spot any seasonality, cyclicity and trend?

- What do you learn about the series?

- What can you say about the seasonal patterns?

- Can you identify any unusual years?

Let’s understand the Time Series Data.

Total Private Employment → from us_employment
Bricks → from aus_production
Hare Population → from pelt
H02 Cost (Drug Sales) → from PBS
Barrels of Gasoline → from us_gasoline

Let’s also review what each of these graphics functions does (autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF()), and check the structure of each dataset to confirm the available variables and how they are formatted.

autoplot(): The first step in a time series analysis, giving a general sense of the data. It draws a basic time plot that shows trend, seasonality, and fluctuations over time, and helps identify long-term growth, declines, or sudden changes.
gg_season(): Helps spot seasonal patterns by overlaying each year on a common seasonal axis, so the series can be compared within each season (e.g., months, quarters). If seasonality exists, we will see consistent peaks and troughs at the same points each year.
gg_subseries(): Shows the relative level of each season by splitting the data into a separate mini time plot for each season (month/quarter), with a horizontal line marking the mean of that season across years. If strong seasonality exists, the panels will sit at consistently different levels (e.g., Dec always being high, Jan always being low).
gg_lag(): A visual check for autocorrelation and seasonality. It plots current values against past values at different lags. If the points lie close to the diagonal at every lag, the series is highly correlated with its past (trend-dominant). If the points are tight around the diagonal mainly at the seasonal lags, it suggests seasonality.
ACF(): Quantifies how strongly current values are correlated with past values (lags). Strong peaks at regular lags suggest seasonality. If the autocorrelation declines slowly and gradually, it indicates a trend rather than seasonality.
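
To make the ACF description above concrete, the sketch below computes the lag-k sample autocorrelation directly from its usual definition (the sum of products of mean-centred values k steps apart, divided by the total sum of squares) and compares it with base R's acf(). The series is a made-up quarterly-style pattern, and stats::acf() is used here as a stand-in so the sketch runs without fpp3; the helper name acf_by_hand is hypothetical.

```r
# Lag-k sample autocorrelation computed directly from its definition.
# acf_by_hand is a hypothetical helper written for this illustration.
acf_by_hand <- function(y, max_lag) {
  n <- length(y)
  d <- y - mean(y)     # deviations from the overall mean
  denom <- sum(d^2)    # lag-0 sum of squares
  vapply(
    seq_len(max_lag),
    function(k) sum(d[(k + 1):n] * d[1:(n - k)]) / denom,
    numeric(1)
  )
}

# Made-up series: a repeating 4-period pattern plus a mild upward trend.
y <- rep(c(10, 20, 30, 20), times = 6) + 0.5 * seq_len(24)

r_by_hand <- acf_by_hand(y, max_lag = 8)
r_stats   <- as.numeric(acf(y, lag.max = 8, plot = FALSE)$acf)[-1]  # drop lag 0

# The two calculations should agree, and the largest autocorrelation
# should sit at the seasonal lag.
all.equal(r_by_hand, r_stats)
which.max(r_by_hand)
```

For this toy series the largest autocorrelation falls at lag 4, the length of the repeating pattern. This is the same signature that a quarterly seasonal series produces in an ACF plot.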

Now let’s explore features from the following time series:

us_employment:

glimpse(us_employment)

## Rows: 143,412
## Columns: 4
## Key: Series_ID [148]
## $ Month     <mth> 1939 Jan, 1939 Feb, 1939 Mar, 1939 Apr, 1939 May, 1939 Jun, …
## $ Series_ID <chr> "CEU0500000001", "CEU0500000001", "CEU0500000001", "CEU05000…
## $ Title     <chr> "Total Private", "Total Private", "Total Private", "Total Pr…
## $ Employed  <dbl> 25338, 25447, 25833, 25801, 26113, 26485, 26481, 26848, 2746…

head(us_employment)

## # A tsibble: 6 x 4 [1M]
## # Key:       Series_ID [1]
##      Month Series_ID     Title         Employed
##      <mth> <chr>         <chr>            <dbl>
## 1 1939 Jan CEU0500000001 Total Private    25338
## 2 1939 Feb CEU0500000001 Total Private    25447
## 3 1939 Mar CEU0500000001 Total Private    25833
## 4 1939 Apr CEU0500000001 Total Private    25801
## 5 1939 May CEU0500000001 Total Private    26113
## 6 1939 Jun CEU0500000001 Total Private    26485

autoplot():

us_employment_private <- us_employment |>
  filter(Title == "Total Private")
autoplot(us_employment_private, Employed) +
  ggtitle("Total Private Employment in the U.S.")

gg_season():

us_employment_private |>
  gg_season(Employed) +
  ggtitle("Seasonal Patterns in Total Private Employment")

gg_subseries():

us_employment_private |>
  gg_subseries(Employed) +
  ggtitle("Subseries Plot: Total Private Employment")

gg_lag():

us_employment_private |>
  gg_lag(Employed, geom = "point") +
  ggtitle("Lag Plot: Total Private Employment")

ACF():

us_employment_private |>
  ACF(Employed) |>
  autoplot() +
  ggtitle("Autocorrelation of Total Private Employment")

Overall Analysis: The Total Private Employment data shows a steady increase over time, reflecting long-term economic growth in the U.S. There are some ups and downs, especially during economic downturns like the 2008 financial crisis, but these cycles don’t follow a regular pattern. Seasonal effects are minimal relative to the trend: in the seasonal plots (gg_season() and gg_subseries()), within-year fluctuations are small compared with the year-to-year growth. The lag plot (gg_lag()) and ACF plot indicate strong correlation between past and current values, meaning employment tends to follow a predictable growth trend.

aus_production:

glimpse(aus_production)

## Rows: 218
## Columns: 7
## $ Quarter     <qtr> 1956 Q1, 1956 Q2, 1956 Q3, 1956 Q4, 1957 Q1, 1957 Q2, 1957…
## $ Beer        <dbl> 284, 213, 227, 308, 262, 228, 236, 320, 272, 233, 237, 313…
## $ Tobacco     <dbl> 5225, 5178, 5297, 5681, 5577, 5651, 5317, 6152, 5758, 5641…
## $ Bricks      <dbl> 189, 204, 208, 197, 187, 214, 227, 222, 199, 229, 249, 234…
## $ Cement      <dbl> 465, 532, 561, 570, 529, 604, 603, 582, 554, 620, 646, 637…
## $ Electricity <dbl> 3923, 4436, 4806, 4418, 4339, 4811, 5259, 4735, 4608, 5196…
## $ Gas         <dbl> 5, 6, 7, 6, 5, 7, 7, 6, 5, 7, 8, 6, 5, 7, 8, 6, 6, 8, 8, 7…

head(aus_production)

## # A tsibble: 6 x 7 [1Q]
##   Quarter  Beer Tobacco Bricks Cement Electricity   Gas
##     <qtr> <dbl>   <dbl>  <dbl>  <dbl>       <dbl> <dbl>
## 1 1956 Q1   284    5225    189    465        3923     5
## 2 1956 Q2   213    5178    204    532        4436     6
## 3 1956 Q3   227    5297    208    561        4806     7
## 4 1956 Q4   308    5681    197    570        4418     6
## 5 1957 Q1   262    5577    187    529        4339     5
## 6 1957 Q2   228    5651    214    604        4811     7

autoplot():

aus_production |> 
  autoplot(Bricks) +
  ggtitle("Quarterly Brick Production in Australia")

## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

gg_season():

aus_production |>
  gg_season(Bricks) +
  ggtitle("Seasonal Patterns in Brick Production")

## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

gg_subseries():

aus_production |>
  gg_subseries(Bricks) +
  ggtitle("Subseries Plot: Brick Production by Quarter") +
  xlab("Year") +
  ylab("Bricks Produced (Millions)")

## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_line()`).

gg_lag():

aus_production |>
  gg_lag(Bricks, geom = "point") +
  ggtitle("Lag Plot: Brick Production")

## Warning: Removed 20 rows containing missing values (gg_lag).

ACF():

aus_production |>
  ACF(Bricks) |>
  autoplot() +
  ggtitle("Autocorrelation of Brick Production")

Overall Analysis: The brick production time series shows a clear upward trend from the 1950s to the 1980s, followed by a gradual decline over the next 25 years. Seasonality is present, as seen in the gg_season() plot, with production consistently peaking in Q3 and dropping in Q4. The lag plot (gg_lag()) and autocorrelation function (ACF()) confirm a strong quarterly seasonal cycle, with peaks at lags of 4, 8, and 12 quarters. Additionally, cyclical behavior is evident, as the long-term trend shifts direction around the mid-point of the dataset, suggesting economic or construction industry cycles influencing brick production.
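
The per-quarter comparison that gg_subseries() draws can also be approximated numerically. The sketch below uses a short made-up quarterly series (not the Bricks data) and base R's cycle() and tapply() to compute the mean for each quarter, which corresponds to the horizontal mean line in each subseries panel.

```r
# Made-up quarterly series covering three years; not the Bricks data.
y <- ts(
  c(10, 14, 18, 12,
    11, 15, 19, 13,
    12, 16, 20, 14),
  frequency = 4, start = c(2000, 1)
)

# cycle() gives each observation's position within the year (1 to 4);
# tapply() then averages the observations that share a quarter.
quarter_means <- tapply(y, cycle(y), mean)
quarter_means

# Quarter with the highest average level.
peak_quarter <- unname(which.max(quarter_means))
peak_quarter
```

In this toy series the third quarter has the highest mean. The same calculation on a real series gives a quick numerical check on which quarter a subseries plot suggests is the peak.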

pelt:

?pelt
glimpse(pelt)

## Rows: 91
## Columns: 3
## $ Year <dbl> 1845, 1846, 1847, 1848, 1849, 1850, 1851, 1852, 1853, 1854, 1855,…
## $ Hare <dbl> 19580, 19600, 19610, 11990, 28040, 58000, 74600, 75090, 88480, 61…
## $ Lynx <dbl> 30090, 45150, 49150, 39520, 21230, 8420, 5560, 5080, 10170, 19600…

head(pelt)

## # A tsibble: 6 x 3 [1Y]
##    Year  Hare  Lynx
##   <dbl> <dbl> <dbl>
## 1  1845 19580 30090
## 2  1846 19600 45150
## 3  1847 19610 49150
## 4  1848 11990 39520
## 5  1849 28040 21230
## 6  1850 58000  8420

autoplot():

pelt |> 
  autoplot(Hare) +
  ggtitle("Annual Hare Pelt Trading Records")

gg_season():

The pelt dataset is annual, with only one observation per year, so there is no seasonal breakdown to compare across months or quarters. Therefore, gg_season() will not work for this dataset. Moving on to gg_subseries().

gg_subseries():

pelt |>
  gg_subseries(Hare) +
  ggtitle("Subseries Plot: Snowshoe Hare Pelts")

gg_lag():

pelt |>
  gg_lag(Hare, geom = "point") +
  ggtitle("Lag Plot: Hare Pelts")

ACF():

pelt |>
  ACF(Hare) |>
  autoplot() +
  ggtitle("Autocorrelation of Hare Pelts")

Overall Analysis: Hare pelt numbers, used here as a proxy for the hare population, go through regular boom-and-bust cycles, peaking about every 10 years. After a peak, the population drops sharply and takes several years to recover. This pattern is likely due to predator-prey dynamics: when hare numbers increase, predators (like lynx) also increase, leading to a decline in hares before the cycle starts again. The autocorrelation plot (ACF()) confirms this cycle, showing strong positive correlation at around 10 years and negative correlation at around 5 years, meaning a high hare population today is usually followed by a low population about five years later.

Sources:

https://www2.nau.edu/lrm22/lessons/predator_prey/predator_prey.html#:~:text=This%20can%20lead%20to%20cyclical%20patterns%20of,prey%20numbers%20and%20then%20decrease%20as%20well.&text=While%20this%20is%20an%20indirect%20measure%20of,of%20hare%20and%20lynx%20in%20the%20wild.

https://www.youtube.com/watch?v=bgsZy2HAZVM
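
As a rough check on the ACF pattern described in the hare analysis above, the sketch below builds a synthetic series with an exact 10-step cycle (a sine wave, not the pelt data) and looks at its autocorrelation at lags 5 and 10 using base R's acf(). It only illustrates the general shape: negative correlation half a cycle back and positive correlation a full cycle back.

```r
# Synthetic series with an exact 10-step cycle; not the pelt data.
years   <- 1:60
cycle10 <- sin(2 * pi * years / 10)

# Autocorrelations at lags 1 to 10 (lag 0 dropped).
r <- as.numeric(acf(cycle10, lag.max = 10, plot = FALSE)$acf)[-1]

r[5]   # half a cycle back: values tend to be on the opposite side of the mean
r[10]  # a full cycle back: values tend to be on the same side of the mean
```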

PBS:

?PBS # Monthly Medicare Australia prescription data

head(PBS)

## # A tsibble: 6 x 9 [1M]
## # Key:       Concession, Type, ATC1, ATC2 [1]
##      Month Concession   Type       ATC1  ATC1_desc ATC2  ATC2_desc Scripts  Cost
##      <mth> <chr>        <chr>      <chr> <chr>     <chr> <chr>       <dbl> <dbl>
## 1 1991 Jul Concessional Co-paymen… A     Alimenta… A01   STOMATOL…   18228 67877
## 2 1991 Aug Concessional Co-paymen… A     Alimenta… A01   STOMATOL…   15327 57011
## 3 1991 Sep Concessional Co-paymen… A     Alimenta… A01   STOMATOL…   14775 55020
## 4 1991 Oct Concessional Co-paymen… A     Alimenta… A01   STOMATOL…   15380 57222
## 5 1991 Nov Concessional Co-paymen… A     Alimenta… A01   STOMATOL…   14371 52120
## 6 1991 Dec Concessional Co-paymen… A     Alimenta… A01   STOMATOL…   15028 54299

glimpse(PBS)

## Rows: 67,596
## Columns: 9
## Key: Concession, Type, ATC1, ATC2 [336]
## $ Month      <mth> 1991 Jul, 1991 Aug, 1991 Sep, 1991 Oct, 1991 Nov, 1991 Dec,…
## $ Concession <chr> "Concessional", "Concessional", "Concessional", "Concession…
## $ Type       <chr> "Co-payments", "Co-payments", "Co-payments", "Co-payments",…
## $ ATC1       <chr> "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",…
## $ ATC1_desc  <chr> "Alimentary tract and metabolism", "Alimentary tract and me…
## $ ATC2       <chr> "A01", "A01", "A01", "A01", "A01", "A01", "A01", "A01", "A0…
## $ ATC2_desc  <chr> "STOMATOLOGICAL PREPARATIONS", "STOMATOLOGICAL PREPARATIONS…
## $ Scripts    <dbl> 18228, 15327, 14775, 15380, 14371, 15028, 11040, 15165, 168…
## $ Cost       <dbl> 67877.00, 57011.00, 55020.00, 57222.00, 52120.00, 54299.00,…

view(PBS)

PBS |>
  distinct(ATC2, ATC2_desc)

## # A tibble: 107 × 2
##    ATC2  ATC2_desc                                      
##    <chr> <chr>                                          
##  1 A01   STOMATOLOGICAL PREPARATIONS                    
##  2 A02   DRUGS FOR ACID RELATED DISORDERS               
##  3 A03   DRUGS FOR FUNCTIONAL GASTROINTESTINAL DISORDERS
##  4 A04   ANTIEMETICS AND ANTINAUSEANTS                  
##  5 A05   BILE AND LIVER THERAPY                         
##  6 A06   LAXATIVES                                      
##  7 A07   ANTIDIARR ,INTEST  ANTIINFL /ANTIINFECT  AGENTS
##  8 A09   DIGESTIVES, INCL ENZYMES                       
##  9 A10   ANTIDIABETIC THERAPY                           
## 10 A11   VITAMINS                                       
## # ℹ 97 more rows

PBS |>
  filter(ATC2 == "H02") |>
  distinct(ATC2_desc)

## # A tibble: 1 × 1
##   ATC2_desc                       
##   <chr>                           
## 1 CORTICOSTEROIDS FOR SYSTEMIC USE

PBS_H02 <- PBS |>
  filter(ATC2 == "H02") # we only select H02 (Corticosteroids) data

autoplot():

PBS_H02 |>
  autoplot(Cost) +
  ggtitle("Monthly Cost of H02 (Corticosteroids) Prescriptions")

gg_season():

PBS_H02 |>
  gg_season(Cost) +
  ggtitle("Seasonal Patterns in H02 (Corticosteroids) Prescription Cost")

gg_subseries():

PBS_H02 |>
  gg_subseries(Cost) +
  ggtitle("Subseries Plot: H02 (Corticosteroids) Prescription Cost")

gg_lag():

Since gg_lag() requires a single time series, we need to filter to one category before plotting. Let’s select one specific category, “Concessional Co-payments”.

PBS_H02_filtered <- PBS_H02 |>
  filter(Concession == "Concessional", Type == "Co-payments") # we selected one category

PBS_H02_filtered |>
  gg_lag(Cost, geom = "point")
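
Filtering to one category is one way to obtain a single series. Another option, assuming the total across categories is what is of interest, is to sum Cost within each month. The sketch below shows both on a made-up table shaped like PBS_H02 (the values and the toy_h02 name are hypothetical), using plain dplyr on a tibble. On a tsibble, summarise() aggregates within each time index, so the explicit group_by(Month) should not be needed there.

```r
library(dplyr)

# Made-up data shaped like PBS_H02: two months, four category combinations.
toy_h02 <- tibble(
  Month      = rep(as.Date(c("1991-07-01", "1991-08-01")), each = 4),
  Concession = rep(c("Concessional", "Concessional", "General", "General"), times = 2),
  Type       = rep(c("Co-payments", "Safety net"), times = 4),
  Cost       = c(10, 20, 30, 40, 11, 21, 31, 41)
)

# Option 1 (as in the text): keep a single category combination.
one_category <- toy_h02 |>
  filter(Concession == "Concessional", Type == "Co-payments")

# Option 2: total cost per month across all category combinations.
total_cost <- toy_h02 |>
  group_by(Month) |>
  summarise(Cost = sum(Cost), .groups = "drop")

one_category
total_cost
```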

ACF():

PBS_H02_filtered |>
  ACF(Cost) |>
  autoplot() +
  ggtitle("Autocorrelation of H02 (Corticosteroids) Prescription Cost")

Overall Analysis: The cost of H02 (Corticosteroids) prescriptions has been steadily increasing over time, with clear seasonal patterns. Each year, costs tend to start lower in January and rise towards the end of the year, peaking around October to December, as seen in the gg_season() and gg_subseries() plots. The lag plot (gg_lag()) shows that current costs are strongly related to costs from 12 months earlier, and the ACF plot confirms this, with significant spikes at 12-month lags indicating annual seasonality. Overall, the data suggests that prescription costs consistently increase towards year-end, possibly due to insurance policies, medical demand, or prescription renewals.

us_gasoline:

?us_gasoline

glimpse(us_gasoline)

## Rows: 1,355
## Columns: 2
## $ Week    <week> 1991 W06, 1991 W07, 1991 W08, 1991 W09, 1991 W10, 1991 W11, 1…
## $ Barrels <dbl> 6.621, 6.433, 6.582, 7.224, 6.875, 6.947, 7.328, 6.777, 7.503,…

head(us_gasoline)

## # A tsibble: 6 x 2 [1W]
##       Week Barrels
##     <week>   <dbl>
## 1 1991 W06    6.62
## 2 1991 W07    6.43
## 3 1991 W08    6.58
## 4 1991 W09    7.22
## 5 1991 W10    6.88
## 6 1991 W11    6.95

view(us_gasoline)

autoplot():

us_gasoline |>
  autoplot(Barrels) +
  ggtitle("Weekly Gasoline Consumption in the US (1991-2017)")

gg_season():

us_gasoline |>
  gg_season(Barrels) +
  ggtitle("Seasonal Patterns in US Weekly Gasoline Consumption")

gg_subseries():

us_gasoline |>
  gg_subseries(Barrels) +
  ggtitle("Breakdown of Weekly Gasoline Consumption Trends")

gg_lag():

us_gasoline |>
  gg_lag(Barrels, geom = "point") +
  ggtitle("Lag Plot of Weekly Gasoline Consumption")

ACF():

us_gasoline |>
  ACF(Barrels) |>
  autoplot() +
  ggtitle("Autocorrelation of Weekly Gasoline Consumption")

Overall Analysis: There is a steady upward trend in weekly gasoline consumption from 1991 to around 2007, which indicates growing fuel demand. Around 2008-2009, however, we can see a noticeable drop and fluctuation. This may be due to the global financial crisis, which may have reduced fuel demand. Seasonal patterns show higher consumption during the summer months (June-August) and lower usage in winter (January-February). The autocorrelation analysis shows strong persistence, with past values closely related to later consumption. Overall, the data suggests a long-term increasing trend, with seasonal cycles and economic factors driving fluctuations in gasoline consumption over time.

DATA 624: HW 1

Farhana Akther

2025-02-08
