Forecasting: Principles and Practices,3rd Edition by Rob J Hyndman and George Athanasopoulos
This dataset describes quarterly estimates of selected indicators of manufacturing production in Australia. Format Time series of class tsibble. Details aus_production is a half-hourly tsibble with six values: Beer: Beer production in megalitres. Tobacco: Tobacco and cigarette production in tonnes. Bricks: Clay brick production in millions of bricks. Cement: Portland cement production in thousands of tonnes. Electricity: Electricity production in gigawatt hours. Gas: Gas production in petajoules.
Source:Australian Bureau of Statistics, catalogue number 8301.0.55.001 table 1.
aus_production
## # A tsibble: 218 x 7 [1Q]
## Quarter Beer Tobacco Bricks Cement Electricity Gas
## <qtr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1956 Q1 284 5225 189 465 3923 5
## 2 1956 Q2 213 5178 204 532 4436 6
## 3 1956 Q3 227 5297 208 561 4806 7
## 4 1956 Q4 308 5681 197 570 4418 6
## 5 1957 Q1 262 5577 187 529 4339 5
## 6 1957 Q2 228 5651 214 604 4811 7
## 7 1957 Q3 236 5317 227 603 5259 7
## 8 1957 Q4 320 6152 222 582 4735 6
## 9 1958 Q1 272 5758 199 554 4608 5
## 10 1958 Q2 233 5641 229 620 5196 7
## # ℹ 208 more rows
autoplot(aus_production )
## Plot variable not specified, automatically selected `.vars = Beer`
This dataset contains pelt trading records from Hudson Bay Company trading records for Snowshoe Hare and Canadian Lynx furs from 1845 to 1935. This data contains trade records for all areas of the company. Format Time series of class tsibble Details pelt is an annual tsibble with two values: Hare: The number of Snowshoe Hare pelts traded. Lynx: The number of Canadian Lynx pelts traded. Source Hudson Bay Company
pelt
## # A tsibble: 91 x 3 [1Y]
## Year Hare Lynx
## <dbl> <dbl> <dbl>
## 1 1845 19580 30090
## 2 1846 19600 45150
## 3 1847 19610 49150
## 4 1848 11990 39520
## 5 1849 28040 21230
## 6 1850 58000 8420
## 7 1851 74600 5560
## 8 1852 75090 5080
## 9 1853 88480 10170
## 10 1854 61280 19600
## # ℹ 81 more rows
autoplot(pelt)
## Plot variable not specified, automatically selected `.vars = Hare`
This dataset contains Historical stock prices from 2014-2018 for Google, Amazon, Facebook and Apple. All prices are in $USD.
Format Time series of class tsibble
Details gafa_stock is a tsibble containing data on irregular trading days:
Open: The opening price for the stock. High: The stock’s highest trading price. Low: The stock’s lowest trading price. Close: The closing price for the stock. Adj_Close: The adjusted closing price for the stock. Volume: The amount of stock traded. Each stock is uniquely identified by one key:
Symbol: The ticker symbol for the stock. Source Yahoo Finance historical data
gafa_stock
## # A tsibble: 5,032 x 8 [!]
## # Key: Symbol [4]
## Symbol Date Open High Low Close Adj_Close Volume
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AAPL 2014-01-02 79.4 79.6 78.9 79.0 67.0 58671200
## 2 AAPL 2014-01-03 79.0 79.1 77.2 77.3 65.5 98116900
## 3 AAPL 2014-01-06 76.8 78.1 76.2 77.7 65.9 103152700
## 4 AAPL 2014-01-07 77.8 78.0 76.8 77.1 65.4 79302300
## 5 AAPL 2014-01-08 77.0 77.9 77.0 77.6 65.8 64632400
## 6 AAPL 2014-01-09 78.1 78.1 76.5 76.6 65.0 69787200
## 7 AAPL 2014-01-10 77.1 77.3 75.9 76.1 64.5 76244000
## 8 AAPL 2014-01-13 75.7 77.5 75.7 76.5 64.9 94623200
## 9 AAPL 2014-01-14 76.9 78.1 76.8 78.1 66.1 83140400
## 10 AAPL 2014-01-15 79.1 80.0 78.8 79.6 67.5 97909700
## # ℹ 5,022 more rows
autoplot(gafa_stock)
## Plot variable not specified, automatically selected `.vars = Open`
Description vic_elec is a half-hourly tsibble with three values:
Demand: Total electricity demand in MWh. Temperature: Temperature of Melbourne (BOM site 086071). Holiday: Indicator for if that day is a public holiday. Format Time series of class tsibble.
Details This data is for operational demand, which is the demand met by local scheduled generating units, semi-scheduled generating units, and non-scheduled intermittent generating units of aggregate capacity larger than 30 MWh, and by generation imports to the region. The operational demand excludes the demand met by non-scheduled non-intermittent generating units, non-scheduled intermittent generating units of aggregate capacity smaller than 30 MWh, exempt generation (e.g. rooftop solar, gas tri-generation, very small wind farms, etc), and demand of local scheduled loads. It also excludes some very large industrial users (such as mines or smelters).
Source Australian Energy Market Operato
vic_elec
## # A tsibble: 52,608 x 5 [30m] <Australia/Melbourne>
## Time Demand Temperature Date Holiday
## <dttm> <dbl> <dbl> <date> <lgl>
## 1 2012-01-01 00:00:00 4383. 21.4 2012-01-01 TRUE
## 2 2012-01-01 00:30:00 4263. 21.0 2012-01-01 TRUE
## 3 2012-01-01 01:00:00 4049. 20.7 2012-01-01 TRUE
## 4 2012-01-01 01:30:00 3878. 20.6 2012-01-01 TRUE
## 5 2012-01-01 02:00:00 4036. 20.4 2012-01-01 TRUE
## 6 2012-01-01 02:30:00 3866. 20.2 2012-01-01 TRUE
## 7 2012-01-01 03:00:00 3694. 20.1 2012-01-01 TRUE
## 8 2012-01-01 03:30:00 3562. 19.6 2012-01-01 TRUE
## 9 2012-01-01 04:00:00 3433. 19.1 2012-01-01 TRUE
## 10 2012-01-01 04:30:00 3359. 19.0 2012-01-01 TRUE
## # ℹ 52,598 more rows
autoplot(vic_elec) +
labs (y= "Demand", x = "Every 30mn", title = "Half-hourly electricity demand for Victoria" )
## Plot variable not specified, automatically selected `.vars = Demand`
gafa_stock|>
group_by(Symbol)|>
filter(Close == max(Close))|>
ungroup() -> peak_closing_days
peak_closing_days
## # A tsibble: 4 x 8 [!]
## # Key: Symbol [4]
## Symbol Date Open High Low Close Adj_Close Volume
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AAPL 2018-10-03 230. 233. 230. 232. 230. 28654800
## 2 AMZN 2018-09-04 2026. 2050. 2013 2040. 2040. 5721100
## 3 FB 2018-07-25 216. 219. 214. 218. 218. 58954200
## 4 GOOG 2018-07-26 1251 1270. 1249. 1268. 1268. 2405600
names(gafa_stock)
## [1] "Symbol" "Date" "Open" "High" "Low" "Close"
## [7] "Adj_Close" "Volume"
# reading the data into R
tute1 <- readr::read_csv("https://raw.githubusercontent.com/Heleinef/Data-Science-Master_Heleine/main/tute1.csv")
## Rows: 100 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (3): Sales, AdBudget, GDP
## date (1): Quarter
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(tute1)
## # A tibble: 6 × 4
## Quarter Sales AdBudget GDP
## <date> <dbl> <dbl> <dbl>
## 1 1981-03-01 1020. 659. 252.
## 2 1981-06-01 889. 589 291.
## 3 1981-09-01 795 512. 291.
## 4 1981-12-01 1004. 614. 292.
## 5 1982-03-01 1058. 647. 279.
## 6 1982-06-01 944. 602 254
# Converting the data to time series
mytimeseries <- tute1 |>
mutate(Quarter = yearquarter(Quarter)) |>
as_tsibble(index = Quarter)
# Construct time series plots of each of the three series
mytimeseries |>
pivot_longer(-Quarter) |>
ggplot(aes(x = Quarter, y = value, colour = name)) +
geom_line() +
facet_grid(name ~ ., scales = "free_y")
# Result When facet_grid() is not included
mytimeseries |>
pivot_longer(-Quarter) |>
ggplot(aes(x = Quarter, y = value, colour = name)) +
geom_line()
# Install the USgas package.
data("us_total", package = "USgas") # Load the data from USgas
glimpse(us_total) # Take a look at the structure
## Rows: 1,266
## Columns: 3
## $ year <int> 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007…
## $ state <chr> "Alabama", "Alabama", "Alabama", "Alabama", "Alabama", "Alabama"…
## $ y <int> 324158, 329134, 337270, 353614, 332693, 379343, 350345, 382367, …
# Create a tsibble from us_total with year as the index and state as the key
us_total_tsibble <- us_total |>
as_tsibble(index = year, key = state)
us_total_tsibble
## # A tsibble: 1,266 x 3 [1Y]
## # Key: state [53]
## year state y
## <int> <chr> <int>
## 1 1997 Alabama 324158
## 2 1998 Alabama 329134
## 3 1999 Alabama 337270
## 4 2000 Alabama 353614
## 5 2001 Alabama 332693
## 6 2002 Alabama 379343
## 7 2003 Alabama 350345
## 8 2004 Alabama 382367
## 9 2005 Alabama 353156
## 10 2006 Alabama 391093
## # ℹ 1,256 more rows
# Plot the annual natural gas consumption by state for the New England area (comprising the states of Maine, Vermont, New Hampshire, Massachusetts, Connecticut and Rhode Island).
new_england_states <- c("Maine", "Vermont", "New Hampshire", "Massachusetts", "Connecticut", "Rhode Island")
us_total_tsibble |>
filter(state %in% new_england_states)->new_england_data
new_england_data
## # A tsibble: 138 x 3 [1Y]
## # Key: state [6]
## year state y
## <int> <chr> <int>
## 1 1997 Connecticut 144708
## 2 1998 Connecticut 131497
## 3 1999 Connecticut 152237
## 4 2000 Connecticut 159712
## 5 2001 Connecticut 146278
## 6 2002 Connecticut 177587
## 7 2003 Connecticut 154075
## 8 2004 Connecticut 162642
## 9 2005 Connecticut 168067
## 10 2006 Connecticut 172682
## # ℹ 128 more rows
# Reading the tourism file into R
tourism_new <- readr::read_csv("https://raw.githubusercontent.com/Heleinef/Data-Science-Master_Heleine/main/tourism.csv")
## Rows: 24320 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): Region, State, Purpose
## dbl (1): Trips
## date (1): Quarter
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
tourism_new
## # A tibble: 24,320 × 5
## Quarter Region State Purpose Trips
## <date> <chr> <chr> <chr> <dbl>
## 1 1998-01-01 Adelaide South Australia Business 135.
## 2 1998-04-01 Adelaide South Australia Business 110.
## 3 1998-07-01 Adelaide South Australia Business 166.
## 4 1998-10-01 Adelaide South Australia Business 127.
## 5 1999-01-01 Adelaide South Australia Business 137.
## 6 1999-04-01 Adelaide South Australia Business 200.
## 7 1999-07-01 Adelaide South Australia Business 169.
## 8 1999-10-01 Adelaide South Australia Business 134.
## 9 2000-01-01 Adelaide South Australia Business 154.
## 10 2000-04-01 Adelaide South Australia Business 169.
## # ℹ 24,310 more rows
# Create a tsibble which is identical to the tourism tsibble from the tsibble package.
tourism_tsibble <- tourism_new |>
as_tsibble(index = Quarter, key = c(Region, State, Purpose))
tourism_tsibble
## # A tsibble: 24,320 x 5 [1D]
## # Key: Region, State, Purpose [304]
## Quarter Region State Purpose Trips
## <date> <chr> <chr> <chr> <dbl>
## 1 1998-01-01 Adelaide South Australia Business 135.
## 2 1998-04-01 Adelaide South Australia Business 110.
## 3 1998-07-01 Adelaide South Australia Business 166.
## 4 1998-10-01 Adelaide South Australia Business 127.
## 5 1999-01-01 Adelaide South Australia Business 137.
## 6 1999-04-01 Adelaide South Australia Business 200.
## 7 1999-07-01 Adelaide South Australia Business 169.
## 8 1999-10-01 Adelaide South Australia Business 134.
## 9 2000-01-01 Adelaide South Australia Business 154.
## 10 2000-04-01 Adelaide South Australia Business 169.
## # ℹ 24,310 more rows
# First, calculate avaerage trips by region and purpose
tourism_avg <- tourism_tsibble |>
group_by(Region, Purpose) |>
summarise(avg_trips = mean(Trips, na.rm = TRUE)) |>
ungroup()
tourism_avg
## # A tsibble: 24,320 x 4 [1D]
## # Key: Region, Purpose [304]
## Region Purpose Quarter avg_trips
## <chr> <chr> <date> <dbl>
## 1 Adelaide Business 1998-01-01 135.
## 2 Adelaide Business 1998-04-01 110.
## 3 Adelaide Business 1998-07-01 166.
## 4 Adelaide Business 1998-10-01 127.
## 5 Adelaide Business 1999-01-01 137.
## 6 Adelaide Business 1999-04-01 200.
## 7 Adelaide Business 1999-07-01 169.
## 8 Adelaide Business 1999-10-01 134.
## 9 Adelaide Business 2000-01-01 154.
## 10 Adelaide Business 2000-04-01 169.
## # ℹ 24,310 more rows
# Then, find what combination of Region and Purpose had the maximum number of overnight trips on average.
max_avg_trips <- tourism_avg |>
filter(avg_trips == max(avg_trips))
max_avg_trips
## # A tsibble: 1 x 4 [1D]
## # Key: Region, Purpose [1]
## Region Purpose Quarter avg_trips
## <chr> <chr> <date> <dbl>
## 1 Melbourne Visiting 2017-10-01 985.
# Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.
new_tsibble <- tourism_tsibble |>
group_by(State) |>
summarise(Total_Trips = sum(Trips, na.rm = TRUE)) %>% # Summarize total trips by state
as_tsibble(key = State) # Convert the result to a tsibble
new_tsibble
## # A tsibble: 640 x 3 [1D]
## # Key: State [8]
## State Quarter Total_Trips
## <chr> <date> <dbl>
## 1 ACT 1998-01-01 551.
## 2 ACT 1998-04-01 416.
## 3 ACT 1998-07-01 436.
## 4 ACT 1998-10-01 450.
## 5 ACT 1999-01-01 379.
## 6 ACT 1999-04-01 558.
## 7 ACT 1999-07-01 449.
## 8 ACT 1999-10-01 595.
## 9 ACT 2000-01-01 600.
## 10 ACT 2000-04-01 557.
## # ℹ 630 more rows
# The dataset
aus_arrivals
## # A tsibble: 508 x 3 [1Q]
## # Key: Origin [4]
## Quarter Origin Arrivals
## <qtr> <chr> <int>
## 1 1981 Q1 Japan 14763
## 2 1981 Q2 Japan 9321
## 3 1981 Q3 Japan 10166
## 4 1981 Q4 Japan 19509
## 5 1982 Q1 Japan 17117
## 6 1982 Q2 Japan 10617
## 7 1982 Q3 Japan 11737
## 8 1982 Q4 Japan 20961
## 9 1983 Q1 Japan 20671
## 10 1983 Q2 Japan 12235
## # ℹ 498 more rows
# Using the autoplot function
autoplot(aus_arrivals)
## Plot variable not specified, automatically selected `.vars = Arrivals`
# Using the gg_season()
gg_season(aus_arrivals)
## Plot variable not specified, automatically selected `y = Arrivals`
# Using the gg_subseries()
gg_subseries(aus_arrivals)
## Plot variable not specified, automatically selected `y = Arrivals`
# The dataset
aus_retail
## # A tsibble: 64,532 x 5 [1M]
## # Key: State, Industry [152]
## State Industry `Series ID` Month Turnover
## <chr> <chr> <chr> <mth> <dbl>
## 1 Australian Capital Territory Cafes, restaurant… A3349849A 1982 Apr 4.4
## 2 Australian Capital Territory Cafes, restaurant… A3349849A 1982 May 3.4
## 3 Australian Capital Territory Cafes, restaurant… A3349849A 1982 Jun 3.6
## 4 Australian Capital Territory Cafes, restaurant… A3349849A 1982 Jul 4
## 5 Australian Capital Territory Cafes, restaurant… A3349849A 1982 Aug 3.6
## 6 Australian Capital Territory Cafes, restaurant… A3349849A 1982 Sep 4.2
## 7 Australian Capital Territory Cafes, restaurant… A3349849A 1982 Oct 4.8
## 8 Australian Capital Territory Cafes, restaurant… A3349849A 1982 Nov 5.4
## 9 Australian Capital Territory Cafes, restaurant… A3349849A 1982 Dec 6.9
## 10 Australian Capital Territory Cafes, restaurant… A3349849A 1983 Jan 3.8
## # ℹ 64,522 more rows
set.seed(13101917)
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseries
## # A tsibble: 441 x 5 [1M]
## # Key: State, Industry [1]
## State Industry `Series ID` Month Turnover
## <chr> <chr> <chr> <mth> <dbl>
## 1 South Australia Department stores A3349658K 1982 Apr 48.9
## 2 South Australia Department stores A3349658K 1982 May 52.2
## 3 South Australia Department stores A3349658K 1982 Jun 48.9
## 4 South Australia Department stores A3349658K 1982 Jul 48.3
## 5 South Australia Department stores A3349658K 1982 Aug 49.4
## 6 South Australia Department stores A3349658K 1982 Sep 48.5
## 7 South Australia Department stores A3349658K 1982 Oct 46.1
## 8 South Australia Department stores A3349658K 1982 Nov 58.5
## 9 South Australia Department stores A3349658K 1982 Dec 88.9
## 10 South Australia Department stores A3349658K 1983 Jan 43.5
## # ℹ 431 more rows
# 1. Autoplot of the selected series
autoplot(myseries, Turnover) +
ggtitle("Autoplot of Retail Turnover") +
ylab("Turnover") + xlab("Time")
# 2. Seasonal plot to visualize seasonality
gg_season(myseries, Turnover) +
ggtitle("Seasonal Plot of Retail Turnover") +
ylab("Turnover") + xlab("Season")
# 3. Subseries plot for further seasonality exploration
gg_subseries(myseries, Turnover) +
ggtitle("Subseries Plot of Retail Turnover") +
ylab("Turnover") + xlab("Month")
# 4. Lag plot to examine correlation between lags
gg_lag(myseries, Turnover, geom = "point") +
ggtitle("Lag Plot of Retail Turnover")
# 5. Autocorrelation plot to visualize cyclicity and correlation in the data
ACF(myseries, Turnover) %>%
autoplot() +
ggtitle("ACF Plot of Retail Turnover") +
ylab("ACF")
Indeed, the various plots helped me visualize the behavior of the selected time series,and allowed me to detect seasonality, trends, or cyclic patterns in the retail turnover data, depending of the selected function.
autoplot(myseries, Turnover): provided an overview of the retail sales over time, allowing me to observe trends and potential patterns in the data.
gg_season(myseries, Turnover): Revealed the seasonal variation by decomposing the data into seasons (e.g., months). It helps identify repeating patterns.
gg_subseries(myseries, Turnover): This plot also helps in observing seasonal behavior by breaking down the data by individual months or seasons.It can highlight if a particular month/season has consistently high or low sales.
gg_lag(myseries, Turnover, geom = “point”): A lag plot helps visualize the correlation between observations at different time lags.Cyclic behavior: If there’s a repeating cycle or seasonality, the lag plot might show a certain correlation structure.
ACF() |> autoplot(): The autocorrelation function (ACF) plot shows the correlation of the time series with its own lags. Peaks in the ACF at regular intervals can indicate seasonality, while slow decay might suggest a trend or cyclicity.
us_employment
## # A tsibble: 143,412 x 4 [1M]
## # Key: Series_ID [148]
## Month Series_ID Title Employed
## <mth> <chr> <chr> <dbl>
## 1 1939 Jan CEU0500000001 Total Private 25338
## 2 1939 Feb CEU0500000001 Total Private 25447
## 3 1939 Mar CEU0500000001 Total Private 25833
## 4 1939 Apr CEU0500000001 Total Private 25801
## 5 1939 May CEU0500000001 Total Private 26113
## 6 1939 Jun CEU0500000001 Total Private 26485
## 7 1939 Jul CEU0500000001 Total Private 26481
## 8 1939 Aug CEU0500000001 Total Private 26848
## 9 1939 Sep CEU0500000001 Total Private 27468
## 10 1939 Oct CEU0500000001 Total Private 27830
## # ℹ 143,402 more rows
colnames(us_employment)
## [1] "Month" "Series_ID" "Title" "Employed"
aus_production
## # A tsibble: 218 x 7 [1Q]
## Quarter Beer Tobacco Bricks Cement Electricity Gas
## <qtr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1956 Q1 284 5225 189 465 3923 5
## 2 1956 Q2 213 5178 204 532 4436 6
## 3 1956 Q3 227 5297 208 561 4806 7
## 4 1956 Q4 308 5681 197 570 4418 6
## 5 1957 Q1 262 5577 187 529 4339 5
## 6 1957 Q2 228 5651 214 604 4811 7
## 7 1957 Q3 236 5317 227 603 5259 7
## 8 1957 Q4 320 6152 222 582 4735 6
## 9 1958 Q1 272 5758 199 554 4608 5
## 10 1958 Q2 233 5641 229 620 5196 7
## # ℹ 208 more rows
PBS
## # A tsibble: 67,596 x 9 [1M]
## # Key: Concession, Type, ATC1, ATC2 [336]
## Month Concession Type ATC1 ATC1_desc ATC2 ATC2_desc Scripts Cost
## <mth> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 1991 Jul Concessional Co-payme… A Alimenta… A01 STOMATOL… 18228 67877
## 2 1991 Aug Concessional Co-payme… A Alimenta… A01 STOMATOL… 15327 57011
## 3 1991 Sep Concessional Co-payme… A Alimenta… A01 STOMATOL… 14775 55020
## 4 1991 Oct Concessional Co-payme… A Alimenta… A01 STOMATOL… 15380 57222
## 5 1991 Nov Concessional Co-payme… A Alimenta… A01 STOMATOL… 14371 52120
## 6 1991 Dec Concessional Co-payme… A Alimenta… A01 STOMATOL… 15028 54299
## 7 1992 Jan Concessional Co-payme… A Alimenta… A01 STOMATOL… 11040 39753
## 8 1992 Feb Concessional Co-payme… A Alimenta… A01 STOMATOL… 15165 54405
## 9 1992 Mar Concessional Co-payme… A Alimenta… A01 STOMATOL… 16898 61108
## 10 1992 Apr Concessional Co-payme… A Alimenta… A01 STOMATOL… 18141 65356
## # ℹ 67,586 more rows
# 1. Explore "Total Private" Employed from us_employment
total_private_employment <- us_employment |>
filter(Title == "Total Private")
# 2. Autoplot of the series
autoplot(total_private_employment, Employed) +
ggtitle("Autoplot of 'Total Private' Employed") +
ylab("Employed") + xlab("Time")
# 3. Seasonal plot to visualize seasonality
gg_season(total_private_employment, Employed) +
ggtitle("Seasonal Plot of 'Total Private' Employed") +
ylab("Employed") + xlab("Season")
# 4. Subseries plot for further seasonality exploration
gg_subseries(total_private_employment, Employed) +
ggtitle("Subseries Plot of 'Total Private' Employed") +
ylab("Employed") + xlab("Month")
# 5. Lag plot to examine correlation between lags
gg_lag(total_private_employment, Employed, geom = "point") +
ggtitle("Lag Plot of 'Total Private' Employed")
# 6. Autocorrelation plot to visualize cyclicity and correlation in the data
ACF(total_private_employment, Employed) %>%
autoplot() +
ggtitle("ACF of 'Total Private' Employed")
# 2. Explore Bricks from aus_production
autoplot(aus_production, Bricks) +
ggtitle("Autoplot of Bricks Production") +
ylab("Bricks Produced") + xlab("Time")
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
gg_season(aus_production, Bricks) +
ggtitle("Seasonal Plot of Bricks Production") +
ylab("Bricks Produced") + xlab("Season")
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).
gg_subseries(aus_production, Bricks) +
ggtitle("Subseries Plot of Bricks Production") +
ylab("Bricks Produced") + xlab("Month")
## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_line()`).
gg_lag(aus_production, Bricks, geom = "point") +
ggtitle("Lag Plot of Bricks Production")
## Warning: Removed 20 rows containing missing values (gg_lag).
ACF(aus_production, Bricks) %>%
autoplot() +
ggtitle("ACF of Bricks Production")
# 3. Explore Hare from pelt
# Convert to tsibble if necessary
pelt_tsibble <- pelt |>
as_tsibble(index = Year) # Assuming 'Year' is the index
# Autoplot of Hare Pelts
autoplot(pelt_tsibble, Hare) +
ggtitle("Autoplot of Hare Pelts") +
ylab("Number of Pelts") + xlab("Time")
# Subseries plot for further seasonality exploration
gg_subseries(pelt_tsibble, Hare) +
ggtitle("Subseries Plot of Hare Pelts") +
ylab("Number of Pelts") + xlab("Month")
# Lag plot to examine correlation between lags
gg_lag(pelt_tsibble, Hare, geom = "point") +
ggtitle("Lag Plot of Hare Pelts")
# Autocorrelation plot to visualize cyclicity and correlation in the data
ACF(pelt_tsibble, Hare) %>%
autoplot() +
ggtitle("ACF of Hare Pelts")
PBS_H02 <- PBS %>%
filter(ATC2 == "H02")
H02_summary <-PBS_H02 %>%
index_by(Month) %>% # Replace with the appropriate time index like Quarter or Year
summarise(total_cost = sum(Cost, na.rm = TRUE))
H02_summary
## # A tsibble: 204 x 2 [1M]
## Month total_cost
## <mth> <dbl>
## 1 1991 Jul 429795
## 2 1991 Aug 400906
## 3 1991 Sep 432159
## 4 1991 Oct 492543
## 5 1991 Nov 502369
## 6 1991 Dec 602652
## 7 1992 Jan 660119
## 8 1992 Feb 336220
## 9 1992 Mar 351348
## 10 1992 Apr 379808
## # ℹ 194 more rows
ggplot(H02_summary, aes(x = Month, y = total_cost)) + # Adjust `Month` to the appropriate time index
geom_line() +
labs(
title = "Total Cost of H02 (Corticosteroids) Over Time",
x = "Time",
y = "Total Cost"
) +
theme_minimal()
# Filter for "H02" category
PBS_H02 <- PBS %>%
filter(ATC2 == "H02")
PBS_H02
## # A tsibble: 816 x 9 [1M]
## # Key: Concession, Type, ATC1, ATC2 [4]
## Month Concession Type ATC1 ATC1_desc ATC2 ATC2_desc Scripts Cost
## <mth> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 1991 Jul Concessional Co-paym… H Systemic… H02 CORTICOS… 63261 317384
## 2 1991 Aug Concessional Co-paym… H Systemic… H02 CORTICOS… 53528 269891
## 3 1991 Sep Concessional Co-paym… H Systemic… H02 CORTICOS… 52822 269703
## 4 1991 Oct Concessional Co-paym… H Systemic… H02 CORTICOS… 54016 280418
## 5 1991 Nov Concessional Co-paym… H Systemic… H02 CORTICOS… 49281 268070
## 6 1991 Dec Concessional Co-paym… H Systemic… H02 CORTICOS… 51798 277139
## 7 1992 Jan Concessional Co-paym… H Systemic… H02 CORTICOS… 42436 221772
## 8 1992 Feb Concessional Co-paym… H Systemic… H02 CORTICOS… 52913 272345
## 9 1992 Mar Concessional Co-paym… H Systemic… H02 CORTICOS… 62908 325700
## 10 1992 Apr Concessional Co-paym… H Systemic… H02 CORTICOS… 68499 349271
## # ℹ 806 more rows
# Autoplot of Cost for "H02"
autoplot(PBS_H02, Cost) +
ggtitle("Autoplot of H02 Pharmaceutical Costs") +
ylab("Cost") + xlab("Time")
autoplot(H02_summary, total_cost) +
ggtitle("Autoplot of H02 Pharmaceutical Costs") +
ylab("Cost") + xlab("Time")
autoplot(H02_summary, total_cost) +
ggtitle("Autoplot of H02 Pharmaceutical Costs") +
ylab("Cost") + xlab("Time")
PBS_H02_filled <- PBS_H02 %>%
fill_gaps(Cost = NA)
# 1. gg_season() - Seasonal plot
gg_season(PBS_H02_filled, Cost) +
ggtitle("Seasonal Plot of H02 Pharmaceutical Costs") +
ylab("Cost") + xlab("Season")
# 2. gg_subseries() - Subseries plot
gg_subseries(PBS_H02_filled, Cost) +
ggtitle("Subseries Plot of H02 Pharmaceutical Costs") +
ylab("Cost") + xlab("Subseries")
# 4. ACF() - Autocorrelation function
PBS_H02_filled %>%
ACF(Cost) %>%
autoplot() +
ggtitle("ACF of H02 Pharmaceutical Costs") +
ylab("ACF") + xlab("Lag")
# 5. Explore Barrels from us_gasoline
autoplot(us_gasoline, Barrels) +
ggtitle("Autoplot of Gasoline Barrels") +
ylab("Barrels") + xlab("Time")
gg_season(us_gasoline, Barrels) +
ggtitle("Seasonal Plot of Gasoline Barrels") +
ylab("Barrels") + xlab("Season")
gg_subseries(us_gasoline, Barrels) +
ggtitle("Subseries Plot of Gasoline Barrels") +
ylab("Barrels") + xlab("Month")
gg_lag(us_gasoline, Barrels, geom = "point") +
ggtitle("Lag Plot of Gasoline Barrels")
ACF(us_gasoline, Barrels) %>%
autoplot() +
ggtitle("ACF of Gasoline Barrels")