library(fpp3)
## Warning: package 'fpp3' was built under R version 4.3.3
## Registered S3 method overwritten by 'tsibble':
## method from
## as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.0 ──
## ✔ tibble 3.2.1 ✔ tsibble 1.1.5
## ✔ dplyr 1.1.4 ✔ tsibbledata 0.4.1
## ✔ tidyr 1.3.1 ✔ feasts 0.3.2
## ✔ lubridate 1.9.3 ✔ fable 0.3.4
## ✔ ggplot2 3.5.1 ✔ fabletools 0.4.2
## Warning: package 'tidyr' was built under R version 4.3.3
## Warning: package 'ggplot2' was built under R version 4.3.3
## Warning: package 'tsibble' was built under R version 4.3.3
## Warning: package 'tsibbledata' was built under R version 4.3.3
## Warning: package 'feasts' was built under R version 4.3.3
## Warning: package 'fabletools' was built under R version 4.3.3
## Warning: package 'fable' was built under R version 4.3.3
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tsibble::setdiff() masks base::setdiff()
## ✖ tsibble::union() masks base::union()
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ readr 2.1.5
## ✔ purrr 1.0.2 ✔ stringr 1.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readr)
library(latex2exp)
## Warning: package 'latex2exp' was built under R version 4.3.3
library(seasonal)
## Warning: package 'seasonal' was built under R version 4.3.3
##
## Attaching package: 'seasonal'
##
## The following object is masked from 'package:tibble':
##
## view
#loaded relevant libraries
Do exercises 3.1, 3.2, 3.3, 3.4, 3.5, 3.7, 3.8 and 3.9 from the online Hyndman book. Please include your Rpubs link along with.pdf file of your run code
Q1.Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?
#very large figure needed for this many countries
#unique(global_economy$Country) #263 countries in global_economy
head(global_economy) #to display column names
## # A tsibble: 6 x 9 [1Y]
## # Key: Country [1]
## Country Code Year GDP Growth CPI Imports Exports Population
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Afghanistan AFG 1960 537777811. NA NA 7.02 4.13 8996351
## 2 Afghanistan AFG 1961 548888896. NA NA 8.10 4.45 9166764
## 3 Afghanistan AFG 1962 546666678. NA NA 9.35 4.88 9345868
## 4 Afghanistan AFG 1963 751111191. NA NA 16.9 9.17 9533954
## 5 Afghanistan AFG 1964 800000044. NA NA 18.1 8.89 9731361
## 6 Afghanistan AFG 1965 1006666638. NA NA 21.4 11.3 9938414
global_economy |>
ggplot(aes(x = Year, y = GDP/Population, color = Country)) + #plot the GDP per capita by country
geom_line(na.rm = TRUE)+ #ignore missing values
labs(title= "GDP per capita", y = "$US") + # providing the axis labels
guides(color = guide_legend(ncol = 3)) #making the legend not have a large width
global_gdp_capita <- global_economy |> #adding GDP per capita to global economy and putting it in global_gdp_capita dataframe
mutate(gdp_capita = GDP / Population) |> # adding column of gdp per capita
arrange(desc(gdp_capita)) # place the highest GDP per capita at top of dataframe
## Warning: Current temporal ordering may yield unexpected results.
## ℹ Suggest to sort by `Country`, `Year` first.
head(global_gdp_capita) #display the highest GDP per capita.
## Warning: Current temporal ordering may yield unexpected results.
## ℹ Suggest to sort by `Country`, `Year` first.
## # A tsibble: 6 x 10 [1Y]
## # Key: Country [2]
## Country Code Year GDP Growth CPI Imports Exports Population gdp_capita
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Monaco MCO 2014 7.06e9 7.18 NA NA NA 38132 185153.
## 2 Monaco MCO 2008 6.48e9 0.732 NA NA NA 35853 180640.
## 3 Liechte… LIE 2014 6.66e9 NA NA NA NA 37127 179308.
## 4 Liechte… LIE 2013 6.39e9 NA NA NA NA 36834 173528.
## 5 Monaco MCO 2013 6.55e9 9.57 NA NA NA 37971 172589.
## 6 Monaco MCO 2016 6.47e9 3.21 NA NA NA 38499 168011.
print(global_gdp_capita)
## # A tsibble: 15,150 x 10 [1Y]
## # Key: Country [263]
## Country Code Year GDP Growth CPI Imports Exports Population gdp_capita
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Monaco MCO 2014 7.06e9 7.18 NA NA NA 38132 185153.
## 2 Monaco MCO 2008 6.48e9 0.732 NA NA NA 35853 180640.
## 3 Liecht… LIE 2014 6.66e9 NA NA NA NA 37127 179308.
## 4 Liecht… LIE 2013 6.39e9 NA NA NA NA 36834 173528.
## 5 Monaco MCO 2013 6.55e9 9.57 NA NA NA 37971 172589.
## 6 Monaco MCO 2016 6.47e9 3.21 NA NA NA 38499 168011.
## 7 Liecht… LIE 2015 6.27e9 NA NA NA NA 37403 167591.
## 8 Monaco MCO 2007 5.87e9 14.4 NA NA NA 35111 167125.
## 9 Liecht… LIE 2016 6.21e9 NA NA NA NA 37666 164993.
## 10 Monaco MCO 2015 6.26e9 4.94 NA NA NA 38307 163369.
## # ℹ 15,140 more rows
In most recent years Monoco has had the highest GDP per capita. Liechtenstein seems to be number 2 and sometimes crosses Monoco. Other countries with high GDP per capita are Norway, Luxembourg, Bermuda. The highest GDP per capita has exchanged several times throughout the years.
Q2. For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.
United States GDP from global_economy. Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock. Victorian Electricity Demand from vic_elec. Gas production from aus_production.
global_economy |>
filter(Country == "United States") |>
ggplot(aes(x = Year, y = GDP)) + #plot the GDP of the US
geom_line(na.rm = TRUE)+ #ignore missing values
labs(title= "GDP for US", y = "$US")
global_economy |>
filter(Country == "United States") |>
ggplot(aes(x = Year, y = GDP/Population, color = Country)) + #plot the GDP per capita of the US
geom_line(na.rm = TRUE)+ #ignore missing values
labs(title= "GDP per capita for US", y = "$US")
Changing the GDP to GDP per capita for the united states yields similar results to the GDP graph. dividing by the population is a better indicator of the wealth per person in a country.
head(aus_livestock)
## # A tsibble: 6 x 4 [1M]
## # Key: Animal, State [1]
## Month Animal State Count
## <mth> <fct> <fct> <dbl>
## 1 1976 Jul Bulls, bullocks and steers Australian Capital Territory 2300
## 2 1976 Aug Bulls, bullocks and steers Australian Capital Territory 2100
## 3 1976 Sep Bulls, bullocks and steers Australian Capital Territory 2100
## 4 1976 Oct Bulls, bullocks and steers Australian Capital Territory 1900
## 5 1976 Nov Bulls, bullocks and steers Australian Capital Territory 2100
## 6 1976 Dec Bulls, bullocks and steers Australian Capital Territory 1800
unique(aus_livestock$State) #Victoria is Victorian
## [1] Australian Capital Territory New South Wales
## [3] Northern Territory Queensland
## [5] South Australia Tasmania
## [7] Victoria Western Australia
## 8 Levels: Australian Capital Territory New South Wales ... Western Australia
aus_livestock |>
filter(Animal == "Bulls, bullocks and steers", State =="Victoria") |>
ggplot(aes(x = Month, y = Count)) + #plot the count of bulls
geom_line(na.rm = TRUE) #ignore missing values
bulls_victoria <- aus_livestock |> # create a new data frame with the bulls already filtered
filter(Animal == "Bulls, bullocks and steers", State =="Victoria")
dcmp <- bulls_victoria |>
model(stl = STL(Count))
components(dcmp) |> autoplot() #decompose using STL and graph
aus_livestock |>
filter(Animal == "Bulls, bullocks and steers", State =="Victoria") |> gg_season() #decomposition of bull slaughter
## Plot variable not specified, automatically selected `y = Count`
aus_livestock |>
filter(Animal == "Bulls, bullocks and steers", State =="Victoria") |> gg_subseries() # stacks slaughter count of each year on top of each other
## Plot variable not specified, automatically selected `y = Count`
aus_livestock |>
filter(Animal == "Bulls, bullocks and steers", State =="Victoria") |>
mutate(days_month = days_in_month(Month), # Get number of days in the month
daily_average = Count / days_month) |>
ggplot(aes(x = Month, y = daily_average)) + #plot the average number per day slaughtered
geom_line(na.rm = TRUE) #ignore missing values
Breaking the livestock up into seasons shows the overall decreasing
trend of the count slaughtered as the years increase.The subseries shows
the overall decrease per for the same month over years as well.
Transforming the data by dividing the count by the number of days in the
month helps to avoid having less count due to a month having fewer days.
Overall the data looks the same.
head(vic_elec)
## # A tsibble: 6 x 5 [30m] <Australia/Melbourne>
## Time Demand Temperature Date Holiday
## <dttm> <dbl> <dbl> <date> <lgl>
## 1 2012-01-01 00:00:00 4383. 21.4 2012-01-01 TRUE
## 2 2012-01-01 00:30:00 4263. 21.0 2012-01-01 TRUE
## 3 2012-01-01 01:00:00 4049. 20.7 2012-01-01 TRUE
## 4 2012-01-01 01:30:00 3878. 20.6 2012-01-01 TRUE
## 5 2012-01-01 02:00:00 4036. 20.4 2012-01-01 TRUE
## 6 2012-01-01 02:30:00 3866. 20.2 2012-01-01 TRUE
vic_elec |>
ggplot(aes(x = ymd(Date), y = Demand)) + #plot the Demand
geom_line(na.rm = TRUE) #ignore missing values
vic_elec |> gg_season(Demand, period = "week") # week by week
vic_elec |> gg_season(Demand, period = "year")
vic_elec |> gg_season(Demand, period = "day")
vic_elec |> #attempting a decomposition
model(
classical_decomposition(Demand, type = "additive")
) |>
components() |>
autoplot() +
labs(title = "Classical additive decomposition of victoria electricity demand")
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
avg_vic_elec <- vic_elec |>
mutate(
`48-MA` = slider::slide_dbl(Demand, mean,
.before = 24, .after = 23, .complete = TRUE)
)
avg_vic_elec |> ggplot() +
geom_line(aes(y = `48-MA`, x = Date), colour = "#D55E00") +
labs(y = "average Demand",
title = "electricy Demand moving average")
## Warning: Removed 47 rows containing missing values or values outside the scale range
## (`geom_line()`).
showing the demand each hour per day can help with day to day electric
plant operations. Seeing the overall trend for a year helps with
seasonal operations. Decomposition simply shows how noisy the data is.
Utilizing a moving average can show the electricity demand where the
average is roughly a 24 hour period. This helps clean up the data
significantly.
aus_production |>
ggplot(aes(x = Quarter, y = Gas)) + #plot the Gas production
geom_line(na.rm = TRUE)
lambda <- 10 #try a "large" lambda
aus_production |>
autoplot(box_cox(Gas, lambda)) +
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed gas production with $\\lambda$ = ",
round(lambda,2))))
lambda2 <- -10 # tried a negative lambda
aus_production |>
autoplot(box_cox(Gas, lambda2)) +
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed gas production with $\\lambda$ = ",
round(lambda2,2))))
lambda3 <- 0.1
aus_production |>
autoplot(box_cox(Gas, lambda3)) +
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed gas production with $\\lambda$ = ",
round(lambda3,2))))
The example provided in the text book uses box cox transformations to make the seasonal variation uniform. It normalizes the data for statistical analysis or regression. at the extreme lambda values (-10, 10) the transformation can make the seasonal variation worse. Around lambda = 0 the seasonal variation becomes more uniform.
Q3.Why is a Box-Cox transformation unhelpful for the canadian_gas data?
head(canadian_gas)
## # A tsibble: 6 x 2 [1M]
## Month Volume
## <mth> <dbl>
## 1 1960 Jan 1.43
## 2 1960 Feb 1.31
## 3 1960 Mar 1.40
## 4 1960 Apr 1.17
## 5 1960 May 1.12
## 6 1960 Jun 1.01
canadian_gas |>
ggplot(aes(x = Month, y = Volume)) + #plot the Gas production
geom_line(na.rm = TRUE)
lambda <- canadian_gas |> #test to see if any lambda would be helpful
features(Volume, features = guerrero) |>
pull(lambda_guerrero)
canadian_gas |>
autoplot(box_cox(Volume, lambda)) +
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed canadian gas production with $\\lambda$ = ",
round(lambda,2))))
For the canadian gas data, even when applying the guerrero method to find an appropriate lambda for the box cox method, the seasonal variation does not become uniform. The 80s seasonal variation stays large and the post 90s and pre 80s seasonal variation remains smaller.
Q4. What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?
set.seed(987654321)
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
head(myseries)
## # A tsibble: 6 x 5 [1M]
## # Key: State, Industry [1]
## State Industry `Series ID` Month Turnover
## <chr> <chr> <chr> <mth> <dbl>
## 1 Victoria Other recreational goods retailing A3349415T 1982 Apr 15.6
## 2 Victoria Other recreational goods retailing A3349415T 1982 May 15.8
## 3 Victoria Other recreational goods retailing A3349415T 1982 Jun 15.2
## 4 Victoria Other recreational goods retailing A3349415T 1982 Jul 15.2
## 5 Victoria Other recreational goods retailing A3349415T 1982 Aug 14.5
## 6 Victoria Other recreational goods retailing A3349415T 1982 Sep 15.1
myseries |>
ggplot(aes(x = Month, y = Turnover)) + #plot the retail turnover
geom_line(na.rm = TRUE)
lambda <- myseries |> #test to see if any lambda would be helpful
features(Turnover, features = guerrero) |>
pull(lambda_guerrero)
myseries |>
autoplot(box_cox(Turnover, lambda)) +
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed canadian gas production with $\\lambda$ = ",
round(lambda,2))))
The guerrero method chooses -0.27 for a lambda value for the box cox transformation. Seasonal variation is more uniform than in the original plot.
Q5. For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.
aus_production |>
autoplot(Tobacco) + #plot the tobacco production
geom_line(na.rm = TRUE)
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).
lambda <- aus_production |> #test to see if any lambda would be helpful
features(Tobacco, features = guerrero) |>
pull(lambda_guerrero)
aus_production |>
autoplot(box_cox(Tobacco, lambda)) +
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed Tobacco production with $\\lambda$ = ",
round(lambda,2))))
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).
Lambda for Tobacco production is close to 1 (1 being no transformation)
which indicates a box cox transformation does not change the seasonal
variation much.
?ansett
## starting httpd help server ... done
head(ansett)
## # A tsibble: 6 x 4 [1W]
## # Key: Airports, Class [1]
## Week Airports Class Passengers
## <week> <chr> <chr> <dbl>
## 1 1989 W28 ADL-PER Business 193
## 2 1989 W29 ADL-PER Business 254
## 3 1989 W30 ADL-PER Business 185
## 4 1989 W31 ADL-PER Business 254
## 5 1989 W32 ADL-PER Business 191
## 6 1989 W33 ADL-PER Business 136
unique(ansett$Airports) #find how melborn and sydney airports are labeled picked MEL-SYD
## [1] "ADL-PER" "MEL-ADL" "MEL-BNE" "MEL-OOL" "MEL-PER" "MEL-SYD" "SYD-ADL"
## [8] "SYD-BNE" "SYD-OOL" "SYD-PER"
ansett |> filter(Airports == "MEL-SYD", Class == "Economy") |>
autoplot(Passengers) + #plot the economy passengers over time
geom_line(na.rm = TRUE)
mel_syd_eco <- ansett |> filter(Airports == "MEL-SYD", Class == "Economy")
lambda <- mel_syd_eco |> #test to see which lambda is selected by guerrero method
features(Passengers, features = guerrero) |>
pull(lambda_guerrero)
mel_syd_eco |>
autoplot(box_cox(Passengers, lambda)) +
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed Passengers (MEL-SYD economy) with $\\lambda$ = ",
round(lambda,2))))
due to the extreme lack of flights during one point in time the flight
data needs a larger lambda for it’s box cox transformation. It skews the
data. the variation during this period decreases.
head(pedestrian)
## # A tsibble: 6 x 5 [1h] <Australia/Melbourne>
## # Key: Sensor [1]
## Sensor Date_Time Date Time Count
## <chr> <dttm> <date> <int> <int>
## 1 Birrarung Marr 2015-01-01 00:00:00 2015-01-01 0 1630
## 2 Birrarung Marr 2015-01-01 01:00:00 2015-01-01 1 826
## 3 Birrarung Marr 2015-01-01 02:00:00 2015-01-01 2 567
## 4 Birrarung Marr 2015-01-01 03:00:00 2015-01-01 3 264
## 5 Birrarung Marr 2015-01-01 04:00:00 2015-01-01 4 139
## 6 Birrarung Marr 2015-01-01 05:00:00 2015-01-01 5 77
south_ped <- pedestrian |> filter(Sensor == "Southern Cross Station")
south_ped |> autoplot(Count) + #plot the southern cross station pedestrian counts over time
geom_line(na.rm = TRUE)
lambda <- south_ped |> #test to see which lambda is selected by guerrero method
features(Count, features = guerrero) |>
pull(lambda_guerrero)
south_ped |>
autoplot(box_cox(Count, lambda)) +
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed Southern cross station count with $\\lambda$ = ",
round(lambda,2))))
The extremely noisy data becomes more noisy. though the seasonal
variation after the transformation is much more uniform.
Consider the last five years of the Gas data from aus_production.
Q7. gas <- tail(aus_production, 5*4) |> select(Gas) Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle? Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices. Do the results support the graphical interpretation from part a? Compute and plot the seasonally adjusted data. Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier? Does it make any difference if the outlier is near the end rather than in the middle of the time series?
gas <- tail(aus_production, 5*4) |> select(Gas)
gas |> autoplot()
## Plot variable not specified, automatically selected `.vars = Gas`
every year the overall gas production keeps increasing. The general seasonal fluctuations are as follows: The gas production is lowest at the beginning of quarter 1 each year. the production raises to its maximum in the middle of the year and lowers for end of quarter 4.
gas |>
model(
classical_decomposition(Gas, type = "multiplicative")
) |>
components() |>
autoplot() +
labs(title = "Classical multiplitive decomposition of Gas")
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).
the decomposition supports the statements made.
dcmp <- gas |>
model(stl = STL(Gas)) # STL decomposition used
components(dcmp) |>
as_tsibble() |>
autoplot(Gas, colour = "gray") +
geom_line(aes(y=season_adjust), colour = "#0072B2") +
labs(y = "Gas",
title = "Total Gas")
gas1 <- gas
gas1$Gas[1] <- gas$Gas[1] + 300 # add 300 to the first value of gas and update the dataframe of gas1(created new dataframe for modifications)
head(gas1)
## # A tsibble: 6 x 2 [1Q]
## Gas Quarter
## <dbl> <qtr>
## 1 521 2005 Q3
## 2 180 2005 Q4
## 3 171 2006 Q1
## 4 224 2006 Q2
## 5 233 2006 Q3
## 6 192 2006 Q4
gas10 <- gas
gas10$Gas[10] <- gas$Gas[10] + 300
dcmp <- gas1 |>
model(stl = STL(Gas)) # STL decomposition used
components(dcmp) |>
as_tsibble() |>
autoplot(Gas, colour = "gray") +
geom_line(aes(y=season_adjust), colour = "#0072B2") +
labs(y = "Gas",
title = "Total Gas with 1st point outlier")
dcmp <- gas10 |>
model(stl = STL(Gas)) # STL decomposition used
components(dcmp) |>
as_tsibble() |>
autoplot(Gas, colour = "gray") +
geom_line(aes(y=season_adjust), colour = "#0072B2") +
labs(y = "Gas",
title = "Total Gas with 10th point outlier")
adding an outlier vastly changes the seasonally adjusted data. the
seasonally adjusted data is heavy biased or follows the outlier. putting
the outlier at the end vs the middle does not seem to make a difference.
the decomposition follows the outlier whether it is at the end or the
middle.
Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?
myseries |> autoplot(Turnover)
x11_dcmp <- myseries |> #applying a x11 decomp to my retail series "myseries"
model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) |>
components()
autoplot(x11_dcmp) +
labs(title =
"Decomposition of Retail using X-11.")
some of the highest irregularities occur after january 2010this is possibly due to the trend switching from a positive to a negative slope. it also shows an increasing seasonal variation, but towards 2020 the seasonal variation remains consistant.
Q9. Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.
a.Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.
The decomposition shows that the trend is increasing. There is a seasonality to the data, however the variation is much smaller than the general trend. The seasonality shows the data rises, lowers, rises, lower, rises. “///” As the years progress the end of the year had progressively higher values in the seasonal plot.
Is the recession of 1991/1992 visible in the estimated components?
The recession of 1991/1992 is visible in the estimated components. with the exception of a few months such as january and februrary the 1991/1992 had lower values than the previous year. it also shows slower growth in january and februrary and september. the slope decreased (but was not negative) during those months.