Consider the GDP information in global_economy.
head(global_economy)
## # A tsibble: 6 x 9 [1Y]
## # Key: Country [1]
## Country Code Year GDP Growth CPI Imports Exports Population
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Afghanistan AFG 1960 537777811. NA NA 7.02 4.13 8996351
## 2 Afghanistan AFG 1961 548888896. NA NA 8.10 4.45 9166764
## 3 Afghanistan AFG 1962 546666678. NA NA 9.35 4.88 9345868
## 4 Afghanistan AFG 1963 751111191. NA NA 16.9 9.17 9533954
## 5 Afghanistan AFG 1964 800000044. NA NA 18.1 8.89 9731361
## 6 Afghanistan AFG 1965 1006666638. NA NA 21.4 11.3 9938414
Plot the GDP per capita for each country over time.
global_economy %>%
autoplot(GDP / Population, show.legend = FALSE) +
labs(title= "GDP per capita", y = "USD")
gp2011 <- global_economy %>%
mutate(GDPpercp = GDP/Population) %>%
group_by(Country) %>%
arrange(desc(GDPpercp)) %>%
filter(Year >= 1990, Code %in% c("MAC","USA", "LUX","SGP","QAT","DEU"))
head(gp2011,25)
## # A tsibble: 25 x 10 [1Y]
## # Key: Country [3]
## # Groups: Country [3]
## Country Code Year GDP Growth CPI Imports Exports Population GDPpercp
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Luxembo… LUX 2014 6.63e10 5.77 109. 174. 208. 556319 119225.
## 2 Luxembo… LUX 2011 6.00e10 2.54 103. 145. 178. 518347 115762.
## 3 Luxembo… LUX 2008 5.58e10 -1.28 97.4 156. 187. 488650 114294.
## 4 Luxembo… LUX 2013 6.17e10 3.65 108. 159. 191. 543360 113625.
## 5 Luxembo… LUX 2012 5.67e10 -0.353 106. 155. 186. 530946 106749.
## 6 Luxembo… LUX 2007 5.09e10 8.35 94.2 150. 183. 479993 106018.
## 7 Luxembo… LUX 2010 5.32e10 4.86 100 142. 175. 506953 104965.
## 8 Luxembo… LUX 2017 6.24e10 2.30 111. 194. 230. 599449 104103.
## 9 Luxembo… LUX 2009 5.14e10 -4.36 97.8 132. 164. 497783 103199.
## 10 Luxembo… LUX 2015 5.78e10 2.86 109. 187. 223. 569604 101447.
## # ℹ 15 more rows
gp2011%>%
autoplot(GDPpercp)
#summary(gp2017)
library(dplyr)
# Process the dataset
gp1960 <- global_economy %>%
mutate(GDP_per_capita = GDP / Population) %>%
filter(Code %in% c("MAC", "USA", "LUX", "SGP", "QAT", "DEU")) %>%
filter(Year >= as.Date("1960-01-01") & Year <= as.Date("1990-12-31")) %>%
arrange(Country, Year)
# View the processed data
print(gp1960)
## # A tsibble: 348 x 10 [1Y]
## # Key: Country [6]
## Country Code Year GDP Growth CPI Imports Exports Population
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Germany DEU 1960 NA NA 24.6 NA NA 72814900
## 2 Germany DEU 1961 NA NA 25.2 NA NA 73377632
## 3 Germany DEU 1962 NA NA 25.9 NA NA 74025784
## 4 Germany DEU 1963 NA NA 26.7 NA NA 74714353
## 5 Germany DEU 1964 NA NA 27.3 NA NA 75318337
## 6 Germany DEU 1965 NA NA 28.2 NA NA 75963695
## 7 Germany DEU 1966 NA NA 29.2 NA NA 76600311
## 8 Germany DEU 1967 NA NA 29.7 NA NA 76951336
## 9 Germany DEU 1968 NA NA 30.2 NA NA 77294314
## 10 Germany DEU 1969 NA NA 30.7 NA NA 77909682
## # ℹ 338 more rows
## # ℹ 1 more variable: GDP_per_capita <dbl>
# Plot GDP per capita over time
ggplot(gp1960, aes(x = Year, y = GDP_per_capita, color = Country)) +
geom_line() +
labs(title = "GDP per Capita (1960-1990)", x = "Year", y = "GDP per Capita") +
theme_minimal()
global_highest <- global_economy %>%
group_by(Country, GDP, Population) %>%
summarise(GDP_per_capita = GDP / Population) %>%
arrange(desc(GDP_per_capita))
global_highest
## # A tsibble: 15,150 x 5 [1Y]
## # Key: Country, GDP, Population [15,149]
## # Groups: Country, GDP [11,965]
## Country GDP Population Year GDP_per_capita
## <fct> <dbl> <dbl> <dbl> <dbl>
## 1 Monaco 7060236168. 38132 2014 185153.
## 2 Monaco 6476490406. 35853 2008 180640.
## 3 Liechtenstein 6657170923. 37127 2014 179308.
## 4 Liechtenstein 6391735894. 36834 2013 173528.
## 5 Monaco 6553372278. 37971 2013 172589.
## 6 Monaco 6468252212. 38499 2016 168011.
## 7 Liechtenstein 6268391521. 37403 2015 167591.
## 8 Monaco 5867916781. 35111 2007 167125.
## 9 Liechtenstein 6214633651. 37666 2016 164993.
## 10 Monaco 6258178995. 38307 2015 163369.
## # ℹ 15,140 more rows
Monaco has the highest GDPPC.
data(global_economy)
library(ggplot2)
global_economy %>%
filter(Country == "Monaco") %>%
autoplot(GDP/Population) +
labs(title= "GDP per capita for Monaco", y = "USD")
For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.
United States GDP from global_economy
# Step 1: Filter Data for the United States
us_data <- global_economy %>%
filter(Country == "United States") %>%
mutate(GDP_per_capita = GDP / Population)
us_data
## # A tsibble: 58 x 10 [1Y]
## # Key: Country [1]
## Country Code Year GDP Growth CPI Imports Exports Population
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 United States USA 1960 5.43e11 NA 13.6 4.20 4.97 180671000
## 2 United States USA 1961 5.63e11 2.30 13.7 4.03 4.90 183691000
## 3 United States USA 1962 6.05e11 6.10 13.9 4.13 4.81 186538000
## 4 United States USA 1963 6.39e11 4.40 14.0 4.09 4.87 189242000
## 5 United States USA 1964 6.86e11 5.80 14.2 4.10 5.10 191889000
## 6 United States USA 1965 7.44e11 6.40 14.4 4.24 4.99 194303000
## 7 United States USA 1966 8.15e11 6.50 14.9 4.55 5.02 196560000
## 8 United States USA 1967 8.62e11 2.50 15.3 4.63 5.05 198712000
## 9 United States USA 1968 9.42e11 4.80 16.0 4.94 5.08 200706000
## 10 United States USA 1969 1.02e12 3.10 16.8 4.95 5.09 202677000
## # ℹ 48 more rows
## # ℹ 1 more variable: GDP_per_capita <dbl>
# Step 2: Convert Data to a Time Series Object
us_tsibble <- us_data %>%
as_tsibble(index = Year) # Convert to a tsibble
# Step 3: Plot Using autoplot()
autoplot(us_tsibble, GDP_per_capita) +
labs(title = "US GDP per Capita Over Time", y = "GDP per Capita (USD)") +
theme_minimal()
Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.
Details: Meat production in Australia for human consumption
aus_livestock is a monthly tsibble with one value:
Count: Number of animals slaughtered. Each series is uniquely identified using two keys:
Animal: The animal slaughtered. State: The Australian state (or territory).
head(aus_livestock)
## # A tsibble: 6 x 4 [1M]
## # Key: Animal, State [1]
## Month Animal State Count
## <mth> <fct> <fct> <dbl>
## 1 1976 Jul Bulls, bullocks and steers Australian Capital Territory 2300
## 2 1976 Aug Bulls, bullocks and steers Australian Capital Territory 2100
## 3 1976 Sep Bulls, bullocks and steers Australian Capital Territory 2100
## 4 1976 Oct Bulls, bullocks and steers Australian Capital Territory 1900
## 5 1976 Nov Bulls, bullocks and steers Australian Capital Territory 2100
## 6 1976 Dec Bulls, bullocks and steers Australian Capital Territory 1800
view(aus_livestock)
#data("aus_livestock")
aus_livestock %>%
filter(Animal == "Bulls, bullocks and steers", State == "Victoria") %>%
autoplot(Count) +
labs(title = "Slaughter of Victorian Bulls, bullocks and steers", y = "Number of animals slaughtered")
Victorian Electricity Demand from vic_elec Description: vic_elec is a half-hourly tsibble with three values:
Demand: Total electricity demand in MWh. Temperature: Temperature of Melbourne (BOM site 086071). Holiday: Indicator for if that day is a public holiday.
data(vic_elec)
head(vic_elec)
## # A tsibble: 6 x 5 [30m] <Australia/Melbourne>
## Time Demand Temperature Date Holiday
## <dttm> <dbl> <dbl> <date> <lgl>
## 1 2012-01-01 00:00:00 4383. 21.4 2012-01-01 TRUE
## 2 2012-01-01 00:30:00 4263. 21.0 2012-01-01 TRUE
## 3 2012-01-01 01:00:00 4049. 20.7 2012-01-01 TRUE
## 4 2012-01-01 01:30:00 3878. 20.6 2012-01-01 TRUE
## 5 2012-01-01 02:00:00 4036. 20.4 2012-01-01 TRUE
## 6 2012-01-01 02:30:00 3866. 20.2 2012-01-01 TRUE
data("vic_elec")
vic_elec_new <- vic_elec %>%
group_by(Date) %>%
mutate(Demand = sum(Demand)) %>%
distinct(Date, Demand)
vic_elec_new %>%
as_tsibble(index = Date) %>%
autoplot(Demand) +
labs(title= "Daily Victorian Electricity Demand", y = "$US (in trillions)")
is_tsibble(vic_elec)
## [1] TRUE
Gas production from aus_production.
aus_production %>%
autoplot(Gas)+
labs("gas production")
#install.packages("latex2exp")
library(latex2exp)
lambda <- aus_production |>
features(Gas, features = guerrero) |>
pull(lambda_guerrero)
aus_production |>
autoplot(box_cox(Gas, lambda)) +
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed gas production with $\\lambda$ = ",
round(lambda,2))))
Why is a Box-Cox transformation unhelpful for the canadian_gas data?
Data Description: Monthly Canadian gas production, billions of cubic metres, January 1960 - February 2005
canadian_gas %>%
autoplot(Volume)+
labs(title = "Monthly Canadian gas production without transformation")
lambda <- canadian_gas %>%
features(Volume, features = guerrero) %>%
pull(lambda_guerrero)
canadian_gas %>%
autoplot(box_cox(Volume, lambda)) +
labs(y = "", title = latex2exp::TeX(paste0(
"Transformed gas production with $\\lambda$ = ",
round(lambda,2))))
What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?
(from Exercise 7 in Section 2.10: Monthly Australian retail data is provided in aus_retail. Select one of the time series as follows (but choose your own seed value):)
set.seed(12345678)
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
autoplot(myseries,Turnover)+
labs(title = "Retial Turnover", y = "Turnover")
library(latex2exp)
lambda <- myseries |>
features(Turnover, features = guerrero) |>
pull(lambda_guerrero)
myseries |>
autoplot(box_cox(Turnover, lambda)) +
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed gas production with $\\lambda$ = ",
round(lambda,2))))
For this seed, the Guerrero feature selected a lambda of 0.08 in order
to make the variance stable.
For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance.
autoplot(aus_production, Tobacco)+
labs(title = "Tobacco and Cigarette Production")
lambda <- aus_production |>
features(Tobacco, features = guerrero) |>
pull(lambda_guerrero)
aus_production |>
autoplot(box_cox(Tobacco, lambda)) +
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed gas production with $\\lambda$ = ",
round(lambda,2))))
Graph didn’t change here compare to previous one and also the lamda is almost 1. So, Box Cox is not usefull in very effective in this dataset.
Economy class passengers between Melbourne and Sydney from ansett
mel_syd <- ansett %>%
filter(Class == "Economy",
Airports == "MEL-SYD")
autoplot(mel_syd, Passengers)+
labs(title = "Economy class Passengers Between Melbourne and Sydney")
lambda <- mel_syd %>%
features(Passengers, features = guerrero) %>%
pull(lambda_guerrero)
mel_syd %>%
autoplot(box_cox(Passengers, lambda)) +
labs(y = "", title = latex2exp::TeX(paste0("Transformed Number of Passengers with $\\lambda$ = ",
round(lambda,2))))
Here we can the variation clearly with transformed data with lamda = 2.
Pedestrian counts at Southern Cross Station from pedestrian.
data("pedestrian")
southern_cross <- pedestrian %>%
filter(Sensor == "Southern Cross Station")
southern_cross <- southern_cross %>%
mutate(Week = yearweek(Date)) %>%
index_by(Week) %>%
summarise(Count = sum(Count))
autoplot(southern_cross, Count)+
labs(title = "Weekly Pedestrian Counts at Southern Cross Station")
lambda <- southern_cross %>%
features(Count, features = guerrero) %>%
pull(lambda_guerrero)
southern_cross %>%
autoplot(box_cox(Count, lambda)) +
labs(y = "", title = latex2exp::TeX(paste0("Transformed Number of Passengers with $\\lambda$ = ",
round(lambda,2))))
Consider the last five years of the Gas data from aus_production.
gas <- tail(aus_production, 5*4) |> select(Gas)
Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?
Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.
Do the results support the graphical interpretation from part a?
Compute and plot the seasonally adjusted data.
Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?
Does it make any difference if the outlier is near the end rather than in the middle of the time series?
Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?
gas <- tail(aus_production, 5*4) |> dplyr::select(Gas)
# Plot using autoplot from fable
autoplot(gas, Gas) +
labs(title = "Gas Production Time Series",
x = "Year",
y = "Gas Production") +
theme_minimal()
# Decompose the time series
decomposition <- gas |>
model(STL(Gas ~ season() + trend()))
# Extract components
components <- components(decomposition)
# Plot components
autoplot(components) +
labs(title = "Decomposition of Gas Production Time Series",
y = "Gas Production") +
theme_minimal()
So, I see here seasonal fluctuations in the above chart.
(Note:Seasonal Fluctuations: Look for regular patterns that repeat at consistent intervals (e.g., monthly or yearly). In the plot, these will appear as repeating cycles.) (Note:Trend-Cycle: Identify the underlying trend and long-term movement in the time series. The trend component will show if there’s a general upward or downward direction over time.)
Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.
gas_decomp <- gas %>%
model(classical_decomposition(Gas, type = "multiplicative"))
components(gas_decomp) %>%
autoplot() +
labs(title = "Classical Multiplicative Decomposition of Gas Production")
# Inspect the components
print(components)
## # A dable: 20 x 7 [1Q]
## # Key: .model [1]
## # : Gas = trend + season_year + remainder
## .model Quarter Gas trend season_year remainder season_adjust
## <chr> <qtr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 STL(Gas ~ season() +… 2005 Q3 221 193. 26.9 0.856 194.
## 2 STL(Gas ~ season() +… 2005 Q4 180 197. -16.7 -0.109 197.
## 3 STL(Gas ~ season() +… 2006 Q1 171 200. -25.7 -3.59 197.
## 4 STL(Gas ~ season() +… 2006 Q2 224 204. 15.4 4.86 209.
## 5 STL(Gas ~ season() +… 2006 Q3 233 207. 27.0 -1.03 206.
## 6 STL(Gas ~ season() +… 2006 Q4 192 210. -16.7 -1.33 209.
## 7 STL(Gas ~ season() +… 2007 Q1 187 213. -25.5 -0.550 213.
## 8 STL(Gas ~ season() +… 2007 Q2 234 216. 15.1 2.60 219.
## 9 STL(Gas ~ season() +… 2007 Q3 245 219. 27.0 -0.730 218.
## 10 STL(Gas ~ season() +… 2007 Q4 205 219. -16.6 2.55 222.
## 11 STL(Gas ~ season() +… 2008 Q1 194 219. -25.3 0.562 219.
## 12 STL(Gas ~ season() +… 2008 Q2 229 219. 14.8 -4.59 214.
## 13 STL(Gas ~ season() +… 2008 Q3 249 219. 27.0 2.98 222.
## 14 STL(Gas ~ season() +… 2008 Q4 203 220. -16.6 -0.834 220.
## 15 STL(Gas ~ season() +… 2009 Q1 196 222. -25.0 -0.740 221.
## 16 STL(Gas ~ season() +… 2009 Q2 238 223. 14.5 0.341 223.
## 17 STL(Gas ~ season() +… 2009 Q3 252 225. 27.1 -0.132 225.
## 18 STL(Gas ~ season() +… 2009 Q4 210 226. -16.5 0.986 227.
## 19 STL(Gas ~ season() +… 2010 Q1 205 226. -24.8 4.25 230.
## 20 STL(Gas ~ season() +… 2010 Q2 236 225. 14.2 -3.62 222.
Do the results support the graphical interpretation from part a? Yes the results do support graph a as we still see the same cycle pattern.
Compute and plot the seasonally adjusted data.
components(gas_decomp) %>%
as_tsibble() %>%
autoplot(Gas) +
geom_line(aes(y=season_adjust), linetype = "dashed") +
labs(title = "Gas Production (Seasonally Adjusted)")
Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?
gas %>%
mutate(Gas = ifelse(Gas == 249, Gas + 300, Gas)) %>%
model(classical_decomposition(Gas, type = "multiplicative")) %>%
components() %>%
as_tsibble() %>%
autoplot(Gas) +
geom_line(aes(y=season_adjust), linetype = "dashed") +
labs(title = "Gas Production (Seasonally Adjusted + outlier)")
Does it make any difference if the outlier is near the end rather than in the middle of the time series?
gas %>%
mutate(Gas = ifelse(Gas == 236, Gas + 300, Gas)) %>%
model(classical_decomposition(Gas, type = "multiplicative")) %>%
components() %>%
as_tsibble() %>%
autoplot(Gas) +
geom_line(aes(y=season_adjust), linetype = "dashed") +
labs(title = "Gas Production (Seasonally Adjusted + outlier)")
The placement of the outlier still caused the data to have a spike, one thing to note is that the seasonally adjusted data’s spike has increased compared to when the outlier was in the middle.
Recall your retail time series data (from Exercise 7 in Section 2.10).Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?
#install.packages("seasonal")
library(seasonal)
Using the X-11 decomposition method, we see the that line is less smooth. This method captured seasonality better and is displaying more irregularities than previous one.
x11_decomp <- myseries %>%
model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
components()
autoplot(x11_decomp) +
labs(title = "Decomposition of Retail Turnover (X-11)")
Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.
knitr::include_graphics("figure 3.19.png")
knitr::include_graphics("figure 3.20.png")
Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.
We can say that increase in the number of persons in the civilian labor force in Australia.
Seasonality is also present despite its scale being of a lesser degree to that of the remainder component, indicating that seasonality is not as significant. They do share similarities as in both graphs we see dips in the early 1990s (which was due to the recession).
Is the recession of 1991/1992 visible in the estimated components? Yes, the recession of 1991/1992 is visible in the estimated components and we can see Incredible dips in the rest of the area.