library(fpp3)
library(tidyverse)
library(cowplot)Consider the GDP information in global_economy. Plot the
GDP per capita for each country over time. Which country has the highest
GDP per capita? How has this changed over time?
length(unique(global_economy$Country))## [1] 263
There are 263 Countries in this dataset.
global_economy$GDP_Per_Capita <- global_economy$GDP / global_economy$Population
global_economy %>%
autoplot(GDP / Population, show.legend = FALSE) +
labs(title= "GDP per capita", y = "$US")## Warning: Removed 3242 rows containing missing values (`geom_line()`).
The country with he highest GDP per capita is Monaco. This was achieved in 2014. The GDP per capita has an increasing trend when compare with majority of the countries.
global_economy %>%
mutate(GDP_per_capita = GDP / Population) %>%
filter(GDP_per_capita == max(GDP_per_capita, na.rm = TRUE)) %>%
select(Country, GDP_per_capita)## # A tsibble: 1 x 3 [1Y]
## # Key: Country [1]
## Country GDP_per_capita Year
## <fct> <dbl> <dbl>
## 1 Monaco 185153. 2014
global_economy %>%
filter(Country == "Monaco") %>%
autoplot(.vars = GDP_Per_Capita) +
labs(title = "Monaco GDP per Capita")## Warning: Removed 11 rows containing missing values (`geom_line()`).
For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.
global_economy.aus_livestock.vic_elec.aus_production.global_economy.After i transformed the data to look at GDP based on the population (GDP per capita) for the US. This adjustment did not change the overall shape of the graph, just the y scaling. It didn’t have any effect on the GDP.
Before Transformation
global_economy %>%
filter(Country == "United States") %>%
autoplot(.vars = GDP) +
labs(title = "United States GDP")After Transformation
global_economy %>%
filter(Country == "United States") %>%
autoplot(.vars = GDP_Per_Capita) +
labs(title = "United States GDP")aus_livestock.A zig-zag downward trend is noticed in the graph.
aus_livestock %>%
filter(State == "Victoria",
Animal == "Bulls, bullocks and steers") %>%
autoplot(.vars = Count) +
labs(title = "Slaughter of Victorian Bulls, bullocks and steers")vic_elec.The data was transformed such to reflect daily, weekly and monthly electricity demand instead of half-hourly demand. It is somehow difficult to discern what is occuring in the non tranformed data. The aggregate transformations make it easier to see the underlying seasonality. There appears to be an increase in electricity in the winter and summer months / weeks. Summer is when electricity demand is at its highest.
Before Transformation
vic_elec %>%
autoplot(.vars = Demand)After Transformation
daily <- vic_elec %>%
group_by(Date) %>%
mutate(Daily_Demand = sum(Demand)) %>%
distinct(Date, Daily_Demand)
weekly <- vic_elec %>%
group_by(Date) %>%
mutate(Daily_Demand = sum(Demand)) %>%
distinct(Date, Daily_Demand) %>%
mutate(week = yearweek(Date)) %>%
group_by(week) %>%
mutate(Weekly_Demand = sum(Daily_Demand)) %>%
distinct(week, Weekly_Demand)
monthly <- vic_elec %>%
group_by(Date) %>%
mutate(Daily_Demand = sum(Demand)) %>%
distinct(Date, Daily_Demand) %>%
mutate(month = yearmonth(Date)) %>%
group_by(month) %>%
mutate(Monthly_Demand = sum(Daily_Demand)) %>%
distinct(month, Monthly_Demand)p1 <- daily %>%
as_tsibble(index = Date) %>%
autoplot(Daily_Demand) +
labs(title = "Daily Victorian Electricity Demand")
p2 <- weekly %>%
as_tsibble(index = week) %>%
autoplot(Weekly_Demand) +
labs(title = "Weekly Victorian Electricity Demand")
p3 <- monthly %>%
as_tsibble(index = month) %>%
autoplot(Monthly_Demand) +
labs(title = "Monthly Victorian Electricity Demand")
plot_grid(p1, p2, p3,
nrow = 2, ncol = 2)aus_production.The seasonal variation changes in the original plot. To adjust for this a boxcox transformation can be used. The transformed plot has a seasonal variation that is similar to the rest of series.
Before Transformation
aus_production %>%
autoplot(.vars = Gas) +
labs(title = "Gas Production")After Transformation
lambda <- aus_production %>%
features(Gas, features = guerrero) %>%
pull(lambda_guerrero)
aus_production %>%
autoplot(box_cox(Gas, lambda)) +
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed gas production with $\\lambda$ = ",
round(lambda,2))))Why is a Box-Cox transformation unhelpful for the canadian_gas data?
The box-cox transformation is unhelpful because it does not make the seasonal variation about the same across he whole series.
p1 <- canadian_gas %>%
autoplot(.vars = Volume) +
labs(title = "Non-Transformed")
lambda <- canadian_gas %>%
features(Volume, features = guerrero) %>%
pull(lambda_guerrero)
p2 <- canadian_gas %>%
autoplot(box_cox(Volume, lambda)) +
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed with $\\lambda$ = ",
round(lambda,2))))
plot_grid(p1, p2,
nrow = 1, ncol = 2)What Box-Cox transformation would you select for your retail data (from Exercise 8 in Section 2.10)?
With the seed I choose to use, a good box-cox transformation \(\lambda\) would be 0.23.
set.seed(227)
myseries <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`,1))p1 <- myseries %>%
autoplot(Turnover) +
labs(title = "Non-Transformed")
lambda <- myseries %>%
features(Turnover, features = guerrero) %>%
pull(lambda_guerrero)
p2 <- myseries %>%
autoplot(box_cox(Turnover, lambda)) +
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed with $\\lambda$ = ",
round(lambda,2))))
plot_grid(p1, p2,
nrow = 1, ncol = 2)For the following series, find an appropriate Box-Cox
transformation in order to stabilise the variance. Tobacco from
aus_production, Economy class passengers between Melbourne
and Sydney from ansett, and Pedestrian counts at Southern
Cross Station from pedestrian.
aus_productionIdeal box-cox transformation value is 0.93.
p1 <- aus_production %>%
autoplot(.vars = Tobacco) +
labs(title = "Non-Transformed")
lambda <- aus_production %>%
features(Tobacco, features = guerrero) %>%
pull(lambda_guerrero)
p2 <- aus_production %>%
autoplot(box_cox(Tobacco, lambda)) +
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed with $\\lambda$ = ",
round(lambda,2))))
plot_grid(p1, p2,
nrow = 1, ncol = 2)## Warning: Removed 24 rows containing missing values (`geom_line()`).
## Removed 24 rows containing missing values (`geom_line()`).
ansettIdeal box-cox transformation value is 2.
p1 <- ansett %>%
filter(Class == "Economy",
Airports == "MEL-SYD") %>%
autoplot(.vars = Passengers) +
labs(title = "Non-Transformed")
lambda <- ansett %>%
filter(Class == "Economy",
Airports == "MEL-SYD") %>%
features(Passengers, features = guerrero) %>%
pull(lambda_guerrero)
p2 <- ansett %>%
filter(Class == "Economy",
Airports == "MEL-SYD") %>%
autoplot(box_cox(Passengers, lambda)) +
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed with $\\lambda$ = ",
round(lambda,2))))
plot_grid(p1, p2,
nrow = 1, ncol = 2)pedestrianIdeal box-cox transformation value is -0.25.
p1 <- pedestrian %>%
filter(Sensor=='Southern Cross Station') %>%
autoplot(.vars = Count) +
labs(title = "Non-Transformed")
lambda <- pedestrian %>%
filter(Sensor=='Southern Cross Station') %>%
features(Count, features = guerrero) %>%
pull(lambda_guerrero)
p2 <- pedestrian %>%
filter(Sensor=='Southern Cross Station') %>%
autoplot(box_cox(Count, lambda)) +
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed with $\\lambda$ = ",
round(lambda,2))))
plot_grid(p1, p2,
nrow = 1, ncol = 2)Consider the last five years of the Gas data from
aus_production.
gas <- tail(aus_production, 5*4) %>% select(Gas)**
Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?
Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.
Do the results support the graphical interpretation from part a?
Compute and plot the seasonally adjusted data.
Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?
Does it make any difference if the outlier is near the end rather than in the middle of the time series?
Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?
There is a seasonality fluctation that occurs every year. The cycle is always lowest at quarter 1 and peaks at quarter 3. There is an positive overall trend in this timeseries.
gas <- tail(aus_production, 5*4) %>% select(Gas)
autoplot(gas, .vars = Gas)Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.
gas %>%
model(
classical_decomposition(Gas, type = "multiplicative")
) %>%
components() %>%
autoplot() +
labs(title = "Classical multiplicative decomposition of Gas")## Warning: Removed 2 rows containing missing values (`geom_line()`).
Do the results support the graphical interpretation from part a?
The results do support the graphical interpretation from part a. The trend shows that there is a positive increase over time and the seasonality repeats every year.
Compute and plot the seasonally adjusted data.
gas %>%
model(
classical_decomposition(Gas, type = "multiplicative")
) %>%
components() %>%
select(season_adjust) %>%
autoplot() +
labs(title = "Seasonally Adjusted Gas")## Plot variable not specified, automatically selected `.vars = season_adjust`
q3.7 <- gas %>%
model(
classical_decomposition(Gas, type = "multiplicative")
) %>%
components()autoplot(q3.7, .vars = season_adjust) +
labs(title = "Seasonally Adjusted Classical Multiplicative Decomposition")Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?
The outlier creates a large spike in the timeseries. The outlier also changes the seasonally adjusted values compared to the original data.
gas[10,1] <- gas[10,1] + 300
gas %>%
model(
classical_decomposition(Gas, type = "multiplicative")
) %>%
components() %>%
select(season_adjust) %>%
autoplot() +
labs(title = "Seasonally Adjusted Gas with Outlier") +
geom_line(aes(y = q3.7$season_adjust, color = "Original")) +
scale_color_manual(name='',
breaks=c('Outlier', 'Original'),
values=c('Outlier'='black', 'Original'='blue'))## Plot variable not specified, automatically selected `.vars = season_adjust`
Does it make any difference if the outlier is near the end rather than in the middle of the time series?
The outlier’s location appears to make a difference in the end results of the seasonality adjustment calculations.
gas <- tail(aus_production, 5*4) %>% select(Gas)
gas[2,1] <- gas[2,1] + 300
p1 <- gas %>%
model(
classical_decomposition(Gas, type = "multiplicative")
) %>%
components() %>%
select(season_adjust) %>%
autoplot() +
labs(title = "Outlier Near Front") +
geom_line(aes(y = q3.7$season_adjust, color = "Original")) +
scale_color_manual(name='',
breaks=c('Outlier', 'Original'),
values=c('Outlier'='black', 'Original'='blue'))## Plot variable not specified, automatically selected `.vars = season_adjust`
gas <- tail(aus_production, 5*4) %>% select(Gas)
gas[10,1] <- gas[10,1] + 300
p2 <- gas %>%
model(
classical_decomposition(Gas, type = "multiplicative")
) %>%
components() %>%
select(season_adjust) %>%
autoplot() +
labs(title = "Outlier Near Middle") +
geom_line(aes(y = q3.7$season_adjust, color = "Original")) +
scale_color_manual(name='',
breaks=c('Outlier', 'Original'),
values=c('Outlier'='black', 'Original'='blue'))## Plot variable not specified, automatically selected `.vars = season_adjust`
gas <- tail(aus_production, 5*4) %>% select(Gas)
gas[18,1] <- gas[18,1] + 300
p3 <- gas %>%
model(
classical_decomposition(Gas, type = "multiplicative")
) %>%
components() %>%
select(season_adjust) %>%
autoplot() +
labs(title = "Outlier Near Back") +
geom_line(aes(y = q3.7$season_adjust, color = "Original")) +
scale_color_manual(name='',
breaks=c('Outlier', 'Original'),
values=c('Outlier'='black', 'Original'='blue'))## Plot variable not specified, automatically selected `.vars = season_adjust`
plot_grid(p1, p2, p3,
nrow = 2, ncol = 2)Recall your retail time series data (from Exercise 8 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?
There appears to be a huge spike in the early 2000s, indicating the presence of an outlier.
set.seed(227)
x11_dcmp <- myseries %>%
model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
components()
autoplot(x11_dcmp) +
labs(title =
"Decomposition of total AUS retail employment using X-11.")Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.
Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.
Is the recession of 1991/1992 visible in the estimated components?
It is observe that the civilian labor force in Australia has had a consistent upward trend for the entire duration of the timeseries (Feb 1978 to Aug 1995). The seasonality shows to be slightly constant, though it should be noted that the variability has increased in the later years in the series. There are large outliers in the remainder portion of the STL decomposition in the early 1990s. This could be due to a recession in Australia in 1991. The seasonal plot shows that a major recession occurred in the early 1990s as most months have a large dip at the 1990 mark.
Due to the outliers observe in the period 1991/1992, the recession of that period is quite visible due to the component observe