# Import required R libraries
library(fpp3)
Consider the GDP information in global_economy
. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?
global_economy %>%
autoplot(GDP/Population, show.legend=FALSE) +
labs(title= "GDP per capita", y = "$US")
The plot above shows all the GDP per capita data from global_economy
without the legend. In general the trend is increasing, but that doesn’t take into account inflation.
global_economy %>%
filter(Year == "2017") %>%
mutate(GdpPerPop = GDP/Population) %>%
arrange(desc(GdpPerPop))
## # A tsibble: 262 x 10 [1Y]
## # Key: Country [262]
## Country Code Year GDP Growth CPI Imports Exports Population GdpPerPop
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Luxemb… LUX 2017 6.24e10 2.30 111. 194. 230. 599449 104103.
## 2 Macao … MAC 2017 5.04e10 9.10 136. 32.0 79.4 622567 80893.
## 3 Switze… CHE 2017 6.79e11 1.09 98.3 53.9 65.0 8466017 80190.
## 4 Norway NOR 2017 3.99e11 1.92 115. 33.1 35.5 5282223 75505.
## 5 Iceland ISL 2017 2.39e10 3.64 122. 42.8 47.0 341284 70057.
## 6 Ireland IRL 2017 3.34e11 7.80 105. 87.9 120. 4813608 69331.
## 7 Qatar QAT 2017 1.67e11 1.58 116. 37.3 51.0 2639211 63249.
## 8 United… USA 2017 1.94e13 2.27 112. NA NA 325719178 59532.
## 9 North … NAC 2017 2.10e13 2.35 NA NA NA 362492702 58070.
## 10 Singap… SGP 2017 3.24e11 3.62 113. 149. 173. 5612253 57714.
## # … with 252 more rows
Luxembourg has the highest GDP per capita for 2017 the final year of global_economy
.
global_economy %>%
filter(Year == "2014") %>%
mutate(GdpPerPop = GDP/Population) %>%
arrange(desc(GdpPerPop))
## # A tsibble: 262 x 10 [1Y]
## # Key: Country [262]
## Country Code Year GDP Growth CPI Imports Exports Population GdpPerPop
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Monaco MCO 2014 7.06e 9 7.18 NA NA NA 38132 185153.
## 2 Liecht… LIE 2014 6.66e 9 NA NA NA NA 37127 179308.
## 3 Luxemb… LUX 2014 6.63e10 5.77 109. 174. 208. 556319 119225.
## 4 Norway NOR 2014 4.99e11 1.98 106. 29.8 38.8 5137232 97200.
## 5 Macao … MAC 2014 5.53e10 -1.20 126. 31.6 84.9 588781 94004.
## 6 Isle o… IMN 2014 7.43e 9 5.00 NA NA NA 82590 89942.
## 7 Qatar QAT 2014 2.06e11 3.98 110. 31.0 68.0 2374419 86853.
## 8 Switze… CHE 2014 7.09e11 2.45 99.3 52.6 64.3 8188649 86606.
## 9 Denmark DNK 2014 3.53e11 1.62 107. 47.7 54.6 5643475 62549.
## 10 Austra… AUS 2014 1.46e12 2.56 110. 21.5 21.1 23504138 62328.
## # … with 252 more rows
There is something peculiar about this data, though. Monaco and Liechtenstein don’t appear to have any data listed for 2017, but those two countries have the highest GDP per capita starting around 1985. Of all data points in the plot, Monaco has the highest GDP per capita with 185152.5272 in 2014. It would appear that since 1970, Monaco has the highest GDP per capita every year save for 3 exceptions.
For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.
United States GDP from global_economy
global_economy %>%
filter(Country == "United States") %>%
autoplot(GDP/Population) +
labs(title= "GDP per capita", y = "$US")
Applied population adjustment to account for the effect of the population total on the United Stated GDP, thus giving the GDP a more honest value for each year.
Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock
.
aus_livestock %>%
filter(Animal == "Bulls, bullocks and steers" &
State == "Victoria") %>%
mutate(DailyAvgByMonth = Count / days_in_month(Month)) %>%
autoplot(DailyAvgByMonth) +
labs(title= "Slaughter of Victorian Bulls, bullocks and steers", y = "Daily Average by Month")
# Help from: https://stackoverflow.com/questions/30037722/daily-average-to-monthly-total-in-r
Applied calendar adjustment with days_in_month
from lubridate library in order to calculate the daily average for each month instead of the raw monthly totals. This adjustment accounts for the differing number of days in each month.
Victorian Electricity Demand from vic_elec
.
# Convert to tibble from tsibble and drop time
vc_tib <- as_tibble(vic_elec) %>%
select(-c(Time))
# Convert back to tsibble grouping and summing Total Daily Demand by date
vc_tsib <- vc_tib %>%
group_by(Date) %>%
summarise(DailyTotal = sum(Demand)) %>%
mutate(Date = as_date(Date)) %>%
as_tsibble(index = Date, key = DailyTotal)
# Plot new tsibble, Daily Total Demand by Date
vc_tsib %>%
ggplot(aes(x = Date, y = DailyTotal)) +
geom_line() +
labs(title = "Electricity Demand",
subtitle = "Victoria, Australia",
y = "Daily Total (in megawatts)")
Applied a calendar adjustment by summing the daily demand totals for each date instead of using the default 30-minute intervals. With so many intervals plotted on one chart, I couldn’t make sense of the data, and given the small increments, the plot was a daily cycle over three years … just not working for me. In order to provide a more contextualized view, I summed all the increments for a given day, and thus plotted the total daily demand over the course of three years. I believe this plot gives a better view of the seasonal changes in electricity demand and also an outlier in early 2014. I would suspect some sort of freak weather event occurred in Victoria, Australia at that time.
Gas production from aus_production
.
# From section 3.1
lambda <- aus_production %>%
features(Gas, features = guerrero) %>%
pull(lambda_guerrero)
aus_production %>%
autoplot(box_cox(Gas, lambda)) +
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed gas production with $\\lambda$ = ",
round(lambda,2))))
Applied Box-Cox transformation (mathematical transformation) following the example in section 3.1 of the Hyndman textbook. The transformation evens out the seasonal variation across the whole series.
Why is a Box-Cox transformation unhelpful for the canadian_gas
data?
canadian_gas %>%
autoplot(Volume) +
labs(title= "Monthly Canadian Gas Production", y = "Billions of cubic meters")
lambda <- canadian_gas %>%
features(Volume, features = guerrero) %>%
pull(lambda_guerrero)
canadian_gas %>%
autoplot(box_cox(Volume, lambda)) +
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed gas production with $\\lambda$ = ",
round(lambda,2))))
The Box-Cox transformations is unhelpful for canadian_gas
because the seasonal variation is already about the same across the whole series except for the years 1978-1990 in which the seasonal variation increases. As seen above in the initial plot without transformation and the second plot with a Box-Cox transformation, the transformation doesn’t necessarily tease out the season variation. If anything, the transformation diminishes the impact of the large seasonal swings between 1978 through 1990.
What Box-Cox transformation would you select for your retail data (from Exercise 8 in Section 2.10)?
set.seed(8675309)
myseries <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
autoplot(myseries, Turnover) +
labs(title = "Turnover in Queensland Takeaway food services",
subtitle = "Series ID: A3349767W",
y = "Turnover")
Initial plot above.
lambda <- myseries %>%
features(Turnover, features = guerrero) %>%
pull(lambda_guerrero)
myseries %>%
autoplot(box_cox(Turnover, lambda)) +
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed food services turnover with $\\lambda$ = ",
round(lambda,2))))
Using the guerrero
feature, as following the textbook’s example, then the recommended lambda value is 0.24, which indicates a power transformation is used. As seen in the above plot (with transformation applied), the seasonal variance is much more even across the whole series.
For the following series, find an appropriate Box-Cox transformation in order to stabilize the variance. Tobacco from aus_production
, Economy class passengers between Melbourne and Sydney from ansett
, and Pedestrian counts at Southern Cross Station from pedestrian
.
# Tobacco
lambda <- aus_production %>%
features(Tobacco, features = guerrero) %>%
pull(lambda_guerrero)
aus_production %>%
autoplot(box_cox(Tobacco, lambda)) +
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed tobacco production with $\\lambda$ = ",
round(lambda,2))))
# Economy class passengers between Melbourne and Sydney
lambda <- ansett %>%
filter(Airports == 'MEL-SYD' &
Class == 'Economy') %>%
features(Passengers, features = guerrero) %>%
pull(lambda_guerrero)
ansett %>%
filter(Airports == 'MEL-SYD' &
Class == 'Economy') %>%
autoplot(box_cox(Passengers, lambda)) +
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed economy passengers between Mel and Syd with $\\lambda$ = ",
round(lambda,2))))
# Pedestrian counts at Southern Cross Station
lambda <- pedestrian %>%
filter(Sensor == 'Southern Cross Station') %>%
features(Count, features = guerrero) %>%
pull(lambda_guerrero)
pedestrian %>%
filter(Sensor == 'Southern Cross Station') %>%
autoplot(box_cox(Count, lambda)) +
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed pedestrian count at Southern Cross Station with $\\lambda$ = ",
round(lambda,2))))
To be honest, I wasn’t real pleased with the above result. Given the data represents hourly counts over the course of two years, I figured the seasonal nature would follow a weekly pattern. Also, I noticed a clear decrease at the end of 2015, which I wasn’t sure was related to holidays or some other one-off event in Australia. Given these observations, I calculated lambda using the guerrero feature with only data from 2015 before December (code below), thus avoiding the noticeable decrease, and the lambda result was the same: -0.23. With the same result using this abbreviated dataset, I will go with the initial result of lambda equal to -0.23 as calculated on the whole series.
lambda <- pedestrian %>%
filter(Sensor == 'Southern Cross Station' &
Date < '2015-12-01') %>%
features(Count, features = guerrero) %>%
pull(lambda_guerrero)
pedestrian %>%
filter(Sensor == 'Southern Cross Station') %>%
autoplot(box_cox(Count, lambda)) +
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed pedestrian count at Southern Cross Station with $\\lambda$ = ",
round(lambda,2))))
Consider the last five years of the Gas data from aus_production
.
gas <- tail(aus_production, 5*4) %>% select(Gas)
head(gas)
## # A tsibble: 6 x 2 [1Q]
## Gas Quarter
## <dbl> <qtr>
## 1 221 2005 Q3
## 2 180 2005 Q4
## 3 171 2006 Q1
## 4 224 2006 Q2
## 5 233 2006 Q3
## 6 192 2006 Q4
Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?
gas %>%
autoplot(Gas)
Trend-cycle shows an increase over the past five years. And the seasonal variance shows lows in Q1 and highs in Q3.
Use classical_decomposition
with type=multiplicative
to calculate the trend-cycle and seasonal indices.
# From section 3.4
dc <- gas %>%
model(classical_decomposition(Gas, type = "multiplicative")) %>%
components()
dc %>%
autoplot() +
labs(title = "Classical multiplicative decomposition of gas production in Australia in petajoules")
Do the results support the graphical interpretation from part A?
Yes, the trend line shows an increase from left to right with a near constant in the middle. The seasonal indices shows an almost perfect seasonal variance over the five-year window.
Compute and plot the seasonally adjusted data.
dc %>%
ggplot(aes(x = Quarter)) +
geom_line(aes(y = Gas, colour = "Data")) +
geom_line(aes(y = season_adjust,
colour = "Seasonally Adjusted")) +
geom_line(aes(y = trend, colour = "Trend")) +
labs(y = "Petajoules",
title = "Gas production in Australia") +
scale_colour_manual(
values = c("gray", "#0072B2", "#D55E00"),
breaks = c("Data", "Seasonally Adjusted", "Trend")
)
Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?
# Outlier in beginning
gas_OutFront <- gas
gas_OutFront$Gas[1] <- gas_OutFront$Gas[1] + 300
of_dc <- gas_OutFront %>%
model(classical_decomposition(Gas, type = "multiplicative")) %>%
components()
of_dc %>%
autoplot() +
labs(title = "Classical multiplicative decomposition of gas production in Australia in petajoules")
of_dc %>%
ggplot(aes(x = Quarter)) +
geom_line(aes(y = Gas, colour = "Data")) +
geom_line(aes(y = season_adjust,
colour = "Seasonally Adjusted")) +
geom_line(aes(y = trend, colour = "Trend")) +
labs(y = "Petajoules",
title = "Gas production in Australia") +
scale_colour_manual(
values = c("gray", "#0072B2", "#D55E00"),
breaks = c("Data", "Seasonally Adjusted", "Trend")
)
# Outlier in middle
gas_OutMid <- gas
gas_OutMid$Gas[11] <- gas_OutMid$Gas[11] + 300
om_dc <- gas_OutMid %>%
model(classical_decomposition(Gas, type = "multiplicative")) %>%
components()
om_dc %>%
autoplot() +
labs(title = "Classical multiplicative decomposition of gas production in Australia in petajoules")
om_dc %>%
ggplot(aes(x = Quarter)) +
geom_line(aes(y = Gas, colour = "Data")) +
geom_line(aes(y = season_adjust,
colour = "Seasonally Adjusted")) +
geom_line(aes(y = trend, colour = "Trend")) +
labs(y = "Petajoules",
title = "Gas production in Australia") +
scale_colour_manual(
values = c("gray", "#0072B2", "#D55E00"),
breaks = c("Data", "Seasonally Adjusted", "Trend")
)
# Outlier in back
gas_OutBack <- gas
gas_OutBack$Gas[20] <- gas_OutBack$Gas[20] + 300
ob_dc <- gas_OutBack %>%
model(classical_decomposition(Gas, type = "multiplicative")) %>%
components()
ob_dc %>%
autoplot() +
labs(title = "Classical multiplicative decomposition of gas production in Australia in petajoules")
ob_dc %>%
ggplot(aes(x = Quarter)) +
geom_line(aes(y = Gas, colour = "Data")) +
geom_line(aes(y = season_adjust,
colour = "Seasonally Adjusted")) +
geom_line(aes(y = trend, colour = "Trend")) +
labs(y = "Petajoules",
title = "Gas production in Australia") +
scale_colour_manual(
values = c("gray", "#0072B2", "#D55E00"),
breaks = c("Data", "Seasonally Adjusted", "Trend")
)
Does it make any difference if the outlier is near the end rather than in the middle of the time series?
Yes, based on the plots above, the first outlier at the beginning, clearly shows an anomaly at the beginning of the seasonally adjusted plot, but otherwise, the rest of the seasonally adjusted plot matches that of the original plot. Same for the outlier at the end, except now the anomaly appears at the end, otherwise, the seasonally adjusted plot nearly matches the original seasonally adjusted plot. In the case of the outlier in the middle, the seasonally adjusted plot more closely follows the data line itself, almost replicated the seasonal nature of the data. The outlier in the middle, at least in the example, appears to discredit the value of the seasonally adjusted calculations and plot.
Recall your retail time series data (from Exercise 8 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?
library(seasonal)
set.seed(8675309)
myseries <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
x11_dcmp <- myseries %>%
model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
components()
autoplot(x11_dcmp) +
labs(title =
"Decomposition of Turnover in Queensland Takeaway food services using X-11.")
Yes, the seasonal variance flips over time. The seasonal chart shows spikes of increased turnover whereas later in the plot, the spikes are for the lower turnover values. The plot indicates a few outliers as identified from the “irregular” chart. Overall, the trend plot does not uncover any unusual aspect of the data.
Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.
Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.
The results of the decomposition does show a trend line that does closely represent the true plot of the data. The interesting aspect of the decomposition is found in the scales of the season_year and the remainder decompositions. Still being a novice, but I would normally expect the season_year chart to have a smaller gray bar and thus play a larger factor in the data, whereas the decomposition in 3.19 indicates the remainder chart impacts the data to a larger degree than the season_year data. The impact of the remainder plot would appear due to the recession of 1991/92, as that downward turn so impacted the data, that the season_year data couldn’t account for the decrease. The sub-seasonal chart (3.20) does appear to show large changes over time in the months of March, July, August, November and December.
Is the recession of 1991/1992 visible in the estimated components?
Yes, as noted in the answer to section A, the remainder component shows very clear outlier data in 1991 and 1992, which actually give the remainder decomposition more impact than the season_year decomposition based on the scales. The overall data and the trend, show a slight decrease in these years, but the remainder chart shows the drastic decrease during this timeframe.