Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?
ge <-global_economy |>
mutate(gdpByPop = GDP/Population) |>
mutate( color = case_when(
Code=='MCO' ~ 'red',
Code=='LIE' ~ 'blue',
Code=='ARE' ~ 'green',
TRUE ~ 'grey') #everything else is grey
)
ge |>
autoplot(gdpByPop) +
theme_minimal()+
theme(legend.position="none")
## Warning: Removed 3242 rows containing missing values or values outside the scale range
## (`geom_line()`).
ggplot(ge, aes(x = Year, y = gdpByPop, color = color)) +
geom_line() +
scale_color_identity() +
theme_minimal()+
theme(legend.position="none")
## Warning: Removed 37 rows containing missing values or values outside the scale range
## (`geom_line()`).
This highest GDP per capita is Monaco (red line in the second chart) for most of the years. Some time in the late 1970s (oil crisis?), United Arab Emirates (green) was briefly on top. In the 1980s, United Arab Emirates lost ground. Sometime in the 2010s, Liechtenstein (blue) was on top. Though Monaco has been pretty consistently on top, the gap with Liechtenstein has been closing.
A note on the charts: The first gave me a sense of what was was what, and then I mutated the data to be able to see what I wanted to highlight. The second shows the 3 countries mentioned, but I wanted all the other lines for the other countries to be grey, and that is not what happen. The chart did not distinguish between the countries that all had color grey. Felt like I had to give this a try since I suggested it in slack ;)
For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.
ge |>
filter(Country == 'United States') |>
autoplot(GDP) +
scale_y_continuous(labels = label_number(suffix = "B", scale = 1e-9))+
theme_minimal()
ge |>
filter(Country == 'United States') |>
mutate (adjGDP = GDP /(CPI*100)) |>
autoplot(adjGDP) +
scale_y_continuous(labels = label_number(suffix = "B", scale = 1e-9))+
theme_minimal()
Adjusting the US GDP by factoring CPI not only changes the shape of the curve, but also significantly changes the scale of the Y-axis.
al<-aus_livestock
al |>
filter(Animal == 'Bulls, bullocks and steers') |>
summarise(tot = sum(Count)) |>
autoplot(tot) +
scale_y_continuous(labels = label_number(suffix = "K", scale = 1e-3))+
theme_minimal()
al |>
filter(Animal == 'Bulls, bullocks and steers') |>
summarise(tot = sum(Count)) |> # total across all states
model(
STL(tot ~ trend(window = 11) +
season(window = "periodic"),
robust = TRUE)) |>
components() |>
autoplot()+
theme_minimal()
Did not see anything to transform here, so just tried an STL decomposition and it looks pretty good.
ve <-vic_elec
ve |>
autoplot (Demand)+
theme_minimal()
ve |>
filter(Demand < 7000) |>
autoplot (Demand)+
theme_minimal()
Looks to be some outliers, which should be further explored, but interesting to see the how it looks with out them (filtered where demand >7000).
ap<-aus_production
ap |>
autoplot (Gas)+
theme_minimal()
ap |>
autoplot (log(Gas))+
theme_minimal()
By using log(Gas), we can see the similarities in the range that are
hidden by the change in scale prior to the transformation.
Why is a Box-Cox transformation unhelpful for the canadian_gas data?
cg<-canadian_gas
cg |> autoplot(Volume)+
theme_minimal()
Box-Cox transformation would be unhelpful here, as there is not a single pattern that varies in scale. There seem to be 3 patterns: From 1960 - ~1975; ~1975-1990; and >1990.
What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?
ar<-aus_retail |>
filter (`Series ID`=='A3349849A')
ar |> autoplot(Turnover)+
theme_minimal()
lambda <- ar |>
features(Turnover, features = guerrero) |>
pull(lambda_guerrero)
ar |>
autoplot(box_cox(Turnover, lambda)) +
theme_minimal()+
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed turnover with $\\lambda$ = ",
round(lambda,2))))
For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.
ap |> autoplot(Tobacco)+
theme_minimal()
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).
lambda2 <- ap |>
features(Tobacco, features = guerrero) |>
pull(lambda_guerrero)
ap |>
autoplot(box_cox(Tobacco, lambda2)) +
theme_minimal()+
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed Tobacco with $\\lambda$ = ",
round(lambda2,2))))
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).
anst<-ansett |>
filter(Class =='Economy' & Airports =='MEL-SYD')
anst |> autoplot(Passengers)+
theme_minimal()
lambda3 <- anst |>
features(Passengers, features = guerrero) |>
pull(lambda_guerrero)
anst |>
autoplot(box_cox(Passengers, lambda3)) +
scale_y_continuous(labels = label_number(suffix = "M", scale = 1e-6))+
theme_minimal()+
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed Passengers with $\\lambda$ = ",
round(lambda3,2))))
ped<-pedestrian |>
filter(Sensor =='Southern Cross Station')
ped |> autoplot(Count)+
theme_minimal()
lambda4 <- ped |>
features(Count, features = guerrero) |>
pull(lambda_guerrero)
ped |>
autoplot(box_cox(Count, lambda4)) +
theme_minimal()+
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed pedestrians with $\\lambda$ = ",
round(lambda4,2))))
Consider the last five years of the Gas data from aus_production.
gas <- tail(aus_production, 5*4) |> select(Gas)
Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?
gas |> autoplot(Gas)+
theme_minimal()
Seasonality: Down Q1, up Q2 & 3, back down Q4.
Trend: Overall upward trend.
Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.
gas |>
model(
classical_decomposition(Gas, type = "multiplicative")
) |>
components() |>
autoplot() +
theme_minimal() +
labs(title = "Classical multiplicative decomposition of gas production")
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).
Do the results support the graphical interpretation from part
a?
Yes, both seasonality and trend look as described in part a.
Compute and plot the seasonally adjusted data.
x11_gas <- gas |>
model(x11 = X_13ARIMA_SEATS(Gas ~ x11())) |>
components()
x11_gas |>
ggplot(aes(x = Quarter)) +
geom_line(aes(y = Gas, colour = "Data")) +
geom_line(aes(y = season_adjust,
colour = "Seasonally Adjusted")) +
geom_line(aes(y = trend, colour = "Trend")) +
theme_minimal()+
labs(y = "Production",
title = "Total Gas Production") +
scale_colour_manual(
values = c("gray", "#0072B2", "#D55E00"),
breaks = c("Data", "Seasonally Adjusted", "Trend")
)
Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?
gas_out <- gas
gas_out$Gas[[10]]<- gas_out$Gas[[10]]+300
x11_gas_out <- gas_out |>
model(x11 = X_13ARIMA_SEATS(Gas ~ x11())) |>
components()
x11_gas_out |>
ggplot(aes(x = Quarter)) +
geom_line(aes(y = Gas, colour = "Data")) +
geom_line(aes(y = season_adjust,
colour = "Seasonally Adjusted")) +
geom_line(aes(y = trend, colour = "Trend")) +
theme_minimal()+
labs(y = "Production",
title = "Total Gas Production (with outlier in middle)") +
scale_colour_manual(
values = c("gray", "#0072B2", "#D55E00"),
breaks = c("Data", "Seasonally Adjusted", "Trend")
)
The effect of the outlier in row 10 (2007 Q4) is to smooth out the drop between that quarter and 2008 Q8 in both trend and seasonlly adjusted.
Does it make any difference if the outlier is near the end rather than in the middle of the time series?
gas_out2 <- gas
gas_out2$Gas[[20]]<- gas_out2$Gas[[20]]+300
x11_gas_out2 <- gas_out2 |>
model(x11 = X_13ARIMA_SEATS(Gas ~ x11())) |>
components()
x11_gas_out2 |>
ggplot(aes(x = Quarter)) +
geom_line(aes(y = Gas, colour = "Data")) +
geom_line(aes(y = season_adjust,
colour = "Seasonally Adjusted")) +
geom_line(aes(y = trend, colour = "Trend")) +
theme_minimal()+
labs(y = "Production",
title = "Total Gas Production (with outlier at end)") +
scale_colour_manual(
values = c("gray", "#0072B2", "#D55E00"),
breaks = c("Data", "Seasonally Adjusted", "Trend")
)
Putting the outlier in row 20 (2010 Q2) smooths out the trend even further. On the seasonality, we see a small bump around 2008 Q3, and a steep increase at the end, where the outlier is.
Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?
x11_ar <- ar |>
model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) |>
components()
x11_ar |>
ggplot(aes(x = Month)) +
geom_line(aes(y = Turnover, colour = "Data")) +
geom_line(aes(y = season_adjust,
colour = "Seasonally Adjusted")) +
geom_line(aes(y = trend, colour = "Trend")) +
theme_minimal()+
labs(y = "Turnover",
title = "Total Turnover") +
scale_colour_manual(
values = c("gray", "#0072B2", "#D55E00"),
breaks = c("Data", "Seasonally Adjusted", "Trend")
)
Yes, once can more clearly see some outliers here. For example, the two peaks at the right end of the chart.
Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.
Write about 3–5 sentences describing the results of the
decomposition. Pay particular attention to the scales of the graphs in
making your interpretation.
Looking at Figure 3.19, showing the decomposing of the number of persons
in the civilian labor force in Australia monthly from Feb 1978 to Aug
1995, on sees a definite upward trend. The seasonality is pretty
consistent up until the late 1980s, when the pattern gets a bit more
complex. The scale of the seasonality is quite small compared to the
trend, and also smaller than the remainder. The remainder shows a
significant event in the early 1990s.
Is the recession of 1991/1992 visible in the estimated components?
The 1991/92 recession is visible in most months in the estimated components in Figure 3.20. Jan, Feb, Sept and Oct do not appear down in those years, but all other months do, with particularly steep declines in Mar, Apr, Aug and Nov.