library(fpp3)
library(USgas)
library(gridExtra)
library(ggplot2)
library(cowplot)
Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time? Dataset Overview
head(global_economy)
## # A tsibble: 6 x 9 [1Y]
## # Key: Country [1]
## Country Code Year GDP Growth CPI Imports Exports Population
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Afghanistan AFG 1960 537777811. NA NA 7.02 4.13 8996351
## 2 Afghanistan AFG 1961 548888896. NA NA 8.10 4.45 9166764
## 3 Afghanistan AFG 1962 546666678. NA NA 9.35 4.88 9345868
## 4 Afghanistan AFG 1963 751111191. NA NA 16.9 9.17 9533954
## 5 Afghanistan AFG 1964 800000044. NA NA 18.1 8.89 9731361
## 6 Afghanistan AFG 1965 1006666638. NA NA 21.4 11.3 9938414
Overview of the global GDP per capita over time.
labels = ylab("GDP per capita")
gdpPlot <- global_economy |> filter(!is.na(GDP)) |> ggplot(aes(x=Year,y = GDP/Population,colour = Country))+ geom_line() + theme(legend.position = "none" ) + labels + ggtitle("Global GDP per Capita Over time")
Since it is not possible to display the data with the legend for each country, let’s display a subset of the data that corresponds to the top 5 countries with the largest DGP per capital.
gl_eco_m <- global_economy |> as_tibble() |> filter(!is.na(GDP)) |> group_by(Country) |> mutate(GDP_CAPITA = GDP/Population) |> summarise(GDP_TOTAL = sum(GDP_CAPITA)) |> top_n(5)
top5 <- global_economy |> filter(Country %in% gl_eco_m$Country) |> ggplot(aes(x=Year,y = GDP/Population,colour = Country))+ geom_line() + labels + ggtitle("Top 5 Global GDP per Capita Over Time")
plot_grid(gdpPlot,top5,ncol = 1)
Since 1970, Monaco has had the highest GDP per capita. The yearly data shows that Monaco generally follows an upward trend with some periodic fluctuations.
For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.
United States GDP from global_economy.
global_economy |> filter(Country =="United States") |> autoplot(.vars = GDP)
Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.
victoria <- aus_livestock |> filter(Animal == "Bulls, bullocks and steers", State == "Victoria")
victoria |> autoplot(Count)
Victorian Electricity Demand from vic_elec
vic_elec |> autoplot(.vars = Demand )
Gas production from aus_production.
aus_production |> features(Gas,features = "guerrero") |> pull(lambda_guerrero) -> lambda
aus_production |> autoplot(box_cox(Gas,lambda)) + ggtitle(paste("Transformed Gas Production with lambda",as.character(round(lambda,2))))
Why is a Box-Cox transformation unhelpful for the canadian_gas data?
head(canadian_gas)
## # A tsibble: 6 x 2 [1M]
## Month Volume
## <mth> <dbl>
## 1 1960 Jan 1.43
## 2 1960 Feb 1.31
## 3 1960 Mar 1.40
## 4 1960 Apr 1.17
## 5 1960 May 1.12
## 6 1960 Jun 1.01
label <- labs(title= "Monthly Canadian Gas Production", y = "Billions of cubic meters")
canadian_gas |> features(Volume,features = "guerrero") |> pull(lambda_guerrero) -> lambda
label2 <- labs(title= paste("Monthly Canadian Gas Production with Lambda", round(lambda,2) ), y = "Billions of cubic meters")
rawCanGas <- canadian_gas |> autoplot(Volume) + label
coxBoxCanGas <- canadian_gas |> autoplot(box_cox(Volume,lambda)) + label2
plot_grid(rawCanGas,coxBoxCanGas,ncol = 2)
The Box-Cox transformation may not be helpful for the canadian_gas dataset because the data likely exhibits strong seasonality or trends, which the transformation does not address. Additionally, if the variance in the dataset is already stable, applying the transformation would offer no benefit. The transformation is more suited for data with multiplicative structures or non-constant variance, and if the data follows an additive pattern, the Box-Cox transformation would not improve model performance or interpretability. Therefore, its application is unnecessary for this dataset.
What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?
set.seed(123)
ausRetailSamp <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
plot1 <- ausRetailSamp |> autoplot(Turnover)+ ggtitle("Turnover by Month")
ausRetailSamp |> features(Turnover,features = "guerrero") |> pull(lambda_guerrero) -> lambda
plot2 <- ausRetailSamp |> autoplot(box_cox(Turnover,lambda)) + ggtitle(paste("Turnover by Month (Box-Cox Transformation) with Lambda = ",round(lambda,2)))
plot3 <- ausRetailSamp |> autoplot(sqrt(Turnover)) + ggtitle("Turnover by Month (Square Root Transformation)")
plot4 <- ausRetailSamp |> autoplot(Turnover^(1/3)) + ggtitle("Turnover by Month (Cubic Root Transformation)")
plot5 <- ausRetailSamp |> autoplot(log(Turnover)) + ggtitle("Turnover by Month (Log Transformation)")
plot1
plot2
plot3
plot4
plot5
I would choose the box-cox transformation since other transformations have almost the same effect on the data.
For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.
plot1 <- aus_production |> autoplot(.vars = Tobacco)
plot1
aus_production |> features(Tobacco,features = "guerrero") |> pull(lambda_guerrero) -> lambda
plot2 <- aus_production |> autoplot(box_cox(Tobacco,lambda)) + ggtitle(paste("Turnover by Quarter (Box-Cox Transformation) with Lambda = ",round(lambda,2)))
plot_grid(plot1,plot2,ncol = 1)
Since lambda is close to 1, the transformation barely alters the data because the variance is already considered stable, or no transformation is needed to normalize the data.
Economy class passengers between Melbourne and Sydney
ecoClass <- ansett |> filter(Class == "Economy",Airports == "MEL-SYD")
ecoClass |> features(Passengers,features = "guerrero") |> pull(lambda_guerrero) -> lambda
plt1 <- ecoClass |> autoplot(Passengers) +ggtitle("Passengers Count")
plt2 <- ecoClass |> autoplot(box_cox(Passengers,lambda)) + ggtitle(paste("Passengers Count (Box-Cox Transformation) with Lambda =",round(lambda,2)))
plot_grid(plt1,plt2,ncol = 1)
The Box-Cox transformation with a lambda of 2 appears to have compressed
one specific zone of the chart (around 1988-1989), particularly where
the values are smaller, while maintaining the overall shape of the data.
This occurs because the nonlinear transformation affects smaller values
less dramatically, while larger values are stretched more
significantly.
Pedestrian counts at Southern Cross Station
pedestrianF <- pedestrian |> filter(Sensor == "Southern Cross Station")
# pedestrianF
pedestrianF |> features(Count,features = "guerrero") |> pull(lambda_guerrero) -> lambda
plt1 <- pedestrianF |> autoplot(Count) +ggtitle("Pedestrian Count")
plt2 <- pedestrianF |> autoplot(box_cox(Count,lambda)) + ggtitle(paste("Pedestrian Count (Box-Cox Transformation) with Lambda = ",round(lambda,2)))
plot_grid(plt1,plt2,ncol = 1)
The Box-Cox transformation with a lambda of -0.25 had a minimal impact
on the overall data, as the shape remained largely unchanged. However,
values that were previously zero or near zero have shifted close to one.
This occurs because the transformation prevents zero values by slightly
adjusting them upward, while the mild effect of the small negative
lambda has only slightly compressed larger values.
Consider the last five years of the Gas data from aus_production.
gas <- tail(aus_production, 5*4) |> select(Gas)
gas
## # A tsibble: 20 x 2 [1Q]
## Gas Quarter
## <dbl> <qtr>
## 1 221 2005 Q3
## 2 180 2005 Q4
## 3 171 2006 Q1
## 4 224 2006 Q2
## 5 233 2006 Q3
## 6 192 2006 Q4
## 7 187 2007 Q1
## 8 234 2007 Q2
## 9 245 2007 Q3
## 10 205 2007 Q4
## 11 194 2008 Q1
## 12 229 2008 Q2
## 13 249 2008 Q3
## 14 203 2008 Q4
## 15 196 2009 Q1
## 16 238 2009 Q2
## 17 252 2009 Q3
## 18 210 2009 Q4
## 19 205 2010 Q1
## 20 236 2010 Q2
Can you identify seasonal fluctuations and/or a trend-cycle?
The last five years of the data show clear seasonality, with a noticeable spike in gas production in Q1 each year, reaching a peak in Q3, and then decreasing until the next Q1. The data also indicates an upward trend, where the trough at the beginning of each Q1 is higher than the trough of the previous year.
gas |> autoplot(Gas) +scale_x_yearquarter(breaks = scales::breaks_width("3 month")) + theme(axis.text.x = element_text(angle = 45, hjust = 1))
Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.
gasDec <- gas |> model(classical_decomposition(Gas,type="multiplicative")) |> components()
gasDec |> autoplot()
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).
Do the results support the graphical interpretation from part a? They do, the trend component shows the upwards trend and a strong seasonal pattern.
Compute and plot the seasonally adjusted data.
gas |> autoplot(Gas) + autolayer(gasDec,season_adjust,color="red")
# gasDec |> autoplot(season_adjust)
The red trend line represents the seasonally adjusted data.
Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data.
gasDecOutlier <- gas |> mutate(Gas=if_else(Quarter == yearquarter("2008Q1"), Gas + 300, Gas)) |> model(classical_decomposition(Gas,type="multiplicative")) |> components()
gasDecOutlier |> autoplot()
What is the effect of the outlier? The outlier changes the overall patterns in the data.
Does it make any difference if the outlier is near the end rather than in the middle of the time series?
gasDecOutlierEnd <- gas |> mutate(Gas = if_else(Quarter == yearquarter("2006Q1"), Gas + 1000, Gas)) |> model(classical_decomposition(Gas,type="multiplicative")) |> components()
gasDecOutlierBe <- gas |> mutate(Gas = if_else(Quarter == yearquarter("2009Q1"), Gas + 1000, Gas)) |> model(classical_decomposition(Gas,type="multiplicative")) |> components()
gasDecOutlierEnd |> autoplot()
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).
gasDecOutlierBe |> autoplot()
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).
As shown in the plots, the outliers at the beginning and end of the series significantly altered the overall shape of the components—trend, seasonality, and random fluctuations. Interestingly, while the seasonal pattern has changed due to the outliers, the data still exhibits a different but consistent cyclical pattern. This indicates that although the timing or intensity of seasonality has been affected, the seasonal component continues to follow a regular, though modified, cycle.
Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?
ausRetailSamp |> model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) |> components() -> TurnoverX11dcmp
TurnoverX11dcmp |> autoplot() + ggtitle("Retail Turnover using X-11 Decomposition")
The X-11 decomposition residuals indicate the presence of outliers
around 1982-1983 and 2001-2002, which are not apparent in the standard
time series plot. Aside from this, as the original data suggests, there
is seasonality following an upward trend.
Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.
Figure 3.19:
Decomposition of the number of persons in the civilian labour force in
Australia each month from February 1978 to August 1995.
Figure
3.20: Seasonal component from the decomposition shown in the previous
figure.
1.Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation. The STL decomposition breaks the time series into trend, seasonal, and remainder components. The ‘value’ graph shows an overall upward trend with seasonal fluctuations. The ‘trend’ graph confirms a positive trend over time, while the ‘season_year’ graph captures stable seasonal patterns within each year. Finally, the ‘remainder’ graph displays the residuals after removing the trend and seasonal components.
2.Is the recession of 1991/1992 visible in the estimated components? The recession around 1991-1992 is mildly evident in the standard time series chart, where troughs stand out compared to the usual oscillations in the data. However, the residuals make this recession much more apparent, clearly revealing two significant troughs during that period.