Do exercises 3.1, 3.2, 3.3, 3.4, 3.5, 3.7, 3.8 and 3.9 from the online Hyndman book. .
Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?
The global_economy dataset contains GDP and
Population.
We will calculate GDP per capita by dividing GDP by Population.
Identify the country with the highest GDP per capita overall
## # A tsibble: 7 x 3 [1Y]
## # Key: Country [2]
## Country Year max_gdp_per_capita
## <fct> <dbl> <dbl>
## 1 Monaco 2014 185153.
## 2 Monaco 2008 180640.
## 3 Liechtenstein 2014 179308.
## 4 Liechtenstein 2013 173528.
## 5 Monaco 2013 172589.
## 6 Monaco 2016 168011.
## 7 Liechtenstein 2015 167591.
To investigate this inquiry in greater depth and ascertain which country possesses the highest GDP per capita, we will focus on nations within the upper echelon, applying a filter for those with a GDP per capita exceeding 80,000 USD.
global_economy %>%
filter(GDP_per_capita > 80000) %>%
autoplot(GDP_per_capita) +
labs(title= "GDP per capita", y = "$US")
The chart above shows that Monaco had significant GDP growth from 2000 to 2014, culminating in a record GDP per capita of 185152.5 in 2014.
monaco_gdp <- global_economy %>%
filter(Country == "Monaco") %>%
select(Year, GDP_per_capita) # Select only the Year and GDP columns
monaco_gdp_ts <- monaco_gdp %>%
as_tsibble(index = Year)
monaco_gdp_ts %>%
gg_lag(GDP_per_capita) +
labs(title = "Lag Plot of Monaco's GDP",
x = "Lagged GDP",
y = "GDP",
caption = "Source: global_economy dataset from fpp3")
For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.
- United States GDP from global_economy.
- Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.
- Victorian Electricity Demand from vic_elec.
- Gas production from aus_production.
data("global_economy")
us_gdp <- global_economy %>% filter(Country == "United States")
vic_bulls <- aus_livestock %>% filter(Animal == "Bulls, bullocks and steers", State == "Victoria")
vic_elec_demand <- vic_elec
gas_production <- aus_production
1. Analyze United States GDP from global_economy
us_gdp %>%
ggplot(aes(x = Year, y = GDP)) +
geom_line() +
labs(title = "U.S. GDP Over Time",
x = "Year",
y = "GDP (in $USD)")
The GDP series shows exponential growth, suggesting a log transformation might be useful.
us_gdp_transformed <- us_gdp %>%
autoplot(log(GDP)) + ggtitle("Log Transformed USA GDP")
us_gdp_transformed
The log transformation makes the growth appear more linear and stabilizes the variance.
2. Analyze Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.
vic_bulls %>%
autoplot(Count) +
ggtitle("Slaughter of Victorian Bulls, bullocks and steers")
The series is showing more and more variation as time goes on, which hints that a log transformation could be helpful.
# Apply log transformation
vic_bulls_transformed <- vic_bulls %>%
mutate(log_Count = log(Count))
vic_bulls_transformed %>%
autoplot(log_Count) +
ggtitle("Log Transformed Slaughter of Victorian Bulls, bullocks and steers")
The log transformation stabilizes the variance in the slaughter series.
3. Victorian Electricity Demand from vic_elec.
vic_elec_demand %>%
autoplot(Demand) +
ggtitle("Victorian Electricity Demand")
The series seems to have a pretty consistent variance, but we could try a log transformation to check if it makes the series better.
# Apply log transformation
vic_elec_demand_transformed <- vic_elec_demand %>%
mutate(log_Demand = log(Demand))
vic_elec_demand_transformed %>%
autoplot(log_Demand) +
ggtitle("Log Transformed Victorian Electricity Demand")
4. Gas production from aus_production.
The log transformation doesn’t drastically change the series, but it might help in stabilizing variance slightly.
gas_production %>%
autoplot(Gas) +
ggtitle("Gas Production")
The gas production series shows exponential growth, suggesting a log transformation might be useful.
# log transf
gas_production_transformed <- gas_production %>%
mutate(log_Gas = log(Gas))
# Plot
gas_production_transformed %>%
autoplot(log_Gas) +
ggtitle("Log Transformed Gas Production")
Using a log transformation helps to make the growth look more like a straight line and keeps the variance steady.
Why is a Box-Cox transformation unhelpful for the
canadian_gas data?
gas_data <- canadian_gas
head(gas_data)
## # A tsibble: 6 x 2 [1M]
## Month Volume
## <mth> <dbl>
## 1 1960 Jan 1.43
## 2 1960 Feb 1.31
## 3 1960 Mar 1.40
## 4 1960 Apr 1.17
## 5 1960 May 1.12
## 6 1960 Jun 1.01
# Calculate optimal lambda
lambda <- gas_data %>%
features(Volume, features = guerrero) %>%
pull(lambda_guerrero)
#Box-Cox transformation
gas_transformed <- gas_data %>%
mutate(box_cox = box_cox(Volume, lambda))
gas_transformed %>%
autoplot() + labs(title = " Monthly Canadian gas production", x = "Months", y = "volume")
## Plot variable not specified, automatically selected `.vars = Volume`
gas_transformed %>%
autoplot() + labs(title = "Monthly Canadian gas production Transformed", x = "Months", y = "box_cox")
## Plot variable not specified, automatically selected `.vars = Volume`
The Box-Cox transformation isn’t really useful for the canadian_gas data since it doesn’t tackle the main seasonal trends and could make things harder to interpret, all without offering any real benefits for forecasting.
What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?
retail_series <- aus_retail %>%
filter(Industry == "Food retailing")
head(retail_series,7)
## # A tsibble: 7 x 5 [1M]
## # Key: State, Industry [1]
## State Industry `Series ID` Month Turnover
## <chr> <chr> <chr> <mth> <dbl>
## 1 Australian Capital Territory Food retailing A3349688X 1982 Apr 15.5
## 2 Australian Capital Territory Food retailing A3349688X 1982 May 15.1
## 3 Australian Capital Territory Food retailing A3349688X 1982 Jun 15.5
## 4 Australian Capital Territory Food retailing A3349688X 1982 Jul 16.1
## 5 Australian Capital Territory Food retailing A3349688X 1982 Aug 15.8
## 6 Australian Capital Territory Food retailing A3349688X 1982 Sep 16
## 7 Australian Capital Territory Food retailing A3349688X 1982 Oct 16.6
retail_series%>%autoplot() + ggtitle("Australian Retail Trade Turnover")
## Plot variable not specified, automatically selected `.vars = Turnover`
We apply the guerrero method to find the best lambda
value for the Box-Cox transformation. This approach picks the lambda
that reduces the coefficient of variation for the transformed data.
Determine the optimal Box-Cox transformation parameter (lambda)
# Determine the optimal Box-Cox transformation parameter (lambda)
lambda <- retail_series %>%
features(Turnover, features = guerrero) %>%
pull(lambda_guerrero)
lambda
## [1] 0.30276595 0.08786620 -0.06280256 0.07636928 0.08473322 0.09023149
## [7] 0.05068016 0.11791154
For box-cox transformation I will use lambda from the Guerrero feature with value of it’s 0.30276595.
lambda<-0.08786620
# Apply the Box-Cox transformation
retail_series_transformed <- retail_series %>%
mutate(Turnover_Transformed = box_cox(Turnover, lambda))
retail_series_transformed%>%autoplot(box_cox(Turnover,lambda)) + ggtitle("Transformed Retail Series")
By displaying the original series next to the transformed one, we can get a good look at the effects of the Box-Cox transformation. The transformed series is expected to show a more uniform variance over time than the original series does.
For the following series, find an appropriate Box-Cox transformation
in order to stabilise the variance. Tobacco from
aus_production, Economy class passengers between Melbourne
and Sydney from ansett, and Pedestrian counts at Southern
Cross Station from pedestrian.
tobacco_data <- aus_production %>%
select(Quarter, Tobacco)
head(tobacco_data,3)
## # A tsibble: 3 x 2 [1Q]
## Quarter Tobacco
## <qtr> <dbl>
## 1 1956 Q1 5225
## 2 1956 Q2 5178
## 3 1956 Q3 5297
economy_data <- ansett %>%
filter(Class == "Economy", Airports == "MEL-SYD") %>%
select(Week, Passengers)
head(economy_data,3)
## # A tsibble: 3 x 2 [1W]
## Week Passengers
## <week> <dbl>
## 1 1987 W26 20167
## 2 1987 W27 20161
## 3 1987 W28 19993
pedestrian_data <- pedestrian %>%
filter(Sensor == "Southern Cross Station") %>%
select(Date, Count)
head(pedestrian_data,3)
## # A tsibble: 3 x 3 [1h] <Australia/Melbourne>
## Date Count Date_Time
## <date> <int> <dttm>
## 1 2015-01-01 746 2015-01-01 00:00:00
## 2 2015-01-01 312 2015-01-01 01:00:00
## 3 2015-01-01 180 2015-01-01 02:00:00
Calculate optimal lambda values
lambda_tobacco <- aus_production %>%
select(Tobacco) %>%
features(Tobacco, features = guerrero) %>%
pull(lambda_guerrero)
lambda_economy <- ansett %>%
filter(Class == "Economy", Airports == "MEL-SYD") %>%
select(Passengers) %>%
features(Passengers, features = guerrero) %>%
pull(lambda_guerrero)
lambda_pedestrian <- pedestrian %>%
filter(Sensor == "Southern Cross Station") %>%
select(Count) %>%
features(Count, features = guerrero) %>%
pull(lambda_guerrero)
#lambda_tobacco
lambda_tobacco
## [1] 0.9264636
create_comparison_chart <- function(data, original_col, time_col, lambda, title) {
# Original series
original_chart <- ggplot(data, aes_string(x = time_col, y = original_col)) +
geom_line(color = 'blue') +
labs(title = paste("Original", title),
x = time_col,
y = original_col) +
theme_minimal()
# Transformed series
transformed_chart <- ggplot(data, aes_string(x = time_col, y = paste("box_cox(", original_col, ", ", lambda, ")", sep = ""))) +
geom_line(color = 'red') +
labs(title = paste("Box-Cox Transformed", title),
x = time_col,
y = paste("Transformed", original_col)) +
theme_minimal()
return(list(original = original_chart, transformed = transformed_chart))
}
# Tobacco charts
tobacco_charts <- create_comparison_chart(tobacco_data, "Tobacco", "Quarter",
lambda_tobacco, "Tobacco Production")
tobacco_charts
## $original
##
## $transformed
A lambda value λ = 0.9264636 which is close to 1 suggests that the original data is already nearly normally distributed, and the Box-Cox transformation may not significantly alter the data’s shape. When lambda equals one, the data shifts down, but the shape does not change. If the optimal value for lambda is one, the data is already normally distributed, making the Box-Cox transformation unnecessary.
#lambda_economy
lambda_economy
## [1] 1.999927
# Economy passenger charts
economy_charts <- create_comparison_chart(economy_data, "Passengers", "Week",
lambda_economy, "Economy Passengers")
economy_charts
## $original
##
## $transformed
A lambda value λ = 1.999927 which is greater than 1 suggests that the original data is left-skewed, meaning more observations are around higher values. In this case, the Box-Cox transformation applies a power transformation with lambda > 1 to achieve a more normal distribution.
#lambda_pedestrian
lambda_pedestrian
## [1] -0.2501616
# Pedestrian charts
pedestrian_charts <- create_comparison_chart(pedestrian_data, "Count", "Date",
lambda_pedestrian, "Pedestrian Count")
pedestrian_charts
## $original
##
## $transformed
A negative lambda value λ = -0.2501616 indicates that the data is right-skewed and that the Box-Cox transformation involves an inverse or reciprocal transformation.
Consider the last five years of the Gas data from aus_production.
gas <- tail(aus_production, 5*4) |> select(Gas)
gas_ts <- ts(gas, frequency = 4, start = c(2017, 1)) |> as_tsibble()
head(gas_ts)
## # A tsibble: 6 x 3 [1Q]
## # Key: key [1]
## index key value
## <qtr> <chr> <dbl>
## 1 2017 Q1 Gas 221
## 2 2017 Q2 Gas 180
## 3 2017 Q3 Gas 171
## 4 2017 Q4 Gas 224
## 5 2018 Q1 Gas 233
## 6 2018 Q2 Gas 192
gas_ts %>%
autoplot() + ggtitle("Quarterly Gas production")+
labs(title = "Gas Production Time Series", x = "Quarter", y = "Gas Production")
## Plot variable not specified, automatically selected `.vars = value`
The graph presented above illustrates distinct seasonal fluctuations alongside a production trend in the data pertaining to Australian gas production over the past five years.
decomp <- gas_ts %>%
model(classical_decomposition(value, type = "multiplicative")) %>%
components()
decomp %>%
autoplot() + labs(title = "Classical multiplicative decomposition of Gas Production in Australia")
The components validate the conclusions drawn in Part A concerning the upward trajectory and the seasonal variations observed. Furthermore, the data does not exhibit significant randomness.
adjusted_gas <- gas_ts |>
mutate(
adj_gas = value / decomp$seasonal
)
ggplot(adjusted_gas, aes(x = index, y = adj_gas)) +
geom_line() +
labs(title = "Seasonally Adjusted Gas Production",
x = "Quarter",
y = "Gas Production (Seasonally Adjusted)") +
theme(plot.title = element_text(hjust = 0.5)) +
theme(figure.figsize = c(5, 5))
e.Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?
gas_ts_outlier <- gas_ts
gas_ts_outlier$value[10] <- gas_ts_outlier$value[10] + 300
decomp_outlier <- gas_ts_outlier |>
model(classical_decomposition(value, type = "multiplicative")) |>
components()
adjusted_gas_outlier <- gas_ts_outlier |>
mutate(
adj_gas = value / decomp_outlier$seasonal
)
ggplot(adjusted_gas_outlier, aes(x = index, y = adj_gas)) +
geom_line() +
labs(title = "Seasonally Adjusted Gas Production with Outlier",
x = "Quarter",
y = "Gas Production (Seasonally Adjusted)") +
theme(plot.title = element_text(hjust = 0.5), # Center the title
figure.figsize = c(7, 5)) # Adjust figure size (width, height in inches)
The presence of the outlier significantly alters the seasonally adjusted data. The advantages of utilizing seasonally adjusted figures are negated, as the underlying trend becomes indiscernible. This outlier, positioned centrally within the dataset, manifests prominently at the onset of the seasonally adjusted data visualization.
gas_ts_outlier_end <- gas_ts
gas_ts_outlier_end$value[20] <- gas_ts_outlier_end$value[20] + 300
decomp_outlier_end <- gas_ts_outlier_end |>
model(classical_decomposition(value, type = "multiplicative")) |>
components()
adjusted_gas_outlier_end <- gas_ts_outlier_end |>
mutate(
adj_gas = value / decomp_outlier_end$seasonal
)
ggplot(adjusted_gas_outlier_end, aes(x = index, y = adj_gas)) +
geom_line() +
labs(title = "Seasonally Adjusted Gas Production with Outlier at End",
x = "Quarter",
y = "Gas Production (Seasonally Adjusted)") +
theme(plot.title = element_text(hjust = 0.5)) + # Center the title
theme(aspect.ratio = 1) # Ensure the plot is square (height = width)
Comparing the results with the outlier in the middle, we can determine if the outlier’s position affects the seasonally adjusted data differently. Classical decomposition methods are sensitive to outliers, and their impact can vary depending on the outlier’s location within the time series.
The effect of the outlier is that it distorts the seasonal indices, which in turn affects the seasonally adjusted data. The impact of the outlier depends on its position within the time series.
Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?
set.seed(12345678)
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
x11 <- myseries %>%
model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
components()
autoplot(x11)
Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.
The figures presented illustrate the decomposition of the Australian civilian labor force from February 1978 to August 1995, employing the STL method. This decomposition categorizes the data into three distinct components: trend, seasonal, and remainder.
In Figure 3.19, the original data (value) is depicted alongside the long-term trend, seasonal variations (season_year), and the residual fluctuations (remainder). The trend component indicates a consistent upward trajectory in the labor force over the observed period. The seasonal component highlights predictable annual patterns, which likely correspond to variations in seasonal employment. The remainder component captures short-term, irregular fluctuations.
Figure 3.20 emphasizes the seasonal component, providing a clearer representation of the cyclical patterns. The interpretation of these graphs is significantly influenced by the differing scales on the y-axis between the original data and the decomposed components.
The recession of 1991/1992 is not distinctly observable within the estimated components. Although the remainder component exhibits some negative fluctuations during this timeframe, these variations are not markedly larger than other fluctuations throughout the series. The overarching trend continues to rise, thereby obscuring the recession’s impact on the decomposed components.