library(tidyverse)
library(fpp3)
library(seasonal)
library(plotly)

EXERCISE 3.1

Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

head(global_economy,5)

glimpse(global_economy)

## Rows: 15,150
## Columns: 9
## Key: Country [263]
## $ Country    <fct> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan",…
## $ Code       <fct> AFG, AFG, AFG, AFG, AFG, AFG, AFG, AFG, AFG, AFG, AFG, AFG,…
## $ Year       <dbl> 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969,…
## $ GDP        <dbl> 537777811, 548888896, 546666678, 751111191, 800000044, 1006…
## $ Growth     <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CPI        <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ Imports    <dbl> 7.024793, 8.097166, 9.349593, 16.863910, 18.055555, 21.4128…
## $ Exports    <dbl> 4.132233, 4.453443, 4.878051, 9.171601, 8.888893, 11.258279…
## $ Population <dbl> 8996351, 9166764, 9345868, 9533954, 9731361, 9938414, 10152…

# Find the top 20 countries with the highest GDP

country_codes_df <- read_csv("https://raw.githubusercontent.com/D-hartog/DATA624/refs/heads/main/HW2/wikipedia-iso-country-codes.csv")

## Rows: 246 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): English short name lower case, Alpha-2 code, Alpha-3 code, Numeric ...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

codes <- unique(country_codes_df$`Alpha-3 code`)

global_economy_filtered <- global_economy %>% filter(Code %in% codes)

global_economy_filtered <- global_economy_filtered %>% mutate(GDP_per_capita = GDP/Population)

top20 <- global_economy_filtered %>% filter(Year == 2017) %>% arrange(desc(GDP_per_capita)) %>% head(20)

unique(top20$Country)[1:20]

##  [1] Luxembourg           Macao SAR, China     Switzerland         
##  [4] Norway               Iceland              Ireland             
##  [7] Qatar                United States        Singapore           
## [10] Denmark              Australia            Sweden              
## [13] San Marino           Netherlands          Austria             
## [16] Hong Kong SAR, China Finland              Canada              
## [19] Germany              Belgium             
## 263 Levels: Afghanistan Albania Algeria American Samoa Andorra ... Zimbabwe

# Plot time series for the top 20 countries

global_economy_filtered %>% filter(Country %in% unique(top20$Country)[1:20]) %>% autoplot(GDP_per_capita)

Since there are over 200 countries, I thought that this would be difficult to visualize. My thought was to plot the top 20 countries with the highest GDP for the last year data was recorded (2017). The country with the highest recorded GDP per capita in 2017 was Luxembourg. Comparing this to the beginning of the recorded data in 1960, the US had the highest GDP per capita. Over time the GDP per capita of non European countries like Qatar, Special Administrative Regions of China (Hong Kong and Macao), and Singapore have also surpassed many countries.

top20

global_economy_filtered %>% filter(Year == 1960) %>% arrange(desc(GDP_per_capita)) %>% head(20)

EXERCISE 3.2

For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

a. United States GDP from global_economy.

# Filter global economy data for the the United States
us_economy <- global_economy %>% filter(Country == "United States")

# Add a column with GDP per capita calculation 
us_economy <- us_economy %>% mutate(GDP_per_capita = GDP/Population)

#Plot of the time series of US GDP 
us_economy %>% autoplot(GDP) + labs(title = "US GDP", x = "Year")

The transformation that would be appropriate here is to visualize the time series of the GDP per capita to adjust for the population increase over time. This does not have a large effect on the shape of the trend line but we can see that the trend does become slight more linear than just plotting GDP.

# Plot of the time series of US GDP per capita
us_economy %>% autoplot(GDP_per_capita) + labs(title = "US GDP Per Capita", x = "Year")

b. Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.

aus_livestock %>% filter((Animal == "Bulls, bullocks and steers") & (State == "Victoria"))

# plot time series of Slaughter of Victorian “Bulls, bullocks and steers”
aus_livestock %>% filter((Animal == "Bulls, bullocks and steers") & (State == "Victoria")) %>% autoplot(Count) + labs(title = "Number of Slaughtered Bulls, Bullocks, Steers in Victoria, AU", x = "Month")

c. Victorian Electricity Demand from vic_elec.

head(vic_elec, 5)

vic_elec %>% autoplot(Demand) + labs(title = "Vicotria Electricity Demand", y = "Demand (MWh)", x = "Time (30 minute intervals)")

d. Gas production from aus_production.

When plotting the original data we can see that there is a consistent increase in variation as time goe on. In order to make the variation in the series more homogenous, an appropriate strategy might involve a log transformation.

aus_production %>% autoplot(Gas) + labs(title = "Australia Gas Production", y = "Gas (petajoules)")

aus_production %>% autoplot(log(Gas)) + labs(title = "Australia Gas Production: Log Transformed", y = "Gas (petajoules)")

Applying the log transformation, we can now see more consistent variation between the first 15 years of the time series and the last 30 years.

EXERCISE 3.3

Why is a Box-Cox transformation unhelpful for the canadian_gas data?

head(canadian_gas)

# Canadian Gas Data 
canadian_gas %>% autoplot() + labs(title = "Monthly Canadian Gas Production", y = "Billions (cubic meters)")

## Plot variable not specified, automatically selected `.vars = Volume`

A Box-Cox transformation would not be helpful for the Canadian Gas data as the variation in the data does not consistently only increase or only decrease throughout the series. We see that the variation increases until about 1993 and then begins to decrease.

lambda <- canadian_gas |>
  features(Volume, features = guerrero) |>
  pull(lambda_guerrero)

canadian_gas |>
  autoplot(box_cox(Volume, lambda)) +
  labs(y = "",
       title = paste0(
         "Transformed Monthly Canadian Gas Production with lambda = ", round(lambda, 4)))

Implementing the code to find the optimal lambda value and applying a Box-Cox transformation to the Canadian Gas data we don’t see much change in the heterogeneity of the variation across the series.

EXERCISE 3.4

What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?

Monthly Australian retail data is provided in aus_retail. Select one of the time series as follows (but choose your own seed value):

set.seed(123)

myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

myseries %>% autoplot(Turnover) + labs(title = "Victoria Household and Goods Retail Turnover")

The Box-Cox transformaiton that I would choose for this series would be a log transformation as the variation in the data consistently increases as time goes on. In the graph below we can see how applying a log transformation evens out the seasonal variation that me might see from year to year.

myseries %>% autoplot(log(Turnover)) + labs(title = "Log Transformed:Victoria Household and Goods Retail Turnover")

EXERCISE 3.5

For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance.

Since Box-Cox transformations rely on the value of \(\lambda\) we can use the features function, setting features to guerrero to find the best lambda. A good value of \(\lambda\) is one which makes the size of the seasonal variation about the same across the whole series, in turn making forcasting easier as well.

a. Tobacco from aus_production

tobacco <- aus_production %>% select(Tobacco)

tobacco %>% autoplot(Tobacco) + labs(title = "Australian Tobacco and Cigarette Production", x = "Tonnes")

lambda <- tobacco %>%
  features(Tobacco, features = guerrero) |>
  pull(lambda_guerrero)

tobacco |>
  autoplot(box_cox(Tobacco, lambda)) +
  labs(y = "",
       title = paste0(
         "Transformed Tobacco Production with lambda = ", round(lambda, 4)))

In this scenario the labda is close to 1 so we don’t see much change in the shape of the time series but the values are shifted down.

b. Economy class passengers between Melbourne and Sydney from ansett

economy_mel_syd <- ansett %>% filter((Class == "Economy") & (Airports == "MEL-SYD"))

economy_mel_syd %>% autoplot(Passengers) + labs(title = "Number of Passengers in Economy Class Between Melbourne and Sydney")

lambda <- economy_mel_syd %>%
  features(Passengers, features = guerrero) |>
  pull(lambda_guerrero)

economy_mel_syd %>% 
  autoplot(box_cox(Passengers, lambda)) +
  labs(y = "",
       title = paste0(
         "Transformed Passengers with lambda = ", round(lambda, 4)))

The value of lambda beng close to 2 shifts the values up and we can see an attempt at creating more consistent variation across the series.

c. Pedestrian counts at Southern Cross Station from pedestrian.

southern_cross <- pedestrian %>% filter(Sensor == "Southern Cross Station")

southern_cross %>% autoplot(Count) + labs(title = "Pedestrian Counts in Melbourne")

lambda <- southern_cross %>%
  features(Count, features = guerrero) |>
  pull(lambda_guerrero)

southern_cross %>%
  autoplot(box_cox(Count, lambda)) +
  labs(y = "",
       title = paste0(
         "Transformed Southern Cross Pedestrian Counts with lambda = ", round(lambda, 4)))

EXERCISE 3.7

Consider the last five years of the Gas data from aus_production.

gas <- tail(aus_production, 5*4) %>% select(Gas)
gas

head(gas)

a. Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?

gas %>% autoplot(Gas) + labs(title = "Australian Gas Production", 
                             y = "Gas (petrajoules)", 
                             x = "Quarter")

Visualizing the time series, there is an upward trend in the amount that is being produced as well as seasonal fluctuations. It appears that more gas is produced during Q2 increasing to Q3, dropping back down in Q4. This seasonal fluctuations are consistent across the time series.

b. Use classical_decomposition with type: multiplicative to calculate the trend-cycle and seasonal indices.

classical_decomp <- gas %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>% 
  components()

classical_decomp %>% autoplot()

### c. Do the results support the graphical interpretation from part a?

Yes, the results of the multiplicative decomposition supports my interpretation above. We can see an increasing linear trend, and clear and consistent seasonality.

d. Compute and plot the seasonally adjusted data.

classical_decomp %>% autoplot(season_adjust)

e. Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?

set.seed(543)
row <- sample(x = nrow(gas), size = 1)

# Random row before adjustment 
gas[row, ]

gas[row, "Gas"] <- gas[row, "Gas"] + 300

# Random row after adjustment 
gas[row, ]

classical_decomp2 <- gas %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>% 
  components()

classical_decomp2 %>% autoplot(season_adjust)

The first noticable change in the seasonal adjustment is that we see a big spike at the specific year and quarter where 300 was added to that value. The upward trend in the seasonal adjustment plot is also eliminated from the graph. The outlier also effects the seasonal pattern. Before the outlier Q1 saw relatively low production of gas which was reversed when the outlier was added.

f. Does it make any difference if the outlier is near the end rather than in the middle of the time series?

Add outlier to the end of the series

gas <- tail(aus_production, 5*4) %>% select(Gas)

# Add an outlier to the beginning of the series 

gas[20, ]

gas[20, "Gas"] <- gas[20, "Gas"] + 300

# Random row after adjustment 
gas[20, ]

# Plot the seasonsal adjusted data. 
classical_decomp3 <- gas %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>% 
  components()

classical_decomp3 %>% autoplot(season_adjust)

Adding the outlier to the end of the series does make a difference. We do see a slight upward trendin the seasonally adjusted values as was the case without the outlier. It is hard to see since the limits of the y - axis have been increased to fit the value of the outlier. Another difference between adding the outlier to the end of the series is that the seasonal pattern is more or less preserved.

EXERCISE 3.8

Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

# View the first 6 rows of the tsibble that was created from exercsie 3.4 that also referenced exercise 7 from 2.10
head(myseries)

myseries %>% autoplot(Turnover) + labs(title = "Victoria Household Goods Turnover", x = "Millions (AUD)")

myseries%>% gg_season(Turnover) + labs(title = "Victoria Household Goods Turnover by Year", x = "Millions (AUD)")

myseries %>% gg_subseries(Turnover) + labs(title = "Victoria Household Goods Turnover: Monthly Subseries")

myseries %>% gg_lag(Turnover, lags = 1:12, geom = "point") + labs(title = "Lag plot")

myseries %>% ACF(Turnover, lag_max = 24) %>% autoplot() + labs(title = "Victoria Household Goods Turnover: Autocorrelation Plot")

head(myseries)

# Decompose series using x11
x11_dcmp <- myseries %>%
  model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
  components()
autoplot(x11_dcmp) +
  labs(title =
    "Decomposition of Household Goods Retail Turnover using X-11.")

Decomposing this series using x11 does reveal some interesting findings that are not easily seen using autoplot, gg_season, gg_subseries, lag plots or the autocorrelation plot. There seems to be an unusually low peak in the early 2000’s seen in the time series plot. This point also correlates with a spike in the remaining data. What is interesting is that mush of a diffence in the seasonal pattern around this time.

EXERCISE 3.9

a. Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.

In the decomposition that is shown in figure 3.19 we see that the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995 is decomposed to visualize the trend, seasonal patterns and the remainder. There is a clear upward trend where the scale of the trend and time series values are similar as the trend line is a reflection of the global pattern over time. In the seasonal component we see regular peaks and valleys across years indicating strong seasonality. The scale of the seasonal component goes from approx -100 to 100 and using the sub series plot we can draw some conclusions. The months of February to May, September, December, and July (only in the second half of the series) we see that there are people being added to the civilian labor force. Since all other months are below 0, we can conclude that in these months the number of people is decreasing. Unlike the previous these values would need to be subtracted from the trend line to get the values in the actual data. Lastly, in the remainder component we see values that are between 100 and -100, indicating that a fair amount of the variability in the labor data can be explained by random noise.

b. Is the recession of 1991/1992 visible in the estimated components?

The recession of 1991/1992 is visible in the estimated remainder components. Since there are large negative values in the remainder during these years, versus any irregularities in the trend or seasonal component we can interpret this noise as an event other than seasonal fluctuations or the over all trend. In this case the event would be a recession.

DATA 624: Homework 2

Dirk Hartog

2025-02-13