DATA 624: Homework 2

Do exercises 3.1, 3.2, 3.3, 3.4, 3.5, 3.7, 3.8 and 3.9 from the online Hyndman book. Please include your RPubs link along with the .pdf file of your run code.

3.1 Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

For my plot of GDP per capita for each country over time, I removed the legend, since there were too many countries and it prevented me from seeing the graph. I also included code that sorts the data by GDP per capita. It shows that Monaco has the highest GDP per capita, followed by Liechtenstein. From the plot, Monaco's GDP per capita increases over time with an overall upward trend, interrupted by some periods of decline. The more prominent declines appear around the mid-1980s and the early 2000s. Liechtenstein follows close behind, especially in 2013 and 2014, even though its GDP per capita trailed quite a bit in all the years prior.

library(fpp3)

ge <- global_economy

global_economy %>%
  autoplot(GDP/Population, show.legend = FALSE) +
  labs(title= "GDP per capita", y = "$US")

## Warning: Removed 3242 rows containing missing values or values outside the scale range
## (`geom_line()`).

ge3 <- global_economy %>%
  mutate(gdp_pop = GDP/Population) %>%
  arrange(desc(gdp_pop))

## Warning: Current temporal ordering may yield unexpected results.
## ℹ Suggest to sort by `Country`, `Year` first.

ge3

## # A tsibble: 15,150 x 10 [1Y]
## # Key:       Country [263]
##    Country    Code   Year    GDP Growth   CPI Imports Exports Population gdp_pop
##    <fct>      <fct> <dbl>  <dbl>  <dbl> <dbl>   <dbl>   <dbl>      <dbl>   <dbl>
##  1 Monaco     MCO    2014 7.06e9  7.18     NA      NA      NA      38132 185153.
##  2 Monaco     MCO    2008 6.48e9  0.732    NA      NA      NA      35853 180640.
##  3 Liechtens… LIE    2014 6.66e9 NA        NA      NA      NA      37127 179308.
##  4 Liechtens… LIE    2013 6.39e9 NA        NA      NA      NA      36834 173528.
##  5 Monaco     MCO    2013 6.55e9  9.57     NA      NA      NA      37971 172589.
##  6 Monaco     MCO    2016 6.47e9  3.21     NA      NA      NA      38499 168011.
##  7 Liechtens… LIE    2015 6.27e9 NA        NA      NA      NA      37403 167591.
##  8 Monaco     MCO    2007 5.87e9 14.4      NA      NA      NA      35111 167125.
##  9 Liechtens… LIE    2016 6.21e9 NA        NA      NA      NA      37666 164993.
## 10 Monaco     MCO    2015 6.26e9  4.94     NA      NA      NA      38307 163369.
## # ℹ 15,140 more rows
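Because `ge3` ranks individual country-years, it gives the single highest value but not who led in each year. One way to answer "how has this changed over time?" is to take the maximum within each year. The base-R sketch below uses only the Monaco and Liechtenstein values printed above (2013-2016), so it compares just these two countries. With the full tsibble, the same idea could be written with dplyr, for example `as_tibble()`, `group_by(Year)` and `slice_max(gdp_pop, n = 1)`.

```r
# GDP per capita (US$) copied from the ge3 output above, 2013-2016 only
gdp_pc <- data.frame(
  Country = rep(c("Monaco", "Liechtenstein"), each = 4),
  Year    = rep(2013:2016, times = 2),
  gdp_pop = c(172589, 185153, 163369, 168011,   # Monaco 2013-2016
              173528, 179308, 167591, 164993),  # Liechtenstein 2013-2016
  stringsAsFactors = FALSE
)

# For each year, keep the row with the largest GDP per capita
by_year <- split(gdp_pc, gdp_pc$Year)
leaders <- do.call(rbind, lapply(by_year, function(d) d[which.max(d$gdp_pop), ]))
leaders[, c("Year", "Country", "gdp_pop")]
```

On these printed values the leader alternates between the two countries: Liechtenstein in 2013 and 2015, Monaco in 2014 and 2016.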

global_economy %>%
  filter(Country == "Monaco") %>%
  autoplot(GDP/Population) +
  labs(title= "GDP per capita for Monaco", y = "$USD")

## Warning: Removed 11 rows containing missing values or values outside the scale range
## (`geom_line()`).

3.2 For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

United States GDP from global_economy.

Below is a plot of United States GDP from global_economy. A mathematical transformation did not seem appropriate here, since the size of the variation does not change much over the series. However, I did make a population adjustment by computing GDP per capita: population size affects GDP, so GDP per capita tells us more about the economic situation. The overall shape of the plot does not change much, only the scale of the y-axis, which is expected once we divide by population. The adjusted plot also has an overall upward trend, and both plots show the dip in US GDP around the recession of 2009.

global_economy %>%
  filter(Country == "United States") %>%
  autoplot(GDP)

# Transformed to GDP per capita

global_economy %>%
  filter(Country == "United States") %>%
  autoplot(GDP/Population) +
  labs(title= "GDP per capita", y = "$USD")

Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.

Below is a plot of the number of bulls, bullocks and steers slaughtered in Victoria, Australia, from 1972 to 2018. I tried transformations because there is some variation in the series, although its size is quite similar throughout, so I wanted to see whether a transformation would help. The original plot, the log-transformed plot, and the cube-root-transformed plot are all very similar, so these transformations did little to change the variation. Looking at the trend, we see some visible cyclicity. The series starts off quite high in the 1970s and drops drastically toward the 1980s. The rest of the series fluctuates, with an overall decreasing trend and bouts of small increases, especially from 1990 to 2000 and in the mid-2010s.

aus <- aus_livestock %>%
  filter(Animal == "Bulls, bullocks and steers", State == "Victoria") %>%
  summarise(Count = sum(Count))

autoplot(aus, Count) #Original plot

aus %>% autoplot(log(Count)) # log transformation

aus %>% autoplot(Count^(1/3)) # cube-root transformation
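`Count^(1/3)` is a cube root rather than a cube. Both it and the log belong to the Box-Cox family, which is one reason the plots look alike apart from the y-axis. A minimal base-R sketch of the Box-Cox formula, assuming the form given in the textbook's Section 3.1 (`box_cox_base` is a hypothetical helper name; in the fpp3 packages the corresponding function is `box_cox()`):

```r
# Box-Cox transformation, assuming the textbook's form:
#   log(y)                                if lambda == 0
#   (sign(y) * |y|^lambda - 1) / lambda   otherwise
box_cox_base <- function(y, lambda) {
  if (lambda == 0) {
    log(y)
  } else {
    (sign(y) * abs(y)^lambda - 1) / lambda
  }
}

y <- c(1, 8, 27, 64)
box_cox_base(y, 0)      # same values as log(y)
box_cox_base(y, 1/3)    # 3 * (cube root of y - 1): a shifted, rescaled cube root
```

For any lambda other than 0 the transform is an affine rescaling of y^lambda, so `autoplot(Count^(1/3))` and `autoplot(box_cox(Count, 1/3))` would have the same shape and differ only in the axis values.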

Victorian Electricity Demand from vic_elec.

Below is a plot of electricity demand in Victoria, Australia, from vic_elec. A transformation seemed worth trying, although with half-hourly data it is hard to tell whether the variation changes or whether the density of the data is simply hiding the patterns. The original, log-transformed, and cube-root-transformed plots are all quite similar here as well. Since the half-hourly data was not helpful to look at, and short-term demand is influenced by other factors such as the time of day, I aggregated demand to monthly totals. The monthly autoplot clearly shows some seasonality, and gg_season shows exactly where the seasonal pattern lies: demand from April to September is much higher than demand from September to December, and January is also fairly high. The seasonality in electricity demand is most likely tied to weather changes throughout the year in Victoria.

vic_elec %>%
  autoplot(Demand)

vic_elec %>% autoplot(log(Demand)) # log transformation

vic_elec %>% autoplot(Demand^(1/3)) # cube-root transformation

monthly_vic_elec <- vic_elec %>%
  index_by(month = ~ yearmonth(.)) %>% 
  summarize(demand_bymonth = sum(Demand))

monthly_vic_elec  %>%
  autoplot(demand_bymonth)

monthly_vic_elec  %>%
  gg_season(demand_bymonth)

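The `index_by(month = ~ yearmonth(.))` step collapses the half-hourly observations into calendar months before summing. The same idea in base R, as a sketch on hypothetical timestamps with a placeholder demand of 1 per half-hour (not the real vic_elec values), so that each monthly total simply counts the half-hours in that month:

```r
# Hypothetical half-hourly timestamps covering January-February 2012 (UTC),
# with a placeholder demand of 1 per interval so the monthly sums are easy to check
time <- seq(as.POSIXct("2012-01-01 00:00", tz = "UTC"),
            as.POSIXct("2012-02-29 23:30", tz = "UTC"),
            by = "30 min")
demand <- rep(1, length(time))

# Label each half-hour with its calendar month, then sum within each month
month <- format(time, "%Y-%m")
monthly_demand <- tapply(demand, month, sum)
monthly_demand
```

January has 31 x 48 half-hours and February 2012 (a leap year) has 29 x 48. With real data, the time zone of the timestamps decides which month the half-hours around a month boundary fall into; this sketch assumes UTC.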

Gas production from aus_production.

Below is a plot of quarterly gas production in Australia and a Box-Cox transformed plot with lambda = 0.1095, chosen by the Guerrero method. I followed the textbook's Box-Cox example because the data is somewhat skewed and the size of the variation changes throughout the series. The transformation makes the seasonal pattern easier to see and stabilizes the variation. Both plots show an upward trend, but the transformed plot shows more clearly where the drastic increase occurred, sometime in the 1970s.

aus_production %>% 
  autoplot(Gas)

lambda <- aus_production |>
  features(Gas, features = guerrero) |>
  pull(lambda_guerrero)
lambda

## [1] 0.1095171

aus_production |>
  autoplot(box_cox(Gas, lambda)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed gas production with $\\lambda$ = ",
         round(lambda,2))))
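The lambda above comes from the Guerrero method via `features(Gas, features = guerrero)`. A simplified base-R sketch of the idea: split the series into consecutive chunks of one seasonal period and choose the lambda that makes sd / mean^(1 - lambda) as constant as possible across chunks. The search interval of -1 to 2, the handling of incomplete chunks, and the use of base R's quarterly `UKgas` series (UK gas consumption, not the Australian data) are my assumptions; feasts' own `guerrero()` may differ in such details, so this function should not be expected to reproduce its lambda exactly.

```r
# Simplified sketch of Guerrero's method for choosing a Box-Cox lambda
guerrero_lambda <- function(x, period, lower = -1, upper = 2) {
  x <- as.numeric(x)
  n_chunks <- length(x) %/% period
  # One column per complete seasonal chunk (any trailing partial chunk is dropped)
  chunks <- matrix(x[seq_len(n_chunks * period)], nrow = period)
  mu <- colMeans(chunks)
  sigma <- apply(chunks, 2, sd)
  # Coefficient of variation of sd / mean^(1 - lambda) across chunks
  cv_of_ratio <- function(lambda) {
    ratio <- sigma / mu^(1 - lambda)
    sd(ratio) / mean(ratio)
  }
  optimize(cv_of_ratio, lower = lower, upper = upper)$minimum
}

# Built-in quarterly UK gas consumption series from base R's datasets package
lambda_ukgas <- guerrero_lambda(UKgas, period = 4)
lambda_ukgas
```

As a sanity check, a series whose within-year spread is exactly proportional to its level should give a lambda near 0, i.e. a log transformation.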

3.3 Why is a Box-Cox transformation unhelpful for the canadian_gas data?

A Box-Cox transformation is unhelpful for the canadian_gas data because the transformed plot (with the Guerrero lambda of about 0.58) looks much like the original. Box-Cox transformations are meant to make the size of the variation more uniform across the series, and here the transformation does not do that. A Box-Cox transformation can also make skewed data more symmetric, so it is possible that the data was not very skewed to begin with, given the relatively small range of the Volume values.

cgas <- canadian_gas
canadian_gas %>% autoplot(Volume)

lambda2 <- canadian_gas %>%
  features(Volume, features = guerrero) |>
  pull(lambda_guerrero)
lambda2

## [1] 0.5767648

canadian_gas %>%
  autoplot(box_cox(Volume, lambda2))

3.4 What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?

For my retail data from Exercise 7 in Section 2.10, I would select a Box-Cox transformation with lambda chosen by the Guerrero method, which gave a lambda of 0.1119. This transformation makes the size of the seasonal variation similar throughout the series, in a very uniform way.

set.seed(18029)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

myseries %>% autoplot(Turnover)

lambda3 <- myseries %>%
  features(Turnover, features = guerrero) |>
  pull(lambda_guerrero)

myseries %>%
  autoplot(box_cox(Turnover, lambda3))

3.5 For the following series, find an appropriate Box-Cox transformation in order to stabilize the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.

- Tobacco from aus_production: lambda = 0.9264636
- Economy class passengers between Melbourne and Sydney from ansett: lambda = 1.999927
- Pedestrian counts at Southern Cross Station from pedestrian: lambda = 0.132791 (I changed the hourly data to monthly totals to help see the trend better.)
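As a reminder of what these values mean, here is the Box-Cox family in the textbook's form, together with the two special cases closest to the lambdas above. Since the Guerrero method looks for the lambda that keeps sd / mean^(1 - lambda) roughly constant across seasonal chunks, a lambda near 1 suggests the spread barely depends on the level, while a lambda near 2 suggests the spread shrinks as the level rises.

```latex
w_t =
\begin{cases}
  \log(y_t), & \lambda = 0,\\[4pt]
  \dfrac{\operatorname{sign}(y_t)\,\lvert y_t\rvert^{\lambda} - 1}{\lambda}, & \lambda \neq 0,
\end{cases}
\qquad\text{so}\qquad
\lambda = 1:\ w_t = y_t - 1,
\qquad
\lambda = 2:\ w_t = \frac{y_t^{2} - 1}{2}\ \ (y_t \ge 0).
```

So the Tobacco lambda of about 0.93 amounts to almost no transformation, and the ansett lambda of about 2 is close to squaring the data.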

# Tobacco from aus_production
aus_production %>%
  autoplot(Tobacco)

## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).

lambda4 <- aus_production %>%
  features(Tobacco,features = guerrero) %>%
  pull(lambda_guerrero)
lambda4

## [1] 0.9264636

aus_production %>%
  autoplot(box_cox(Tobacco,lambda4))

## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).

# Economy class passengers between Melbourne and Sydney from ansett
ansett %>%
  filter(Airports == "MEL-SYD" , Class == "Economy") %>%
  autoplot(Passengers)

lambda5 <- ansett %>%
  filter(Airports == "MEL-SYD" & Class == "Economy") %>%
  features(Passengers,features = guerrero) %>%
  pull(lambda_guerrero)

ansett %>%
  filter(Airports == "MEL-SYD" & Class == "Economy") %>%
  autoplot(box_cox(Passengers,lambda5))

lambda5

## [1] 1.999927

# Pedestrian counts at Southern Cross Station from pedestrian
pedestrian %>%
  filter(Sensor == "Southern Cross Station") %>%
  autoplot(Count)

pedestrian_month <- pedestrian %>%
  filter(Sensor == "Southern Cross Station") %>%
  index_by(month = ~ yearmonth(.)) %>% 
  summarize(ped_month = sum(Count))

lambda6 <- pedestrian_month %>%
  features(ped_month,features = guerrero) %>%
  pull(lambda_guerrero)
lambda6

## [1] 0.132791

pedestrian_month %>%
  autoplot(box_cox(ped_month,lambda6))

3.7 Consider the last five years of the Gas data from aus_production.

gas <- tail(aus_production, 5*4) |> select(Gas)

a. Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?

From the plots of gas production in Australia, we see both prominent seasonal fluctuations and a trend-cycle. Overall, there is a slight upward trend. For seasonality, gas production is lower in Q1 and Q4 and higher in Q2 and Q3, in every year of this data. The seasonal fluctuations also appear to be of approximately the same magnitude every year from 2006 to 2010.

gas %>%
  autoplot(Gas)

gg_season(gas, Gas)

b. Use classical_decomposition with type= multiplicative to calculate the trend-cycle and seasonal indices.

aus_decomp <- gas %>%
  model(
  classical_decomposition(Gas, type = "multiplicative")) %>%
  components()

aus_decomp %>%
  autoplot() +
  labs(title = "Classical multiplicative decomposition of total Australian Gas Production")

## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).
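Assuming `classical_decomposition()` follows the textbook's classical algorithm (a centred 2 x 4 moving average for the trend-cycle, then quarter-by-quarter averages of the detrended ratios), the multiplicative steps can be written out by hand. This base-R sketch uses the built-in quarterly `UKgas` series so that it runs without the fpp3 packages, and checks the result against `stats::decompose()`, which implements the same classical method.

```r
# Classical multiplicative decomposition by hand, on base R's quarterly UKgas series
x <- UKgas
m <- frequency(x)   # 4 quarters per year

# 1. Trend-cycle: centred 2 x 4 moving average (weights 1/8, 1/4, 1/4, 1/4, 1/8)
trend <- stats::filter(x, c(0.5, rep(1, m - 1), 0.5) / m, sides = 2)

# 2. Detrend by dividing, then average the ratios for each quarter
ratio <- x / trend
seasonal_index <- tapply(as.numeric(ratio), as.integer(cycle(x)), mean, na.rm = TRUE)

# 3. Rescale so the seasonal indices average to 1
seasonal_index <- as.numeric(seasonal_index / mean(seasonal_index))

# 4. Seasonally adjusted series and remainder
seasonal <- seasonal_index[as.integer(cycle(x))]
season_adjust <- x / seasonal
remainder <- x / (trend * seasonal)

# Compare with base R's own classical decomposition
dec <- decompose(x, type = "multiplicative")
all.equal(seasonal_index, dec$figure)
```

Because the seasonal indices are averaged over all years, the classical seasonal component repeats exactly the same pattern every year.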

c. Do the results support the graphical interpretation from part a?

The results do support the graphical interpretation from part a. The trend-cycle component of the classical multiplicative decomposition rises over the period, and the seasonal component repeats the same pattern each year (lower in Q1 and Q4, higher in Q2 and Q3), matching the seasonal fluctuations we saw in the time plot and the seasonal plot.

d. Compute and plot the seasonally adjusted data.

aus_decomp %>%
  ggplot(aes(x = Quarter)) +
  geom_line(aes(y = Gas, colour = "Data")) +
  geom_line(aes(y = season_adjust,
                colour = "Seasonally Adjusted")) +
  geom_line(aes(y = trend, colour = "Trend")) +
  labs(y = "Gas Production (in petajoules)",
       title = "Quarterly production of Gas in Australia.") +
  scale_colour_manual(
    values = c("gray", "#0072B2", "#D55E00"),
    breaks = c("Data", "Seasonally Adjusted", "Trend")
  )

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).

e. Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?

I changed observation 8 from 234 to 534 and then re-plotted the classical multiplicative decomposition and the seasonally adjusted data. This one outlier affects the plots quite a bit, mostly around the middle of the series where the outlier was placed. The overall increasing trend disappears: the trend now starts quite low, rises around the time of the outlier, then decreases and levels off with little change afterward. The seasonality is still present but differs from the original data. The seasonally adjusted plot shows much more variation with the outlier than without it.

outlier_beg <- gas
outlier_beg$Gas[8] <- outlier_beg$Gas[8] + 300

outlier_beg <- outlier_beg %>%
  model(
  classical_decomposition(Gas, type = "multiplicative")) %>%
  components()

outlier_beg %>%
  autoplot() +
  labs(title = "Classical multiplicative decomposition of total Australian Gas Production")

## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).

outlier_beg %>%
  ggplot(aes(x = Quarter)) +
  geom_line(aes(y = Gas, colour = "Data")) +
  geom_line(aes(y = season_adjust,
                colour = "Seasonally Adjusted")) +
  geom_line(aes(y = trend, colour = "Trend")) +
  labs(y = "Gas Production (in petajoules)",
       title = "Quarterly production of Gas in Australia.") +
  scale_colour_manual(
    values = c("gray", "#0072B2", "#D55E00"),
    breaks = c("Data", "Seasonally Adjusted", "Trend")
  )

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).
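Part of why the outlier's effect on the trend is concentrated around where it was placed is that the centred 2 x 4 moving average at time t only uses observations t-2 to t+2. A sketch using base R's `decompose()` on the built-in `UKgas` series as a stand-in for the gas data (the outlier position k = 54 is an arbitrary mid-series choice) shows that only five trend values move, while every seasonal index shifts because each one averages over all years.

```r
# Effect of one mid-series outlier on a classical multiplicative decomposition
x <- UKgas
k <- 54                          # arbitrary mid-series position for the outlier
x_out <- x
x_out[k] <- x_out[k] + 300

dec     <- decompose(x,     type = "multiplicative")
dec_out <- decompose(x_out, type = "multiplicative")

# Trend values that changed: only those whose moving-average window contains k
trend_diff <- as.numeric(dec_out$trend) - as.numeric(dec$trend)
changed <- which(!is.na(trend_diff) & trend_diff != 0)
changed

# The seasonal indices are averages over every year, so all of them shift
round(dec_out$figure - dec$figure, 4)
```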

f. Does it make any difference if the outlier is near the end rather than in the middle of the time series?

For this question, I changed observation 19 from 205 to 505. Placing the outlier near the end rather than in the middle of the time series does make a difference. With the outlier at the end, the trend is mostly flat throughout the series and then increases quite drastically at the end. Part of this comes from the centred moving average used for the trend: it cannot be computed for the last two quarters, so an outlier there enters only the last few trend estimates. The seasonality is still present, but it differs from both the original data and the middle-outlier decomposition. The seasonally adjusted plot shows smaller variation toward the beginning and middle of the series and much larger variation toward the end, where the outlier is.

outlier_end <- gas
outlier_end$Gas[19] <- outlier_end$Gas[19] + 300

outlier_end <- outlier_end %>%
  model(
  classical_decomposition(Gas, type = "multiplicative")) %>%
  components() 

outlier_end %>%
  autoplot() +
  labs(title = "Classical multiplicative decomposition of total Australian Gas Production")

## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).

outlier_end %>%
  ggplot(aes(x = Quarter)) +
  geom_line(aes(y = Gas, colour = "Data")) +
  geom_line(aes(y = season_adjust,
                colour = "Seasonally Adjusted")) +
  geom_line(aes(y = trend, colour = "Trend")) +
  labs(y = "Gas Production (in petajoules)",
       title = "Quarterly production of Gas in Australia.") +
  scale_colour_manual(
    values = c("gray", "#0072B2", "#D55E00"),
    breaks = c("Data", "Seasonally Adjusted", "Trend")
  )

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).
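Near the end of a series, the centred moving average cannot be computed for the last two quarters, so an outlier there enters fewer trend estimates than one in the middle. The same base-R stand-in as a sketch, with the outlier placed in the second-to-last quarter (like observation 19 of 20 above):

```r
# Effect of an outlier in the second-to-last quarter, on base R's UKgas series
x <- UKgas
n <- length(x)
k <- n - 1
x_out <- x
x_out[k] <- x_out[k] + 300

dec     <- decompose(x,     type = "multiplicative")
dec_out <- decompose(x_out, type = "multiplicative")

trend_diff <- as.numeric(dec_out$trend) - as.numeric(dec$trend)
changed <- which(!is.na(trend_diff) & trend_diff != 0)
changed                  # positions of the trend values that moved
sum(is.na(dec$trend))    # first two and last two trend values are undefined
```

Only the two trend values just before the undefined tail change, which is consistent with the trend in part f jumping right at the end rather than bending in the middle.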

3.8 Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

The X-11 decomposition of myseries does reveal outliers and unusual features. The irregular component shows a sharp downward spike in the late 1980s and a large upward spike in the mid-1990s, both of which are difficult to see in a regular time plot of myseries. The trend component, which rises over most of the period, does not reveal these details, and neither does the seasonal component.

myseries %>%
  autoplot(Turnover)

x11_dcmp <- myseries %>%
  model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
  components() 

autoplot(x11_dcmp) +
  labs(title =
    "Decomposition of Australian retail turnover using X-11")
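The irregular spikes described above can also be flagged numerically. The sketch below swaps in base R's STL decomposition (`stl()`) for X-11, since X-13ARIMA-SEATS needs the seasonal package, and runs on the built-in `UKgas` series rather than `myseries`. The log transform and the 3 x IQR threshold are arbitrary choices for illustration, not part of the X-11 method.

```r
# Flag unusually large remainder values from an STL decomposition (stand-in for X-11)
fit <- stl(log(UKgas), s.window = "periodic")
remainder <- fit$time.series[, "remainder"]

# Flag points more than 3 interquartile ranges beyond the quartiles
q <- quantile(remainder, c(0.25, 0.75))
iqr <- unname(q[2] - q[1])
flagged <- which(remainder < q[1] - 3 * iqr | remainder > q[2] + 3 * iqr)

# Time points (in decimal years) of any flagged observations
time(UKgas)[flagged]
```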

3.9 Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

a. Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.

The decomposition shows an overall upward trend over time in the number of persons in the civilian labour force; in general, the labour force in Australia grew from February 1978 to August 1995. The seasonal component shows that the labour force rises at certain times of the year, which could be due to hiring patterns; these increases appear to occur in March, September, and December. For outliers or unusual features, the remainder component shows an outlier in the early 1990s that is not obvious in the other components. This is partly a matter of scale: the remainder component is on the order of hundreds, while the value and trend components are in the thousands.

b. Is the recession of 1991/1992 visible in the estimated components?

The recession of 1991/1992 is visible in the estimated components. The remainder component shows what looks like a fairly large drop in the number of persons in the civilian labour force around this time, which is also reflected in the value component. However, the drop does not look as drastic in the value component because its y-axis covers a much larger range. The decrease in the labour force could be due to lay-offs during the recession.


Nakesha Fray

2025-02-16