Excercise 3.1:

Consider the GDP information in global_economy.

Plot the GDP per capita for each country over time.
Which country has the highest GDP per capita?
How has this changed over time?

head(global_economy)

## # A tsibble: 6 x 9 [1Y]
## # Key:       Country [1]
##   Country     Code   Year         GDP Growth   CPI Imports Exports Population
##   <fct>       <fct> <dbl>       <dbl>  <dbl> <dbl>   <dbl>   <dbl>      <dbl>
## 1 Afghanistan AFG    1960  537777811.     NA    NA    7.02    4.13    8996351
## 2 Afghanistan AFG    1961  548888896.     NA    NA    8.10    4.45    9166764
## 3 Afghanistan AFG    1962  546666678.     NA    NA    9.35    4.88    9345868
## 4 Afghanistan AFG    1963  751111191.     NA    NA   16.9     9.17    9533954
## 5 Afghanistan AFG    1964  800000044.     NA    NA   18.1     8.89    9731361
## 6 Afghanistan AFG    1965 1006666638.     NA    NA   21.4    11.3     9938414

Plot the GDP per capita for each country over time.

 global_economy %>% 
   autoplot(GDP / Population, show.legend =  FALSE) +
  labs(title= "GDP per capita", y = "USD")

gp2011 <- global_economy %>%
  mutate(GDPpercp = GDP/Population) %>%
  group_by(Country) %>%
  arrange(desc(GDPpercp)) %>%
  filter(Year >= 1990, Code %in% c("MAC","USA", "LUX","SGP","QAT","DEU")) 
head(gp2011,25)

## # A tsibble: 25 x 10 [1Y]
## # Key:       Country [3]
## # Groups:    Country [3]
##    Country  Code   Year     GDP Growth   CPI Imports Exports Population GDPpercp
##    <fct>    <fct> <dbl>   <dbl>  <dbl> <dbl>   <dbl>   <dbl>      <dbl>    <dbl>
##  1 Luxembo… LUX    2014 6.63e10  5.77  109.     174.    208.     556319  119225.
##  2 Luxembo… LUX    2011 6.00e10  2.54  103.     145.    178.     518347  115762.
##  3 Luxembo… LUX    2008 5.58e10 -1.28   97.4    156.    187.     488650  114294.
##  4 Luxembo… LUX    2013 6.17e10  3.65  108.     159.    191.     543360  113625.
##  5 Luxembo… LUX    2012 5.67e10 -0.353 106.     155.    186.     530946  106749.
##  6 Luxembo… LUX    2007 5.09e10  8.35   94.2    150.    183.     479993  106018.
##  7 Luxembo… LUX    2010 5.32e10  4.86  100      142.    175.     506953  104965.
##  8 Luxembo… LUX    2017 6.24e10  2.30  111.     194.    230.     599449  104103.
##  9 Luxembo… LUX    2009 5.14e10 -4.36   97.8    132.    164.     497783  103199.
## 10 Luxembo… LUX    2015 5.78e10  2.86  109.     187.    223.     569604  101447.
## # ℹ 15 more rows

gp2011%>%
  autoplot(GDPpercp)

#summary(gp2017)

library(dplyr)

# Process the dataset
gp1960 <- global_economy %>%
  mutate(GDP_per_capita = GDP / Population) %>%
  filter(Code %in% c("MAC", "USA", "LUX", "SGP", "QAT", "DEU")) %>%
  filter(Year >= as.Date("1960-01-01") & Year <= as.Date("1990-12-31")) %>%
  arrange(Country, Year)

# View the processed data
print(gp1960)

## # A tsibble: 348 x 10 [1Y]
## # Key:       Country [6]
##    Country Code   Year   GDP Growth   CPI Imports Exports Population
##    <fct>   <fct> <dbl> <dbl>  <dbl> <dbl>   <dbl>   <dbl>      <dbl>
##  1 Germany DEU    1960    NA     NA  24.6      NA      NA   72814900
##  2 Germany DEU    1961    NA     NA  25.2      NA      NA   73377632
##  3 Germany DEU    1962    NA     NA  25.9      NA      NA   74025784
##  4 Germany DEU    1963    NA     NA  26.7      NA      NA   74714353
##  5 Germany DEU    1964    NA     NA  27.3      NA      NA   75318337
##  6 Germany DEU    1965    NA     NA  28.2      NA      NA   75963695
##  7 Germany DEU    1966    NA     NA  29.2      NA      NA   76600311
##  8 Germany DEU    1967    NA     NA  29.7      NA      NA   76951336
##  9 Germany DEU    1968    NA     NA  30.2      NA      NA   77294314
## 10 Germany DEU    1969    NA     NA  30.7      NA      NA   77909682
## # ℹ 338 more rows
## # ℹ 1 more variable: GDP_per_capita <dbl>

# Plot GDP per capita over time
ggplot(gp1960, aes(x = Year, y = GDP_per_capita, color = Country)) +
  geom_line() +
  labs(title = "GDP per Capita (1960-1990)", x = "Year", y = "GDP per Capita") +
  theme_minimal()

global_highest <- global_economy %>%
  group_by(Country, GDP, Population) %>%
  summarise(GDP_per_capita = GDP / Population) %>%
  arrange(desc(GDP_per_capita))

global_highest

## # A tsibble: 15,150 x 5 [1Y]
## # Key:       Country, GDP, Population [15,149]
## # Groups:    Country, GDP [11,965]
##    Country               GDP Population  Year GDP_per_capita
##    <fct>               <dbl>      <dbl> <dbl>          <dbl>
##  1 Monaco        7060236168.      38132  2014        185153.
##  2 Monaco        6476490406.      35853  2008        180640.
##  3 Liechtenstein 6657170923.      37127  2014        179308.
##  4 Liechtenstein 6391735894.      36834  2013        173528.
##  5 Monaco        6553372278.      37971  2013        172589.
##  6 Monaco        6468252212.      38499  2016        168011.
##  7 Liechtenstein 6268391521.      37403  2015        167591.
##  8 Monaco        5867916781.      35111  2007        167125.
##  9 Liechtenstein 6214633651.      37666  2016        164993.
## 10 Monaco        6258178995.      38307  2015        163369.
## # ℹ 15,140 more rows

Monaco has the highest GDPPC.

data(global_economy)

library(ggplot2)
global_economy %>%
  filter(Country == "Monaco") %>%
  autoplot(GDP/Population) +
  labs(title= "GDP per capita for Monaco", y = "USD")

Excercise 3.2:

For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

United States GDP from global_economy.
Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.
Victorian Electricity Demand from vic_elec.
Gas production from aus_production.

United States GDP from global_economy

# Step 1: Filter Data for the United States
us_data <- global_economy %>%
  filter(Country == "United States") %>%
  mutate(GDP_per_capita = GDP / Population)
us_data

## # A tsibble: 58 x 10 [1Y]
## # Key:       Country [1]
##    Country       Code   Year         GDP Growth   CPI Imports Exports Population
##    <fct>         <fct> <dbl>       <dbl>  <dbl> <dbl>   <dbl>   <dbl>      <dbl>
##  1 United States USA    1960     5.43e11  NA     13.6    4.20    4.97  180671000
##  2 United States USA    1961     5.63e11   2.30  13.7    4.03    4.90  183691000
##  3 United States USA    1962     6.05e11   6.10  13.9    4.13    4.81  186538000
##  4 United States USA    1963     6.39e11   4.40  14.0    4.09    4.87  189242000
##  5 United States USA    1964     6.86e11   5.80  14.2    4.10    5.10  191889000
##  6 United States USA    1965     7.44e11   6.40  14.4    4.24    4.99  194303000
##  7 United States USA    1966     8.15e11   6.50  14.9    4.55    5.02  196560000
##  8 United States USA    1967     8.62e11   2.50  15.3    4.63    5.05  198712000
##  9 United States USA    1968     9.42e11   4.80  16.0    4.94    5.08  200706000
## 10 United States USA    1969     1.02e12   3.10  16.8    4.95    5.09  202677000
## # ℹ 48 more rows
## # ℹ 1 more variable: GDP_per_capita <dbl>

# Step 2: Convert Data to a Time Series Object
us_tsibble <- us_data %>%
  as_tsibble(index = Year) # Convert to a tsibble

# Step 3: Plot Using autoplot()
autoplot(us_tsibble, GDP_per_capita) +
  labs(title = "US GDP per Capita Over Time", y = "GDP per Capita (USD)") +
  theme_minimal()

Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.

Details: Meat production in Australia for human consumption

aus_livestock is a monthly tsibble with one value:

Count: Number of animals slaughtered. Each series is uniquely identified using two keys:

Animal: The animal slaughtered. State: The Australian state (or territory).

head(aus_livestock)

## # A tsibble: 6 x 4 [1M]
## # Key:       Animal, State [1]
##      Month Animal                     State                        Count
##      <mth> <fct>                      <fct>                        <dbl>
## 1 1976 Jul Bulls, bullocks and steers Australian Capital Territory  2300
## 2 1976 Aug Bulls, bullocks and steers Australian Capital Territory  2100
## 3 1976 Sep Bulls, bullocks and steers Australian Capital Territory  2100
## 4 1976 Oct Bulls, bullocks and steers Australian Capital Territory  1900
## 5 1976 Nov Bulls, bullocks and steers Australian Capital Territory  2100
## 6 1976 Dec Bulls, bullocks and steers Australian Capital Territory  1800

view(aus_livestock)

#data("aus_livestock")
aus_livestock %>%
  filter(Animal == "Bulls, bullocks and steers", State == "Victoria") %>%
  autoplot(Count) +
  labs(title = "Slaughter of Victorian Bulls, bullocks and steers", y = "Number of animals slaughtered")

Victorian Electricity Demand from vic_elec Description: vic_elec is a half-hourly tsibble with three values:

Demand: Total electricity demand in MWh. Temperature: Temperature of Melbourne (BOM site 086071). Holiday: Indicator for if that day is a public holiday.

data(vic_elec)
head(vic_elec)

## # A tsibble: 6 x 5 [30m] <Australia/Melbourne>
##   Time                Demand Temperature Date       Holiday
##   <dttm>               <dbl>       <dbl> <date>     <lgl>  
## 1 2012-01-01 00:00:00  4383.        21.4 2012-01-01 TRUE   
## 2 2012-01-01 00:30:00  4263.        21.0 2012-01-01 TRUE   
## 3 2012-01-01 01:00:00  4049.        20.7 2012-01-01 TRUE   
## 4 2012-01-01 01:30:00  3878.        20.6 2012-01-01 TRUE   
## 5 2012-01-01 02:00:00  4036.        20.4 2012-01-01 TRUE   
## 6 2012-01-01 02:30:00  3866.        20.2 2012-01-01 TRUE

data("vic_elec")

vic_elec_new <- vic_elec %>%
  group_by(Date) %>%
  mutate(Demand = sum(Demand)) %>%
  distinct(Date, Demand)

vic_elec_new %>% 
  as_tsibble(index = Date) %>%
  autoplot(Demand) +
  labs(title= "Daily Victorian Electricity Demand", y = "$US (in trillions)")

is_tsibble(vic_elec)

## [1] TRUE

Gas production from aus_production.

aus_production %>%
  autoplot(Gas)+
  labs("gas production")

#install.packages("latex2exp")
library(latex2exp)
lambda <- aus_production |>
  features(Gas, features = guerrero) |>
  pull(lambda_guerrero)
aus_production |>
  autoplot(box_cox(Gas, lambda)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed gas production with $\\lambda$ = ",
         round(lambda,2))))

Excercise 3.3:

Why is a Box-Cox transformation unhelpful for the canadian_gas data?

Data Description: Monthly Canadian gas production, billions of cubic metres, January 1960 - February 2005

canadian_gas %>%
  autoplot(Volume)+
  labs(title = "Monthly Canadian gas production without transformation")

lambda <- canadian_gas %>%
  features(Volume, features = guerrero) %>%
  pull(lambda_guerrero)

canadian_gas %>%
  autoplot(box_cox(Volume, lambda)) +
  labs(y = "", title = latex2exp::TeX(paste0(
         "Transformed gas production with $\\lambda$ = ",
         round(lambda,2))))

Excercise 3.4:

What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?

(from Exercise 7 in Section 2.10: Monthly Australian retail data is provided in aus_retail. Select one of the time series as follows (but choose your own seed value):)

set.seed(12345678)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

autoplot(myseries,Turnover)+
  labs(title = "Retial Turnover", y = "Turnover")

library(latex2exp)
lambda <- myseries |>
  features(Turnover, features = guerrero) |>
  pull(lambda_guerrero)

myseries |>
  autoplot(box_cox(Turnover, lambda)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed gas production with $\\lambda$ = ",
         round(lambda,2))))

For this seed, the Guerrero feature selected a lambda of 0.08 in order to make the variance stable.

Excercise 3.5:

For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance.

Tobacco from aus_production
Economy class passengers between Melbourne and Sydney from ansett
and Pedestrian counts at Southern Cross Station from pedestrian.

autoplot(aus_production, Tobacco)+
  labs(title = "Tobacco and Cigarette Production")

lambda <- aus_production |>
  features(Tobacco, features = guerrero) |>
  pull(lambda_guerrero)

aus_production |>
  autoplot(box_cox(Tobacco, lambda)) +
  labs(y = "",
       title = latex2exp::TeX(paste0(
         "Transformed gas production with $\\lambda$ = ",
         round(lambda,2))))

Graph didn’t change here compare to previous one and also the lamda is almost 1. So, Box Cox is not usefull in very effective in this dataset.

Economy class passengers between Melbourne and Sydney from ansett

mel_syd <- ansett %>%
  filter(Class == "Economy",
         Airports == "MEL-SYD")

autoplot(mel_syd, Passengers)+
  labs(title = "Economy class Passengers Between Melbourne and Sydney")

lambda <- mel_syd %>%
  features(Passengers, features = guerrero) %>%
  pull(lambda_guerrero)

mel_syd %>%
  autoplot(box_cox(Passengers, lambda)) +
  labs(y = "", title = latex2exp::TeX(paste0("Transformed Number of Passengers with $\\lambda$ = ",
         round(lambda,2))))

Here we can the variation clearly with transformed data with lamda = 2.

Pedestrian counts at Southern Cross Station from pedestrian.

data("pedestrian")

southern_cross <- pedestrian %>%
  filter(Sensor == "Southern Cross Station") 

southern_cross <- southern_cross %>%
  mutate(Week = yearweek(Date)) %>%
  index_by(Week) %>%
  summarise(Count = sum(Count))

autoplot(southern_cross, Count)+
  labs(title = "Weekly Pedestrian Counts at Southern Cross Station")

lambda <- southern_cross %>%
  features(Count, features = guerrero) %>%
  pull(lambda_guerrero)

southern_cross %>%
  autoplot(box_cox(Count, lambda)) +
  labs(y = "", title = latex2exp::TeX(paste0("Transformed Number of Passengers with $\\lambda$ = ",
        round(lambda,2))))

Excercise 3.7:

Consider the last five years of the Gas data from aus_production.

gas <- tail(aus_production, 5*4) |> select(Gas)

Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?
Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.
Do the results support the graphical interpretation from part a?
Compute and plot the seasonally adjusted data.
Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?
Does it make any difference if the outlier is near the end rather than in the middle of the time series?

Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?

gas <- tail(aus_production, 5*4) |> dplyr::select(Gas)
  
# Plot using autoplot from fable
autoplot(gas, Gas) +
  labs(title = "Gas Production Time Series",
       x = "Year",
       y = "Gas Production") +
  theme_minimal()

# Decompose the time series
decomposition <- gas |>
  model(STL(Gas ~ season() + trend()))

# Extract components
components <- components(decomposition)

# Plot components
autoplot(components) +
  labs(title = "Decomposition of Gas Production Time Series",
       y = "Gas Production") +
  theme_minimal()

So, I see here seasonal fluctuations in the above chart.

(Note:Seasonal Fluctuations: Look for regular patterns that repeat at consistent intervals (e.g., monthly or yearly). In the plot, these will appear as repeating cycles.) (Note:Trend-Cycle: Identify the underlying trend and long-term movement in the time series. The trend component will show if there’s a general upward or downward direction over time.)

Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.

gas_decomp <- gas %>%
  model(classical_decomposition(Gas, type = "multiplicative")) 

components(gas_decomp) %>%
  autoplot() +
  labs(title = "Classical Multiplicative Decomposition of Gas Production")

# Inspect the components
print(components)

## # A dable: 20 x 7 [1Q]
## # Key:     .model [1]
## # :        Gas = trend + season_year + remainder
##    .model                Quarter   Gas trend season_year remainder season_adjust
##    <chr>                   <qtr> <dbl> <dbl>       <dbl>     <dbl>         <dbl>
##  1 STL(Gas ~ season() +… 2005 Q3   221  193.        26.9     0.856          194.
##  2 STL(Gas ~ season() +… 2005 Q4   180  197.       -16.7    -0.109          197.
##  3 STL(Gas ~ season() +… 2006 Q1   171  200.       -25.7    -3.59           197.
##  4 STL(Gas ~ season() +… 2006 Q2   224  204.        15.4     4.86           209.
##  5 STL(Gas ~ season() +… 2006 Q3   233  207.        27.0    -1.03           206.
##  6 STL(Gas ~ season() +… 2006 Q4   192  210.       -16.7    -1.33           209.
##  7 STL(Gas ~ season() +… 2007 Q1   187  213.       -25.5    -0.550          213.
##  8 STL(Gas ~ season() +… 2007 Q2   234  216.        15.1     2.60           219.
##  9 STL(Gas ~ season() +… 2007 Q3   245  219.        27.0    -0.730          218.
## 10 STL(Gas ~ season() +… 2007 Q4   205  219.       -16.6     2.55           222.
## 11 STL(Gas ~ season() +… 2008 Q1   194  219.       -25.3     0.562          219.
## 12 STL(Gas ~ season() +… 2008 Q2   229  219.        14.8    -4.59           214.
## 13 STL(Gas ~ season() +… 2008 Q3   249  219.        27.0     2.98           222.
## 14 STL(Gas ~ season() +… 2008 Q4   203  220.       -16.6    -0.834          220.
## 15 STL(Gas ~ season() +… 2009 Q1   196  222.       -25.0    -0.740          221.
## 16 STL(Gas ~ season() +… 2009 Q2   238  223.        14.5     0.341          223.
## 17 STL(Gas ~ season() +… 2009 Q3   252  225.        27.1    -0.132          225.
## 18 STL(Gas ~ season() +… 2009 Q4   210  226.       -16.5     0.986          227.
## 19 STL(Gas ~ season() +… 2010 Q1   205  226.       -24.8     4.25           230.
## 20 STL(Gas ~ season() +… 2010 Q2   236  225.        14.2    -3.62           222.

Do the results support the graphical interpretation from part a? Yes the results do support graph a as we still see the same cycle pattern.

Compute and plot the seasonally adjusted data.

components(gas_decomp) %>%
  as_tsibble() %>%
  autoplot(Gas) +
  geom_line(aes(y=season_adjust), linetype = "dashed") +
  labs(title = "Gas Production (Seasonally Adjusted)")

Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?

gas %>%
  mutate(Gas = ifelse(Gas == 249, Gas + 300, Gas)) %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components() %>%
  as_tsibble() %>%
  autoplot(Gas) +
  geom_line(aes(y=season_adjust), linetype = "dashed") +
  labs(title = "Gas Production (Seasonally Adjusted + outlier)")

Does it make any difference if the outlier is near the end rather than in the middle of the time series?

gas %>%
  mutate(Gas = ifelse(Gas == 236, Gas + 300, Gas)) %>%
  model(classical_decomposition(Gas, type = "multiplicative")) %>%
  components() %>%
  as_tsibble() %>%
  autoplot(Gas) +
  geom_line(aes(y=season_adjust), linetype = "dashed") +
  labs(title = "Gas Production (Seasonally Adjusted + outlier)")

The placement of the outlier still caused the data to have a spike, one thing to note is that the seasonally adjusted data’s spike has increased compared to when the outlier was in the middle.

Excercise 3.8:

Recall your retail time series data (from Exercise 7 in Section 2.10).Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

#install.packages("seasonal")
library(seasonal)

Using the X-11 decomposition method, we see the that line is less smooth. This method captured seasonality better and is displaying more irregularities than previous one.

x11_decomp <- myseries %>%
  model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
  components()

autoplot(x11_decomp) +
  labs(title = "Decomposition of Retail Turnover (X-11)")

Excercise 3.9:

Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

knitr::include_graphics("figure 3.19.png")

knitr::include_graphics("figure 3.20.png")

Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.

We can say that increase in the number of persons in the civilian labor force in Australia.

Seasonality is also present despite its scale being of a lesser degree to that of the remainder component, indicating that seasonality is not as significant. They do share similarities as in both graphs we see dips in the early 1990s (which was due to the recession).

Is the recession of 1991/1992 visible in the estimated components? Yes, the recession of 1991/1992 is visible in the estimated components and we can see Incredible dips in the rest of the area.

Data 624 - HW2

Deepa Sharma

2024-09-13