Data 624 - HW2

1) Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

Qatar has the highest average GDP per capita in the data set. We can see via the second plot that Qatar’s GDP per capita was not very high earlier on in the dataset but sky rocketed around the year 2000.

autoplot(global_economy, GDP/Population) +
  guides(colour = FALSE)+
  labs(title = "Country GDP per Capita over time",
       y = "GDP per capita in $USD")

## Warning: Removed 3242 row(s) containing missing values (geom_path).

global_economy$gdp_cap<-round(global_economy$GDP/global_economy$Population,2)
global_economy_drop<-drop_na(global_economy)
avg_gdp<-aggregate(global_economy_drop$gdp_cap, list(global_economy_drop$Country), FUN=mean) 
avg_gdp<-avg_gdp[order(-avg_gdp$x),]
head(avg_gdp)

##                  Group.1        x
## 133                Qatar 62650.62
## 156          Switzerland 48994.33
## 97            Luxembourg 42046.82
## 168 United Arab Emirates 40469.09
## 72               Iceland 34919.83
## 98      Macao SAR, China 34561.99

autoplot(global_economy %>% 
           filter(Country == "Qatar") , GDP/Population) +
  guides(colour = FALSE)+
  labs(title = "QATAR GDP per Capita over time",
       y = "GDP per capita in $USD")

## Warning: Removed 10 row(s) containing missing values (geom_path).

2) For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

United States GDP from global_economy.

For US GDP I transformed it with a population adjustment which adjusted to a plot on GDP per cpaita rather than just GDP.I also used the CPI in the dataset to adjust for inflation. We can see dips in the 1980’s where we do not see them on the standard plot which indicates these were likely times of hihg inflation.

autoplot(global_economy %>% 
           filter(Country == "United States") , GDP) +
  guides(colour = FALSE)+
  labs(title = "US GDP over time",
       y = "GDP in $USD")

autoplot(global_economy %>% 
           filter(Country == "United States") , ((GDP/CPI)*100)/Population) +
  guides(colour = FALSE)+
  labs(title = "US GDP per cpaita adjusted for inflation over time",
       y = "GDP per capita adjusted for inflation")

Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.

I tried transforming with a box-cox transformation as discussed in the book to deal wiht the seasonality displayed in the original graph but it did not seem to help much.I suppose this did not help because the variance was relatively consistent over time.

aus_livestock_bbr<-aus_livestock%>% 
           filter(Animal == "Bulls, bullocks and steers",State=="Victoria") 

autoplot(aus_livestock_bbr , Count) +
  labs(title = "Count of Bulls, bullocks and steers slaughtered over time",
       y = "Count of Animals")

lambda <- aus_livestock_bbr %>%
  features(Count, features = guerrero) %>%
  pull(lambda_guerrero)


aus_livestock_bbr %>%
  autoplot(box_cox(Count, lambda)) +
   labs(title = "Transformed Count of Bulls, bullocks and steers slaughtered with Lambda=-0.7")

Victorian Electricity Demand from vic_elec.

I transformed this one by making it demand of MW per degree of Temperature. This helped control for seasonality issues and heat spikes and showed where demand for electricity seams to peak agnostic of temperature (about mid year). It also shows an upword trend of these mid-year peaks.

autoplot(vic_elec , Demand) +
  labs(title = "Victorian Electricity Demand over half hour time",
       y = "Total Electricity Demand in MW")

autoplot(vic_elec , Demand/Temperature) +
  labs(title = "Victorian Electricity Demand over half hour time",
       y = "Total Electricity Demand in MW per degree of Temperature")

Gas production from aus_production.

Since the variance of the series seemed to increase over time I used a box-cox transformation to smooth out this variance taken form the example in the text.

autoplot(aus_production , Gas) +
  labs(title = "Australian Gas production",
       y = "Gas Production in petjoules")

lambda <- aus_production %>%
  features(Gas, features = guerrero) %>%
  pull(lambda_guerrero)
aus_production %>%
  autoplot(box_cox(Gas, lambda))  +
  labs(title = "Australian Gas production Box-Cox Tranformed with lambda = .12")

3) Why is a Box-Cox transformation unhelpful for the canadian_gas data?

Since the seasonal variance does not increase or decrease consistently (it increases and then decreases) the Box-Cox transformation will not be effective.

autoplot(canadian_gas , Volume)

lambda <- canadian_gas %>%
  features(Volume, features = guerrero) %>%
  pull(lambda_guerrero)
canadian_gas %>%
  autoplot(box_cox(Volume, lambda))

4) What Box-Cox transformation would you select for your retail data (from Exercise 8 in Section 2.10)?

Using the Guerrero feature establishes a lambda=0.0845 as the optimal value of lambda to smooth the seasonality of my chosen series from the aus_retail dataset.

set.seed(19865)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

lambda <- myseries %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)
myseries %>%
  autoplot(box_cox(Turnover, lambda))  +
  labs(title = "Australian Gas production Box-Cox Tranformed with lambda = .0845")

lambda

## [1] 0.08450315

5) For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.

For each of these I use the guerrero method for selecting the appropriate lambda for a box-cox transformation.

lambda <- aus_production %>%
  features(Tobacco, features = guerrero) %>%
  pull(lambda_guerrero)

aus_production %>%
  autoplot(box_cox(Tobacco, lambda))  +
  labs(title = paste("Australian Tobacco production Box-Cox Tranformed with lambda =",round(lambda,4)))

## Warning: Removed 24 row(s) containing missing values (geom_path).

mel_syd_ansett<-ansett%>% 
           filter(Airports == "MEL-SYD",Class=="Economy") 

lambda <- mel_syd_ansett %>%
  features(Passengers, features = guerrero) %>%
  pull(lambda_guerrero)

mel_syd_ansett %>%
  autoplot(box_cox(Passengers, lambda))  +
  labs(title = paste("Number of Passengers from Melbourne to Sydney \n Box-Cox Tranformed with lambda =",round(lambda,4)))

pedestrian_SCS<-pedestrian%>%filter(Sensor=="Southern Cross Station")

lambda <- pedestrian_SCS %>%
  features(Count, features = guerrero) %>%
  pull(lambda_guerrero)

pedestrian_SCS %>%
  autoplot(box_cox(Count, lambda))  +
  labs(title = paste("Pedestrian Count of Southern Cross Station \n Box-Cox Tranformed with lambda =",round(lambda,4)))

7) Consider the last five years of the Gas data from aus_production.

gas <- tail(aus_production, 5*4) %>% select(Gas)

a) Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?

There is a cycle of the prices dropping at the beginning of each quarter before sharply rising as the quarter progresses. There is a slight trend upwards overall.

autoplot(gas)

## Plot variable not specified, automatically selected `.vars = Gas`

b) Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.

Via the output below we can see both the trend and seasonal indices in data form or in graph form.

gas %>%
  model(
    classical_decomposition(Gas, type = "multiplicative")
  ) %>%
  components()

## # A dable: 20 x 7 [1Q]
## # Key:     .model [1]
## # :        Gas = trend * seasonal * random
##    .model                      Quarter   Gas trend seasonal random season_adjust
##    <chr>                         <qtr> <dbl> <dbl>    <dbl>  <dbl>         <dbl>
##  1 "classical_decomposition(G… 2005 Q3   221   NA     1.13  NA              196.
##  2 "classical_decomposition(G… 2005 Q4   180   NA     0.925 NA              195.
##  3 "classical_decomposition(G… 2006 Q1   171  200.    0.875  0.974          195.
##  4 "classical_decomposition(G… 2006 Q2   224  204.    1.07   1.02           209.
##  5 "classical_decomposition(G… 2006 Q3   233  207     1.13   1.00           207.
##  6 "classical_decomposition(G… 2006 Q4   192  210.    0.925  0.987          208.
##  7 "classical_decomposition(G… 2007 Q1   187  213     0.875  1.00           214.
##  8 "classical_decomposition(G… 2007 Q2   234  216.    1.07   1.01           218.
##  9 "classical_decomposition(G… 2007 Q3   245  219.    1.13   0.996          218.
## 10 "classical_decomposition(G… 2007 Q4   205  219.    0.925  1.01           222.
## 11 "classical_decomposition(G… 2008 Q1   194  219.    0.875  1.01           222.
## 12 "classical_decomposition(G… 2008 Q2   229  219     1.07   0.974          213.
## 13 "classical_decomposition(G… 2008 Q3   249  219     1.13   1.01           221.
## 14 "classical_decomposition(G… 2008 Q4   203  220.    0.925  0.996          219.
## 15 "classical_decomposition(G… 2009 Q1   196  222.    0.875  1.01           224.
## 16 "classical_decomposition(G… 2009 Q2   238  223.    1.07   0.993          222.
## 17 "classical_decomposition(G… 2009 Q3   252  225.    1.13   0.994          224.
## 18 "classical_decomposition(G… 2009 Q4   210  226     0.925  1.00           227.
## 19 "classical_decomposition(G… 2010 Q1   205   NA     0.875 NA              234.
## 20 "classical_decomposition(G… 2010 Q2   236   NA     1.07  NA              220.

gas %>%
  model(
    classical_decomposition(Gas, type = "multiplicative")
  ) %>%
  components() %>%
  autoplot() +
  labs(title = "Classical multiplicative decomposition of Australian Gas Production")

## Warning: Removed 2 row(s) containing missing values (geom_path).

c) Do the results support the graphical interpretation from part a?

Yes, there is a cycle of the prices dropping at the beginning of each quarter before sharply rising as the quarter progresses. There is a slight trend upwards overall.

d) Compute and plot the seasonally adjusted data.

gas %>%
  model(
    classical_decomposition(Gas, type = "multiplicative")
  ) %>%
  components(dcmp) %>%
  as_tsibble() %>%
  autoplot(Gas, colour = "gray") +
  geom_line(aes(y=season_adjust), colour = "#0072B2") +
  labs(y = "Gas Production in petajoules",
       title = "Seasonally Adjusted Australian Gas Production")

e) Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?

gas2<-gas
gas2[1, 1] = gas2[1, 1]+300

gas2 %>%
  model(
    classical_decomposition(Gas, type = "multiplicative")
  ) %>%
  components(dcmp) %>%
  as_tsibble() %>%
  autoplot(Gas, colour = "gray") +
  geom_line(aes(y=season_adjust), colour = "#0072B2") +
  labs(y = "Gas Production in petajoules",
       title = "Seasonally Adjusted Australian Gas Production")

gas2<-gas
gas2[10, 1] = gas2[10, 1]+300

gas2 %>%
  model(
    classical_decomposition(Gas, type = "multiplicative")
  ) %>%
  components(dcmp) %>%
  as_tsibble() %>%
  autoplot(Gas, colour = "gray") +
  geom_line(aes(y=season_adjust), colour = "#0072B2") +
  labs(y = "Gas Production in petajoules",
       title = "Seasonally Adjusted Australian Gas Production")

f) Does it make any difference if the outlier is near the end rather than in the middle of the time series?

I don’t think it makes a mathematical difference if the outlier is at the beginning or end but if the outlier is at the beginning the seasonally adjusted line remains about the same in relation to the actual data while if the outlier is in the middle the seasonally adjusted data gets thrown off and begins to follow the original data patterns. This may be because the classical decomposition assumes the same seasonality year over year and this assumption is disrupted by a greater magnitude with a mid-series outlier.

8) Recall your retail time series data (from Exercise 8 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

Something that I had not noticed earlier was that the seasonal volatility sems to become less volatile towards the end of the time-series.

library(seasonal)

## 
## Attaching package: 'seasonal'

## The following object is masked from 'package:tibble':
## 
##     view

x11_dcmp <- myseries %>%
  model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
  components()
autoplot(x11_dcmp) +
  labs(title =
    "Decomposition of Australian Retail Turnover using X-11.")

## 9) Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

a) Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.

Ther decomposition shows a relatively steady trend upwards with a steady seasonality. Something that may not be noticed with a plot of the data is the seasonality of the data. Since the seasonality is in much smaller numbers then the overall trend it may be easy to miss such a consistent seasonality without decomposition. It also shows very consistent seasonality within each month of the output.

b) Is the recession of 1991/1992 visible in the estimated components?

Teh recession can be identified via the remainder component. STL decomposition is robust towards outliers so things like a year long recession will not show up on the trend but it will be reflected in the remainder.