Data 624 HW #2

Do exercises 3.1, 3.2, 3.3, 3.4, 3.5, 3.7, 3.8 and 3.9 from the online Hyndman book.

1. Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

I attempted to use autoplot to show the gdp per capita for each country over time but that did not graph very nicely. Instead I filtered the data for world regions and displayed the gdp per capita for regions of the world. Over time, we can see that gdp per capita is increasing across all regions of the world. It is clear based on the graph that certain regions of the world outperformed others, particurly North America and the European Union.

global_economy %>%
    filter(Country  == c("Arab World", "Caribbean Small States", "Central Europe and the Baltics", "East Asia & Pacific", "European Union", "Latin America & Caribbean", "Middle East & North Africa", "North America", "Pacific island small states", "South Asia", "Sub-Saharan Africa")) %>%
    autoplot(GDP/Population)
## Warning in `==.default`(Country, c("Arab World", "Caribbean Small States", :
## longer object length is not a multiple of shorter object length
## Warning in is.na(e1) | is.na(e2): longer object length is not a multiple of
## shorter object length
## Warning: Removed 7 row(s) containing missing values (geom_path).

newFrame <- global_economy 

newFrame$gdpPerCap <- newFrame$GDP/newFrame$Population
newFrame <- newFrame[order(c(newFrame$gdpPerCap, newFrame$Year), decreasing = TRUE), ]
newFrame2 <- head(arrange(newFrame, desc(gdpPerCap)), n = 100)
newFrame2 %>% 
    select(Country, Year, gdpPerCap) -> nf3

ggplot(nf3, aes(x = Year, y = gdpPerCap, colour = Country)) + geom_line() + geom_point()

The country with the highest gdp per capita is Monaco, even though Liechenstein is very close and at one point was briefly higher than Monaco. The gdp per capita for the top 2 countries have a very positive trend.

2. For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

  1. United States GDP from global_economy
us_economy <- global_economy %>%
    filter(Country == "United States")
us_economy %>%
    autoplot(GDP)

The trend looks exponential, therefore a transformation might be useful.

us_economy %>%
    features(GDP, features = guerrero)
## # A tibble: 1 x 2
##   Country       lambda_guerrero
##   <fct>                   <dbl>
## 1 United States           0.282

Using guerrero, we get an predicted optimal lambda of 0.2819443

us_economy %>%
    autoplot(box_cox(GDP, 0.2819443))

Definitely more linear than our original plot.

  1. Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock
aus_livestock %>%
    filter(Animal %in% "Bulls, bullocks and steers", State %in% "Victoria") %>%
    autoplot(Count)

There is no consistent increase or decrease in variance with the levels, suggesting a transformation will not be useful.

aus_livestock %>%
    filter(Animal %in% "Bulls, bullocks and steers", State %in% "Victoria") %>%
    features(Count, features = guerrero)
## # A tibble: 1 x 3
##   Animal                     State    lambda_guerrero
##   <fct>                      <fct>              <dbl>
## 1 Bulls, bullocks and steers Victoria         -0.0720

lambda = -0.07197227

aus_livestock %>%
    filter(Animal %in% "Bulls, bullocks and steers", State %in% "Victoria") %>%
    autoplot(box_cox(Count, -0.07197227))

The transformed series does not show any significant change except the change of scale in the y-axis, proving that a transformation was not useful.

  1. Victorian Electricity Demand from vic_elec
vic_elec %>%
    autoplot(Demand)

Based on the plot above, I do not think a transformation will be useful. According to the text, “if the data shows variation that increases or decreases with the level of the series, then a transformation can be useful.”

In this case, the variation is pretty consistent. To illustrate this point, let’s go ahead with the box cox transformation anyway.

vic_elec %>%
    features(Demand, features = guerrero)
## # A tibble: 1 x 1
##   lambda_guerrero
##             <dbl>
## 1          0.0999
vic_elec %>%
    autoplot(box_cox(Demand, 0.09993089))

It made no difference, proving it was not useful.

  1. Gas production from aus_production
aus_production %>%
    autoplot(Gas)

A transformation is useful because there is increasing variation with the levels.

aus_production %>%
    features(Gas, features = guerrero)
## # A tibble: 1 x 1
##   lambda_guerrero
##             <dbl>
## 1           0.121
aus_production %>%
    autoplot(box_cox(Gas, 0.1205077))

Using the lambda level 0.1205077, variation is close to consistent.

3. Why is a Box-Cox transformation unhelpful for the canadian_gas data?

canadian_gas %>%
    autoplot(Volume)

A transformation is not useful in this case as the variation is not increasing or decreasing steadily with the levels. There is an increase in variation in the middle portion of the plot but then decreases in the last third.

4. What Box-Cox transformation would you select for your retail data (from Exercise 8 in Section 2.10)?

set.seed(128)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))
autoplot(myseries)
## Plot variable not specified, automatically selected `.vars = Turnover`

lambda <- myseries %>%
    features(Turnover, features = guerrero) %>%
    pull(lambda_guerrero)

Box Cox Transformation with lambda = 0.09787286

myseries %>%
    autoplot(box_cox(Turnover, lambda))

The transformation makes the variation consistent throughout the series unlike in the original where there is a steady increase in variation with the levels.

5. For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.

Tobacco

aus_production %>%
    autoplot(Tobacco)
## Warning: Removed 24 row(s) containing missing values (geom_path).

lambda1 <- aus_production %>%
    features(Tobacco, features = guerrero) %>%
    pull(lambda_guerrero)
aus_production %>%
    autoplot(box_cox(Tobacco, lambda1))
## Warning: Removed 24 row(s) containing missing values (geom_path).

Despite using a lambda of 0.9289402, the transformation had no noticeable affect.

Passengers

ansett %>%
    filter(Airports=="MEL-SYD", Class == "Economy") %>%
    autoplot(Passengers)

lambda2 <- ansett %>%
    filter(Airports=="MEL-SYD", Class=="Economy") %>%
    features(Passengers, features = guerrero) %>%
    pull(lambda_guerrero)

lambda2 = 1.999927

ansett %>%
    filter(Airports=="MEL-SYD", Class == "Economy") %>%
    autoplot(box_cox(Passengers, lambda2))

The box cox transformation reduced the steep drop but at the same time increased the size of the drops at the end of the series. The scale of the y-axis was greatly increased.

Pedestrian

pedestrian %>%
    filter(Sensor == "Southern Cross Station") %>%
    autoplot(Count)

lambda3 <- pedestrian %>%
    filter(Sensor == "Southern Cross Station") %>%
    features(Count, features = guerrero) %>%
    pull(lambda_guerrero)

lambda = -0.2255423

pedestrian %>%
    filter(Sensor == "Southern Cross Station") %>%
    autoplot(box_cox(Count, lambda3))

The variance is constant and stable throughout the series, even though it is not easily read.

7. Consider the last five years of the Gas data from aus_production

gas <- tail(aus_production, 5*4) %>% select(Gas)
  1. Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?
autoplot(gas)
## Plot variable not specified, automatically selected `.vars = Gas`

There is an upward trend, with a seasonal component. There are clear drops and increase in gas use. Peaks during winter and drops during the warmer months.

  1. Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.
classicalDcmp <- gas %>% 
    model(
        classical_decomposition(Gas, type="multiplicative")) 

    components(classicalDcmp) %>%
    autoplot() +
    labs(title = "Classical multiplicative decomposition of last 5 years ")
## Warning: Removed 2 row(s) containing missing values (geom_path).

  1. Do the results support the graphical interpretation from part a? The results show an upward trend and a seasonal component, which support the graphical interpretation from part a.

  2. Compute and plot the seasonally adjusted data.

gas %>%
    autoplot(Gas, color='gray') +
    autolayer(components(classicalDcmp), season_adjust, color='blue') +
    xlab("Year") + ylab("Gas")

  1. Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?

Adding 300 to last observation

outliergas <- gas

outliergas[20,1] <- outliergas[20, 1] + 300

#outliergas <- outliergas %>%as_tsibble(index = 'Quarter')

Recomputing the seasonally adjusted data

classicalDcmpO <- outliergas %>% 
    model(
        classical_decomposition(Gas, type="multiplicative"))

 components(classicalDcmpO) %>%
    autoplot() +
    labs(title = "Classical multiplicative decomposition of last 5 years ")
## Warning: Removed 2 row(s) containing missing values (geom_path).

outliergas %>%
    autoplot(Gas, color='gray') +
    autolayer(components(classicalDcmpO), season_adjust, color='blue') +
    xlab("Year") + ylab("Gas")

f.Does it make any difference if the outlier is near the end rather than in the middle of the time series?

Adding value to middle of time series

outliergas1 <- gas

outliergas1[10,1] <- outliergas1[10, 1] + 300

classicalDcmp1 <- outliergas1 %>% 
    model(
        classical_decomposition(Gas, type="multiplicative"))
outliergas1 %>%
    autoplot(Gas, color='gray') +
    autolayer(components(classicalDcmp1), season_adjust, color='blue') +
    xlab("Year") + ylab("Gas")

The location of the outlier does make a difference, as it is represented by a large spike wherever the outlier is located. Also it changes how the data is interpreted, in the middle of the series the outlier looks like a major spike in usage while at the end of the series it looks like the trend greatly accelerated.

8. Recall your retail time series data (from Exercise 8 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

set.seed(128)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))
x11_dcmp <- myseries %>%
    model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
    components()
autoplot(x11_dcmp) +
    labs(title = "Decomposition of Retail Time Series Data")

The irregular shows two outliers that I did not notice previously, one is before the 1990 Jan marker and the other is a little after 2010 Jan date.

9. Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

  1. Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.

The Australian labour force data has been decomposed into 3 components; trend, seasonal, and remainder. Based on the decomposition, trend contributes largely to the overall data series, the trend is positive and consistent. The trend also matches the scale of the data judging by the small grey box. There is a seasonal component that changes over time (between the mid and late eighties). Based on the size of the box, seasonality is not a major component of the data. Finally, the remainder section shows an outlier in the early nineties.

  1. Is the recession of 1991/1992 visible in the estimated components?

Yes, the recession is visible in the in the remainder components. There is a noticeable drop in the early nineties.