Do exercises 3.1, 3.2, 3.3, 3.4, 3.5, 3.7, 3.8 and 3.9 from the online Hyndman book.
I attempted to use autoplot to show the gdp per capita for each country over time but that did not graph very nicely. Instead I filtered the data for world regions and displayed the gdp per capita for regions of the world. Over time, we can see that gdp per capita is increasing across all regions of the world. It is clear based on the graph that certain regions of the world outperformed others, particurly North America and the European Union.
global_economy %>%
filter(Country == c("Arab World", "Caribbean Small States", "Central Europe and the Baltics", "East Asia & Pacific", "European Union", "Latin America & Caribbean", "Middle East & North Africa", "North America", "Pacific island small states", "South Asia", "Sub-Saharan Africa")) %>%
autoplot(GDP/Population)
## Warning in `==.default`(Country, c("Arab World", "Caribbean Small States", :
## longer object length is not a multiple of shorter object length
## Warning in is.na(e1) | is.na(e2): longer object length is not a multiple of
## shorter object length
## Warning: Removed 7 row(s) containing missing values (geom_path).
newFrame <- global_economy
newFrame$gdpPerCap <- newFrame$GDP/newFrame$Population
newFrame <- newFrame[order(c(newFrame$gdpPerCap, newFrame$Year), decreasing = TRUE), ]
newFrame2 <- head(arrange(newFrame, desc(gdpPerCap)), n = 100)
newFrame2 %>%
select(Country, Year, gdpPerCap) -> nf3
ggplot(nf3, aes(x = Year, y = gdpPerCap, colour = Country)) + geom_line() + geom_point()
The country with the highest gdp per capita is Monaco, even though Liechenstein is very close and at one point was briefly higher than Monaco. The gdp per capita for the top 2 countries have a very positive trend.
us_economy <- global_economy %>%
filter(Country == "United States")
us_economy %>%
autoplot(GDP)
The trend looks exponential, therefore a transformation might be useful.
us_economy %>%
features(GDP, features = guerrero)
## # A tibble: 1 x 2
## Country lambda_guerrero
## <fct> <dbl>
## 1 United States 0.282
Using guerrero, we get an predicted optimal lambda of 0.2819443
us_economy %>%
autoplot(box_cox(GDP, 0.2819443))
Definitely more linear than our original plot.
aus_livestock %>%
filter(Animal %in% "Bulls, bullocks and steers", State %in% "Victoria") %>%
autoplot(Count)
There is no consistent increase or decrease in variance with the levels, suggesting a transformation will not be useful.
aus_livestock %>%
filter(Animal %in% "Bulls, bullocks and steers", State %in% "Victoria") %>%
features(Count, features = guerrero)
## # A tibble: 1 x 3
## Animal State lambda_guerrero
## <fct> <fct> <dbl>
## 1 Bulls, bullocks and steers Victoria -0.0720
lambda = -0.07197227
aus_livestock %>%
filter(Animal %in% "Bulls, bullocks and steers", State %in% "Victoria") %>%
autoplot(box_cox(Count, -0.07197227))
The transformed series does not show any significant change except the change of scale in the y-axis, proving that a transformation was not useful.
vic_elec %>%
autoplot(Demand)
Based on the plot above, I do not think a transformation will be useful. According to the text, “if the data shows variation that increases or decreases with the level of the series, then a transformation can be useful.”
In this case, the variation is pretty consistent. To illustrate this point, let’s go ahead with the box cox transformation anyway.
vic_elec %>%
features(Demand, features = guerrero)
## # A tibble: 1 x 1
## lambda_guerrero
## <dbl>
## 1 0.0999
vic_elec %>%
autoplot(box_cox(Demand, 0.09993089))
It made no difference, proving it was not useful.
aus_production %>%
autoplot(Gas)
A transformation is useful because there is increasing variation with the levels.
aus_production %>%
features(Gas, features = guerrero)
## # A tibble: 1 x 1
## lambda_guerrero
## <dbl>
## 1 0.121
aus_production %>%
autoplot(box_cox(Gas, 0.1205077))
Using the lambda level 0.1205077, variation is close to consistent.
canadian_gas %>%
autoplot(Volume)
A transformation is not useful in this case as the variation is not increasing or decreasing steadily with the levels. There is an increase in variation in the middle portion of the plot but then decreases in the last third.
set.seed(128)
myseries <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
autoplot(myseries)
## Plot variable not specified, automatically selected `.vars = Turnover`
lambda <- myseries %>%
features(Turnover, features = guerrero) %>%
pull(lambda_guerrero)
Box Cox Transformation with lambda = 0.09787286
myseries %>%
autoplot(box_cox(Turnover, lambda))
The transformation makes the variation consistent throughout the series unlike in the original where there is a steady increase in variation with the levels.
aus_production %>%
autoplot(Tobacco)
## Warning: Removed 24 row(s) containing missing values (geom_path).
lambda1 <- aus_production %>%
features(Tobacco, features = guerrero) %>%
pull(lambda_guerrero)
aus_production %>%
autoplot(box_cox(Tobacco, lambda1))
## Warning: Removed 24 row(s) containing missing values (geom_path).
Despite using a lambda of 0.9289402, the transformation had no noticeable affect.
ansett %>%
filter(Airports=="MEL-SYD", Class == "Economy") %>%
autoplot(Passengers)
lambda2 <- ansett %>%
filter(Airports=="MEL-SYD", Class=="Economy") %>%
features(Passengers, features = guerrero) %>%
pull(lambda_guerrero)
lambda2 = 1.999927
ansett %>%
filter(Airports=="MEL-SYD", Class == "Economy") %>%
autoplot(box_cox(Passengers, lambda2))
The box cox transformation reduced the steep drop but at the same time increased the size of the drops at the end of the series. The scale of the y-axis was greatly increased.
pedestrian %>%
filter(Sensor == "Southern Cross Station") %>%
autoplot(Count)
lambda3 <- pedestrian %>%
filter(Sensor == "Southern Cross Station") %>%
features(Count, features = guerrero) %>%
pull(lambda_guerrero)
lambda = -0.2255423
pedestrian %>%
filter(Sensor == "Southern Cross Station") %>%
autoplot(box_cox(Count, lambda3))
The variance is constant and stable throughout the series, even though it is not easily read.
gas <- tail(aus_production, 5*4) %>% select(Gas)
autoplot(gas)
## Plot variable not specified, automatically selected `.vars = Gas`
There is an upward trend, with a seasonal component. There are clear drops and increase in gas use. Peaks during winter and drops during the warmer months.
classicalDcmp <- gas %>%
model(
classical_decomposition(Gas, type="multiplicative"))
components(classicalDcmp) %>%
autoplot() +
labs(title = "Classical multiplicative decomposition of last 5 years ")
## Warning: Removed 2 row(s) containing missing values (geom_path).
Do the results support the graphical interpretation from part a? The results show an upward trend and a seasonal component, which support the graphical interpretation from part a.
Compute and plot the seasonally adjusted data.
gas %>%
autoplot(Gas, color='gray') +
autolayer(components(classicalDcmp), season_adjust, color='blue') +
xlab("Year") + ylab("Gas")
Adding 300 to last observation
outliergas <- gas
outliergas[20,1] <- outliergas[20, 1] + 300
#outliergas <- outliergas %>%as_tsibble(index = 'Quarter')
Recomputing the seasonally adjusted data
classicalDcmpO <- outliergas %>%
model(
classical_decomposition(Gas, type="multiplicative"))
components(classicalDcmpO) %>%
autoplot() +
labs(title = "Classical multiplicative decomposition of last 5 years ")
## Warning: Removed 2 row(s) containing missing values (geom_path).
outliergas %>%
autoplot(Gas, color='gray') +
autolayer(components(classicalDcmpO), season_adjust, color='blue') +
xlab("Year") + ylab("Gas")
f.Does it make any difference if the outlier is near the end rather than in the middle of the time series?
Adding value to middle of time series
outliergas1 <- gas
outliergas1[10,1] <- outliergas1[10, 1] + 300
classicalDcmp1 <- outliergas1 %>%
model(
classical_decomposition(Gas, type="multiplicative"))
outliergas1 %>%
autoplot(Gas, color='gray') +
autolayer(components(classicalDcmp1), season_adjust, color='blue') +
xlab("Year") + ylab("Gas")
The location of the outlier does make a difference, as it is represented by a large spike wherever the outlier is located. Also it changes how the data is interpreted, in the middle of the series the outlier looks like a major spike in usage while at the end of the series it looks like the trend greatly accelerated.
set.seed(128)
myseries <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
x11_dcmp <- myseries %>%
model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
components()
autoplot(x11_dcmp) +
labs(title = "Decomposition of Retail Time Series Data")
The irregular shows two outliers that I did not notice previously, one is before the 1990 Jan marker and the other is a little after 2010 Jan date.
The Australian labour force data has been decomposed into 3 components; trend, seasonal, and remainder. Based on the decomposition, trend contributes largely to the overall data series, the trend is positive and consistent. The trend also matches the scale of the data judging by the small grey box. There is a seasonal component that changes over time (between the mid and late eighties). Based on the size of the box, seasonality is not a major component of the data. Finally, the remainder section shows an outlier in the early nineties.
Yes, the recession is visible in the in the remainder components. There is a noticeable drop in the early nineties.