Qatar has the highest average GDP per capita in the data set. We can see via the second plot that Qatar’s GDP per capita was not very high earlier on in the dataset but sky rocketed around the year 2000.
autoplot(global_economy, GDP/Population) +
guides(colour = FALSE)+
labs(title = "Country GDP per Capita over time",
y = "GDP per capita in $USD")
## Warning: Removed 3242 row(s) containing missing values (geom_path).
global_economy$gdp_cap<-round(global_economy$GDP/global_economy$Population,2)
global_economy_drop<-drop_na(global_economy)
avg_gdp<-aggregate(global_economy_drop$gdp_cap, list(global_economy_drop$Country), FUN=mean)
avg_gdp<-avg_gdp[order(-avg_gdp$x),]
head(avg_gdp)
## Group.1 x
## 133 Qatar 62650.62
## 156 Switzerland 48994.33
## 97 Luxembourg 42046.82
## 168 United Arab Emirates 40469.09
## 72 Iceland 34919.83
## 98 Macao SAR, China 34561.99
autoplot(global_economy %>%
filter(Country == "Qatar") , GDP/Population) +
guides(colour = FALSE)+
labs(title = "QATAR GDP per Capita over time",
y = "GDP per capita in $USD")
## Warning: Removed 10 row(s) containing missing values (geom_path).
For US GDP I transformed it with a population adjustment which adjusted to a plot on GDP per cpaita rather than just GDP.I also used the CPI in the dataset to adjust for inflation. We can see dips in the 1980’s where we do not see them on the standard plot which indicates these were likely times of hihg inflation.
autoplot(global_economy %>%
filter(Country == "United States") , GDP) +
guides(colour = FALSE)+
labs(title = "US GDP over time",
y = "GDP in $USD")
autoplot(global_economy %>%
filter(Country == "United States") , ((GDP/CPI)*100)/Population) +
guides(colour = FALSE)+
labs(title = "US GDP per cpaita adjusted for inflation over time",
y = "GDP per capita adjusted for inflation")
I tried transforming with a box-cox transformation as discussed in the book to deal wiht the seasonality displayed in the original graph but it did not seem to help much.I suppose this did not help because the variance was relatively consistent over time.
aus_livestock_bbr<-aus_livestock%>%
filter(Animal == "Bulls, bullocks and steers",State=="Victoria")
autoplot(aus_livestock_bbr , Count) +
labs(title = "Count of Bulls, bullocks and steers slaughtered over time",
y = "Count of Animals")
lambda <- aus_livestock_bbr %>%
features(Count, features = guerrero) %>%
pull(lambda_guerrero)
aus_livestock_bbr %>%
autoplot(box_cox(Count, lambda)) +
labs(title = "Transformed Count of Bulls, bullocks and steers slaughtered with Lambda=-0.7")
I transformed this one by making it demand of MW per degree of Temperature. This helped control for seasonality issues and heat spikes and showed where demand for electricity seams to peak agnostic of temperature (about mid year). It also shows an upword trend of these mid-year peaks.
autoplot(vic_elec , Demand) +
labs(title = "Victorian Electricity Demand over half hour time",
y = "Total Electricity Demand in MW")
autoplot(vic_elec , Demand/Temperature) +
labs(title = "Victorian Electricity Demand over half hour time",
y = "Total Electricity Demand in MW per degree of Temperature")
Since the variance of the series seemed to increase over time I used a box-cox transformation to smooth out this variance taken form the example in the text.
autoplot(aus_production , Gas) +
labs(title = "Australian Gas production",
y = "Gas Production in petjoules")
lambda <- aus_production %>%
features(Gas, features = guerrero) %>%
pull(lambda_guerrero)
aus_production %>%
autoplot(box_cox(Gas, lambda)) +
labs(title = "Australian Gas production Box-Cox Tranformed with lambda = .12")
Since the seasonal variance does not increase or decrease consistently (it increases and then decreases) the Box-Cox transformation will not be effective.
autoplot(canadian_gas , Volume)
lambda <- canadian_gas %>%
features(Volume, features = guerrero) %>%
pull(lambda_guerrero)
canadian_gas %>%
autoplot(box_cox(Volume, lambda))
Using the Guerrero feature establishes a lambda=0.0845 as the optimal value of lambda to smooth the seasonality of my chosen series from the aus_retail dataset.
set.seed(19865)
myseries <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
lambda <- myseries %>%
features(Turnover, features = guerrero) %>%
pull(lambda_guerrero)
myseries %>%
autoplot(box_cox(Turnover, lambda)) +
labs(title = "Australian Gas production Box-Cox Tranformed with lambda = .0845")
lambda
## [1] 0.08450315
For each of these I use the guerrero method for selecting the appropriate lambda for a box-cox transformation.
lambda <- aus_production %>%
features(Tobacco, features = guerrero) %>%
pull(lambda_guerrero)
aus_production %>%
autoplot(box_cox(Tobacco, lambda)) +
labs(title = paste("Australian Tobacco production Box-Cox Tranformed with lambda =",round(lambda,4)))
## Warning: Removed 24 row(s) containing missing values (geom_path).
mel_syd_ansett<-ansett%>%
filter(Airports == "MEL-SYD",Class=="Economy")
lambda <- mel_syd_ansett %>%
features(Passengers, features = guerrero) %>%
pull(lambda_guerrero)
mel_syd_ansett %>%
autoplot(box_cox(Passengers, lambda)) +
labs(title = paste("Number of Passengers from Melbourne to Sydney \n Box-Cox Tranformed with lambda =",round(lambda,4)))
pedestrian_SCS<-pedestrian%>%filter(Sensor=="Southern Cross Station")
lambda <- pedestrian_SCS %>%
features(Count, features = guerrero) %>%
pull(lambda_guerrero)
pedestrian_SCS %>%
autoplot(box_cox(Count, lambda)) +
labs(title = paste("Pedestrian Count of Southern Cross Station \n Box-Cox Tranformed with lambda =",round(lambda,4)))
gas <- tail(aus_production, 5*4) %>% select(Gas)
There is a cycle of the prices dropping at the beginning of each quarter before sharply rising as the quarter progresses. There is a slight trend upwards overall.
autoplot(gas)
## Plot variable not specified, automatically selected `.vars = Gas`
Via the output below we can see both the trend and seasonal indices in data form or in graph form.
gas %>%
model(
classical_decomposition(Gas, type = "multiplicative")
) %>%
components()
## # A dable: 20 x 7 [1Q]
## # Key: .model [1]
## # : Gas = trend * seasonal * random
## .model Quarter Gas trend seasonal random season_adjust
## <chr> <qtr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 "classical_decomposition(G… 2005 Q3 221 NA 1.13 NA 196.
## 2 "classical_decomposition(G… 2005 Q4 180 NA 0.925 NA 195.
## 3 "classical_decomposition(G… 2006 Q1 171 200. 0.875 0.974 195.
## 4 "classical_decomposition(G… 2006 Q2 224 204. 1.07 1.02 209.
## 5 "classical_decomposition(G… 2006 Q3 233 207 1.13 1.00 207.
## 6 "classical_decomposition(G… 2006 Q4 192 210. 0.925 0.987 208.
## 7 "classical_decomposition(G… 2007 Q1 187 213 0.875 1.00 214.
## 8 "classical_decomposition(G… 2007 Q2 234 216. 1.07 1.01 218.
## 9 "classical_decomposition(G… 2007 Q3 245 219. 1.13 0.996 218.
## 10 "classical_decomposition(G… 2007 Q4 205 219. 0.925 1.01 222.
## 11 "classical_decomposition(G… 2008 Q1 194 219. 0.875 1.01 222.
## 12 "classical_decomposition(G… 2008 Q2 229 219 1.07 0.974 213.
## 13 "classical_decomposition(G… 2008 Q3 249 219 1.13 1.01 221.
## 14 "classical_decomposition(G… 2008 Q4 203 220. 0.925 0.996 219.
## 15 "classical_decomposition(G… 2009 Q1 196 222. 0.875 1.01 224.
## 16 "classical_decomposition(G… 2009 Q2 238 223. 1.07 0.993 222.
## 17 "classical_decomposition(G… 2009 Q3 252 225. 1.13 0.994 224.
## 18 "classical_decomposition(G… 2009 Q4 210 226 0.925 1.00 227.
## 19 "classical_decomposition(G… 2010 Q1 205 NA 0.875 NA 234.
## 20 "classical_decomposition(G… 2010 Q2 236 NA 1.07 NA 220.
gas %>%
model(
classical_decomposition(Gas, type = "multiplicative")
) %>%
components() %>%
autoplot() +
labs(title = "Classical multiplicative decomposition of Australian Gas Production")
## Warning: Removed 2 row(s) containing missing values (geom_path).
Yes, there is a cycle of the prices dropping at the beginning of each quarter before sharply rising as the quarter progresses. There is a slight trend upwards overall.
gas %>%
model(
classical_decomposition(Gas, type = "multiplicative")
) %>%
components(dcmp) %>%
as_tsibble() %>%
autoplot(Gas, colour = "gray") +
geom_line(aes(y=season_adjust), colour = "#0072B2") +
labs(y = "Gas Production in petajoules",
title = "Seasonally Adjusted Australian Gas Production")
gas2<-gas
gas2[1, 1] = gas2[1, 1]+300
gas2 %>%
model(
classical_decomposition(Gas, type = "multiplicative")
) %>%
components(dcmp) %>%
as_tsibble() %>%
autoplot(Gas, colour = "gray") +
geom_line(aes(y=season_adjust), colour = "#0072B2") +
labs(y = "Gas Production in petajoules",
title = "Seasonally Adjusted Australian Gas Production")
gas2<-gas
gas2[10, 1] = gas2[10, 1]+300
gas2 %>%
model(
classical_decomposition(Gas, type = "multiplicative")
) %>%
components(dcmp) %>%
as_tsibble() %>%
autoplot(Gas, colour = "gray") +
geom_line(aes(y=season_adjust), colour = "#0072B2") +
labs(y = "Gas Production in petajoules",
title = "Seasonally Adjusted Australian Gas Production")
I don’t think it makes a mathematical difference if the outlier is at the beginning or end but if the outlier is at the beginning the seasonally adjusted line remains about the same in relation to the actual data while if the outlier is in the middle the seasonally adjusted data gets thrown off and begins to follow the original data patterns. This may be because the classical decomposition assumes the same seasonality year over year and this assumption is disrupted by a greater magnitude with a mid-series outlier.
Something that I had not noticed earlier was that the seasonal volatility sems to become less volatile towards the end of the time-series.
library(seasonal)
##
## Attaching package: 'seasonal'
## The following object is masked from 'package:tibble':
##
## view
x11_dcmp <- myseries %>%
model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
components()
autoplot(x11_dcmp) +
labs(title =
"Decomposition of Australian Retail Turnover using X-11.")
## 9) Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.
Ther decomposition shows a relatively steady trend upwards with a steady seasonality. Something that may not be noticed with a plot of the data is the seasonality of the data. Since the seasonality is in much smaller numbers then the overall trend it may be easy to miss such a consistent seasonality without decomposition. It also shows very consistent seasonality within each month of the output.
Teh recession can be identified via the remainder component. STL decomposition is robust towards outliers so things like a year long recession will not show up on the trend but it will be reflected in the remainder.