library(fpp3)
library(seasonal)
library(magrittr)
library(tidyverse)
library(patchwork)
library(lubridate)
library(forecast)
library(kableExtra)
library(patchwork)
setwd('C:\\Users\\seanc\\OneDrive\\Desktop\\624\\HW2')
Consider the GDP information in “global_economy”. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita?
The GDP per-capita can be calculated as the annual ratio of GDP:Population for each respective country. On this basis, Monaco recorded the highest overall GDP per-capita ($185,152) in 2014.
How has this changed over time?
Assuming ‘this’ refers to the country with the highest GDP’s per-capita as recorded each year, the following apply:
# get dataset and create initial GDP per-capita plot (note: global_economy is a tsibble)
<-global_economy%>%
max_gdpmutate(gdp_capita = GDP/Population)
%>%
(max_gdpautoplot(.vars=gdp_capita)+
labs(title = 'Figure 1. GDP Per_Capita for Countries in Global_Economy')+
theme_classic()+
theme(legend.position = "none"))
# retrieve the overall maximum GDP per-captia
<-max_gdp%>%
Mas.data.frame()%>%
select(gdp_capita)%>%
drop_na(gdp_capita)%>%
max()
%>%
(max_gdpfilter(gdp_capita == M)%>%
kable())
Country | Code | Year | GDP | Growth | CPI | Imports | Exports | Population | gdp_capita |
---|---|---|---|---|---|---|---|---|---|
Monaco | MCO | 2014 | 7060236168 | 7.179637 | NA | NA | NA | 38132 | 185152.5 |
# identify country with maximum per-capita GDP for each year
kable(global_economy %>%
as_tibble()%>%
mutate(gdp_capita = GDP/Population)%>%
group_by(Country, Year) %>%
summarise(Max_gdp = max(gdp_capita))%>%
ungroup()%>%
group_by(Year)%>%
slice(which.max(Max_gdp))) # slice returns one row per group
Country | Year | Max_gdp |
---|---|---|
United States | 1960 | 3007.123 |
United States | 1961 | 3066.563 |
United States | 1962 | 3243.843 |
United States | 1963 | 3374.515 |
United States | 1964 | 3573.941 |
Kuwait | 1965 | 4429.171 |
Kuwait | 1966 | 4556.463 |
United States | 1967 | 4336.427 |
United States | 1968 | 4695.923 |
United States | 1969 | 5032.145 |
Monaco | 1970 | 12479.725 |
Monaco | 1971 | 13813.301 |
Monaco | 1972 | 16733.622 |
Monaco | 1973 | 21422.841 |
Monaco | 1974 | 22707.456 |
Monaco | 1975 | 28254.276 |
United Arab Emirates | 1976 | 29698.169 |
United Arab Emirates | 1977 | 33245.836 |
Monaco | 1978 | 38353.806 |
Monaco | 1979 | 45838.162 |
Monaco | 1980 | 51528.547 |
Monaco | 1981 | 44366.295 |
Monaco | 1982 | 41385.356 |
Monaco | 1983 | 38887.766 |
Monaco | 1984 | 36381.697 |
Monaco | 1985 | 37553.358 |
Monaco | 1986 | 52174.842 |
Monaco | 1987 | 63043.178 |
Monaco | 1988 | 68434.228 |
Monaco | 1989 | 68576.585 |
Monaco | 1990 | 84286.696 |
Monaco | 1991 | 83732.701 |
Monaco | 1992 | 91654.121 |
Monaco | 1993 | 85421.727 |
Monaco | 1994 | 89404.075 |
Monaco | 1995 | 101993.122 |
Monaco | 1996 | 101328.795 |
Monaco | 1997 | 90882.923 |
Monaco | 1998 | 93093.260 |
Monaco | 1999 | 91383.940 |
Monaco | 2000 | 82534.874 |
Monaco | 2001 | 82552.567 |
Monaco | 2002 | 89061.051 |
Monaco | 2003 | 108978.489 |
Monaco | 2004 | 123382.016 |
Monaco | 2005 | 124374.268 |
Monaco | 2006 | 133195.429 |
Monaco | 2007 | 167124.741 |
Monaco | 2008 | 180640.125 |
Monaco | 2009 | 149221.362 |
Monaco | 2010 | 144569.176 |
Monaco | 2011 | 162155.499 |
Monaco | 2012 | 152000.362 |
Liechtenstein | 2013 | 173528.150 |
Monaco | 2014 | 185152.527 |
Liechtenstein | 2015 | 167590.608 |
Monaco | 2016 | 168010.915 |
Luxembourg | 2017 | 104103.037 |
For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.
# GDP from US global economy
<-global_economy%>%
fig2filter(Country == 'United States')%>%
autoplot(GDP, color='steelblue')+
labs(title = "Figure 2. United States GDP Over Time")+
theme_classic()
# adjust for population size
<-global_economy%>%
fig3filter(Country == 'United States')%>%
mutate(gdp_capita = GDP/Population)%>%
autoplot(gdp_capita, color='red')+
labs(title = "Figure 3. United States GD Per-Capita Over Time")+
theme_classic()
/fig3 fig2
The GDP from US Global Economy displays a curvilinear upward trend-cycle over the period of record. Seasonal variations are absent. On this basis, a transformation is not warranted. However, adjusting for changes in annual population may be prudent. Comparison of Figure 2 and 3 indicates a change in scale (Y axis) but not trend-cycle with a population adjustment.
# Australian livestock
%>%
aus_livestockfilter(Animal == 'Bulls, bullocks and steers' & State =='Victoria')%>%
autoplot(Count , color='steelblue')+
labs(title = "Figure 4. Monthly Slaughter of Australian Bulls, Bullocks and Steers over Time")+
theme_classic()
There do not appear to be any changes in the variance (amplitude) of slaughter counts over time that warrant a mathematical transformation (Figure 4). The slaughter counts are recorded as monthly totals, leaving room for calendar adjustment (e.g., daily avg.). However it is unlikely that the latter will lead to any consequential changes in this series.
# plot daily electrical consumption, vic_elect
<-vic_elec%>%
fig5autoplot(Demand, color='steelblue')+
labs(title = 'Figure 5. Daily Total Electricity Demand, Victoria', subtitle='2012-14')+
theme_classic()
# plot monthly electrical consumption, vic_elect
<-vic_elec%>%
fig6index_by(Year_Month = ~ yearmonth(.))%>%
summarise(Monthly_Totals = sum(Demand))%>%
autoplot(Monthly_Totals, color='steelblue')+
labs(title = 'Figure 6. Monthly Total Electricity Demand, Victoria', subtitle='2012-14')+
theme_classic()
/fig6 fig5
Daily totals for Victorian electricity demand (2012-14) are displayed in Figure 5 and include an obvious seasonal signal, with increased demand during summer/winter months. There do not appear to be any systematic variations in the seasonal signal across time. A plot of monthly totals (Figure 6) supports this view and further indicates that this series would not benefit from mathematical transformation
# Plot australian gas production
<-aus_production%>%
fig7 autoplot(Gas, color='steelblue')+
labs(title = "Figure 7. Australian Gas Production")+
theme_classic()
#Apply box-cox transformation. The following code comes directly from our textbook
<- aus_production %>%
lambda features(Gas, features = guerrero) %>%
pull(lambda_guerrero)
<-aus_production %>%
fig8autoplot(box_cox(Gas, lambda), color = 'steelblue') +
labs(y = "", title = "Figure 8. Austrailian Gas Production", subtitle="Lambda Transformed: 0.12")+
theme_classic()
/fig8 fig7
The Australian gas production series shows increasing variance in the seasonal signal over time (Figure 7). A log transformation of the production totals is a good choice for reducing this variance. I’ve used a Box-Cox transformation (lambda=0.12) to achieve the same effect, Figure 8.
Why is a Box-Cox transformation unhelpful for the canadian_gas data?
Box-Cox is effective when there is a monotonic change in the variance of a time series. That is not the case for the Canadian gas data. The seasonal variance is moderate during the first and last 10-15 yrs of record and is higher between these periods (Figure 9). Applying a box-cox transformation to the data yields little change in the variance (Figure 10).
glimpse(canadian_gas)
## Rows: 542
## Columns: 2
## $ Month <mth> 1960 Jan, 1960 Feb, 1960 Mar, 1960 Apr, 1960 May, 1960 Jun, 196~
## $ Volume <dbl> 1.4306, 1.3059, 1.4022, 1.1699, 1.1161, 1.0113, 0.9660, 0.9773,~
<-canadian_gas%>%
fig9autoplot(Volume, color = 'steelblue')+
labs(title = "Canadian Gas Production: Untransformed")+
theme_classic()
# Apply Box Cox transformation
<- canadian_gas %>%
lambda features(Volume, features = guerrero) %>%
pull(lambda_guerrero)
<-canadian_gas %>%
fig10autoplot(box_cox(Volume, lambda), color = 'steelblue') +
labs(y = "", title = "Figure 9. Canadian Gas Production", subtitle="Lambda Transformed: 0.39")+
theme_classic()
# try log as dbl check -- not printed
<-canadian_gas %>%
temp_logmutate(log_vol = log(Volume))%>%
autoplot(log_vol, color = 'steelblue') +
labs(y = "", title = "Canadian Gas Production", subtitle="Lambda Transformed: 0.39")+
theme_classic()
# try sqrt as dbl chk -- not printed
<-canadian_gas %>%
temp_sqrtmutate(sqrt_vol = sqrt(Volume))%>%
autoplot(sqrt_vol, color = 'steelblue') +
labs(y = "", title = "Canadian Gas Production", subtitle="Lambda Transformed: 0.39")+
theme_classic()
#print fig9
/fig10 fig9
What Box-Cox transformation would you select for your retail data (from Exercise 8 in Section 2.10)?
I would select a log transformation (Figure 11). It is the obvious choice for data in which the variance is increasing over time. A lambda calculation based on the data also results in a log transform (Figure 12).
# import retail data and create initial time series plot
set.seed(12345678)
<- aus_retail %>%
series filter(`Series ID` == sample(aus_retail$`Series ID`,1))
%>%
seriesautoplot(color='steelblue')+
labs(title="Retail Turnover ($Million AUD) by Month in Australia")+
theme_classic()
# Replot using the log of turnover
<-series%>%
fig11mutate(log_turnover = log(Turnover))%>%
autoplot(log_turnover, color = 'steelblue') +
labs(y = "", title = "Figure 11. Retail Turnover ($Million AUD) by Month in Australia", subtitle= "Log Transformed")+
theme_classic()
# look at boxcox
<- series %>%
t_lambda features(Turnover, features = guerrero) %>%
pull(lambda_guerrero)
<-series %>%
fig12autoplot(box_cox(Turnover, t_lambda), color = 'steelblue') +
labs(y = "", title = "Figure 12. Retail Turnover ($Million AUD) by Month in Australia", subtitle=" Box-Cox: lambda =-0.022")+
theme_classic()
/fig12 fig11
For the following series, find an appropriate Box-Cox transformation in order to stabilize the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from Ansett, and Pedestrian counts at Southern Cross Station from pedestrian.
# plot each dataset
<-aus_production%>%
fig13select(Tobacco)%>%
autoplot(Tobacco, color='steelblue')+
labs(title='Figure 13. Austrailian Tobacco Production (Tonnes)')+
theme_classic()
<-ansett%>%
fig14filter(Class %in% 'Economy', Airports %in% 'MEL-SYD')%>%
autoplot(Passengers, color = 'steelblue')+
labs(title='Figure 14. Total Passengers Traveling Ansett Airlines')+
theme_classic()
<-pedestrian%>%
fig15filter(Sensor %in% "Southern Cross Station")%>%
autoplot(Count, color='steelblue', alpha=0.5)+
labs(title='Figure 15. Pedestrian Counts at the Southern Cross Station, Melbourne AU')+
theme_classic()
/fig14/fig15 fig13
The following Box Cox transformations balance the variance in our datasets
Australian Tobacco (Figure 16): lambda = 0.92 (close to 1: no transform).
Passengers on Ansett Airlines (Figure 17): lambda = 1.99 (quadratic transform).
Pedestrian Counts at Southern Cross Station (Figure 18): lambda = -0.225 (log transform or possibly a square root transform).
#1
<- aus_production%>%
aus_lambda features(Tobacco, features = guerrero) %>%
pull(lambda_guerrero)
<-aus_production%>%
fig16autoplot(box_cox(Tobacco, aus_lambda), color = 'steelblue') +
labs(y = "", title = "Figure 16. Australian Tobacco Production", subtitle="Lambda = 0.92")+
theme_classic()
#2 why am I getting multiple values???
<- ansett%>%
ansett_lambda filter(Class %in% 'Economy', Airports %in% 'MEL-SYD')%>%
features(Passengers, features = guerrero) %>%
pull(lambda_guerrero)
<-ansett%>%
fig17filter(Class %in% 'Economy', Airports %in% 'MEL-SYD')%>%
autoplot(box_cox(Passengers, ansett_lambda), color = 'darkred') +
labs(y = "", title = "Figure 17. Passengers on Ansett Airlines", subtitle="Lambda = 1.99")+
theme_classic()
#3
<- pedestrian%>%
ped_lambda filter(Sensor %in% "Southern Cross Station")%>%
features(Count, features = guerrero) %>%
pull(lambda_guerrero)
<-pedestrian%>%
fig18filter(Sensor %in% "Southern Cross Station")%>%
autoplot(box_cox(Count, ped_lambda), color = 'steelblue') +
labs(y = "", title = "FIgure 18. Pedestrian Counts at Southern Cross Station", subtitle="Lambda = -0.225")+
theme_classic()
/fig17/fig18 fig16
Consider the last five years of the Gas data from aus_production.
gas <- tail(aus_production, 5*4) %>% select(Gas)
Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?
Yes there is a seasonal fluctuation with an annual period of rising (Q1&Q2) and falling (Q3&Q4) values. There is also an upward trend in the data over the period of record.
Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices. Do the results support the graphical interpretation from part A?
Yes, the trend_cycle is clearly increasing. And there is an obvious pattern in the seasonal component of the decomposition.
Compute and plot the seasonally adjusted data. Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?
Addition of an outlier in the middle of the time series results in a moderate loss of signal in the seasonal component and a significant loss of signal in the trend component.
Does it make any difference if the outlier is near the end rather than in the middle of the time series?
Addition of an outlier near the end of the series results in loss of signal in the seasonal component and a very small signal in the trend component. It’s worth noting that classical decomposition does not include portions of the data (trend, random) at the beginning/end of the time series (in this instance: 2005-Q3/Q4 & 2010 - Q1/Q2), yet provides an estimate of the seasonal component in these positions.
#Collect data and build plots
<- tail(aus_production, 5*4) %>% select(Gas) # 4 represents quarters in this dataset
gas
<-gas%>%
fig19autoplot(color = 'steelblue')+
labs(title='Figure 19. Australian Gas Production')+
theme_classic() # definitely trend-cycle and seasonality
#classical decomposition
<-gas%>%
classicmodel(classical_decomposition(Gas ~ season(4), type = "mult"))%>% # from feasts package
components()
<-classic%>%
fig20autoplot(color='steelblue') +
labs(title = "Figure 20. Classical decomposition of Quarterly
Australian Gas Production", subtitle = 'Multiplicative')+
theme_classic()
#seasonal adjustment # note that classic computes trend, seasonal, random and season_adjust
<-classic%>%
adjust_seasonselect(Quarter, season_adjust)
<-adjust_season%>%
fig21autoplot(season_adjust, color='steelblue')+
labs(title='Figure 21. Seasonally Adjusted Austrailian Gas Production')+
theme_classic()
# Change one observation to be an outlier, and recompute the seasonally adjusted data
<- gas
mid_out $Gas[10] <- mid_out$Gas[10] + 300
mid_out
<-mid_out%>%
fig22model(classical_decomposition(Gas ~ season(4), type = "mult"))%>%
components()%>%
select(Quarter, season_adjust)%>%
autoplot(season_adjust, color='steelblue')+
labs(title='Figure 22. Seasonally Adjusted Australian Gas Production with Outlier', subtitle='Outlier at Center of Time Series')+
theme_classic()
# place outlier at end of series and recompute seasonally adjusted data
<- gas
end_out $Gas[20] <- end_out$Gas[20] + 300
end_out
<-end_out%>%
fig23model(classical_decomposition(Gas ~ season(4), type = "mult"))%>%
components()%>%
select(Quarter, season_adjust)%>%
autoplot(season_adjust, color='steelblue')+
labs(title='Figure 23. Seasonally Adjusted Australian Gas Production with Outlier', subtitle='Outlier at End of Time Series')+
theme_classic()
fig19
fig20
/fig22/fig23 fig21
Recall your retail time series data (from Exercise 8 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?
# Decompose using X11
<-aus_retail%>%
x_11filter(State == 'Victoria' & Industry == 'Pharmaceutical, cosmetic and toiletry goods retailing')%>%
model(test=X_13ARIMA_SEATS(Turnover ~ x11())) %>%
components()
<-autoplot(x_11,color='steelblue') +
fig24labs(title =
"Decomposition of Total Australian Retail Employment Using X-11.",
subtitle = 'Figure 24. Pharmaceutical, Cosmetic and Toiletry Goods Retailing: Time Series Decomposition')+
theme_classic()
#identify specific outliers
<-x_11%>%as.data.frame()
df
<-df%>%ggplot(aes(y=irregular))+
fig25geom_boxplot(fill = "#0c4c8a", alpha=0.5)+
labs(title = 'Figure 25. Box plot of Irregular Component')+
theme_classic()
%>%
dffilter(irregular > 1.15) # july 1991, irregular=1.16, Turnover=65.1
|fig25 fig24
Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.
Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.
There is a relatively smooth, linear, upward trend in the data (value, trend). Variance within the seasonal component appears to be stationary. The residuals appear randomly distributed with the exception of several large outliers around 1991. Based on the graph scales, the seasonal component and residual components comprise a small fraction of the variation (~10%) in the time series.
Is the recession of 1991/1992 visible in the estimated components?
Yes, as indicated in the previous response, the large negative outliers in the residuals provide clear evidence of the 1991-92 recession.