library(fpp3)
library(seasonal)
library(magrittr)
library(tidyverse)
library(patchwork)
library(lubridate)
library(forecast)
library(kableExtra)
library(patchwork)
setwd('C:\\Users\\seanc\\OneDrive\\Desktop\\624\\HW2')Consider the GDP information in “global_economy”. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita?
The GDP per-capita can be calculated as the annual ratio of GDP:Population for each respective country. On this basis, Monaco recorded the highest overall GDP per-capita ($185,152) in 2014.
How has this changed over time?
Assuming ‘this’ refers to the country with the highest GDP’s per-capita as recorded each year, the following apply:
# get dataset and create initial GDP per-capita plot (note: global_economy is a tsibble)
max_gdp<-global_economy%>%
mutate(gdp_capita = GDP/Population)
(max_gdp%>%
autoplot(.vars=gdp_capita)+
labs(title = 'Figure 1. GDP Per_Capita for Countries in Global_Economy')+
theme_classic()+
theme(legend.position = "none"))# retrieve the overall maximum GDP per-captia
M<-max_gdp%>%
as.data.frame()%>%
select(gdp_capita)%>%
drop_na(gdp_capita)%>%
max()
(max_gdp%>%
filter(gdp_capita == M)%>%
kable())| Country | Code | Year | GDP | Growth | CPI | Imports | Exports | Population | gdp_capita |
|---|---|---|---|---|---|---|---|---|---|
| Monaco | MCO | 2014 | 7060236168 | 7.179637 | NA | NA | NA | 38132 | 185152.5 |
# identify country with maximum per-capita GDP for each year
kable(global_economy %>%
as_tibble()%>%
mutate(gdp_capita = GDP/Population)%>%
group_by(Country, Year) %>%
summarise(Max_gdp = max(gdp_capita))%>%
ungroup()%>%
group_by(Year)%>%
slice(which.max(Max_gdp))) # slice returns one row per group| Country | Year | Max_gdp |
|---|---|---|
| United States | 1960 | 3007.123 |
| United States | 1961 | 3066.563 |
| United States | 1962 | 3243.843 |
| United States | 1963 | 3374.515 |
| United States | 1964 | 3573.941 |
| Kuwait | 1965 | 4429.171 |
| Kuwait | 1966 | 4556.463 |
| United States | 1967 | 4336.427 |
| United States | 1968 | 4695.923 |
| United States | 1969 | 5032.145 |
| Monaco | 1970 | 12479.725 |
| Monaco | 1971 | 13813.301 |
| Monaco | 1972 | 16733.622 |
| Monaco | 1973 | 21422.841 |
| Monaco | 1974 | 22707.456 |
| Monaco | 1975 | 28254.276 |
| United Arab Emirates | 1976 | 29698.169 |
| United Arab Emirates | 1977 | 33245.836 |
| Monaco | 1978 | 38353.806 |
| Monaco | 1979 | 45838.162 |
| Monaco | 1980 | 51528.547 |
| Monaco | 1981 | 44366.295 |
| Monaco | 1982 | 41385.356 |
| Monaco | 1983 | 38887.766 |
| Monaco | 1984 | 36381.697 |
| Monaco | 1985 | 37553.358 |
| Monaco | 1986 | 52174.842 |
| Monaco | 1987 | 63043.178 |
| Monaco | 1988 | 68434.228 |
| Monaco | 1989 | 68576.585 |
| Monaco | 1990 | 84286.696 |
| Monaco | 1991 | 83732.701 |
| Monaco | 1992 | 91654.121 |
| Monaco | 1993 | 85421.727 |
| Monaco | 1994 | 89404.075 |
| Monaco | 1995 | 101993.122 |
| Monaco | 1996 | 101328.795 |
| Monaco | 1997 | 90882.923 |
| Monaco | 1998 | 93093.260 |
| Monaco | 1999 | 91383.940 |
| Monaco | 2000 | 82534.874 |
| Monaco | 2001 | 82552.567 |
| Monaco | 2002 | 89061.051 |
| Monaco | 2003 | 108978.489 |
| Monaco | 2004 | 123382.016 |
| Monaco | 2005 | 124374.268 |
| Monaco | 2006 | 133195.429 |
| Monaco | 2007 | 167124.741 |
| Monaco | 2008 | 180640.125 |
| Monaco | 2009 | 149221.362 |
| Monaco | 2010 | 144569.176 |
| Monaco | 2011 | 162155.499 |
| Monaco | 2012 | 152000.362 |
| Liechtenstein | 2013 | 173528.150 |
| Monaco | 2014 | 185152.527 |
| Liechtenstein | 2015 | 167590.608 |
| Monaco | 2016 | 168010.915 |
| Luxembourg | 2017 | 104103.037 |
For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.
# GDP from US global economy
fig2<-global_economy%>%
filter(Country == 'United States')%>%
autoplot(GDP, color='steelblue')+
labs(title = "Figure 2. United States GDP Over Time")+
theme_classic()
# adjust for population size
fig3<-global_economy%>%
filter(Country == 'United States')%>%
mutate(gdp_capita = GDP/Population)%>%
autoplot(gdp_capita, color='red')+
labs(title = "Figure 3. United States GD Per-Capita Over Time")+
theme_classic()
fig2/fig3 The GDP from US Global Economy displays a curvilinear upward trend-cycle over the period of record. Seasonal variations are absent. On this basis, a transformation is not warranted. However, adjusting for changes in annual population may be prudent. Comparison of Figure 2 and 3 indicates a change in scale (Y axis) but not trend-cycle with a population adjustment.
# Australian livestock
aus_livestock%>%
filter(Animal == 'Bulls, bullocks and steers' & State =='Victoria')%>%
autoplot(Count , color='steelblue')+
labs(title = "Figure 4. Monthly Slaughter of Australian Bulls, Bullocks and Steers over Time")+
theme_classic()There do not appear to be any changes in the variance (amplitude) of slaughter counts over time that warrant a mathematical transformation (Figure 4). The slaughter counts are recorded as monthly totals, leaving room for calendar adjustment (e.g., daily avg.). However it is unlikely that the latter will lead to any consequential changes in this series.
# plot daily electrical consumption, vic_elect
fig5<-vic_elec%>%
autoplot(Demand, color='steelblue')+
labs(title = 'Figure 5. Daily Total Electricity Demand, Victoria', subtitle='2012-14')+
theme_classic()
# plot monthly electrical consumption, vic_elect
fig6<-vic_elec%>%
index_by(Year_Month = ~ yearmonth(.))%>%
summarise(Monthly_Totals = sum(Demand))%>%
autoplot(Monthly_Totals, color='steelblue')+
labs(title = 'Figure 6. Monthly Total Electricity Demand, Victoria', subtitle='2012-14')+
theme_classic()
fig5/fig6Daily totals for Victorian electricity demand (2012-14) are displayed in Figure 5 and include an obvious seasonal signal, with increased demand during summer/winter months. There do not appear to be any systematic variations in the seasonal signal across time. A plot of monthly totals (Figure 6) supports this view and further indicates that this series would not benefit from mathematical transformation
# Plot australian gas production
fig7 <-aus_production%>%
autoplot(Gas, color='steelblue')+
labs(title = "Figure 7. Australian Gas Production")+
theme_classic()
#Apply box-cox transformation. The following code comes directly from our textbook
lambda <- aus_production %>%
features(Gas, features = guerrero) %>%
pull(lambda_guerrero)
fig8<-aus_production %>%
autoplot(box_cox(Gas, lambda), color = 'steelblue') +
labs(y = "", title = "Figure 8. Austrailian Gas Production", subtitle="Lambda Transformed: 0.12")+
theme_classic()
fig7/fig8The Australian gas production series shows increasing variance in the seasonal signal over time (Figure 7). A log transformation of the production totals is a good choice for reducing this variance. I’ve used a Box-Cox transformation (lambda=0.12) to achieve the same effect, Figure 8.
Why is a Box-Cox transformation unhelpful for the canadian_gas data?
Box-Cox is effective when there is a monotonic change in the variance of a time series. That is not the case for the Canadian gas data. The seasonal variance is moderate during the first and last 10-15 yrs of record and is higher between these periods (Figure 9). Applying a box-cox transformation to the data yields little change in the variance (Figure 10).
glimpse(canadian_gas)## Rows: 542
## Columns: 2
## $ Month <mth> 1960 Jan, 1960 Feb, 1960 Mar, 1960 Apr, 1960 May, 1960 Jun, 196~
## $ Volume <dbl> 1.4306, 1.3059, 1.4022, 1.1699, 1.1161, 1.0113, 0.9660, 0.9773,~
fig9<-canadian_gas%>%
autoplot(Volume, color = 'steelblue')+
labs(title = "Canadian Gas Production: Untransformed")+
theme_classic()
# Apply Box Cox transformation
lambda <- canadian_gas %>%
features(Volume, features = guerrero) %>%
pull(lambda_guerrero)
fig10<-canadian_gas %>%
autoplot(box_cox(Volume, lambda), color = 'steelblue') +
labs(y = "", title = "Figure 9. Canadian Gas Production", subtitle="Lambda Transformed: 0.39")+
theme_classic()
# try log as dbl check -- not printed
temp_log<-canadian_gas %>%
mutate(log_vol = log(Volume))%>%
autoplot(log_vol, color = 'steelblue') +
labs(y = "", title = "Canadian Gas Production", subtitle="Lambda Transformed: 0.39")+
theme_classic()
# try sqrt as dbl chk -- not printed
temp_sqrt<-canadian_gas %>%
mutate(sqrt_vol = sqrt(Volume))%>%
autoplot(sqrt_vol, color = 'steelblue') +
labs(y = "", title = "Canadian Gas Production", subtitle="Lambda Transformed: 0.39")+
theme_classic()
#print fig9
fig9/fig10What Box-Cox transformation would you select for your retail data (from Exercise 8 in Section 2.10)?
I would select a log transformation (Figure 11). It is the obvious choice for data in which the variance is increasing over time. A lambda calculation based on the data also results in a log transform (Figure 12).
# import retail data and create initial time series plot
set.seed(12345678)
series <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
series%>%
autoplot(color='steelblue')+
labs(title="Retail Turnover ($Million AUD) by Month in Australia")+
theme_classic()# Replot using the log of turnover
fig11<-series%>%
mutate(log_turnover = log(Turnover))%>%
autoplot(log_turnover, color = 'steelblue') +
labs(y = "", title = "Figure 11. Retail Turnover ($Million AUD) by Month in Australia", subtitle= "Log Transformed")+
theme_classic()
# look at boxcox
t_lambda <- series %>%
features(Turnover, features = guerrero) %>%
pull(lambda_guerrero)
fig12<-series %>%
autoplot(box_cox(Turnover, t_lambda), color = 'steelblue') +
labs(y = "", title = "Figure 12. Retail Turnover ($Million AUD) by Month in Australia", subtitle=" Box-Cox: lambda =-0.022")+
theme_classic()
fig11/fig12For the following series, find an appropriate Box-Cox transformation in order to stabilize the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from Ansett, and Pedestrian counts at Southern Cross Station from pedestrian.
# plot each dataset
fig13<-aus_production%>%
select(Tobacco)%>%
autoplot(Tobacco, color='steelblue')+
labs(title='Figure 13. Austrailian Tobacco Production (Tonnes)')+
theme_classic()
fig14<-ansett%>%
filter(Class %in% 'Economy', Airports %in% 'MEL-SYD')%>%
autoplot(Passengers, color = 'steelblue')+
labs(title='Figure 14. Total Passengers Traveling Ansett Airlines')+
theme_classic()
fig15<-pedestrian%>%
filter(Sensor %in% "Southern Cross Station")%>%
autoplot(Count, color='steelblue', alpha=0.5)+
labs(title='Figure 15. Pedestrian Counts at the Southern Cross Station, Melbourne AU')+
theme_classic()
fig13/fig14/fig15 The following Box Cox transformations balance the variance in our datasets
Australian Tobacco (Figure 16): lambda = 0.92 (close to 1: no transform).
Passengers on Ansett Airlines (Figure 17): lambda = 1.99 (quadratic transform).
Pedestrian Counts at Southern Cross Station (Figure 18): lambda = -0.225 (log transform or possibly a square root transform).
#1
aus_lambda <- aus_production%>%
features(Tobacco, features = guerrero) %>%
pull(lambda_guerrero)
fig16<-aus_production%>%
autoplot(box_cox(Tobacco, aus_lambda), color = 'steelblue') +
labs(y = "", title = "Figure 16. Australian Tobacco Production", subtitle="Lambda = 0.92")+
theme_classic()
#2 why am I getting multiple values???
ansett_lambda <- ansett%>%
filter(Class %in% 'Economy', Airports %in% 'MEL-SYD')%>%
features(Passengers, features = guerrero) %>%
pull(lambda_guerrero)
fig17<-ansett%>%
filter(Class %in% 'Economy', Airports %in% 'MEL-SYD')%>%
autoplot(box_cox(Passengers, ansett_lambda), color = 'darkred') +
labs(y = "", title = "Figure 17. Passengers on Ansett Airlines", subtitle="Lambda = 1.99")+
theme_classic()
#3
ped_lambda <- pedestrian%>%
filter(Sensor %in% "Southern Cross Station")%>%
features(Count, features = guerrero) %>%
pull(lambda_guerrero)
fig18<-pedestrian%>%
filter(Sensor %in% "Southern Cross Station")%>%
autoplot(box_cox(Count, ped_lambda), color = 'steelblue') +
labs(y = "", title = "FIgure 18. Pedestrian Counts at Southern Cross Station", subtitle="Lambda = -0.225")+
theme_classic()
fig16/fig17/fig18Consider the last five years of the Gas data from aus_production.
gas <- tail(aus_production, 5*4) %>% select(Gas)
Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?
Yes there is a seasonal fluctuation with an annual period of rising (Q1&Q2) and falling (Q3&Q4) values. There is also an upward trend in the data over the period of record.
Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices. Do the results support the graphical interpretation from part A?
Yes, the trend_cycle is clearly increasing. And there is an obvious pattern in the seasonal component of the decomposition.
Compute and plot the seasonally adjusted data. Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?
Addition of an outlier in the middle of the time series results in a moderate loss of signal in the seasonal component and a significant loss of signal in the trend component.
Does it make any difference if the outlier is near the end rather than in the middle of the time series?
Addition of an outlier near the end of the series results in loss of signal in the seasonal component and a very small signal in the trend component. It’s worth noting that classical decomposition does not include portions of the data (trend, random) at the beginning/end of the time series (in this instance: 2005-Q3/Q4 & 2010 - Q1/Q2), yet provides an estimate of the seasonal component in these positions.
#Collect data and build plots
gas <- tail(aus_production, 5*4) %>% select(Gas) # 4 represents quarters in this dataset
fig19<-gas%>%
autoplot(color = 'steelblue')+
labs(title='Figure 19. Australian Gas Production')+
theme_classic() # definitely trend-cycle and seasonality
#classical decomposition
classic<-gas%>%
model(classical_decomposition(Gas ~ season(4), type = "mult"))%>% # from feasts package
components()
fig20<-classic%>%
autoplot(color='steelblue') +
labs(title = "Figure 20. Classical decomposition of Quarterly
Australian Gas Production", subtitle = 'Multiplicative')+
theme_classic()
#seasonal adjustment # note that classic computes trend, seasonal, random and season_adjust
adjust_season<-classic%>%
select(Quarter, season_adjust)
fig21<-adjust_season%>%
autoplot(season_adjust, color='steelblue')+
labs(title='Figure 21. Seasonally Adjusted Austrailian Gas Production')+
theme_classic()
# Change one observation to be an outlier, and recompute the seasonally adjusted data
mid_out <- gas
mid_out$Gas[10] <- mid_out$Gas[10] + 300
fig22<-mid_out%>%
model(classical_decomposition(Gas ~ season(4), type = "mult"))%>%
components()%>%
select(Quarter, season_adjust)%>%
autoplot(season_adjust, color='steelblue')+
labs(title='Figure 22. Seasonally Adjusted Australian Gas Production with Outlier', subtitle='Outlier at Center of Time Series')+
theme_classic()
# place outlier at end of series and recompute seasonally adjusted data
end_out <- gas
end_out$Gas[20] <- end_out$Gas[20] + 300
fig23<-end_out%>%
model(classical_decomposition(Gas ~ season(4), type = "mult"))%>%
components()%>%
select(Quarter, season_adjust)%>%
autoplot(season_adjust, color='steelblue')+
labs(title='Figure 23. Seasonally Adjusted Australian Gas Production with Outlier', subtitle='Outlier at End of Time Series')+
theme_classic()
fig19fig20fig21/fig22/fig23Recall your retail time series data (from Exercise 8 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?
# Decompose using X11
x_11<-aus_retail%>%
filter(State == 'Victoria' & Industry == 'Pharmaceutical, cosmetic and toiletry goods retailing')%>%
model(test=X_13ARIMA_SEATS(Turnover ~ x11())) %>%
components()
fig24<-autoplot(x_11,color='steelblue') +
labs(title =
"Decomposition of Total Australian Retail Employment Using X-11.",
subtitle = 'Figure 24. Pharmaceutical, Cosmetic and Toiletry Goods Retailing: Time Series Decomposition')+
theme_classic()
#identify specific outliers
df<-x_11%>%as.data.frame()
fig25<-df%>%ggplot(aes(y=irregular))+
geom_boxplot(fill = "#0c4c8a", alpha=0.5)+
labs(title = 'Figure 25. Box plot of Irregular Component')+
theme_classic()
df%>%
filter(irregular > 1.15) # july 1991, irregular=1.16, Turnover=65.1fig24|fig25Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.
Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.
There is a relatively smooth, linear, upward trend in the data (value, trend). Variance within the seasonal component appears to be stationary. The residuals appear randomly distributed with the exception of several large outliers around 1991. Based on the graph scales, the seasonal component and residual components comprise a small fraction of the variation (~10%) in the time series.
Is the recession of 1991/1992 visible in the estimated components?
Yes, as indicated in the previous response, the large negative outliers in the residuals provide clear evidence of the 1991-92 recession.