Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?
global_economy %>% autoplot(GDP/Population, show.legend = FALSE)
## Warning: Removed 3242 rows containing missing values or values outside the scale range
## (`geom_line()`).
highest_GDP_country <- global_economy %>%
mutate(GDPC = GDP / Population) %>%
filter(GDPC == max(GDPC, na.rm = TRUE)) %>%
pull(Country)
highest_GDP_country
## [1] Monaco
## 263 Levels: Afghanistan Albania Algeria American Samoa Andorra ... Zimbabwe
global_economy |>
filter(Country == "Monaco")|>
autoplot(GDP / Population)
## Warning: Removed 11 rows containing missing values or values outside the scale range
## (`geom_line()`).
Monaco has the highest GDP per capita in the data set, peaking at
$185,152.53 in 2014. Its GDP per capita has grown strongly over the
sample period and appears to have held up, and even grown, during
periods of global economic distress.
For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.
United States GDP from global_economy. Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock. Victorian Electricity Demand from vic_elec. Gas production from aus_production.
global_economy |>
filter(Code == "USA")|>
autoplot(GDP )
global_economy |>
filter(Code == "USA")|>
autoplot(GDP / Population )
In this case I chose to transform the data into GDP per capita by
dividing GDP by total population. This transformation provides a
clearer picture of the US economy relative to its population.
aus_livestock|>
filter(Animal == "Bulls, bullocks and steers", State =="Victoria")|>
autoplot()
## Plot variable not specified, automatically selected `.vars = Count`
vic_elec %>% autoplot(Demand)
aus_production %>% autoplot(Gas)
lambda <- aus_production |>
features(Gas, features = guerrero) |>
pull(lambda_guerrero)
aus_production |>
autoplot(box_cox(Gas, lambda)) +
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed gas production with $\\lambda$ = ",
round(lambda,2))))
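To make the role of lambda concrete, here is a minimal base R sketch of the standard one-parameter Box-Cox definition for positive data. The helper name `box_cox_manual` is hypothetical and is not part of fpp3; for positive values it should agree with `box_cox()`, but it is only an illustration.

```r
# Hypothetical helper (not part of fpp3): Box-Cox transform for positive data.
box_cox_manual <- function(y, lambda) {
  stopifnot(all(y > 0, na.rm = TRUE))
  if (abs(lambda) < 1e-8) {
    log(y)                    # lambda = 0 is defined as the natural log
  } else {
    (y^lambda - 1) / lambda   # power transform, shifted and rescaled
  }
}

y <- c(1, 10, 100)
box_cox_manual(y, 0)    # same values as log(y)
box_cox_manual(y, 1)    # y - 1: the shape of the series is unchanged
box_cox_manual(y, 0.5)  # compresses large values more than small ones
```

A lambda near 1 leaves the series essentially untouched, while a lambda near 0 behaves like a log and shrinks the variation at high levels the most.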
Why is a Box-Cox transformation unhelpful for the canadian_gas data?
autoplot(canadian_gas)
## Plot variable not specified, automatically selected `.vars = Volume`
lamb_can_gas <- canadian_gas |>
features(Volume, features = guerrero) |>
pull(lambda_guerrero)
canadian_gas |>
autoplot(box_cox(Volume,lamb_can_gas))+
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed gas production with $\\lambda$ = ",
round(lamb_can_gas,2))))
At this scale the transformed series looks much like a log transformation, so let's try the log directly:
canadian_gas |>
autoplot(log(Volume))+
labs(y = "Log Gas volume",
title =
"Transformed gas production with log")
Neither transformation stabilises the variance. The seasonal variation in canadian_gas is small at the start, largest in the middle of the series, and smaller again towards the end, so it does not change steadily with the level of the series, which is the pattern a Box-Cox transformation is designed to correct.
What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?
set.seed(1399118)
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseries |>
autoplot(Turnover)
my_lambda <- myseries |>
features(Turnover, features = guerrero) |>
pull(lambda_guerrero)
myseries |>
autoplot(box_cox(Turnover,my_lambda))+
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed Turnover with $\\lambda$ = ",
round(my_lambda,2))))
Applying a Box-Cox transformation with the lambda selected by the Guerrero
method makes the size of the seasonal variation more even across the series.
For the following series, find an appropriate Box-Cox transformation in order to stabilize the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.
aus_production |>
select(Tobacco)|>
autoplot()
## Plot variable not specified, automatically selected `.vars = Tobacco`
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).
lambda_tobacco <- aus_production |>
features(Tobacco, features = guerrero) |>
pull(lambda_guerrero)
aus_production |>
autoplot(box_cox(Tobacco, lambda_tobacco))+
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed Tobacco with $\\lambda$ = ",
round(lambda_tobacco,2))))
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).
ansett2<- ansett|>
filter(Class =="Economy",Airports == "MEL-SYD")
ansett2|>autoplot(Passengers)
ansett_lambda<- ansett2|>
features(Passengers, guerrero)|>
pull(lambda_guerrero)
ansett2|>
autoplot(box_cox(Passengers, ansett_lambda))+
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed Passengers with $\\lambda$ = ",
round(ansett_lambda,2))))
scs_ped<- pedestrian |>
filter(Sensor == "Southern Cross Station")
scs_ped|>autoplot(Count)
Consider the last five years of the Gas data from aus_production.
gas <- tail(aus_production, 5*4)
gas|>autoplot(Gas)
Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?
Gas production shows a strong yearly seasonal pattern with period m = 4: the peaks fall in Q3 of each year and the troughs in Q1. There is also an upward trend over the five years.
Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.
gas|>
model(
classical_decomposition(Gas, type = "multiplicative")
) |>
components() |>
autoplot() +
labs(title = "Classical Multiplicative Decomposition of AUS Gas Production 2005-2010",
y = "Gas")
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).
Do the results support the graphical interpretation from part a? Yes. The seasonal component peaks in Q3 and is lowest in Q1 of every year, and the trend-cycle rises over the period, as described in part a.
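As a cross-check, the classical multiplicative calculation can be sketched by hand in base R. The Gas values are copied from the components table printed in this document; `gas_values`, `trend`, `detrended`, and `seasonal_index` are illustrative names, and the results are not guaranteed to match `classical_decomposition()` digit for digit.

```r
# Quarterly Gas values, 2005 Q3 to 2010 Q2 (copied from the printed table)
gas_values <- c(221, 180, 171, 224, 233, 192, 187, 234, 245, 205,
                194, 229, 249, 203, 196, 238, 252, 210, 205, 236)
gas_ts <- ts(gas_values, start = c(2005, 3), frequency = 4)

# Trend-cycle: centred 2x4 moving average (first and last two values are NA)
trend <- stats::filter(gas_ts, c(0.5, 1, 1, 1, 0.5) / 4, sides = 2)

# Detrend by division, because the decomposition is multiplicative
detrended <- as.numeric(gas_ts) / as.numeric(trend)

# Seasonal indices: average the detrended ratios by calendar quarter,
# then rescale so the four indices average to 1
quarter <- as.numeric(cycle(gas_ts))
seasonal_index <- tapply(detrended, quarter, mean, na.rm = TRUE)
seasonal_index <- seasonal_index / mean(seasonal_index)

round(seasonal_index, 3)
```

An index above 1 marks a quarter that sits above the trend-cycle, and an index below 1 marks a quarter that sits below it.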
Compute and plot the seasonally adjusted data. Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?
dcmp_gas <- gas |>
model(stl = STL(Gas))
components(dcmp_gas)
## # A dable: 20 x 7 [1Q]
## # Key: .model [1]
## # : Gas = trend + season_year + remainder
## .model Quarter Gas trend season_year remainder season_adjust
## <chr> <qtr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 stl 2005 Q3 221 193. 26.9 0.856 194.
## 2 stl 2005 Q4 180 197. -16.7 -0.109 197.
## 3 stl 2006 Q1 171 200. -25.7 -3.59 197.
## 4 stl 2006 Q2 224 204. 15.4 4.86 209.
## 5 stl 2006 Q3 233 207. 27.0 -1.03 206.
## 6 stl 2006 Q4 192 210. -16.7 -1.33 209.
## 7 stl 2007 Q1 187 213. -25.5 -0.550 213.
## 8 stl 2007 Q2 234 216. 15.1 2.60 219.
## 9 stl 2007 Q3 245 219. 27.0 -0.730 218.
## 10 stl 2007 Q4 205 219. -16.6 2.55 222.
## 11 stl 2008 Q1 194 219. -25.3 0.562 219.
## 12 stl 2008 Q2 229 219. 14.8 -4.59 214.
## 13 stl 2008 Q3 249 219. 27.0 2.98 222.
## 14 stl 2008 Q4 203 220. -16.6 -0.834 220.
## 15 stl 2009 Q1 196 222. -25.0 -0.740 221.
## 16 stl 2009 Q2 238 223. 14.5 0.341 223.
## 17 stl 2009 Q3 252 225. 27.1 -0.132 225.
## 18 stl 2009 Q4 210 226. -16.5 0.986 227.
## 19 stl 2010 Q1 205 226. -24.8 4.25 230.
## 20 stl 2010 Q2 236 225. 14.2 -3.62 222.
# Correctly modify the Gas value for a specific quarter
gas_modified <- gas |>
mutate(Gas = replace(Gas, Quarter == yearquarter("2007 Q3"),545))
dcmp_gas <- gas_modified |>
model(stl = STL(Gas))
components(dcmp_gas)|>
as_tsibble() |>
autoplot(Gas, colour = "gray") +
geom_line(aes(y=season_adjust), colour = "#0072B2") +
labs(y = "Volume",
title = "Gas Produced in AUS 2005-2010")
Does it make any difference if the outlier is near the end rather than in the middle of the time series?
# The series ends in 2010 Q2, so the outlier goes there (236 + 300 = 536)
gas_modified <- gas |>
mutate(Gas = replace(Gas, Quarter == yearquarter("2010 Q2"), 536))
dcmp_gas <- gas_modified |>
model(stl = STL(Gas))
components(dcmp_gas)|>
as_tsibble() |>
autoplot(Gas, colour = "gray") +
geom_line(aes(y=season_adjust), colour = "#0072B2") +
labs(y = "Volume",
title = "Gas Produced in AUS 2005-2010")
As the plots show, the position of the outlier in the time series does affect the decomposition and the seasonally adjusted series.
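The same comparison can be sketched with base R's `decompose()`, which fits a classical multiplicative decomposition rather than the STL model used above, so the numbers will differ from the plots. The helper `seasonal_with_outlier` is a hypothetical name introduced for this illustration; the Gas values are copied from the printed components table.

```r
gas_values <- c(221, 180, 171, 224, 233, 192, 187, 234, 245, 205,
                194, 229, 249, 203, 196, 238, 252, 210, 205, 236)
gas_ts <- ts(gas_values, start = c(2005, 3), frequency = 4)

# Hypothetical helper: optionally add `bump` to observation `i`, then return
# the seasonal indices of a classical multiplicative decomposition.
seasonal_with_outlier <- function(x, i = NULL, bump = 300) {
  if (!is.null(i)) x[i] <- x[i] + bump
  decompose(x, type = "multiplicative")$figure
}

base_idx   <- seasonal_with_outlier(gas_ts)          # no outlier
middle_idx <- seasonal_with_outlier(gas_ts, i = 9)   # outlier in 2007 Q3
end_idx    <- seasonal_with_outlier(gas_ts, i = 20)  # outlier in 2010 Q2

# Largest change in any seasonal index caused by each outlier
shift_middle <- max(abs(middle_idx - base_idx))
shift_end    <- max(abs(end_idx - base_idx))
c(middle = shift_middle, end = shift_end)
```

In this sketch the outlier at the end disturbs the seasonal indices less than the one in the middle. The centred moving average has no trend estimate for the last two quarters, so the outlying value never enters the detrended ratios; it still appears in full in the seasonally adjusted series.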
Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?
x11_dcmp <- myseries |>
model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) |>
components()
autoplot(x11_dcmp) +
labs(title =
"Decomposition of Turnover using X-11.")
x11_dcmp |>
gg_subseries(seasonal) +
labs(title = "X11 Seasonal Component")
Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation. Is the recession of 1991/1992 visible in the estimated components?
https://otexts.com/fpp3/fpp_files/figure-html/labour-1.png
https://otexts.com/fpp3/fpp_files/figure-html/labour2-1.png
The original series has a generally upward trajectory with some fluctuations. The trend component captures this steady rise and, because it is smooth, does not show the recession. The seasonal component does not show it either. The remainder component, however, has a marked dip in 1991/92, which is where the recession appears.