global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?Remove rows where records have NA values in GDP, and plot the remaining records for all countries that recorded GDP over time.
Upon visual inspection of the graph above, Monaco has consistently the highest GDP per capita. However, since about 2010 Liechtenstein has been giving Monaco a run for the first place position, and in 2015 it briefy held the top spot.
ggplotly(
autoplot(global_economy, GDP/Population, na.rm = TRUE) +
theme(legend.position = "none"))
global_economy.aus_livestock.vic_elec.aus_production.The global_economy dataset was filtered for the United States GDP only, then the y-axis was transformed to scale in Billions of dollars.
US <-global_economy %>%
filter(Country == "United States") %>% # filter for just US GDP
mutate(GDP = GDP / 1e9) # divide Cost by 1B for y-axis scaling
autoplot(US,GDP) +
labs(y = "GDP in $US (B)",
title = "US GDP")
The aus_livestock dataset was filtered for “Bulls, bullocks and steers”, then the y-axis was transformed to scale in the thousands, and the graphics were individually scaled and facet wrapped for better readability within each Australian territory.
AU <-aus_livestock %>%
filter(Animal == "Bulls, bullocks and steers") %>% # filter for Bulls, bullocks and steers
mutate(Count = Count / 1e3) # divide Cost by 1K for y-axis scaling
autoplot(AU,Count) +
labs(y = "Slaughtered ('000)",
title = "Slaughter of Australian bulls, bullocks and steers") +
theme(legend.position = "none") +
facet_wrap(State ~ ., nrow = 2, ncol = 4, scales = "free_y", strip.position = "bottom")
The vic_elec dataset was simply graphed to show the trend and seasonality of electricity demand over time.
VIC <-vic_elec
ggplotly(autoplot(VIC,Demand) +
labs(y = "Demand",
title = "Victoria electricity demand"))
The aus_production dataset was transformed with Box-Cox by finding the optimal lambda of 0.1205 for the transformation. 0.1205 is closest to 0.0, which indicates a log transform in Box-Cox. Then the y-axis was log transformed with Box-Cox and plotted.
lambda <- aus_production %>%
features(Gas, features = guerrero) %>%
pull(lambda_guerrero)
aus_production %>%
autoplot(box_cox(Gas,lambda)) +
labs(y = "",
title = paste0("Transformed Australian gas production with lambda = ",round(lambda,4)))
canadian_gas data?Box-Cox transformation is unhelpful for the canadian-gas data because, as seen below, the variation in the data shows “bookends” where the variation was less from 1960 to about 1975, then greater from about 1975 until about 1990, and then greatly diminished from 1990 to the present. Because the variation went from small to large to small, the transformations have little overall effect in ‘scaling’ the data that is only larger in the middle.
lambda <- canadian_gas %>%
features(Volume, features = guerrero) %>%
pull(lambda_guerrero)
canadian_gas %>%
autoplot(box_cox(Volume,lambda)) +
labs(y = "Cubic meters (B)",
title = paste0("Monthly Canadian gas production with lambda = ",round(lambda,4)))
This refers to the aus_retail dataset showing retail turnover from 1982 to 2019. The original series is shown below, with an apparent increase in variation increasing from rougly 2008 onward.
myseries <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
autoplot(myseries,Turnover) +
labs(title = "Turnover by Month")
The Box-Cox transformation suggests that lambda is -0.1028, which is closest to a log transform, but since it is a negative lambda it would be a reciprocal log transform. The effect can be seen in the scale of the y-axis, and the seasonal variation does seem to be more uniformly distributed year-over-year.
set.seed(54321)
lambda <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`,1)) %>%
features(Turnover, features = guerrero) %>%
pull(lambda_guerrero)
aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`,1)) %>%
autoplot(box_cox(Turnover,lambda)) +
labs(y = "Turnover",
title = paste0("Monthly Australian retail turnover with lambda = ",round(lambda,4)))
Tobacco from aus_production raw data looks like this, note the scale from approx 4000 to 8500 tonnes.
aus_production %>%
autoplot(Tobacco) +
labs(y = "",
title = "Australian tobacco production")
The Box-Cox transformation shows a lambda of 0.9289, which suggests no transform is necessary, although there is a possibility that a square root transform could be applied since the lambda is less than 1.0. Plotting the transform with lambda of 0.9289 shows that the y-axis scale has been adjusted downward, albeit not by a square root method.
lambda <- aus_production %>%
features(Tobacco, features = guerrero) %>%
pull(lambda_guerrero)
aus_production %>%
autoplot(box_cox(Tobacco,lambda)) +
labs(y = "",
title = paste0("'Transformed' Australian tobacco production with lambda = ",round(lambda,4)))
Economy class passengers between Melbourne and Sydney from ansett
Economy passenger volumes between Melbourne (MEL) and Sydney (SYD) on Ansett airlines raw data looks like this, note the scale from approx 0 to 33000 passengers.
ansett %>%
filter(Class == "Economy", Airports == "MEL-SYD") %>%
autoplot(Passengers) +
labs(y = "# Passengers",
title = "Ansett airlines Economy volumes MEL-SYD")
The Box-Cox transformation shows a lambda of 1.9999, which suggests a \(y^2\) transformation is necessary. Plotting the transform with lambda of 1.9999 shows the effect on the y-axis.
lambda <- ansett %>%
filter(Class == "Economy", Airports == "MEL-SYD") %>%
features(Passengers, features = guerrero) %>%
pull(lambda_guerrero)
ansett %>%
filter(Class == "Economy", Airports == "MEL-SYD") %>%
autoplot(box_cox(Passengers,lambda)) +
labs(y = "",
title = paste0("Transformed Ansett airlines Economy volumes MEL-SYD with lambda = ",round(lambda,4)))
Pedestrian counts at Southern Cross Station from pedestrian.
Pedestrian counts at Southern Cross Station raw data looks like this, note the scale from approx 0 to 3500 pedestrians.
pedestrian %>%
filter(Sensor == "Southern Cross Station") %>%
autoplot(Count) +
labs(y = "# Pedestrians.",
title = "Hourly Pedestrian counts at Southern Cross Station")
The above graph is very hard to interpret with the total dataset “as is” so additional specificity is needed. Adding a field for Day of the week and setting it as an ordered factor Mon through Sun, the graph below is easier to interpret for Souther Cross Station pedestrian traffic by hour of the day (0 = midnight to 2300 = 11PM) faceted for each day of the week. It is possible to see that there are peak periods of the day during the morning and evening commutes during the week, as well as reduced traffic overall on Sat an Sun, with the exception of some outlying revelers Sat night into the wee hours of Sun morning. Note how the traffic volumes literally change the y-axis scales from weekday to weekend.
pedestrian <- pedestrian %>%
filter(Sensor == "Southern Cross Station") %>%
mutate(Day = wday(Date, label = TRUE)) %>%
mutate(Day = factor(Day, levels = levels(Day)[c(2:7, 1)]))
ggplot(pedestrian, aes(group = Time, x = Time, y = Count)) +
geom_boxplot() +
facet_wrap(Day ~ ., nrow = 4, ncol = 2, scales = "free_y", strip.position = "bottom") +
labs(title = "Southern Cross Station pedestrian traffic by hour each day", x = "Hour (2400 clock)")
The Box-Cox transformation shows a lambda of -0.2255 on the hourly by day Southern Cross Station data, which suggests a reciprocal log transformation. Plotting the transform with lambda of -0.2255 shows the variance more evenly distributed throughout each hour of each day.
lambda <- pedestrian %>%
filter(Sensor == "Southern Cross Station") %>%
mutate(Day = wday(Date, label = TRUE)) %>%
mutate(Day = factor(Day, levels = levels(Day)[c(2:7, 1)])) %>%
features(Count, features = guerrero) %>%
pull(lambda_guerrero)
pedestrian %>%
filter(Sensor == "Southern Cross Station") %>%
mutate(Day = wday(Date, label = TRUE)) %>%
mutate(Day = factor(Day, levels = levels(Day)[c(2:7, 1)])) %>%
autoplot(box_cox(Count,lambda)) +
labs(title = paste0("Hourly Pedestrian counts at Southern Cross Station with lambda = ",round(lambda,4))) +
facet_wrap(Day ~ ., nrow = 4, ncol = 2, scales = "free_y", strip.position = "bottom")
aus_production.gas <- tail(aus_production, 5*4) %>% select(Gas)
There are seasonal fluctuations in which Q1 starts low, Q2 ramps up, Q3 peaks and Q4 falls back off in terms of production. There is also an overall year-over-year upward trend showing each year starting higher than the prior year.
gas <- tail(aus_production, 5*4) %>% select(Gas)
autoplot(gas,Gas)
classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.gas %>%
model(
classical_decomposition(Gas, type = "multiplicative")
) %>%
components() %>%
autoplot() +
labs(title = "Classical multiplicative decomposition of Australian gas production 2006-2010")
Yes the results support the graphical interpretation of an increasing trend and the shape of the seasonal cycles matches what was inferred from the original graphic.
Saving the first part of the decomposition code above to a variable, it is possible to extract the components of the decomposition from the variable as a tsibble object and then plot the Gas data in blue against the seasonally adjusted trend in red. The seasonal adjustment is similar to but slightly different from the trend in the decomposition above, since the seasonal adjustment is the trend combined with the data in the random component of the decomposition.
gas_dcmp <- gas %>%
model(
classical_decomposition(Gas, type = "multiplicative"))
components(gas_dcmp) %>%
as_tsibble() %>%
autoplot(Gas, color = "blue") +
geom_line(aes(y=season_adjust), color = "red")
Adding 300 to the 2007Q1 value and copying down the code to compute the seasonally adjusted data causes both the original data in blue and the seasonally adjusted line in red to spike simultaneously at the instance of the outlier.
gas$Gas[7] <- 300
gas_dcmp <- gas %>%
model(
classical_decomposition(Gas, type = "multiplicative"))
components(gas_dcmp) %>%
as_tsibble() %>%
autoplot(Gas, color = "blue") +
geom_line(aes(y=season_adjust), color = "red")
Having the outlier near the end rather than in the middle of the time series does not appear to make much difference. Both the original blue and seasonally adjusted red lines spike in either location.
gas <- tail(aus_production, 5*4) %>% select(Gas) # reset the data
gas$Gas[18] <- 300
gas_dcmp <- gas %>%
model(
classical_decomposition(Gas, type = "multiplicative"))
components(gas_dcmp) %>%
as_tsibble() %>%
autoplot(Gas, color = "blue") +
geom_line(aes(y=season_adjust), color = "red")
This again refers to the aus_retail dataset showing retail turnover from 1982 to 2019. The original series is shown once more below.
myseries <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
autoplot(myseries,Turnover) +
labs(title = "Turnover by Month")
Decomposing using the X-11 method improves upon the classical method above in terms of showing finer detail in the trend, seasonal and irregular components of the decomposition overall. The trend shows more jagged turns. Also there is a scalloped pattern at the top of the seasonal peaks which is typical of seasonality, and the irregularites such as the one seen just after Jan 2000 are localized to the time period in which they occurred, rather than allowing the randomness to bleed into other time periods as the classical decomposition appears to allow.
retail_dcmp <- myseries %>%
model(
x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
components()
autoplot(retail_dcmp) +
labs(title = "Decomposition of Australian retail turnover using X-11")
The Australian civilian labor force has seen continuous growth from Feb 1978 to Aug 1995 as evidenced by the steadily increasing trend line in the decomposition. The variance in annual seasonal cycles continues to increase year over year as seen in the season_year section, showing slightly lower drops in the bottom of the cycle and higher peaks nearer to the top of each cycle. The remainder of the data indicates random peaks and dips, with a particularly prominent dip in 1991/1992. The scale of the trend matches that of the total value of the labor force data. However, the scale of the seasonality is smaller than the remainder of the data, indicating that seasonality plays a very small role in the overall decomposition.
The recession of 1991/1992 is visible in the estimated components, specifically in the remainder section as noted in the above comments. It is also reflected both in the drop in the total value of the labor data and the pleateau in the trend during the same period.