All libraries needed for the Homework
library(fpp3)
## Warning: package 'fpp3' was built under R version 4.3.3
## Registered S3 method overwritten by 'tsibble':
## method from
## as_tibble.grouped_df dplyr
## -- Attaching packages -------------------------------------------- fpp3 1.0.0 --
## v tibble 3.2.1 v tsibble 1.1.5
## v dplyr 1.1.2 v tsibbledata 0.4.1
## v tidyr 1.3.0 v feasts 0.3.2
## v lubridate 1.9.2 v fable 0.3.4
## v ggplot2 3.5.1 v fabletools 0.4.2
## Warning: package 'ggplot2' was built under R version 4.3.3
## Warning: package 'tsibble' was built under R version 4.3.3
## Warning: package 'tsibbledata' was built under R version 4.3.3
## Warning: package 'feasts' was built under R version 4.3.3
## Warning: package 'fabletools' was built under R version 4.3.3
## Warning: package 'fable' was built under R version 4.3.3
## -- Conflicts ------------------------------------------------- fpp3_conflicts --
## x lubridate::date() masks base::date()
## x dplyr::filter() masks stats::filter()
## x tsibble::intersect() masks base::intersect()
## x tsibble::interval() masks lubridate::interval()
## x dplyr::lag() masks stats::lag()
## x tsibble::setdiff() masks base::setdiff()
## x tsibble::union() masks base::union()
library(forecast)
## Warning: package 'forecast' was built under R version 4.3.3
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
library(tidyverse)
## -- Attaching core tidyverse packages ------------------------ tidyverse 2.0.0 --
## v forcats 1.0.0 v readr 2.1.4
## v purrr 1.0.1 v stringr 1.5.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x tsibble::interval() masks lubridate::interval()
## x dplyr::lag() masks stats::lag()
## i Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(lubridate)
library(tsibble)
library(pracma)
## Warning: package 'pracma' was built under R version 4.3.3
##
## Attaching package: 'pracma'
##
## The following object is masked from 'package:purrr':
##
## cross
3.1 - Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. How has this changed over time?
# I will take the first 20 countries and calculate their mean cap per GDP over a period of time
first_20_countries <- as_tibble(global_economy) %>%
mutate(GDP_per_cap = GDP/Population) %>%
group_by(Country) %>%
summarize(mean_GDP_per_cap = mean(GDP_per_cap)) %>%
arrange(desc(mean_GDP_per_cap)) %>%
head(20)
#I will now plot the GDP per capita over time of the first 20 countries
as_tibble(global_economy) %>%
filter(Country %in% first_20_countries[[1]]) %>%
mutate(GDP_per_cap = GDP/Population) %>%
group_by(Country) %>%
mutate(mean_GDP_per_cap = mean(GDP_per_cap)) %>%
ungroup() %>%
ggplot(aes(x=Year, y=GDP_per_cap, color=Country)) +
geom_line() +
labs(title = "GDP Per Capita For First 20 Countries",
x= "Year",
y = "GDP Per Capita")
a).Which country has the highest GDP per capita?
a).Netherlands had the highest GDP per capita.
b).How has this changed over time?
b).There was no country with a clear lead, however, since 1985 Netherlands started pulling away from other countries to take a significant lead as the country with the highest GDP per capita. It still maintained its lead into the 2010’s.
3.2 - For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.
a). United States GDP from global_economy. b). Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock. c). Victorian Electricity Demand from vic_elec. d). Gas production from aus_production.
a).United States GDP from global_economy
#filter for US and calculate GDP
global_economy %>% filter(Country == "United States") %>% autoplot(GDP)
#graph the original data
global_economy %>%
filter(Country == "United States") %>%
autoplot(GDP/Population) +
labs(title= "GDP per capita", y = "$US")
#transforming data to cube root and graphing it
global_economy %>%
filter(Country == "United States") %>%
autoplot( nthroot(GDP,3)) + ggtitle("Cube Root")
Plotting the original data, I saw that it took the shape of an upward parabola. After performing the cube root transformation, I observed that the parabola flattened somewhat, becoming a bit more linear. However, the change was not very significant.
b). Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock
#filter and graph the original data for aus_livestock
aus_livestock %>%
filter(State == "Victoria", Animal == "Bulls, bullocks and steers") %>%
autoplot(Count)
#filter and transform the data, breaking it up into quarters and graphing it
aus_livestock %>%
filter(State == "Victoria", Animal == "Bulls, bullocks and steers") %>%
mutate(Quarter = yearquarter(Month)) %>%
index_by(Quarter) %>%
summarise(Count = sum(Count)) %>%
autoplot(Count)
Transforming this data did not really have a big effect. There were no further patterns that I have seen after I did the transformation.
c). Victorian Electricity Demand from vic_elec.
#graphing the original data
vic_elec %>% autoplot(Demand)
#transforming the time series from every half hour to every month for better interpretability
vic_elec %>%
mutate(month = yearmonth(Time)) %>%
index_by(month) %>%
summarise(Demand = sum(Demand)) %>%
autoplot(Demand)
Originally, the time series showed half-hourly demand for electricity. The time period was to frequent and everything was too close together so it was hard to locate a pattern. I then transformed the time period into months. The time series became more interpertable. I noticed that for every year after the month of January demand was on the rise and peaked at about half-way (June) before trending back down.
d). Gas production from aus_production.
#graphing the original data
aus_production %>% autoplot(Gas)
#transforming the time series using log transformation
aus_production %>%
autoplot(log(Electricity))
Originally, from 1960 to about 1970 the consumption of gas did not show
much of a consistent trend. However, after performing the log
transformation, we can clearly see an upward trend.
3.3 - Why is a Box-Cox transformation unhelpful for the canadian_gas data?
#graphing the original data
canadian_gas %>% autoplot (Volume)
#transforming the time series using Box-Cox transformation
lambda<- canadian_gas %>%
features(Volume,features = guerrero) %>%
pull(lambda_guerrero)
canadian_gas %>%
autoplot(box_cox(Volume,lambda))
I believe that performing the Box-Cox transformation is ineffective. There is no change in variance after the transformation. Furthermore, it still remains that there is higher variance between Jan 1980 and Jan 1990 than between other periods. In conclusion, the Box-Cox transformation does not give us any new information than the original one.
3.4 - What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?
#plotting original data from the textbook
set.seed(5989)
book_series <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
book_series %>%
autoplot()
## Plot variable not specified, automatically selected `.vars = Turnover`
#retrieving lambda from "guerrero" feature to identify which Box-Cox transformation is best
lambda <- book_series %>%
features(Turnover,features = guerrero) %>%
pull(lambda_guerrero)
#plotting with the lambda 0.4348939 of the Box-Cox transformation
book_series%>%
autoplot(box_cox(Turnover, lambda))
We can see that with lambda 0.434 variation is a good variation in
removing variation after performing the Box-Cox transformation
3.5 - For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.
Tobacco from aus_production
#plotting original data
aus_production %>%
autoplot(Tobacco)
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).
#Retrieving a lambda value for Box-Cox transformation
lambda_aus <- aus_production %>%
features(Tobacco,features = guerrero) %>%
pull(lambda_guerrero)
#plotting the transformed Box-Cox transformation with lambda:0.92
aus_production %>%
autoplot(box_cox(Tobacco,lambda_aus))
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).
It looks like lambda of 0.92 did not really stabalize or reduced the
variance.
Ansett - Economy Class Passengers
#plotting original data
ansett %>%
filter(Airports == "MEL-SYD" & Class == "Economy") %>%
autoplot(Passengers)
#Retrieving a lambda value for Box-Cox transformation
lambda_ansett <- ansett %>%
filter(Airports == "MEL-SYD" & Class == "Economy") %>%
features(Passengers,features = guerrero) %>%
pull(lambda_guerrero)
lambda_ansett
## [1] 1.999927
#plotting the transformed Box-Cox transformation with lambda:1.99
ansett %>%
filter(Airports == "MEL-SYD" & Class == "Economy") %>%
autoplot(box_cox(Passengers,lambda_ansett))
It looks like lambda of 1.99 did not really stabalize or reduced the
variance.
Pedestrian
#plotting original data
pedestrian %>%
filter(Sensor == "Southern Cross Station") %>%
autoplot(Count)
#Retrieving a lambda value for Box-Cox transformation
lambda_pedestrian<- pedestrian %>%
filter(Sensor == "Southern Cross Station") %>%
features(Count,features = guerrero) %>%
pull(lambda_guerrero)
lambda_pedestrian
## [1] -0.2501616
#plotting the transformed Box-Cox transformation with lambda:-0.25
pedestrian %>%
filter(Sensor == "Southern Cross Station") %>%
autoplot(box_cox(Count,lambda_pedestrian))
It looks like lambda of -0.25 variance is reduced but it is difficult to
interpert.
3.7 - Consider the last five years of the Gas data from aus_production.
a).Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle? b).Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices. c).Do the results support the graphical interpretation from part a? d).Compute and plot the seasonally adjusted data. e).Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier? f).Does it make any difference if the outlier is near the end rather than in the middle of the time series?
a).Plot the time series. Can you identify seasonal fluctuations and/or a trend-cycle?
#retrieving gas data for the last 5 years
gas_df<- tail(aus_production, 5*4) %>% select(Gas)
#plotting the gas time series for 5 years
gas_df %>%
autoplot() + ggtitle("Gas Production")
## Plot variable not specified, automatically selected `.vars = Gas`
The obvious pattern that is seen from this plot is that during the
middle of the year of this 5-year plot there is a great incline and at
the beginning of the following year a great decline.
b).Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.
cd_df <- gas_df %>%
model(
classical_decomposition(Gas, type = "multiplicative")
) %>%
components()
cd_df %>% autoplot() +
labs(title = "Decomposition of Australia Gas Production")
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).
The plots above further verify the annual seasonality.
c).Do the results support the graphical interpretation from part a?
The results do in fact support the graphical interpretation from part a, more concretely that there is a small positive trend during the middle of the year and decline specifically in the beginning of the following year attributing to its seasonality.
d).Compute and plot the seasonally adjusted data.
as_tsibble(cd_df) %>%
autoplot(season_adjust) +
labs(title = "Adjusted Data")
As seen by the plot above, the adjustment allows for a more clear view
of the trend. Before the adjustment we saw more of the seaonality
component sharp up trend in the middle of the year and sharp decline at
the end of every year,where as now these fluctuations were removed and
an obvious trend appears.
e).Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?
gas_df %>%
mutate(Gas = if_else(Quarter==yearquarter("2007Q2"), Gas + 300, Gas)) %>%
model(classical_decomposition(Gas, type = "multiplicative")) %>%
components() %>%
as_tsibble() %>%
autoplot(season_adjust)
The outlier created a sharp upward spike in 2007 Q2. Looking at the
plot, we can see that the trend looks different than before. Thus,we can
conclude that the spike significantly impacted the overall trend. In
conclusion, the outlier totally misled on the actual trend.
f).Does it make any difference if the outlier is near the end rather than in the middle of the time series?
gas_df %>%
mutate(Gas = if_else(Quarter==yearquarter("2010Q4"), Gas + 300, Gas)) %>%
model(classical_decomposition(Gas, type = "multiplicative")) %>%
components() %>%
as_tsibble() %>%
autoplot(season_adjust)
We can see that the more toward the end of the time series we place the
outlier the bigger impact (the more it skews the initial trend). We see
that adding the outlier to Q2 made the one big spike where as adding it
to the end mislead even more by creating a more visible upward
trend.
3.8 - Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?
t_series <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
x11_decomposition <- t_series %>%
model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
components()
autoplot(x11_decomposition) +
labs(title = "X11 Retail Decomposition.")
The decomposition does reveal certain outliers or unusual behavior particular for the time periods of beginning of 1990’s, mid 1990’s, early 2000’s and mid 2010’s. This might have been due to economic factors. The trend still remains to be positive and there is still apparent seasonality.
3.9 - Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.
a). Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.
a). After decomposition the results show positive trends of each graph for each particular month. There is also seasonality as shown in the graph of the previous problem. I do notice a downward trend, in the month of August in 1995. This might point to higher unemployment rates.
b). Is the recession of 1991/1992 visible in the estimated components?
b).The recession of 1991/1992 is visible in estimated components because we can see that this is the point from which all of the downward trends started.