Homework 2
Do exercises 3.1, 3.2, 3.3, 3.4, 3.5, 3.7, 3.8 and 3.9 from the online Hyndman book. Please include your Rpubs link along with.pdf file of your run code
{r}# install.packages("seasonal")
library("fpp3")
## Registered S3 method overwritten by 'tsibble':
## method from
## as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.1 ──
## ✔ tibble 3.2.1 ✔ tsibble 1.1.6
## ✔ dplyr 1.1.4 ✔ tsibbledata 0.4.1
## ✔ tidyr 1.3.1 ✔ feasts 0.4.1
## ✔ lubridate 1.9.3 ✔ fable 0.4.1
## ✔ ggplot2 3.5.1
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tsibble::setdiff() masks base::setdiff()
## ✖ tsibble::union() masks base::union()
library("ggplot2")
library("tidyverse")
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ readr 2.1.5
## ✔ purrr 1.0.2 ✔ stringr 1.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library("seasonal")
##
## Attaching package: 'seasonal'
##
## The following object is masked from 'package:tibble':
##
## view
Question 3.1 Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?
data("global_economy")
global_economy
## # A tsibble: 15,150 x 9 [1Y]
## # Key: Country [263]
## Country Code Year GDP Growth CPI Imports Exports Population
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Afghanistan AFG 1960 537777811. NA NA 7.02 4.13 8996351
## 2 Afghanistan AFG 1961 548888896. NA NA 8.10 4.45 9166764
## 3 Afghanistan AFG 1962 546666678. NA NA 9.35 4.88 9345868
## 4 Afghanistan AFG 1963 751111191. NA NA 16.9 9.17 9533954
## 5 Afghanistan AFG 1964 800000044. NA NA 18.1 8.89 9731361
## 6 Afghanistan AFG 1965 1006666638. NA NA 21.4 11.3 9938414
## 7 Afghanistan AFG 1966 1399999967. NA NA 18.6 8.57 10152331
## 8 Afghanistan AFG 1967 1673333418. NA NA 14.2 6.77 10372630
## 9 Afghanistan AFG 1968 1373333367. NA NA 15.2 8.90 10604346
## 10 Afghanistan AFG 1969 1408888922. NA NA 15.0 10.1 10854428
## # ℹ 15,140 more rows
global_economy |>
autoplot(GDP/Population, show.legend = FALSE) +
labs(title= "GDP per capita for each country overtime", y = "GDP per captia in $US")
## Warning: Removed 3242 rows containing missing values or values outside the scale range
## (`geom_line()`).
Overtime most countries GDP per captia is increasing Now to find the country with the highest GDP
global_economy |>
mutate(GDP_per_captia = GDP/Population) %>% #create colum for gpd per capita
filter(GDP_per_captia == max(GDP_per_captia, na.rm = TRUE)) %>% # look for highest gpd in the country set
select(Country,GDP_per_captia) # look for contry with max gdp
## # A tsibble: 1 x 3 [1Y]
## # Key: Country [1]
## Country GDP_per_captia Year
## <fct> <dbl> <dbl>
## 1 Monaco 185153. 2014
Monaco is contry with the highest GDP per captia, and it highest was in the year 2014
global_economy |>
filter(Country == "Monaco") |> # filter to get country Monaco
autoplot(GDP/Population) + #time series function for finding trend, cycle, and seasonalty in data
labs(title= "GDP per capita", y = "GDP Per Capita in $US")
## Warning: Removed 11 rows containing missing values or values outside the scale range
## (`geom_line()`).
Overall the graph is increasing overtime and does have some fluctuation over time, which we can derive form economic factors such as tourism, population growth, currency risk, industrial development and etc.
Question 3.2
For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.
United States GDP from global_economy.
global_economy |>
filter(Country == "United States") |> # filter to get country US
autoplot(GDP/Population) + #time series function for finding trend, cycle, and seasonality in data
labs(title= "United State GDP per capita Over Time", y = "GDP Per Capita in $US")
This graph visualizes the GDP per capita for the United States from 1960 to 2017. The growth of the GDP per capita can be contributed to the constant expansion of technological innovation, government investment, foriegn investment, and population growth. Advances in software developments and AI has helped drive companies to be efficient and productive. Investment from government increase business expansion and global trade help export technology and energy across the world. However, there is dip in constant increase due to the 2008 recession, which had impact of employment and production throughout the economy.
This graph does not need a transformation, as there is not enough variation within the data and the constant trend of increase overtime lacks any variation through the years.
Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.
data("aus_livestock")
aus_livestock
## # A tsibble: 29,364 x 4 [1M]
## # Key: Animal, State [54]
## Month Animal State Count
## <mth> <fct> <fct> <dbl>
## 1 1976 Jul Bulls, bullocks and steers Australian Capital Territory 2300
## 2 1976 Aug Bulls, bullocks and steers Australian Capital Territory 2100
## 3 1976 Sep Bulls, bullocks and steers Australian Capital Territory 2100
## 4 1976 Oct Bulls, bullocks and steers Australian Capital Territory 1900
## 5 1976 Nov Bulls, bullocks and steers Australian Capital Territory 2100
## 6 1976 Dec Bulls, bullocks and steers Australian Capital Territory 1800
## 7 1977 Jan Bulls, bullocks and steers Australian Capital Territory 1800
## 8 1977 Feb Bulls, bullocks and steers Australian Capital Territory 1900
## 9 1977 Mar Bulls, bullocks and steers Australian Capital Territory 2700
## 10 1977 Apr Bulls, bullocks and steers Australian Capital Territory 2300
## # ℹ 29,354 more rows
help("aus_livestock")
aus_livestock %>%
filter(Animal == "Bulls, bullocks and steers" & State == "Victoria") %>% # look for bulls and bullocks in animal colum
autoplot(Count) + #time series function for number of animals killed
labs(title = "Slaughter of Victorian Bulls, Bullocks and Steers",
y = "Animal Count")
This data shows the monthly meat production of Bulls, bullocks and steers in Victoria, Australia from July 1972 to December 2018. The data overall has been decreasing overtime, which fluctuations prevalent throughout. These variations make it difficult to observe the data. Using a transformation will make the data more even throughout each fluctuations become more orderly. We can simplify the variations by using a box cox transformation.
aus_livestock = aus_livestock %>%
filter(Animal == "Bulls, bullocks and steers" & State == "Victoria") %>% # look for bulls and bullocks in animal colum
mutate(Count = as.numeric(Count)) # colum for meat proudction as numeric value
lambda = aus_livestock %>%
features(Count, features = guerrero) %>%
pull(lambda_guerrero)
aus_livestock %>%
autoplot(box_cox(Count,lambda)) + labs(title = "Slaughter of Victorian Bulls, Bullocks and Steers After Box Cox Transformation",
y = "Animal Count")
This lambda has negative value, meaning the there are extreme values whitin in the data points which have extreme volatility
we can use the moving avergae to transform and make it easier to understand
aus_livestock<- aus_livestock |>
mutate(
`5-MA` = slider::slide_dbl(Count, mean,
.before = 2, .after = 2, .complete = TRUE)
)
aus_livestock |>
autoplot(Count) +
geom_line(aes(y = `5-MA`), colour = "red") +
labs(title = "Slaughter of Victorian Bulls, Bullocks and Steers After Box Cox Transformation",
y = "Animal Count")
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).
Victorian Electricity Demand from vic_elec.
data("vic_elec")
vic_elec
## # A tsibble: 52,608 x 5 [30m] <Australia/Melbourne>
## Time Demand Temperature Date Holiday
## <dttm> <dbl> <dbl> <date> <lgl>
## 1 2012-01-01 00:00:00 4383. 21.4 2012-01-01 TRUE
## 2 2012-01-01 00:30:00 4263. 21.0 2012-01-01 TRUE
## 3 2012-01-01 01:00:00 4049. 20.7 2012-01-01 TRUE
## 4 2012-01-01 01:30:00 3878. 20.6 2012-01-01 TRUE
## 5 2012-01-01 02:00:00 4036. 20.4 2012-01-01 TRUE
## 6 2012-01-01 02:30:00 3866. 20.2 2012-01-01 TRUE
## 7 2012-01-01 03:00:00 3694. 20.1 2012-01-01 TRUE
## 8 2012-01-01 03:30:00 3562. 19.6 2012-01-01 TRUE
## 9 2012-01-01 04:00:00 3433. 19.1 2012-01-01 TRUE
## 10 2012-01-01 04:30:00 3359. 19.0 2012-01-01 TRUE
## # ℹ 52,598 more rows
help("vic_elec")
vic_elec %>%
autoplot(Demand) + labs(title = "Half-hourly electricity demand for Victoria, Australia" , y = "Demand in MWh")
This data visualizes the hourly demand for electricity in Victoria Australia in Mwh. This data is difficult to observe due to many fluctuations through the data which is influenced from the seasonal component and white noise throughout the data. Because the demand is measured hourly, we can focus our observation on monthly or weekly scale to interpret the data. We can transform this data using adjustment, we will adjust the hourly demand into a weekly demand to observe the data in clear visual
vic_elec = vic_elec %>%
index_by(Date = yearweek(Time)) %>% # turns time into week
summarise(Avg_Elec_Demand = mean(Demand)) # finds the avergae of the demand
vic_elec %>%
autoplot(Avg_Elec_Demand) + labs(title = "Weekly electricity demand for Victoria, Australia" , y = "Demand in MWh", x = "Weeks")
Gas production from aus_production.
data("aus_production")
aus_production
## # A tsibble: 218 x 7 [1Q]
## Quarter Beer Tobacco Bricks Cement Electricity Gas
## <qtr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1956 Q1 284 5225 189 465 3923 5
## 2 1956 Q2 213 5178 204 532 4436 6
## 3 1956 Q3 227 5297 208 561 4806 7
## 4 1956 Q4 308 5681 197 570 4418 6
## 5 1957 Q1 262 5577 187 529 4339 5
## 6 1957 Q2 228 5651 214 604 4811 7
## 7 1957 Q3 236 5317 227 603 5259 7
## 8 1957 Q4 320 6152 222 582 4735 6
## 9 1958 Q1 272 5758 199 554 4608 5
## 10 1958 Q2 233 5641 229 620 5196 7
## # ℹ 208 more rows
help("aus_production")
aus_production %>%
autoplot(Gas) %>%
labs(title = "Quarterly production of selected commodities in Australia", y = "Gas in Petajoules")
## [[1]]
##
## $y
## [1] "Gas in Petajoules"
##
## $title
## [1] "Quarterly production of selected commodities in Australia"
##
## attr(,"class")
## [1] "labels"
This data visualizes the quarterly gas production in petajoules in Australia. Overall, there is an increase in data throughout the graph, with fluctuation throughout that can be transformed. This transformation will allows to change the variations throughout the data, and make it even throughout. We can use the Guerrero feature to choose the lambda that will fit our box cox function.
lambda1 <- aus_production |>
features(Gas, features = guerrero) |> # fuction to find lambda
pull(lambda_guerrero)
aus_production |>
autoplot(box_cox(Gas, lambda1)) +
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed gas production with $\\lambda$ = ",
round(lambda1,2))))
Question 3.3
Why is a Box-Cox transformation unhelpful for the canadian_gas data?
data("canadian_gas")
canadian_gas
## # A tsibble: 542 x 2 [1M]
## Month Volume
## <mth> <dbl>
## 1 1960 Jan 1.43
## 2 1960 Feb 1.31
## 3 1960 Mar 1.40
## 4 1960 Apr 1.17
## 5 1960 May 1.12
## 6 1960 Jun 1.01
## 7 1960 Jul 0.966
## 8 1960 Aug 0.977
## 9 1960 Sep 1.03
## 10 1960 Oct 1.25
## # ℹ 532 more rows
help("canadian_gas")
canadian_gas%>%
autoplot(Volume) %>%
labs(title = "Quarterly production of selected commodities in Australia")
## [[1]]
##
## $title
## [1] "Quarterly production of selected commodities in Australia"
##
## attr(,"class")
## [1] "labels"
lambda2 <- canadian_gas |>
features(Volume, features = guerrero) |> # fuction to find lambda
pull(lambda_guerrero)
canadian_gas |>
autoplot(box_cox(Volume, lambda2)) + labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed gas production with $\\lambda$ = ",
round(lambda2,2))))
The Box cox transformation is not helpful in this data, because the variance is stable enough to be observed and trend looks the same after the transformation. Because of the strong seasonality and trend component. The main issue of variance is not address in this data set using the box cox transformation. In order to address this issue a seasonality decomposition would be allows us to observe the seasonality characteristics throughout the gas production.
Question 3.4 What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?
set.seed(12345678)
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseries %>%
autoplot(Turnover) %>%
labs(title = "Retail Data Turnover")
## [[1]]
##
## $title
## [1] "Retail Data Turnover"
##
## attr(,"class")
## [1] "labels"
lambda3 <- myseries |>
features(Turnover, features = guerrero) |> # fuction to find lambda
pull(lambda_guerrero)
myseries |>
autoplot(box_cox(Turnover, lambda3)) + labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed Retail Data Transformation with $\\lambda$ = ",
round(lambda3,2))))
After using the Guerrero feature, we can create a box transformation
transformation that will calculate a lambda of 0.8 for our graph.
Question 3.5
For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. Tobacco from aus_production, Economy class passengers between Melbourne and Sydney from ansett, and Pedestrian counts at Southern Cross Station from pedestrian.
Tobacco from aus_production
help("aus_production")
aus_production %>%
autoplot(Tobacco) %>%
labs(title = "Quarterly production of Tobacco in Australia")
## [[1]]
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).
##
## $title
## [1] "Quarterly production of Tobacco in Australia"
##
## attr(,"class")
## [1] "labels"
lambda4 <- aus_production |>
features(Tobacco, features = guerrero) |> # fuction to find lambda
pull(lambda_guerrero)
aus_production %>%
autoplot(box_cox(Tobacco, lambda4)) + labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed Tabocco production with $\\lambda$ = ",
round(lambda4,2))))
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).
Melbourne and Sydney from ansett
data("ansett")
ansett
## # A tsibble: 7,407 x 4 [1W]
## # Key: Airports, Class [30]
## Week Airports Class Passengers
## <week> <chr> <chr> <dbl>
## 1 1989 W28 ADL-PER Business 193
## 2 1989 W29 ADL-PER Business 254
## 3 1989 W30 ADL-PER Business 185
## 4 1989 W31 ADL-PER Business 254
## 5 1989 W32 ADL-PER Business 191
## 6 1989 W33 ADL-PER Business 136
## 7 1989 W34 ADL-PER Business 0
## 8 1989 W35 ADL-PER Business 0
## 9 1989 W36 ADL-PER Business 0
## 10 1989 W37 ADL-PER Business 0
## # ℹ 7,397 more rows
help("ansett")
ansett= ansett %>%
filter ( Class == "Economy" & Airports == "MEL-SYD")
ansett %>%
autoplot(Passengers) +
labs(title = "Economy class passengers between Melbourne and Sydney")
lambda5 = ansett %>%
features(Passengers, features = guerrero) %>%
pull(lambda_guerrero)
ansett %>%
autoplot(box_cox(Passengers, lambda5)) +
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed Tabocco production with $\\lambda$ = ",
round(lambda5,2)))) +ylab("Passengers")
Pedestrian counts at Southern Cross Station from pedestrian
data("pedestrian")
pedestrian
## # A tsibble: 66,037 x 5 [1h] <Australia/Melbourne>
## # Key: Sensor [4]
## Sensor Date_Time Date Time Count
## <chr> <dttm> <date> <int> <int>
## 1 Birrarung Marr 2015-01-01 00:00:00 2015-01-01 0 1630
## 2 Birrarung Marr 2015-01-01 01:00:00 2015-01-01 1 826
## 3 Birrarung Marr 2015-01-01 02:00:00 2015-01-01 2 567
## 4 Birrarung Marr 2015-01-01 03:00:00 2015-01-01 3 264
## 5 Birrarung Marr 2015-01-01 04:00:00 2015-01-01 4 139
## 6 Birrarung Marr 2015-01-01 05:00:00 2015-01-01 5 77
## 7 Birrarung Marr 2015-01-01 06:00:00 2015-01-01 6 44
## 8 Birrarung Marr 2015-01-01 07:00:00 2015-01-01 7 56
## 9 Birrarung Marr 2015-01-01 08:00:00 2015-01-01 8 113
## 10 Birrarung Marr 2015-01-01 09:00:00 2015-01-01 9 166
## # ℹ 66,027 more rows
help("pedestrian")
pedestrian %>%
filter( Sensor == "Southern Cross Station") %>%
autoplot(Count) +
labs(title = "Pedestrian counts in the city of Melbourne
")
need to index into weekly data
ped_week = pedestrian %>%
mutate(Weekly = yearweek(Date)) %>%
index_by(Weekly) %>%
summarise(Count = sum(Count, na.rm = TRUE))
ped_week %>%
autoplot(Count) + labs(title = "Pedestrian counts in the city of Melbourne")
lambda6 <- ped_week %>%
features(Count, features = guerrero) %>%
pull(lambda_guerrero)
ped_week %>%
autoplot(box_cox(Count,lambda6)) + labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed Pedestrian counts with $\\lambda$ = ",
round(lambda6,2))))
Question 3.7
Consider the last five years of the Gas data from aus_production.
help("aus_production")
gas <- tail(aus_production, 5*4) |> select(Gas)
gas %>%
autoplot(Gas) + labs(title = "Gas Production in Australia" , y = "Gas in Petajoules")
There are some clear seasonal fluctuations repeating every year in a
pattern. Gas production will begun at its lowest production in quarter 1
of each year and increasing gradualy throughout quarter 3 and hit its
peak in quarter 3 and the decreasing of gas production occurs in quarter
4. This cycle repeats every year throughout the data.
gas_decomp = gas %>%
model(
classical_decomposition(Gas, type = "multiplicative")
) %>%
components()
gas_decomp %>%
autoplot() +xlab("Quarter") + ggtitle("Classical Multiplicative. Decompostion of Gas Production")
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).
Comparing a our graph to the classical decomposition graph, shows similarities. The decomposition will expose the seasonality, trend, and remainder variables, which we can explore separately. All three variables show an increasing trend, just the original time series trend in A.
gas_tsibble = as_tsibble(gas_decomp)
gas_tsibble %>%
autoplot(season_adjust) + labs(title = "Seasonal Adjusted Data for Gas Production", y = "Gas Production in Petajoules")
gas2 = gas %>%
mutate( Gas = if_else(Quarter ==yearquarter("2009Q3"), Gas + 300, Gas)) %>%
model(
classical_decomposition(Gas, type = "multiplicative")
) %>%
components()
gas2tssible = as_tsibble(gas2)
gas2tssible%>%
autoplot(season_adjust) + labs(title = "Seasonal Adjusted Data for Gas Production in Quarter 3 of 2009 When 300 + ", y = "Gas Production in Petajoules")
It does make a difference when an outlier is added near the end rather than the middle of the time series, because it sticks out and it develops peak towards the end of the data. it make the data visually skewed, when 300 more petajoules of gas is added to quarter 3 in 2009. The outlier at the end make it difficult to understand any trend or pattern that will occur in the data, because the huge peak created has impact on how the data is interpreted.
Question 3.8 Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?
x11_dcmp <- myseries %>%
model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
components()
autoplot(x11_dcmp) +
labs(title =
"Decomposition of total US retail employment using X-11.")
X-11 model can allows us to observe the robust outlier drawn thorughout
the graph. Some unusual patterns are shown to occur in in the begining
and throughout of the 1990s and this pattern is observed again in the
end of 2000s. These fluctuations could be result of economic downsides
like recession from dot.com bubble in the 90s and housing crash in 2008.
These outlier highlight irregular trends in the employment data.
Question 3.9
Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.
In figure 3.19, we can observe a steady increase in the number of person in the civilian labor force from 1978 to 1995. There is noticeable in the beginning of the 1990s, due to some economic recessions. When we observe figure 3.20, there some strong seasonal fluctuation that occur in the Australian civilian labor force. The labor force hits its peak in two months March and December, due to hiring programs for holidays. In the remainder component the sharp decline in 1991 visualizes the impact of the 1991/1992 recession.
Is the recession of 1991/1992 visible in the estimated components?
The recession is clearly visible in the remainder component. This recession caused a decline the civilian labour force, and is detailed to have strong impacted the upwards trends of labour force growth.