?global_economy
## No documentation for 'global_economy' in specified packages and libraries:
## you could try '??global_economy'
#install the libraries needed
library(fpp3)
## Registered S3 method overwritten by 'tsibble':
## method from
## as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.1 ──
## ✔ tibble 3.2.1 ✔ tsibble 1.1.6
## ✔ dplyr 1.1.4 ✔ tsibbledata 0.4.1
## ✔ tidyr 1.3.1 ✔ feasts 0.4.1
## ✔ lubridate 1.9.4 ✔ fable 0.4.1
## ✔ ggplot2 3.5.1
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tsibble::setdiff() masks base::setdiff()
## ✖ tsibble::union() masks base::union()
global_economy %>%
autoplot(GDP / Population, show.legend = FALSE) +
labs(title= "GDP per capita", y = "GDP per Capita")
## Warning: Removed 3242 rows containing missing values or values outside the scale range
## (`geom_line()`).
global_economy %>%
mutate(gdp_capita = GDP / Population) %>%
filter(gdp_capita == max(gdp_capita, na.rm = TRUE)) %>%
select(Country, gdp_capita)
## # A tsibble: 1 x 3 [1Y]
## # Key: Country [1]
## Country gdp_capita Year
## <fct> <dbl> <dbl>
## 1 Monaco 185153. 2014
global_economy %>%
filter(Country == "United States") %>%
autoplot(GDP) +
labs(title = "US GDP")
- In this plot, the y axis values for the GDP values are in scientific
notation since the values are so large, so looking at the amount of
numbers in each number, I think transforming the GDP to trillions would
help the graph be more understanding for the GDP values
global_economy %>%
filter(Country == "United States") %>%
autoplot(GDP / 10 ^ 12) +
labs(title= "US GDP", y = "GDP (in trillions)")
aus_livestock %>%
filter(Animal == "Bulls, bullocks and steers", State == "Victoria") %>%
autoplot(Count) +
labs(title = "Slaughter Count of Victorian Bulls, Bullocks and Steers")
vic_elec %>%
autoplot(Demand) +
labs(title = "Victorian Electricity Demand")
- Taking a look at this graph, it is harder to see the data as it is
clusters half hourly, so there does need to be some transformation done
to see a better and more clear trend. I decided to go with a monthly
basis to see a clearer trend of electricity demand on a monthly
basis
elec_demand <- vic_elec %>%
group_by(Date) %>%
mutate(Demand = sum(Demand)) %>%
distinct(Date, Demand)
elec_demand %>%
mutate(Date = yearmonth(Date)) %>%
group_by(Date) %>%
summarise(Demand = sum(Demand)) %>%
as_tsibble(index = Date) %>%
autoplot(Demand) +
labs(title= "Monthly Victorian Electricity Demand")
- first I wanted to grpup the data by the date variable and then it sums
up the demand values for each of those date grouped values and after
calculating the sum for the demand it removes duplicated rows so each
date onlt appears once with the total demand for each date - then to
transform the data further, i transformed the date variable to a year
month format since i want a monthly scale for the visual and then
grouped the data by the monthly date. Then it sums the daily demand
values for each of the months to get the monthly demand value and then
we convert the data into a tsibble with the date as the time index - by
doing this we can see that the summer months tend to have the highest
demand for electricity
aus_production %>%
autoplot(Gas) +
labs(title = "Gas Production")
?canadian_gas
## starting httpd help server ... done
canadian_gas
## # A tsibble: 542 x 2 [1M]
## Month Volume
## <mth> <dbl>
## 1 1960 Jan 1.43
## 2 1960 Feb 1.31
## 3 1960 Mar 1.40
## 4 1960 Apr 1.17
## 5 1960 May 1.12
## 6 1960 Jun 1.01
## 7 1960 Jul 0.966
## 8 1960 Aug 0.977
## 9 1960 Sep 1.03
## 10 1960 Oct 1.25
## # ℹ 532 more rows
canadian_gas %>%
autoplot(Volume) +
labs(title = "Gas Production - Raw Data")
- to do the box cox transformation i decided to use the Guerreros method
to calculate the most optimal value of lamda that minimizes the
coefficient of variation of the new transformed data
lambda_val <- canadian_gas %>%
features(Volume, features = guerrero) %>%
pull(lambda_guerrero)
lambda_val
## [1] 0.5767648
lambda_retail <- aus_retail %>%
features(Turnover, features = guerrero) %>%
pull(lambda_guerrero)
lambda_retail
## [1] 0.505437358 0.455334875 -0.001548504 -0.142211494 0.239724139
## [6] -0.026472613 0.302765950 -0.347859628 -0.079800145 0.488891084
## [11] 0.170177523 0.008546483 0.258373687 -0.278913616 0.085422753
## [16] 0.216735747 -0.186680060 0.121172855 0.349057788 0.358373811
## [21] 0.194644303 0.097852391 0.014794694 -0.029183970 0.219320743
## [26] 0.015030705 0.087866199 -0.088668882 0.233295271 0.794736203
## [31] 0.409200236 -0.034414504 -0.233638227 -0.402606263 0.048725122
## [36] 0.170462269 -0.271677387 0.088105269 0.157152361 0.002144737
## [41] 0.265927881 0.224713740 0.284977808 0.083036312 0.127694817
## [46] -0.062802562 -0.162564723 0.346569384 0.434893851 0.338398250
## [51] 0.260980929 -0.026468501 0.066365207 -0.071665181 -0.022160487
## [56] 0.137501308 0.142266795 -0.009186879 -0.078532522 0.121434869
## [61] 0.022846394 0.076369281 -0.216330466 0.221481127 0.480753465
## [66] 0.231063552 0.132569588 0.043833072 -0.202826332 0.111909872
## [71] 0.205315650 0.285948322 0.113040418 0.209030359 0.197041795
## [76] 0.441087014 0.371315460 -0.102361315 -0.154344946 0.035435040
## [81] -0.022985108 0.084733215 -0.187787216 0.142978026 0.466262913
## [86] 0.166374070 0.016104851 -0.360022768 -0.639897411 0.083802504
## [91] 0.097715830 -0.342658577 0.149896211 0.155555534 0.189590792
## [96] 0.325699726 0.319622078 0.001893373 -0.093951032 0.155626888
## [101] 0.090231492 -0.376863610 0.159984903 0.444917891 0.287535970
## [106] 0.704385146 0.128944957 -0.532982237 -0.899926773 -0.037340959
## [111] 0.222986858 0.330815717 0.173409612 0.176110328 0.030210246
## [116] 0.004465470 0.172748557 -0.033916722 0.050680165 -0.078768077
## [121] 0.083552351 0.554245382 0.215164121 -0.014119612 -0.353784988
## [126] -0.265806646 0.034427160 0.116131283 -0.354627576 0.134290108
## [131] 0.153561415 0.088721960 0.312135155 0.271087477 0.078757767
## [136] 0.002190358 0.081555041 -0.033484940 0.117911536 -0.069445135
## [141] 0.031003706 0.464098968 0.203250692 0.037661525 -0.053178552
## [146] -0.210144461 0.143890246 0.223362268 -0.371484018 0.140920582
## [151] 0.205193769 0.129945930
summary(lambda_retail)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.89993 -0.02715 0.09778 0.08354 0.21738 0.79474
?aus_production
autoplot(aus_production, Tobacco)+
labs(title = "Tobacco Production")
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).
lambda <- aus_production %>%
features(Tobacco, features = guerrero) %>%
pull(lambda_guerrero)
lambda
## [1] 0.9264636
aus_production %>%
autoplot(box_cox(Tobacco, lambda)) +
labs(title = "Tobacco Production with lambda = 0.93",
round(lambda,2))
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).
?ansett
ansett
## # A tsibble: 7,407 x 4 [1W]
## # Key: Airports, Class [30]
## Week Airports Class Passengers
## <week> <chr> <chr> <dbl>
## 1 1989 W28 ADL-PER Business 193
## 2 1989 W29 ADL-PER Business 254
## 3 1989 W30 ADL-PER Business 185
## 4 1989 W31 ADL-PER Business 254
## 5 1989 W32 ADL-PER Business 191
## 6 1989 W33 ADL-PER Business 136
## 7 1989 W34 ADL-PER Business 0
## 8 1989 W35 ADL-PER Business 0
## 9 1989 W36 ADL-PER Business 0
## 10 1989 W37 ADL-PER Business 0
## # ℹ 7,397 more rows
filtered_airports <- ansett %>%
filter(Class == "Economy",
Airports == "MEL-SYD")
filtered_airports
## # A tsibble: 282 x 4 [1W]
## # Key: Airports, Class [1]
## Week Airports Class Passengers
## <week> <chr> <chr> <dbl>
## 1 1987 W26 MEL-SYD Economy 20167
## 2 1987 W27 MEL-SYD Economy 20161
## 3 1987 W28 MEL-SYD Economy 19993
## 4 1987 W29 MEL-SYD Economy 20986
## 5 1987 W30 MEL-SYD Economy 20497
## 6 1987 W31 MEL-SYD Economy 20770
## 7 1987 W32 MEL-SYD Economy 21111
## 8 1987 W33 MEL-SYD Economy 20675
## 9 1987 W34 MEL-SYD Economy 22092
## 10 1987 W35 MEL-SYD Economy 20772
## # ℹ 272 more rows
autoplot(filtered_airports, Passengers)+
labs(title = "Economy class Passengers Between Melbourne and Sydney")
lambda <- filtered_airports %>%
features(Passengers, features = guerrero) %>%
pull(lambda_guerrero)
lambda
## [1] 1.999927
filtered_airports %>%
autoplot(box_cox(Passengers, lambda)) +
labs(title = "Transformed # of Passengers Btwn Mel-Syd with Lambda = 2")
?pedestrian
pedestrian
## # A tsibble: 66,037 x 5 [1h] <Australia/Melbourne>
## # Key: Sensor [4]
## Sensor Date_Time Date Time Count
## <chr> <dttm> <date> <int> <int>
## 1 Birrarung Marr 2015-01-01 00:00:00 2015-01-01 0 1630
## 2 Birrarung Marr 2015-01-01 01:00:00 2015-01-01 1 826
## 3 Birrarung Marr 2015-01-01 02:00:00 2015-01-01 2 567
## 4 Birrarung Marr 2015-01-01 03:00:00 2015-01-01 3 264
## 5 Birrarung Marr 2015-01-01 04:00:00 2015-01-01 4 139
## 6 Birrarung Marr 2015-01-01 05:00:00 2015-01-01 5 77
## 7 Birrarung Marr 2015-01-01 06:00:00 2015-01-01 6 44
## 8 Birrarung Marr 2015-01-01 07:00:00 2015-01-01 7 56
## 9 Birrarung Marr 2015-01-01 08:00:00 2015-01-01 8 113
## 10 Birrarung Marr 2015-01-01 09:00:00 2015-01-01 9 166
## # ℹ 66,027 more rows
southern_cross_ped <- pedestrian %>%
filter(Sensor == "Southern Cross Station")
autoplot(southern_cross_ped, Count)+
labs(title = "Pedestrian Counts at Southern Cross Station")
lambda <- southern_cross_ped %>%
features(Count, features = guerrero) %>%
pull(lambda_guerrero)
lambda
## [1] -0.2501616
southern_cross_ped %>%
autoplot(box_cox(Count, lambda)) +
labs(title = "Pedestrian Counts with Lambda = -0.23")
- the graphs here, since they are following a the date_time on an hour
scale, looks too compact together to get any real feel of what is going
on. I would transform this data to either a daily or weekly time
gas <- tail(aus_production, 5*4) |> select(Gas)
gas
## # A tsibble: 20 x 2 [1Q]
## Gas Quarter
## <dbl> <qtr>
## 1 221 2005 Q3
## 2 180 2005 Q4
## 3 171 2006 Q1
## 4 224 2006 Q2
## 5 233 2006 Q3
## 6 192 2006 Q4
## 7 187 2007 Q1
## 8 234 2007 Q2
## 9 245 2007 Q3
## 10 205 2007 Q4
## 11 194 2008 Q1
## 12 229 2008 Q2
## 13 249 2008 Q3
## 14 203 2008 Q4
## 15 196 2009 Q1
## 16 238 2009 Q2
## 17 252 2009 Q3
## 18 210 2009 Q4
## 19 205 2010 Q1
## 20 236 2010 Q2
autoplot(gas, Gas)
- There seems to be an increasing trend over the course of one year. It
happens to peak the highest for the quater during the midding of the
year and then drops, and then peaks again
gas_decomp <- gas %>%
model(classical_decomposition(Gas, type = "multiplicative"))
components(gas_decomp) %>%
autoplot() +
labs(title = "Classical Decomposition of Gas Production - Multiplicative")
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).
components(gas_decomp) %>%
as_tsibble() %>%
autoplot(Gas, color = "blue") +
geom_line(aes(y=season_adjust), color = "green") +
labs(title = "Adjusted Gas Production - Seasonally")
gas %>%
mutate(Gas = ifelse(Gas == 224, Gas + 300, Gas)) %>%
model(classical_decomposition(Gas, type = "multiplicative")) %>%
components() %>%
as_tsibble() %>%
autoplot(Gas, color = "blue") +
geom_line(aes(y=season_adjust), color = "green") +
labs(title = "Adjusted Gas Production with an Outlier")
- In 2006 Quarter 2 I added 300 to the original gas value which is 224,
but now it has shot up to 524. This is a very strong increase where the
actual raw data and the adjusted data increased.
gas %>%
mutate(Gas = ifelse(Gas == 236, Gas + 300, Gas)) %>%
model(classical_decomposition(Gas, type = "multiplicative")) %>%
components() %>%
as_tsibble() %>%
autoplot(Gas, color = "blue") +
geom_line(aes(y=season_adjust), color = "green") +
labs(title = "Adjusted Gas Production with an Outlier Towards the End")
- By adding the outlier towards the end of the data rather than the
middle, it doesnt seem to make any noticeable different opposed to when
we had the outlier in the middle
library(seasonal)
##
## Attaching package: 'seasonal'
## The following object is masked from 'package:tibble':
##
## view
set.seed(8675309)
myseries <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
x11_dcmp <- myseries %>%
model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) %>%
components()
autoplot(x11_dcmp) +
labs(title = "Decomposition of Retail Turnover - X-11.")
- Compared to the other models, the X-11 decomposition has fewer curved
lines and becomes more cut. This method seems to be able to showcase
more noise and outliers that we were not able to see originally