The country with the highest GDP per capita is Monaco. This has remained true for some time, with the country havign the highest by far up until what seems to be around 2008. After 2008, the country had a sharp downturn where the second place contender - Liechtenstein - was very close behind, if not at the same level at times.
# ?global_economy
## Too many subplots switching to ggplot
global_economy$gdp_per_cap <- as.numeric(global_economy$GDP)/as.numeric(global_economy$Population)
global_economy
## # A tsibble: 15,150 x 10 [1Y]
## # Key: Country [263]
## Country Code Year GDP Growth CPI Imports Exports Population
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Afghanistan AFG 1960 537777811. NA NA 7.02 4.13 8996351
## 2 Afghanistan AFG 1961 548888896. NA NA 8.10 4.45 9166764
## 3 Afghanistan AFG 1962 546666678. NA NA 9.35 4.88 9345868
## 4 Afghanistan AFG 1963 751111191. NA NA 16.9 9.17 9533954
## 5 Afghanistan AFG 1964 800000044. NA NA 18.1 8.89 9731361
## 6 Afghanistan AFG 1965 1006666638. NA NA 21.4 11.3 9938414
## 7 Afghanistan AFG 1966 1399999967. NA NA 18.6 8.57 10152331
## 8 Afghanistan AFG 1967 1673333418. NA NA 14.2 6.77 10372630
## 9 Afghanistan AFG 1968 1373333367. NA NA 15.2 8.90 10604346
## 10 Afghanistan AFG 1969 1408888922. NA NA 15.0 10.1 10854428
## # ℹ 15,140 more rows
## # ℹ 1 more variable: gdp_per_cap <dbl>
## Need to set the legend to none b/c of the amount of countries.
ggplot(global_economy, aes(x = Year, y = gdp_per_cap, color = Country)) +
geom_line()+
theme(legend.position='none')
## Warning: Removed 3242 rows containing missing values or values outside the scale range
## (`geom_line()`).
### After viewing the chart pulling the distinct countries with any value higher than 100,000 to replot neater.
limited_countries <- global_economy |> filter(gdp_per_cap>=100000) |> select('Country')
limited_countries <- unique(limited_countries$Country)
## Filtering the df for new viz
lim_global_economy <- global_economy |> filter(Country %in% limited_countries)
# New limited Viz
ggplot(lim_global_economy, aes(x = Year, y = gdp_per_cap, color = as.factor(Country))) +
geom_line() +
labs(title = "GDP Per Capita Over Time", y = "GDP Per Capita", x = "Year")
## Warning: Removed 22 rows containing missing values or values outside the scale range
## (`geom_line()`).
a- United States GDP from global_economy. b- Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock. c- Victorian Electricity Demand from vic_elec. d- Gas production from aus_production.
## United States GDP from global_economy
#limiting for the United States
us_econ<- global_economy |> filter(Country =='United States')
autoplot(us_econ, gdp_per_cap) +
labs(title = "US GDP Per Capita Over Time", x = "Year", y = "GDP Per Capita")
#### b- Slaughter of Victorian “Bulls, bullocks and steers” in
aus_livestock.
## Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.
#Checking the data. The Month is already formatted into proper data type
head(aus_livestock)
## # A tsibble: 6 x 4 [1M]
## # Key: Animal, State [1]
## Month Animal State Count
## <mth> <fct> <fct> <dbl>
## 1 1976 Jul Bulls, bullocks and steers Australian Capital Territory 2300
## 2 1976 Aug Bulls, bullocks and steers Australian Capital Territory 2100
## 3 1976 Sep Bulls, bullocks and steers Australian Capital Territory 2100
## 4 1976 Oct Bulls, bullocks and steers Australian Capital Territory 1900
## 5 1976 Nov Bulls, bullocks and steers Australian Capital Territory 2100
## 6 1976 Dec Bulls, bullocks and steers Australian Capital Territory 1800
# Limited to what we want to show
lim_aus_livestock <- aus_livestock |>
filter(Animal=="Bulls, bullocks and steers")|>
summarise(Count = sum(Count))
## Plotting the new limited tsibble
autoplot(lim_aus_livestock, Count) +
labs(title = "Slaughter of Victorian Bulls, bullocks and steers", x = "Year", y = "Count of Slaughter")
## Still pretty noisey going to aggregate up to qtr.
qtr_aus_livestock <- lim_aus_livestock |>
index_by(Quarter = yearquarter(Month)) |>
summarize(Count = sum(Count))
## Plotting the qtr tsibble. Much smoother, and is more readable.
autoplot(qtr_aus_livestock, Count) +
labs(title = "Slaughter of Victorian Bulls, bullocks and steers", x = "Year", y = "Count of Slaughter")
### Cehcking the Decomp. Trend Results just ot see.
## The data seems fairly flat, as in the over all trend is subtle with smaller seasonal and cyclical-type variations. Going to use an STL transformation to take a look at the year to year trend with 5 qutrs. Making the trend window 5 in order to get just about a year, but maintain the derivative moving avg with a "middle" point.
qtr_aus_livestock |>
model(
STL(Count ~ trend(window = 5) +
season(window = "periodic"),
robust = TRUE)) |>
components() |>
autoplot()
## After the above transformation and decomp, the chart shows that the trend is fairly flat with some larger variations mainly at the beginning of the timeline. The seasonal chart seems like a pretty consistent seasonal pattern.
##Victorian Electricity Demand from vic_elec
#Taking a look; The time is 30 Min Increments.
head(vic_elec)
## # A tsibble: 6 x 5 [30m] <Australia/Melbourne>
## Time Demand Temperature Date Holiday
## <dttm> <dbl> <dbl> <date> <lgl>
## 1 2012-01-01 00:00:00 4383. 21.4 2012-01-01 TRUE
## 2 2012-01-01 00:30:00 4263. 21.0 2012-01-01 TRUE
## 3 2012-01-01 01:00:00 4049. 20.7 2012-01-01 TRUE
## 4 2012-01-01 01:30:00 3878. 20.6 2012-01-01 TRUE
## 5 2012-01-01 02:00:00 4036. 20.4 2012-01-01 TRUE
## 6 2012-01-01 02:30:00 3866. 20.2 2012-01-01 TRUE
#initial Plot of the data
autoplot(vic_elec, Demand) +
labs(title = "Victoria Electricity Demand Over Time", x = "Year", y = "Demand")
### Going to try an STL Transformation again here. The pattern in the data is pretty consistent with respect to the annual cyclical nature of the data. The overall trend of the data seems to be fairly linear, by this I mean the overall demand doesnt seem to be increasing or decreasing outside of the cycles. Using STL, the main goal is to apply lesser weights to the outliers seen in 2014 and 2013.
## The Tsibble is pretty granular. Im going to aggregate a bit to get less subplots in the STL transformation charts.
day_agg_vic_elec <-vic_elec |>
index_by(Date = as_date(Date)) |>
summarize(Demand = mean(Demand))
# When averaging to the day level it seems to bee a bit clearer of a cycle trend.
autoplot(day_agg_vic_elec, Demand) +
labs(title = "Victoria Electricity Demand Over Time", x = "Year", y = "Demand")
### Cehcking the Decomp. Trend Results just ot see.
### Plotting with STL. Window is set to 31 days for roughly a month period.
day_agg_vic_elec |>
model(
STL(Demand ~ trend(window = (31)) +
season(window = "periodic"),
robust = TRUE)) |>
components() |>
autoplot()
### The overall trend, with a few excetions has remained flat from begining to end. There were variations, and what seems to be seasonal patters for each year in the data. powerusage is generally higher in the southern hemisphere's summer months.
# Gas production from aus_production.
# ?aus_production
## Plot shows the data definitely needs a transformation. The trend is increasing overall, or at least from the 1970s to the most recent data. The seasonal varations also seem to be getting more extreme as we move from left to right.
autoplot(aus_production, Gas)+
labs(title = "Gas Production Over Time", x = "Year", y = "Gas Production in Petajoules")
### using the BoxCox Method from the text book to find the ideal lambda transformation in order to transform.
lambda <- aus_production |>
features(Gas, features = guerrero) |>
pull(lambda_guerrero)
aus_production |>
autoplot(box_cox(Gas, lambda))+
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed Australian Gas Produciton in Petajoules $\\lambda$ = ",
round(lambda,2))))
## This transformation smoothed out both the seasonal variations to make them more consistent and predictable, while also blunting the drastic increase in the linear model. The lamda found was .11, which is larger than a log but less than a Square Root Transformation.
After the transformation, the chart looks extremely similar to the initial chart. The Box Cox isnt too helpful here, the lambda value is clost to .6, which is over .5 and closer to one. The closer to 1 that the lambda value gets the more minimal the transformation. Additionally, with the exception of the middle piece of the data where the seasonal variance seems to get more exaggerated, the begining and the ends of the data seem to have a fairly stable variance.
# ?canadian_gas #Production Volume in billions of cubic metres
autoplot(canadian_gas, Volume)+
labs(title = "Gas Volume Over Time", x = "Month", y = "Production Volume in billions of cubic metres")
## Applying BoxCox
lambda <- canadian_gas |>
features(Volume, features = guerrero) |>
pull(lambda_guerrero)
canadian_gas |>
autoplot(box_cox(Volume, lambda))+
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed Canadian Gas Production Volume with $\\lambda$ = ",
round(lambda,2))))
The lambda value found for the this BoxCox transformation was .19, which is a little higher than a log transformation at 0. This lambda value leveled out the variance in the seasonality of the data by a lot.
### Pulling in tsibble from #7 in 2.10:
# ?aus_retail #Retail turnover in $Million AUD
### Transforming, so I can just deal with Avg Turnover regardless of Geog. or Retail Type
aus_agg <- aus_retail |>
summarize(Turnover = mean(Turnover))
#Taking a look at the data. Trend is in
autoplot(aus_agg,Turnover)+
labs(title = "Turnover Over Time", x = "Month", y = "Retail turnover in $Million AUD")
## Applying BoxCox
lambda <- aus_agg |>
features(Turnover, features = guerrero) |>
pull(lambda_guerrero)
aus_agg |>
autoplot(box_cox(Turnover, lambda))+
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed Austrailian Retail Turnover(\\$AUD) $\\lambda$ = ",
round(lambda,2))))
The BOXCOX Transformation yeilded next to no transformation, as the lambda value is .93. A value of 1 would be no transformation. This seems to be such a high value because the variance is already fairly stable.
# Tobacco from aus_production
# ?aus_production
#Taking a look at the data. Trend is in
autoplot(aus_production,Tobacco)+
labs(title = "Tobacco Production Over Time", x = "Quarter", y = "Tobacco and cigarette production in tonnes")
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Applying BoxCox
lambda <- aus_production |>
features(Tobacco, features = guerrero) |>
pull(lambda_guerrero)
aus_production |>
autoplot(box_cox(Tobacco, lambda))+
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed Austrailian Tobacco Production $\\lambda$ = ",
round(lambda,2))))
## Warning: Removed 24 rows containing missing values or values outside the scale range
## (`geom_line()`).
The Boxcox returned a Lambda value of 2 means that the data was transformed via an exponential transformation in order to stablize the variance.
#Economy class passengers between Melbourne and Sydney from ansett
## Prepping the Tsibble for the plotting and transformations.
limited_econ <- ansett |>
filter(Class=="Economy",
Airports == "MEL-SYD") |>
summarize(Passengers = sum(Passengers))
#Taking a look at the data. Trend is in
autoplot(limited_econ,Passengers)+
labs(title = "Economy Class Flights Over Time", x = "Week", y = "Number of Passengers")
## Applying BoxCox
lambda <- limited_econ |>
features(Passengers, features = guerrero) |>
pull(lambda_guerrero)
#Plot
limited_econ |>
autoplot(box_cox(Passengers, lambda))+
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed Economy Class Flights Over Time $\\lambda$ = ",
round(lambda,2))))
After aggregating up to the weekly level for counts of pedestrians, the BoxCox lambda transformation that works best for this data set is .11, which is a little more than a log transformation of the data.
# Pedestrian counts at Southern Cross Station from pedestrian
#First attempt at limiting
limited_pedestrian <- pedestrian |>
filter(Sensor == "Southern Cross Station")|>
summarise(Count = sum(Count))
#Taking a look at the data. Trend is in
autoplot(limited_pedestrian,Count)+
labs(title = "Hourly Pedestrian Counts Over Time", x = "Hour", y = "Number of Pedestrians")
## Too crowded, aggregating up to Month
limited_pedestrian <- pedestrian |>
filter(Sensor == "Southern Cross Station")|>
index_by(Date= as_date(Date)) |>
summarise(Count = sum(Count))
autoplot(limited_pedestrian,Count)+
labs(title = "Pedestrian Counts Over Time", x = "Date", y = "Number of Pedestrians")
## Still too crowded, aggregating up to Week
limited_pedestrian <- pedestrian |>
filter(Sensor == "Southern Cross Station")|>
index_by(Week= yearweek(Date)) |>
summarise(Count = sum(Count))
autoplot(limited_pedestrian,Count)+
labs(title = "Pedestrian Counts Over Time", x = "Week", y = "Number of Pedestrians")
## Applying BoxCox
lambda <- limited_pedestrian |>
features(Count, features = guerrero) |>
pull(lambda_guerrero)
limited_pedestrian |>
autoplot(box_cox(Count, lambda))+
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed Economy Class Flights Over Time $\\lambda$ = ",
round(lambda,2))))
gas <- tail(aus_production, 5*4) |> select(Gas)
The seasonal fluctuations in this timeseries show that the Gas production in Australia from 2005 through 2010 increases and peaks at the end of every Q2. The production then decreases from there to the end of the year. The seasonal patter continues year after year. The longer term trend within the data is a slow and steady increase year after year. In other words, the peaks are slightly higher with each consecutive year, just as the valleys are also relatively higher each year.
autoplot(gas, Gas)+
labs(title = "Quarterly Gas Production Australia",
x="Quarter",
y="Gas Produced")
As said before the trend is a slow and steady increase over time. Similarly, the seasonal trend, now parsed out of the initial time series, increases to peak towards the end of Q2 and then declines until the beginning of Q1 in the following year.
gas |>
model(
classical_decomposition(Gas, type = "multiplicative")
) |>
components() |>
autoplot() +
labs(title = "Classical Mult. decomposition of Gas Production Australia")
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).
Yes, as i’ve outlined above both of the parsed out trend and seasonality charts align which my initial interpretation of the trends and seasonal patterns
### Doing an STL Decomposition
gas |>
model(
STL(Gas ~ trend(window = (7)) +
season(window = "periodic"),
robust = TRUE)) |>
components() |>
autoplot()
The impact of the one edited outlier shifts the main chart as well as the remainder chart. This makes sense, as the trend plot along with the seasonal adjustment plot should be unaffected by this outlier. The remainder, which is where the seasonality and the trend are removed from the chart shoudl be the only chart showing the outlier.
#Edited Gas1
altered_g <-gas
## Selecting Where im updating the number
altered_g |> filter(Quarter == yearquarter('2008 Q2'))
## # A tsibble: 1 x 2 [1Q]
## Gas Quarter
## <dbl> <qtr>
## 1 229 2008 Q2
altered_g <- altered_g |> mutate(Gas = ifelse(Quarter == yearquarter("2008 Q2"), Gas+300, Gas))
## Checking
altered_g |> filter(Quarter == yearquarter('2008 Q2'))
## # A tsibble: 1 x 2 [1Q]
## Gas Quarter
## <dbl> <qtr>
## 1 529 2008 Q2
#plotting the new altered
altered_g |>
model(
STL(Gas ~ trend(window = (7)) +
season(window = "periodic"),
robust = TRUE)) |>
components() |>
autoplot()
In the plot above i placed the outlier in the middle of the time series. This time the edited outlier was placed towards the end of the time series. Overall, when the edited outlier is at the end of the timeseries, as expected the main edited plot has shows the outlier, as does the remainder plot. However, i dont see any other drastic changes in the data.
#Edited Gas2
altered_g2 <-gas
## Selecting Where im updating the number
altered_g2 |> filter(Quarter == yearquarter('2010 Q1'))
## # A tsibble: 1 x 2 [1Q]
## Gas Quarter
## <dbl> <qtr>
## 1 205 2010 Q1
altered_g2 <- altered_g2 |> mutate(Gas = ifelse(Quarter == yearquarter("2010 Q1"), Gas+300, Gas))
## Cehcking
altered_g2 |> filter(Quarter == yearquarter('2010 Q1'))
## # A tsibble: 1 x 2 [1Q]
## Gas Quarter
## <dbl> <qtr>
## 1 505 2010 Q1
#plotting the new altered
altered_g2 |>
model(
STL(Gas ~ trend(window = (7)) +
season(window = "periodic"),
robust = TRUE)) |>
components() |>
autoplot()
In the remainder subplot there is an outlier that i did not see before, but see now with the X11 decomposition. There is a super high turnover value in the year 2000.
## Agg code from before
aus_agg <- aus_retail |>
summarize(Turnover = mean(Turnover))
# Plotting Original
autoplot(aus_agg,Turnover)+
labs(title = "Turnover Over Time", x = "Month", y = "Retail turnover in $Million AUD")
## X11 Decomp
aus_retail_x11 <- aus_agg |>
model(x11 = X_13ARIMA_SEATS(Turnover ~ x11())) |>
components()
# Plotting X11
autoplot(aus_retail_x11,Turnover)+
labs(title = "Turnover Over Time", x = "Month", y = "Retail turnover in $Million AUD")
#### Figure 3.20:
Seasonal component from the decomposition shown in the previous figure.
Firstly, overall the the main plot of the civilian labor force of Australia has been trending upward for the years shown, which is 1978 through 1995. It’s safe to say the Australian Civilian labor force grew during that time. However,I did not control for population changes, so these numbers are absolute and not relative to the larger population. In addition to that there is a decrease in the number of civilian employment numbers around 1991/1992. It can be seen most drastically in the remainder subplot, particularly towards the end of 1991.Lastly, the plots show that December and March are the two portions of the year that the seasonal patterns dictate that the workforce is the largest. January and August are the months that have consisistently had the lowest employment numbers.
The recession of 1991/1992 is most definitely visible. Particularly it is most visible in the August portion of the Seasonal Component chart and in the remainder portion of the Decomposition plot towards the latter portion of 1991. These areas have the lowest drops in the civilian labor force numbers on the whole chart.