Homework 2

Excercise 3.1

Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time?

3.1 How data looks like

kable(head(global_economy))

Country	Code	Year	GDP	Growth	CPI	Imports	Exports	Population
Afghanistan	AFG	1960	537777811	NA	NA	7.024793	4.132233	8996351
Afghanistan	AFG	1961	548888896	NA	NA	8.097166	4.453443	9166764
Afghanistan	AFG	1962	546666678	NA	NA	9.349593	4.878051	9345868
Afghanistan	AFG	1963	751111191	NA	NA	16.863910	9.171601	9533954
Afghanistan	AFG	1964	800000044	NA	NA	18.055555	8.888893	9731361
Afghanistan	AFG	1965	1006666638	NA	NA	21.412803	11.258279	9938414

kable(tail(global_economy))

Country	Code	Year	GDP	Growth	CPI	Imports	Exports	Population
Zimbabwe	ZWE	2012	17114849900	16.6654288	107.3283	48.99928	25.16325	14710826
Zimbabwe	ZWE	2013	19091020000	1.9894928	109.0795	36.66874	21.98776	15054506
Zimbabwe	ZWE	2014	19495519600	2.3769293	108.8472	33.74147	20.93015	15411675
Zimbabwe	ZWE	2015	19963120600	1.7798727	106.2245	37.58864	19.16018	15777451
Zimbabwe	ZWE	2016	20548678100	0.7558693	104.5606	31.27549	19.94353	16150362
Zimbabwe	ZWE	2017	22040902300	4.7040354	105.5118	30.37027	19.65802	16529904

3.1 Autoplot all GDP per capita

global_economy %>%
  #R studio takes the GDP and pop from the global_economy dataset and divides them 
  autoplot(GDP/Population,show.legend = FALSE) +
  labs(x = "Year", y = "GDP Per Capita")

3.1 Getting the GDP per capita

global_economy <- global_economy %>%
  mutate(GDP_per_Capita = GDP / Population)

3.1 Highest GDP per capita in data set

Monaco in 2014 has the highest GDP per capita, followed by Liechtenstein in 2014.

global_economy_sorted <- global_economy %>%
  arrange(desc(GDP_per_Capita))

kable(head(global_economy_sorted))

Country	Code	Year	GDP	Growth	CPI	Imports	Exports	Population	GDP_per_Capita
Monaco	MCO	2014	7060236168	7.1796368	NA	NA	NA	38132	185152.5
Monaco	MCO	2008	6476490406	0.7318007	NA	NA	NA	35853	180640.1
Liechtenstein	LIE	2014	6657170923	NA	NA	NA	NA	37127	179308.1
Liechtenstein	LIE	2013	6391735894	NA	NA	NA	NA	36834	173528.2
Monaco	MCO	2013	6553372278	9.5707988	NA	NA	NA	37971	172588.9
Monaco	MCO	2016	6468252212	3.2138488	NA	NA	NA	38499	168010.9

3.1 GDP over time

The data suggests that Monaco, Liechtenstein, and Luxembourg were primarily the top GDP per capita in 2012-2016.

highest_gdp_2012 <- global_economy_sorted %>%
  filter(Year == 2012) %>%          
  arrange(desc(GDP_per_Capita))  %>%           
  slice(1:5)                        

highest_gdp_2013 <- global_economy_sorted %>%
  filter(Year == 2013) %>%           
  arrange(desc(GDP_per_Capita))  %>%           
  slice(1:5)                          

highest_gdp_2014 <- global_economy_sorted %>%
  filter(Year == 2014) %>%          
  arrange(desc(GDP_per_Capita))  %>%           
  slice(1:5)                          

highest_gdp_2015 <- global_economy_sorted %>%
  filter(Year == 2015) %>%           
  arrange(desc(GDP_per_Capita))  %>%           
  slice(1:5)                        

highest_gdp_2016 <- global_economy_sorted %>%
  filter(Year == 2016) %>%          
  arrange(desc(GDP_per_Capita))  %>%           
  slice(1:5)

kable(highest_gdp_2012)

Country	Code	Year	GDP	Growth	CPI	Imports	Exports	Population	GDP_per_Capita
Monaco	MCO	2012	5743029680	0.9849603	NA	NA	NA	37783	152000.36
Liechtenstein	LIE	2012	5456009385	NA	NA	NA	NA	36545	149295.65
Luxembourg	LUX	2012	56677961787	-0.3525194	106.1643	155.41771	186.44431	530946	106749.01
Norway	NOR	2012	510229136227	2.7216268	101.9908	27.54178	40.57434	5018573	101668.17
Qatar	QAT	2012	186833516484	4.6872592	103.4801	29.27387	76.47241	2109568	88564.82

kable(highest_gdp_2013)

Country	Code	Year	GDP	Growth	CPI	Imports	Exports	Population	GDP_per_Capita
Liechtenstein	LIE	2013	6391735894	NA	NA	NA	NA	36834	173528.15
Monaco	MCO	2013	6553372278	9.570799	NA	NA	NA	37971	172588.88
Luxembourg	LUX	2013	61739352212	3.654370	108.0053	158.61336	190.62858	543360	113625.13
Norway	NOR	2013	523502127660	1.044389	104.1535	28.35444	39.13876	5079623	103059.25
Macao SAR, China	MAC	2013	51552075902	11.200190	118.4497	30.40292	90.63795	575841	89524.84

kable(highest_gdp_2014)

Country	Code	Year	GDP	Growth	CPI	Imports	Exports	Population	GDP_per_Capita
Monaco	MCO	2014	7060236168	7.179637	NA	NA	NA	38132	185152.53
Liechtenstein	LIE	2014	6657170923	NA	NA	NA	NA	37127	179308.08
Luxembourg	LUX	2014	66327344189	5.771916	108.6841	174.06108	208.23038	556319	119225.38
Norway	NOR	2014	499338534779	1.975117	106.2800	29.78328	38.78287	5137232	97199.92
Macao SAR, China	MAC	2014	55347998648	-1.201113	125.6103	31.62667	84.93632	588781	94004.39

kable(highest_gdp_2015)

Country	Code	Year	GDP	Growth	CPI	Imports	Exports	Population	GDP_per_Capita
Liechtenstein	LIE	2015	6268391521	NA	NA	NA	NA	37403	167590.61
Monaco	MCO	2015	6258178995	4.942330	NA	NA	NA	38307	163369.07
Luxembourg	LUX	2015	57784495265	2.861675	109.20011	187.46843	222.70321	569604	101446.79
Switzerland	CHE	2015	679289166858	1.226384	98.17176	50.62988	62.13828	8282396	82016.02
Isle of Man	IMN	2015	6792417112	-0.900000	NA	NA	NA	83167	81672.02

kable(highest_gdp_2016)

Country	Code	Year	GDP	Growth	CPI	Imports	Exports	Population	GDP_per_Capita
Monaco	MCO	2016	6468252212	3.213849	NA	NA	NA	38499	168010.91
Liechtenstein	LIE	2016	6214633651	NA	NA	NA	NA	37666	164993.19
Luxembourg	LUX	2016	58631324559	3.082643	109.5177	186.1633	221.26778	582014	100738.68
Switzerland	CHE	2016	668745279605	1.375884	97.7451	54.5889	65.81131	8373338	79866.03
Isle of Man	IMN	2016	6592627599	7.400000	NA	NA	NA	83737	78730.16

Excercise 3.2

For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect.

United States GDP from global_economy.
Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.
Victorian Electricity Demand from vic_elec.
Gas production from aus_production.

3.2 United States GDP from global_economy

United_States_GDP <- global_economy %>%
  filter(Country == 'United States') %>% 
  arrange(desc(GDP_per_Capita)) 


plot <- autoplot(United_States_GDP) +
  ggtitle("United States GDP") +            
  xlab("Year") +        
  ylab("GDP") +        
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 16, hjust = 0.5), 
    axis.title.x = element_text(face = "bold", size = 14), 
    axis.title.y = element_text(face = "bold", size = 14)  
  )

print(plot)

3.2 Slaughter of Victorian “Bulls, bullocks and steers” in aus_livestock.

bulls_bullocks_steers <- aus_livestock %>%
  filter(Animal == 'Bulls, bullocks and steers', State == "Victoria") 

plot <- autoplot(bulls_bullocks_steers) +
  ggtitle("Bulls, Bullocks,and Steers in Victoria") +            
  xlab("Year") +        
  ylab("Count") +        
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 16, hjust = 0.5), 
    axis.title.x = element_text(face = "bold", size = 14), 
    axis.title.y = element_text(face = "bold", size = 14), 
    legend.title = element_text(size = 10),     # Size of legend title
    legend.text = element_text(size = 8),       # Size of legend text
    legend.key.size = unit(0.5, "cm")           # Size of legend keys
  )

print(plot)

3.2 Victorian Electricity Demand from vic_elec

vic_elec_monthly <- vic_elec %>%
  mutate(month = floor_date(Time, "month")) %>%  
  group_by(month) %>%                           
  summarize(Demand = sum(Demand, na.rm = TRUE)) 


view(vic_elec_monthly)

plot <- autoplot(vic_elec_monthly) +
  ggtitle("Victorian Electricity Demand") +            
  xlab("Year") +        
  ylab("Count") +        
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 16, hjust = 0.5), 
    axis.title.x = element_text(face = "bold", size = 14), 
    axis.title.y = element_text(face = "bold", size = 14), 
    legend.position = "none"  # Remove the legend
  )

print(plot)

3.2 Gas production from aus_production

# Extract gas production data
gas_production <- aus_production %>%
  select(Quarter, Gas)

plot <- autoplot(gas_production) +
  ggtitle("Gas Prodiction") +            
  xlab("Quarter") +        
  ylab("Count") +        
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 16, hjust = 0.5), 
    axis.title.x = element_text(face = "bold", size = 14), 
    axis.title.y = element_text(face = "bold", size = 14), 
    legend.position = "none"  # Remove the legend
  )

print(plot)

Excercise 3.3

Why is a Box-Cox transformation unhelpful for the canadian_gas data?

head(canadian_gas)

## # A tsibble: 6 x 2 [1M]
##      Month Volume
##      <mth>  <dbl>
## 1 1960 Jan   1.43
## 2 1960 Feb   1.31
## 3 1960 Mar   1.40
## 4 1960 Apr   1.17
## 5 1960 May   1.12
## 6 1960 Jun   1.01

plot <- autoplot(canadian_gas) +
  ggtitle("Gas Prodiction") +            
  xlab("Month") +        
  ylab("Volume") +        
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 16, hjust = 0.5), 
    axis.title.x = element_text(face = "bold", size = 14), 
    axis.title.y = element_text(face = "bold", size = 14), 
    legend.position = "none"  # Remove the legend
  )

print(plot)

#?features()
?box_cox

lambda_gas <- canadian_gas %>%
  features(Volume,features = guerrero) %>%
  pull(lambda_guerrero)
  
lambda_gas

## [1] 0.5767648

#base_plot <- autoplot(box_cox(canadian_gas$Volume, lambda_gas))

#plot <- base_plot +
#  ggtitle("Gas Production") + 
##  xlab("Month") +
#  ylab("Volume") +
#  theme_minimal() +
#  theme(
#    plot.title = element_text(face = "bold", size = 16, hjust = 0.5), 
#    axis.title.x = element_text(face = "bold", size = 14), 
#    axis.title.y = element_text(face = "bold", size = 14), 
#    legend.position = "none"  # Remove the legend
#  )

#print(plot)

The analysis can be unhelpful because the data from this period almost look identical, particularly between 1970 and 1980. During this time, there is a noticeable positive trend accompanied by a significant and swollen event, which can make it difficult to distinguish between different underlying patterns or trends.

Excercise 3.4

What Box-Cox transformation would you select for your retail data (from Exercise 7 in Section 2.10)?

To effectively use the Box-Cox transformation, start by thoroughly understanding the nature of your data, including its trends, seasonality, and variance. Estimate the optimal lambda (\(\lambda\)) value using methods such as Guerrero’s, and then apply the Box-Cox transformation with this lambda to stabilize variance and improve normality. Finally, visualize the transformed data to assess the effectiveness of the transformation. It is essential to validate the lambda value and ensure that the transformation meets the needs of your data in practice.

Excercise 3.5

For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance. - - Tobacco from aus_production, - Economy class passengers between Melbourne and Sydney from ansett - Pedestrian counts at Southern Cross Station from pedestrian.

3.5 Tobacco

tobacco_ts <- ts(aus_production$Tobacco, frequency = 12)

lambda <- BoxCox.lambda(tobacco_ts)

transformed_tobacco <- BoxCox(tobacco_ts, lambda)

plot <- autoplot(transformed_tobacco) +
  ggtitle("Quarterly Tobacco Prodiction") +            
  xlab("Year and Quarters") +        
  ylab("Tobacco") +        
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 16, hjust = 0.5), 
    axis.title.x = element_text(face = "bold", size = 14), 
    axis.title.y = element_text(face = "bold", size = 14), 
    legend.position = "none"  # Remove the legend
  )

print(plot)

3.5 Economy Class Passengers

ansett %>%
  filter(Airports == "MEL-SYD" & Class == "Economy") %>%
  autoplot(Passengers) + labs(title = "Passengers on Ansett Flights")

# Come back and center title. Bold, and increase size of titles.

lambda1 <- ansett %>%
  filter(Airports == "MEL-SYD" & Class == "Economy") %>%
  features(Passengers,features = guerrero) %>%
  pull(lambda_guerrero)

ansett %>%
  filter(Airports == "MEL-SYD" & Class == "Economy") %>%
  autoplot(box_cox(Passengers,lambda1)) + labs(title = "BoxCox of Passengers")

Excercise 3.7

Consider the last five years of the Gas data from aus_production.

gas <- tail(aus_production, 5*4) |> select(Gas)

kable(gas)

Gas	Quarter
221	2005 Q3
180	2005 Q4
171	2006 Q1
224	2006 Q2
233	2006 Q3
192	2006 Q4
187	2007 Q1
234	2007 Q2
245	2007 Q3
205	2007 Q4
194	2008 Q1
229	2008 Q2
249	2008 Q3
203	2008 Q4
196	2009 Q1
238	2009 Q2
252	2009 Q3
210	2009 Q4
205	2010 Q1
236	2010 Q2

3.7 A. Plot the time series.

plot <- autoplot(gas) +
  ggtitle("5 years of Gas production") +            
  xlab("Quarters") +        
  ylab("Gas Production") +        
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 16, hjust = 0.5), 
    axis.title.x = element_text(face = "bold", size = 14), 
    axis.title.y = element_text(face = "bold", size = 14), 
    legend.position = "none"  # Remove the legend
  )

print(plot)

3.7 A. Can you identify seasonal fluctuations and/or a trend-cycle?

Yes definitly we can spot the highs for the middle of the year and the lows for at the begging. Positive trend starting the year the negative trend toward the end of the year.

3.7 B. Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices.

?classical_decomposition()

decompostion_gas <- gas %>%
  model(
  classical_decomposition(Gas,type = "multiplicative")) %>%
  components() 


decompostion_gas %>%  
  autoplot() + labs(title = "Classical Decomposition of Gas")

3.7 C. Do the results support the graphical interpretation from part a?

Yes the results seem to follow the original data.

3.7 D. Compute and plot the seasonally adjusted data.

decompostion_gas %>%
  ggplot(aes(x = Quarter)) + 
  geom_line(aes(y = Gas,color = "Data")) +
  geom_line(aes(y = season_adjust,color = "Seasonally Adjusted")) +
  geom_line(aes(y = trend,color = "Trend")) + 
  labs(title = "Quarterly production of Gas") +
   scale_colour_manual(
    values = c("#999999", "green", "red"),
    breaks = c("Data", "Seasonally Adjusted", "Trend")
  )

3.7 E. Change one observation to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?

gas_addition <- gas
gas_addition$Gas[1] <- gas_addition$Gas[1] + 300
kable(gas_addition)

Gas	Quarter
521	2005 Q3
180	2005 Q4
171	2006 Q1
224	2006 Q2
233	2006 Q3
192	2006 Q4
187	2007 Q1
234	2007 Q2
245	2007 Q3
205	2007 Q4
194	2008 Q1
229	2008 Q2
249	2008 Q3
203	2008 Q4
196	2009 Q1
238	2009 Q2
252	2009 Q3
210	2009 Q4
205	2010 Q1
236	2010 Q2

Adding 300 to a single observation in the data significantly skews its normal distribution. Most of the added values do not reach the peak of 300; they only go up to around 250. Therefore, this addition makes a substantial difference in the data.

3.7 F. Does it make any difference if the outlier is near the end rather than in the middle of the time series?

In both cases, it would be incorrect, but it would cause a significant increase in the positive trend shown on the chart. Initially, it would create a positive trend at the beginning, but by the end, it would result in a negative trend.

Excercise 3.8

Recall your retail time series data (from Exercise 7 in Section 2.10). Decompose the series using X-11. Does it reveal any outliers, or unusual features that you had not noticed previously?

?x11
aus_production

## # A tsibble: 218 x 7 [1Q]
##    Quarter  Beer Tobacco Bricks Cement Electricity   Gas
##      <qtr> <dbl>   <dbl>  <dbl>  <dbl>       <dbl> <dbl>
##  1 1956 Q1   284    5225    189    465        3923     5
##  2 1956 Q2   213    5178    204    532        4436     6
##  3 1956 Q3   227    5297    208    561        4806     7
##  4 1956 Q4   308    5681    197    570        4418     6
##  5 1957 Q1   262    5577    187    529        4339     5
##  6 1957 Q2   228    5651    214    604        4811     7
##  7 1957 Q3   236    5317    227    603        5259     7
##  8 1957 Q4   320    6152    222    582        4735     6
##  9 1958 Q1   272    5758    199    554        4608     5
## 10 1958 Q2   233    5641    229    620        5196     7
## # ℹ 208 more rows

myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

myseries %>%
  autoplot() + labs(title = "Trade turnover")

myseries %>%
  gg_season() +labs(title = "Trade turnover")

Excercise 3.9

Figures 3.19 and 3.20 show the result of decomposing the number of persons in the civilian labour force in Australia each month from February 1978 to August 1995.

3.9 A. Write about 3–5 sentences describing the results of the decomposition. Pay particular attention to the scales of the graphs in making your interpretation.

The line graph of the values shows an increase from the 1980s to the mid-1990s. There appears to be a small disturbance around 1992, where the value line dips slightly. After this point, the value clearly drops below a count of 400. The seasonal pattern remains consistent with no notable deviations. The trend line indicates a positive overall trend.

3.9 B. Is the recession of 1991/1992 visible in the estimated components?

Yes, the recession is clearly visible in the graph. The small disturbance is evident during the 1991/1992 period.