Assignment #2

# DO NOT FORGET TO CALL THESE 3 packages FIRST
library(fpp3)       # forecasting package
library(tidyverse)  # graphs and tidy
library(readxl)     # reading excel data

PART A:

A1: Get the tourism data from the link below, then convert it to tsibble format using tsibble functions. Save it as my_tourism.

# download the data from the web page.
tourism_file <- tempfile(fileext = ".xlsx")
download.file("http://OTexts.com/fpp3/extrafiles/tourism.xlsx", tourism_file, mode = "wb")
# Read the downloaded Excel file with readxl
## Your turn: convert the data to a tsibble and name it my_tourism:

# A1.Answer:
my_tourism <- readxl::read_excel(tourism_file) %>%
  mutate(Quarter = yearquarter(Quarter)) %>%
  as_tsibble(
    index = Quarter,
    key = c(Region, State, Purpose)
  )


my_tourism

## # A tsibble: 24,320 x 5 [1Q]
## # Key:       Region, State, Purpose [304]
##    Quarter Region   State           Purpose  Trips
##      <qtr> <chr>    <chr>           <chr>    <dbl>
##  1 1998 Q1 Adelaide South Australia Business  135.
##  2 1998 Q2 Adelaide South Australia Business  110.
##  3 1998 Q3 Adelaide South Australia Business  166.
##  4 1998 Q4 Adelaide South Australia Business  127.
##  5 1999 Q1 Adelaide South Australia Business  137.
##  6 1999 Q2 Adelaide South Australia Business  200.
##  7 1999 Q3 Adelaide South Australia Business  169.
##  8 1999 Q4 Adelaide South Australia Business  134.
##  9 2000 Q1 Adelaide South Australia Business  154.
## 10 2000 Q2 Adelaide South Australia Business  169.
## # ℹ 24,310 more rows

A2. To analyze the data, first view my_tourism and check which dates it covers.

1. Is it annual or quarterly data? Which year does it start from?

2. Use the table(State, Purpose) command to see the cross-table. How many States are there in the data? How many Purpose categories are there?

3. Group the data by Region and Purpose using the group_by() command. To eliminate the time effect, convert to tibble format first (i.e. as_tibble() %>% group_by(Region, Purpose)).

4. After grouping the data in #3, use the summarise() function to get the average of Trips for each combination, and assign it to Trips (i.e. summarise(Trips = mean(Trips))).

5. Find which combination of Region and Purpose had the maximum number of overnight trips on average. Ungroup the data to find the result (i.e. ungroup() %>% filter(Trips == max(Trips))).

# A2.Answer:
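my_tourism %>%
  as_tibble() %>%                      # drop the time index so the mean is taken over all quarters
  group_by(Region, Purpose) %>%
  summarise(Trips = mean(Trips)) %>%   # average trips per Region/Purpose combination
  ungroup() %>%
  filter(Trips == max(Trips))          # keep the combination with the highest average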

## # A tibble: 1 × 3
##   Region Purpose  Trips
##   <chr>  <chr>    <dbl>
## 1 Sydney Visiting  747.
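
Steps 1 and 2 are not covered by the answer code above. A minimal sketch of how they could be checked follows. It assumes that the built-in tourism tsibble (attached with fpp3) holds the same data as my_tourism; the names tourism_tbl, first_quarter, n_states and n_purposes are only illustrative.

```r
library(fpp3)

# Assumption: the built-in `tourism` tsibble stands in for my_tourism
tourism_tbl <- as_tibble(tourism)

# Step 1: the interval shows whether the data are annual or quarterly,
# and the smallest index value is the starting period
interval(tourism)
first_quarter <- min(tourism$Quarter)
first_quarter

# Step 2: cross-table of State by Purpose, then the number of distinct values
with(tourism_tbl, table(State, Purpose))
n_states   <- n_distinct(tourism_tbl$State)
n_purposes <- n_distinct(tourism_tbl$Purpose)
c(states = n_states, purposes = n_purposes)
```

The counts can also be read from the row and column labels of the cross-table; n_distinct() just avoids counting by eye.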

A3.Question: Create a new tsibble which combines the Purposes and Regions, and just has total trips by State (i.e. group_by(State) %>% summarise(Trips = sum(Trips))). Then ungroup() the data. Save it as state_tourism.

# A3.Answer:
state_tourism <- my_tourism %>%
  group_by(State) %>%
  summarise(Trips = sum(Trips)) %>%
  ungroup()

PART B:

B1.Question: Create plots of the following time series and analyze them visually. What do you see? Comment on each plot in the Answer below.

library(fpp3)
# Bricks from aus_production 
data("aus_production")
aus_production %>%
  autoplot(Bricks)+
  labs(title = "Quarterly Brick Production in Australia",
       x = "Year",
       y = "Millions of bricks")

# Shows a strong quarterly seasonal pattern and an upward trend until about 1980; after that, production levels off and then declines, with sharp dips along the way.

# Lynx from pelt 
pelt%>%
  autoplot(Lynx)

# Shows pronounced cycles of roughly 10 years in the number of lynx pelts traded, which suggests that ecological or environmental factors drive the population up and down. The data are annual, so there is no seasonal pattern.

# Close from gafa_stock 
gafa_stock%>%
  autoplot(Close)

# Shows the daily closing prices of the four stocks, one line per Symbol: the price levels drift up and down irregularly with no seasonal pattern, reflecting market shifts and investor reactions.


# Demand from vic_elec
vic_elec%>% autoplot(Demand)

# Shows half-hourly electricity demand with a clear annual pattern: demand rises in winter and spikes on some summer days. Daily and weekly patterns are also present but are hard to see at this scale.

# B1.Answer: 
# Each plot shows a different mix of time series patterns: Bricks has trend and seasonality, Lynx is cyclic, Close has trend-like movements without seasonality, and Demand has several seasonal patterns.

PART C:

C1. Question: Use the my_tourism data you created as a tsibble in A1. Filter Region for "Snowy Mountains" and save it as snowy.

snowy <- my_tourism %>%
  filter(Region == "Snowy Mountains")

C2. Question: Use autoplot(), gg_season() and gg_subseries() to explore the snowy data. What do you observe? What type of pattern do you see? Write your comments in the Answer below:

Question: Take the snowy data and sum all trips across State and Purpose for each quarter using the summarise() command. Then use autoplot(), gg_season() and gg_subseries() to explore the quarterly trips. What do you observe? What type of pattern do you see? Write your comments in the Answer below:

# C2.Answer: 
snowy %>%
  autoplot(Trips)

# Gives an overview of trips over time, with one line per Purpose. Holiday trips are the largest series and show a clear pattern that repeats every year.

snowy%>%
  gg_season(Trips)

# Overlays the years against the quarter, which makes it easy to see which quarters are the peaks and troughs of travel in each year.

snowy%>%
  gg_subseries(Trips)

# Shows one panel per quarter, with that quarter's values plotted across the years and the mean marked by a horizontal line, so changes in the seasonal pattern over time can be compared.

# autoplot(), gg_season(), and gg_subseries() work together to make it easier to analyse the quarterly trip data in depth. Together the graphs reveal the recurring seasonal pattern and how trips vary across quarters and years.
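
The second question asks for total trips per quarter across State and Purpose, which the answer code above does not compute. One possible sketch is below. It assumes that the built-in tourism tsibble matches my_tourism, and snowy_total is an illustrative name rather than one from the assignment.

```r
library(fpp3)

# Assumption: the built-in `tourism` tsibble stands in for my_tourism
snowy_total <- tourism %>%
  filter(Region == "Snowy Mountains") %>%
  # summarise() on a tsibble keeps the index (Quarter) and collapses the keys,
  # so this gives one total per quarter
  summarise(Trips = sum(Trips))

snowy_total %>% autoplot(Trips)      # time plot of total trips
snowy_total %>% gg_season(Trips)     # one line per year, plotted against quarter
snowy_total %>% gg_subseries(Trips)  # one panel per quarter, plotted across years
```

With a single series, each plot has only one line or panel set to read, which makes the seasonal pattern easier to judge than with four Purpose series at once.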

PART D:

D1.Question: Use these two functions, 1) gg_lag() and 2) ACF(), to explore the following time series: i) Bricks from aus_production, ii) Lynx from pelt, iii) Victorian Electricity Demand from vic_elec. Write your comments about each graph you created.

# D1.Answer:
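data("aus_production")
# Lag plots for lags 1 to 9 (lags = 9 would draw only the lag-9 panel)
aus_production %>%
  gg_lag(Bricks, lags = 1:9) +
  labs(title = "Lag plot for Brick Production")

# Autocorrelation function of the same series
aus_production %>%
  ACF(Bricks) %>%
  autoplot() +
  labs(title = "ACF plot for Brick Production")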

# In conclusion, the lag plots and the ACF plot for Bricks reveal the temporal dependence in brick production: strong positive relationships at lags 4 and 8 point to quarterly seasonality, and autocorrelations that decay slowly point to a trend.
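
The answer above covers only Bricks. A rough sketch for the other two series could look like the following. It shows a lag plot and ACF for Lynx, and only the ACF for the half-hourly demand series. The lag_max values (30 years, and 336 half-hours, which is one week) and the names lynx_acf and elec_acf are assumptions chosen for illustration.

```r
library(fpp3)

# ii) Lynx from pelt: annual data, default lags 1 to 9
pelt %>%
  gg_lag(Lynx, geom = "point") +
  labs(title = "Lag plot for Lynx pelts")

lynx_acf <- pelt %>% ACF(Lynx, lag_max = 30)   # 30 lags to cover several cycles
lynx_acf %>%
  autoplot() +
  labs(title = "ACF plot for Lynx pelts")

# iii) Demand from vic_elec: half-hourly data, so 48 lags make up one day
elec_acf <- vic_elec %>% ACF(Demand, lag_max = 336)   # 336 half-hours = one week
elec_acf %>%
  autoplot() +
  labs(title = "ACF plot for Victorian electricity demand")
```

A larger lag_max than the default is used so that a multi-year cycle in Lynx and a daily or weekly pattern in Demand have room to show up in the plots.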

D2.Question: After using these functions, can you spot any seasonality, cyclicity and trend? What do you learn about the series? Write your comments below for each series:

i) Bricks

# D2.Answer:
#1) Seasonality: the ACF has peaks at lags 4, 8, 12, ..., which indicates a quarterly seasonal pattern.
#2) Cyclicity: look for slower swings in the series that last several years, beyond the seasonal peaks.
#3) Trend: the ACF decays slowly and the lag plots show a positive relationship, which is consistent with a trend in the series.

ii) Lynx from pelt:

# D2.Answer:
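#1) Seasonality: the data are annual, so there is no seasonal pattern to find.
#2) Cyclicity: examine the ACF plot for a wave-like shape; recurring peaks reflect the lynx population cycle of roughly 10 years.
#3) Trend: evaluate the lag plot and the ACF for any long-term upward or downward movement in lynx numbers; none is obvious.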

iii) Electricity

# D2.Answer: 
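#1) Seasonality: use the ACF plot to identify seasonal patterns in electricity demand; with half-hourly data, a daily pattern shows up as peaks at multiples of lag 48.
#2) Cyclicity: examine electricity demand for recurring swings that do not have a fixed period.
#3) Trend: evaluate the lag plot and the ACF for any long-term trend in electricity demand.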

PART E: See the data below for Google Stock price from the gafa_stock data set.

E1: Calculate the first difference of the “goog” series.

# E1. Answer:
# Google daily closing prices from 2018 onwards
library(tsibble)
library(dplyr)
dgoog <- gafa_stock %>%
  filter(Symbol == "GOOG", year(Date) >= 2018) %>%
  # Trading days are irregular (missing dates), so index by row number instead
  mutate(trading_day = row_number()) %>%
  update_tsibble(index = trading_day, regular = TRUE) %>%
  # difference() gives the first difference, i.e. the daily change in the closing price
  mutate(diff = difference(Close))

E2. Question: Does “diff” (difference of the series) look like white noise? Recall the definition of white noise. Recall what function do you use to check if a series is white noise or not. Use the necessary graph that shows if a series is white noise? Comment based on the graph.

# E2.Answer:
dgoog %>%
  ACF(diff) %>%
  autoplot()

# The differenced series resembles white noise if the autocorrelations in the ACF plot are close to zero at all lags, with about 95% of the spikes inside the blue dashed bounds.
# On the other hand, large spikes outside the bounds at specific lags would imply that the differenced series is not white noise, that is, some pattern or dependence remains.
# Examining the ACF plot of the differenced series therefore shows whether any underlying pattern is still present or whether it behaves like white noise.
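
As a numerical complement to the graph, a portmanteau test such as Ljung-Box could be applied to the differenced series. The sketch below rebuilds dgoog so that it runs on its own. The choice of lag = 10 and the name lb are assumptions made for illustration and are not part of the assignment.

```r
library(fpp3)

# Rebuild the differenced Google series as in E1
dgoog <- gafa_stock %>%
  filter(Symbol == "GOOG", year(Date) >= 2018) %>%
  mutate(trading_day = row_number()) %>%
  update_tsibble(index = trading_day, regular = TRUE) %>%
  mutate(diff = difference(Close))

# Ljung-Box test on the first 10 autocorrelations (lag = 10 is an assumed choice)
lb <- dgoog %>%
  features(diff, ljung_box, lag = 10)
lb
```

A large lb_pvalue (say, above 0.05) would be consistent with white noise, while a small one would suggest that some autocorrelation remains in the differenced series.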