Homework 1

All libraries needed for the Homework

library(fpp3)

## Warning: package 'fpp3' was built under R version 4.3.3

## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr

## -- Attaching packages -------------------------------------------- fpp3 1.0.0 --

## v tibble      3.2.1     v tsibble     1.1.5
## v dplyr       1.1.2     v tsibbledata 0.4.1
## v tidyr       1.3.0     v feasts      0.3.2
## v lubridate   1.9.2     v fable       0.3.4
## v ggplot2     3.5.1     v fabletools  0.4.2

## Warning: package 'ggplot2' was built under R version 4.3.3

## Warning: package 'tsibble' was built under R version 4.3.3

## Warning: package 'tsibbledata' was built under R version 4.3.3

## Warning: package 'feasts' was built under R version 4.3.3

## Warning: package 'fabletools' was built under R version 4.3.3

## Warning: package 'fable' was built under R version 4.3.3

## -- Conflicts ------------------------------------------------- fpp3_conflicts --
## x lubridate::date()    masks base::date()
## x dplyr::filter()      masks stats::filter()
## x tsibble::intersect() masks base::intersect()
## x tsibble::interval()  masks lubridate::interval()
## x dplyr::lag()         masks stats::lag()
## x tsibble::setdiff()   masks base::setdiff()
## x tsibble::union()     masks base::union()

library(forecast)

## Warning: package 'forecast' was built under R version 4.3.3

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

library(tidyverse)

## -- Attaching core tidyverse packages ------------------------ tidyverse 2.0.0 --
## v forcats 1.0.0     v readr   2.1.4
## v purrr   1.0.1     v stringr 1.5.0

## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter()     masks stats::filter()
## x tsibble::interval() masks lubridate::interval()
## x dplyr::lag()        masks stats::lag()
## i Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(dplyr)
library(lubridate)
library(tsibble)

2.1 - Explore the following four time series: Bricks from aus_production, Lynx from pelt, Close from gafa_stock, Demand from vic_elec.

a). Use ? (or help()) to find out about the data in each series. What is the time interval of each series?

#aus_production time series
help("aus_production")

## starting httpd help server ... done

#pelt time series
help("pelt")

#gafa_stock time series
help("gafa_stock")

#vic_elec time series
help("vic_elec")

aus_production - The time interval is quarterly.

pelt - The time interval is yearly.

gafa_stock - The time interval is daily, but only on trading days (Monday to Friday, excluding market holidays), so the series is irregularly spaced.

vic_elec - The time interval is every 30 minutes.
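The intervals read from the help pages can also be checked in code. This is a minimal sketch, assuming the fpp3 packages are installed. The namespace is spelled out in `tsibble::interval()` because lubridate also exports a function called `interval()`.

```r
library(fpp3)

# Report the interval that each tsibble stores for its index.
tsibble::interval(aus_production)  # quarterly data
tsibble::interval(pelt)            # yearly data
tsibble::interval(vic_elec)        # half-hourly data

# gafa_stock only has rows for trading days, so it is stored as an
# irregular tsibble rather than one with a fixed daily step.
regular_flags <- c(
  aus_production = is_regular(aus_production),
  pelt           = is_regular(pelt),
  gafa_stock     = is_regular(gafa_stock),
  vic_elec       = is_regular(vic_elec)
)
regular_flags
```

A `FALSE` entry in `regular_flags` marks a series whose observations are not evenly spaced in time.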

b). Use autoplot() to produce a time plot of each series. For the last plot, modify the axis labels and title.

#aus_production time plot
aus_production %>% 
  autoplot(Bricks)

## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

#pelt time plot
pelt %>% 
  autoplot(Lynx)

#gafa_stock time plot
gafa_stock %>% 
  autoplot(Close)

#vic_elec time plot
vic_elec %>% 
  autoplot(Demand)

#vic_elec time plot with axis labels and title modification
vic_elec %>% 
  autoplot(Demand) +
  labs(x = "Date", y = "Demand") +
  ggtitle("Electricity Demand Every Half Hour")

2.2 - Use filter() to find what days corresponded to the peak closing price for each of the four stocks in gafa_stock.

gafa_stock %>% group_by(Symbol) %>%
  filter(Close==max(Close)) %>%
  select(Symbol,Date, Close)

From the results above, Amazon had the highest peak closing price, about 2040, followed by Google, whose peak closing price was just over 1268. Apple and Facebook had the lowest peaks of the group, about 232 and 218 respectively. Interestingly, the peaks for all four tech titans came in the same year, within a few months of one another, which suggests that some common event or events drove them.
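The same-year observation can be checked by adding the year of each peak to the result. This is a sketch only: it uses `slice_max()` instead of the `filter()` call the exercise asks for, and the `peaks` and `Year` names are introduced here for illustration.

```r
library(fpp3)

peaks <- gafa_stock |>
  as_tibble() |>                                # plain tibble for a simple summary
  group_by(Symbol) |>
  slice_max(Close, n = 1, with_ties = FALSE) |> # one peak row per stock
  ungroup() |>
  mutate(Year = year(Date)) |>                  # calendar year of the peak
  select(Symbol, Date, Year, Close) |>
  arrange(desc(Close))                          # highest peak first

peaks
```

Sorting by `Close` puts the stock with the highest peak in the first row, and the `Year` column shows whether the peaks share a calendar year.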

2.3 - Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.

# Downloaded from the book website and uploaded to my GitHub. Read the data from GitHub and print the first 20 rows.
tute1 <- read.csv("https://raw.githubusercontent.com/Data-Vlad/Data-Science/main/Data%20624%20-%20Predictive%20Analytics/Homework%201/tute1.csv")
head(tute1,20)

#converting the data to time series 
mytimeseries <- tute1 |>
  mutate(Quarter = yearquarter(Quarter)) |>
  as_tsibble(index = Quarter)

#constructing time series plots of each of the three series (with facet grid)
mytimeseries |>
  pivot_longer(-Quarter) |>
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line() +
  facet_grid(name ~ ., scales = "free_y") +
  ggtitle("Contains facet_grid")

#constructing time series plots of each of the three series (without facet grid)
mytimeseries %>%
  pivot_longer(-Quarter) %>%
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line() +
  ggtitle("No facet_grid")

With facet_grid, each series is drawn in its own horizontal panel, stacked like the lanes of a swim lane diagram on a shared time axis. Because scales = "free_y" gives every panel its own y-axis, the shape of each series is easy to read even though the three series have very different magnitudes. Without facet_grid, all three series are drawn in a single panel on one shared y-axis, so that precision is lost: the series with smaller values are compressed and their variation is harder to see.
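The effect of a shared y-axis can be shown with a small toy example. The `toy` data frame below is hypothetical and is not taken from tute1.csv. It only shows that two series with different levels are hard to compare on one axis.

```r
library(ggplot2)

# Hypothetical toy data: two series on very different levels.
toy <- data.frame(
  t     = rep(1:8, times = 2),
  name  = rep(c("small", "large"), each = 8),
  value = c(1, 3, 2, 4, 3, 5, 4, 6,
            1000, 1010, 1005, 1015, 1010, 1020, 1015, 1025)
)

# Spread (max minus min) of each series: both vary, but around different levels.
ranges <- tapply(toy$value, toy$name, function(v) diff(range(v)))
ranges

base_plot <- ggplot(toy, aes(x = t, y = value, colour = name)) +
  geom_line()

# One shared y-axis: the axis must cover 1 to 1025, so both lines look flat.
p_shared <- base_plot + ggtitle("No facet_grid")

# One panel per series, each with its own y-axis.
p_facet <- base_plot +
  facet_grid(name ~ ., scales = "free_y") +
  ggtitle("Contains facet_grid")
```

Calling `print(p_shared)` and `print(p_facet)` draws the two versions for comparison.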

2.4 - The USgas package contains data on the demand for natural gas in the US. a).Install the USgas package. b).Create a tsibble from us_total with year as the index and state as the key. c).Plot the annual natural gas consumption by state for the New England area (comprising the states of Maine, Vermont, New Hampshire, Massachusetts, Connecticut and Rhode Island).

# a). loading the USgas package (install it first with install.packages("USgas") if needed)
library(USgas)

## Warning: package 'USgas' was built under R version 4.3.3

# b). creating a tsibble from us_total with year as the index and state as the key.
us_total <- us_total %>%
  as_tsibble(key = state, index = year)

#c). plotting the annual natural gas consumption by state for the New England Area.
us_total %>%
  filter(state %in% c('Maine', 'Vermont', 'New Hampshire', 'Massachusetts', 'Connecticut', 'Rhode Island')) %>%
  ggplot(aes(x = year, y = y, colour = state)) +
  geom_line() +
  facet_grid(state ~., scales = "free_y") +
  labs(title = "Annual Natural Gas Consumption in New England States", y = "Consumption")

Looking at the charts, we can see both upward and downward trends. Connecticut, Massachusetts and Vermont trend upward, while Maine, New Hampshire and Rhode Island trend downward.
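A rough numeric check of these visual impressions is to compare each state's consumption in its first and last recorded year. This is a sketch under the assumption that the USgas package is installed and that `us_total` has the columns `year`, `state` and `y` used above. The `ne_gas` and `ne_change` names are introduced here. An endpoint comparison is crude and ignores what happens in between.

```r
library(fpp3)
library(USgas)

new_england <- c("Maine", "Vermont", "New Hampshire",
                 "Massachusetts", "Connecticut", "Rhode Island")

# Build the tsibble with year as the index and state as the key,
# then keep only the New England states.
ne_gas <- us_total |>
  as_tsibble(key = state, index = year) |>
  filter(state %in% new_england)

# Difference between the last and the first recorded value per state.
ne_change <- ne_gas |>
  as_tibble() |>
  group_by(state) |>
  summarise(
    first_year = min(year),
    last_year  = max(year),
    change     = y[which.max(year)] - y[which.min(year)]
  )

ne_change
```

A positive `change` means consumption was higher in the last year than in the first, and a negative value means it was lower.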

2.5 - a).Download tourism.xlsx from the book website and read it into R using readxl::read_excel(). b).Create a tsibble which is identical to the tourism tsibble from the tsibble package. c).Find what combination of Region and Purpose had the maximum number of overnight trips on average. d).Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.

# a). downloaded tourism.xlsx from the book website, saved it as CSV, uploaded it to my GitHub and read it from there.
tourism_df <-  read.csv('https://raw.githubusercontent.com/Data-Vlad/Data-Science/main/Data%20624%20-%20Predictive%20Analytics/Homework%201/tourism.csv')


#b).created a tsibble which is identical to the tourism tsibble from the tsibble package
tourism_tsibble <- tourism_df %>%
  mutate(Quarter = yearquarter(Quarter)) %>%
  as_tsibble(key = c(Region, State, Purpose),index = Quarter)

#c). identified the combination of Region and Purpose which had the maximum number of overnight trips on average
tourism_tsibble %>%
  group_by(Region, Purpose) %>%
  mutate(average_trips = mean(Trips)) %>%
  ungroup() %>%
  filter(average_trips == max(average_trips)) %>%
  distinct(Region, Purpose)

#d).created a new tsibble which combines Purposes and Regions and has only total trips by state
tourism_tsibble %>%
  group_by(State) %>%
  summarise(Trips = sum(Trips), .groups = "drop") %>%
  as_tsibble(key = State, index = Quarter)

The combination of Region and Purpose with the maximum number of overnight trips on average is Sydney and Visiting.
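Parts c) and d) can be cross-checked without the CSV by using the built-in `tourism` tsibble that ships with the tsibble package. This sketch assumes the built-in data matches the downloaded file, which is what part b) sets out to achieve. The `top_combo` and `state_trips` names are introduced here.

```r
library(fpp3)

# c) average overnight trips for each Region/Purpose combination,
#    keeping only the combination with the largest average
top_combo <- tourism |>
  as_tibble() |>
  group_by(Region, Purpose) |>
  summarise(average_trips = mean(Trips), .groups = "drop") |>
  slice_max(average_trips, n = 1)

top_combo

# d) total trips by State; summarise() on a tsibble keeps the Quarter
#    index, so the result is already a tsibble keyed by State
state_trips <- tourism |>
  group_by(State) |>
  summarise(Trips = sum(Trips))

state_trips
```

Using `summarise()` rather than `mutate()` in part c) returns one row per combination, so no `distinct()` step is needed afterwards.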

2.8 - Use the following graphics functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() and explore features from the following time series: “Total Private” Employed from us_employment, Bricks from aus_production, Hare from pelt, “H02” Cost from PBS, and Barrels from us_gasoline.

Exploring features of Employed:

#using autoplot()
us_employment  %>% 
  filter(Title == "Total Private") %>% 
  autoplot(Employed) + 
  ggtitle("Employment Autoplot")

#using gg_season()
us_employment %>% 
  filter(Title == "Total Private") %>% 
  gg_season(Employed) +
  ggtitle("Seasonal Employment")

#using gg_subseries()
us_employment  %>% 
  filter(Title == "Total Private") %>% 
  gg_subseries(Employed) +
  ggtitle("Employment Subseries")

#using gg_lag()
us_employment %>% 
  filter(Title == "Total Private") %>% 
  gg_lag(Employed) +
  ggtitle("Employment Lag")

#using ACF()
us_employment  %>%
  filter(Title == "Total Private") %>%
  ACF(Employed) %>%
  autoplot() +
  ggtitle("Employment Autocorrelation")

1).Can you spot any seasonality, cyclicity and trend?

1).Based on the Autoplot we can see a general upward trend from 1940 to 2020. In terms of seasonality, the summer months seem to have the highest employment.

2).What do you learn about the series?

2).What I learned about the series is that every month follows a similar path over the years: an increase from about 25000 to about 110000, then a small decrease, followed by an increase to about 125000.

3).What can you say about the seasonal patterns?

3).One thing I can say about the seasonal patterns is that they are relatively similar from year to year. Also, employment tends to move upward in the summer months (June-August) and in December.

4).Can you identify any unusual trend?

4).I would say that it is interesting how, according to the “Employment Lag” subplots, lags 1 to 3, lags 4 to 6 and lags 7 to 9 all show identical upward trends.

Exploring features of Bricks:

#using autoplot()
aus_production %>% 
  autoplot(Bricks) +
  ggtitle("Bricks Autoplot")

## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

#using gg_season()
aus_production %>% 
  gg_season(Bricks) +
  ggtitle("Seasonal Bricks")

## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_line()`).

#using gg_subseries()
aus_production %>% 
  gg_subseries(Bricks) +
  ggtitle("Bricks Subseries")

## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_line()`).

#using gg_lag()
aus_production %>% 
  gg_lag(Bricks) +
  ggtitle("Bricks Lag")

## Warning: Removed 20 rows containing missing values (gg_lag).

#using ACF()
aus_production  %>%
  ACF(Bricks) %>%
  autoplot() +
  ggtitle("Bricks Autocorrelation")

1).Can you spot any seasonality, cyclicity and trend?

1).Based on the “Bricks Autoplot” there is no obvious trend. The seasonality is more obvious: from Q1 to Q3 we see an upward movement and from Q3 to Q4 a downward movement in all the years.

2).What do you learn about the series?

2). One thing I learned from the series is it peaks in about 1981 and has its lowest point at about 1955.

3).What can you say about the seasonal patterns?

3). I can say that the seasonal pattern repeats every year. Without exception, from Q1 to Q3 we see a rise in Bricks followed by a fall in Bricks from Q3 to Q4.

4).Can you identify any unusual trend?

4).One unusual feature I see is in the “Bricks Lag” subplots: for lags 2 to 6, the Quarter 1 points form a V-shape.
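The quarterly pattern described in the answers above can also be inspected as numbers rather than as a plot. This is a minimal sketch. With a yearly pattern in quarterly data, one would expect the autocorrelations at lags 4, 8 and 12 to stand out from their neighbours, but the output should be checked rather than assumed. The `bricks_acf` name is introduced here.

```r
library(fpp3)

# Autocorrelations of Bricks for the first 12 quarterly lags (three years).
bricks_acf <- aus_production |>
  ACF(Bricks, lag_max = 12)

bricks_acf
```

Passing `bricks_acf` to `autoplot()` gives the same kind of correlogram as the “Bricks Autocorrelation” plot.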

Exploring features of Pelt:

#using autoplot()
pelt %>% 
  autoplot(Hare) +
  ggtitle("Hare Autoplot")

#using gg_subseries()
pelt %>% 
  gg_subseries(Hare) +
  ggtitle("Hare Subseries")

#using gg_lag()
pelt %>% 
  gg_lag(Hare) +
  ggtitle("Hare Lag")

#using ACF()
pelt  %>%
  ACF(Hare) %>%
  autoplot() +
  ggtitle("Hare Autocorrelation")

1).Can you spot any seasonality, cyclicity and trend?

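1). I am not able to spot any particular trend. However, I am able to see clear cyclicity, which is evident from the roughly sinusoidal curve. Because the data are yearly, this repeating pattern is a multi-year cycle rather than seasonality within a year.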

2).What do you learn about the series?

2). I learned that the series shows a sinusoidal pattern.

3). What can you say about the seasonal patterns?

3). The number of pelts traded fluctuates sharply, going upward and then downward in a sinusoidal manner.
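The length of the cycle seen in the Hare plots can be estimated from the autocorrelations. This is a rough sketch that looks for the strongest autocorrelation beyond the first few lags. The cut-off of 5 years and the names `hare_acf` and `lag_years` are assumptions made here for illustration.

```r
library(fpp3)

hare_acf <- pelt |>
  ACF(Hare, lag_max = 20) |>
  as_tibble() |>
  # Assumption: rows are returned in order for lags 1, 2, ..., 20.
  mutate(lag_years = row_number())

# Lag with the largest autocorrelation once the short lags are skipped.
hare_acf |>
  filter(lag_years > 5) |>
  slice_max(acf, n = 1)
```

The `lag_years` value in the returned row is a candidate for the cycle length in years.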

Exploring features of PBS:

#using autoplot()
PBS %>% 
  filter(ATC2 == "G02")  %>% 
  autoplot(Cost) +
  ggtitle("Cost Autoplot")

1).Can you spot any seasonality, cyclicity and trend?

1). The plots took a very long time to execute, so I only managed to run the autoplot, and for that reason I did this only for “G02”. I noticed that for both “General Co-payments” and “Concessional Co-payments” there is a repeating pattern of upward and downward movements. For “Concessional Safety” and “General Safety” we see an upward trend followed by a peak and then a downward trend, and that pattern continues.
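One way to get a single series for the remaining plots is to sum Cost over the Concession and Type keys before plotting. The sketch below does this for "H02", the code named in the exercise, rather than the "G02" used above. Whether it fixes the slow rendering depends on the machine, so treat it as something to try rather than a guaranteed fix. The names `h02_total`, `p_season` and `p_acf` are introduced here.

```r
library(fpp3)

# Collapse the separate Concession/Type series into one monthly total.
h02_total <- PBS |>
  filter(ATC2 == "H02") |>
  summarise(Cost = sum(Cost))

# With a single series, the seasonal plot and correlogram have much less to draw.
p_season <- h02_total |>
  gg_season(Cost) +
  ggtitle("Seasonal H02 Cost")

p_acf <- h02_total |>
  ACF(Cost) |>
  autoplot() +
  ggtitle("H02 Cost Autocorrelation")
```

Printing `p_season` or `p_acf` draws the corresponding plot. The same pattern applies to `gg_subseries()` and `gg_lag()`.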

Exploring features of Gasoline:

#using autoplot()
us_gasoline %>% 
  autoplot(Barrels) +
  ggtitle("Barrels Autoplot")

#using gg_season()
us_gasoline %>% 
  gg_season(Barrels) +
  ggtitle("Seasonal Barrels")

#using gg_subseries()
us_gasoline %>% 
  gg_subseries(Barrels) +
  ggtitle("Barrels Subseries")

#using gg_lag()
us_gasoline %>% 
  gg_lag(Barrels) +
  ggtitle("Barrels lag")

#using ACF()
us_gasoline  %>%
  ACF(Barrels) %>%
  autoplot() +
  ggtitle("Barrels Autocorrelation")

1).Can you spot any seasonality, cyclicity and trend?

1).In terms of trend, there seems to be a generally upward trend. I would say that there is very little seasonality and cyclicity.

2).What do you learn about the series?

2). The series has an upward trend for the most part.

3). What can you say about the seasonal patterns?

3). I can see a pattern of decline in the months of February, April and October. We see an upward pattern in June.

4).Can you identify any unusual trend?

4). I found it interesting that all barrel lags (1-9) had an overall similar shape.

Vladimir Nimchenko

2024-09-05